• 00 07-03 (4) Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory Point3R: Streaming der 3D-Rekonstruktion mit explizitem Raumzeigerspeicher Point3R: 利用明确的空间指内存进行三维重建 2507.02863v1
  • 01 07-03 Answer Matching Outperforms Multiple Choice for Language Model Evaluation Antwort Matching Outperforms Multiple Choice für Sprachmodell-Bewertung 语言模式评价的多种选择 2507.02856v1
  • 02 07-03 Urban Region Pre-training and Prompting: A Graph-based Approach Urban Region Pre-Training und Prompting: Ein graphischer Ansatz 培训前和推动:基于图表的方法 2408.05920v4
  • 03 07-03 MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs MOTIF: Modulares Denken durch Verstärkung Feinabstimmung in LLMs MOTIF:通过强化微调在LLM中进行模块思考 2507.02851v1
  • 04 07-03 LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users LLM Hypnose: Nutzung des Benutzerfeedbacks für unautorisierte Wissensinjektion für alle Benutzer LLM Hypnisis:利用用户反馈,为所有用户提供未经授权知识注射 2507.02850v1
  • 05 07-03 MvHo-IB: Multi-View Higher-Order Information Bottleneck for Brain Disorder Diagnosis MvHo-IB: Multi-View Higher-Order Informationen Engpass für Gehirnerkrankungen Diagnose MvHO-IB:脑紊乱诊断多视高端信息瓶颈 2507.02847v1
  • 06 07-03 Evaluating Frontier Models for Stealth and Situational Awareness Bewertung von Frontier-Modellen für Stealth und Situationsbewusstsein 评价隐形和了解情况的边界模式 2505.01420v4
  • 07 07-03 LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding LLM-getriebene Behandlung Wirkung Schätzung unter Inferenzzeit Text Confounding LLM-驱动处理效应估算 2507.02843v1
  • 08 07-03 StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason StepHint: Mehrstufige stufenweise Hinweise stärken das Lernen zur Vernunft 步进提示:多级分步骤将强化学习提升到合理 2507.02841v1
  • 09 07-03 From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents Von der Web-Suche in Richtung Agentic Deep Research: Incentivizing Search with Reasoning Agents 从网络搜索到代理深层研究:激励使用理性代理进行搜索 2506.18959v3
  • 10 07-03 Membership Inference Attacks as Privacy Tools: Reliability, Disparity and Ensemble Mitgliedschafts-Inferenz Angriffe als Datenschutz-Tools: Zuverlässigkeit, Disparität und Ensemble 以成员攻击作为隐私工具:可靠性、差异性和共性 2506.13972v2
  • 11 07-03 ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning ExPO: Entsperren harter Vernunft mit selbsterklärungsgeführtem Verstärkungslernen ExPO: 以自我剥削指导强化学习来解锁困难理由 2507.02834v1
  • 12 07-03 DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift DNN-basierte Vorkodierung in RIS-gestützten mmWave MIMO-Systemen mit praktischer Phasenverschiebung 以DNN为基础的RIS-援助毫米 wave MIMO系统实际阶段调整前编码 2507.02824v1
  • 13 07-03 SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model SynapseRoute: Ein Auto-Routen-Schaltrahmen für das Dual-State Large Language Model SynapseRoute:关于两州大语言模式的自动运行切换框架 2507.02822v1
  • 14 07-03 Measurement as Bricolage: Examining How Data Scientists Construct Target Variables for Predictive Modeling Tasks Messung als Bricolage: Untersuchung, wie Datenwissenschaftler Zielvariablen für vorausschauende Modellierungsaufgaben konstruieren 计量为 “ 饼干测量 “ :审查数据科学家如何构建预测模拟任务的目标变量 2507.02819v1
  • 15 07-03 Replicable Distribution Testing Reproduzierbare Verteilungsprüfung 可复制分发测试 2507.02814v1
  • 16 07-03 In-Training Multicalibrated Survival Analysis for Healthcare via Constrained Optimization In-Training Multikalibrierte Überlebensanalyse für das Gesundheitswesen durch eingeschränkte Optimierung 通过 “ 限制最佳化 “ ,对通过 “ 限制优化 “ 进行的 “ 多校制 “ 多校准生存分析 “ 进行的 “ 多校准 “ 培训中 “ 多校准 “ 健康生存分析 “ 2507.02807v1
  • 17 07-03 Transferrable Surrogates in Expressive Neural Architecture Search Spaces Übertragbare Surrogate in Expressive Neural Architecture Suchplätze 显性神经结构搜索空间中的可转让代孕人 2504.12971v3
  • 18 07-03 Learning to Coordinate Bidders in Non-Truthful Auctions Lernen, Bieter in nicht-wahrheitsgemäßen Auktionen zu koordinieren 学习如何在非真话拍卖中协调投标人 2507.02801v1
  • 19 07-03 Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance Selbstlenkende, nicht-lineare, räumlich selektive Filter zur effizienten Extraktion von beweglichen Lautsprechern unter schwacher Anleitung 在薄弱指导下有效抽取流动发言者的自制深不滑空间选择过滤器 2507.02791v1
  • 20 07-03 GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling GPAS: Beschleunigung der Konvergenz des LLM-Vortrainings durch Gradient-Preserving Activation Scaling GPAS:通过 “ 渐进式保留动力扩增 “ 加速汇集LLM预备训练 2506.22049v2
  • 21 07-03 Understanding and Improving Length Generalization in Recurrent Models Verständnis und Verbesserung der Längenverallgemeinerung in rekurrenten Modellen 经常模式中的了解和改进时间长度的概述 2507.02782v1
  • 22 07-03 Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs Selbstkorrektionsbank: Enthüllung und Adressierung des Selbstkorrektions-Blindflecks in LLMs 自我校正法官:在LLMs中披露和处理自我校正的盲人点 2507.02778v1
  • 23 07-03 LLM-Powered Prediction of Hyperglycemia and Discovery of Behavioral Treatment Pathways from Wearables and Diet LLM-Powered Vorhersage von Hyperglykämie und Entdeckung von Verhaltenstherapie Wege von Wearables und Diät LLM 法力预测超血糖和从穿衣和饮食中发现的行为处理途径 2503.03935v2
  • 24 07-03 KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs KERAP: Ein wissensbasierter Ansatz für genaue Null-Shot-Diagnose-Vorhersage mit Multi-Agent LLMs KERRAP: 利用多种试剂LLMs进行准确零光诊断预测的知识强化理由说明方法 2507.02773v1
  • 25 07-03 Grounding Intelligence in Movement Erdungsintelligenz in der Bewegung 移动中的地面情报 2507.02771v1
  • 26 07-03 Avoiding Catastrophe in Online Learning by Asking for Help Vermeiden von Katastrophen im Online-Lernen, indem Sie um Hilfe bitten 通过请求帮助避免在线学习中的灾难 2402.08062v5
  • 27 07-03 MaizeField3D: A Curated 3D Point Cloud and Procedural Model Dataset of Field-Grown Maize from a Diversity Panel MaizeField3D: Ein kuratierter 3D-Punkt-Cloud- und Verfahrensmodell-Datensatz von Feld-Grown Maize aus einem Diversity-Panel Maize Fire3D:来自多样性小组的3D点云和实地增长磁场程序模型数据集 2503.07813v3
  • 28 07-03 Contextual Online Pricing with (Biased) Offline Data Kontextuelle Online-Preise mit (Biased) Offline-Daten 在线在线在线在线上上上上下线数据(比亚) 2507.02762v1
  • 29 07-03 Quantifying the Cross-sectoral Intersecting Discrepancies within Multiple Groups Using Latent Class Analysis Towards Fairness Quantifizierung der sektorübergreifenden Intersecting-Diskrepanzen innerhalb mehrerer Gruppen mittels Latent Class-Analyse zur Fairness 量化多个群体内部的跨部门交叉差异,利用编内分类分析进行量化,以实现公平 2407.03133v4
  • 30 07-03 Fast and Simplex: 2-Simplicial Attention in Triton Schnell und Simplex: 2-Simplizielle Aufmerksamkeit in Triton 快速和简单: Tritton 中的 2 简化注意 2507.02754v1
  • 31 07-03 Towards a Novel Measure of User Trust in XAI Systems Auf dem Weg zu einem neuartigen Maß für das Vertrauen der Nutzer in XAI-Systeme 争取对XAI系统用户信任的新措施 2405.05766v2
  • 32 07-03 Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics Lineare Aufmerksamkeit mit globalem Kontext: Ein mehrpoliger Aufmerksamkeitsmechanismus für Vision und Physik 全球一线关注:视觉和物理多极关注机制 2507.02748v1
  • 33 07-03 Early Signs of Steganographic Capabilities in Frontier LLMs Frühe Anzeichen von Steganographischen Fähigkeiten in Frontier LLMs 边疆长长体动物能力早期信号 2507.02737v1
  • 34 07-03 Classification by Separating Hypersurfaces: An Entropic Approach Klassifizierung durch Trennen von Hyperoberflächen: Ein entropischer Ansatz 按分离超表层分类: 2507.02732v1
  • 35 07-03 Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving Bourbaki: Selbsterzeugte und zielkonditionierte MDPs für Theorem-Proving Bourbaki:为理论证明提供自创和有目标、有条件的MDP 2507.02726v1
  • 36 07-03 Hierarchical Multi-Label Contrastive Learning for Protein-Protein Interaction Prediction Across Organisms Hierarchisches Multi-Label Kontrastives Lernen für Protein-Protein-Interaktionsvorhersage über Organismen hinweg 高等级多标签差异学习,促进各种组织间蛋白因-蛋白因-蛋白质-蛋白因互动预测 2507.02724v1
  • 37 07-03 On Characterizations for Language Generation: Interplay of Hallucinations, Breadth, and Stability Über Charakterisierungen für die Sprachgenerierung: Interplay von Halluzinationen, Breadth und Stabilität 语言生成特征:幻觉、面包和稳定之间的相互作用 2412.18530v2
  • 38 07-03 Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification Gradient-Based Model Fingerprinting für LLM Ähnlichkeitserkennung und Familienklassifizierung LLM相似性探测和家庭分类的渐进式样指纹 2506.01631v2
  • 39 07-03 A Comprehensive Machine Learning Framework for Micromobility Demand Prediction Umfassendes Machine Learning Framework für die Nachfragevorhersage der Mikromobilität 微观流动需求预测综合机械学习框架 2507.02715v1
  • 40 07-03 A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control Eine Forget-and-Grow-Strategie für tiefe Verstärkungs-Lernskalierung in kontinuierlicher Kontrolle 连续控制深入加强学习的 “ 遗忘与增长 “ 战略 2507.02712v1
  • 41 07-03 Fluid Democracy in Federated Data Aggregation Fluide Demokratie in Federated Data Aggregation 联邦数据聚合中流体民主 2507.02710v1
  • 42 07-03 Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions Multi-Agenten-Verstärkungs-Lernen für dynamische Preise in Lieferketten: Benchmarking Strategic Agent Verhaltensmuster unter realistisch simulierten Marktbedingungen 供应链动态定价多机构强化学习:在现实模拟市场条件下确定战略代理行为基准 2507.02698v1
  • 43 07-03 RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes RLHGNN: Verstärkung Lernorientiertes Heterogenes Graph Neuronales Netzwerk für die nächste Aktivitätsvorhersage in Geschäftsprozessen RLHGNN: 业务流程下一个活动预测的强化学习驱动的异质图形神经网络 2507.02690v1
  • 44 07-03 HAPI: A Model for Learning Robot Facial Expressions from Human Preferences HAPI: Ein Modell zum Lernen von Roboter-Gesichtsausdrücken aus menschlichen Präferenzen HAPI: 人类首选机器人法表达方式学习模式 2503.17046v2
  • 45 07-03 Learning few-step posterior samplers by unfolding and distillation of diffusion models Lernen der wenigen Schritte hinteren Sampler durch Entfalten und Destillation von Diffusionsmodellen 通过推广模型的推出和蒸馏,学习几步后后后继采样器 2507.02686v1
  • 46 07-03 Learning Traffic Anomalies from Generative Models on Real-Time Observations Verkehrsanomalien aus generativen Modellen auf Echtzeit-Beobachtungen lernen 实时观测生成模型的学习交通异常现象 2502.01391v3
  • 47 07-03 Detection of Disengagement from Voluntary Quizzes: An Explainable Machine Learning Approach in Higher Distance Education Detection of Disengagement from Voluntary Quizzes: Ein erklärbarer maschineller Lernansatz in der höheren Fernlehre 探测自愿Quizzes脱离自愿Quizzes:高远程教育中可解释的机械学习方法 2507.02681v1
  • 48 07-03 Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs Einbetten-basierter Federated Data Sharing über differentially Private Conditional VAEs 通过差异私人条件性自愿和有条件的 嵌入式基于嵌入式的联邦联邦数据分享 2507.02671v1
  • 49 07-03 Guided Generation for Developable Antibodies Geführte Generation für entwickelbare Antikörper 可开发抗体的代人制 2507.02670v1
  • 50 07-03 OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding OmniDraft: Ein Cross-Vocabulary, Online Adaptive Drafter für die gerätespezifische Dekodierung 总括草案:跨词汇、在线在线可调适性套用投机下限设计图纸 2507.02659v1
  • 51 07-03 Fair Deepfake Detectors Can Generalize Faire Deepfake-Detektoren können generalisieren 公平深假探测器可普遍化 2507.02645v1
  • 52 07-03 On Efficient Bayesian Exploration in Model-Based Reinforcement Learning Effiziente bayesische Exploration im modellbasierten Ausbau-Lernen 关于在示范强化学习中高效探索贝耶斯人 2507.02639v1
  • 53 07-03 High-Order Deep Meta-Learning with Category-Theoretic Interpretation High-Order Deep Meta-Learning mit kategorie-theoretischer Interpretation 具有分类理论判读的高等级深层元元学习 2507.02634v1
  • 54 07-03 Medical Data Pecking: A Context-Aware Approach for Automated Quality Evaluation of Structured Medical Data Medical Data Pecking: Ein kontextorientierter Ansatz zur automatisierten Qualitätsbewertung strukturierter medizinischer Daten 医疗数据分级:结构医疗数据自动质量评价的内幕意识方法 2507.02628v1
  • 55 07-03 A Matrix Variational Auto-Encoder for Variant Effect Prediction in Pharmacogenes Ein Matrix-Varianten-Auto-Encoder für Variant-Effekt-Vorhersage in Pharmakogenen 药用植物可变效应预测矩阵变式自动编码器 2507.02624v1
  • 56 07-03 L-VAE: Variational Auto-Encoder with Learnable Beta for Disentangled Representation L-VAE: Variationaler Auto-Encoder mit erlernbarer Beta zur entfremdeten Darstellung L-VAE: 具有拆分代表的可学习贝塔变量自动编码器 2507.02619v1
  • 57 07-03 Interpreting Graph Inference with Skyline Explanations Graphenableitung mit Skyline-Erläuterungen interpretieren 与天线解释解释的图表推理 2505.07635v2
  • 58 07-03 Orientation-Aware Sparse Tensor PCA for Efficient Unsupervised Feature Selection Orientation-Aware Sparse Tensor PCA für effiziente, unüberwachte Feature-Auswahl 用于高效、不受监督地物选择的五氯苯甲醚 2407.16985v3
  • 59 07-03 Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation Lost in Latent Space: Eine empirische Studie von Latent Diffusionsmodellen für die Emulation von Physik 失落于隐时空:物理模拟的中子扩散模型经验研究 2507.02608v1
  • 60 07-03 Down with the Hierarchy: The ‘H’ in HNSW Stands for “Hubs” Nieder mit der Hierarchie: Das ‘H’ in HNSW steht für “Hubs” 打倒等级制度: HNSW的”H”代表”Hubs” 2412.01940v3
  • 61 07-03 Alleviating Attack Data Scarcity: SCANIA’s Experience Towards Enhancing In-Vehicle Cyber Security Measures Benachteiligung von Angriffsdaten: SCANIAs Erfahrung zur Verbesserung von Cybersicherheitsmaßnahmen im Fahrzeug 减轻攻击数据稀缺性:SCANIA在加强车辆内部网络安全措施方面的经验 2507.02607v1
  • 62 07-03 De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks De-AntiFake: Die Protective Perturbations gegen Stimmen-Klonangriffe neu denken De-Antifake:重新思考对语音克隆攻击进行保护性干扰 2507.02606v1
  • 63 07-03 StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models StructTransform: Eine skalierbare Angriffsfläche für sicherheitsorientierte große Sprachmodelle 条形变形:用于安全-统一大语言模型的可缩放攻击表面 2502.11853v2
  • 64 07-03 Padé Approximant Neural Networks for Enhanced Electric Motor Fault Diagnosis Using Vibration and Acoustic Data Padé Ungefähre neurale Netzwerke für verbesserte elektrische Motorfehlerdiagnose mit Vibration und akustischen Daten 利用振动和声频数据增强电动机动车故障诊断 2507.02599v1
  • 65 07-03 SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks SURE-VQA: Systematisches Verständnis der Robustheitsbewertung in medizinischen VQA-Aufgaben SNU-VQA: 系统了解医疗VQA任务中的强力评价 2411.19688v3
  • 66 07-03 Revisiting Active Learning under (Human) Label Variation Aktives Lernen unter (menschlichen) Label-Varianten 在(人)标签标签变换下重新审查积极学习 2507.02593v1
  • 67 07-03 Scalable Interconnect Learning in Boolean Networks Skalierbares Interconnect-Lernen in Booleschen Netzwerken 布尔网络中的可缩放互连学习 2507.02585v1
  • 68 07-03 SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings SoccerDiffusion: Auf dem Weg zum Lernen von End-to-End Humanoid Robot Soccer aus Gameplay-Aufnahmen 足球传播:从游戏游戏录音录制中学习最终到最终人类的机器人足球 2504.20808v2
  • 69 07-03 Towards an Explainable Comparison and Alignment of Feature Embeddings Auf dem Weg zu einem erklärbaren Vergleich und der Ausrichtung von Feature-Embeddings 争取实现可解释的比较和地貌隐含物的一致 2506.06231v2
  • 70 07-03 Transformers Don’t Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability Transformer benötigen keine Layernorm zur Inferenzzeit: Skalierung der Layernorm Entfernung zu GPT-2 XL und die Implikationen für mechanistische Interpretierbarkeit 突变器不需要在推断时间使用层内线: 将层内线缩放去除到 GPT-2 XL 和对机械可解释性的影响 2507.02559v1
  • 71 07-03 Fading memory and the convolution theorem Das verblassende Gedächtnis und das Konvolutionstheorem 过时的记忆和变幻的定理 2408.07386v3
  • 72 07-03 Adaptive Probabilistic ODE Solvers Without Adaptive Memory Requirements Adaptive probabilistische ODE-Lösemittel ohne adaptive Speicheranforderungen 无适应性内存要求的适应性概率性ODE溶剂 2410.10530v2
  • 73 07-03 Flow Matching on Lie Groups Flow Passend zu Lie-Gruppen 与 Lis Group 匹配 2504.00494v2
  • 74 07-03 Interleaved Gibbs Diffusion: Generating Discrete-Continuous Data with Implicit Constraints Interleaved Gibbs Diffusion: Diskrete-kontinuierliche Daten mit impliziten Einschränkungen generieren Gibbs Interlebert Gibbs 扩散:产生具有隐含限制的分散持续数据 2502.13450v2
  • 75 07-03 AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench KI-Forscher für maschinelles Lernen: Suche, Exploration und Generalisierung in der MLE-Bench AI 机器学习研究代理机构:MLE-Bench的搜索、探索和普及 2507.02554v1
  • 76 07-03 Position: A Theory of Deep Learning Must Include Compositional Sparsity Position: Eine Theorie des tiefen Lernens muss kompositorische Sparsamkeit einschließen 位置:深深学习理论必须包含构成平等 2507.02550v1
  • 77 07-03 Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling Offline-Verstärkung Lernen für das Lernen zum Versand für Job Shop Scheduling 离线强化学习,学习到派到招工商店日程安排的调度 2409.10589v4
  • 78 07-03 Higher-Order Singular-Value Derivatives of Rectangular Real Matrices Hohe Ordnung Singular-Value Derivate der rechteckigen Realmatrizen 矩形真实矩阵的高端圆形单值衍生物 2506.03764v3
  • 79 07-03 Assessing Quantum Advantage for Gaussian Process Regression Bewertung des Quantenvorteils für Gaussian Process Regression 评估高山进程倒退的量度优势 2505.22502v2
  • 80 07-03 Ordinality in Discrete-level Question Difficulty Estimation: Introducing Balanced DRPS and OrderedLogitNN Normalität in Diskrete-Level-Frage Schwierigkeitsgrad Schätzung: Einführung ausgewogene DRPS und bestelltLogitNN 不同层次问题难以估计的常态:采用平衡的DRPS和有秩序LogitNN 2507.00736v2
  • 81 07-03 A Square Peg in a Square Hole: Meta-Expert for Long-Tailed Semi-Supervised Learning Eine quadratische Peg in einem quadratischen Loch: Meta-Experte für langes semi-überwachtes Lernen 广场洞中的一个广场塞格:长期辍学的半模拟学习的元专家 2505.16341v2
  • 82 07-03 RetrySQL: text-to-SQL training with retry data for self-correcting query generation RetrySQL: Text-zu-SQL-Training mit Retry-Daten zur Selbstkorrektur der Abfragegenerierung 重新trySQL: 文本到 SQL 培训, 并重试数据, 用于自校查询生成 2507.02529v1
  • 83 07-03 Deep learning four decades of human migration Tiefes Lernen von vier Jahrzehnten menschlicher Migration 四十年人类移徙的深入学习 2506.22821v2
  • 84 07-03 Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling Selbstüberwachte Rahmenbedingungen für die Lautsprecherverifizierung durch Bootstrapped Positive Sampling 通过推动积极抽样,自我监督的演讲人核查框架 2501.17772v3
  • 85 07-03 Reconsidering the energy efficiency of spiking neural networks Über die Energieeffizienz neuronaler Netze nachdenken 重新考虑突发神经网络的能源效率 2409.08290v2
  • 86 07-03 Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs Sprachenübergreifendes Reisen: Benchmarking Cross-Lingual Consistency in multimodalen LLMs 跨语言旅行:多模式LLM中跨语言一致基准 2505.15075v2
  • 87 07-03 Self-Guided Process Reward Optimization with Redefined Step-wise Advantage for Process Reinforcement Learning Selbstgesteuerte Prozess-Reward-Optimierung mit neu definiertem Schrittweiser Vorteil für Prozess-Verstärkungs-Lernen 自指导流程向上优化,具有重新定义的逐步改进的流程强化学习优势 2507.01551v2
  • 88 07-03 TFOC-Net: A Short-time Fourier Transform-based Deep Learning Approach for Enhancing Cross-Subject Motor Imagery Classification TFOC-Net: Ein kurzzeitiger Fourier-Transform-basierter Deep-Learning-Ansatz zur Verbesserung der Klassifizierung bereichsübergreifender Motorbilder TFOC-Net:一个短时间短时间的四倍变换式深学习方法,以加强跨主题机动机图象分类 2507.02510v1
  • 89 07-03 The Choice of Normalization Influences Shrinkage in Regularized Regression Die Wahl der Normalisierung beeinflusst Schrumpfung bei regularisierter Regression 正常化的影响在正规化后退过程中的缩减 2501.03821v3
  • 90 07-03 IndianBailJudgments-1200: A Multi-Attribute Dataset for Legal NLP on Indian Bail Orders IndianBailJudgments-1200: Ein Multi-Attribut-Datensatz für legale NLP auf indischen Bail-Aufträgen IndianBailJail Judgments-12000:印度保释令法律国家保护程序多属性数据集 2507.02506v1
  • 91 07-03 Continual Gradient Low-Rank Projection Fine-Tuning for LLMs Kontinuierliche gradiente Low-Rank-Projektion Feintuning für LLMs LLMM 连续渐进式低射频低射射精度 2507.02503v1
  • 92 07-03 Online Conformal Prediction with Efficiency Guarantees Online-Konforme Vorhersage mit Effizienzgarantien 具有效率保障的在线非正式预测 2507.02496v1
  • 93 07-03 Reliability-Adjusted Prioritized Experience Replay Reliability-Adjusted Prioritized Experience Replay 调整了可靠性调整后确定优先经验重述 2506.18482v2
  • 94 07-03 MC-INR: Efficient Encoding of Multivariate Scientific Simulation Data using Meta-Learning and Clustered Implicit Neural Representations MC-INR: Effiziente Kodierung von multivariaten wissenschaftlichen Simulationsdaten mit Meta-Learning und clusterierten Impliziten Neuraldarstellungen MC-INR:利用元学习和集成隐隐性神经图示对多变量科学模拟数据进行有效编码 2507.02494v1
  • 95 07-03 Deep Transfer Learning for Kidney Cancer Diagnosis Deep Transfer Learning für Nierenkrebs-Diagnose 肾脏癌症诊断深入转移学习 2408.04318v2
  • 96 07-03 Learning Massive-scale Partial Correlation Networks in Clinical Multi-omics Studies with HP-ACCORD Learning Massive-scale Partial Correlation Networks in klinischen Multi-Omics-Studien mit HP-ACCORD 与HP-ACCORD在临床多工程学研究中学习大规模大规模局部关联网络 2412.11554v3
  • 97 07-03 Variational Kolmogorov-Arnold Network Unterschiedliches Kolmogorov-Arnold-Netzwerk 变化科尔莫戈罗夫-阿诺尔德网络 2507.02466v1
  • 98 07-03 Quantum-enhanced causal discovery for a small number of samples Quantenverstärkte kausale Entdeckung für eine kleine Anzahl von Proben 少数样本的量吨量强化因果发现 2501.05007v2
  • 99 07-03 Universal Collection of Euclidean Invariants between Pairs of Position-Orientations Universelle Sammlung von Euklidischen Invarianten zwischen Paaren von Positionsorientierungen 位置对等位置对等之间通用的欧立地异异异物收集 2504.03299v2
  • 100 07-03 Roto-Translation Invariant Metrics on Position-Orientation Space Roto-Translation Invariante Metrics auf Positions-Orientierungsraum 定位空间的旋转变换变化计量器 2504.03309v2
  • 101 07-03 Red grape detection with accelerated artificial neural networks in the FPGA’s programmable logic Rote Traubenerkennung mit beschleunigten künstlichen neuronalen Netzwerken in der programmierbaren Logik des FPGA FPGA的可编程逻辑的红葡萄探测与加速人工神经网络 2507.02443v1
  • 102 07-03 A Deep Learning Powered Numerical Relativity Surrogate for Binary Black Hole Waveforms Eine tief lernfähige numerische Relativitätsüberlagerung für Binary Black Hole Waveforms 二进制黑洞波形的深学习动力数字相对相对性替代工具 2412.06946v2
  • 103 07-03 Unsupervised Cognition Unüberwachte Kognition 未受监督的认知 2409.18624v3
  • 104 07-03 Anatomical Foundation Models for Brain MRIs Anatomische Grundlagenmodelle für Hirn-MRTs 脑吸管机模型解剖基础模型 2408.07079v4
  • 105 07-03 The Artificial Scientist – in-transit Machine Learning of Plasma Simulations Der Künstliche Wissenschaftler – in-transit maschinelles Lernen von Plasmasimulationen 人造科学家 – – Plasma模拟模拟的中转机器学习 2501.03383v3
  • 106 07-03 Is Complex Query Answering Really Complex? Ist die Antwort auf komplexe Abfragen wirklich komplex? 复杂查询回答是否真的复杂? 2410.12537v3
  • 107 07-03 Determination Of Structural Cracks Using Deep Learning Frameworks Bestimmung von strukturellen Rissen mittels Deep Learning Frameworks 利用深学习框架确定结构裂缝 2507.02416v1
  • 108 07-03 ForceGrip: Reference-Free Curriculum Learning for Realistic Grip Force Control in VR Hand Manipulation ForceGrip: Reference-Free Curriculum Learning for Realistic Grip Force Control in VR Hand Manipulation ForgGrip: VR 手动操纵中现实力力控的无参考课程学习 2503.08061v4
  • 109 07-03 EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks EquiTabPFN: Eine Ziel-Permutation Equivariant Prior Fitted Networks EquiTabPFN: 目标-变异等同先前适合的网络 2502.06684v2
  • 110 07-03 Significativity Indices for Agreement Values Signifikativitätsindizes für Vereinbarungswerte 协议价值的量化指数 2504.15325v2
  • 111 07-03 S2FGL: Spatial Spectral Federated Graph Learning S2FGL: Raumspektrales Federiertes Graphenlernen S2FGL: 空间光谱联邦图表学习 2507.02409v1
  • 112 07-03 Attention-based clustering Aufmerksamkeitsbasiertes Clustering 以关注为主的集群集群 2505.13112v2
  • 113 07-03 Benchmarking Akan ASR Models Across Domain-Specific Datasets: A Comparative Evaluation of Performance, Scalability, and Adaptability Benchmarking Akan ASR-Modelle über Domain-spezifische Datensätze: Eine vergleichende Bewertung von Leistung, Skalierbarkeit und Anpassungsfähigkeit 确定Akan ASR模型基准的全域具体数据集:业绩比较评价、可缩放性和可调适性 2507.02407v1
  • 114 07-03 Improving Consistency in Vehicle Trajectory Prediction Through Preference Optimization Verbesserung der Konsistenz in der Fahrzeug-Trajektorie-Vorhersage durch Preference-Optimierung 通过优先优化提高车辆轨迹预测的一致性 2507.02406v1
  • 115 07-03 XGeM: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation XGeM: Ein Multi-Prompt-Stiftungsmodell für multimodale medizinische Datengenerierung XGeM:多式医疗数据多式生成多式医疗多模式基金会模式 2501.04614v3
  • 116 07-03 Wildlife Target Re-Identification Using Self-supervised Learning in Non-Urban Settings Wildlife Target Re-Identification mit selbstüberwachtem Lernen in nicht-Urban-Einstellungen 在非城市环境中利用自我监督的学习重新确定野生动物目标 2507.02403v1
  • 117 07-03 TABNet: A Triplet Augmentation Self-Recovery Framework with Boundary-Aware Pseudo-Labels for Medical Image Segmentation TABNet: Ein Triplet Augmentation Self-Recovery Framework mit Boundary-Aware Pseudo-Labors für die Segmentierung medizinischer Bilder TABNet:三联增强自我恢复框架,配有医疗图像分割的边界-软件软件-标签 2507.02399v1
  • 118 07-03 Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement Hinterer Übergang Modellierung für unüberwachte Diffusions-basierte Sprachverbesserung 未经监督的基于传播的语音增强的外向过渡建模 2507.02391v1
  • 119 07-03 MInCo: Mitigating Information Conflicts in Distracted Visual Model-based Reinforcement Learning MInCo: Informationskonflikte im distraktierten visuellen Modell-basierten Verstärkungslernen abmildern MINCO: 减缓视觉模型化强化学习中的信息冲突 2504.04164v3
  • 120 07-03 Sparse Gaussian Processes: Structured Approximations and Power-EP Revisited Sparse Gaussian Processes: Strukturierte Annäherungen und Power-EP Revisited 粗浅的高斯进程:结构相近和电源-EP 重新审视 2507.02377v1
  • 121 07-03 Gateformer: Advancing Multivariate Time Series Forecasting through Temporal and Variate-Wise Attention with Gated Representations Gateformer: Multivariate Zeitreihen durch Temporale und Variate-Wise-Achtung mit gezahnten Darstellungen prognostizieren Gatefex:通过时空和蒸汽预报推进多变时间序列 2505.00307v3
  • 122 07-03 Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning Gute Darstellung, bessere Erklärung: Rolle konvolutionärer neuraler Netzwerke bei transformerbasierter Fernerkundung Bildunterschriften 良好的表述,更好的解释:进化神经网络在以变换器为基础的遥感图像描述中的作用 2502.16095v2
  • 123 07-03 Exploring the Integration of Large Language Models in Industrial Test Maintenance Processes Erforschung der Integration von großen Sprachmodellen in industrielle Testwartungsprozesse 探索将大语言模型纳入工业试验维护工艺 2409.06416v2
  • 124 07-03 Deep Reinforcement Learning-Based DRAM Equalizer Parameter Optimization Using Latent Representations Tiefenverstärkte DRAM-Equalizer-Parameter-Optimierung mit latenten Darstellungen 利用延迟代表方式优化深度强化学习-基于深强化的 DRAM 平准参数 2507.02365v1
  • 125 07-03 Offline Reinforcement Learning with Penalized Action Noise Injection Offline-Verstärkung Lernen mit Penalized Action Noise Injection 用惩罚性行动噪音注射加强离线强化学习 2507.02356v1
  • 126 07-03 Text-Aware Image Restoration with Diffusion Models Text-Aware Bildrestaurierung mit Diffusionsmodellen 带有传播模型的文本软件图像恢复 2506.09993v2
  • 127 07-03 Privacy-Preserving Operating Room Workflow Analysis using Digital Twins Datenschutz-Bewahrung von Betriebsraum-Workflow-Analysen mit digitalen Zwillingen 利用数字双对进行工作流量分析 2504.12552v2
  • 128 07-03 UniNet: A Unified Multi-granular Traffic Modeling Framework for Network Security UniNet: Ein einheitliches Multi-Granular Traffic Modeling Framework für Netzwerksicherheit UniNet:网络安全统一多重点交通示范框架 2503.04174v2
  • 129 07-03 DeltaSHAP: Explaining Prediction Evolutions in Online Patient Monitoring with Shapley Values DeltaSHAP: Erklären von Vorhersageentwicklungen bei der Online-Patientenüberwachung mit Shapley-Werten DelsaSHAP: 解释在有阴影值的在线患者监测中的预测演变 2507.02342v1
  • 130 07-03 Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Skywork-Reward-V2:通过人类-AI协同增强优先数据曲线 2507.01352v2
  • 131 07-03 SecAlign: Defending Against Prompt Injection with Preference Optimization SecAlign: Abwehr gegen prompte Injektion mit Preference-Optimierung Secallign: 以优先优化防范迅速注射 2410.05451v3
  • 132 07-03 Path Planning using a One-shot-sampling Skeleton Map Pfadplanung mit einer One-Shot-Sampling-Skeleton-Karte 使用单点抽样的 Skeleton 地图进行路径规划 2507.02328v1
  • 133 07-03 High-Performance Reinforcement Learning on Spot: Optimizing Simulation Parameters with Distributional Measures Leistungsstarkes Stärkungslernen vor Ort: Simulationsparameter mit distributiven Maßnahmen optimieren 高绩效加强现场学习:以分配措施优化模拟参数 2504.17857v3
  • 134 07-03 Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach Ausrichten von gefrorenen LLMs durch Verstärkungslernen: Ein iteratives Reweight-then-Optimize-Ansatz 通过强化学习将冻结的LLMs与 “ 强化学习:一种过渡性再加权再优化方法 “ 相匹配 2506.17828v2
  • 135 07-03 Transformer-based EEG Decoding: A Survey Transformer-basierte EEG-Dekodierung: Eine Umfrage 以变换器为基础的电子EEG编码:调查 2507.02320v1
  • 136 07-03 Generalization vs. Specialization under Concept Shift Verallgemeinerung vs. Spezialisierung unter Konzeptverschiebung 普遍化与概念转变下的专业化 2409.15582v2
  • 137 07-03 Improving Constrained Generation in Language Models via Self-Distilled Twisted Sequential Monte Carlo Verbesserung der eingeschränkten Generierung in Sprachmodellen durch selbstdestillierte Twisted Sequential Monte Carlo 通过自行筛选的蒙蒙特卡洛顺序改进语言模型中受约束的一代 2507.02315v1
  • 138 07-03 Holistic Continual Learning under Concept Drift with Adaptive Memory Realignment Ganzheitliches kontinuierliches Lernen unter Konzept Drift mit adaptiver Memory-Neuausrichtung 适应性记忆调整概念下的全面持续学习 2507.02310v1
  • 139 07-03 Distributional Soft Actor-Critic with Diffusion Policy Verteilungs-Soft-Actor-Kritik mit Diffusionspolitik 配发软软软动作- 带有传播政策批评器 2507.01381v2
  • 140 07-03 Neural CRNs: A Natural Implementation of Learning in Chemical Reaction Networks Neurale CRNs: Eine natürliche Umsetzung des Lernens in chemischen Reaktionsnetzwerken CRNs:化学反应网络学习的自然应用 2409.00034v3
  • 141 07-03 Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient Bypass Back-Propagation: Optimierungsbasiertes Structural Pruning für große Sprachmodelle über Policy Gradient Bypass 后回通信:通过 “ 政策梯度 “ 优化基于优化的结构结构,为大语言模式提供缓冲 2406.10576v3
  • 142 07-03 Kernel Density Bayesian Inverse Reinforcement Learning Kernel Dichte Bayesian Inverse Verstärkung Lernen 内核冷密度贝耶斯反强化学习 2303.06827v4
  • 143 07-03 REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models REINFORCE++: Effizienter RLHF-Algorithmus mit Robustheit sowohl für Prompt- als auch für Reward-Modelle REINFORCE++: 高效的RLHF对快速模型和奖励模型具有强力的测算法 2501.03262v4
  • 144 07-03 DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning DoMIX: Ein effizientes Framework zur Nutzung von Domain-Wissen im Feintuning DoMIX:一个在微调中利用域知识的有效框架 2507.02302v1
  • 145 07-03 Implicit Counterfactual Data Augmentation for Robust Learning Implizite gegenfaktische Datenvergrößerung für robustes Lernen 强力学习所需的反事实数据放大 2304.13431v3
  • 146 07-03 Knowledge Graph-Based Explainable and Generalized Zero-Shot Semantic Communications Wissensgraphisch erklärbare und generalisierte zero-shot semantische Kommunikation 基于知识的图表可解释和通用的零热语义通信 2507.02291v1
  • 147 07-03 Optimal strategies to perform multilingual analysis of social content for a novel dataset in the tourism domain Optimale Strategien zur mehrsprachigen Analyse sozialer Inhalte für einen neuartigen Datensatz im Tourismusbereich 为旅游领域新数据集的社会内容进行多语种社会内容分析的最佳最佳战略 2311.14727v2
  • 148 07-03 Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization Prompt Entwirren über Sprachführung und Repräsentation Ausrichtung für Domain-Verallgemeinerung 通过语言指导和代表对称调整,迅速解脱 2507.02288v1
  • 149 07-03 Content filtering methods for music recommendation: A review Methoden zur Inhaltsfilterung für Musikempfehlungen: Eine Rezension 音乐建议的内容过滤方法:审查 2507.02282v1
  • 150 07-03 Causal Representation Learning with Generative Artificial Intelligence: Application to Texts as Treatments Kausales Repräsentationslernen mit generativer Künstlicher Intelligenz: Anwendung auf Texte als Behandlungen 产生人工智能的因果代表性学习:应用文字作为治疗 2410.00903v3
  • 151 07-03 It’s Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation Es ist schwer, normal zu sein: Der Einfluss von Lärm auf die strukturagnostische Abschätzung 很难正常:噪音对结构-不可计量估计的影响 2507.02275v1
  • 152 07-03 Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge bei CVPR 2025 MEIS Workshop 基准的可通用二手操纵:2025年欧洲气象和气象科学研究所讲习班上的机器人双臂双臂合作挑战 2506.23351v2
  • 153 07-03 Explainable AI for Comprehensive Risk Assessment for Financial Reports: A Lightweight Hierarchical Transformer Network Approach Erklärbare KI für umfassende Risikobewertung für Finanzberichte: Ein leichter Hierarchischer Transformer-Netzwerkansatz 用于财务报告综合风险评估的可解释的AIAI:轻量级高压变压器网络方法 2506.23767v2
  • 154 07-03 Neural Graph Matching Improves Retrieval Augmented Generation in Molecular Machine Learning Neural Graph Matching verbessert retrieval Augmented Generation im molekularen maschinellen Lernen 在分子机学习中改进回流增强一代的神经图相匹配 2502.17874v2
  • 155 07-03 NLP4Neuro: Sequence-to-sequence learning for neural population decoding NLP4Neuro: Sequence-to-Sequence-Lernen für neuronale Populationsdekodierung NLP4Neuro:神经人口解码的序列到序列学习 2507.02264v1
  • 156 07-03 MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent MemAgent: Umgestalten von Langkontext-LLM mit Multi-Conv RL-basierten Speicheragenten MemerAgent: 与基于多Conv RL的内存代理重塑长文本LLM 2507.02259v1
  • 157 07-03 Uncertainty-aware Reward Design Process Unsicheres Design von Belohnungen 不确定性-对不确定性的认知度-奖励设计流程 2507.02256v1
  • 158 07-03 Listwise Preference Alignment Optimization for Tail Item Recommendation Listwise Preference Alignment Optimization for Tail Item Empfehlung 用于反尾项目建议的优先对齐优化列表 2507.02255v1
  • 159 07-03 Circuit-tuning: A Mechanistic Approach for Identifying Parameter Redundancy and Fine-tuning Neural Networks Schaltungs-Tuning: Mechanistischer Ansatz zur Identifizierung von Parameter Redundanz und Feinsteuerung neuraler Netzwerke 电路调控:确定参数冗余和精微调整神经网络的机械化方法 2502.06106v2
  • 160 07-03 Adapter-Enhanced Semantic Prompting for Continual Learning Adapter-erweiterte semantische Prompting für kontinuierliches Lernen 增强适应性强化的持续学习的语义提示 2412.11074v3
  • 161 07-03 Transfer Learning for Matrix Completion Transfer-Lernen für Matrix-Vervollständigung 用于完成矩阵表完成的转移学习 2507.02248v1
  • 162 07-03 Order Acquisition Under Competitive Pressure: A Rapidly Adaptive Reinforcement Learning Approach for Ride-Hailing Subsidy Strategies Auftragserwerb unter Wettbewerbsdruck: Ein schnell adaptiver Lernansatz zur Stärkung der Reit-Hailing-Subsidy-Strategien 在竞争压力下取得订单:快速适应性强化学习办法,用于骑乘-快乐补贴战略 2507.02244v1
  • 163 07-03 EigenLoRAx: Recycling Adapters to Find Principal Subspaces for Resource-Efficient Adaptation and Inference EigenLoRAx: Recycling-Adapter, um wesentliche Subräume für ressourceneffiziente Anpassung und Schlussfolgerung zu finden EigenLORAx:为资源效率适应和推断寻找主要子空间的再循环适应器 2502.04700v4
  • 164 07-03 VERBA: Verbalizing Model Differences Using Large Language Models VERBA: Verbalisieren von Modellunterschieden mit großen Sprachmodellen VERBA: 使用大语言模式 2507.02241v1
  • 165 07-03 Bi-modality medical images synthesis by a bi-directional discrete process matching method Bi-Modalität medizinische Bilder Synthese durch eine bi-direktionale diskrete Prozess-Matching-Methode 以双向离散进程匹配方法对双向单离散过程匹配方法进行双向医学图像合成 2409.03977v3
  • 166 07-03 Escaping Platos Cave: JAM for Aligning Independently Trained Vision and Language Models Escaping Platons Cave: JAM for Aligning Independently Trained Vision and Language Models 撤离排洞穴:调整独立培训的愿景和语言模式JAM 2507.01201v2
  • 167 07-03 Fast AI Model Splitting over Edge Networks Schnelle KI-Modellaufspaltung über Edge-Netzwerke 快速 AI 模型在边缘网络上分割 2507.01041v2
  • 168 07-03 Privacy-Preserving Quantized Federated Learning with Diverse Precision Privacy-Preserving Quantized Federated Learning with Diverse Precision 具有多种精密度的隐私保护量化联邦学习 2507.00920v2
  • 169 07-03 PhysicsCorrect: A Training-Free Approach for Stable Neural PDE Simulations PhysikKorrektur: Ein trainingsfreier Ansatz für stabile neurale PDE-Simulationen 物理校正:稳定神经物质模拟的无培训方法 2507.02227v1
  • 170 07-03 DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs DecoRTL: Ein Laufzeit-Decoding-Framework für RTL-Code-Generierung mit LLMs DecoRTL: 使用LLMs的RTL代码生成运行时间解码框架 2507.02226v1
  • 171 07-03 Metric Design != Metric Behavior: Improving Metric Selection for the Unbiased Evaluation of Dimensionality Reduction Metrisches Design != Metrisches Verhalten: Verbesserung der Metrischen Auswahl für die unvoreingenommene Bewertung der Dimensionalitätsreduktion 计量设计 ! = 计量行为:改进对减少多维度的无偏见评价的计量选择 2507.02225v1
  • 172 07-03 Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Wo finden Sie Grokking in LLM Pretraining? Überwachen Sie Memorization-to-Generalization ohne Test 在 LLLM 预修课程中在哪里找到 Grokking ? 监视不试验的记忆化到普及 。 2506.21551v2
  • 173 07-03 Multivariate de Bruijn Graphs: A Symbolic Graph Framework for Time Series Forecasting Multivariate de Bruijn Graphen: Ein symbolisches Graphen-Framework für die Vorhersage von Zeitreihen 布鲁伊图多变量图:时间序列预测符号图框架 2505.22768v3
  • 174 07-03 Hybrid least squares for learning functions from highly noisy data Hybride kleinste Quadrate für das Lernen von Funktionen aus hochlauten Daten 从高噪音数据中学习函数最小混合方格 2507.02215v1
  • 175 07-03 SciGA: A Comprehensive Dataset for Designing Graphical Abstracts in Academic Papers SciGA: Ein umfassender Datensatz zur Gestaltung grafischer Abstracts in wissenschaftlichen Papieren SciGA: 用于设计学术论文制图摘要的综合数据集 2507.02212v1
  • 176 07-02 (3) Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer Latent Chain-of-Thought? Dekodierung des Tiefen-Recurrent Transformers 点解深度- Rent 变换器 2507.02199v1
  • 177 07-02 Temporally Consistent Koopman Autoencoders for Forecasting Dynamical Systems Vorübergehend konsistent Koopman Autoencoder für die Vorhersage dynamischer Systeme 用于预测动态系统的温度一致 Koopman 自动编译器 2403.12335v3
  • 178 07-02 cVLA: Towards Efficient Camera-Space VLAs cVLA: Auf dem Weg zu effizienten Kamera-Raum-VLAs cVLA: 努力实现高效的相机-空间VLAs 2507.02190v1
  • 179 07-02 KAIROS: Scalable Model-Agnostic Data Valuation KAIROS: Skalierbare modell-agnostische Datenbewertung 可缩放模型 – – 不可计量数据估价 2506.23799v2
  • 180 07-02 Bayesian Invariance Modeling of Multi-Environment Data Bayesische Invarianz Modellierung von Multi-Umwelt-Daten 多种环境数据的贝耶斯因果建模 2506.22675v2
  • 181 07-02 Tree-based Learning for High-Fidelity Prediction of Chaos Baumbasiertes Lernen für hochtreue Vorhersage von Chaos 以树为本的 “ 以树为本的学习:对混乱现象的高成熟预测 “ 的学习 2403.13836v2
  • 182 07-02 Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis Analyse und Verbesserung der Speaker-Ähnlichkeitsbewertung für Sprachsynthese 分析和改进议长对发言综述的相似性评估 2507.02176v1
  • 183 07-02 Hardware and Software Platform Inference Schlussfolgerung zur Hardware- und Softwareplattform 硬件和软件平台 2411.05197v2
  • 184 07-02 A physics informed neural network approach to simulating ice dynamics governed by the shallow ice approximation Ein physikerisch informierter neuronaler Netzwerkansatz zur Simulation der Eisdynamik, die durch die flache Eisannäherung bestimmt wird 利用物理知情神经网络法模拟由浅冰近似值管理的冰动态 2504.08136v2
  • 185 07-02 The unstable formula theorem revisited via algorithms Die instabile Formel Theorem revisited über Algorithmen 通过算法重新检视不稳定的公式定理 2212.05050v3
  • 186 07-02 Towards Bio-Inspired Robotic Trajectory Planning via Self-Supervised RNN Auf dem Weg zu einer bio-inspirierten Roboter-Trajektorienplanung durch selbstüberwachte RNN 走向通过自我监督RNN进行生物激励机器人轨迹规划 2507.02171v1
  • 187 07-02 Beyond Scale: The Diversity Coefficient as a Data Quality Metric for Variability in Natural Language Data Beyond Scale: Der Diversity-Koeffizient als Data Quality Metric für Variabilität in natürlichen Sprachdaten 超越尺度:多样性系数作为衡量自然语言数据可变性的数据质量计量标准 2306.13840v4
  • 188 07-02 Non-Convex Optimization with Spectral Radius Regularization Nicht konvexe Optimierung mit Spektralradius-Regularisierung 非Convex 优化光谱半半径常规化 2102.11210v2
  • 189 07-02 Statistical Inference for Responsiveness Verification Statistische Schlussfolgerung zur Überprüfung der Reaktionsfähigkeit 应对性核查的统计推论 2507.02169v1
  • 190 07-02 Quantifying the Importance of Data Alignment in Downstream Model Performance Quantifizierung der Bedeutung der Datenausrichtung in Downstream-Modellleistung 量化数据协调在下游模式绩效中的重要性 2501.08496v3
  • 191 07-02 Reliable algorithm selection for machine learning-guided design Zuverlässige Algorithmusauswahl für maschinelles Lernen-geführtes Design 机器学习制导设计可靠算法选择 2503.20767v2
  • 192 07-02 Non-exchangeable Conformal Prediction for Temporal Graph Neural Networks Nicht austauschbare konforme Vorhersage für zeitliche Graphen-Neural-Netzwerke 时图神经网络的非正规非正式非正式预测 2507.02151v1
  • 193 07-02 ODE$_t$(ODE$_l$): Shortcutting the Time and Length in Diffusion and Flow Models for Faster Sampling ODE$_t$(ODE$_l$): Verkürzung der Zeit und Länge in Diffusions- und Strömungsmodellen für eine schnellere Probenahme ODE$_t$(ODE$_美元):缩短传播的时间和长度以及更快抽样的流程模型 2506.21714v2
  • 194 07-02 Generating Hypotheses of Dynamic Causal Graphs in Neuroscience: Leveraging Generative Factor Models of Observed Time Series Generieren von Hypothesen dynamischer Kausalgraphen in der Neurowissenschaft: Nutzung generativer Faktorenmodelle beobachteter Zeitreihen 在神经科学中生成动态因果图的假设:利用观测时间序列的生成因数模型 2505.20697v3
  • 195 07-02 Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models Parameter vs FLOPs: Skalierungsgesetze für optimale Sparsamkeit für Mixture-of-Experts Sprachmodelle 参数 vs FLOPs: 用于混合专家语言模型的最佳质量的缩放法律 2501.12370v3
  • 196 07-02 Generative Latent Diffusion for Efficient Spatiotemporal Data Reduction Generative Latent Diffusion für effiziente Spatiotemporale Datenreduktion 高效的平时数据减少的生成前传播 2507.02129v1
  • 197 07-02 CROP: Circuit Retrieval and Optimization with Parameter Guidance using LLMs CROP: Circuit Retrieval und Optimierung mit Parameterführung mit LLMs CROP: 使用LLMM 提供参数指导的电路检索和优化 2507.02128v1
  • 198 07-02 Identifying Systems with Symmetries using Equivariant Autoregressive Reservoir Computers Identifizierung von Systemen mit Symmetrien mit äquivalenten Autoregressiven Reservoir-Computern 使用等量自动递减储量计算机识别对称系统 2311.09511v4
  • 199 07-02 Can Artificial Intelligence solve the blockchain oracle problem? Unpacking the Challenges and Possibilities Kann Künstliche Intelligenz das Blockchain-Orakel-Problem lösen? Auspacken der Herausforderungen und Möglichkeiten 人工智能能解决连锁铁链和手链的问题吗? 2507.02125v1
  • 200 07-02 Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks Skalierungskollaps offenbart universelle Dynamiken in berechenbaren neuralen Netzwerken 计算最佳培训神经网络中的缩放 缩放 折叠 Revials 通用动态 2507.02119v1
  • 201 07-02 TAROT: Targeted Data Selection via Optimal Transport TAROT: Gezielte Datenauswahl über optimalen Transport TAROT:通过最佳运输选择目标数据 2412.00420v2
  • 202 07-02 Parametric Neural Amp Modeling with Active Learning Parametrische Neuralamp-Modellierung mit aktivem Lernen 与积极学习学习建模 2507.02109v1
  • 203 07-02 Resolving Turbulent Magnetohydrodynamics: A Hybrid Operator-Diffusion Framework Auflösen der Turbulenten Magnetohydrodynamik: Ein hybrides Operator-Diffusions-Framework 溶解扰动磁磁流流体动力学:混合操作者-扩散框架 2507.02106v1
  • 204 07-02 A robust and adaptive MPC formulation for Gaussian process models Eine robuste und adaptive MPC-Formulierung für Gaußsche Prozessmodelle Gaussian 过程模型的强有力和适应性的多氯三氯杀螨(MPC)配方 2507.02098v1
  • 205 07-02 Asymptotically perfect seeded graph matching without edge correlation (and applications to inference) Asymptotisch perfekt gesäte Graphen ohne Randkorrelation (und Anwendungen zur Inferenz) 无边缘关联(和推断应用)的种子化图表匹配 2506.02825v2
  • 206 07-02 Energy-Based Transformers are Scalable Learners and Thinkers Energiebasierte Transformer sind skalierbare Lernende und Denker 以能源为基础的变换器是可缩放的学习者和思想家 2507.02092v1
  • 207 07-02 Sample Complexity Bounds for Linear Constrained MDPs with a Generative Model Probenkomplexitätsbounds für lineare eingeschränkte MDPs mit einem Generativen Modell 具有创创型模型的线性封闭式微DP样品复杂度宽度 2507.02089v1
  • 208 07-02 Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective Von Crowdsourced Noisy Labels lernen: Eine Perspektive der Signalverarbeitung 从众包噪音标签学习:信号处理视角 2407.06902v2
  • 209 07-02 Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions Bewertung des Versprechens und der Fälle von LLMs bei Hiring-Entscheidungen 评估LLM女士在雇用决定中的许诺和机会 2507.02087v1
  • 210 07-02 Non-negative matrix factorization algorithms generally improve topic model fits Nicht-negative Matrixfaktorisierungsalgorithmen verbessern generell Themenmodell-Anpassungen 非负矩阵非负矩阵乘数化算法一般改进专题模式 2105.13440v3
  • 211 07-02 Selective Feature Re-Encoded Quantum Convolutional Neural Network with Joint Optimization for Image Classification Selektives Feature neukodiertes Quantum-Konvolutionäres Neuralnetzwerk mit gemeinsamer Optimierung für die Bildklassifizierung 具有图像分类联合优化功能的量子革命神经网络 2507.02086v1
  • 212 07-02 GeoAda: Efficiently Finetune Geometric Diffusion Models with Equivariant Adapters GeoAda: Effizient feintune geometrische Diffusionsmodelle mit äquivalenten Adaptern GeoAda: 配有等差适应器的有效微粒几何扩散模型 2507.02085v1
  • 213 07-02 Adaptive Iterative Soft-Thresholding Algorithm with the Median Absolute Deviation Adaptive iterative Soft-Thresholding-Algorithmus mit der Median Absolute Abweichung 与中位绝对偏离相适应性迭替性软- 悬浮- 悬浮- 悬浮持量 2507.02084v1
  • 214 07-02 Unraveling particle dark matter with Physics-Informed Neural Networks Entwirren von Teilchen Dunkle Materie mit physik-informierten Neuronalen Netzwerken 用物理内成形神经网络解除粒子暗物质 2502.17597v2
  • 215 07-02 Towards Explaining Deep Neural Network Compression Through a Probabilistic Latent Space Auf dem Weg zur Erklärung der Komprimierung eines tiefen neuralen Netzwerks durch einen probabilistischen Latentenraum 致力于通过概率低端空间解释深神经网络压缩 2403.00155v3
  • 216 07-02 A Model-Consistent Data-Driven Computational Strategy for PDE Joint Inversion Problems Eine modellkonsistente datengestützte Computerstrategie für gemeinsame PDE-Inversionsprobleme PDE 联合流入问题数据驱动计算示范战略 2210.09228v3
  • 217 07-02 Privacy-Preserving in Connected and Autonomous Vehicles Through Vision to Text Transformation Schutz der Privatsphäre in vernetzten und autonomen Fahrzeugen durch Vision zur Texttransformation 通过愿景在连接和自主车辆中保护隐私到文字转换 2506.15854v2
  • 218 07-02 Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs Begründung eines Budgets: Eine Übersicht über adaptive und kontrollierbare Test-Time Compute in LLMs 预算理由:关于LLMM中适应和可控试验时间计算调查 2507.02076v1
  • 219 07-02 Exploring the Design Space of Diffusion Bridge Models Erforschung des Designraums von Modellen der Diffusionsbrücke 探索传播桥梁模型的设计空间 2410.21553v2
  • 220 07-02 HCVR: A Hybrid Approach with Correlation-aware Voting Rules for Feature Selection HCVR: Ein hybrider Ansatz mit korrelationsbewussten Abstimmungsregeln für die Feature Selection HCVR:关于选择地物的具有关联意识的表决规则的混合办法 2507.02073v1
  • 221 07-02 Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data Benign Overfiting ohne Linearität: Neurale Netzwerkklassifikatoren von Gradient Descent für Noisy Linear Data ausgebildet 无线性地穿透穿透: 由梯度下层训练的神经网络分类器, 用于 Noisy 线性数据 2202.05928v5
  • 222 07-02 Byzantine-Robust Gossip: Insights from a Dual Approach Byzantinisch-Robuster Klatsch: Einblicke aus einem doppelten Ansatz Byzantine-Robust Gossip:从双向办法中透视 2405.03449v2
  • 223 07-02 SPACE-SUIT: An Artificial Intelligence Based Chromospheric Feature Extractor and Classifier for SUIT SPACE-SUIT: Ein auf Künstlicher Intelligenz basierender chromosphärischer Feature Extractor und Klassifikator für SUIT 天基:人造情报基础铬化层地物提取器和SUIT分类器 2412.08589v2
  • 224 07-02 How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Wie gut versteht GPT-4o Vision? Bewertung multimodaler Basismodelle auf Standard Computer Vision Aufgaben GPT-4o GPT-4o如何理解愿景?评估标准计算机愿景任务多模式基金会模式 2507.01955v1
  • 225 07-02 Test-Time Scaling with Reflective Generative Model Test-Zeit-Skalierung mit reflektierendem Generativem Modell 具有反反思考生成模型的试验时间缩放 2507.01951v1
  • 226 07-02 Characterizing control between interacting subsystems with deep Jacobian estimation Charakterisierung der Kontrolle zwischen interagierenden Teilsystemen mit tiefer Jacobian-Schätzung 确定具有深度 Jacobian 估计的交互子系统之间的控制特性 2507.01946v1
  • 227 07-02 SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars SpecCLIP: Richten und Übersetzen spektroskopischer Messungen für Sterne spectCLIP: 恒星光谱测量的对齐和转换 2507.01939v1
  • 228 07-02 A first-order method for nonconvex-nonconcave minimax problems under a local Kurdyka-Łojasiewicz condition Eine Methode erster Ordnung für nicht konvex-nichtkonkave Minimax-Probleme unter einem lokalen Kurdyka-Łojasiewicz Zustand 当地Kurdyka-ojasiewicz状态下非convex-nonconcoln非conculev 迷你 问题的第一阶方法 2507.01932v1
  • 229 07-02 Exploring a Hybrid Deep Learning Approach for Anomaly Detection in Mental Healthcare Provider Billing: Addressing Label Scarcity through Semi-Supervised Anomaly Detection Erforschung eines hybriden Deep-Learning-Ansatzes für Anomalienerkennung in der psychischen Gesundheitsversorgung Anbieter Rechnungsstellung: Adressierung von Label Scarcity durch semi-überwachte Anomalienerkennung 探索在精神保健提供者的账单中以异性检测为异性探索的混合深学习方法:通过半超异性检测解决标签的稀缺问题 2507.01924v1
  • 230 07-02 Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models Gradient-Adaptive Policy Optimization: Auf dem Weg zu einer multi-objektiven Ausrichtung großer Sprachmodelle 渐进式政策优化:实现大语言模式多目标一致 2507.01915v1
  • 231 07-02 Advancing Magnetic Materials Discovery – A structure-based machine learning approach for magnetic ordering and magnetic moment prediction Advancing Magnetic Materials Discovery – Ein strukturbasierter maschineller Lernansatz für magnetische Ordnung und magnetische Momentvorhersage 推进磁材料发现 – – 一种基于结构的机器学习方法,用于磁定序和磁瞬时预测 2507.01913v1
  • 232 07-02 High-Layer Attention Pruning with Rescaling Hochebene Aufmerksamkeit Pruning mit Rescaling 高关注度 以降降降为缓冲 2507.01900v1
  • 233 07-02 Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data? Rekursive Trainingsschleifen in LLMs: Wie modulieren Trainingsdateneigenschaften die Verteilungsverschiebung in generierten Daten? LLMM中的递归培训循环:培训数据特性如何调整生成数据的分布变化? 2504.03814v3
  • 234 07-02 STEM Diffraction Pattern Analysis with Deep Learning Networks STEM Diffraktionsmusteranalyse mit Deep Learning Networks STEM 利用深学习网络分析违规模式 2507.01889v1
  • 235 07-02 A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs Ein rechnerisch frugales Open-Source-Stiftungsmodell für Thorax-Erkennung in Lungenkrebs-Screening-Programmen 肺癌筛查方案中胸腔酸疾病检测的计算节节制开源基础模型 2507.01881v1
  • 236 07-02 Evolving HPC services to enable ML workloads on HPE Cray EX Evolving HPC-Dienste, um ML-Workloads auf HPE Cray EX zu ermöglichen 不断演化的HPC服务,使HPE Cray EX 的ML工作量得以完成 2507.01880v1
  • 237 07-02 Towards Foundation Auto-Encoders for Time-Series Anomaly Detection Towards Foundation Auto-Encoder für Zeitreihenanomalienerkennung 建立用于时间-中外异常探测的自动计算器基金会 2507.01875v1
  • 238 07-02 Adapting Probabilistic Risk Assessment for AI Anpassung der probabilistischen Risikobewertung für KI 适应大赦国际的概率风险评估 2504.18536v3
  • 239 07-02 Distribution Matching for Self-Supervised Transfer Learning Verteilung passend für selbstüberwachtes Transfer-Lernen 自控传输学习配对分布 2502.14424v2
  • 240 07-02 Low-Perplexity LLM-Generated Sequences and Where To Find Them Low-Perplexity LLM-generierte Sequenzen und wo sie zu finden sind 低重复性 LLM 生成序列及其查找地点 2507.01844v1
  • 241 07-02 Automatic Rank Determination for Low-Rank Adaptation via Submodular Function Maximization Automatische Rangbestimmung für Low-Rank-Anpassung über Submodulare Funktionsmaximierung 通过子模块函数最大化自动确定低 Rank 适应的自动排序 2507.01841v1
  • 242 07-02 GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning GLM-4.1V-Denken: Auf dem Weg zu einer vielseitigen multimodalen Vernunft mit skalierbarem Stärkungslernen GLM-4.1V-思考:努力以可缩放的强化学习实现多式多式理由 2507.01006v2
  • 243 07-02 Generalization performance of narrow one-hidden layer networks in the teacher-student setting Generalisierungsleistung von schmalen ein-versteckten Ebenennetzwerken im Lehrer-Studenten-Setting 师生环境中狭窄的单层网络的普及性业绩 2507.00629v2
  • 244 07-02 Out-of-Distribution Detection Methods Answer the Wrong Questions Out-of-Distribution Detection Methoden beantworten die falschen Fragen 解答错误问题的方法 2507.01831v1
  • 245 07-02 mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling mGRADE: Minimales Recurrent-Gating trifft auf Verzögerungskonvolutionen für leichte Sequenzmodellierung mGRADE: 轻量级序建模的最小经常定时满足延迟变数 2507.01829v1
  • 246 07-02 MILP-SAT-GNN: Yet Another Neural SAT Solver MILP-SAT-GNN: Noch ein neuraler SAT Solver MILP-SAT-GNN:然而,另一个神经SAT溶剂 2507.01825v1
  • 247 07-02 TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents TD-MPC-Opt: Destillieren modellbasierter Multi-Task-Verstärkungs-Learning-Agenten TD-MPC-Opt: 以示范为基础的多任务加强学习机构 2507.01823v1
  • 248 07-02 Deep Reinforcement Learning for Traveling Purchaser Problems Deep Verstärkung Learning for Traveling Buyer Probleme 为旅行购买者问题深入强化学习 2404.02476v6
  • 249 07-02 LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs LoRA Feintuning ohne GPUs: Ein CPU-effizientes Meta-Generation-Framework für LLMs LoRA 无GPUs的精细调整:LLMs的CPU-提高功能元元发光框架 2507.01806v1
  • 250 07-02 Towards Decentralized and Sustainable Foundation Model Training with the Edge Auf dem Weg zu einer dezentralisierten und nachhaltigen Stiftung Modellausbildung mit der Kante 实现权力下放和可持续基础建设的边缘模式培训 2507.01803v1
  • 251 07-02 The Anatomy of Evidence: An Investigation Into Explainable ICD Coding Die Anatomie der Beweise: Eine Untersuchung zur erklärbaren ICD-Kodierung 证据解剖学:调查可解释的 ICD 编码 2507.01802v1
  • 252 07-02 Neural Entropy-stable conservative flux form neural networks for learning hyperbolic conservation laws Neurale Entropie-stabile konservative Flussform neuronale Netzwerke zum Lernen hyperbolischer Erhaltungsgesetze 为学习双曲保护法而形成的神经网络 2507.01795v1
  • 253 07-02 How Do Vision-Language Models Process Conflicting Information Across Modalities? Wie verarbeiten Vision-Language-Modelle widersprüchliche Informationen über Modalitäten hinweg? 愿景-语言模型如何以不同方式处理信息冲突问题? 2507.01790v1
  • 254 07-02 SpikeNAS: A Fast Memory-Aware Neural Architecture Search Framework for Spiking Neural Network-based Embedded AI Systems SpikeNAS: Ein schnelles Speicher-Aware Neural Architecture Such-Framework für neurale eingebettete KI-Systeme SpikNAS: 以嵌入式AI系统为基础的Spiking神经网络嵌入式人工智能系统快速记忆-智能神经结构搜索框架 2402.11322v4
  • 255 07-02 MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining MuRating: Ein qualitativ hochwertiger Datenauswahlansatz zur Mehrsprachigen Vorschulung großer Sprachmodelle 词汇:多语言大语言模式预科培训的高质量数据选择方法 2507.01785v1
  • 256 07-02 BranchNet: A Neuro-Symbolic Learning Framework for Structured Multi-Class Classification BranchNet: Ein neuro-symbolisches Lernrahmenwerk für strukturierte Multi-Klassen-Klassifikation 分支网:结构化多类分类的神经-双曲学习框架 2507.01781v1
  • 257 07-02 Beating Transformers using Synthetic Cognition Transformer mit synthetischer Kognition schlagen 使用合成认知击打变形器 2504.07619v3
  • 258 07-02 Retrieving snow depth distribution by downscaling ERA5 Reanalysis with ICESat-2 laser altimetry Rückgewinnung der Schneetiefe durch Downscaling ERA5 Reanalyse mit ICESat-2 Laseraltimetry 通过用 ICESat-2 激光测高法进行重新分析,缩小ERA5 的尺度,获取积雪深度分布 2410.17934v2
  • 259 07-02 Long-Context Linear System Identification Langkontext-Linearsystem-Identifikation 长全文线性系统识别 2410.05690v2
  • 260 07-02 Interact2Vec – An efficient neural network-based model for simultaneously learning users and items embeddings in recommender systems Interact2Vec – Ein effizientes neuronales Netzwerk-basiertes Modell zum gleichzeitigen Lernen von Benutzern und Elementen in Empfehlungssysteme Interact2Vec – – 一个有效的神经网络模式,用于同时学习用户和项目嵌入建议系统 2506.22648v2
  • 261 07-02 Improving Consistency Models with Generator-Augmented Flows Verbesserung von Konsistenzmodellen mit Generator-Augmented Flows 改进发电机加速流动的一致性模式 2406.09570v4
  • 262 07-02 Enhanced Generative Model Evaluation with Clipped Density and Coverage Verbesserte Generative Modellbewertung mit Clipped Dichte und Abdeckung 具有缩小密度和覆盖率的强化生成模型评价 2507.01761v1
  • 263 07-02 AirRadar: Inferring Nationwide Air Quality in China with Deep Neural Networks AirRadar: Rückschlüsse auf landesweite Luftqualität in China mit tiefen neuralen Netzwerken 空中雷达:与深神经网络一起推断中国全国空气质量 2501.13141v2
  • 264 07-02 Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs Visuelle Strukturen helfen Visual Reasoning: Das Bindeproblem in VLMs beheben 视觉结构帮助视觉理性:解决VLM 中具有约束力的问题 2506.22146v2
  • 265 07-02 Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training Tuning ohne Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training 无足足迹的注资:LLM培训后可实现的隐私和普遍化的圈子 2507.01752v1
  • 266 07-02 Time-Series JEPA for Predictive Remote Control under Capacity-Limited Networks Zeitreihen-JEPA für vorausschauende Fernsteuerung unter kapazitätsbeschränkten Netzen 在能力限制的网络下进行预测遥控的JEPA时间系列JEPA 2406.04853v2
  • 267 07-02 A Real-Time Digital Twin for Type 1 Diabetes using Simulation-Based Inference Ein Echtzeit-Digital Twin für Typ-1-Diabetes mit simulationsbasierter Schlussfolgerung 使用模拟推断法为1型糖尿病实时数字双 2507.01740v1
  • 268 07-02 Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives Extrahieren von interpretierbaren Modellen aus Baumensembles: Computational and Statistical Perspectives 从树形集合中提取解释模型:计算和统计视角 2506.20114v2
  • 269 07-02 Average Calibration Error: A Differentiable Loss for Improved Reliability in Image Segmentation Durchschnittlicher Kalibrierungsfehler: Ein differenzierbarer Verlust für verbesserte Zuverlässigkeit in der Bildsegmentierung 平均校准误差:图像分割法可靠性提高的可区别损失 2403.06759v2
  • 270 07-02 ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving ECCV 2024 W-CODA: 1. Workshop zur multimodalen Wahrnehmung und Verständlichkeit von Eckfällen im autonomen Fahren ECCV 2024 W-CODA:第一次关于自主驾驶时对拐角案例的多模式认识和了解的讲习班 2507.01735v1
  • 271 07-02 Leveraging Genetic Algorithms for Efficient Demonstration Generation in Real-World Reinforcement Learning Environments Nutzung genetischer Algorithmen für effiziente Demonstrationsgenerierung in realen Lernumgebungen zur Stärkung der Welt 在现实世界加强学习环境中利用遗传遗传价值促进高效示范的生成 2507.00762v2
  • 272 07-02 Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach Token Communication in der Ära der großen Modelle: Ein auf Informationen basierender Ansatz 大模式时代中的 Tokon 通信:基于信息瓶颈的办法 2507.01728v1
  • 273 07-02 Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models Unified Triplet-Level Halluzination Evaluation für große Vision-Sprache Modelle 大型视觉语言模型统一三维级幻觉评价 2410.23114v3
  • 274 07-02 NGAT: A Node-level Graph Attention Network for Long-term Stock Prediction NGAT: Ein knotenbasiertes Graphen-Achtungsnetzwerk für langfristige Bestandsvorhersage NGAT: 长期库存预测节点图表关注网络 2507.02018v1
  • 275 07-02 Revisiting Learning Rate Control Überprüfung der Steuerung der Lernquoten 重新审查学习率控制 2507.01724v1
  • 276 07-02 15,500 Seconds: Lean UAV Classification Leveraging PEFT and Pre-Trained Networks 15.500 Sekunden: Lean UAV Classification Leveraging PEFT und Pre-Trained Networks 15,500秒: Lean UAV 分类利用PEFT和训练前网络 2506.11049v2
  • 277 07-02 A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation Eine Überprüfung der Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation 对贝叶斯不确定因素在深概率图像分割中量化的回顾 2411.16370v4
  • 278 07-02 Agent Ideate: A Framework for Product Idea Generation from Patents Using Agentic AI Agent-Idee: Ein Rahmen für Produkt-Ideen-Erzeugung aus Patenten mit Agent-KI Agent Ideate: 使用Agentic AI 专利产品创意一代框架 2507.01717v1
  • 279 07-02 Enhancing Expressivity of Quantum Neural Networks Based on the SWAP test Steigerung der Expressivität von Quantum-Neuralen Netzwerken auf Basis des SWAP-Tests 根据全部门办法测试,提高量子神经网络的表达性 2506.16938v2
  • 280 07-02 B-PL-PINN: Stabilizing PINN Training with Bayesian Pseudo Labeling B-PL-PINN: Stabilisierendes PINN-Training mit Bayesian Pseudo-Etikettierung B-PL-PINN:与Bayesian Pseudo标签公司一起稳定PINN培训 2507.01714v1
  • 281 07-02 Grower-in-the-Loop Interactive Reinforcement Learning for Greenhouse Climate Control Grower-in-the-Loop Interaktives Verstärkungslernen für Greenhouse Climate Control 种植者在Loop-Loop 互动强化学习促进温室气候控制 2505.23355v2
  • 282 07-02 Relational Causal Discovery with Latent Confounders Beziehungale kausale Entdeckung mit latenten Gründern 与内部扰乱者发生关系性因果发现 2507.01700v1
  • 283 07-02 Variational Graph Convolutional Neural Networks Variationale Graphenkonvolutionale Neurale Netzwerke 变形图 革命神经神经网络 2507.01699v1
  • 284 07-02 Dynamic Similarity Graph Construction with Kernel Density Estimation Dynamische Ähnlichkeit Graphenkonstruktion mit Kernel Dichteschätzung 具有内核密度估计的动态相似性图建图 2507.01696v1
  • 285 07-02 FAMES: Fast Approximate Multiplier Substitution for Mixed-Precision Quantized DNNs–Down to 2 Bits! FAMES: Fast ungefähre Multiplizierer-Substitution für gemischt-Präzision Quantisierte DNNs–Down auf 2 Bits! 快速近似乘数替代混合精度量化 DNNs - 下到 2 位! 2411.18055v3
  • 286 07-02 PERTINENCE: Input-based Opportunistic Neural Network Dynamic Execution PERTINENCE: Inputbasierte opportunistische neurale Netzwerk-Dynamische Ausführung 以投入为基础的机会神经网络动态执行 2507.01695v1
  • 287 07-02 EP-Diffuser: An Efficient Diffusion Model for Traffic Scene Generation and Prediction via Polynomial Representations EP-Diffusor: Ein effizientes Diffusionsmodell für die Verkehrsszenengenerierung und -vorhersage über polynomische Darstellungen EP-Diffuser:通过多边代表制有效传播交通景点生成和预测模式 2504.05422v2
  • 288 07-02 GPT, But Backwards: Exactly Inverting Language Model Outputs GPT, aber zurück: Genaue Invertierung von Sprachmodell-Ausgaben GPT, 但向后: 完全反转语言模型产出 2507.01693v1
  • 289 07-02 A generative modeling / Physics-Informed Neural Network approach to random differential equations Eine generative Modellierung / Physik-informierte Neural Network Ansatz zu zufälligen Differentialgleichungen 随机差异方程的基因建模/物理-内建神经网络方法 2507.01687v1
  • 290 07-02 Enhancing Robustness to Missing Modalities through Clustered Federated Learning Verbesserung der Robustheit fehlender Modalitäten durch clustered Federated Learning 通过分组联邦学习增强对失踪模式的有力性 2505.06911v2
  • 291 07-02 Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling Mischen von Supervised und Verstärkung Feintuning mit Präfix-Sampling 与前缀抽样混合监管和强化精细推荐 2507.01679v1
  • 292 07-02 AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training AsyncFlow: Asynchrones Streaming RL-Framework für effizientes LLM-Post-Training AsyncFlow: 高效LLM 培训后高效LLM的同步流流RL框架 2507.01663v1
  • 293 07-02 Co-Optimizing Reconfigurable Environments and Policies for Decentralized Multi-Agent Navigation Co-Optimierung neu konfigurierbarer Umgebungen und Politiken für dezentralisierte Multi-Agent-Navigation 共同优化可重新配置的环境和权力下放多机构导航政策 2403.14583v2
  • 294 07-02 Contrastive Learning and Adversarial Disentanglement for Privacy-Aware Task-Oriented Semantic Communication Kontrastives Lernen und zweifelhafte Entflechtung für die datenschutzbewusste, aufgabenorientierte semantische Kommunikation 隐私-软件任务导向的语义交流的矛盾学习和反对解脱 2410.22784v3
  • 295 07-02 NegMerge: Sign-Consensual Weight Merging for Machine Unlearning NegMerge: Sign-Consensual Weight Merging für maschinelles Lernen NegMelge: 机器脱学的信号-协同重力合并 2410.05583v2
  • 296 07-02 SPoT: Subpixel Placement of Tokens in Vision Transformers SPoT: Subpixel-Platzierung von Token in Vision Transformers SPOT: 视觉变形器中当当量子像素配置 2507.01654v1
  • 297 07-02 Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems Sublinearer Bedauern für eine Klasse von linear-Quadratischen Lernproblemen 连续时线性强化学习问题分类的子线性遗憾 2407.17226v5
  • 298 07-02 On the Fundamental Impossibility of Hallucination Control in Large Language Models Über die grundsätzliche Unmöglichkeit der Halluzinationskontrolle in großen Sprachmodellen 关于大语言模型中幻听控制的基本不可能性 2506.06382v2
  • 299 07-02 GradMetaNet: An Equivariant Architecture for Learning on Gradients GradMetaNet: Eine gleichwertige Architektur zum Lernen auf Gradienten GradMetaNet:关于渐进式学习的等同架构 2507.01649v1
  • 300 07-02 Dance Dance ConvLSTM Tanztanz ConvLSTM 舞蹈舞蹈ConvLSTM 2507.01644v1
  • 301 07-02 Initialization Method for Factorization Machine Based on Low-Rank Approximation for Constructing a Corrected Approximate Ising Model Initialisierungsmethode für Factorization Machine basierend auf Low-Rank-Annäherung für die Konstruktion eines korrigierten ungefähren Ising-Modells 建筑正确近似误差型号的基于低Rank近似度的保理机的初始化方法 2410.12747v3
  • 302 07-02 Truthful Elicitation of Imprecise Forecasts Wahre Botschaft von ungenauen Prognosen 以真真真真真真真真真切的易感简易预报 2503.16395v3
  • 303 07-02 Kernel Recursive Least Squares Dictionary Learning Algorithm Kernel Rekursive Least Squares Wörterbuch Lernen Algorithmen 最小广场字典学习算法 2507.01636v1
  • 304 07-02 Non-collective Calibrating Strategy for Time Series Forecasting Nicht-kollektive Kalibrierstrategie für Zeitreihenprognosen 非集体时间序列预测校准战略 2506.03176v2
  • 305 07-02 Tile and Slide : A New Framework for Scaling NeRF from Local to Global 3D Earth Observation Tile and Slide : Ein neuer Rahmen für die Skalierung von NeRF von lokaler bis globaler 3D-Erdbeobachtung 平板和幻灯片:从地方向全球3D地球观测扩大内域FF的新框架 2507.01631v1
  • 306 07-02 Adapting Rule Representation With Four-Parameter Beta Distribution for Learning Classifier Systems Anpassung der Regeldarstellung mit Vier-Parameter Beta-Distribution für Lernklassifikatoren 调整教学分类系统有四分之四张贝塔分配的细则代表制 2506.03602v2
  • 307 07-02 Query Complexity of Classical and Quantum Channel Discrimination Abfrage der Komplexität der klassischen und Quantenkanal-Diskriminierung 古典和量道歧视的复杂程度问题 2504.12989v2
  • 308 07-02 When Less Is More: Binary Feedback Can Outperform Ordinal Comparisons in Ranking Recovery Wenn weniger ist mehr: Binäre Rückkopplung kann übertreffen Ordinal-Vergleiche in Ranking Recovery 少少少少少:二进反馈在评分回收方面能够超过常规比较 2507.01613v1
  • 309 07-02 Upper and lower bounds for the Lipschitz constant of random neural networks Obere und untere Grenzen für die Lipschitz-Konstante zufälliger neuronaler Netzwerke 随机神经网络Lipschitz常量的上下界限 2311.01356v4
  • 310 07-02 Survivability of Backdoor Attacks on Unconstrained Face Recognition Systems Überlebensfähigkeit von Hintertürangriffen auf ungezwungene Gesichtserkennungssysteme 对不受限制的面对面承认系统的后门攻击的可存活性 2507.01607v1
  • 311 07-02 Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems Data Agent: Eine ganzheitliche Architektur für die Orchestrierung von Daten+AI-Ökosystemen 数据代号:一个用于管弦化数据+AI生态系统的综合结构 2507.01599v1
  • 312 07-02 Analysis of Muon’s Convergence and Critical Batch Size Analyse der Konvergenz von Muon und der kritischen Batch-Größe Muon的聚合和关键批量大小分析 2507.01598v1
  • 313 07-02 Efficiently Vectorized MCMC on Modern Accelerators Effizient vektorisierte MCMC auf modernen Beschleunigern 现代加速器问题MCMC 2503.17405v2
  • 314 07-02 Autonomous AI Surveillance: Multimodal Deep Learning for Cognitive and Behavioral Monitoring Autonome KI-Überwachung: Multimodales Deep Learning für Kognitives und Verhaltensmonitoring AI 自主监测:认知和行为监测的多式深层学习 2507.01590v1
  • 315 07-02 A Privacy-Preserving Indoor Localization System based on Hierarchical Federated Learning Ein datenschutzerhaltendes Indoor-Lokalisierungssystem basierend auf hierarchischem Federated Learning 基于等级联邦学习的维护隐私室内本地化系统 2507.01581v1
  • 316 07-02 Transfer Learning for VLC-based indoor Localization: Addressing Environmental Variability Transfer Learning für VLC-basierte Indoor-Localization: Adressierung von Umweltvariabilität 以VLC为基础的室内本地化转让学习:解决环境变化多变性问题 2507.01575v1
  • 317 07-02 Is merging worth it? Securely evaluating the information gain for causal dataset acquisition Lohnt es sich zu verschmelzen? Sichere Auswertung des Informationsgewinns für die kausale Datenerfassung 合并是否值得? 安全评估因果数据集获取的信息收益 2409.07215v3
  • 318 07-02 On the Effect of Ruleset Tuning and Data Imbalance on Explainable Network Security Alert Classifications: a Case-Study on DeepCASE Über die Auswirkungen von Ruleset Tuning und Datenunwucht auf erklärbare Netzsicherheits-Alarmklassifikationen: eine Fallstudie zu DeepCASE 规则图案和数据平衡对可解释的网络安全警报分类的影响:深 CASE案例研究 2507.01571v1
  • 319 07-02 TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design TRACED: Transition-aware regret Annäherung mit Mitlernbarkeit für Umweltdesign TRACEED: 环境设计中具有共负环境设计共负作用的过渡-意识到遗憾相近情况 2506.19997v2
  • 320 07-02 Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers Zeitreihendarstellungen für die Klassifizierung liegen in vorgebildeten Visionstransformatoren versteckt 未受过训练的愿景变异者中隐藏的分类隐隐藏物的 时间序列 2506.08641v2
  • 321 07-02 Unsupervised Panoptic Interpretation of Latent Spaces in GANs Using Space-Filling Vector Quantization Unüberwachte Panoptische Interpretation von Latent Spaces in GANs mittels raumfüllender Vector-Quantisierung 利用空间填空矢量量化,在GANs中无监督地对低空空间进行全光解释 2410.20573v2
  • 322 07-02 Who Should I Listen To? Adaptive Collaboration in Personalized Federated Learning Wem sollte ich zuhören? Adaptive Zusammenarbeit im personalisierten Federated Learning 个人化联邦学习中的适应性协作 2507.00259v2
  • 323 07-02 How Weight Resampling and Optimizers Shape the Dynamics of Continual Learning and Forgetting in Neural Networks Wie Gewicht-Resampling und Optimierer die Dynamik des kontinuierlichen Lernens und Vergessens in neuralen Netzwerken gestalten 重力再采样和优化如何塑造神经网络中持续学习和遗忘的动态 2507.01559v1
  • 324 07-02 Learned-Database Systems Security Gelernte Datenbanksysteme Sicherheit 数据库系统安全 2212.10318v4
  • 325 07-02 ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention ZeCO: Null Kommunikation Überkopfsequenz Parallelität für lineare Aufmerksamkeit ZeCO: 线性注意的无通信重叠序列平行主义 2507.01004v2
  • 326 07-02 AI and Remote Sensing for Resilient and Sustainable Built Environments: A Review of Current Methods, Open Data and Future Directions KI und Fernerkundung für resiliente und nachhaltig gebaute Umgebungen: Eine Übersicht über aktuelle Methoden, offene Daten und zukünftige Richtungen 适应性和可持续建筑环境的AI和遥感:对当前方法、开放数据和未来方向的审查 2507.01547v1
  • 327 07-02 MARVIS: Modality Adaptive Reasoning over VISualizations MARVIS: Modalität Adaptive Begründung über VISualisierungen MARVIS: 与VIS定性有关的模式调整理由 2507.01544v1
  • 328 07-02 Parsimonious Gaussian mixture models with piecewise-constant eigenvalue profiles Parsimonische Gauß-Mischmodelle mit stückweise-konstanten Eigenwertprofilen 配有麻片- 恒定等离子值剖面图的帕西莫斯高斯混合混合物模型 2507.01542v1
  • 329 07-02 Consistency of Learned Sparse Grid Quadrature Rules using NeuralODEs Konsistenz der gelernten Sparse Grid-Quadratur-Regeln mit NeuralODEs 使用神经元的 “ 利用神经元 “ 的 “ 突变网格宽度规则 “ 一致性 2507.01533v1
  • 330 07-02 Chargax: A JAX Accelerated EV Charging Simulator Chargax: Ein JAX beschleunigter EV-Ladesimulator 电荷: JAX 加速 EV 充电模拟器 2507.01522v1
  • 331 07-02 Loss Functions in Diffusion Models: A Comparative Study Verlustfunktionen bei Diffusionsmodellen: Eine vergleichende Studie 传播模型中的功能损失:比较研究 2507.01516v1
  • 332 07-02 Mamba Guided Boundary Prior Matters: A New Perspective for Generalized Polyp Segmentation Mamba Guided Boundary Prior Matters: Eine neue Perspektive für die generalisierte Polyp-Segmentierung Mamba 向导边界前事项:普遍聚分法的新视角 2507.01509v1
  • 333 07-02 Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling Skalierung von Flüssig-Resistenz-Netzwerken für eine effiziente Sequenzmodellierung 增强增强流动性恢复力的流动性能力网络,以建立高效序列建模 2505.21717v3
  • 334 07-02 SimBank: from Simulation to Solution in Prescriptive Process Monitoring SimBank: Von der Simulation zur Lösung in der Prescriptive Process Monitoring SimBank:从模拟到规范程序监测的解决方案 2506.14772v3
  • 335 07-02 Meteoroid stream identification with HDBSCAN unsupervised clustering algorithm Meteoroid-Stream-Identifikation mit HDBSCAN unüberwachtem Clustering-Algorithmus 与 HDBScAN 不受监督的群集算法的气象流识别 2507.01501v1
  • 336 07-02 optimizn: a Python Library for Developing Customized Optimization Algorithms optimizn: eine Python-Bibliothek für die Entwicklung maßgeschneiderter Optimierungsalgorithmen 优化: 用于开发自定制优化算法的 Python 图书馆 2503.00033v2
  • 337 07-02 Diffusion-based Iterative Counterfactual Explanations for Fetal Ultrasound Image Quality Assessment Diffusionsbasierte iterative Gegenfaktische Erklärungen für fetale Ultraschall-Bildqualitätsbewertung 胎儿超声波图像质量评估的基于传播的替代反事实解释 2403.08700v2
  • 338 07-02 On the Trade-off between Flatness and Optimization in Distributed Learning Auf dem Trade-off zwischen Flatness und Optimierung im verteilten Lernen 关于分配式学习的平衡与优化的平衡 2406.20006v2
  • 339 07-02 How to Securely Shuffle? A survey about Secure Shufflers for privacy-preserving computations Wie man sicher Shuffle? Eine Umfrage über Secure Shuffler für die Privatsphäre-Erhaltung Berechnungen 如何安全打乱? 关于用于隐私保护计算的安全打碎器的调查 2507.01487v1
  • 340 07-02 ManifoldMind: Dynamic Hyperbolic Reasoning for Trustworthy Recommendations ManifoldMind: Dynamische hyperbolische Begründung für vertrauenswürdige Empfehlungen MManifulMind: 值得信赖的建议的动态超双曲理由 2507.02014v1
  • 341 07-02 Optimizing Methane Detection On Board Satellites: Speed, Accuracy, and Low-Power Solutions for Resource-Constrained Hardware Optimierung der Methan-Erkennung auf Board-Satelliten: Geschwindigkeit, Genauigkeit und Low-Power-Lösungen für ressourcenbeschränkte Hardware 优化板上卫星的甲烷探测:快速、准确和低功率办法 2507.01472v1
  • 342 07-02 Zero-Incentive Dynamics: a look at reward sparsity through the lens of unrewarded subgoals Zero-Incentive Dynamics: ein Blick auf Belohnungssparsität durch die Linse der unerhörten Subgoals 零奖励零激励动态:从未获得回报的次级目标的角度来看待奖励的宽度 2507.01470v1
  • 343 07-02 Cross-platform Smartphone Positioning at Museums Plattformübergreifende Smartphone-Positionierung bei Museen 博物馆的跨平台智能手机定位 2507.01469v1
  • 344 07-02 Adaptive NAD: Online and Self-adaptive Unsupervised Network Anomaly Detector Adaptive NAD: Online- und Self-Adaptive Unüberwachte Netzwerk-Anomaly Detector 适应性NAD: 在线和自适应性不受监督的网络异常探测器 2410.22967v4
  • 345 07-02 Symbolic identification of tensor equations in multidimensional physical fields Symbolische Identifizierung von Tensorgleichungen in mehrdimensionalen physikalischen Feldern 在多层面物理领域对高方方形的符号识别 2507.01466v1
  • 346 07-02 Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data Effizientes Online-Verstärktes Lernen Feinsteuerung braucht keine Offline-Daten 不保留离线数据 2412.07762v3
  • 347 07-02 Tensor Program Optimization for the RISC-V Vector Extension Using Probabilistic Programs Tensor-Programmoptimierung für die RISC-Vektorerweiterung mittels probabilistischer Programme 利用概率方案优化RISC-V矢量扩展 2507.01457v1
  • 348 07-02 FE-LWS: Refined Image-Text Representations via Decoder Stacking and Fused Encodings for Remote Sensing Image Captioning FE-LWS: Verfeinerte Bild-Text-Darstellungen über Decoder Stacking und Fused Encodings für Remote Sensing Image Captioning FE-LWS:通过用于遥感图像说明的解码器堆放和引信编码系统改进图像文本演示文示 2502.09282v2
  • 349 07-02 Rank-1 Matrix Completion with Gradient Descent and Small Random Initialization Rang-1-Matrix-Vervollständigung mit gradienten Abstieg und kleiner zufälliger Initialisierung RAN-1 矩阵完成, 以渐变源和小随机初始化完成 2212.09396v3
  • 350 07-02 Machine Learning Based Stress Testing Framework for Indian Financial Market Portfolios Machine Learning Based Stress Testing Framework für indische Finanzmarktportfolios 印度金融市场投资组合机械学习压力测试框架 2507.02011v1
  • 351 07-02 EdgeLoRA: An Efficient Multi-Tenant LLM Serving System on Edge Devices EdgeLoRA: Ein effizientes Multi-Tenant LLM Serving System auf Edge-Geräten EdgeloRA:一个高效的长长长长长长长长长长长长长长长长长长边缘装置服务系统 2507.01438v1
  • 352 07-02 Efficient Split Federated Learning for Large Language Models over Communication Networks Effizientes Split-Federated-Learning für große Sprachmodelle über Kommunikationsnetzwerke 针对通信网络的大型语言模式的高效分裂联邦学习 2504.14667v2
  • 353 07-02 Pensieve Grader: An AI-Powered, Ready-to-Use Platform for Effortless Handwritten STEM Grading Pensieve Grader: Eine KI-Powered, Ready-to-Use Plattform für mühelose handschriftliche STEM-Grading Pensieve grafer: 一个AI授权的无力手写STEM分级的现用平台 2507.01431v1
  • 354 07-02 Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention Gradient Short-Circuit: Effiziente Out-of-Distribution-Erkennung über Feature-Intervention 渐进短路:通过地物干预有效外分配探测 2507.01417v1
  • 355 07-02 A deep solver for backward stochastic Volterra integral equations Ein tiefer Löser für rückwärts stochastische Volterra-Integralgleichungen 一个深深解器,用于反向蒸蒸蒸蒸蒸气伏特拉集成方程式 2505.18297v2
  • 356 07-02 Decomposing Prediction Mechanisms for In-Context Recall Zersetzen von Vorhersagemechanismen für In-Context-Recall 内回顾的可预测性机制 2507.01414v1
  • 357 07-02 Evaluating LLM Agent Collusion in Double Auctions Bewertung von LLM Agent Collusion in Doppelauktionen 评价双重拍卖中的LLM Agent串通 2507.01413v1
  • 358 07-02 DeFusion: An Effective Decoupling Fusion Network for Multi-Modal Pregnancy Prediction DeFusion: Ein effektives Entkopplungs-Fusionsnetzwerk für multi-Modalen Schwangerschaftsvorhersage DFusion:多模式妊娠预测有效脱钩融合网络 2501.04353v2
  • 359 07-02 Coherent Online Road Topology Estimation and Reasoning with Standard-Definition Maps Kohärente Online-Road-Topologie Schätzung und Begründung mit Standard-Definitionskarten 与标准定义地图一致的在线道路地形图估计和理由 2507.01397v1
  • 360 07-02 Surrogate Modeling via Factorization Machine and Ising Model with Enhanced Higher-Order Interaction Learning Surrogate Modellierung über Factorization Machine und Ising Model mit erweitertem höhere-Order-Interaktions-Lernen 通过强化高阶互动学习的保理机和Ising模型的代理模拟 2507.01389v1
  • 361 07-02 Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent Momentum reduziert nicht stochastische Geräusche bei stochastischem Gradientenabstieg 沙粒渐变后代中不减少蒸汽噪音的动力 2402.02325v5
  • 362 07-02 LUSD: Localized Update Score Distillation for Text-Guided Image Editing LUSD: Lokalisierte Update-Score-Destillation für textgeführte Bildbearbeitung LUSD: 用于文本引导图像编辑的地方化更新评分蒸馏 2503.11054v2
  • 363 07-02 Direct Quantized Training of Language Models with Stochastic Rounding Direkte Quantisierte Schulung von Sprachmodellen mit stochastischer Rundung 直接量化的语言模式直接量化培训,并进行盘点四舍四入 2412.04787v2
  • 364 07-02 Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess Können große Sprachmodelle eine strategische Begründung entwickeln? 大语言模式能否制定战略理由? 培训后从学习象棋中透视 2507.00726v2
  • 365 07-02 Feature Reweighting for EEG-based Motor Imagery Classification Feature-Umgewichtung für EEG-basierte Motor Imagery-Klassifikation 以EEG为基础的机动图象分类特点的重新加权 2308.02515v2
  • 366 07-02 Active Measurement: Efficient Estimation at Scale Aktive Messung: Effiziente Abschätzung im Maßstab 活动计量: 规模上的有效估算 2507.01372v1
  • 367 07-02 Activation Reward Models for Few-Shot Model Alignment Aktivierungs-Reward-Modelle für wenige scharfe Modellausrichtung 微小光模型对齐的激励奖得奖模型 2507.01368v1
  • 368 07-02 Continual Learning with Strategic Selection and Forgetting for Network Intrusion Detection Kontinuierliches Lernen mit strategischer Auswahl und Vergessen für Netzwerkintrusionserkennung 利用战略选择和遗忘的不断学习,以探测网络入侵 2412.16264v4
  • 369 07-02 Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information Eng gekoppelte LiDAR-IMU-Leg-Odometrie mit Online-Lektionskinematik, die Fußtaktile Informationen enthält LIDAR-IMU-Leg Odo量度与包含脚触觉信息的在线学习腿脚力学 2506.09548v2
  • 370 07-02 Efficient Kilometer-Scale Precipitation Downscaling with Conditional Wavelet Diffusion Effiziente Kilometer-Scale Niederschlagsabsenkung mit bedingter Wavelet-Diffusion 与有条件的波子扩散相降幅 2507.01354v1
  • 371 07-02 Dataset Distillation via the Wasserstein Metric Datensatz Destillation über die Wasserstein Metric 通过瓦塞斯坦测量仪进行数据集蒸馏 2311.18531v3
  • 372 07-02 Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations Aitomia: Ihr intelligenter Assistent für KI-getriebene Atomistische und Quantum Chemical Simulationen Aitomia:您对AI-Driven原子学和量子化学模拟的智能助理 2505.08195v2
  • 373 07-02 Pre-training Large Memory Language Models with Internal and External Knowledge Vorschulung großer Speicher Sprachmodelle mit internem und externem Wissen 具有内部和外部知识的大型记忆语言模型 2505.15962v2
  • 374 07-02 Backdooring Bias (B^2) into Stable Diffusion Models Hintertürige Bias (B^2) in stabile Diffusionsmodelle 后门Bisas (B2) 进入稳定扩散模型 2406.15213v3
  • 375 07-02 Drug Discovery SMILES-to-Pharmacokinetics Diffusion Models with Deep Molecular Understanding Drug Discovery SMILES-to-Pharmacokinetics Diffusion Modelle mit tiefem molekularen Verständnis 具有深分子理解的药物发现SMILES到药用动力学扩散模型 2408.07636v2
  • 376 07-02 Diffusion Policies for Risk-Averse Behavior Modeling in Offline Reinforcement Learning Diffusionsrichtlinien für risikoorientierte Verhaltensmodelle im Offline-Verstärkungslernen 关于在离线强化学习中建立风险与规避行为模拟模型的政策的传播 2403.17646v2
  • 377 07-02 Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy Begründung für die Erkennung von Real-World-Events: Scaling-Verstärkung durch adaptive Perplexity-Aware Sampling-Strategie 现实世界事件发现的理由:通过适应性重复-软件抽样战略加强学习 2507.01327v1
  • 378 07-02 SWinMamba: Serpentine Window State Space Model for Vascular Segmentation SWinMamba: Serpentine Fenster Zustand Raummodell für Vaskuläre Segmentierung SWinMamba: 蛇角窗口状态血管分割空间模型 2507.01323v1
  • 379 07-02 ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks ICLShield: In-Context-Lernen Hintertürangriffe erforschen und abmildern ILLShield:探索和减少内线学习后门攻击 2507.01321v1
  • 380 07-02 Fourier Series Guided Design of Quantum Convolutional Neural Networks for Enhanced Time Series Forecasting Fourier Series Guided Design Quantum Convolutional Neural Networks for Enhanced Time Series Forecasting 用于增强时间序列预测的量子进量神经网络的四ier系列制导设计 2404.15377v3
  • 381 07-02 Neural Hamiltonian Operator Neural Hamiltonian Betreiber 神经汉密尔顿操作员 2507.01313v1
  • 382 07-02 DiffusionLight-Turbo: Accelerated Light Probes for Free via Single-Pass Chrome Ball Inpainting DiffusionLight-Turbo: Beschleunigte Lichtsonden kostenlos über Single-Pass Chrome Ball Inpainting DifulLight-Turbo:通过单塑料铬球涂漆免费快速光探测 2507.01305v1
  • 383 07-02 Empirical Bayes Estimation for Lasso-Type Regularizers: Analysis of Automatic Relevance Determination Empirische Bayes-Schätzung für Lasso-Typ-Regularisatoren: Analyse der automatischen Relevanzbestimmung Lasso-Type正规化器的经验性贝量估计:自动相关性确定分析 2501.11280v4
  • 384 07-02 OralBBNet: Spatially Guided Dental Segmentation of Panoramic X-Rays with Bounding Box Priors OralBBNet: Spatially Guided Dental Segmentierung von Panorama X-Rays mit Bounding Box Priors 口服BBNet:全景X光空间引导牙科分离 2406.03747v3
  • 385 07-02 Far From Sight, Far From Mind: Inverse Distance Weighting for Graph Federated Recommendation Weit weg vom Sehen, weit weg vom Denken: Inverse Distanzgewichtung für Graph Federated Empfehlung 远离视觉,远离心智:对 “ 绿联建议 “ 的反距离加权 2507.01285v1
  • 386 07-02 VLAD: A VLM-Augmented Autonomous Driving Framework with Hierarchical Planning and Interpretable Decision Process VLAD: Ein VLM-Augmented Autonomous Driving Framework mit Hierarchical Planning und Interpretable Decision Process VLAD: 具有等级规划和解释性决定程序的VLM增强自主驱动框架 2507.01284v1
  • 387 07-02 A Framework for Mining Collectively-Behaving Bots in MMORPGs Ein Rahmen für den Abbau kollektiv verhaltender Bots in MMORPGs 《MMOMOPGs集体经营的矿床框架》 2501.10461v2
  • 388 07-02 BoltzNCE: Learning Likelihoods for Boltzmann Generation with Stochastic Interpolants and Noise Contrastive Estimation BoltzNCE: Likelihoods für die Boltzmann-Generation mit stochastischen Interpolanten und Geräuschkontrastabschätzungen BoltzNCE: 学习博尔茨曼一代人与斯托克内刑警的相似之处和噪音对抗估计 2507.00846v2
  • 389 07-02 PULSE: Practical Evaluation Scenarios for Large Multimodal Model Unlearning PULSE: Praktische Bewertungsszenarien für großes multimodales Modell-Unlearning PULSE:大型多模式模式不学习的实际评价设想 2507.01271v1
  • 390 07-02 $μ^2$Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation $μ^2$Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer für Radiologie Report Generation $2 $2 收缩器:用于产生放射学报告的可区别的多规模多式多式调控器 2507.00316v2
  • 391 07-02 Combating Confirmation Bias: A Unified Pseudo-Labeling Framework for Entity Alignment Bekämpfung von Konfirmations-Bias: Ein einheitliches Pseudo-Labeling-Rahmenwerk für die Ausrichtung von Unternehmen 打击确认的偏见:统一实体统一化框架 2307.02075v4
  • 392 07-02 Dynamic Matching with Post-allocation Service and its Application to Refugee Resettlement Dynamisches Matching mit dem Post-Allocation Service und seiner Anwendung auf die Neuansiedlung von Flüchtlingen 与员额分配服务及其适用于难民重新安置的动态匹配 2410.22992v2
  • 393 07-02 Automated Classification of Volcanic Earthquakes Using Transformer Encoders: Insights into Data Quality and Model Interpretability Automatisierte Klassifizierung von Vulkan-Erdbeben mit Transformer-Encodern: Einblicke in Datenqualität und Modellinterpretierbarkeit 利用变换器计算器对火山地震进行自动分类:对数据质量和模型解释的透视 2507.01260v1
  • 394 07-02 AIRES: Accelerating Out-of-Core GCNs via Algorithm-System Co-Design AIRES: Beschleunigung von Out-of-Core-GCNs über Algorithm-System Co-Design AIRES:通过Algorithm-System共同设计加速将全球氯化萘超出核心 2507.02006v1
  • 395 07-01 (2) Embedding-Space Diffusion for Zero-Shot Environmental Sound Classification Einbettungs-Space-Diffusion für Zero-Shot-Environmental Sound-Klassifikation 内嵌式空间扩散用于零热无害环境分类 2412.03771v2
  • 396 07-01 SFO: Piloting VLM Feedback for Offline RL SFO: Pilotierung von VLM Feedback für Offline RL SFO: 试验离线RL的VLM反馈 2503.01062v4
  • 397 07-01 Jump-Start Reinforcement Learning with Self-Evolving Priors for Extreme Monopedal Locomotion Jump-Start Verstärkungslernen mit sich selbst entwickelnden Prioren für extreme monopedale Locomotion 与自我演化前科一起进行跳跃启动强化学习,以进行极端单式移动 2507.01243v1
  • 398 07-01 Beyond First-Order: Training LLMs with Stochastic Conjugate Subgradients and AdamW Beyond First-Order: Training LLMs mit stochastischen Konjugat-Subgradienten und AdamW 第一班之后:与斯托卡共产子公司和AdamW公司合办的培训LLM培训 2507.01241v1
  • 399 07-01 GenBFA: An Evolutionary Optimization Approach to Bit-Flip Attacks on LLMs GenBFA: Ein evolutionärer Optimierungsansatz für Bit-Flip-Angriffe auf LLMs GenBFA:对LMLM的点翻攻击的演进优化办法 2411.13757v4
  • 400 07-01 Quantum Machine Learning in Transportation: A Case Study of Pedestrian Stress Modelling Quantum Machine Learning in Transportation: Eine Fallstudie zur fußgängerischen Stressmodellierung Qantum 运输机械学习:虫害应激模拟模型案例研究 2507.01235v1
  • 401 07-01 CAM-NET: An AI Model for Whole Atmosphere with Thermosphere and Ionosphere Extension CAM-NET: Ein KI-Modell für ganze Atmosphäre mit Thermosphäre und Ionosphärenerweiterung CAM-NET:具有热层和电离层扩展作用的AI全大气模型 2506.19340v3
  • 402 07-01 2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos 2HandedAfforder: Präzises Lernen von lebensfähigen Bimanual Affordances aus menschlichen Videos 2HandAfforder: 学习人类视频中可操作的双人经济能力 2503.09320v3
  • 403 07-01 PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning PAE MobiLLM: Datenschutzerklärung und effizientes LLM Feintuning auf dem mobilen Gerät über Additive Side-Tuning PAE MobillLLM: 通过Additive 侧向托盘对移动设备进行隐私-软件和高效LLM精美调整 2507.01216v1
  • 404 07-01 Deep Learning-Based Intrusion Detection for Automotive Ethernet: Evaluating & Optimizing Fast Inference Techniques for Deployment on Low-Cost Platform Deep Learning-based Intrusion Detection für Automotive Ethernet: Bewertung und Optimierung schneller Inferenztechniken für den Einsatz auf Low-Cost-Plattform 用于汽车以太网的深入学习入侵探测:评估并优化快速推断技术,以便在低成本平台上部署 2507.01208v1
  • 405 07-01 DGenNO: A Novel Physics-aware Neural Operator for Solving Forward and Inverse PDE Problems based on Deep, Generative Probabilistic Modeling DGenNO: Ein neuartiger, physikbewusster Neuraloperator für die Lösung vorwärts- und inverser PDE-Probleme basierend auf tiefer, generativer probabilistischer Modellierung DGenNO:基于深生概率模型模型的解决前向和反向PDE问题新物理觉醒神经操作员 2502.06250v3
  • 406 07-01 Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits Faire Algorithmen mit Probing für Multi-Agent Multi-Armed Bandits 多代理多武装强盗验证法的公允算法 2506.14988v2
  • 407 07-01 Are Large Brainwave Foundation Models Capable Yet? Insights from Fine-tuning Sind große Brainwave-Stiftungsmodelle noch praktikabel? Erkenntnisse aus der Feinsteuerung 大型脑波基金会模型有能力吗? 2507.01196v1
  • 408 07-01 Distributional Information Embedding: A Framework for Multi-bit Watermarking Einbettung von Distributionsinformationen: Ein Rahmen für Multi-Bit-Wasserzeichen 分发信息嵌入式信息:多位标水的框架 2501.16558v2
  • 409 07-01 Diffusion Explorer: Interactive Exploration of Diffusion Models Diffusion Explorer: Interaktive Erforschung von Diffusionsmodellen 传播探索:传播模型的互动探索 2507.01178v1
  • 410 07-01 Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions Rewind-to-Delete: Zertifiziertes Maschinen-Entlernen für nicht konvexe Funktionen 倒回到删除: 认证机器不学习非convex函数 2409.09778v5
  • 411 07-01 ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features KonzeptAchtung: Diffusionstransformatoren lernen hochinterpretable Funktionen 概念意图:传播变异器学习高可解释特性 2502.04320v2
  • 412 07-01 Squat: Quant Small Language Models on the Edge Squat: Quant kleine Sprachmodelle am Rand Squt: 边边缘的量化小语言模型 2402.10787v2
  • 413 07-01 Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution? Selbstreflektierende Unsicherheiten: Kennen LLMs ihre interne Antwortverteilung? 自我反感的不确定性:LLMs知道他们的内部答案分布吗? 2505.20295v2
  • 414 07-01 Vehicle-group-based Crash Risk Prediction and Interpretation on Highways Risikovorhersage und -interpretation auf Autobahnen auf Fahrzeuggruppenbasis 公路撞车风险预测和解释 2402.12415v3
  • 415 07-01 FlashDP: Private Training Large Language Models with Efficient DP-SGD FlashDP: Privates Training von großen Sprachmodellen mit effizienter DP-SGD 私营部门培训大型语言模式,具有高效率的DP-SGD 2507.01154v1
  • 416 07-01 BioPars: A Pretrained Biomedical Large Language Model for Persian Biomedical Text Mining BioPars: Ein vorgebildetes biomedizinisches Großsprachmodell für persischen biomedizinischen Textbergbau BioPars:波斯生物医学材料开采的预先培训的生物医学大语言模型 2506.21567v2
  • 417 07-01 LZ Penalty: An information-theoretic repetition penalty for autoregressive language models LZ-Strafe: Eine informationstheoretische Wiederholungsstrafe für autoregressive Sprachmodelle LZ惩罚:对自动递减语言模式的 信息理论重复处罚 2504.20131v2
  • 418 07-01 A Review on Sound Source Localization in Robotics: Focusing on Deep Learning Methods Eine Übersicht über die Lokalisierung von Schallquellen in der Robotik: Fokussierung auf Deep-Learning-Methoden 关于 “ 机器人中无害源本地化:侧重于深层学习方法 “ 的审评 2507.01143v1
  • 419 07-01 Spectral Manifold Harmonization for Graph Imbalanced Regression Spektrale Manifold Harmonisierung für Graph Imbalanced Regression 图I平衡回归的光谱蒙面协调 2507.01132v1
  • 420 07-01 Tensor Decomposition Networks for Fast Machine Learning Interatomic Potential Computations Tensor-Dekompositionsnetzwerke für schnelles maschinelles Lernen Interatomare Potentialrechnungen 快速机器学习跨原子潜能值计算器的电锯分解网络 2507.01131v1
  • 421 07-01 On Design Principles for Private Adaptive Optimizers Über Design-Prinzipien für private Adaptive Optimierer 私人适应性优化剂设计原则 2507.01129v1
  • 422 07-01 Temporal Chain of Thought: Long-Video Understanding by Thinking in Frames Temporale Gedankenkette: Langes-Video-Verstehen durch das Denken in Rahmen 时时思维链:通过思考框架来长期理解 2507.02001v1
  • 423 07-01 Landslide Detection and Mapping Using Deep Learning Across Multi-Source Satellite Data and Geographic Regions Erdrutscherkennung und Kartierung mittels Deep Learning in Multi-Source-Satellitendaten und geographischen Regionen 跨多来源卫星数据和地理区域利用深度学习进行滑坡探测和绘图 2507.01123v1
  • 424 07-01 Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning Warum neurales Netzwerk symbolische Strukturen mit gradient-based Training entdecken kann: Eine algebraische und geometrische Grundlage für neurosymbolische Reasoning 为何神经网络网络能够通过渐进式培训发现象征性结构:一个用于神经元理性的代数和几何基金会 2506.21797v2
  • 425 07-01 A Neural Operator based on Dynamic Mode Decomposition Ein neuraler Operator auf Basis von Dynamic Mode Decomposition 基于动态模式分解的神经操作器 2507.01117v1
  • 426 07-01 A LoD of Gaussians: Unified Training and Rendering for Ultra-Large Scale Reconstruction with External Memory A LoD of Gaussians: Unified Training und Rendering für Ultra-Large Scale Rekonstruktion mit externem Speicher Gaussian AD: 利用外部记忆进行超大型重建的统一培训和投标 2507.01110v1
  • 427 07-01 Geometry-aware 4D Video Generation for Robot Manipulation Geometry-aware 4D-Video-Generation für Roboter-Manipulation 用于机器人操纵的视频生成 2507.01099v1
  • 428 07-01 Proof of a perfect platonic representation hypothesis Nachweis einer perfekten platonischen Darstellungshypothese 证明完美拉子代表假设的证据 2507.01098v1
  • 429 07-01 STONet: A neural operator for modeling solute transport in micro-cracked reservoirs STONet: Ein neuronaler Operator zur Modellierung des Löttransports in mikrogekrackten Reservoirs Stonet:在微裂开的储油层中模拟溶液运输的神经操作员 2412.05576v2
  • 430 07-01 Description of the Training Process of Neural Networks via Ergodic Theorem : Ghost nodes Beschreibung des Ausbildungsprozesses von Neuronalen Netzwerken über Ergodic Theorem : Geisterknoten 描述Ergodic定理神经网络培训过程:幽灵节点 2507.01003v1
  • 431 07-01 SPGD: Steepest Perturbed Gradient Descent Optimization SPGD: Steepest Perturbed Gradient Descent Optimization SPGD: 最深的受扰动的渐变人后裔优化 2411.04946v2
  • 432 07-01 Diffuse-CLoC: Guided Diffusion for Physics-based Character Look-ahead Control Diffuse-CLoC: Geführte Diffusion für physikbasierte Charakter-Blick-Ahead-Steuerung Diffuse-CLOC:向导扩散,用于基于物理的字符外观控制 2503.11801v2
  • 433 07-01 Box Pose and Shape Estimation and Domain Adaptation for Large-Scale Warehouse Automation Box Pose und Formschätzung und Domain-Anpassung für großformatige Lagerautomatisierung 用于大型仓库自动化的框框和形状估计及域域适应 2507.00984v1
  • 434 07-01 Enhancing LLM Agent Safety via Causal Influence Prompting Verbesserung der Sicherheit von LLM-Agenten durch ursächlichen Einfluss 通过原因影响促进增强LLM代理安全 2507.00979v1
  • 435 07-01 Meta-Posterior Consistency for the Bayesian Inference of Metastable System Meta-Posterior-Konsistenz für die Bayesische Schlussfolgerung des Metastable-Systems 贝耶斯元元元系统推论的元相近性 2408.01868v2
  • 436 07-01 Reasoning as an Adaptive Defense for Safety Vernunft als adaptive Verteidigung für die Sicherheit 以 “ 安全适应防卫 “ 为由 2507.00971v1
  • 437 07-01 Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning Skalierbares Feature-Lernen auf riesigen Wissensgraphen für Downstream Machine Learning 下流机器学习大型知识图的可缩放功能学习 2507.00965v1
  • 438 07-01 Benchmarking the Discovery Engine Benchmarking der Discovery Engine 设定发现引擎基准 2507.00964v1
  • 439 07-01 Not All Water Consumption Is Equal: A Water Stress Weighted Metric for Sustainable Computing Nicht jeder Wasserverbrauch ist gleich: Ein Wasserdruck-gewichtetes Metric für nachhaltiges Rechnen 并非所有水消耗量都相等:可持续计算中的水应激反应加权计量 2506.22773v2
  • 440 07-01 Large Language Model Confidence Estimation via Black-Box Access Große Sprachmodell-Konfidenzschätzung über Black-Box-Zugriff 通过黑箱访问大语言模型信任度估计 2406.04370v4
  • 441 07-01 Uncertainty Quantification of Wind Gust Predictions in the Northeast United States: An Evidential Neural Network and Explainable Artificial Intelligence Approach Ungewissheit Quantifizierung von Windgust-Vorhersagen im Nordosten der Vereinigten Staaten: Ein evidentielles neurales Netzwerk und erklärbarer Ansatz für künstliche Intelligenz 美国东北部风速预测的不确定性量化:证据神经网络和可解释的人工情报方法 2502.00300v2
  • 442 07-01 Atmospheric model-trained machine learning selection and classification of ultracool TY dwarfs Atmosphärische modellgestützte maschinelle Lernauswahl und Klassifizierung von ultracoolen TY-Zwergen 超酷醇TTY矮星的大气示范培训机学习选择和分类 2507.00957v1
  • 443 07-01 MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research MLR-Bench: Bewertung von KI-Agenten auf Open-Ended Machine Learning Research MLR-Bench:评估AI公司在开放式机械学习研究方面的代理机构 2505.19955v2
  • 444 07-01 MVP: Winning Solution to SMP Challenge 2025 Video Track MVP: Gewinnlösung für SMP Challenge 2025 Video Track MVP: 赢得SMP挑战2025视频音轨的解决方案 2507.00950v1
  • 445 07-01 Time Series Foundation Models are Flow Predictors Time Series Foundation Modelle sind Flow Predictors 时间系列基础模型是流动预测模型 2507.00945v1
  • 446 07-01 RaGNNarok: A Light-Weight Graph Neural Network for Enhancing Radar Point Clouds on Unmanned Ground Vehicles RaGNNarok: Ein Licht-Gewichts-Graph-Neural-Netzwerk zur Verbesserung von Radarpunktwolken auf unbemannten Bodenfahrzeugen RaGNNNARok:增强无人驾驶地面飞行器雷达点云的光重图形神经网络 2507.00937v1
  • 447 07-01 Development and Comparative Evaluation of Three Artificial Intelligence Models (NLP, LLM, JEPA) for Predicting Triage in Emergency Departments: A 7-Month Retrospective Proof-of-Concept Entwicklung und vergleichende Auswertung von drei Modellen der Künstlichen Intelligenz (NLP, LLM, JEPA) zur Vorhersage von Triage in Notfallabteilungen: Eine 7-Monats-Retrospektive Proof-of-Concept 应急部门预测外逃的三个人工情报模型(NLP、LLM、JEPA)的开发和比较评价:7月7日回溯性概念校对 2507.01080v1
  • 448 07-01 Conformal Inference under High-Dimensional Covariate Shifts via Likelihood-Ratio Regularization Konforme Schlussfolgerung unter hochdimensionalen Kovariate Verschiebungen über Likelihood-Ratio Regularisierung 通过传统-拉蒂奥正规化,在高多样性可变性转变下发生非正式推论 2502.13030v5
  • 449 07-01 Understanding Generalization in Node and Link Prediction Verallgemeinerung in Knoten und Link-Vorhersage verstehen 理解节点和链接中的通用化 2507.00927v1
  • 450 07-01 HyperFusion: Hierarchical Multimodal Ensemble Learning for Social Media Popularity Prediction HyperFusion: Hierarchisches multimodales Ensemble Lernen für Social Media Popularity Prediction HyperFusion:社会媒体大众化预测的等级式多模式学习组合组合 2507.00926v1
  • 451 07-01 Bridging SFT and DPO for Diffusion Model Alignment with Self-Sampling Preference Optimization Überbrückung von SFT und DPO für Diffusionsmodell-Ausrichtung mit selbstprobender Preference-Optimierung 连接SFT和DPPO的传播模型与自样最佳优化 2410.05255v2
  • 452 07-01 Persistence Paradox in Dynamic Science Persistenzparadox in der dynamischen Wissenschaft 动态科学中的持久性悖论 2506.22729v2
  • 453 07-01 The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering Das verborgene Leben von Tokens: Verringerung der Halluzination großer Vision-Sprachenmodelle durch visuelle Informationslenkung 《神的隐秘生命:通过视觉信息指导减少大型视觉语言模型的幻觉》 2502.03628v2
  • 454 07-01 TABASCO: A Fast, Simplified Model for Molecular Generation with Improved Physical Quality TABASCO: Ein schnelles, vereinfachtes Modell für die molekulare Erzeugung mit verbesserter physikalischer Qualität TABASCO:一个快速简化的提高物理质量的分子生成模型 2507.00899v1
  • 455 07-01 An in depth look at the Procrustes-Wasserstein distance: properties and barycenters Ein tiefer Blick auf die Entfernung von Procrustes-Wasserstein: Eigenschaften und Barycenter 深度查看Procrustes-Wasserstein 距离: 属性和采油器 2507.00894v1
  • 456 07-01 Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts Lion Secretly Solves Constrained Optimization: Wie Lyapunov voraussagt 限制优化:如Lyapunov预测 2310.05898v7
  • 457 07-01 Scaling Laws Are Unreliable for Downstream Tasks: A Reality Check Skalierungsgesetze sind für Downstream-Aufgaben unzuverlässig: Ein Realitätscheck 增强法律对下流任务不可靠:一个现实检查 2507.00885v1
  • 458 07-01 A Scalable and Quantum-Accurate Foundation Model for Biomolecular Force Field via Linearly Tensorized Quadrangle Attention Ein skalierbares und Quantum-akkurates Fundamentalmodell für das biomolekulare Kraftfeld durch lineares Viereck Achtung 通过线性天分矩形注意进行生物分子力场的可缩缩和量子精确基础模型 2507.00884v1
  • 459 07-01 NN-Former: Rethinking Graph Structure in Neural Architecture Representation NN-Former: Graphenstruktur in der Darstellung neuraler Architektur neu denken NN-前:神经结构代表结构的重新思考图结构 2507.00880v1
  • 460 07-01 Unsupervised Attributed Dynamic Network Embedding with Stability Guarantees Unüberwachtes zugeschriebenes dynamisches Netzwerk mit Stabilitätsgarantien einbinden 带有稳定保障的未受监督的属性动态网络 2503.02859v2
  • 461 07-01 Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought Vernunft durch Überlagerung: Eine theoretische Perspektive auf Kette des kontinuierlichen Denkens 由超位推理推理:对持续思考链的理论观点 2505.12514v2
  • 462 07-01 Realism in Action: Anomaly-Aware Diagnosis of Brain Tumors from Medical Images Using YOLOv8 and DeiT Realismus in Aktion: Anomalie-Aware Diagnose von Hirntumoren aus medizinischen Bildern mit YOLOv8 und DeiT 现实在行动中:利用YOLOv8和DeiT对医疗图象中的脑肿瘤进行反常-警告诊断 2401.03302v4
  • 463 07-01 Template-Fitting Meets Deep Learning: Redshift Estimation Using Physics-Guided Neural Networks Template-Fitting trifft auf Deep Learning: Redshift-Schätzung mittels physikgeführter neuraler Netzwerke 使用物理辅助神经网络进行重新定时估计 2507.00866v1
  • 464 07-01 Machine Learning-based Early Detection of Potato Sprouting Using Electrophysiological Signals Machine Learning-based Early Detection of Potato Sprouting mit elektrophysiologischen Signalen 利用电生理信号早期探测马铃薯流 2507.00862v1
  • 465 07-01 Aligning Learning and Endogenous Decision-Making Ausrichten des Lernens und endogene Entscheidungsfindung 协调学习和内部决策 2507.00851v1
  • 466 07-01 Quantum Approximate Optimization Algorithm for Spatiotemporal Forecasting of HIV Clusters Quanten Ungefähre Optimierungsalgorithmus für die räumliche zeitliche Vorhersage von HIV-Clustern 艾滋病毒群集分布时性预测的 量 量 近 优化优化值 2507.00848v1
  • 467 07-01 Do Echo Top Heights Improve Deep Learning Nowcasts? Verbessern Echo Top Heights Deep Learning Nowcasts? Echo高层高地能改善深造即时播音吗? 2507.00845v1
  • 468 07-01 Fully Differentiable Lagrangian Convolutional Neural Network for Physics-Informed Precipitation Nowcasting Ganz unterschiedlich lagrangianisches Konvolutions-Neural-Netzwerk für physikinformierte Ausscheidungs-Nowcasting 完全区别的拉格朗日拉格朗日 物理化降水革命神经网络 2402.10747v2
  • 469 07-01 Discrete Diffusion in Large Language and Multimodal Models: A Survey Diskrete Diffusion in großen Sprach- und Multimodalmodellen: Eine Umfrage 大语言和多模式模型的分解传播:调查 2506.13759v2
  • 470 07-01 Stylometry recognizes human and LLM-generated texts in short samples Stylometrie erkennt menschliche und LLM-generierte Texte in kurzen Proben tytylometerm在短样本中确认人类和LLM产生的文本 2507.00838v1
  • 471 07-01 Studying and Improving Graph Neural Network-based Motif Estimation Untersuchung und Verbesserung der graphischen Neuralnetz-basierten Motivationsschätzung 研究和改善图形神经网络基于Motif 估计 2506.15709v2
  • 472 07-01 yProv4ML: Effortless Provenance Tracking for Machine Learning Systems yProv4ML: Müheloses Provenienz-Tracking für maschinelle Lernsysteme yProv4ML: 机器学习系统无穷无尽的证明跟踪 2507.01078v1
  • 473 07-01 Good Enough to Learn: LLM-based Anomaly Detection in ECU Logs without Reliable Labels Gut genug zu lernen: LLM-basierte Anomalieerkennung in ECU-Logs ohne zuverlässige Etiketten 足以学习:无可靠标签的ECU日志中基于LLM的反常检测 2507.01077v1
  • 474 07-01 LangTime: A Language-Guided Unified Model for Time Series Forecasting with Proximal Policy Optimization LangTime: Ein sprachgeführtes einheitliches Modell für die Vorhersage von Zeitreihen mit proximaler Politikoptimierung LangTime:一个语言指导统一模型,用于以最优政策优化方式预测时间序列 2503.08271v2
  • 475 07-01 On best approximation by multivariate ridge functions with applications to generalized translation networks Bei der besten Annäherung durch multivariate Ridge-Funktionen mit Anwendungen zu generalisierten Übersetzungsnetzwerken 多变量山脊功能与通用翻译网络应用的最佳近似率 2412.08453v3
  • 476 07-01 EvoPress: Accurate Dynamic Model Compression via Evolutionary Search EvoPress: Präzise dynamische Modellkompression durch evolutionäre Suche Evopress: 通过进化搜索准确的动态模型压缩 2410.14649v2
  • 477 07-01 Provenance Tracking in Large-Scale Machine Learning Systems Provenienzverfolgung in großformatigen Maschinen-Lernsystemen 大型机器学习系统中的证书追踪系统 2507.01075v1
  • 478 07-01 A Probabilistic Approach to Wildfire Spread Prediction Using a Denoising Diffusion Surrogate Model Ein probabilistischer Ansatz zur Wildfire-Spread-Vorhersage mit einem Denoising Diffusion Surrogate-Modell 利用低度扩散扩散代用品模型对野火扩散预测采取概率办法 2507.00761v1
  • 479 07-01 Chameleon: A MatMul-Free Temporal Convolutional Network Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Data Chameleon: Ein matmul-freies Temporal Convolutional Network Accelerator für Ende-zu-Ende wenig scharfes und kontinuierliches Lernen aus sequentiellen Daten 变色素:从序列数据中连续到最后小热和连续学习的 Mat-Mul-无时空演变网络加速器 2505.24852v2
  • 480 07-01 SINDy on slow manifolds SINDy auf langsamen Mannigfaltigkeiten 慢速方块的辛迪尼 2507.00747v1
  • 481 07-01 Evaluating LLMs and Prompting Strategies for Automated Hardware Diagnosis from Textual User-Reports Bewertung von LLMs und Prompting-Strategien für automatisierte Hardware-Diagnose aus Text-User-Reports 从文字用户报告中评价LLMs和自动硬件诊断快速战略 2507.00742v1
  • 482 07-01 Conditional Local Independence Testing for Dynamic Causal Discovery Conditional Local Independence Testing für dynamische Kausal Entdeckung 动态原因发现有条件的局部独立测试 2506.07844v2
  • 483 07-01 Machine Learning-Based Analysis of ECG and PCG Signals for Rheumatic Heart Disease Detection: A Scoping Review (2015-2025) Machine Learning-based Analysis of EKG and PCG Signals for Rheumatic Heart Disease Detection: A Scoping Review (2015-2025) 以机械学习为基础,对用于风湿心脏病检测的ECG和PCG信号进行机学分析:范围审查(2015-2025年) 2505.18182v2
  • 484 07-01 Gradient Descent Algorithm in Hilbert Spaces under Stationary Markov Chains with $φ$- and $β$-Mixing Gradient Descent Algorithm in Hilbert Spaces unter Stationären Markov Ketten mit $φ$- und $β$-Mixing 希尔伯特空域中固定的马克夫链下的梯度底源对数值,以美元和 美元混合法混合法计算 2502.03551v2
  • 485 07-01 Towards Efficient Parametric State Estimation in Circulating Fuel Reactors with Shallow Recurrent Decoder Networks Auf dem Weg zu einer effizienten parametrischen staatlichen Schätzung bei Brennstoffreaktoren mit shallow recurrent Decoder-Netzwerken 利用浅层经常拆解网络对循环燃料反应堆进行高效参数国家估算 2503.08904v2
  • 486 07-01 Aleatoric and Epistemic Uncertainty Measures for Ordinal Classification through Binary Reduction Aleatorische und epistemische Unsicherheitsmaßnahmen für die Ordnungsklassifizierung durch Binärreduktion 通过减少二进制分级对奥氏分类采取疏离和分导不确定性措施 2507.00733v1
  • 487 07-01 Guided Unconditional and Conditional Generative Models for Super-Resolution and Inference of Quasi-Geostrophic Turbulence Geführte und bedingte Generative Modelle für Super-Resolution und Schlussfolgerung der Quasi-Geostrophen Turbulenz 指导关于超分辨率和推论的无附加条件和有条件生成模型, 2507.00719v1
  • 488 07-01 Leveraging Nested MLMC for Sequential Neural Posterior Estimation with Intractable Likelihoods Leveraging Nested MLMC für sequentielle neurale hintere Abschätzung mit intractable Likelihoods 利用内嵌的 MLMC 进行序列神经波的序列神经波、与可吸引的隐隐性进行的其他估计 2401.16776v2
  • 489 07-01 Large Reasoning Models are not thinking straight: on the unreliability of thinking trajectories Große Vernunftmodelle denken nicht gerade: über die Unzuverlässigkeit von Denkbahnen 大型理性模型思维不直:思考轨迹不可靠 2507.00711v1
  • 490 07-01 Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning Sliding Puzzles Gym: Ein skalierbarer Benchmark für die staatliche Repräsentation im visuellen Stärkungslernen Sliding 谜题 Gym: 国家参与视觉强化学习的可缩放基准 2410.14038v4
  • 491 07-01 SCAWaveNet: A Spatial-Channel Attention-based Network for Global Significant Wave Height Retrieval SCAWaveNet: Ein räumlich-channelbasiertes Netzwerk für die weltweite Erfassung signifikanter Wellenhöhen SCAWaveNet:全球重大波高高度回收空间通道关注网 2507.00701v1
  • 492 07-01 T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT T2I-R1: Verstärkung der Bildgenerierung mit kollaborativem Semantik- und Token-Level CoT T2I-R1:与合作语义级和Token 级COT加强图像生成 2505.00703v2
  • 493 07-01 A Test-Function Approach to Incremental Stability Ein Test-Funktions-Ansatz zur Inkrementellen Stabilität 递增稳定检验功能办法 2507.00695v1
  • 494 07-01 Iterative Resolution of Prompt Ambiguities Using a Progressive Cutting-Search Approach Iterative Auflösung von Prompt-Ambiguitäten mittels eines progressiven Cutting-Search-Ansatzes 采用逐步切割和搜寻办法迅速解决问题 2505.02952v2
  • 495 07-01 Diffusion Classifier Guidance for Non-robust Classifiers Diffusions-Klassifikator-Leitfaden für nichtrobuste Klassifikatoren 非紫外分类的传播分类指南 2507.00687v1
  • 496 07-01 Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer Testen der Spin-Bad-Ansicht der Selbstachtung: Eine Hamiltonian Analyse von GPT-2 Transformer 测试自觉自觉的自吹泡泡视图:汉密尔顿对GPT-2变形器的分析 2507.00683v1
  • 497 07-01 Learning dynamical systems from data: Gradient-based dictionary optimization Dynamische Systeme aus Daten lernen: Gradient-basierte Wörterbuchoptimierung 从数据中学习动态系统:基于梯度的字典优化 2411.04775v2
  • 498 07-01 Harnessing the Power of Reinforcement Learning for Adaptive MCMC Die Kraft des Stärkungslernens für adaptive MCMC nutzen 利用加强学习能力促进适应性MCMC 2507.00671v1
  • 499 07-01 The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes Die Zahl der Prüfungen in Unendlichen-Horizon allgemeinen-Utility Markov Entscheidungsprozesse 无限期 – – 霍里松通用通用马尔科夫决定程序中的审理事项数目 2409.15128v2
  • 500 07-01 Continuous Wavelet Transform and Siamese Network-Based Anomaly Detection in Multi-variate Semiconductor Process Time Series Kontinuierliche Wavelet-Transformation und siamesische Netzwerk-basierte Anomalie-Erkennung in Multivariate Semiconductor Prozesszeitreihe 多变半导体工艺时间序列中的连续波盘变换和暹粒网络反异探测 2507.01999v1
  • 501 07-01 Audio-3DVG: Unified Audio - Point Cloud Fusion for 3D Visual Grounding Audio-3DVG: Unified Audio - Point Cloud Fusion für 3D Visual Grounding 音频-3DVG:3D视觉地面定位的统一音频 - 点云融合 2507.00669v1
  • 502 07-01 Identity Preserving 3D Head Stylization with Multiview Score Distillation Identität 3D-Head-Stylisierung mit Multiview-Score-Destillation erhalten 身份 保留 3D 带有多视图分数蒸馏的顶板化 2411.13536v2
  • 503 07-01 Neural Augmented Kalman Filters for Road Network assisted GNSS positioning Neural Augmented Kalman Filter für Straßennetz unterstützt GNSS-Positionierung 协助导航卫星系统定位的公路网络神经增强卡尔曼过滤器 2507.00654v1
  • 504 07-01 Cognitive Load-Aware Inference: A Neuro-Symbolic Framework for Optimizing the Token Economy of Large Language Models Kognitive Load-Aware-Schlussfolgerung: Ein neuro-symbolisches Framework zur Optimierung der Token-Ökonomie großer Sprachmodelle 认知式负载- 软件提醒推断:一个优化大语言模型调制经济的神经- 交元框架 2507.00653v1
  • 505 07-01 GANs Secretly Perform Approximate Bayesian Model Selection GANs führen heimlich ungefähre bayesische Modellauswahl durch GANs GANs 秘密进行近贝耶西亚模式选择 2507.00651v1
  • 506 07-01 Cooperative Sheaf Neural Networks Kooperative Schaf-Neural-Netzwerke 合作社牛羊神经网络 2507.00647v1
  • 507 07-01 Hebbian Physics Networks: A Self-Organizing Computational Architecture Based on Local Physical Laws Hebbian Physics Networks: Eine selbstorganisierende Computational Architecture basierend auf lokalen physikalischen Gesetzen Hebbian物理网络:基于当地物理法的自组织计算架构 2507.00641v1
  • 508 07-01 Mirror Online Conformal Prediction with Intermittent Feedback Spiegel Online Konforme Vorhersage mit intermittierendem Feedback 附带间断反馈的 “ 镜 “ 在线非正式预测 2503.10345v4
  • 509 07-01 Forward Reverse Kernel Regression for the Schrödinger bridge problem Rückwärts-Kernel-Regression für das Schrödinger-Brückenproblem Schrödinger桥桥问题前向内核倒退 2507.00640v1
  • 510 07-01 Training-Conditional Coverage Bounds under Covariate Shift Training-Conditional Coverage Bounds unter Kovariate Shift 共同变换下的培训 – – 有条件的覆盖范围 2405.16594v2
  • 511 07-01 A novel Trunk Branch-net PINN for flow and heat transfer prediction in porous medium Ein neuartiges Trunk Branch-Net PINN für die Vorhersage von Durchfluss und Wärmeübertragung im porösen Medium 新的Trunk分支网PINN,用于对多孔介质的流量和热传输预测 2501.16362v2
  • 512 07-01 Towards the Training of Deeper Predictive Coding Neural Networks Auf dem Weg zur Ausbildung tiefer vorausschauender Coding-Neural-Netzwerke 走向深更可预测编码神经网络培训 2506.23800v2
  • 513 07-01 Positive region preserved random sampling: an efficient feature selection method for massive data Positive Region erhaltene Zufallsstichprobe: eine effiziente Feature-Auswahlmethode für massive Daten 积极区域保留随机随机抽样:大规模数据的有效特征选择方法 2507.01998v1
  • 514 07-01 Geometric Gaussian Approximations of Probability Distributions Geometrische Gauß-Annäherungen der Wahrscheinlichkeitsverteilungen 概率分布分布的近似几何几何 2507.00616v1
  • 515 07-01 Residual Reward Models for Preference-based Reinforcement Learning Residual Reward-Modelle für das Preference-basierte Verstärkungslernen 以优惠为基础的强化学习的剩余奖励模式 2507.00611v1
  • 516 07-01 Integrating Expert Labels into LLM-based Emission Goal Detection: Example Selection vs Automatic Prompt Design Integration von Experten-Etiketten in LLM-basierte Emissionszielerkennung: Beispielauswahl vs Automatisches Prompt-Design 将专家标签纳入基于LLM的LLM排放目标探测:选择实例与自动即时设计 2412.06432v2
  • 517 07-01 A Practical Guide to Interpretable Role-Based Clustering in Multi-Layer Financial Networks Ein praktischer Leitfaden für ein rollenbasiertes Clustering in Multi-Layer-Finanznetzwerken 多行业金融网络基于可解释作用的集群化实用指南 2507.00600v1
  • 518 07-01 Hierarchical Decentralized Stochastic Control for Cyber-Physical Systems Hierarchische dezentrale stochastische Steuerung für Cyber-Physische Systeme 网络物理系统等级分层存储控制 2506.22971v2
  • 519 07-01 Multiresolution Analysis and Statistical Thresholding on Dynamic Networks Multiauflösungsanalyse und statistische Thresholding auf dynamischen Netzwerken 多分辨率分析和统计动态网络 2506.01208v2
  • 520 07-01 Downscaling Neural Network for Coastal Simulations Downscaling Neural Network für Küstensimulationen 沿海模拟的降尺度神经网络 2408.16553v2
  • 521 07-01 Binned semiparametric Bayesian networks Verbundene semiparametrische Bayesische Netze Binned半参数网络 2506.21997v2
  • 522 07-01 Quantum Circuit Structure Optimization for Quantum Reinforcement Learning Quantum Circuit Structure Optimization for Quantum Reinforcement Learning 量子强化学习的量子电路结构优化 2507.00589v1
  • 523 07-01 AI-Generated Video Detection via Perceptual Straightening KI-generierte Videoerkennung durch Wahrnehmungsjustierung AI - 通过感知直线探测视频 2507.00583v1
  • 524 07-01 Rotational Sampling: A Plug-and-Play Encoder for Rotation-Invariant 3D Molecular GNNs Rotationsproben: Ein Plug-and-Play-Encoder für Rotations-Invariante 3D-Molekular-GNNs 旋转取样:旋转-变量3D分子GNNs的插管和布边编码器 2507.01073v1
  • 525 07-01 Integrating Dual Prototypes for Task-Wise Adaption in Pre-Trained Model-Based Class-Incremental Learning Integration von Dual Prototypen für die Task-Wise-Anpassung in vorgefertigtes modellbasiertes Klassen-Inkrementelles Lernen 将双重原型结合,用于在培训前的基于模型的高级课堂强化学习中进行任务-任务-任务-调整适应 2411.17766v3
  • 526 07-01 Foundation Models for Clinical Records at Health System Scale Gründungsmodelle für klinische Aufzeichnungen auf der Health System Scale 保健系统规模的临床记录模型基础 2507.00574v1
  • 527 07-01 The Curse of Depth in Large Language Models Der Fluch der Tiefe in großen Sprachmodellen 大语言模型深度诅咒 2502.05795v2
  • 528 07-01 Inverse Design in Nanophotonics via Representation Learning Inverses Design in der Nanophotonik über Representation Learning 通过代表制学习对纳米体进行反向设计 2507.00546v1
  • 529 07-01 Not All Attention Heads Are What You Need: Refining CLIP’s Image Representation with Attention Ablation Nicht alle Aufmerksamkeit Köpfe sind, was Sie brauchen: Verfeinerung CLIPs Bilddarstellung mit Aufmerksamkeit Ablation 并非所有的焦点都指你需要什么:完善CLIP的“关注推介”图像代表 2507.00537v1
  • 530 07-01 Exploring Large Action Sets with Hyperspherical Embeddings using von Mises-Fisher Sampling Große Action-Sets mit Hypersphärischen Einbettungen mit von Mises-Fisher Sampling erkunden 利用 von Misses-Fisher 取样法探索具有超球嵌入的大型动作集 2507.00518v1
  • 531 07-01 Simulation-Efficient Cosmological Inference with Multi-Fidelity SBI Simulationseffiziente kosmologische Schlussfolgerung mit Multi-Fidelity SBI 多功能性模拟有效宇宙学推断 2507.00514v1
  • 532 07-01 Medical Image Segmentation Using Advanced Unet: VMSE-Unet and VM-Unet CBAM+ Medizinische Bildsegmentierung mit Advanced Unet: VMSE-Unet und VM-Unet CBAM+ 利用高级Unet:VMSE-Unet和VM-Unet CBAM+ 2507.00511v1
  • 533 07-01 Plastic tensor networks for interpretable generative modeling Kunststoff-Tensor-Netzwerke für interpretierbare generative Modellierung 用于可解释的基因建模的可塑多压网络 2504.06722v2
  • 534 07-01 MuteSwap: Silent Face-based Voice Conversion MuteSwap: Silent Face-based Voice Conversion MuteSwap: 沉默的面部语音转换 2507.00498v1
  • 535 07-01 PNAct: Crafting Backdoor Attacks in Safe Reinforcement Learning PNAct: Backdoor-Angriffe im sicheren Ausbau lernen PN法案:在安全加强学习中进行后门攻击 2507.00485v1
  • 536 07-01 UFGraphFR: Graph Federation Recommendation System based on User Text description features UFGraphFR: Graph Federation Recommendation System basierend auf Funktionen der Benutzertextbeschreibung UFGgraphFR:基于用户文本描述特征的图表联合会建议系统 2501.08044v3
  • 537 07-01 Physics-Aware Style Transfer for Adaptive Holographic Reconstruction Physics-Aware Style Transfer für adaptive holographische Rekonstruktion 用于适应性全球重建的物理软件样式转让 2507.00482v1
  • 538 07-01 Posterior Inference in Latent Space for Scalable Constrained Black-box Optimization Hintere Inferenz im Latent Space für skalierbare, eingeschränkte Black-Box-Optimierung 缓空空间中可缩缩的黑盒优化 2507.00480v1
  • 539 07-01 Bisecle: Binding and Separation in Continual Learning for Video Language Understanding Bisecle: Bindung und Trennung im kontinuierlichen Lernen für das Verständnis von Videosprachen Bisecle:为了解视频语言而持续学习中的约束和分离 2507.00469v1
  • 540 07-01 Diversity Conscious Refined Random Forest Diversity Bewusst Verfeinerter Zufallswald 改良的随机森林 2507.00467v1
  • 541 07-01 Recurrent Memory-Augmented Transformers with Chunked Attention for Long-Context Language Modeling Recurrent Memory-Augmented Transformers mit Chunked Aufmerksamkeit für Lang-Kontext-Sprachenmodellierung 经常用内存增强型变换器,吸引人们注意长文语言建模 2507.00453v1
  • 542 07-01 Best Agent Identification for General Game Playing Best Agent Identification für das allgemeine Spiel 普通游戏游戏最佳代理标识 2507.00451v1
  • 543 07-01 Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention Überwinden von Langkontext-Grenzen von State-Space-Modellen über Kontext-Abhängige Sparse-Achtung 克服国家空间模型通过环境依赖性分散关注而克服国家空间模型的长文限制 2507.00449v1
  • 544 07-01 Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design iterative Destillation für reward-guided Feintuning von Diffusionsmodellen im biomolekularen Design 生物分子设计中扩散模型的奖励-引导精细推荐热蒸馏 2507.00445v1
  • 545 07-01 A Minimalist Method for Fine-tuning Text-to-Image Diffusion Models Minimalistische Methode zur Feinabstimmung von Text-zu-Bild-Diffusions-Modellen 微微调文本到图像传播模型的微量微调方法 2506.12036v3
  • 546 07-01 A Recipe for Causal Graph Regression: Confounding Effects Revisited Ein Rezept für kausale Graphenregression: Konfounding Effekte wiederaufgenommen Causal 图表回归的食谱: 重审的困扰效应 2507.00440v1
  • 547 07-01 Autonomy by Design: Preserving Human Autonomy in AI Decision-Support Autonomie nach Design: Erhaltung der menschlichen Autonomie in der KI-Entscheidungsunterstützung 设计自主:在AI决定支持中维护人类自主 2506.23952v2
  • 548 07-01 Both Asymptotic and Non-Asymptotic Convergence of Quasi-Hyperbolic Momentum using Increasing Batch Size Sowohl asymptotische als auch nicht-asymptotische Konvergenz von Quasi-Hyperbolic Momentum mit zunehmender Batchgröße 使用递增批量大小的 准-双曲调运动的无症状和非症状融合 2506.23544v2
  • 549 07-01 RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Radiology with Zero-Shot Multi-Task Capability RadZero: Ähnlichkeitsbasierte Cross-Attention für erklärbare Vision-Sprachenausrichtung in der Radiologie mit Zero-Shot-Multi-Task-Fähigkeit RadZero:在无热多任务能力的放射学中,对可解释的视觉-语言协调进行基于相似的交叉关注 2504.07416v2
  • 550 07-01 Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows Flexible Sprachmodellierung im kontinuierlichen Raum mit transformerbasierten autoregressiven Strömungen 具有以变换器为基础的自动递减流动的连续空间灵活语言建模 2507.00425v1
  • 551 07-01 Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning Find a Scapegoat: Vergiftung der Mitgliedschaft Inferenzangriff und Verteidigung zu Federated Learning 寻找一条“Scamegoat”:毒瘾成员攻击和防御联邦学习组织 2507.00423v1
  • 552 07-01 Diffusion Disambiguation Models for Partial Label Learning Diffusion Disambiguation Modelle für partielles Label-Lernen 部分标签学习的分解差异模型 2507.00411v1
  • 553 07-01 Parameter-Efficient Fine-Tuning via Circular Convolution Parameter-Effizient Feintuning über Kreiskonvolution 通过循环革命提高参数效率 2407.19342v4
  • 554 07-01 Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives Lehrende Zeitreihen zum Sehen und Sprechen: Vorhersagen mit gerichteten visuellen und textuellen Perspektiven 阅读和发言教学时间系列:以一致的视觉和文字视角预测 2506.24124v2
  • 555 07-01 GRAND: Graph Release with Assured Node Differential Privacy GRAND: Graph Release mit gesichertem Node Differential Privacy GRAND: 带有有保证的节点差异隐私的图表发布 2507.00402v1
  • 556 07-01 Few-shot Classification as Multi-instance Verification: Effective Backbone-agnostic Transfer across Domains Wenige Schuss-Klassifikation als Multi-Instance-Verifizierung: Effektive Backbone-agnostische Übertragung über Domains hinweg ” 多功能核查 “ 分类:跨域的有效后骨-不可知转让 2507.00401v1
  • 557 07-01 Two-Stage Regularization-Based Structured Pruning for LLMs Zweistufiges Regularisierungs-basierendes strukturiertes Pruning für LLMs LLMM 双级正规化和结构化 2505.18232v2
  • 558 07-01 PEAKS: Selecting Key Training Examples Incrementally via Prediction Error Anchored by Kernel Similarity PEAKS: Auswahl von Schlüsseltrainingsbeispielen inkrementell über Vorhersagefehler durch Kernel-Ähnlichkeit verankert EAKS:通过预测错误逐步选择关键培训实例 2504.05250v4
  • 559 07-01 HelixPipe: Efficient Distributed Training of Long Sequence Transformers with Attention Parallel Pipeline Parallelism HelixPipe: Effizientes Training von langen Sequenztransformatoren mit Aufmerksamkeit Paralleler Pipeline-Parallelismus HelixPipe:对长序列变异器进行有效分布式培训,注意平行管道平行平行平行 2507.00394v1
  • 560 07-01 Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds Auf dem Weg zu groß angelegtem In-Context-Verstärkungslernen durch Meta-Training in Randomized Worlds 通过在随机化世界中进行元培训进行大规模加强学习 2502.02869v2
  • 561 07-01 MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE MoNE: Redundante Experten mit leichten Novizen für strukturiertes Beschneiden von MoE ersetzen MONE:将冗余专家替换为教育部结构调节的轻量级新手 2507.00390v1
  • 562 07-01 BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference BlockDialekt: Blockweise feinkörnige Mischformat-Quantisierung für energieeffiziente LLM-Inferenz BlockDiaect: 节能LLM 推论的粗件精细混合格式量化 2501.01144v4
  • 563 07-01 SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning SPIRAL: Selbst-Spiel auf Null-Sum-Spiele Anreize zur Vernunft durch Multi-Agent Multi-Turn Verstärkungs-Lernen SPIRAL: 在零桑运动会上自玩 2506.24119v2
  • 564 07-01 Analogical Learning for Cross-Scenario Generalization: Framework and Application to Intelligent Localization Analogisches Lernen für Cross-Szenario-Verallgemeinerung: Rahmen und Anwendung auf intelligente Lokalisierung 跨设想普遍化的模拟学习:智能本地化的框架和应用 2504.08811v2
  • 565 07-01 A Good Start Matters: Enhancing Continual Learning with Data-Driven Weight Initialization Ein guter Start: Kontinuierliches Lernen mit datengetriebener Gewichtsinitialisierung verbessern 良好开端事项:加强持续学习,以数据驱动的重量初始化 2503.06385v2
  • 566 07-01 Neuro-Informed Joint Learning Enhances Cognitive Workload Decoding in Portable BCIs Neuro-informiertes gemeinsames Lernen verbessert Kognitive Workload-Dekodierung in tragbaren BCIs Neuro-内建联合学习促进可移动生物CI的认知工作量解码 2506.23458v2
  • 567 07-01 Ansatz-free Hamiltonian learning with Heisenberg-limited scaling Ansatzfreies Hamilton-Lernen mit Heisenberg-begrenzter Skalierung 无安萨兹的汉密尔顿学习 2502.11900v2
  • 568 07-01 Prediction of Freezing of Gait in Parkinsons Disease using Explainable AI and Federated Deep Learning for Wearable Sensors Vorhersage der Einfrierung von Gait bei Parkinson-Krankheit mit erklärbarer KI und Federated Deep Learning für tragbare Sensoren 利用可解释的AI和穿戴感应器的联邦深层学习,预测帕金ons疾病中盖特的冷冻情况 2507.01068v1
  • 569 07-01 CoCMT: Communication-Efficient Cross-Modal Transformer for Collaborative Perception CoCMT: Kommunikationseffizienter Cross-Modal Transformer für kollaborative Wahrnehmung COCMT: 用于协作认知的通信效率高的跨模式变换器 2503.13504v2
  • 570 07-01 Lifelong Learning of Video Diffusion Models From a Single Video Stream Lebenslanges Lernen von Video-Diffusions-Modellen aus einem einzigen Video-Stream 终生学习单一视频流的视频传播模型 2406.04814v3
  • 571 07-01 Neural Networks Generalize on Low Complexity Data Neurale Netzwerke verallgemeinern auf Daten mit geringer Komplexität 通用低复杂度数据神经网络 2409.12446v4
  • 572 07-01 Data-Driven Exploration for a Class of Continuous-Time Linear–Quadratic Reinforcement Learning Problems Daten-getriebene Exploration für eine Klasse von kontinuierlichen-Zeit-Linear–Quadratische Verstärkung Lernprobleme 连续-时线性-重压强化学习问题分类数据探索 2507.00358v1
  • 573 07-01 Transformers from Diffusion: A Unified Framework for Neural Message Passing Transformer von Diffusion: Ein einheitliches Framework für neurale Message Passing 来自传播的变换器:神经信息传递的统一框架 2409.09111v4
  • 574 07-01 Augmented Physics-Based Li-ion Battery Model via Adaptive Ensemble Sparse Learning and Conformal Prediction Erweitertes physikbasiertes Li-Ionen-Batteriemodell über Adaptives Ensemble Sparse Lernen und konforme Vorhersage 通过适应性共聚稀散学习和非正规预测,增强基于物理的利离电池模型 2507.00353v1
  • 575 07-01 Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs “Difficult” Downstream Tasks in LLMs Junk-DNA-Hypothese: Beschneiden kleiner vortrainierter Gewichte irreversibel und monoton beeinträchtigt “schwierige” Downstream-Aufgaben in LLMs 理顺低流中“困难”的“困难”下流任务(LLMS) 2310.02277v3
  • 576 07-01 Geological and Well prior assisted full waveform inversion using conditional diffusion models Geologische und gut vorher assistierte Vollwellenformumwandlung mit bedingten Diffusionsmodellen 使用有条件扩散模型,事先协助的地质和良好古学 2412.06959v2
  • 577 06-30 (1) Evaluation of a Foundational Model and Stochastic Models for Forecasting Sporadic or Spiky Production Outages of High-Performance Machine Learning Services Bewertung eines Basismodells und stochastische Modelle zur Vorhersage sporadischer oder würziger Produktionsausfälle hochleistungsfähiger Machine Learning Services 评价预测高性能机器学习服务零星或斯皮生产流出的基础模型和存储模型 2507.01067v1
  • 578 06-30 ETTA: Elucidating the Design Space of Text-to-Audio Models ETTA: Erklärung des Designraums von Text-zu-Audio-Modellen ETTA: 说明文本到模拟模型的设计空间 2412.19351v2
  • 579 06-30 Exploring Theory-Laden Observations in the Brain Basis of Emotional Experience Erforschung von Theorie-Laden-Beobachtungen in der Brain-Basis emotionaler Erfahrung 在情感经历的大脑基础中探索理论拉拉观察 2507.00320v1
  • 580 06-30 Stacked conformal prediction Gestapelte konforme Vorhersage 堆叠一致预测 2505.12578v2
  • 581 06-30 Open-ended Scientific Discovery via Bayesian Surprise Offene wissenschaftliche Entdeckung über Bayesian Surprise 通过贝叶斯惊喜的不限名额科学发现 2507.00310v1
  • 582 06-30 TabNSA: Native Sparse Attention for Efficient Tabular Data Learning TabNSA: Native Sparse Aufmerksamkeit für effizientes Tabular Data Learning TabNSA: 提高图表数据学习效率的本地分散关注 2503.09850v2
  • 583 06-30 MamNet: A Novel Hybrid Model for Time-Series Forecasting and Frequency Pattern Analysis in Network Traffic MamNet: Ein neuartiges Hybridmodell für die Zeitreihenprognose und Frequenzmusteranalyse im Netzverkehr MamNet:网络交通时间-系列预测和频率模式分析新混合模型 2507.00304v1
  • 584 06-30 Structure-preserving Lift & Learn: Scientific machine learning for nonlinear conservative partial differential equations Structure-preserving Lift & Learn: Wissenschaftliches maschinelles Lernen für nichtlineare konservative partielle Differentialgleichungen 结构保护提升和学习:非线性保守部分差异方程的科学机器学习 2507.00301v1
  • 585 06-30 Enhancing Interpretability in Generative Modeling: Statistically Disentangled Latent Spaces Guided by Generative Factors in Scientific Datasets Verbesserung der Verdolmetschbarkeit in der Generativen Modellierung: Statistisch entfremdete Latentenräume, geleitet von generativen Faktoren in wissenschaftlichen Datensätzen 加强产生模型的可解释性:科学数据集中产生因素所引导的统计分解的低端空间 2507.00298v1
  • 586 06-30 Empirical and computer-aided robustness analysis of long-step and accelerated methods in smooth convex optimization Empirische und computergestützte Robustheitsanalyse von langen und beschleunigten Methoden in reibungsloser konvexer Optimierung 经验分析和计算机辅助的稳健性分析,分析如何以长步骤和加速方法顺利优化顺流优化 2506.09730v3
  • 587 06-30 Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity Outlier Weigged Layerwise Sparsity (OWL): Eine fehlende Geheimsoße für das Beschneiden von LLMs auf hohe Sparsität 边边边边边边边的层间平分(OWL): 高平面的普鲁宁长程 2310.05175v4
  • 588 06-30 The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements Der Automatisierte LLM Speedrunning Benchmark: NanoGPT-Verbesserungen reproduzieren 自动LLM快速运行基准:复制纳米GPT改进 2506.22419v2
  • 589 06-30 Double Q-learning for Value-based Deep Reinforcement Learning, Revisited Doppeltes Q-Lernen für wertbasiertes Deep Enforcement Learning, Revisited 以价值为基础的深强化学习的双重学习 2507.00275v1
  • 590 06-30 Making a Pipeline Production-Ready: Challenges and Lessons Learned in the Healthcare Domain Herstellung einer Pipeline-Produktion: Herausforderungen und Lektionen im Bereich Healthcare 《管道生产-准备:保健领域的挑战和经验教训》 2506.06946v2
  • 591 06-30 Examining Reject Relations in Stimulus Equivalence Simulations Prüfung der Ablehnung von Beziehungen in Stimulus-Equivalenz-Simulationen 在刺激等效模拟中审查拒绝关系 2507.00265v1
  • 592 06-30 Room Scene Discovery and Grouping in Unstructured Vacation Rental Image Collections Raumszene Entdeckung und Gruppierung in unstrukturierten Ferienhäusern Bildersammlungen 在无结构的假期中发现和分组房间景点 2507.00263v1
  • 593 06-30 Bregman Centroid Guided Cross-Entropy Method Bregman Centroid geführte Cross-Entropie-Methode Bregman Central 制导跨计算机方法 2506.02205v2
  • 594 06-30 Disentangled Feature Importance Entwirrte Eigenschaft Bedeutung 分解的特性重要性 2507.00260v1
  • 595 06-30 Gym4ReaL: A Suite for Benchmarking Real-World Reinforcement Learning Gym4ReaL: Eine Suite für das Benchmarking von Real-World Verstärkungslernen Gym4ReaL:衡量现实世界加强学习的成套办法 2507.00257v1
  • 596 06-30 Llama-Nemotron: Efficient Reasoning Models Llama-Nemotron: Effiziente Denkmodelle Llama-Nepotron: 高效推理模型 2505.00949v4
  • 597 06-30 Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings Generative Modellierung von Voll-Atom-Protein-Konformationen mit Latent Diffusion auf Graph-Embeddings 利用在图形嵌入器上延迟扩散生成全原子蛋白质变形的生成模型 2506.17064v3
  • 598 06-30 Developing Lightweight DNN Models With Limited Data For Real-Time Sign Language Recognition Entwicklung leichter DNN-Modelle mit begrenzten Daten für Echtzeit-Sign Language-Erkennung 开发轻型DNN模型,具有实时手语识别的有限数据 2507.00248v1
  • 599 06-30 Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension Skalierung der Inferenz-Zeit-Suche mit Vision Value Model für verbesserte visuelle Wahrnehmung 增强视觉理解的视觉价值模型的增强推论-实时搜索 2412.03704v3
  • 600 06-30 An Unconditional Representation of the Conditional Score in Infinite-Dimensional Linear Inverse Problems Eine bedingungslose Darstellung des Bedingten Scores in unendlich-dimensionalen linearen Inversen-Problemen 无限、两极线性反问题中有条件评分的无条件代表性 2405.15643v3
  • 601 06-30 Causal Machine Learning in IoT-based Engineering Problems: A Tool Comparison in the Case of Household Energy Consumption Kausales maschinelles Lernen in IoT-basierten Engineering-Problemen: Ein Tool-Vergleich im Fall des Haushaltsenergieverbrauchs 以木工工程问题为基础的因果机械学习:家庭能源消费工具比较 2505.12147v3
  • 602 06-30 Interpretable AI for Time-Series: Multi-Model Heatmap Fusion with Global Attention and NLP-Generated Explanations Interpretierbare KI für die Time-Serie: Multi-Model Heatmap Fusion mit globaler Aufmerksamkeit und NLP-generierten Erklärungen 时间序列可解释的 AI:全球关注的多模型热图融合和NLP - 引人注意的解释 2507.00234v1
  • 603 06-30 A Graph-Based Classical and Quantum Approach to Deterministic L-System Inference Ein auf Graphen basierender klassischer und Quantumansatz zur deterministischen L-System-Inferenz 以图表为基础的确定性L-系统系统推断法的古学和量法 2411.19906v3
  • 604 06-30 PPFL-RDSN: Privacy-Preserving Federated Learning-based Residual Dense Spatial Networks for Encrypted Lossy Image Reconstruction PPFL-RDSN: Privacy-Preserving Federated Learning-based Residual Dense Spatial Networks for Encrypted Lossy Image Reconstruction PPFL-RDSN: 用于加密失传图像重建的基于隐私-保护联邦学习的残余常量空间网络 2507.00230v1
  • 605 06-30 Soft Dice Confidence: A Near-Optimal Confidence Estimator for Selective Prediction in Semantic Segmentation Weiche Würfel-Konfidenz: Ein nahezu optimaler Vertrauens-Schätzer für selektive Vorhersagen in semantischer Segmentierung Soft Dice 信任度: 精度分割选择性预测的近最佳信任模拟器 2402.10665v3
  • 606 06-30 Avoid Forgetting by Preserving Global Knowledge Gradients in Federated Learning with Non-IID Data Vermeiden Sie das Vergessen, indem Sie globale Wissensgradienten im Föderierten Lernen mit nicht-ID-Daten bewahren 避免在使用非二二二维数据进行联邦学习时因保留全球知识进步而被遗忘 2505.20485v3
  • 607 06-30 Holistic Artificial Intelligence in Medicine; improved performance and explainability Ganzheitliche Künstliche Intelligenz in der Medizin; verbesserte Leistung und Erklärbarkeit 医学领域综合人工医学知识;改进绩效和解释性 2507.00205v1
  • 608 06-30 Storing overlapping associative memories on latent manifolds in low-rank spiking networks Speicherung überlappender assoziativer Erinnerungen an latente Mannigfaltigkeiten in niederrankigen Spiking-Netzwerken 将重叠的关联记忆存储在低级喷水网络中潜伏的柱体上 2411.17485v2
  • 609 06-30 Embedding-based Retrieval in Multimodal Content Moderation Einbetten-basiertes Retrieval in multimodale Content Moderation 以嵌入为基础的多模式内容调控中回存 2507.01066v1
  • 610 06-30 Identifying the Truth of Global Model: A Generic Solution to Defend Against Byzantine and Backdoor Attacks in Federated Learning (full version) Die Wahrheit des globalen Modells identifizieren: Eine generische Lösung gegen byzantinische und Hintertürangriffe im Federated Learning (Vollversion) 查明全球模式真相:在联邦学习联盟中防范拜占庭和后门攻击的一般解决办法(全文) 2311.10248v3
  • 611 06-30 Intelligent Routing for Sparse Demand Forecasting: A Comparative Evaluation of Selection Strategies Intelligentes Routing für Sparse Demand Forecasting: Eine vergleichende Bewertung von Auswahlstrategien 用于粗略需求预测的智能路标:选择战略的比较评价 2506.14810v2
  • 612 06-30 What Makes Local Updates Effective: The Role of Data Heterogeneity and Smoothness Was lokale Updates effektiv macht: Die Rolle von Daten Heterogenität und Glätte 是什么使本地更新有效:数据多样化和平稳的作用 2507.00195v1
  • 613 06-30 RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression RocketKV: Beschleunigung der Langkontext-LLM-Inferenz über zweistufige KV-Cache-Kompression RocketKV: 通过两步KV缓存压缩加速长文本LLM推导 2502.14051v2
  • 614 06-30 Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions Jenseits von Sensordaten: Basismodelle von Verhaltensdaten von Wearables verbessern Gesundheitsprognosen 超越感官数据:从改进健康预测的易穿改进健康预测 2507.00191v1
  • 615 06-30 Text-to-Level Diffusion Models With Various Text Encoders for Super Mario Bros Text-zu-Level-Diffusionsmodelle mit verschiedenen Text-Encodern für Super Mario Bros 超级马里奥兄弟公司各种文本编码器的文本到级传播模型 2507.00184v1
  • 616 06-30 RLCAD: Reinforcement Learning Training Gym for Revolution Involved CAD Command Sequence Generation RLCAD: Verstärkungs-Lerntraining Gym für Revolution beteiligt CAD-Befehlssequenz-Generierung RELAD: 革命参与的CAD指挥序列生成者加强学习培训 2503.18549v2
  • 617 06-30 Generative Intervention Models for Causal Perturbation Modeling Generative Interventionsmodelle für die kausale Störungsmodellierung 造成助动模型模型的产生干预模型 2411.14003v2
  • 618 06-30 BlackBoxToBlueprint: Extracting Interpretable Logic from Legacy Systems using Reinforcement Learning and Counterfactual Analysis BlackBoxToBlueprint: Interpretierbare Logik aus Legacy-Systemen mit Verstärkungslernen und kontrafaktischer Analyse extrahieren BlackBoxto蓝图:利用强化学习和反事实分析从遗留系统中提取可解释逻辑 2507.00180v1
  • 619 06-30 Robust Representation Consistency Model via Contrastive Denoising Robustes Darstellungs-Konsistenz-Modell durch Kontrast-Denoisieren 通过反排斥做法形成强力代表性一致性模式 2501.13094v2
  • 620 06-30 Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime Dateneinheit verbessert Trainingseffizienz und mehr, mit einem Konvergenzrahmen jenseits des NTK-Regimes 数据统一提高培训效率,提高培训效率,提高培训效率,提高培训效率,除国家培训制度之外,采用统一框架 2506.24120v1
  • 621 06-30 Navigating with Annealing Guidance Scale in Diffusion Space Navigieren mit Annealing Guidance Scale im Diffusionsraum 与在扩散空间使用 AnnaALing 指导比例值导航 2506.24108v1
  • 622 06-30 Development of Hybrid Artificial Intelligence Training on Real and Synthetic Data: Benchmark on Two Mixed Training Strategies Entwicklung von hybriden Künstlichen Intelligenz-Schulungen zu realen und synthetischen Daten: Benchmark für zwei gemischte Trainingsstrategien 发展关于真实数据与合成数据的混合人文情报培训:两项混合培训战略的基准 2506.24093v1
  • 623 06-30 SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs? SEUF: Reicht es für LLMs für Mixture-of-Experts aus, einen Experten zu lernen? SEUF:不学习一位专家是否足以使混合专家LLM公司受益? 2411.18797v2
  • 624 06-30 SQUASH: A SWAP-Based Quantum Attack to Sabotage Hybrid Quantum Neural Networks SQUASH: Ein SWAP-basierter Quantenangriff zur Sabotage hybrider Quantum-Neural-Netzwerke 以全部门办法为基础的对破坏混合量子神经网络的量子攻击 2506.24081v1
  • 625 06-30 Optimization, Isoperimetric Inequalities, and Sampling via Lyapunov Potentials Optimierung, isoperimetrische Ungleichheiten und Probenahme über Lyapunov-Potenziale 优化、同位素不平等和通过Lyapunov潜力取样 2410.02979v4
  • 626 06-30 What Makes Treatment Effects Identifiable? Characterizations and Estimators Beyond Unconfoundedness Was macht Behandlungseffekte identifizierbar? Charakterisierungen und Stimulatoren jenseits von Unbegründetheit 是什么使治疗效果可以识别?特征和估计因素超越无根据 2506.04194v2
  • 627 06-30 Value-Compressed Sparse Column (VCSC): Sparse Matrix Storage for Redundant Data Wertkomprimierte Sparse-Spalte (VCSC): Sparse Matrix Storage für redundante Daten 压压式稀释列(VCSC):为冗余数据储存散式矩阵 2309.04355v2
  • 628 06-30 Logit-Gap Steering: Efficient Short-Suffix Jailbreaks for Aligned Large Language Models Logit-Gap Steering: Effiziente Short-Suffix Jailbreaks für ausgerichtete große Sprachmodelle Lologit-Gap 指导:通用大语言模型的高效短后休息室 2506.24056v1
  • 629 06-30 Consensus-based optimization for closed-box adversarial attacks and a connection to evolution strategies Konsensbasierte Optimierung für Closed-Box-Adversarial-Angriffe und eine Verbindung zu Evolutionsstrategien 以共识为基础优化对封闭箱对抗性攻击的优化和与演变战略的联系 2506.24048v1
  • 630 06-30 Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC Agent.xpu: Effiziente Planung von Agentic LLM Workloads auf heterogenen SoC Agent.xpu: 高效地安排对异基因 soC 的Agentic LLM 工作负荷 2506.24045v1
  • 631 06-30 Faster Diffusion Models via Higher-Order Approximation Schnellere Diffusionsmodelle über höhere Rangannäherung 通过高分近端传送快速扩散模型 2506.24042v1
  • 632 06-30 Unsupervised Sparse Coding-based Spiking Neural Network for Real-time Spike Sorting Unüberwachte Sparse Coding-basierte Spiking Neural Network für Echtzeit Spike Sortierung 用于实时 Spiking Spiking Spiking Spiking Spiking Sypt Scorting 的无监督的 Sparse 编码基础 Spiking 神经网络 2506.24041v1
  • 633 06-30 Refined climatologies of future precipitation over High Mountain Asia using probabilistic ensemble learning Verfeinerte Klimatologien der zukünftigen Niederschlag über High Mountain Asia mit probabilistic ensemble learning 利用概率共变学习,改进高山区亚洲今后降水的气候学 2501.15690v3
  • 634 06-30 Experimenting, Fast and Slow: Bayesian Optimization of Long-term Outcomes with Online Experiments Experimentieren, schnell und langsam: Bayesische Optimierung langfristiger Ergebnisse mit Online-Experimenten 实验、快速和缓慢:利用在线实验优化长期成果 2506.18744v2
  • 635 06-30 Position: Machine Learning Conferences Should Establish a “Refutations and Critiques” Track Position: Machine Learning Konferenzen sollten einen “Refutations and Critiques” Track erstellen 职位:机器学习会议应建立“反驳和批评”轨道 2506.19882v2
  • 636 06-30 Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation Befreien Sie mich damit! Stealthy Mitgliedschaft Inferenz für Retrieval-Augmented Generation 中我这个! 偷盗会员身份的回溯性 被支持的一代人的推论 2502.00306v2
  • 637 06-30 Post-processing of EEG-based Auditory Attention Decoding Decisions via Hidden Markov Models Nachbearbeitung von EEG-basierten Auditory-Achtungsdekodierungsentscheidungen über Hidden Markov-Modelle 通过隐藏Markov模式处理基于EEG的审计员注意后解码决定 2506.24024v1
  • 638 06-30 Semiparametric Double Reinforcement Learning with Applications to Long-Term Causal Inference Semiparametrisches Doppel-Verstärkungs-Lernen mit Anwendungen zur langfristigen Kausalität 半参数双重强化学习,应用到长期因果关系推断 2501.06926v3
  • 639 06-30 Bridging Theory and Practice in Link Representation with Graph Neural Networks Überbrückung von Theorie und Praxis in der Linkdarstellung mit Graph Neural Networks 将代表性与图表神经网络联系起来的架桥理论和实践 2506.24018v1
  • 640 06-30 Connecting phases of matter to the flatness of the loss landscape in analog variational quantum algorithms Verbinden von Phasen der Materie mit der Ebenheit der Verlustlandschaft in analogen Variations-Quantenalgorithmen 在模拟变异量算法中将物质阶段与损失平整情况相连接 2506.13865v2
  • 641 06-30 Graph Neural Networks in Wind Power Forecasting Schaubild Neurale Netze in der Windkraftprognose 风力预报中的神经网络图 2507.00105v1
  • 642 06-30 Pixel super-resolved virtual staining of label-free tissue using diffusion models Pixel super gelöste virtuelle Färbung von etikettfreiem Gewebe mit Diffusionsmodellen 使用扩散模型对无标签组织使用无标签组织进行像素超解解超像素的虚拟涂污 2410.20073v2
  • 643 06-30 Minimax and Bayes Optimal Best-arm Identification: Adaptive Experimental Design for Treatment Choice Minimax und Buchten Optimale Best-Arm-Identifikation: Adaptives Experimentaldesign für die Behandlungswahl Minimax和Bayes 最佳最佳武器识别:治疗选择的适应性实验设计 2506.24007v1
  • 644 06-30 Provably Efficient and Agile Randomized Q-Learning Wahrscheinlich effizient und agil Randomized Q-Learning 高效率和简易随机化的学习 Q-学习 2506.24005v1
  • 645 06-30 The Illusion of Progress? A Critical Look at Test-Time Adaptation for Vision-Language Models Die Illusion des Fortschritts? Ein kritischer Blick auf Test-Zeit-Anpassung für Vision-Sprache-Modelle 《进步的幻影? 对视觉语言模型的试验时间适应的批判性审视》 2506.24000v1
  • 646 06-30 Z-REx: Human-Interpretable GNN Explanations for Real Estate Recommendations Z-REx: Human-Interpretable GNN Erklärungen für Immobilien-Empfehlungen Z-REx:对不动产建议的人-可解释的GNN(GNN)解释 2503.18001v2
  • 647 06-30 TTRL: Test-Time Reinforcement Learning TTRL: Test-Zeit-Verstärkungs-Lernen TTRL: 试验时间强化学习 2504.16084v3
  • 648 06-30 DGSAM: Domain Generalization via Individual Sharpness-Aware Minimization DGSAM: Domänenverallgemeinerung durch individuelle Schärfen-Bewusst-Minimierung DGSAM:通过个人锐化-最小化 2503.23430v2
  • 649 06-30 The Jacobian and Hessian of the Kullback-Leibler Divergence between Multivariate Gaussian Distributions (Technical Report) Jacobian und Hessian der Kullback-Leiibler Divergenz zwischen multivariaten Gaussian Verteilungen (Technischer Bericht) 《多种变数高斯发行(技术报告)之间的 库尔贝克-利比尔差异(技术报告)》的Jacobian和Hessian 2506.23996v1
  • 650 06-30 ROCK: A variational formulation for occupation kernel methods in Reproducing Kernel Hilbert Spaces ROCK: Eine Variationsformulierung für Okkupationskern-Methoden in der Reproduzierung von Kernel Hilbert Spaces ROCK: 复制凯尔内尔·希尔伯特航天公司职业内核方法的变式配方 2503.13791v2
  • 651 06-30 Machine Understanding of Scientific Language Maschinelles Verständnis der wissenschaftlichen Sprache 科学语言机器理解 2506.23990v1
  • 652 06-30 Table Foundation Models: on knowledge pre-training for tabular learning Tabelle Grundlagenmodelle: Wissensvorausbildung für das tabellarische Lernen 表 基础基础表 模式:关于以表格形式学习的知识预培训模式 2505.14415v2
  • 653 06-30 LLM Agents Are the Antidote to Walled Gardens LLM-Agenten sind das Gegenmittel zu ummauerten Gärten LLM 药剂是被围墙隔绝的花园的抗药剂 2506.23978v1
  • 654 06-30 A Scalable Approach for Safe and Robust Learning via Lipschitz-Constrained Networks Ein skalierbarer Ansatz für sicheres und robustes Lernen über Lipschitz-Kontrainierte Netzwerke 通过Lipschitz-Consered 网络推广安全和有力的学习方法 2506.23977v1
  • 655 06-30 SP$^2$OT: Semantic-Regularized Progressive Partial Optimal Transport for Imbalanced Clustering SP$^2$OT: Semantisch-regularisierter Progressiver Partieller Optimaler Transport für unausgewogenes Clustering SP$2$OT: 用于平衡集束的语义-分解渐进式最佳部分运输 2404.03446v2
  • 656 06-30 UMA: A Family of Universal Models for Atoms UMA: Eine Familie universeller Modelle für Atome UMA: “ 原子世界模型大家庭 “ 2506.23971v1
  • 657 06-30 Learning Constraints Directly from Network Data Lernbeschränkungen direkt aus Netzwerkdaten 直接来自网络数据的学习限制 2506.23964v1
  • 658 06-30 ADReFT: Adaptive Decision Repair for Safe Autonomous Driving via Reinforcement Fine-Tuning ADReFT: Adaptive Entscheidungsreparatur für sicheres autonomes Fahren durch Verstärkung Feintuning ADREFT: 安全自主驾驶的适应性决定修补 2506.23960v1
  • 659 06-30 Intelligent Orchestration of Distributed Large Foundation Model Inference at the Edge Intelligente Orchestrierung der verteilten Large Foundation Model Inferenz am Rande 分散在边缘的大基金会模型推断 2504.03668v2
  • 660 06-30 Bridging the Gap with Retrieval-Augmented Generation: Making Prosthetic Device User Manuals Available in Marginalised Languages Bridging the Gap with Retrieval-Augmented Generation: Prothetische Geräte-Handbücher in marginalisierten Sprachen verfügbar machen 缩小与回收-启动一代人的差距:在边际语言中提供假肢设备用户手册 2506.23958v1
  • 661 06-30 Scaling and renormalization in high-dimensional regression Skalierung und Renormalisierung bei hochdimensionaler Regression 高维回归中的缩放和重新整顿 2405.00592v4
  • 662 06-30 QPART: Adaptive Model Quantization and Dynamic Workload Balancing for Accuracy-aware Edge Inference QPART: Adaptive Modell-Quantisierung und dynamische Workload-Balancing für akkurat-bewusste Edge-Inferenz QPART: 适应性模型量化和动态工作量平衡,以利准确度认知边缘推断 2506.23934v1
  • 663 06-30 Industrial brain: a human-like autonomous neuro-symbolic cognitive decision-making system Industrielles Gehirn: ein menschenähnliches autonomes neuro-symbolisches kognitives Entscheidungssystem 工业大脑:像人一样的自主性神经 – – 精神 – – 精神 – – 左侧神经 – – 认知决策系统 2506.23926v1
  • 664 06-30 Reinforcement Learning for Synchronised Flow Control in a Dual-Gate Resin Infusion System Verstärktes Lernen zur Synchronisierung der Durchflussregelung in einem Dual-Gate-Resin-Infusionssystem 在双气体Resin Incent Incent System 中进行强化学习,以进行同步流动控制 2506.23923v1
  • 665 06-30 The Trilemma of Truth in Large Language Models Das Trilemma der Wahrheit in großen Sprachmodellen 大语言模型中的真理三边 2506.23921v1
  • 666 06-30 Intrinsic Dimensionality of Fermi-Pasta-Ulam-Tsingou High-Dimensional Trajectories Through Manifold Learning: A Linear Approach Intrinsische Dimensionalität von Fermi-Pasta-Ulam-Tsingou Hochdimensionale Trajektorien durch manifoldes Lernen: Ein linearer Ansatz Fermi-Pasta-Ulam-Tsingou 高多元轨迹的内在维度 2411.02058v2
  • 667 06-30 Learning robust parameter inference and density reconstruction in flyer plate impact experiments Erlernen von robusten Parameterinferenzen und Dichterekonstruktionen in Flyer-Platteneinschlagsexperimenten 飞行板块撞击实验中学习强力参数推力和密度重建 2506.23914v1
  • 668 06-30 RawMal-TF: Raw Malware Dataset Labeled by Type and Family RawMal-TF: Raw Malware Datensatz nach Typ und Familie gekennzeichnet RawMal- TF: Raw Maware 数据集按类型和家庭标签 2506.23909v1
  • 669 06-30 Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence Beyond Statistical Learning: Exaktes Lernen ist für die allgemeine Intelligenz unerlässlich 超越统计学习:精确学习对于一般情报至关重要 2506.23908v1
  • 670 06-30 Simultaneous Multi-Robot Motion Planning with Projected Diffusion Models Gleichzeitige Multi-Roboter-Bewegungsplanung mit projizierten Diffusionsmodellen 与预测的传播模型同步的多机器人运动规划 2502.03607v2
  • 671 06-30 Green AI in Action: Strategic Model Selection for Ensembles in Production Grüne KI in Aktion: Strategische Modellauswahl für Ensembles in der Produktion 绿色AI “ 行动 “ :生产集合战略示范选择 2405.17451v2
  • 672 06-30 CBAGAN-RRT: Convolutional Block Attention Generative Adversarial Network for Sampling-Based Path Planning CBAGAN-RRT: Convolutional Block Attention Generatives Adversarial Network für die stichprobengestützte Pfadplanung CBAGAN-RRT: 以抽样为基础的路径规划革命性阻力引引引反向网络 2305.10442v2
  • 673 06-30 Towards transparent and data-driven fault detection in manufacturing: A case study on univariate, discrete time series Auf dem Weg zur transparenten und datengesteuerten Fehlererkennung in der Fertigung: Eine Fallstudie zu univariaten, diskreten Zeitreihen 争取在制造业中发现透明和数据驱动的故障:关于单体、离散时间序列的案例研究 2507.00102v1
  • 674 06-30 Spurious-Aware Prototype Refinement for Reliable Out-of-Distribution Detection Spurious-Aware Prototypen-Verfeinerung für zuverlässige Out-of-Distribution-Detection 可靠分销外探测的精密原始类型精炼 2506.23881v1
  • 675 06-30 DFReg: A Physics-Inspired Framework for Global Weight Distribution Regularization in Neural Networks DFReg: Ein physik-inspirierter Rahmen für die weltweite Regularisierung der Gewichtsverteilung in neuralen Netzwerken DFReg:神经网络全球重量分配规范化的物理启发框架 2507.00101v1
  • 676 06-30 Benchmarking Spiking Neural Network Learning Methods with Varying Locality Benchmarking-Spiking Neural Network Lernmethoden mit unterschiedlicher Lokalität 参照不同地点的神经网络学习方法 2402.01782v2
  • 677 06-30 Chain of Thought in Order: Discovering Learning-Friendly Orders for Arithmetic Kette des Denkens in Ordnung: Lernen-Freundliche Aufträge für Arithmetik entdecken 思维链有序:发现对自学的学习友好命令 2506.23875v1
  • 678 06-30 WeatherEdit: Controllable Weather Editing with 4D Gaussian Field WeatherEdit: Kontrollierbare Wetterbearbeitung mit 4D Gaussian Field 气象编辑: 4D Gaussian 字段的可控天气编辑 2505.20471v2
  • 679 06-30 Emergent musical properties of a transformer under contrastive self-supervised learning Emergente musikalische Eigenschaften eines Transformators unter kontrastivem selbstüberwachtem Lernen 在对比式自我监督的学习下,变压器的新兴音乐特性 2506.23873v1
  • 680 06-30 When Plants Respond: Electrophysiology and Machine Learning for Green Monitoring Systems Wenn Pflanzen reagieren: Elektrophysiologie und maschinelles Lernen für grüne Überwachungssysteme 植物响应:电生理和机械学习促进绿色监测系统 2506.23872v1
  • 681 06-30 Scaling Self-Supervised Representation Learning for Symbolic Piano Performance Skalierendes Selbstüberwachtes Repräsentationslernen für symbolische Klavierperformance 象征钢琴表演自我监督代表学习 2506.23869v1
  • 682 06-30 Differentially Private Synthetic Data Release for Topics API Outputs Differential private Synthetische Datenfreigabe für Themen API-Ausgaben 不同专题APIP产出的私人不同合成数据发布 2506.23855v1
  • 683 06-30 Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts Verwenden Sie Sparse Autoencoder, um unbekannte Konzepte zu entdecken, nicht um auf bekannte Konzepte zu handeln 使用粗略自动编码器发现未知概念, 而不是对已知概念采取行动 2506.23845v1
  • 684 06-30 Pressing Intensity: An Intuitive Measure for Pressing in Soccer Pressing Intensity: Eine intuitive Maßnahme für das Drücken im Fußball 压力强度:足球压力的直觉措施 2501.04712v2
  • 685 06-30 EFPI: Elastic Formation and Position Identification in Football (Soccer) using Template Matching and Linear Assignment EFPI: Elastische Formation und Positionsidentifikation im Fußball (Soccer) unter Verwendung von Template Matching und Linear Assignment EFPI: 使用模板匹配和线性指派的足球(足球)弹性形成和定位识别 2506.23843v1
  • 686 06-30 Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction Nachweis der begrenzten Skalierbarkeit der zentralisierten verteilten Optimierung durch eine neue untere Bound-Konstruktion 证明通过新建下下界建筑的集中分配最佳优化的有限可扩展性 2506.23836v1
  • 687 06-30 Robustness of Decentralised Learning to Nodes and Data Disruption Robustheit des dezentralisierten Lernens zu Knoten und Datenstörung 分散学习至节点和数据干扰的强力 2405.02377v2
  • 688 06-30 Hybrid Quantum Neural Networks with Amplitude Encoding: Advancing Recovery Rate Predictions Hybride Quantum-Neural-Netzwerke mit Amplitudenkodierung: Steigerung der Erholungsrate Vorhersagen 具有大度编码的硬化量子神经网络:提高恢复率预测 2501.15828v5
  • 689 06-30 Supercm: Revisiting Clustering for Semi-Supervised Learning Supercm: Clustering für semi-überwachtes Lernen revisitieren 超强cm:为半监督学习重新研究集群 2506.23824v1
  • 690 06-30 Early-Stage Anomaly Detection: A Study of Model Performance on Complete vs. Partial Flows Early-Stage-Anomaly Detection: Eine Studie der Modellleistung auf komplette vs. Teilströmungen 早期短期异常检测:关于完整与部分流动的示范绩效研究 2407.02856v3
  • 691 06-30 Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective Weltmodelle lernen mit hierarchischen zeitlichen Abstraktionen: Eine probabilistische Perspektive 具有等级性时地抽象现象的学习世界模式:概率视角 2404.16078v3
  • 692 06-30 HyperMono: A Monotonicity-aware Approach to Hyper-Relational Knowledge Representation HyperMono: Ein Monotonik-bewusster Ansatz zur Hyperrelationalen Wissensdarstellung HyperMono:超关系知识代表性的单感意识方法 2404.09848v3
  • 693 06-30 SGD with Adaptive Preconditioning: Unified Analysis and Momentum Acceleration SGD mit adaptiver Vorkonditionierung: Unified Analysis und Momentum Acceleration SGD 具有适应性预设条件:统一分析和加速动力 2506.23803v1
  • 694 06-30 Adaptive Out-of-Control Point Pattern Detection in Sequential Random Finite Set Observations Adaptive Out-of-Control-Punkt-Mustererkennung in sequentiellen Random-Finite-Set-Beobachtungen 序列随机有限性观察中的适应性外控制点探测模式 2506.23802v1
  • 695 06-30 Gibbs randomness-compression proposition: An efficient deep learning Gibbs Randomness-Kompressionssatz: Ein effizientes Deep Learning Gibbs 随机性压缩建议:高效的深层学习 2505.23869v2
  • 696 06-30 NeuralOM: Neural Ocean Model for Subseasonal-to-Seasonal Simulation NeuralOM: Neurales Ozeanmodell für die Simulation von Subsaisonal-zu-Seasonal 神经力OM:次季节到季节模拟神经海洋模型 2505.21020v2
  • 697 06-30 Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning Erlernbare Multi-Agent-Pathfinding-Lösemittel mit aktiver Feinsteuerung 具有积极微调功能的推进可学习多机构探索式解答器 2506.23793v1
  • 698 06-30 Quantum computing and artificial intelligence: status and perspectives Quantenrechentechnik und künstliche Intelligenz: Status und Perspektiven 量子计算和人工智能:现状和前景 2505.23860v3
  • 699 06-30 When GNNs Met a Word Equations Solver: Learning to Rank Equations (Extended Technical Report) Wenn GNNs einen Wortgleichungen Solver trafen: Lernen zu Ranggleichungen (erweiterter technischer Bericht) 当 GNNs 遇到一个单等式解答器: 学习到等级( 扩展技术报告) 2506.23784v1
  • 700 06-30 Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking Mamba-FETrack V2: Zustands-Weltraummodell für Frame-Event-basiertes Visuelles Objekt-Tracking Mamba-FETRack V2:重新研究基于框架日的视觉物体跟踪国家空间模型 2506.23783v1
  • 701 06-30 Calibrating Graph Neural Networks with Wavelet-Aware Temperature Scaling Kalibrierung von Graphen-Neuralnetzwerken mit Wavelet-Aware-Temperaturskalierung 使用波盘软件温度缩放校准图形神经网络 2506.23782v1
  • 702 06-30 Towards Automated Self-Supervised Learning for Truly Unsupervised Graph Anomaly Detection Auf dem Weg zu einem automatisierten selbstüberwachten Lernen für wirklich unbeaufsichtigte Graphenanomalienerkennung 走向为真正无人监督的异常图像探测进行自动自动自监督学习 2501.14694v2
  • 703 06-30 Addressing the Inconsistency in Bayesian Deep Learning via Generalized Laplace Approximation Die Inkonsistenz in Bayesian Deep Learning über die allgemeine Laplace-Annäherung ansprechen 解决通过普遍拉拉近拉拉位办法解决巴耶斯深层学习不一致性问题 2405.13535v4
  • 704 06-30 Model-driven Stochastic Trace Clustering Modellgetriebenes Stochastisches Trace Clustering 由模型驱动的蒸蒸气追踪集束 2506.23776v1
  • 705 06-30 A Consequentialist Critique of Binary Classification Evaluation Practices Eine konsequentistische Kritik an binären Klassifikationsbewertungspraktiken 二元分类评价做法的相继性优异性评分评价做法 2504.04528v2
  • 706 06-30 Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph Benchmarking Uncertainty Quantification Methods for Large Language Models mit LM-Polygraph 与LM-Porgraph 参照大语言模型的不确定性量化方法 2406.15627v4
  • 707 06-30 Training of Spiking Neural Networks with Expectation-Propagation Ausbildung von neuralen Spiking-Netzwerken mit Erwartungs-Propagation 培训有期望-促进的Spiking神经网络 2506.23757v1
  • 708 06-30 Aligning Evaluation with Clinical Priorities: Calibration, Label Shift, and Error Costs Bewertung mit klinischen Prioritäten ausrichten: Kalibrierung, Label Shift und Fehlerkosten 使评价与临床优先事项:校准、标签转换和错误成本相一致 2506.14540v3
  • 709 06-30 Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games Visuelle Encoder für dateneffizientes Imitationslernen in modernen Videospielen 现代视频游戏中数据高效模拟学习视觉编码器 2312.02312v3
  • 710 06-30 Value-Free Policy Optimization via Reward Partitioning Wertfreie Politikoptimierung über Belohnungspartitionierung 无价值政策优化,通过评分分分股 2506.13702v2
  • 711 06-30 Recovering Imbalanced Clusters via Gradient-Based Projection Pursuit Wiederherstellung unausgewogener Cluster über gradient-based Projektionsverfolgung 通过基于渐进的预测追求恢复平衡的群集 2502.02668v2
  • 712 06-30 Online model learning with data-assimilated reservoir computers Online-Modelllernen mit datengestützten Reservoir-Computern 利用数据扩散储油库计算机进行在线示范学习 2504.16767v2
  • 713 06-30 Refine-POI: Reinforcement Fine-Tuned Large Language Models for Next Point-of-Interest Recommendation Verfeinerungs-POI: Verstärkte Feinabstimmungen für große Sprachmodelle für die nächste Point-of-Interest-Empfehlung 参考-POI:为下一个利得点建议加强精微推荐的大型语言模式 2506.21599v2
  • 714 06-30 Radioactive Watermarks in Diffusion and Autoregressive Image Generative Models Radioaktive Wasserzeichen in Diffusion und autoregressiven Bildgenerativen Modellen 扩散和自动递减图像生成模型中放射性水标记 2506.23731v1
  • 715 06-30 System-Embedded Diffusion Bridge Models Modelle der System-Eingebetteten Diffusionsbrücke 系统嵌入式扩散桥模型 2506.23726v1
  • 716 06-30 Deep Learning-Based Semantic Segmentation for Real-Time Kidney Imaging and Measurements with Augmented Reality-Assisted Ultrasound Deep Learning-based Semantic Segmentation für Echtzeit-Kidney-Imaging und Messungen mit Augmented Reality-Assisted Ultrasound 实时肾脏成像和计量增强现实辅助超声波 2506.23721v1
  • 717 06-30 DABstep: Data Agent Benchmark for Multi-step Reasoning DABstep: Data Agent Benchmark für mehrstufige Vernunft DABstep:多步理由数据代理基准 2506.23719v1
  • 718 06-30 Towards Efficient and Accurate Spiking Neural Networks via Adaptive Bit Allocation Auf dem Weg zu effizienten und präzisen Spiking Neuronal Networks über adaptive Bit Allocation 通过适应性比位分配实现高效和准确的Spiking神经网络 2506.23717v1
  • 719 06-30 Rethinking Aleatoric and Epistemic Uncertainty Rethinking Aleatoric und Epistemic Uncertainty 重新思考疏离和分解不确定性 2412.20892v2
  • 720 06-30 SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration SmileyLlama: Ändern von großen Sprachmodellen für gerichtete chemische Weltraumforschung SmileyLlama:修改用于定向化学空间探索的大型语言模型 2409.02231v3
  • 721 06-30 Methodology for an Analysis of Influencing Factors on 3D Object Detection Performance Methodik für eine Analyse von Einflussfaktoren auf die 3D-Objekterkennungsleistung 3D物体探测性能影响因素分析方法 2411.08482v3
  • 722 06-30 Error Optimization: Overcoming Exponential Signal Decay in Deep Predictive Coding Networks Fehler-Optimierung: Überwindung exponentieller Signaldekay in tiefen vorausschauenden Codierungsnetzwerken 错误 优化 : 克服深预报编码网络中的指数信号衰减 2505.20137v2
  • 723 06-30 Learning Modular Exponentiation with Transformers Modulare Exponentiation mit Transformern lernen 与变换器一起学习模块化指数 2506.23679v1
  • 724 06-30 Thinking About Thinking: SAGE-nano’s Inverse Reasoning for Self-Aware Language Models Denken über das Denken: SAGE-nano’s Inverse Reasoning for Self-Aware Language Models 思考思考:SAGE-nano 自我意识语言模型的反向理由 2507.00092v1
  • 725 06-30 Sparsing Law: Towards Large Language Models with Greater Activation Sparsity Sparsing Law: Auf dem Weg zu großen Sprachmodellen mit größerer Aktivierungssparsität 评分法:走向大语言模式,具有更大的激活率平等性 2411.02335v4
  • 726 06-30 Generating Heterogeneous Multi-dimensional Data : A Comparative Study Heterogene mehrdimensionale Daten generieren: Eine vergleichende Studie 生成异质多维数据:比较研究 2507.00090v1
  • 727 06-30 On the Domain Robustness of Contrastive Vision-Language Models Auf dem Gebiet Robustheit von kontrastiven Vision-Sprachen-Modellen 关于不同愿景-语言模型域的强力 2506.23663v1
  • 728 06-30 Modular Distributed Nonconvex Learning with Error Feedback Modulares verteiltes Nonconvex-Lernen mit Fehler-Feedback 含有错误反馈的模块分布式非 Confonvex 学习 2503.14055v2
  • 729 06-30 Robust LLM Unlearning with MUDMAN: Meta-Unlearning with Disruption Masking And Normalization Robustes LLM-Unlearning mit MUDMAN: Meta-Unlearning mit Disruptionsmasken und Normalisierung 与 MUDMAN 一起重新学习: 以干扰蒙蔽和正常化的方式重新学习 2506.12484v3
  • 730 06-30 A new machine learning framework for occupational accidents forecasting with safety inspections integration Ein neuer Rahmen für maschinelles Lernen bei Arbeitsunfällen, die mit der Integration von Sicherheitsinspektionen prognostiziert werden 新的工伤事故预测机械学习新框架,安全检查一体化 2507.00089v1
  • 731 06-30 GL-LowPopArt: A Nearly Instance-Wise Minimax-Optimal Estimator for Generalized Low-Rank Trace Regression GL-LowPopArt: Ein nahezu instance-weiser Minimax-Optimal-Estimator für generalisierte Low-Rank-Trace-Regression GL-LowPopArt: 通用低射线回归近似实例- Wisis Minimax- 最佳最佳模拟器 2506.03074v4
  • 732 06-30 Geminet: Learning the Duality-based Iterative Process for Lightweight Traffic Engineering in Changing Topologies Geminet: Lernen des duality-basierten Iterativprozesses für Leichtbau in wechselnden Topologien Geminet:学习基于质量的循环程序,以在变化的地形学中轻量级交通工程 2506.23640v1
  • 733 06-30 A Nonlinear Low-rank Representation Model with Convolutional Neural Network for Imputing Water Quality Data Ein nichtlineares Low-Rank-Darstellungsmodell mit konvolutionärem Neuralnetzwerk zur Imputierung von Wasserqualitätsdaten 非线性低级别代表模式,与计算水质数据革命神经网络 2506.23629v1
  • 734 06-30 Brain Tumor Detection through Thermal Imaging and MobileNET Gehirntumorerkennung durch Thermische Bildgebung und MobileNET 通过热成像和移动网检测脑肿瘤 2506.23627v1
  • 735 06-30 Evaluating K-Fold Cross Validation for Transformer Based Symbolic Regression Models Bewertung der K-Fold Cross-Validierung für Transformer-basierte symbolische Regressionsmodelle 评估基于变换器的符号回归模型的 K- Fold 交叉验证 2410.21896v2
  • 736 06-30 pUniFind: a unified large pre-trained deep learning model pushing the limit of mass spectra interpretation pUniFind: ein einheitliches großes vortrainiertes Deep-Learning-Modell, das die Grenze der Massenspektreninterpretation sprengt pUniFind:一个统一的大型预先培训的深层学习模式,推进质量光谱解释的极限 2507.00087v1
  • 737 06-30 Overparametrized models with posterior drift Überparametrisierte Modelle mit posteriorer Drift 带有后游漂流的过度平衡模型 2506.23619v1
  • 738 06-30 CPT: Competence-progressive Training Strategy for Few-shot Node Classification CPT: Kompetenz-fortschrittliche Ausbildungsstrategie für wenige Knotenklassifikation CPT: 微粒节点分类能力渐进培训战略 2402.00450v5
  • 739 06-30 TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks TinyAlign: Lightweight Vision-Language Models durch Abmilderung von Modal Alignment Engpässe fördern 微量对称:通过减缓模式调整瓶头,促进轻轻量度视觉-语言模型 2505.12884v2
  • 740 06-30 ChemMiner: A Large Language Model Agent System for Chemical Literature Data Mining ChemMiner: Ein großes Sprachmodellsystem für chemische Literatur Data Mining Chemminer:化学文献数据采矿大语言示范剂系统 2402.12993v2
  • 741 06-30 Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com Anspruchsvolle Gradient Boosted Entscheidung Bäume mit Tabular Transformers für Betrug Detection bei Booking.com 与制表变换器一起挑战逐步推动决策树,以便在订书时发现欺诈。 com 2405.13692v2
  • 742 06-30 When Will It Fail?: Anomaly to Prompt for Forecasting Future Anomalies in Time Series Wann wird es fehlschlagen?: Anomalie zur Prognose zukünftiger Anomalien in der Zeitreihe 何时会失败? :在《时间序列》中对未来异常现象作出预测时不时地迅速作出预测。 2506.23596v1
  • 743 06-30 Transition Matching: Scalable and Flexible Generative Modeling Transition Matching: Skalierbare und flexible Generative Modellierung 过渡匹配:可缩放和灵活生成模型 2506.23589v1
  • 744 06-30 Detect \& Score: Privacy-Preserving Misbehaviour Detection and Contribution Evaluation in Federated Learning Detect \& Score: Privacy-Preserving Misbehaviour Detection and Contribution Evaluation in Federated Learning * 评分:在联邦学习中保护隐私、错误行为检测和贡献评价 2506.23583v1
  • 745 06-30 PBCAT: Patch-based composite adversarial training against physically realizable attacks on object detection PBCAT: Patch-basiertes komposites gegnerisches Training gegen physikalisch realisierbare Angriffe auf Objekterkennung PBCAT:针对实际可实现的物体探测攻击的补补综合对抗性训练 2506.23581v1
  • 746 06-30 EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework EFRame: Deeper Reasoning über Exploration-Filter-Replay Stärkungs-Lernrahmen EFRame:通过探索-钻孔-回放-强化学习框架,更深的原因 2506.22200v2
  • 747 06-30 Online Human Action Detection during Escorting Online-Erkennung von Menschen während der Eskortierung 护送期间的在线人类行动探测 2506.23573v1
  • 748 06-30 Metadata, Wavelet, and Time Aware Diffusion Models for Satellite Image Super Resolution Metadaten, Wavelet und Time Aware Diffusion Modelle für Satellitenbild Super Auflösung 元数据、波盘和有时间意识的卫星图像超分辨率传播模型 2506.23566v1
  • 749 06-30 FedMM-X: A Trustworthy and Interpretable Framework for Federated Multi-Modal Learning in Dynamic Environments FedMM-X: Ein vertrauensvolles und interpretierbares Rahmenwerk für das Federated Multi-Modal Learning in dynamischen Umgebungen FEMM-X:动态环境中联邦多模式学习的可信赖和可解释框架 2503.19564v2
  • 750 06-30 A unified framework on the universal approximation of transformer-type architectures Ein einheitlicher Rahmen für die universelle Annäherung von Architekturen vom Typ Transformator 关于变压器类型结构普遍近似的统一框架 2506.23551v1
  • 751 06-30 Seeding neural network quantum states with tensor network states Neurale Netzwerk-Quantenzustände mit Tensor-Netzwerkzuständen absähen 种子神经网络量量度状态与 ARW 网络状态 2506.23550v1
  • 752 06-30 From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data Von der Ausrichtung zur Weiterentwicklung: Bootstrapping Audio-Language Alignment mit synthetischen Daten 从对齐到推进: 用合成数据推动音频语言对齐 2505.20166v2
  • 753 06-30 RegionGCN: Spatial-Heterogeneity-Aware Graph Convolutional Networks RegionGCN: Geo-Heterogenität-Bewusst Graph Convolutional Networks 区域GCN:空间-异异异性-认知图图集网络 2501.17599v2
  • 754 06-30 CooT: Learning to Coordinate In-Context with Coordination Transformers CooT: In-Kontext-Koordinierung mit Koordinationstransformatoren lernen CoT:学会与协调变换者协调内流 2506.23549v1
  • 755 06-30 PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization PipeOffload: Verbesserung der Skalierbarkeit von Pipeline Parallelismus mit Speicheroptimierung 管道卸载: 提高管道平行式与内存优化的可缩放性 2503.01328v2
  • 756 06-30 Neural Langevin Machine: a local asymmetric learning rule can be creative Neural Langevin Machine: Eine lokale asymmetrische Lernregel kann kreativ sein 神经机:当地不对称学习规则可以是创造性的 2506.23546v1
  • 757 06-30 A Joint Topology-Data Fusion Graph Network for Robust Traffic Speed Prediction with Data Anomalism Ein gemeinsames Topologie-Daten-Fusions-Graph-Netzwerk für robuste Verkehrsgeschwindigkeitsvorhersage mit Datenanomalismus 具有数据Anomalisis的强力交通速度预测联合地形学-数据融合图图网络 2507.00085v1
  • 758 06-30 MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation MoORE: SVD-basierte Modell-MoE-ization für Konflikt- und Vergessenheits-Resistenz-Multi-Task-Anpassung MoORE: 以SVD为基础的冲突与遗忘-恢复-远程多任务适应示范MoE化模式 2506.14436v3
  • 759 06-30 Uncertainty-aware Diffusion and Reinforcement Learning for Joint Plane Localization and Anomaly Diagnosis in 3D Ultrasound Unsicheres Diffusions- und Verstärkungslernen für Joint Plane Localization und Anomaliediagnose im 3D-Ultraschall 3D 超声波中用于联合规划地方化和异常诊断的不确定-有意识的传播和强化学习 2506.23538v1
  • 760 06-30 A general language model for peptide identification Ein allgemeines Sprachmodell für die Peptididentifikation 铅化物识别通用语言模式 2502.15610v3
  • 761 06-30 GViT: Representing Images as Gaussians for Visual Recognition GViT: Darstellung von Bildern als Gaussianer für die visuelle Erkennung GVVT: 将图像代表为视觉识别高斯人 2506.23532v1
  • 762 06-30 When Test-Time Adaptation Meets Self-Supervised Models Wenn Test-Time-Anpassung auf selbstüberwachte Modelle trifft 测试时适应与自控模型相匹配 2506.23529v1
  • 763 06-30 RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference RetroInfer: Ein Vector-Storage-Ansatz für skalierbare Long-Context-LLM-Inferenz 变制器:对可缩放的长负负载LLM 推断法采用矢量-压力法 2505.02922v2
  • 764 06-30 Progressive Binarization with Semi-Structured Pruning for LLMs Progressive Binarisierung mit halbstrukturiertem Pruning für LLMs LLMM 与半结构冲压半结构 2502.01705v3
  • 765 06-30 Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding Gumiho: Eine hybride Architektur, um frühe Token in spekulativer Dekodierung zu priorisieren Gumiho:在投机下限中优先考虑早期物料的混合结构 2503.10135v2
  • 766 06-30 FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization FedWSQ: Effizientes Federated Learning mit Gewichtsstandardisierung und distributionssicherer, nicht-einheitlicher Quantisierung FFWSQ: 节能的联邦学习,重标准化和发行软件非统一量化 2506.23516v1
  • 767 06-30 The Hitchhiker’s Guide to Efficient, End-to-End, and Tight DP Auditing Der Hitchhiker-Leitfaden für effizientes, Ende-zu-Ende und enges DP-Auditing Hitchhiker的《高效、最终到最终和严格DP审计指南》 2506.16666v2
  • 768 06-30 Achieving binary weight and activation for LLMs using Post-Training Quantization Erreichen des binären Gewichts und Aktivierung für LLMs mit Post-Training Quantization 利用培训后量化办法使LLMMs实现二进制加权和激活 2504.05352v3
  • 769 06-30 Strategic Counterfactual Modeling of Deep-Target Airstrike Systems via Intervention-Aware Spatio-Causal Graph Networks Strategische gegenfaktische Modellierung von Deep-Target Airstrike-Systemen über interventionsfähige Spatio-Causal Graph Networks 通过干预-ware Spatio-Causal图表网络对深目标空袭系统进行战略反事实模拟 2507.00083v1
  • 770 06-30 Neural Canonical Polyadic Factorization for Traffic Analysis Neurale kanonischen Polyadische Faktorisierung für die Verkehrsanalyse 用于交通分析的神经卡纳尼基多功能因子化 2506.15079v2
  • 771 06-30 Efficient Online Reinforcement Learning for Diffusion Policy Effizientes Online-Verstärkungslernen für die Diffusionspolitik 高效在线强化学习促进传播政策 2502.00361v4
  • 772 06-30 Understanding and Reducing the Class-Dependent Effects of Data Augmentation with A Two-Player Game Approach Verständnis und Reduzierung der klassenabhängigen Effekte von Datenvergrößerung mit einem Zwei-Spieler-Spiel-Ansatz 理解和减少数据递增的二级依赖影响,采用双层游戏方法 2407.03146v5
  • 773 06-30 Deep Unlearn: Benchmarking Machine Unlearning for Image Classification Deep Unlearn: Benchmarking Machine Unlearning für die Bildklassifizierung 深未读:为图像分类确定基准的机器脱学 2410.01276v2
  • 774 06-30 Sample Margin-Aware Recalibration of Temperature Scaling Probenmargin-Aware Rekalibrierung der Temperaturskalierung 样本 边距天体调整温度缩放 2506.23492v1
  • 775 06-30 FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed FZOO: Schneller Zeroth-Order-Optimierer für feintunende große Sprachmodelle in Richtung Adam-Scale-Geschwindigkeit FZOO: 向亚当规模速度的微调大语言模型最佳最佳优选器 2506.09034v2
  • 776 06-30 Test of partial effects for Frechet regression on Bures-Wasserstein manifolds Prüfung der partiellen Wirkungen auf Frechet-Regression auf Bures-Wasserstein-Muffen 测试布雷斯-沃塞尔斯坦元体Frechet回归部分效果 2506.23487v1
  • 777 06-30 Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission Federated Learning-Enabled Hybrid Language Models für kommunikationseffiziente Token-Übertragung 通信-高效调式传真传播的联邦学习-进进混合语言模式 2507.00082v1
  • 778 06-30 Reconciling Attribute and Structural Anomalies for Improved Graph Anomaly Detection Verknüpfung von Attributen und strukturellen Anomalien für verbesserte Graphenanomalienerkennung 调节特征和结构异常改进图异常探测 2506.23469v1
  • 779 06-30 AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays AdFair-CLIP: Adversarial Fair Contractive Language-Image Pre-Training für Röntgenaufnahmen im Brustkorb 阿德法-CLIP:胸前X光检查的公平竞争公平语言图象前训练 2506.23467v1
  • 780 06-30 Time to Rethink AI for Combinatorial Optimization: Classical Algorithms Remain Tough to Match Zeit, KI für die Kombinatorische Optimierung neu zu denken: Klassische Algorithmen bleiben schwer zu passen 重新思考 AI 组合优化的时间: 古老的算法仍然难以匹配 2502.03669v2
  • 781 06-30 TabReason: A Reinforcement Learning-Enhanced Reasoning LLM for Explainable Tabular Data Prediction TabReason: Eine verstärkte Lern-verbesserte Begründung LLM für erklärbare tabellarische Datenvorhersage TabReson: 用于可解释的图表数据预测的强化学习-提高合理理由的强化学习-强化LLMLM 2505.21807v3
  • 782 06-30 Can We Predict the Unpredictable? Leveraging DisasterNet-LLM for Multimodal Disaster Classification Können wir das Unvorhersehbare vorhersagen? Leveraging DisasterNet-LLM für die multimodale Katastrophenklassifikation 我们能够预测难以预测的情况吗?利用灾害网络-LLM来进行多模式灾害分类。 2506.23462v1
  • 783 06-30 Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression Vorteile des vorzeitigen Stoppens im gradienten Abstieg für überparameterisierte logistische Regression 梯层提前停产对过度量化的后勤倒退的益处 2502.13283v2
  • 784 06-30 Sampling and Identity-Testing Without Approximate Tensorization of Entropy Probenahme und Identitätstest ohne annähernde Tensorisierung der Entropie 采样和不近近于孔虫代号进行身份测试 2506.23456v1
  • 785 06-30 Minimax Optimal Two-Stage Algorithm For Moment Estimation Under Covariate Shift Minimax Optimaler Zwei-Stufen-Algorithmus für Moment-Schätzung unter Kovariate Shift 最小最大最佳双层对调 2506.23453v1
  • 786 06-30 Bridge: A Unified Framework to Knowledge Graph Completion via Language Models and Knowledge Representation Brücke: Ein einheitliches Framework zur Wissensgraphenvervollständigung über Sprachmodelle und Wissensdarstellung 桥梁:通过语言模式和知识代表性完成知识图的统一框架 2411.06660v3
  • 787 06-30 Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data Erlernen ursächlich vorhersehbarer Ergebnisse aus Psychiatrischen Langzeitdaten 精神病纵向数据产生的可预期的学习结果 2506.16629v2
  • 788 06-30 Optimizing Conversational Product Recommendation via Reinforcement Learning Optimierung der Conversational Product Recommendation durch Verstärkungslernen 通过强化学习优化对口产品建议 2507.01060v1
  • 789 06-30 NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics NatureLM-Audio: ein Audio-Sprachen-Stiftungsmodell für Bioakustik NatureLM-Audio: 听觉语言基金会生物生物学模型 2411.07186v2
  • 790 06-30 Enhancing Insider Threat Detection Using User-Based Sequencing and Transformer Encoders Verbesserung der Erkennung von Insiderbedrohungen mit benutzerbasierten Sequenzierungs- und Transformer-Encodern 利用用户定序和变换器计算器加强内部威胁探测 2506.23446v1
  • 791 06-30 Graph Fourier Neural ODEs: Modeling Spatial-temporal Multi-scales in Molecular Dynamics Graph Fourier Neural ODEs: Modellierung räumlich-zeitlicher Multiskalen in molekularer Dynamik 图 4级神经数极:分子动态中空间-时空多尺度建模 2411.01600v3
  • 792 06-30 CauSkelNet: Causal Representation Learning for Human Behaviour Analysis CauSkelNet: Kausales Repräsentationslernen für die menschliche Verhaltensanalyse CauSkelNet: 人类行为分析的因果关系学习 2409.15564v4
  • 793 06-29 (7) DPOT: A DeepParticle method for Computation of Optimal Transport with convergence guarantee DPOT: Eine DeepPartikel-Methode zur Berechnung des optimalen Verkehrs mit Konvergenzgarantie DPOT: 具有趋同保证的优化运输计算深颗粒法 2506.23429v1
  • 794 06-29 Detecting What Matters: A Novel Approach for Out-of-Distribution 3D Object Detection in Autonomous Vehicles Was zählt: Ein neuartiger Ansatz zur Erkennung von 3D-Objekten außerhalb der Verteilung in autonomen Fahrzeugen 发现什么重要事项:在自主车辆中采用新颖的外分发3D物体探测方法 2506.23426v1
  • 795 06-29 Accurate Parameter-Efficient Test-Time Adaptation for Time Series Forecasting Genaue Parameter-Effiziente Test-Zeit-Anpassung für Zeitreihenprognosen 用于时间序列预测的精确参数有效试验时间适应 2506.23424v1
  • 796 06-29 BenchMake: Turn any scientific data set into a reproducible benchmark BenchMake: Verwandeln Sie alle wissenschaftlichen Daten in eine reproduzierbare Benchmark BackMake:将任何科学数据集转换成可复制的基准 2506.23419v1
  • 797 06-29 LNUCB-TA: Linear-nonlinear Hybrid Bandit Learning with Temporal Attention LNUCB-TA: Linear-nichtlinearer Hybrid Bandit Lernen mit zeitlicher Aufmerksamkeit LNCB-TA:线-非线-非线-非线性混合强盗在时间上注意学习 2503.00387v2
  • 798 06-29 Remove Symmetries to Control Model Expressivity and Improve Optimization Entfernen Sie Symmetrien, um die Expressivität des Modells zu steuern und die Optimierung zu verbessern 删除对称以控制模型表达性并改进优化 2408.15495v4
  • 799 06-29 Datasets for Fairness in Language Models: An In-Depth Survey Datensätze für Fairness in Sprachmodellen: Eine In-Depth-Umfrage 语言模型公平性数据集:内部调查 2506.23411v1
  • 800 06-29 No, of course I can! Refusal Mechanisms Can Be Exploited Using Harmless Fine-Tuning Data Nein, natürlich kann ich! Weigerungsmechanismen können mit harmlosen Feintuning-Daten ausgebeutet werden 不,当然可以! 拒绝机制可以被利用无伤害的微调数据被利用 2502.19537v4
  • 801 06-29 Do LLMs Dream of Discrete Algorithms? Träumen LLMs von diskreten Algorithmen? LLMs 梦到分辨算法吗? 2506.23408v1
  • 802 06-29 Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models Equivariance Everywhere All Once: Ein Rezept für Graph Foundation Modelle 偶而实现所有地方的均匀差异:图表基础模型的食谱 2506.14291v2
  • 803 06-29 AICO: Feature Significance Tests for Supervised Learning AICO: Feature Significance Tests für überwachtes Lernen AICO: 监督学习的特异意义测试 2506.23396v1
  • 804 06-29 Satisficing Regret Minimization in Bandits: Constant Rate and Light-Tailed Distribution Zufriedene Reue Minimierung bei Banditen: Konstante Rate und lichtgebremste Verteilung 山贼中令人满意的遗憾最小化:常数率和轻故障分布 2406.06802v3
  • 805 06-29 Enhancing Adversarial Robustness through Multi-Objective Representation Learning Verbesserung der adversarialen Robustheit durch multi-objektives Repräsentationslernen 通过多目标代表制学习加强反向强力 2410.01697v4
  • 806 06-29 Uncertain Boundaries: Multidisciplinary Approaches to Copyright Issues in Generative AI Unbestimmte Grenzen: multidisziplinäre Ansätze zu Urheberrechtsfragen in generativer KI 不确定性边界:创新的对版权问题的多学科方法 2404.08221v2
  • 807 06-29 Two-dimensional Taxonomy for N-ary Knowledge Representation Learning Methods Zweidimensionale Taxonomie für N-are Wissensdarstellung Lernmethoden Nary知识代表制学习方法二维分类学 2506.05626v2
  • 808 06-29 SIEDD: Shared-Implicit Encoder with Discrete Decoders SIEDD: Gemeinsam-impliziter Encoder mit diskreten Decodern SIEDD: 含有分解代碼器的共享- 集成编码器 2506.23382v1
  • 809 06-29 Data Filtering for Genetic Perturbation Prediction Datenfilterung für genetische Störungsvorhersage 遗传扰动预测数据过滤 2503.14571v3
  • 810 06-29 Adversarial Robustness Unhardening via Backdoor Attacks in Federated Learning Widerwärtige Robustheit Unhärtend durch Hintertürangriffe im Federated Learning 在联邦学习中通过后门攻击而不硬化的逆向强力 2310.11594v3
  • 811 06-29 When Additive Noise Meets Unobserved Mediators: Bivariate Denoising Diffusion for Causal Discovery Wenn additive Geräusche unbeobachtete Mediatoren treffen: Bivariate dämpfende Diffusion für die kausale Entdeckung 当“添加噪音”遇到无旁观调解员时:双重拒绝为发现原因而传播 2506.23374v1
  • 812 06-29 Emotional RAG LLMs: Reading Comprehension for the Open Internet Emotionale RAG LLMs: Leseverständnis für das offene Internet 情感性RAG LLM: 阅读开放互联网理解 2408.11189v2
  • 813 06-29 Investigating an Overfitting and Degeneration Phenomenon in Self-Supervised Multi-Pitch Estimation Untersuchung eines Über- und Degenerationsphänomens in selbstüberwachter Multi-Pitch-Schätzung 调查在自我监督的多方案估算中一个超称和破旧现象 2506.23371v1
  • 814 06-29 Federated Timeline Synthesis: Scalable and Private Methodology For Model Training and Deployment Federated Timeline Synthesis: Skalierbare und private Methodik für Modellausbildung und -einsatz 联邦时间线综合分析:示范培训和部署可缩放和私用方法 2506.23358v1
  • 815 06-29 Rethinking Algorithmic Fairness for Human-AI Collaboration Algorithmische Fairness für die Zusammenarbeit zwischen Mensch und AI neu denken 重新思考人类-AI合作的算法公平性 2310.03647v3
  • 816 06-29 A case for data valuation transparency via DValCards Ein Fall für Datenbewertungstransparenz über DValCards 通过 DValCards 提高数据估价透明度的一个案例 2506.23349v1
  • 817 06-29 Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles Verbesserung der Robustheit und Zuverlässigkeit in der medizinischen Bildklassifizierung mit latentgeführter Diffusion und verschachtelten Baugruppen 提高医疗图像分类的强力和可靠性,包括前期引导传播和内装成品 2310.15952v5
  • 818 06-29 Composing Parts for Expressive Object Generation Komponieren von Teilen für die Expressive Objektgenerierung 表达式对象生成的组合部件 2406.10197v2
  • 819 06-29 Data-Driven Self-Supervised Learning for the Discovery of Solution Singularity for Partial Differential Equations Datengestütztes Selbstüberwachtes Lernen für die Entdeckung von Lösungssingularität für partielle Differentialgleichungen 数据驱动自监测学习,以发现部分差异等分解的解析元件 2506.23344v1
  • 820 06-29 Semantic-Aware Adaptive Video Streaming Using Latent Diffusion Models for Wireless Networks Semantic-Aware Adaptive Video-Streaming mit latenten Diffusions-Modellen für drahtlose Netzwerke 使用无线网络的冷传传播模型进行语义- 智能可调控视频流 2502.05695v2
  • 821 06-29 Scalable Non-Equivariant 3D Molecule Generation via Rotational Alignment Skalierbare nicht-equivariante 3D-Molekül-Erzeugung über Rotational Alignment 通过旋转对齐可缩放的非等式 3D 分子生成 2506.10186v2
  • 822 06-29 VALID-Mol: a Systematic Framework for Validated LLM-Assisted Molecular Design VALID-Mol: Systematisches Framework für validiertes LLM-Assisted Molecular Design VALID-Mol:验证LLM辅助分子设计系统框架 2506.23339v1
  • 823 06-29 An Investigation into Seasonal Variations in Energy Forecasting for Student Residences Untersuchung von saisonalen Variationen in der Energieprognose für Studentenwohnheime 对学生住宅能源预测季节性变化的调查 2501.07423v2
  • 824 06-29 A Library for Learning Neural Operators Eine Bibliothek zum Lernen neuraler Operatoren 学习神经操作员图书馆 2412.10354v4
  • 825 06-29 Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark Parametereffizientes Feintuning für vortrainierte Visionsmodelle: Eine Umfrage und ein Benchmark 培训前愿景模型:调查和基准 2402.02242v5
  • 826 06-29 Scaling Laws for Black box Adversarial Attacks Skalierungsgesetze für Black Box-Adversarialangriffe 黑盒对面袭击法律推广 2411.16782v3
  • 827 06-29 Perturbation Analysis of Singular Values in Concatenated Matrices Störungsanalyse von Singularwerten in konkatenierten Matrizen 封闭母体中单子值的扰动分析 2505.01427v2
  • 828 06-29 Learning to Rank with Variable Result Presentation Lengths Ranken lernen mit variablem Ergebnis Präsentationslängen 学习以变量结果列报长度排序 2506.23319v1
  • 829 06-29 Ensemble BERT for Medication Event Classification on Electronic Health Records (EHRs) Ensemble BERT für Medikationsveranstaltungsklassifikation auf elektronischen Gesundheitsakten (EHRs) 电子健康记录(EHRs)药品事件分类集合BERT 2506.23315v1
  • 830 06-29 Physics informed guided diffusion for accelerated multi-parametric MRI reconstruction Physik informiert geführte Diffusion für beschleunigte multiparametrische MRT-Rekonstruktion 为加速多参数MRI的重建提供物理信息 2506.23311v1
  • 831 06-29 MedLeak: Multimodal Medical Data Leakage in Secure Federated Learning with Crafted Models MedLeak: Multimodale medizinische Daten-Leakage im sicheren Federated Learning mit handwerklichen Modellen Medleak:安全联邦学习中多式医疗数据漏漏 2407.09972v2
  • 832 06-29 Overcoming Dimensional Factorization Limits in Discrete Diffusion Models through Quantum Joint Distribution Learning Überwindung von Dimensional Factorization Limits in diskreten Diffusionsmodellen durch Quantum Joint Distribution Learning 通过量子联合分发学习克服分辨传播模式中的分量限制 2505.05151v3
  • 833 06-29 Objective-Free Local Learning and Emergent Language Structure in Thinking Machines Zielfreies lokales Lernen und neue Sprachstrukturen in denkenden Maschinen 考虑机器中无目标的地方学习和新兴语言结构 2506.23293v1
  • 834 06-29 Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models Beispiel dann Identifizieren: Ein allgemeiner Rahmen für die Risikokontrolle und Bewertung in multimodalen großen Sprachmodellen 确定:多式大语言模式风险管理和评估总框架 2410.08174v3
  • 835 06-29 Hierarchical Quantized Diffusion Based Tree Generation Method for Hierarchical Representation and Lineage Analysis Hierarchische Quantisierte Diffusionsbasierte Baumerstellungsmethode für die hierarchische Darstellung und Linienanalyse 用于等级代表性和直线分析的等级定量传播植树生成方法 2506.23287v1
  • 836 06-29 Not All Explanations for Deep Learning Phenomena Are Equally Valuable Nicht alle Erklärungen für Deep Learning Phänomene sind gleichermaßen wertvoll 并非所有关于深深学习现象的解释都具有同等价值。 2506.23286v1
  • 837 06-29 BAPE: Learning an Explicit Bayes Classifier for Long-tailed Visual Recognition BAPE: Erlernen eines expliziten Bayes-Klassifikators für die visuelle Langzeiterkennung BAPE: 学习长尾目视辨识明确贝类分类 2506.23280v1
  • 838 06-29 FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait FLOAT: Generative Bewegung Latent Flow passend für Audio-getriebenes Gesprächsporträt FLOAT: 音频驱动的谈话肖像的生成动力前端流动比对 2412.01064v3
  • 839 06-29 Predicting thinking time in Reasoning models Denkzeit in vernünftigen Modellen voraussagen 理性模型的预测思考时间 2506.23274v1
  • 840 06-29 Token Activation Map to Visually Explain Multimodal LLMs Token Aktivierungskarte, um multimodale LLMs visuell zu erklären 直观解释多模式LMS的声控激活映射图 2506.23270v1
  • 841 06-29 Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging Sub-MoE: Effiziente Mixture-of-Expert LLMs-Kompression über Subspace Expert Merging 子MoE:通过子空间专家合并压缩的高效混合专家LLMs 2506.23266v1
  • 842 06-29 Drivetrain simulation using variational autoencoders Drivetrain-Simulation mit variablen Autoencodern 使用变式自动自动转换器进行驱动模拟模拟 2501.17653v2
  • 843 06-29 Online Meal Detection Based on CGM Data Dynamics Online-Meal Detection basierend auf CGM Data Dynamics 基于CGM数据动态的在线膳食检测 2507.00080v1
  • 844 06-29 Deep Multi-Manifold Transformation Based Multivariate Time Series Fault Detection Tiefe Multi-Manifold-Transformations-basierte Multivariate-Zeitreihen-Fehlererkennung 深多功能变换基于多变量的时间序列错失检测 2405.16258v2
  • 845 06-29 VoyagerVision: Investigating the Role of Multi-modal Information for Open-ended Learning Systems VoyagerVision: Untersuchung der Rolle multimodaler Informationen für offene Lernsysteme 旅行展望:调查多模式信息对不限名额学习系统的作用 2507.00079v1
  • 846 06-29 State Entropy Regularization for Robust Reinforcement Learning Staatliche Entropie-Regularisierung für robustes Stärkungslernen 强力强化学习的状态元件常规化 2506.07085v2
  • 847 06-29 Aggregating Local Saliency Maps for Semi-Global Explainable Image Classification Aggregation lokaler Salienzkarten für semi-globale erklärbare Bildklassifikation 为半全球可解释图像分类汇总地方悬浮图 2506.23247v1
  • 848 06-29 The language of time: a language model perspective on time-series foundation models Die Sprache der Zeit: ein Sprachmodell Perspektive auf Zeitreihen Grundmodelle 时间语言:时间序列基础模型的语言模式视角 2507.00078v1
  • 849 06-29 Automating Versatile Time-Series Analysis with Tiny Transformers on Embedded FPGAs Automatisierung der vielseitigen Zeitreihenanalyse mit kleinen Transformern auf eingebetteten FPGAs 在嵌入的 FPGAs 上用小变形器进行自动的V对活性时间序列分析 2505.17662v4
  • 850 06-29 MOE-Enhanced Explanable Deep Manifold Transformation for Complex Data Embedding and Visualization MOE-Enhanced Explanable Deep Manifold Transformation für komplexe Daten-Einbettung und Visualisierung MOE - 强化了复杂数据嵌入和可视化的可推广深磁变换 2410.19504v2
  • 851 06-29 Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery Globale Konvergenz iterativ umgewichteter Least Squares für robuste Subraum-Recovery 自动再加权最低空间平面对强力亚空间恢复的全球趋同 2506.20533v2
  • 852 06-29 Bayes correlated equilibria, no-regret dynamics in Bayesian games, and the price of anarchy Bayes korrelierte Äquilibrien, No-Regret-Dynamik in Bayesischen Spielen, und der Preis der Anarchie Bayes相关平衡、Bayesian游戏中的无区域动态以及无政府状态的代价 2304.05005v2
  • 853 06-29 Masked Gated Linear Unit Maskierte gezahnte Lineareinheit 面罩线条股 2506.23225v1
  • 854 06-29 Single Image Inpainting and Super-Resolution with Simultaneous Uncertainty Guarantees by Universal Reproducing Kernels Single Image Inpainting und Super-Resolution mit gleichzeitigen Unsicherheitsgarantien von Universal Reproducing Kernels 由环球再生产内核提供的单一图像油漆和超分辨率,同时具有不确定性的保证 2506.23221v1
  • 855 06-29 Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy Disrupting Model Merging: Ein Parameter-Level-Verteidigung ohne Opferung Genauigkeit 破坏性合并模式:不牺牲准确度的参数级防御 2503.07661v2
  • 856 06-29 FedRef: Communication-Efficient Bayesian Fine Tuning with Reference Model FedRef: Kommunikation-Effizient Bayesian Feinabstimmung mit Referenzmodell FedRef: 通信-节能贝ysian精密票,参考模型 2506.23210v1
  • 857 06-29 External Data-Enhanced Meta-Representation for Adaptive Probabilistic Load Forecasting Externe datengestützte Meta-Vertretung für adaptive probabilistische Lastprognosen 适应性概率负载预测的外部数据强化元代表 2506.23201v1
  • 858 06-29 Pretrained Reversible Generation as Unsupervised Visual Representation Learning Pretrained Reversible Generation als unüberwachtes visuelles Repräsentationslernen 作为无人监督的视觉代表学习 2412.01787v4
  • 859 06-29 RiverText: A Python Library for Training and Evaluating Incremental Word Embeddings from Text Data Streams RiverText: Eine Python-Bibliothek für das Training und Evaluieren inkrementaler Word-Einbettungen aus Textdatenströmen RiverText:一个培训和评价来自文本数据流的递增单词嵌入的Python图书馆 2506.23192v1
  • 860 06-29 Efficient Algorithms for Learning and Compressing Monophonic Halfspaces in Graphs Effiziente Algorithmen zum Lernen und Verdichten monophoner Halbräume in Graphen 用于学习和压缩图形中单声半空的高效算法 2506.23186v1
  • 861 06-29 Attribution assignment for deep-generative sequence models enables interpretability analysis using positive-only data Zuordnungszuweisung für tiefgenerative Sequenzmodelle ermöglicht eine Interpretationsanalyse mit Positiv-Only-Daten 深遗传序列模型的归属分配,使得能够使用只使用正数数据的可解释性分析 2506.23182v1
  • 862 06-29 Automatic Depression Assessment using Machine Learning: A Comprehensive Survey Automatische Depressionsbewertung mit maschinellem Lernen: Eine umfassende Umfrage 利用机器学习进行自动抑郁症评估:全面调查 2506.18915v2
  • 863 06-29 Data Can Speak for Itself: Quality-guided Utilization of Wireless Synthetic Data Daten können für sich selbst sprechen: Qualitativ hochwertige Nutzung von drahtlosen synthetischen Daten 《无线合成数据的质量指南利用》。 2506.23174v1
  • 864 06-29 Deep Learning for Optical Misalignment Diagnostics in Multi-Lens Imaging Systems Deep Learning für optische Fehlausrichtungsdiagnostik in Multi-Lens-Imaging-Systemen 多层成像系统中光相错配诊断深入学习 2506.23173v1
  • 865 06-29 Compositions of Variant Experts for Integrating Short-Term and Long-Term Preferences Zusammensetzungen von Variant-Experten für die Integration kurzfristiger und langfristiger Präferenzen 纳入短期和长期优惠的备选专家的组成 2506.23170v1
  • 866 06-29 Mirror Descent Policy Optimisation for Robust Constrained Markov Decision Processes Spiegelabstiegspolitik Optimierung für robuste eingeschränkte Markov-Entscheidungsprozesse 强力约束马可夫决策过程的 “ 镜光源 “ 政策优化 2506.23165v1
  • 867 06-29 A Comprehensive Study of Shapley Value in Data Analytics Eine umfassende Untersuchung des Shapley-Wertes in der Datenanalyse 全面研究数据分析中的虚光值 2412.01460v7
  • 868 06-29 The Effectiveness of LLMs as Annotators: A Comparative Overview and Empirical Analysis of Direct Representation Die Wirksamkeit von LLMs als Annotatoren: Eine vergleichende Übersicht und empirische Analyse der direkten Repräsentation LLMs作为说明人的效力:直接代表的比较概览和经验分析 2405.01299v2
  • 869 06-29 CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring CoT Red-Handed: Stresstesting Chain-of-Thought-Überwachung COT 红手:压力测试研究链监测 2505.23575v2
  • 870 06-29 maneuverRecognition – A Python package for Timeseries Classification in the domain of Vehicle Telematics ManöverErkennung – Ein Python-Paket für die Zeitreihe Klassifizierung im Bereich der Fahrzeugtelematik 机动车辆遥视技术领域时间序列分类的 Python 软件包 2506.23147v1
  • 871 06-29 Forget-MI: Machine Unlearning for Forgetting Multimodal Information in Healthcare Settings Forget-MI: Maschinelles Lernen zum Vergessen multimodaler Informationen in Gesundheitseinstellungen 忘记-MI:在保健机构中放弃多模式信息的机器学习 2506.23145v1
  • 872 06-29 Beware of Calibration Data for Pruning Large Language Models Hüten Sie sich vor Kalibrierdaten für das Pruning von großen Sprachmodellen 注意为粗略大语言模型提供校准数据 2410.17711v2
  • 873 06-29 CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation CRISP-SAM2: SAM2 mit Cross-Modal Interaction und semantischer Prompting für Multi-Organ Segmentierung CRIISP-SAM2:SAM2 具有跨模式相互作用和跨组织分解的语义提示的SAM2 2506.23121v1
  • 874 06-29 Enough Coin Flips Can Make LLMs Act Bayesian Genug Münze Flips kann LLMs Act Bayesian 足够多的硬币翻翻可以制造长效LLM 贝叶斯女士 2503.04722v2
  • 875 06-29 Deep Support Vectors Unterstützende Vektoren 深支持矢量 2403.17329v3
  • 876 06-29 A Survey of Test-Time Compute: From Intuitive Inference to Deliberate Reasoning Eine Übersicht über die Berechnung der Testzeit: Vom intuitiven Rückschluss zur überlegten Vernunft 试验时间计算调查:从直觉推理到故意推理 2501.02497v3
  • 877 06-29 Theoretical Modeling of LLM Self-Improvement Training Dynamics Through Solver-Verifier Gap Theoretische Modellierung der LLM-Selbstverbesserungs-Trainingsdynamik durch Solver-Verifier Gap 理学建模LLM 自改进培训动态,通过溶剂-变压装置差距 2507.00075v1
  • 878 06-29 Learning Dynamics of LLM Finetuning Dynamisches Lernen der LLM-Feinsteuerung LLM 微调的学习动态 2407.10490v4
  • 879 06-29 Integrating Large Language Models in Financial Investments and Market Analysis: A Survey Integration großer Sprachmodelle in Finanzanlagen und Marktanalyse: Eine Umfrage 将大语言模式纳入金融投资和市场分析:调查 2507.01990v1
  • 880 06-29 Score-based Generative Diffusion Models to Synthesize Full-dose FDG Brain PET from MRI in Epilepsy Patients Score-basierte Generative Diffusionsmodelle zur Synthesize-Volldosierung FDG Brain PET von MRT bei Epilepsie-Patienten 用于合成全剂量FDG脑脑PET的基于分分数的传播模型,来自麻疯病人的MRI 2506.11297v2
  • 881 06-29 Multi-task Offline Reinforcement Learning for Online Advertising in Recommender Systems Multi-Task Offline-Verstärkung Lernen für Online-Werbung in Recommender-Systemen 顾问系统中在线广告网上广告多任务离线强化学习 2506.23090v1
  • 882 06-29 Fractional Policy Gradients: Reinforcement Learning with Long-Term Memory Gradienten der Fraktionspolitik: Stärkung des Lernens mit Langzeitgedächtnis 小分政策分级:用长期记忆加强学习 2507.00073v1
  • 883 06-29 Multi-Modal Recommendation Unlearning for Legal, Licensing, and Modality Constraints Multi-Modal Empfehlung Unlearning für rechtliche, Lizenzierungs- und Modalitätsbeschränkungen 法律、许可证和方式限制方面的学习 2405.15328v3
  • 884 06-29 TyphoFormer: Language-Augmented Transformer for Accurate Typhoon Track Forecasting TyphoFormer: Sprachgesteigerter Transformer für präzise Typhoon-Track-Prognose 台风前台风:用于准确预报台风轨道的语文增强变换器 2506.17609v2
  • 885 06-29 CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding CSBrain: Ein Cross-Scale Spatiotemporal Brain Foundation Modell für die EEG-Dekodierung CSBrain: 跨规模的超时脑基础模型 2506.23075v1
  • 886 06-29 Learning Counterfactually Decoupled Attention for Open-World Model Attribution Counterfactually Entkoppelte Aufmerksamkeit für Open-World-Modellzuweisung lernen 对开放世界模式的注意 2506.23074v1
  • 887 06-29 DReSS: Data-driven Regularized Structured Streamlining for Large Language Models DResS: Datengesteuerte Regularisierte Strukturierte Straffung für große Sprachmodelle DReSS: 数据驱动的大型语文模式正规化结构精简 2501.17905v3
  • 888 06-29 Multimodal Medical Code Tokenizer Multimodaler medizinischer Code Tokenizer 多式联运医疗法典化器 2502.04397v3
  • 889 06-29 Curious Causality-Seeking Agents Learn Meta Causal World Neugierige Kausalität suchende Agenten lernen Meta Kausale Welt 好奇原因 – – 寻寻代理人学习大原因世界 2506.23068v1
  • 890 06-29 Measuring How LLMs Internalize Human Psychological Concepts: A preliminary analysis Messen, wie LLMs menschliche Psychologische Konzepte internalisieren: Eine vorläufige Analyse 衡量LLM女士如何将人类心理心理概念内化:初步分析 2506.23055v1
  • 891 06-29 A Framework of Decision-Relevant Observability: Reinforcement Learning Converges Under Relative Ignorability Ein Rahmen für entscheidungsrelevante Beobachtungsfähigkeit: Verstärktes Lernen unter relativer Ignorierbarkeit 与决策相关的可观察性框架:相对无知下的加强学习联系 2504.07722v5
  • 892 06-29 Double-Diffusion: Diffusion Conditioned Diffusion Probabilistic Model For Air Quality Prediction Doppel-Diffusion: Diffusion Bedingt Diffusion probabilistische Modell für die Luftqualität Vorhersage 双重扩散: 空气质量预测的有条件扩散概率模型 2506.23053v1
  • 893 06-29 DSAC: Distributional Soft Actor-Critic for Risk-Sensitive Reinforcement Learning DSAC: Distributive Soft Actor-Critic für risikosensibles Verstärkungslernen DSAC: 用于风险敏感强化学习的分布软软活性-批评 2004.14547v3
  • 894 06-29 ReMem: Mutual Information-Aware Fine-tuning of Pretrained Vision Transformers for Effective Knowledge Distillation ReMem: Gegenseitige Information-Bewusst Feinabstimmung von vortrainierten Visionstransformatoren für effektive Wissensdestillation ReMEM:为有效知识蒸馏而对预先培训的愿景变异器进行相互信息-软件微调 2506.23041v1
  • 895 06-29 MetaSynth: Meta-Prompting-Driven Agentic Scaffolds for Diverse Synthetic Data Generation MetaSynth: Meta-prompting-Driven Agentic Scaffolds für vielfältige synthetische Datengenerierung MetaSynth: 用于多种合成数据生成的元- 制造- 挥发剂脚架 2504.12563v2
  • 896 06-29 The Oracle Complexity of Simplex-based Matrix Games: Linear Separability and Nash Equilibria Die Oracle Komplexität von Simplex-basierten Matrix Games: Lineare Separierbarkeit und Nash Equilibria 简单克斯基母体运动会的甲骨文复杂性:线性分离和纳什平衡 2412.06990v2
  • 897 06-28 (6) Fragile, Robust, and Antifragile: A Perspective from Parameter Responses in Reinforcement Learning Under Stress Fragile, robuste und antifragile: Eine Perspektive aus Parameterreaktionen bei der Stärkung des Lernens unter Stress 脆弱、强力和抗脆弱:强化学习压力下的参数对策的视角 2506.23036v1
  • 898 06-28 Feature-Wise Mixing for Mitigating Contextual Bias in Predictive Supervised Learning Feature-Wise-Mixing für die Vermeidung von Kontext-Bias in Predictive Supervised Learning 预测监督的学习中减轻背景偏见的功能- Wise 混音 2506.23033v1
  • 899 06-28 ATTENTION2D: Communication Efficient Distributed Self-Attention Mechanism ACHTUNG2D: Kommunikation Effizient verteilter Selbstaufmerksamkeitsmechanismus 注意2D: 沟通高效分配自发性传播机制 2503.15758v2
  • 900 06-28 VisionScores – A system-segmented image score dataset for deep learning tasks VisionScores – Ein systemsegmentierter Bildscore-Datensatz für Deep-Learning-Aufgaben 愿景-图像分集数据集,用于深层学习任务 2506.23030v1
  • 901 06-28 Multimodal Contrastive Representation Learning in Augmented Biomedical Knowledge Graphs Multimodales Kontrastives Repräsentationslernen in Augmented Biomedical Knowledge Graphs 生物医学知识强化图中多模式差异代表性学习 2501.01644v2
  • 902 06-28 Spectra 1.1: Scaling Laws and Efficient Inference for Ternary Language Models Spectra 1.1: Skalierungsgesetze und effiziente Schlussfolgerungen für Ternäre Sprachmodelle Spectra 1.1:扩大法律规模和短期语言模式的有效推论 2506.23025v1
  • 903 06-28 BWLer: Barycentric Weight Layer Elucidates a Precision-Conditioning Tradeoff for PINNs BWLer: Barycentric Weight Layer erklärt einen Precision-Conditioning Tradeoff für PINNs BWLer: 巴里中心重量层 2506.23024v1
  • 904 06-28 Scenario-Based Hierarchical Reinforcement Learning for Automated Driving Decision Making Szenariobasiertes Hierarchisches Verstärkungslernen für Automatisiertes Fahren bei Entscheidungsfindungen 用于自动驱动决策的基于情景的等级强化学习 2506.23023v1
  • 905 06-28 The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models Die begrenzten Auswirkungen medizinischer Anpassung von großen Sprach- und Visions-Sprachenmodellen 大语言和视觉语言模式医学适应的有限影响 2411.08870v3
  • 906 06-28 On Universality of Non-Separable Approximate Message Passing Algorithms Über Universalität nicht-trennbarer ungefährer Botschaften 关于不可分离的近似信息传递算法的普遍性 2506.23010v1
  • 907 06-28 A Fundamental Accuracy–Robustness Trade-off in Regression and Classification Eine grundlegende Genauigkeit–Robustheit Trade-off in Regression und Klassifizierung 回归和分类方面的基本准确性-易腐性 2411.05853v2
  • 908 06-28 Are Convex Optimization Curves Convex? Sind Convex-Optimierungskurven Convex? Convex 优化曲线曲线 Convex? 2503.10138v3
  • 909 06-28 A Data Science Approach to Calcutta High Court Judgments: An Efficient LLM and RAG-powered Framework for Summarization and Similar Cases Retrieval A Data Science Approach to Kalkutta High Court Urteile: Ein effizienter LLM- und RAG-gestützter Rahmen für Zusammenfassungen und ähnliche Fälle 加尔各答高等法院判决的数据科学方法:高效LLM和RAG驱动的总结和类似案件检索框架 2507.01058v1
  • 910 06-28 A Reinforcement Learning Approach for Optimal Control in Microgrids Ein Verstärkungs-Lernansatz zur optimalen Steuerung in Microgrids 微电网优化控制强化学习方法 2506.22995v1
  • 911 06-28 Foundation Models for Wearable Movement Data in Mental Health Research Gründungsmodelle für Wearable Movement Data in der Mental Health Research 心理健康研究中穿戴运动数据模型基金会 2411.15240v4
  • 912 06-28 Kernel Outlier Detection Kernaussenerkennung 核心外星探测 2506.22994v1
  • 913 06-28 Recommender Systems for Good (RS4Good): Survey of Use Cases and a Call to Action for Research that Matters Recommender Systems for Good (RS4Good): Umfrage von Use Cases und ein Aufruf zur Aktion für Forschung, die zählt 良好建议系统(RS4Good):调查使用案例和呼吁采取行动开展重要研究 2411.16645v2
  • 914 06-28 Early Stopping Tabular In-Context Learning Frühzeitiges Stoppen des tabellarischen In-Context-Lernens 早期停止制表列表内容学习 2506.21387v2
  • 915 06-28 Cybersecurity-Focused Anomaly Detection in Connected Autonomous Vehicles Using Machine Learning Cybersecurity-fokussierte Anomalienerkennung in vernetzten autonomen Fahrzeugen mit maschinellem Lernen 使用机器学习的连连连自主车辆网络安全-以计算机安全为主的异常现象探测 2506.22984v1
  • 916 06-28 Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization Feintuning Next-Scale Visual Autoregressive Modelle mit gruppenrelativer Politikoptimierung 采用群体相对政策优化优化的 下尺度视觉自动递减模型 2505.23331v2
  • 917 06-28 Time-MQA: Time Series Multi-Task Question Answering with Context Enhancement Time-MQA: Zeitreihe Multi-Task-Fragebeantwortung mit Kontextverbesserung 时间-MQA:时间系列多任务问题,加强背景回答 2503.01875v2
  • 918 06-28 On the Generalizability of “Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals” Zur Verallgemeinerbarkeit von “Wettbewerb von Mechanismen: Aufspüren, wie Sprachmodelle mit Fakten und Gegenfakten umgehen” 关于“机制的竞争:追踪语言模式如何处理事实和反事实”的一般性 2506.22977v1
  • 919 06-28 Interpretable LLM-based Table Question Answering Interpretierbare LLM-basierte Tabellenfragebeantwortung 基于表问题的回答 2412.12386v3
  • 920 06-28 ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment AktAlign: Nullschnelle feinkörnige Videoklassifikation über sprachgeführte Sequenzausrichtung 通过语言引导序列调整的零热精细视频分类 2506.22967v1
  • 921 06-28 CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation CN-SBM: Kategorische Blockmodellierung für primäre und residuale Kopiernummernvariation CNSBM: 初级和残余复制品分类块模型 2506.22963v1
  • 922 06-28 Infinite Sampling: Efficient and Stable Grouped RL Training for Large Language Models Infinite Sampling: Effiziente und stabile RL-Schulung für große Sprachmodelle 无限抽样:高语言模式有效和稳定的组合RL培训 2506.22950v1
  • 923 06-28 A Study on Semi-Supervised Detection of DDoS Attacks under Class Imbalance Eine Studie über die halbüberwachte Erkennung von DDoS-Angriffen unter Klassenungleichgewicht 关于在 “ 不平衡 “ 类下对DDoS攻击的半有效探测研究 2506.22949v1
  • 924 06-28 Efficient Shallow Ritz Method For 1D Diffusion-Reaction Problems Effiziente Ritz-Methode für 1D-Diffusionsreaktionsprobleme 用于1D 扩散反应问题的高效浅流机法 2407.01496v3
  • 925 06-28 Orthogonal Gradient Descent Improves Neural Calibration Orthogonal gradient Descent verbessert die Neuralkalibrierung 骨质梯度梯度下层改善神经校准 2506.04487v2
  • 926 06-28 Transformer Encoder and Multi-features Time2Vec for Financial Prediction Transformer Encoder und Multi-Features Time2Vec für finanzielle Vorhersagen 用于金融预测的时间2Vec 2504.13801v2
  • 927 06-28 Utilizing a Novel Deep Learning Method for Scene Categorization in Remote Sensing Data Verwendung einer neuartigen Deep-Learning-Methode für die Kategorisierung von Szenen in Daten der Fernerkundung 利用新深层学习方法对遥感数据进行星 域分类 2506.22939v1
  • 928 06-28 Efficient Cybersecurity Assessment Using SVM and Fuzzy Evidential Reasoning for Resilient Infrastructure Effiziente Cybersecurity-Bewertung mittels SVM und Fuzzy Evidential Reasoning für resiliente Infrastruktur 利用SVM和模糊的证明理由进行高效的网络安全评估 2506.22938v1
  • 929 06-28 Differentiable Radar Ambiguity Functions: Mathematical Formulation and Computational Implementation Differentielle Radar-Ambiguitätsfunktionen: Mathematische Formulierung und Computational Implementation 可区别的雷达模糊功能:数学拟订和计算实施 2506.22935v1
  • 930 06-28 FedDTG:Federated Data-Free Knowledge Distillation via Three-Player Generative Adversarial Networks FedDTG:Federated Data-Free Knowledge Destillation über drei Spieler Generative Adversarial Networks FDTG:通过三层生成反向网络进行联邦无数据知识蒸馏 2201.03169v5
  • 931 06-28 Mathematical Computation on High-dimensional Data via Array Programming and Parallel Acceleration Mathematische Berechnung hochdimensionaler Daten über Array-Programmierung und parallele Beschleunigung 通过阵列编程和平行加速对高维数据进行数学计算 2506.22929v1
  • 932 06-28 HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation HalluSegBench: Counterfactual Visual Reasoning for Segmentation Halluzination Evaluation HalluSegeBench:截肢幻觉评价的反事实视觉理由 2506.21546v2
  • 933 06-28 SConU: Selective Conformal Uncertainty in Large Language Models SConU: Selektive konforme Unsicherheit in großen Sprachmodellen SCONU:大语言模式中选择性的形式不确定性 2504.14154v2
  • 934 06-28 Towards Time Series Generation Conditioned on Unstructured Natural Language Zur Zeitreihe Erzeugung auf unstrukturierte natürliche Sprache 以无结构的自然语言为条件的一代人 2506.22927v1
  • 935 06-28 Learning Truthful Mechanisms without Discretization Wahrhaftige Mechanismen ohne Diskretisierung lernen 学习真相机制,无分化 2506.22911v1
  • 936 06-28 Missing-Modality-Aware Graph Neural Network for Cancer Classification Missing-Modality-Aware Graph Neuronales Netzwerk für die Krebsklassifikation 失踪 – – 模式 – – Aware 癌症分类神经网络 2506.22901v1
  • 937 06-28 Neural Cellular Automata: From Cells to Pixels Neurale Zelluläre Automaten: Von Zellen zu Pixeln 神经细胞自定义数据: 从单元格到像素 2506.22899v1
  • 938 06-28 Interpretable Time Series Autoregression for Periodicity Quantification Verdolmetschbare Zeitreihen Autoregression für Periodizitätsquantifizierung 周期量化的自动递减 2506.22895v1
  • 939 06-28 CA-Diff: Collaborative Anatomy Diffusion for Brain Tissue Segmentation CA-Diff: Kollaborative Anatomie-Diffusion für die Segmentierung von Hirngewebe CA-Diff:脑组织分割协作解剖分解 2506.22882v1
  • 940 06-28 Loop2Net: Data-Driven Generation and Optimization of Airfoil CFD Meshes from Sparse Boundary Coordinates Loop2Net: Datengetriebene Erzeugung und Optimierung von CFD-Netzen aus Sparse Boundary Coordinates 循环2Net:数据驱动生成和优化来自偏浅边界坐标点的 CFD 空 fil CFD Meshes 2507.01057v1
  • 941 06-28 Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra Training Flexible Modelle genetischer Variant-Effekte aus funktionellen Anmerkungen mit beschleunigter Linear Algebra 使用加速线性线性代数对功能说明的遗传变异效应灵活模型的培训 2506.19598v2
  • 942 06-28 P$^2$U: Progressive Precision Update For Efficient Model Distribution P$^2$U: Progressive Präzisionsaktualisierung für effiziente Modellverteilung P$2美元U:为有效分配模型而逐步精准更新 2506.22871v1
  • 943 06-28 PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models PRMBench: Ein feinkörniger und anspruchsvoller Benchmark für Prozess-Level-Reward-Modelle PRMBBench:进程一级奖励模式的精细和质疑基准 2501.03124v5
  • 944 06-28 Machine learning in wastewater treatment: insights from modelling a pilot denitrification reactor Maschinelles Lernen in der Abwasserbehandlung: Erkenntnisse aus der Modellierung eines Pilot-Denitrifikationsreaktors 废水处理方面的机器学习:从模拟试验脱硝反应堆中获得的见解 2412.14030v2
  • 945 06-28 Deep neural networks can provably solve Bellman equations for Markov decision processes without the curse of dimensionality Tiefe neuronale Netzwerke können nachweislich Bellman-Gleichungen für Markov-Entscheidungsprozesse ohne den Fluch der Dimensionalität lösen 深神经网络可以解决Bellman方程式 用于Markov决策程序 而不受到维度的诅咒 2506.22851v1
  • 946 06-28 Scalable Structure Learning of Bayesian Networks by Learning Algorithm Ensembles Skalierbare Struktur Lernen Bayesischer Netzwerke durch das Lernen von Algorithmenensembles Bayesian网络通过学习算法群群学习的可缩放结构学习 2506.22848v1
  • 947 06-28 Quantum Neural Networks for Wind Energy Forecasting: A Comparative Study of Performance and Scalability with Classical Models Quantum Neural Networks for Wind Energy Forecasting: Eine vergleichende Studie über Leistung und Skalierbarkeit mit klassischen Modellen 风能预测的量子神经网络:用古典模型比较性能和可缩放性研究 2506.22845v1
  • 948 06-28 xLSTMAD: A Powerful xLSTM-based Method for Anomaly Detection xLSTMAD: Eine leistungsstarke xLSTM-basierte Methode zur Erkennung von Anomalien XLSTMAD: 一种以XLSTM为基础的强效异常探测方法 2506.22837v1
  • 949 06-28 Adaptive Rank Allocation for Federated Parameter-Efficient Fine-Tuning of Language Models Adaptive Rangverteilung für Federated Parameter-Efficient Fine-Tuning of Language Models 联邦准拉米有效精密语言模式调适级分配 2501.14406v3
  • 950 06-28 Privacy-Preserving Video Anomaly Detection: A Survey Datenschutz-Bewahrung Video Anomalie Erkennung: Eine Umfrage 隐私保护视频异常探测:调查 2411.14565v2
  • 951 06-28 Prompting without Panic: Attribute-aware, Zero-shot, Test-Time Calibration Prompting ohne Panik: Attribute-Bewusstsein, Null-Schuss, Test-Zeit-Kalibrierung 无恐慌的提示: 属性意识、 零弹、 测试时间校准 2506.22819v1
  • 952 06-28 BayesLoRA: Task-Specific Uncertainty in Low-Rank Adapters BayesLoRA: Aufgabenspezifische Unsicherheit in Low-Rank-Adaptern BayesLOLRA:低兰克适应器中任务具体不确定性 2506.22809v1
  • 953 06-28 Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate Konzept Pinpoint Radiergummi für Text-zu-Bild-Diffusionsmodelle über Residual Attention Gate 通过遗留关注门进行文字到图像传播模型的Pinpoint Eraser概念概念 2506.22806v1
  • 954 06-28 Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding Intervening in Black Box: Konzept Engpass-Modell für die Verbesserung der menschlichen neuralen Netzwerk gegenseitiges Verständnis 黑盒干预:增强人类神经网络相互了解的概念瓶颈模式 2506.22803v1
  • 955 06-28 Vision-QRWKV: Exploring Quantum-Enhanced RWKV Models for Image Classification Vision-QRWKV: Quantum-erweiterte RWKV-Modelle für die Bildklassifikation 展望-QRWKKV:探索以量为基础增强的RWKV图像分类模型 2506.06633v2
  • 956 06-28 Riemannian-Geometric Fingerprints of Generative Models Riemannisch-Geometrische Fingerabdrücke von Generativen Modellen 创造型模型的里伊曼尼-大地测量指纹 2506.22802v1
  • 957 06-28 VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding VoteSplat: Harte Abstimmung Gaussian Splatting für 3D-Szene Verständnis VotoStplat: 高斯拉特三维场景理解的 高斯拉特票 2506.22799v1
  • 958 06-28 Aircraft Trajectory Dataset Augmentation in Latent Space Flugbahndatensatz Augmentation im Latent Space 低空空间轨迹数据集增强 2506.07585v2
  • 959 06-28 Learning Time-Aware Causal Representation for Model Generalization in Evolving Domains Time-Aware-Lernen Kausaldarstellung für Modellverallgemeinerung in sich entwickelnden Domänen 正在演变的域域中模型普遍化模型的学习时间- 软件因果代表 2506.17718v2
  • 960 06-28 Multimodal Atmospheric Super-Resolution With Deep Generative Models Multimodale atmosphärische Super-Resolution mit tiefen Generativen Modellen 多式大气多式超分辨率,具有深创模型 2506.22780v1
  • 961 06-28 PromptDSI: Prompt-based Rehearsal-free Continual Learning for Document Retrieval PromptDSI: Prompt-basiert Probefreies Kontinuales Lernen für Dokument-Retrieval 快速检索:为检索文件而进行基于即时的无排练的持续学习 2406.12593v4
  • 962 06-28 FF-INT8: Efficient Forward-Forward DNN Training on Edge Devices with INT8 Precision FF-INT8: Effizientes Vorwärts-DNN-Training auf Edge-Geräten mit INT8-Präzision FF-INT8:利用INT8精密度进行关于边缘装置的有效前向DNN培训 2506.22771v1
  • 963 06-28 Can We Reliably Predict the Fed’s Next Move? A Multi-Modal Approach to U.S. Monetary Policy Forecasting Können wir den nächsten Schritt der Fed zuverlässig vorhersagen? Ein multimodaler Ansatz zur US-amerikanischen Geldpolitikprognose 我们能否可靠地预测美联储的下一步行动?美国货币政策预测的多模式方法。 2506.22763v1
  • 964 06-28 Detecting Sockpuppetry on Wikipedia Using Meta-Learning Sockepuppetry auf Wikipedia erkennen Mit Meta-Learning 在维基百科上用元学习探测袜子布料 2506.10314v2
  • 965 06-28 Graph Contrastive Learning with Low-Rank Regularization and Low-Rank Attention for Noisy Node Classification Grafik Kontrastives Lernen mit Low-Rank-Regularisierung und Low-Rank-Achtung für Noisy Node-Klassifikation 用于噪音节点分类的与低兰克常规化和低兰克注意的对比学习图 2402.09600v2
  • 966 06-28 Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization Sculpting Memory: Multi-Concept Vergessen von Diffusionsmodellen über Dynamic Mask und Concept-Aware Optimierung 雕塑记忆:通过动态面具和概念软件优化在传播模型中遗忘多种概念 2504.09039v2
  • 967 06-28 Decoding Federated Learning: The FedNAM+ Conformal Revolution Decoding Federated Learning: Die FedNAM+ Konforme Revolution 解说联邦学习:美联联储+非正规革命 2506.17872v2
  • 968 06-28 Evaluating Pavement Deterioration Rates Due to Flooding Events Using Explainable AI Bewertung von Verschlechterungsraten aufgrund von Überschwemmungen unter Verwendung erklärbarer KI 利用可解释的AI 评估洪水事件造成的平 道 况 况 率 2507.01056v1
  • 969 06-28 Efficiently Serving Large Multimodal Models Using EPD Disaggregation Effizientes Servieren großer multimodaler Modelle mit EPD-Disaggregation 利用EPD拆分有效服务大型多模式模式 2501.05460v4
  • 970 06-28 Robust Tensor Completion via Gradient Tensor Nulclear L1-L2 Norm for Traffic Data Recovery Robuste Tensor-Vervollständigung über Gradient Tensor Nulclear L1-L2 Norm für Verkehrsdatenrettung 通过梯度 Tensor Nulclear L1-L2 用于交通数据恢复的规范 2506.22732v1
  • 971 06-28 XTransfer: Cross-Modality Model Transfer for Human Sensing with Few Data at the Edge XTransfer: Cross-Modality-Modelltransfer für die menschliche Wahrnehmung mit wenigen Daten am Rand Xtransfer:人类遥感跨模式模式转让模型,边缘几乎没有数据 2506.22726v1
  • 972 06-28 BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute BEST-Route: Adaptives LLM Routing mit Test-Time Optimal Compute 最佳选择:用测试时最佳计算法运行的适应性LMLM 2506.22716v1
  • 973 06-28 Libra: Synergizing CUDA and Tensor Cores for High-Performance Sparse Matrix Multiplication Waage: CUDA- und Tensorkerne für hochleistungsfähige Sparse-Matrix-Multiplikation synergisieren 激光仪:将CUDA和Tensor核心同步用于高性能散射矩阵乘法 2506.22714v1
  • 974 06-28 Generalized Linear Mode Connectivity for Transformers Generalisierte Linearmodus-Konnektivität für Transformatoren 变形器通用线性模式连接 2506.22712v1
  • 975 06-28 Supporting renewable energy planning and operation with data-driven high-resolution ensemble weather forecast Unterstützung der Planung und des Betriebs erneuerbarer Energien mit datengetriebener Hochauflösungs-Ensemble-Wettervorhersage 支持可再生能源规划和运作,以数据驱动的高分辨率高分辨率气象组合组合天气预报支持可再生能源规划和运作 2505.04396v3
  • 976 06-28 Predicting and Explaining Customer Data Sharing in the Open Banking Vorhersage und Erklärung der Weitergabe von Kundendaten im Open Banking 在开放银行业务中预测和解释客户数据分享 2507.01987v1
  • 977 06-28 FairMarket-RL: LLM-Guided Fairness Shaping for Multi-Agent Reinforcement Learning in Peer-to-Peer Markets FairMarket-RL: LLM-geführte Fairness-Forming für das Mehr-Agenten-Verstärkungs-Lernen in Peer-to-Peer-Märkten Fairket-RL:在同行与同行市场中为多机构强化学习提供LLM-指导公平模式 2506.22708v1
  • 978 06-28 Lower bounds for trace estimation via Block Krylov and other methods Untere Grenzen für die Spurenabschätzung über Block Krylov und andere Methoden 通过Clock Krylov 和其他方法进行跟踪估计的下限范围 2506.22701v1
  • 979 06-28 Suboptimality analysis of receding horizon quadratic control with unknown linear systems and its applications in learning-based control Suboptimalitätsanalyse der receding horizont quadratic control mit unbekannten linearen Systemen und deren Anwendungen in der Learning-based control 以未知线性系统及其在以学习为基础的控制中的应用对正在后退的地平地平线二次控制及其应用的不优化程度分析 2301.07876v4
  • 980 06-28 Residual Matrix Transformers: Scaling the Size of the Residual Stream Residual Matrix Transformers: Skalierung der Größe des Residual Stream 残余矩阵变异器:扩大残余流的规模 2506.22696v1
  • 981 06-28 Mixing Time of the Proximal Sampler in Relative Fisher Information via Strong Data Processing Inequality Mischzeit des Proximal-Probenehmers in relativen Fisher-Informationen über starke Datenverarbeitungsungleichheit 利用强有力的数据处理不平等,混合比较渔业信息中通过强有力的数据处理不平等的 Proximal样本采集器的时间 2502.05623v2
  • 982 06-28 On the Convergence of Min-Max Langevin Dynamics and Algorithm Über die Konvergenz von Min-Max Langevin Dynamik und Algorithmen 关于Min-Max Langevin动力公司和Algorithm公司的趋同 2412.20471v3
  • 983 06-28 Scaling Data-Constrained Language Models Skalierung von datengebundenen Sprachmodellen 受数据约束的语言模式 2305.16264v5
  • 984 06-27 (5) Mitigating Semantic Collapse in Generative Personalization with a Surprisingly Simple Test-Time Embedding Adjustment Semantischer Zusammenbruch bei der Generativen Personalisierung mit einer überraschend einfachen Test-Time-Einbettungsanpassung abmildern 采用令人惊讶的简单测试时间嵌入调整法, 减轻在产生个性化过程中的语义消沉 2506.22685v1
  • 985 06-27 PriorDiffusion: Leverage Language Prior in Diffusion Models for Monocular Depth Estimation PriorDiffusion: Leverage Language Prior in Diffusionsmodellen für monookulare Tiefenschätzung 先前传播:在单人深度估算扩散模型中先使用语言 2411.16750v3
  • 986 06-27 CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following CMI-Bench: Ein umfassender Benchmark für die Bewertung der Musikanleitung CMI-Bench:评估音乐教学的全面基准 2506.12285v2
  • 987 06-27 SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving SLED: Ein spekulatives LLM-Decoding-Framework für effizientes Edge Serving SLED: 有效边缘服务投机性LLM代谢框架 2506.09397v3
  • 988 06-27 DistShap: Scalable GNN Explanations with Distributed Shapley Values DistShap: Skalierbare GNN-Erklärungen mit verteilten Shapley-Werten 分布式shap:可缩放的 GNN 解释和分布式形状值 2506.22668v1
  • 989 06-27 VERA: Variational Inference Framework for Jailbreaking Large Language Models VERA: Variationaler Bezugsrahmen für Jailbreaking große Sprachmodelle VERA:破碎大型语言模型变化推断框架 2506.22666v1
  • 990 06-27 Learning Stochastic Multiscale Models Stochastische Multiskalenmodelle lernen 学习存储多尺度模型 2506.22655v1
  • 991 06-27 Probing Quantum Spin Systems with Kolmogorov-Arnold Neural Network Quantum States Probing Quantum Spin Systems mit Kolmogorov-Arnold Neural Network Quantum States 与Kolmogorov-Arold神经网络 2506.01891v4
  • 992 06-27 XxaCT-NN: Structure Agnostic Multimodal Learning for Materials Science XxACT-NN: Strukturagnostisches multimodales Lernen für die Materialwissenschaft XxXCT-NN:材料科学机构 – – 材料科学多模式学习 2507.01054v1
  • 993 06-27 Vulnerable Road User Detection and Safety Enhancement: A Comprehensive Survey Verwundbare Erkennung und Verbesserung der Sicherheit von Verkehrsteilnehmern: Eine umfassende Umfrage 弱势道路使用者检测和加强安全:全面调查 2405.19202v5
  • 994 06-27 Cost-effective Reduced-Order Modeling via Bayesian Active Learning Kostengünstige Reduzierte Auftragsmodellierung über Bayesian Active Learning 通过巴耶斯积极学习进行降低分级模拟 2506.22645v1
  • 995 06-27 Diversity by Design: Addressing Mode Collapse Improves scRNA-seq Perturbation Modeling on Well-Calibrated Metrics Diversity by Design: Addressing Mode Collapse Verbessert scRNA-seq Perturbation Modeling auf gut kalibrierten Metrics 通过设计实现多样性:解决模式折叠问题,改进精确度度度模型的ScRNA-等分扰动模型 2506.22641v1
  • 996 06-27 Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis Augmenting Online RL mit Offline-Daten ist alles, was Sie brauchen: Eine einheitliche Hybrid-RL-Algorithmus Design und Analyse 用离线数据扩大在线 RL 的离线数据是所有你需要的: 统一混合 RL 算法设计和分析 2505.13768v3
  • 997 06-27 Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training Layer Bedeutung für mathematische Vernunft wird im Vor-Training und Invariant nach dem Nach-Training geschmiedet 培训前和培训后变化无常中提出了数学理由的重要性。 2506.22638v1
  • 998 06-27 EXPRTS: Exploring and Probing the Robustness of Time Series Forecasting Models EXPRTS: Erforschung und Erprobung der Robustheit von Zeitreihenprognosemodellen EXPRTS:探索和检验时间序列预测模型的威力 2403.03508v3
  • 999 06-27 A hierarchical Vovk-Azoury-Warmuth forecaster with discounting for online regression in RKHS Ein hierarchischer Vovk-Azoury-Warmuth-Prognoser mit Rabatt für Online-Regression in RKHS Vovk-Azoury-Warmuth等级预报器,对RKHS的在线回归进行折扣 2506.22631v1
  • 1000 06-27 Hierarchical Modeling and Architecture Optimization: Review and Unified Framework Hierarchische Modellierung und Architekturoptimierung: Überprüfung und Unified Framework 等级建模和架构优化:审查和统一框架 2506.22621v1
  • 1001 06-27 Deep Hedging to Manage Tail Risk Deep Hedging zur Verwaltung von Tail Risk 管理尾端风险 2506.22611v1
  • 1002 06-27 Learning Individual Reproductive Behavior from Aggregate Fertility Rates via Neural Posterior Estimation Individuelles reproduktives Verhalten von Aggregat Fertilitätsraten über neurale hintere Schätzung lernen 学习个人生殖行为 学习个人生殖行为 2506.22607v1
  • 1003 06-27 A User-Centric, Privacy-Preserving, and Verifiable Ecosystem for Personal Data Management and Utilization Ein benutzerzentriertes, datenschutzschonendes und überprüfbares Ökosystem für das Management und die Nutzung personenbezogener Daten 用于个人数据管理和利用的用户中心、隐私保护和可核实生态系统 2506.22606v1
  • 1004 06-27 Are Fast Methods Stable in Adversarially Robust Transfer Learning? Sind schnelle Methoden stabil im adversarially robust Transfer Learning? 快速方法是否稳定在逆向快速传输学习中? 2506.22602v1
  • 1005 06-27 AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking in Large Language Models AdaReasoner: Adaptive Reasoning ermöglicht flexibleres Denken in großen Sprachmodellen AdaReasoner: 适应性合理性使大语言模式的思维更加灵活 2505.17312v3
  • 1006 06-27 What Makes the Preferred Thinking Direction for LLMs in Multiple-choice Questions? Was macht die bevorzugte Denkrichtung für LLMs in Multiple-Choice-Fragen? ” 多种选择问题 “ 中LLMs的首选思维方向是什么? 2502.18435v3
  • 1007 06-27 The Hidden Link Between RLHF and Contrastive Learning Die versteckte Verbindung zwischen RLHF und Kontrastivem Lernen RLHF 和抵触学习之间的隐藏链接 2506.22578v1
  • 1008 06-27 Enhancing Diffusion Posterior Sampling for Inverse Problems by Integrating Crafted Measurements Verstärkte Diffusions-Parfümung für inverse Probleme durch Integration von handwerklichen Messungen 通过综合人工计量方法,加强对反向问题的传播池别样样 2411.09850v2
  • 1009 06-27 Dealing with the Evil Twins: Improving Random Augmentation by Addressing Catastrophic Forgetting of Diverse Augmentations Umgang mit den bösen Zwillingen: Zufällige Augmentation verbessern, indem Katastrophenvergessen von verschiedenen Augmentationen adressiert wird 对付邪恶双胞胎:通过解决灾难性地忘记多种增加现象,改善随机增加现象 2506.08240v2
  • 1010 06-27 Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction Generalisierte Venn- und Venn-Abers-Kalibrierung mit Anwendungen in konformer Vorhersage 通用文文和文安-用非正式预测对应用进行校准 2502.05676v2
  • 1011 06-27 Exploration Behavior of Untrained Policies Explorationsverhalten ungeübter Politiken 未经过培训的政策的探索行为 2506.22566v1
  • 1012 06-27 Adjoint Schrödinger Bridge Sampler Adjoint Schrödinger Bridge Sampler Adjoint Schrödinger桥采样器 2506.22565v1
  • 1013 06-27 Faster Convergence of Riemannian Stochastic Gradient Descent with Increasing Batch Size Schnellere Konvergenz der Riemannschen Stochastischen Gradienten mit zunehmender Batchgröße Riemannian Stottachatic 渐变源层与增加批量大小的更快趋同 2501.18164v2
  • 1014 06-27 MetaCipher: A General and Extensible Reinforcement Learning Framework for Obfuscation-Based Jailbreak Attacks on Black-Box LLMs MetaCipher: Ein allgemeines und erweitertes Lernrahmenwerk für Verwüstungsbasierte Jailbreak-Angriffe auf Black-Box-LLMs MetaCipher:关于对黑毒LMS进行以腐烂为基础的破碎监狱袭击的一般和可扩展强化学习框架 2506.22557v1
  • 1015 06-27 Spectral Bias in Variational Quantum Machine Learning Spektrale Bias im variantenreichen Quantum Maschinelles Lernen 变化量量机器学习中的光谱比值 2506.22555v1
  • 1016 06-27 Neural models of multiscale systems: conceptual limitations, stochastic parametrizations, and a climate application Neurale Modelle von multiskaligen Systemen: konzeptionelle Einschränkungen, stochastische Parametrisierungen und eine Klimaanwendung 多尺度系统的神经模型:概念局限性、随机超称和气候应用 2506.22552v1
  • 1017 06-27 Beyond ReLU: How Activations Affect Neural Kernels and Random Wide Networks Beyond ReLU: Wie Aktivierungen Neuronale Kernel und Random Wide Networks beeinflussen ReLU之后:激活如何影响神经内核和随机广域网 2506.22429v1
  • 1018 06-27 CLoVE: Personalized Federated Learning through Clustering of Loss Vector Embeddings CLoVE: Personalisiertes Federated Learning durch Clustering von Verlust-Vektor-Embeddings CLOVE:通过将损失矢量嵌入体集中起来进行个性化联邦学习 2506.22427v1
  • 1019 06-27 ARMOR: Robust Reinforcement Learning-based Control for UAVs under Physical Attacks ARMOR: Robuste lernbasierte Steuerung zur Verstärkung von UAVs unter physischen Angriffen 对遭受人身攻击的无人驾驶航空器进行强有力的加强学习控制 2506.22423v1
  • 1020 06-27 L2MAC: Large Language Model Automatic Computer for Extensive Code Generation L2MAC: Automatischer Computer mit großem Sprachmodell für umfangreiche Code-Generierung L2MAC:用于广泛代码生成的大型语言模拟自动计算机 2310.02003v6
  • 1021 06-27 Maximizing Confidence Alone Improves Reasoning Maximierung des Vertrauens allein verbessert die Vernunft 使信心最大化单独提高合理性 2505.22660v4
  • 1022 06-27 Decoupled SGDA for Games with Intermittent Strategy Communication Entkoppelte SGDA für Spiele mit intermittierender Strategiekommunikation SGDA与间歇性战略交流 2501.14652v2
  • 1023 06-27 Exploration from a Primal-Dual Lens: Value-Incentivized Actor-Critic Methods for Sample-Efficient Online RL Exploration aus einer primär-Dual-Lense: wertanreizte aktor-kritische Methoden für probeneffiziente Online-RL 探索原始-二极镜头:用于抽样有效在线转线的有价值激励的活性-Crict 方法 2506.22401v1
  • 1024 06-27 Strategic A/B testing via Maximum Probability-driven Two-armed Bandit Strategische A/B-Prüfung über maximale Wahrscheinlichkeit zweiarmiger Bandit 战略A/B测试,通过最大概率驱动的双武装两武装土匪进行战略A/B测试 2506.22536v1
  • 1025 06-27 Multi-View Contrastive Learning for Robust Domain Adaptation in Medical Time Series Analysis Multi-View Kontrastives Lernen für robuste Domain-Anpassung in der medizinischen Zeitreihenanalyse 医学时间序列分析中强力域适应多查看差异性学习 2506.22393v1
  • 1026 06-27 How to Train Long-Context Language Models (Effectively) Wie man Langkontext-Sprachenmodelle ausbildet (effektiv) 如何培训长文本语言模型(有效) 2410.02660v3
  • 1027 06-27 Towards Distributed Neural Architectures Auf dem Weg zu verteilten Neuralarchitekturen 走向分布式神经结构 2506.22389v1
  • 1028 06-27 Probabilistic Optimality for Inference-time Scaling Probabilistische Optimalität für Inferenz-Zeitskalierung 推推时间缩放的概率概率优化度 2506.22376v1
  • 1029 06-27 Sheaf-Based Decentralized Multimodal Learning for Next-Generation Wireless Communication Systems Schafsbasiertes dezentralisiertes multimodales Lernen für drahtlose Kommunikationssysteme der nächsten Generation 用于下一代无线通信系统的基于 Sheaf的分散式多模式学习 2506.22374v1
  • 1030 06-27 Robust Detection of Watermarks for Large Language Models Under Human Edits Robuste Erkennung von Wasserzeichen für große Sprachmodelle unter menschlichen Bearbeitungen 人类版下大型语言模型水印的强力探测 2411.13868v2
  • 1031 06-27 Reinforcement Learning with Physics-Informed Symbolic Program Priors for Zero-Shot Wireless Indoor Navigation Verstärktes Lernen mit physikinformierten Symbolischen Programmvorlagen für die kabellose Navigation im Innenbereich 利用物理-内射无线室内航行前置无人制热无线导航的物理-内建符号方案加强学习 2506.22365v1
  • 1032 06-27 DiffSoundStream: Efficient Speech Tokenization via Diffusion Decoding DiffSoundStream: Effiziente Sprach-Tokenisierung durch Diffusions-Dekodierung DiffSoundStream:通过传播解码的有效演讲 2506.22362v1
  • 1033 06-27 From Ground to Air: Noise Robustness in Vision Transformers and CNNs for Event-Based Vehicle Classification with Potential UAV Applications Vom Boden zur Luft: Schallfestigkeit in Vision Transformern und CNNs für die ereignisbasierte Fahrzeugklassifizierung mit potentiellen UAV-Anwendungen 从地面到空气:视觉变形器和有线电视新闻网中的噪音强力,用于可能应用无人驾驶飞行器的事件车辆分类 2506.22360v1
  • 1034 06-27 Learning Non-Local Molecular Interactions via Equivariant Local Representations and Charge Equilibration Lernen nicht-lokaler molekularer Interaktionen über gleichberechtigte lokale Repräsentationen und Ladungsgleichstellung 通过平等地方代表性和收费平衡,学习非本地分子相互作用 2501.19179v2
  • 1035 06-27 Learning Networks from Wide-Sense Stationary Stochastic Processes Netzwerke aus breit angelegten stationären stochastischen Prozessen lernen 广意识固定式存储处理的学习网络 2412.03768v2
  • 1036 06-27 Scalable Hypergraph Structure Learning with Diverse Smoothness Priors Skalierbare Hypergraph Struktur Lernen mit unterschiedlicher Glätte Priors 具有多样性平滑性学习的可缩放超时图结构 2504.03583v2
  • 1037 06-27 Long-Sequence Memory with Temporal Kernels and Dense Hopfield Functionals Langzeitgedächtnis mit Temporalen Kerneln und Dense Hopfield Functionals 具有时心和高豪华地功能的长期序列内存 2507.01052v1
  • 1038 06-27 Optimal Estimation of Watermark Proportions in Hybrid AI-Human Texts Optimale Schätzung von Wasserzeichenanteilen in Hybrid-KI-Humantexten 对混合的AI-人类文案文中水标记比例的最佳估计 2506.22343v1
  • 1039 06-27 A Framework for Multi-source Privacy Preserving Epidemic Analysis Ein Rahmen für den Datenschutz mit mehreren Quellen, der epidemische Analysen sicherstellt 多来源隐私保护流行病分析框架 2506.22342v1
  • 1040 06-27 QuKAN: A Quantum Circuit Born Machine approach to Quantum Kolmogorov Arnold Networks QuKAN: Ein Quantum Circuit Born Machine Ansatz zu Quantum Kolmogorov Arnold Networks QuKAN: 量子巡回巡回赛对Quantum Kolmogorov Arnold 网络的 量子诞生机器方法 2506.22340v1
  • 1041 06-27 Multi-Turn Code Generation Through Single-Step Rewards Multi-Turn-Code-Generierung durch Single-Step-Rewards 通过单级奖励生成多发代码 2502.20380v2
  • 1042 06-27 Robust quantum reservoir computers for forecasting chaotic dynamics: generalized synchronization and stability Robuste Quantenreservoir-Computer für die Vorhersage chaotischer Dynamiken: generalisierte Synchronisation und Stabilität 用于预测混乱动态的强大量子储量计算机:普遍同步和稳定 2506.22335v1
  • 1043 06-27 Less Greedy Equivalence Search Weniger Gier-Äquivalenz-Suche 较少贪婪的等等值搜索 2506.22331v1
  • 1044 06-27 Conceptual Topic Aggregation Begriffliche Aggregation 专题汇总概念 2506.22309v1
  • 1045 06-27 Unfolding Generative Flows with Koopman Operators: Fast and Interpretable Sampling Entfalten Generative Strömungen mit Koopman-Betreibern: Schnelle und interpretierbare Probenahme 与 Koopman 运算符的未负生成流程: 快速和可解释抽样 2506.22304v1
  • 1046 06-27 Weakly-Supervised Domain Adaptation with Proportion-Constrained Pseudo-Labeling Schwach-überwachte Domain-Anpassung mit proportional-beschränktem Pseudo-Labeling 与受比例比例限制的 “ 优多 – – 带 “ 环 “ 的 2506.22301v1
  • 1047 06-27 CoATA: Effective Co-Augmentation of Topology and Attribute for Graph Neural Networks CoATA: Effektive Co-Augmentation von Topologie und Attribut für Graph Neuronale Netzwerke CoATA: 数字神经网络的地形和属性的有效联合增强 2506.22299v1
  • 1048 06-27 Score-Based Model for Low-Rank Tensor Recovery Score-basiertes Modell für Low-Rank Tensor Erholung 低Rank Tensor回收计分模型 2506.22295v1
  • 1049 06-27 High Resolution Isotropic 3D Cine imaging with Automated Segmentation using Concatenated 2D Real-time Imaging and Deep Learning Hochauflösende isotrope 3D-Cine-Bildgebung mit automatischer Segmentierung mit konkatenierter 2D-Echtzeit-Bildgebung und Deep Learning 高分辨率 Isotoric 3D Cine 成像, 使用 Conculated 2D 实时成像和深层学习进行自动分割 2506.22532v1
  • 1050 06-27 Gradual Domain Adaptation for Graph Learning Graduale Domain-Anpassung für Graphen-Lernen 图表学习的渐进域域适应 2501.17443v2
  • 1051 06-27 Breaking Rank Bottlenecks in Knowledge Graph Completion Breaking Rang Engpässe in der Wissensgraphenvervollständigung 完成知识图中的瓶颈 2506.22271v1
  • 1052 06-27 How do Probabilistic Graphical Models and Graph Neural Networks Look at Network Data? Wie betrachten probabilistische Graphische Modelle und Graphische Neuronale Netzwerke Netzwerkdaten? 图像模型和图形神经网络如何看待网络数据? 2506.11869v2
  • 1053 06-27 KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding KITAB-Bench: Ein umfassender Multi-Domain-Benchmark für arabisches OCR und Dokumentenverständnis KITAB-Bench:阿拉伯文OCR和文件理解的综合多领域综合基准 2502.14949v2
  • 1054 06-27 Projected Compression: Trainable Projection for Efficient Transformer Compression Projektierte Kompression: Trainierbare Projektion für effiziente Transformer-Kompression 预计压缩:高效变压器压缩培训预测 2506.22255v1
  • 1055 06-27 Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence Risiko-Averse Best Arm Set Identifikation mit festem Budget und festem Vertrauen 规避风险的最佳机制确定固定预算和固定预算和固定信任 2506.22253v1
  • 1056 06-27 Spring-block theory of feature learning in deep neural networks Spring-Block-Theorie des Feature-Lernens in tiefen neuronalen Netzwerken 深神经网络中特征学习的春季阶段理论 2407.19353v4
  • 1057 06-27 Fairness-Optimized Synthetic EHR Generation for Arbitrary Downstream Predictive Tasks Fairness-Optimierte Synthetische EHR-Generation für willkürliche Downstream-Vorhersageaufgaben 用于任意下游下游预测任务的公平优化合成合成合成人电子HR生成 2406.02510v3
  • 1058 06-27 Performance of Rank-One Tensor Approximation on Incomplete Data Leistung der Rang eins Tensor-Annäherung auf unvollständigen Daten 在不完全数据上接近 “ 一等-一等 “ 的性能 2504.07818v3
  • 1059 06-27 Boosting Classification with Quantum-Inspired Augmentations Steigerung der Klassifizierung durch Quantum-inspirierte Augmentationen 以量受量影响增量促进分类 2506.22241v1
  • 1060 06-27 A Plea for History and Philosophy of Statistics and Machine Learning Ein Plädoyer für Geschichte und Philosophie der Statistik und des maschinellen Lernens 统计和机器学习历史和哲学 2506.22236v1
  • 1061 06-27 No More Sliding Window: Efficient 3D Medical Image Segmentation with Differentiable Top-k Patch Sampling Kein Schiebefenster mehr: Effiziente 3D-Messbildsegmentierung mit differenzierbarer Top-K-Patch-Sampling 没有更多滑动窗口: 高效的 3D 医疗图像分割, 配有可区别的顶k 补丁抽样 2501.10814v3
  • 1062 06-27 Uncovering smooth structures in single-cell data with PCS-guided neighbor embeddings Entdecken glatter Strukturen in Einzelzellendaten mit PCS-geführten Nachbareinbettungen 以 PCS 制导的邻居嵌入器在单细胞数据中发现平滑结构 2506.22228v1
  • 1063 06-27 Communication-Efficient Heterogeneous Federated Learning with Generalized Heavy-Ball Momentum Kommunikation-Effizient Heterogenes Federated Learning mit Generalized Heavy-Ball Momentum 具有普遍化重气动势的通信效率异质联邦学习 2311.18578v3
  • 1064 06-27 No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets No Metric to Rule them All: Toward Principled Evaluations of Graph-Learning Datasets 《规则通则》的衡量标准:对图表学习数据集进行有原则的评估 2502.02379v2
  • 1065 06-27 Soft Condorcet Optimization for Ranking of General Agents Soft Condorcet Optimierung für das Ranking von General Agents 对一般代理人员排名的优化 2411.00119v4
  • 1066 06-27 Hybrid Generative Modeling for Incomplete Physics: Deep Grey-Box Meets Optimal Transport Hybride Generative Modellierung für unvollständige Physik: Deep Grey-Box trifft auf optimalen Transport 不完全物理的混合生成模型:深灰质-Box 满足最佳迁移 2506.22204v1
  • 1067 06-27 Task-Agnostic Contrastive Pretraining for Relational Deep Learning Task-Agnostic Kontrastive Vorschulung für relationales Deep Learning 关系深层学习任务不可允许的对抗性预备培训 2506.22530v1
  • 1068 06-27 REDELEX: A Framework for Relational Deep Learning Exploration REDELEX: Ein Rahmen für relationale Deep Learning Exploration REDELEX: 关系深深学习探索框架 2506.22199v1
  • 1069 06-27 AB-UPT: Scaling Neural CFD Surrogates for High-Fidelity Automotive Aerodynamics Simulations via Anchored-Branched Universal Physics Transformers AB-UPT: Skalierende Neuronale CFD-Überlagerungen für hochfeste automobile Aerodynamik-Simulationen über verankerte Universal-Physics-Transformer AB-UPT:通过 “ 支离破碎的宇宙物理变异器 “ 模拟高纤维汽车空气动力学模拟器的缩放神经 CFD 加速器 2502.09692v3
  • 1070 06-27 dreaMLearning: Data Compression Assisted Machine Learning dreaMLearning: Datenkompression unterstütztes maschinelles Lernen dreamlearning:数据压缩辅助机器学习 2506.22190v1
  • 1071 06-27 Exploring Modularity of Agentic Systems for Drug Discovery Erforschung der Modularität von Wirkstoffsystemen für die Drogenentdeckung 探索药物发现剂系统模式 2506.22189v1
  • 1072 06-27 LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models LLM als GNN: Graph Vocabulary Learning für text-Attributed Graph Foundation Models 作为GNN的LLMLM:文字图表基础模型图表词汇学习 2503.03313v2
  • 1073 06-27 Thompson Sampling-Based Learning and Control for Unknown Dynamic Systems Thompson Sampling-based Learning and Control für unbekannte dynamische Systeme Thompson 未知动态系统抽样学习和控制 2506.22186v1
  • 1074 06-27 ROME: Robust Multi-Modal Density Estimator ROME: Robuster Multi-Modal Density Estimator ROME: 强力多式密度模拟器 2401.10566v3
  • 1075 06-27 Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward Graph-Reward-SQL: Executionsfreies Verstärkungslernen für Text-zu-SQL über Graph Matching und Stepwise Reward 图表-评分-SQL:通过图表匹配和分步评分,为文本到SQL进行无执行加强学习 2505.12380v2
  • 1076 06-27 ASVSim (AirSim for Surface Vehicles): A High-Fidelity Simulation Framework for Autonomous Surface Vehicle Research ASVSim (AirSim for Surface Vehicles): Ein Simulationsrahmen mit hoher Genauigkeit für autonome Oberflächenfahrzeugforschung ASVSim(地表车辆的航空Sim):自动地表车辆研究的高纤维模拟框架 2506.22174v1
  • 1077 06-27 Near Field Localization via AI-Aided Subspace Methods Nahfeldlokalisierung über KI-gestützte Subraum-Methoden 通过AI-援助的次级空间方法使近场定位化 2504.00599v2
  • 1078 06-27 Design Patterns for Securing LLM Agents against Prompt Injections Designmuster zur Sicherung von LLM-Agenten gegen sofortige Injektionen 防止及时注射的LLM代理物的安全设计模式 2506.08837v3
  • 1079 06-27 Earthquake Damage Grades Prediction using An Ensemble Approach Integrating Advanced Machine and Deep Learning Models Erdbebenschäden Grade Vorhersage mit einem Ensemble Ansatz Integration von fortgeschrittenen Maschinen und Deep Learning Modelle 采用组合办法预测地震损害等级,结合先进机器和深学习模式 2506.22129v1
  • 1080 06-27 Generative AI for O-RAN Slicing: A Semi-Supervised Approach with VAE and Contrastive Learning Generative KI für O-RAN Slicing: Ein semi-überwachter Ansatz mit VAE und Kontrastivem Lernen O-RAN切片:与VAE和反竞争学习的半监督方法 2401.08861v3
  • 1081 06-27 Identifying a Circuit for Verb Conjugation in GPT-2 Identifizierung eines Kreises für Verbkonjugation in GPT-2 在 GPT-2 中确定 Verb 混和的电路 2506.22105v1
  • 1082 06-27 Tied Prototype Model for Few-Shot Medical Image Segmentation Gebundenes Prototypenmodell für wenige heiße medizinische Bildsegmentierung 几乎没有热的医学图像分割的捆绑式原型模型 2506.22101v1
  • 1083 06-27 Transfer Learning for Assessing Heavy Metal Pollution in Seaports Sediments Transfer-Lernen für die Bewertung der Schwermetallverschmutzung in Seehäfen Sedimente 评估海港沉积物中重金属污染的转让学习 2506.22096v1
  • 1084 06-27 Learning to Solve Multi-Objective Routing Problems on Multigraphs Mehrzielige Routing-Probleme auf Multigraphen lösen lernen 学习解答多目标多路由问题 2506.22095v1
  • 1085 06-27 SONG: Self-Organizing Neural Graphs SONG: Selbstorganisierende Neuralgraphen SONG: 自组织神经图 2107.13214v2
  • 1086 06-27 Transformers are Graph Neural Networks Transformatoren sind Graphen-Neural-Netzwerke 变形器是图形神经网络 2506.22084v1
  • 1087 06-27 crypto price prediction using lstm+xgboost Krypto-Preisvorhersage mit lstm+xgboost 使用Istm+xgboust的加密价格预测 2506.22055v1
  • 1088 06-27 Forecasting the future development in quality and value of professional football players Prognose der zukünftigen Entwicklung in Qualität und Wert von Profi-Fußballspielern 预测专业足球运动员的质量和价值的未来发展 2502.07528v2
  • 1089 06-27 UniCA: Adapting Time Series Foundation Model to General Covariate-Aware Forecasting UniCA: Anpassung des Zeitreihen-Basismodells an die allgemeine Kovariate-Vorhersage UICCA:调整时间序列基础模型以适应通用共变软件预测 2506.22039v1
  • 1090 06-27 Hyper-modal Imputation Diffusion Embedding with Dual-Distillation for Federated Multimodal Knowledge Graph Completion Hypermodale Imputation Diffusion Einbettung mit Dual-Destillation für Federated Multimodal Knowledge Graph Completion 为完成联邦多模式知识图而与双蒸馏相结合的超时热量光化扩散 2506.22036v1
  • 1091 06-27 CAPM: Fast and Robust Verification on Maxpool-based CNN via Dual Network CAPM: Schnelle und robuste Überprüfung auf Maxpool-basiertem CNN über Dual Network CAPM:通过双重网络对Maxpool的有线电视新闻网进行快速和有力的核查 2407.09550v3
  • 1092 06-27 Learning Data-Driven Uncertainty Set Partitions for Robust and Adaptive Energy Forecasting with Missing Data Learning Data-Driven Uncertainty Set Partitionen für robuste und adaptive Energieprognosen mit fehlenden Daten 缺少数据的强力和适应性能源预测的学习数据不确定性定出分区 2503.20410v2
  • 1093 06-27 C-Learner: Constrained Learning for Causal Inference C-Learner: Eingeschränktes Lernen für kausale Schlussfolgerung C-Learner: 控制学习以诱因推断 2405.09493v5
  • 1094 06-27 Distilling the Unknown to Unveil Certainty Destillieren des Unbekannten zur Enthüllung der Gewissheit 蒸馏未知至 Unveil 确定性 2311.07975v3
  • 1095 06-27 TROFI: Trajectory-Ranked Offline Inverse Reinforcement Learning TROFI: Flugbahngestütztes Offline-Inversum-Verstärkung-Lernen TROFI: 轨迹 - 脱线反向强化学习 2506.22008v1
  • 1096 06-27 GKNet: Graph Kalman Filtering and Model Inference via Model-based Deep Learning GKNet: Graph Kalman Filterung und Modellableitung über modellbasiertes Deep Learning GKNet:通过基于模型的深层学习进行图表卡路曼过滤和模型推断 2506.22004v1
  • 1097 06-27 Generative adversarial neural networks for simulating neutrino interactions Generative adversariale neuronale Netzwerke zur Simulation von Neutrino-Interaktionen 产生模拟中微微子相互作用的对抗性神经网络 2502.20244v2
  • 1098 06-27 Time series classification with random convolution kernels: pooling operators and input representations matter Zeitreihenklassifizierung mit zufälligen Konvolutionskernen: Bündelung von Operatoren und Eingabedarstellungen 随机卷变内核的时间序列分类:集合操作员和输入代表事项 2409.01115v4
  • 1099 06-27 Federated Data-Efficient Instruction Tuning for Large Language Models Federated Data-Efficient Instruction Tuning für große Sprachmodelle 大语言模式联邦数据效率指示图示 2410.10926v2
  • 1100 06-27 Analyzing and Fine-Tuning Whisper Models for Multilingual Pilot Speech Transcription in the Cockpit Analysieren und Feintuning-Flüsternmodelle für mehrsprachige Pilot-Sprachtranskription im Cockpit 分析并精精精细调校车舱多语种试验性语音翻译多语种试听模式 2506.21990v1
  • 1101 06-27 The Mamba in the Llama: Distilling and Accelerating Hybrid Models Der Mamba im Llama: Destillieren und Beschleunigen von Hybridmodellen Llama的Mamba:蒸馏和加速混合模型 2408.15237v4
  • 1102 06-27 BeamLLM: Vision-Empowered mmWave Beam Prediction with Large Language Models BeamLLM: Vision-Empowered mmWave Beam Prediction mit großen Sprachmodellen BeamLLM: 具有大语言模型的视觉-电子动力毫米 2503.10432v2
  • 1103 06-27 Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein Verteilungsreduktion: Vereinheitlichende Dimensionalitätsreduktion und Clustering mit Gromov-Wasserstein 减少分配:与格罗莫夫-瓦塞尔斯坦合并统一减少多维度 2402.02239v3
  • 1104 06-27 Green LIME: Improving AI Explainability through Design of Experiments Green LIME: Verbesserung der KI-Erklärbarkeit durch Design von Experimenten 绿色LIME:通过设计实验改进AI可解释性 2502.12753v2
  • 1105 06-27 A Survey on Federated Fine-tuning of Large Language Models Eine Umfrage über Federated Fine-Tuning von großen Sprachmodellen 大语言模式联邦微调调查 2503.12016v2
  • 1106 06-27 Spectraformer: A Unified Random Feature Framework for Transformer Spektraformer: Ein einheitliches Random Feature Framework für Transformer 插件: 变形器统一随机地物框架 2405.15310v4
  • 1107 06-27 SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model SceneDiffuser++: City-Scale Verkehrssimulation über ein Generatives Weltmodell 景点Diffuser++:通过创世模式的城市规模交通量模拟 2506.21976v1
  • 1108 06-27 Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses Verbesserung der Strategien des Jailbreaks: Ein hybrider Ansatz, um LLM-Verletzungen auszunutzen und moderne Verteidigungen zu umgehen 推进破牢战略:利用LLM脆弱性和绕过现代防御的混合办法 2506.21972v1
  • 1109 06-27 Mitigating Metropolitan Carbon Emissions with Dynamic Eco-driving at Scale Mit dem dynamischen Öko-Fahren im Maßstab die Emissionen von Metropolitankohlenstoff mindern 减缓城市碳排放,在规模上进行动态生态驾驶 2408.05609v2
  • 1110 06-27 More Vulnerable than You Think: On the Stability of Tool-Integrated LLM Agents Schwacher als Sie denken: Zur Stabilität von werkzeugintegrierten LLM-Agenten 比你想象的更加脆弱:关于工具集成LLM剂稳定问题 2506.21967v1
  • 1111 06-27 On the Lipschitz Continuity of Set Aggregation Functions and Neural Networks for Sets Über die Lipschitz Kontinuität von Set-Aggregationsfunktionen und neuralen Netzwerken für Sets 设置集合功能和神经网络的利普西茨连续性 2505.24403v2
  • 1112 06-27 Optimal Return-to-Go Guided Decision Transformer for Auto-Bidding in Advertisement Optimaler Return-to-Go-Geführter Entscheidungstransformator für Auto-Bidding in der Werbung 广告中自动提交申请的优化回向向向决策变换器 2506.21956v1
  • 1113 06-27 deCIFer: Crystal Structure Prediction from Powder Diffraction Data using Autoregressive Language Models deCIFer: Crystal Structure Vorhersage von Powder Diffraction Daten mit autoregressiven Sprachmodellen deCIFer:使用自动递减语言模型从粉末碎片数据中预测水晶结构 2502.02189v3
  • 1114 06-27 Smooth-Distill: A Self-distillation Framework for Multitask Learning with Wearable Sensor Data Smooth-Destill: Ein Selbstdestillations-Framework für Multitask-Lernen mit tragbaren Sensordaten 平流蒸馏:用可穿感应器数据进行多任务学习的自我蒸馏框架 2507.00061v1
  • 1115 06-27 Physics-informed network paradigm with data generation and background noise removal for diverse distributed acoustic sensing applications Physik-informiertes Netzwerk-Paradigma mit Datengenerierung und Hintergrundrauschenentfernung für vielfältige verteilte akustische Sensoranwendungen 物理知情网络范式,为分散的多种声学遥感应用收集数据和背景噪音清除 2506.21952v1
  • 1116 06-27 Hitchhiking Rides Dataset: Two decades of crowd-sourced records on stochastic traveling Hitchhiking Rides Datensatz: Zwei Jahrzehnte von crowd-sourced Rekorde auf stochastische Reisen Hitchhiking rides数据集:20年关于随机旅行的多方来源记录 2506.21946v1
  • 1117 06-27 Embedding-based Approaches to Hyperpartisan News Detection Einbetten-basierte Ansätze zu Hyperparteien-Nachrichten-Erkennung 以嵌入式办法探测超党派新闻 2501.01370v2
  • 1118 06-27 Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance Schritt für Schritt Video-zu-Audio-Synthese über Negative Audio-Anleitung 通过消极音频指导,逐步进行视频到视听合成 2506.20995v2
  • 1119 06-27 GuiderNet: A Meta-Learning Framework for Optimizing Quantum Circuit Geometry and Mitigating Barren Plateaus GuiderNet: Ein Meta-Learning-Framework zur Optimierung der Quantum Circuit Geometrie und zur Abmilderung von Barren Plateaus 指南Net:优化量子电路几何和减缓巴伦高原的元学习框架 2506.21940v1
  • 1120 06-27 HQCM-EBTC: A Hybrid Quantum-Classical Model for Explainable Brain Tumor Classification HQCM-EBTC: Hybrides Quantum-Klassisches Modell für erklärbare Hirntumor-Klassifikation HQCM-EBTC:可解释脑肿瘤分类混合量度-分类模型 2506.21937v1
  • 1121 06-27 Joint Task Offloading and Resource Allocation in Low-Altitude MEC via Graph Attention Diffusion Gemeinsame Aufgabe Offloading und Ressourcenzuteilung in niedrig-Altitude MEC über Graphen-Achtung Diffusion 通过 “ 图表关注 “ 传播在低海拔MEC中联合任务卸载和资源分配 2506.21933v1
  • 1122 06-27 Causal Inference Isn’t Special: Why It’s Just Another Prediction Problem Kausale Schlussfolgerung ist nicht besonders: Warum es nur ein weiteres Vorhersageproblem ist 原因推论不是特别的:为什么这只是另一个预测问题? 2504.04320v2
  • 1123 06-27 Mic-hackathon 2024: Hackathon on Machine Learning for Electron and Scanning Probe Microscopy Mic-hackathon 2024: Hackathon zum maschinellen Lernen für Elektronen- und Rastersondenmikroskopie Mic-hackathon 2024:关于电子和扫描探测器机器学习的Hackathon 2506.08423v2
  • 1124 06-27 Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection Grundlagenmodell-Insights und ein Multi-Model-Ansatz für eine erstklassige Auswahl an feinkörnigen Ein-Schuss-Subsets 基础型透视模型和高精粒单弹分子集选多模式办法 2506.14473v2
  • 1125 06-27 TOAST: Task-Oriented Adaptive Semantic Transmission over Dynamic Wireless Environments TOAST: Taskorientierte adaptive Semantische Übertragung über dynamische drahtlose Umgebungen TOAST: 动态无线环境中以任务为导向的适应性语义传输 2506.21900v1
  • 1126 06-27 Advancements and Challenges in Continual Reinforcement Learning: A Comprehensive Review Fortschritte und Herausforderungen im kontinuierlichen Ausbau des Lernens: Ein umfassender Überblick 不断加强学习的进展和挑战:全面审查 2506.21899v1
  • 1127 06-27 Enhancing Cloud Security through Topic Modelling Verbesserung der Cloud-Sicherheit durch Themenmodellierung 通过专题建模模式加强云层安全 2505.01463v2
  • 1128 06-27 Stability of Primal-Dual Gradient Flow Dynamics for Multi-Block Convex Optimization Problems Stabilität von primär-Dual Gradient Flow Dynamics für Multi-Block Convex Optimierungsprobleme 多锁组合最佳化问题微型-多级梯度流动动态的稳定 2408.15969v2
  • 1129 06-27 Thompson Sampling in Function Spaces via Neural Operators Thompson Probenahme in Funktionsräumen über neurale Operatoren Thompson 通过神经操作员在功能空间取样 2506.21894v1
  • 1130 06-27 Interactive Multi-Objective Probabilistic Preference Learning with Soft and Hard Bounds Interaktives multi-objektives probabilistisches Preference-Lernen mit weichen und harten Grenzen 以软和硬两环进行交互式多目标概率预测性偏好学习 2506.21887v1
  • 1131 06-27 UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields UnMix-NeRF: Spektrales Unmixen trifft auf neurale Strahlungsfelder Un Mix- NERF: 光谱融合结合满足神经辐射场 2506.21884v1
  • 1132 06-27 Advances in Temporal Point Processes: Bayesian, Neural, and LLM Approaches Fortschritte in Temporal Point Prozessen: Bayesian, Neural, und LLM Ansätze Bayesian、Neural和LLM方法等时点工艺的进展 2501.14291v2
  • 1133 06-27 A Survey of Continual Reinforcement Learning Ein Überblick über kontinuierliches Weiterbildungslernen 持续加强学习调查 2506.21872v1
  • 1134 06-27 SPADE: Spatial Transcriptomics and Pathology Alignment Using a Mixture of Data Experts for an Expressive Latent Space SPADE: Raum-Transkriptomik und Pathologie-Ausrichtung mit einer Mischung von Datenexperten für einen Expressiven Latent Space SPADE: 利用数据专家混合模型进行空间转基因学和病理学调整 2506.21857v1
  • 1135 06-27 Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation Leveraging Online-Olympiade-Level-Mathematik Probleme für LLMs Training und Kontaminierung-Resistent Evaluation 利用在线奥林匹克层面的数学问题促进LLM女士的培训和污染 – – 评估 2501.14275v2
  • 1136 06-27 Unveiling the Power of Noise Priors: Enhancing Diffusion Models for Mobile Traffic Prediction Die Kraft der Lärmvorhersage enthüllen: Diffusionsmodelle für mobile Verkehrsvorhersage verbessern 打破噪音前置物的威力:加强移动交通预测的传播模式 2501.13794v3
  • 1137 06-27 The Consistency Hypothesis in Uncertainty Quantification for Large Language Models Die Kohärenzhypothese in der Unsicherheitsquantifizierung für große Sprachmodelle 《大语言模型不确定性量化不确定性的一致假设》 2506.21849v1
  • 1138 06-27 QT-DoG: Quantization-aware Training for Domain Generalization QT-DOG: Quantization-aware Training für Domain Generalization QT-DoG: 区域通用化的量化了解培训 2410.06020v2
  • 1139 06-27 Koopman operator-based discussion on partial observation in stochastic systems Koopman operator-based discussion on partial observation in stochastic systems Koopman操作员关于抽查系统中部分观察的讨论 2506.21844v1
  • 1140 06-27 Adversarial Threats in Quantum Machine Learning: A Survey of Attacks and Defenses Widersprüchliche Bedrohungen im Quantum Machine Learning: Eine Übersicht über Angriffe und Verteidigungen 量子机器学习中的反对威胁:攻击和防御调查 2506.21842v1
  • 1141 06-27 The Cost of Avoiding Backpropagation Die Kosten der Vermeidung von Rückvermehrung 避免反言的成本 2506.21833v1
  • 1142 06-27 Computational Efficient and Minimax Optimal Nonignorable Matrix Completion Computational Efficient und Minimax Optimale nicht entzündbare Matrix-Vervollständigung 计算效率高和最小最大最佳不皮不皮的矩阵补全 2504.04016v2
  • 1143 06-27 RLSF: Fine-tuning LLMs via Symbolic Feedback RLSF: Feinjustierende LLMs über symbolisches Feedback RLSF:通过符号反馈对LLMs进行微调 2405.16661v3
  • 1144 06-27 Fetal Sleep: A Cross-Species Review of Physiology, Measurement, and Classification Fetaler Schlaf: Eine Cross-Spezies-Überprüfung der Physiologie, Messung und Klassifizierung 胎儿睡眠:对生理、测量和分类的跨物种审查 2506.21828v1
  • 1145 06-27 Few-Shot Segmentation of Historical Maps via Linear Probing of Vision Foundation Models Wenig scharfe Segmentierung historischer Karten über Linear Probing of Vision Foundation Models 通过视觉基金会模型的线性考察对历史地图进行几小片段分割 2506.21826v1
  • 1146 06-26 (4) Mathematical Modeling of Protein Structures: A Cohomology-Based Approach to the Flagellar Motor Mathematische Modellierung von Proteinstrukturen: Ein auf Kohomologie basierender Ansatz zum Flagellar Motor 蛋白质结构的数学建模:对旗舰汽车采取基于共同地质学的方法 2504.16941v2
  • 1147 06-26 CAT-SG: A Large Dynamic Scene Graph Dataset for Fine-Grained Understanding of Cataract Surgery CAT-SG: Ein großer dynamischer Szenendiagramm-Datensatz für ein feinkörniges Verständnis der Kataraktchirurgie CAT-SG:用于精确了解白内障外科手术的大型动态场景图数据集 2506.21813v1
  • 1148 06-26 Classification with Reject Option: Distribution-free Error Guarantees via Conformal Prediction Klassifizierung mit Ablehnungsoption: Verteilungsfreie Fehlergarantien durch konforme Vorhersage 采用拒绝分类办法:通过非正规预测保证无分发错误 2506.21802v1
  • 1149 06-26 Offensive Language Detection on Social Media Using XLNet Offensive Spracherkennung auf Social Media mit XLNet 使用XLNet在社交媒体上发现攻击性语言 2506.21795v1
  • 1150 06-26 Multi-task parallelism for robust pre-training of graph foundation models on multi-source, multi-fidelity atomistic modeling data Multi-Task-Parallelismus für robustes Pre-Training von Graphen-Grundmodellen auf Multi-Source-, Multi-Fidelity atomistic Modellierungsdaten 为多源、多信仰、多宗教、原子模型化数据图像基础模型模型的可靠培训前培训提供多任务平行的强有力任务 2506.21788v1
  • 1151 06-26 Estimating Correctness Without Oracles in LLM-Based Code Generation Schätzung der Korrektheit ohne Oracles in der LLM-basierten Code-Generierung 在基于LLM的代码生成中估算无甲骨文的正确性 2507.00057v1
  • 1152 06-26 Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity Sparse-Reg: Verbesserung der Probenkomplexität im Offline-Verstärkungs-Lernen mit Sparsity 利用公平性改进离线强化学习的抽样复杂性 2506.17155v2
  • 1153 06-26 Graph ODEs and Beyond: A Comprehensive Survey on Integrating Differential Equations with Graph Neural Networks Graph ODEs and Beyond: Eine umfassende Umfrage zur Integration von Differentialgleichungen mit Graph Neural Networks 图ODEs 及其他:关于将差异等同与图形神经网络相结合的综合调查 2503.23167v3
  • 1154 06-26 M3PO: Massively Multi-Task Model-Based Policy Optimization M3PO: Massive Multi-Task-Modellbasierte Politikoptimierung M3PO: 大规模多任务多任务模式政策优化 2506.21782v1
  • 1155 06-26 Multi-thresholding Good Arm Identification with Bandit Feedback Multi-Thresholding Good Arm Identification mit Bandit Feedback 多保持好武器及强盗反馈 2503.10386v3
  • 1156 06-26 Generative Data Mining with Longtail-Guided Diffusion Generatives Data Mining mit Langschwanz-geführter Diffusion 利用长尾指导传播生成数据采矿 2502.01980v2
  • 1157 06-26 Gradient-Based Neuroplastic Adaptation for Concurrent Optimization of Neuro-Fuzzy Networks Gradient-Based Neuroplastic Adaptation für die gleichzeitige Optimierung von Neuro-Fuzzy-Netzwerken 为同时优化Neuro-Fuzzy网络而渐进式神经塑料适应 2506.21771v1
  • 1158 06-26 Early Glaucoma Detection using Deep Learning with Multiple Datasets of Fundus Images Frühe Glaukomerkennung mit Deep Learning mit mehreren Datensätzen von Fundus-Bildern 利用包含Fundus图像多数据集的深学习探测早期青光眼 2506.21770v1
  • 1159 06-26 Testing Causal Models with Hidden Variables in Polynomial Delay via Conditional Independencies Testen von Kausalmodellen mit versteckten Variablen in Polynom-Delay über bedingte Abhängigkeiten 测试因有条件独立导致多配偶拖延中隐藏变量的因果模型 2409.14593v2
  • 1160 06-26 Nested Stochastic Algorithm for Generalized Sinkhorn distance-Regularized Distributionally Robust Optimization Verschachtelter stochastischer Algorithmus für generalisierte Sinkhorndistanz-regularisierte distributiv robuste Optimierung 用于通用 Sinkhorn 远距离分解分布的内嵌斯托克算法优化 2503.22923v2
  • 1161 06-26 VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data VersaPRM: Multi-Domain-Prozess-Reward-Modell über synthetisch begründete Daten VersaPRM:通过合成理由数据进行多部进程回报模型 2502.06737v2
  • 1162 06-26 TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics TADA: Verbessertes Diffusions-Sampling mit trainingsfreier Augmented Dynamics TAD: 改进无培训增强动力的传播抽样 2506.21757v1
  • 1163 06-26 Beyond Conformal Predictors: Adaptive Conformal Inference with Confidence Predictors Beyond Conformal Predictors: Adaptive Conformal Inferenz mit Confidence Predictors 超越非正式预测者:与信心预测者发生适应性非正式推断 2409.15548v4
  • 1164 06-26 Inverse Design of Diffractive Metasurfaces Using Diffusion Models Inverses Design von diffraktiven Metaoberflächen unter Verwendung von Diffusionsmodellen 使用扩散模型反向设计 diffractive 元表层 2506.21748v1
  • 1165 06-26 Analysis of static and dynamic batching algorithms for graph neural networks Analyse von statischen und dynamischen Batching-Algorithmen für Graphen-Neuralnetzwerke 图表神经网络静态和动态批量算法分析 2502.00944v2
  • 1166 06-26 Asymmetric Graph Error Control with Low Complexity in Causal Bandits Asymmetrische Graphenfehlerkontrolle mit geringer Komplexität bei Kausalbändern Causal 土匪中低复杂度的对称图形错误控制 2408.11240v2
  • 1167 06-26 Zebra: In-Context Generative Pretraining for Solving Parametric PDEs Zebra: Im Kontext Generatives Vortraining zur Lösung parametrischer PDEs Zebra: 解决参数PDE的内源生成预设培训 2410.03437v3
  • 1168 06-26 Federated Item Response Theory Models Föderated Item Response Theory Modelle 联邦项目反应理论模型 2506.21744v1
  • 1169 06-26 Storm Surge in Color: RGB-Encoded Physics-Aware Deep Learning for Storm Surge Forecasting Storm Surge in Farbe: RGB-kodierte Physik-Bewusstes Lernen für Storm Surge Forecasting 彩色暴风雨: RGB-Encodd 物理软件深入学习以预测风暴暴雨。 2506.21743v1
  • 1170 06-26 Critically-Damped Higher-Order Langevin Dynamics Critically-Damped Higher-Order Langevin Dynamics 高分朗埃文动力公司 2506.21741v1
  • 1171 06-26 Modification of a Numerical Method Using FIR Filters in a Time-dependent SIR Model for COVID-19 Änderung einer numerischen Methode mit FIR-Filtern in einem zeitabhängigen SIR-Modell für COVID-19 修改CCOVID-19基于时间的SIR模型中使用FIR过滤器的数字方法 2506.21739v1
  • 1172 06-26 Hierarchical Reasoning Model Hierarchisches Modell der Vernunft 等级推理模型 2506.21734v1
  • 1173 06-26 Experimental investigation of pose informed reinforcement learning for skid-steered visual navigation Experimentelle Untersuchung des Pose-informierten Verstärkungslernens für die geschliffene visuelle Navigation 进行实验性调查,以进行有事实根据的强化学习,以进行滑滑滑式先行目视导航的强化学习 2506.21732v1
  • 1174 06-26 Learning treatment effects while treating those in need Lernbehandlungseffekte bei der Behandlung von Bedürftigen 治疗有需要者的同时的学习治疗效果 2407.07596v2
  • 1175 06-26 CaloHadronic: a diffusion model for the generation of hadronic showers CaloHadronic: ein Diffusionsmodell für die Erzeugung von Hadronikduschen Calo Hadronic: 生成超时速淋浴的传播模型 2506.21720v1
  • 1176 06-26 Performance Prediction for Large Systems via Text-to-Text Regression Leistungsvorhersage für große Systeme über Text-zu-Text-Regression 通过文字到文字倒退对大型系统的性能预测 2506.21718v1
  • 1177 06-26 Explicit neural network classifiers for non-separable data Explizite neuronale Netzwerkklassifikatoren für nicht voneinander getrennte Daten 对不可分离数据进行明确的神经网络分类 2504.18710v2
  • 1178 06-26 CRISP-NAM: Competing Risks Interpretable Survival Prediction with Neural Additive Models CRISP-NAM: Konkurrenzfähige Risiken interpretierbare Überlebensvorhersage mit neuralen Additivenmodellen CRIISP-NAM: 与神经添加模型相竞争的风险解释性生存预测 2505.21360v3
  • 1179 06-26 Statistical Inference of the Value Function for Reinforcement Learning in Infinite Horizon Settings Statistische Schlussfolgerung der Wertfunktion für die Stärkung des Lernens in unendlichen Horizont-Einstellungen 在无限地平地环境中加强学习的价值观职能的统计推论 2001.04515v3
  • 1180 06-26 Unimodal Strategies in Density-Based Clustering Unimodale Strategien in Dichte-basiertem Clustering 以密度为基础的群集统一模式战略 2506.21695v1
  • 1181 06-26 Multimodal Misinformation Detection Using Early Fusion of Linguistic, Visual, and Social Features Multimodale Fehlinformationserkennung mittels frühzeitiger Fusion sprachlicher, visueller und sozialer Merkmale 利用语言、视觉和社会特征的早期融合来进行多模式错误信息探测 2507.01984v1
  • 1182 06-26 Link Prediction with Physics-Inspired Graph Neural Networks Verknüpfung von Vorhersagen mit physikinspirierten Graphen-Neuralen Netzwerken 将预测与物理启发的图形神经网络联系起来 2402.14802v3
  • 1183 06-26 ANUBHUTI: A Comprehensive Corpus For Sentiment Analysis In Bangla Regional Languages ANUBHUTI: Ein umfassender Corpus für die Sentimentanalyse in Bangla Regionalsprachen ANUBUHUTI:孟加拉语地区语言中感应分析综合整体体 2506.21686v1
  • 1184 06-26 Risk-Averse Total-Reward Reinforcement Learning Risiko-Averse Total-Reward-Verstärkungs-Lernen 反风险 反风险 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 2506.21683v1
  • 1185 06-26 TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation TanDiT: Tangent-Plane Diffusion Transformer für hochwertige 360° Panorama-Generation TANDIT: 高质360°全景生成的相切电流扩散变异器 2506.21681v1
  • 1186 06-26 Whole-Body Conditioned Egocentric Video Prediction Ganzkörperbedingte egozentrische Videovorhersage 整盘有条件的Egocentcent视频预报 2506.21552v1
  • 1187 06-26 mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale mTSBench: Benchmarking Multivariate Zeitreihen Anomalieerkennung und Modellauswahl auf Scale mTSBench:制定多变时间序列基准 2506.21550v1
  • 1188 06-26 Data Efficacy for Language Model Training Dateneffizienz für Sprachmodellschulungen 语文示范培训的数据效率 2506.21545v1
  • 1189 06-26 APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization APO: Verbesserung der Fähigkeit von MLLMs durch asymmetrische politische Optimierung APO:通过非对称政策优化政策提高低液成比例模型的理性能力 2506.21655v1
  • 1190 06-26 Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval Maximal aufeinander abgestimmte Materien: Vermeidung von Darstellungskollaps für robustes Cross-Modal Retrieval 最大匹配事项: 防止在强力跨模式检索中出现代表比例折叠 2506.21538v1
  • 1191 06-26 Exploring the Design Space of 3D MLLMs for CT Report Generation Erforschung des Design-Raums von 3D-MLLMs für die CT-Berichtserstellung 为编写CT报告探索3D MLLMs的设计空间 2506.21535v1
  • 1192 06-26 Chain-of-Sketch: Enabling Global Visual Reasoning Chain-of-Sketch: Globale visuelle Vernunft aktivieren 标准链链:扶持全球视觉理性 2410.08165v2
  • 1193 06-26 Mesh-Informed Neural Operator : A Transformer Generative Approach Mesh-informed Neural Operator : Ein transformer Generativer Ansatz 气象化神经操作器:变异创造方法 2506.16656v2
  • 1194 06-26 Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity Effiziente Flucht aus Sattelpunkten unter generalisierter Glätte durch selbsterklärende Regelmäßigkeit 通过自我调整常态,在普遍平滑状态下有效绕开散装货架点 2503.04712v2
  • 1195 06-26 Gaussian Invariant Markov Chain Monte Carlo Gaussian Invariant Markov Kette Monte Carlo Gausian Invarianant Markov 链 蒙特卡洛 2506.21511v1
  • 1196 06-26 skLEP: A Slovak General Language Understanding Benchmark sklep: Ein slowakisches allgemeines Sprachverständnis Benchmark SkLEP:斯洛伐克一般语言理解基准 2506.21508v1
  • 1197 06-26 NY Real Estate Racial Equity Analysis via Applied Machine Learning NY Real Estate Racial Equity Analyse über angewandtes maschinelles Lernen 通过应用机器学习进行房地产种族公平分析 2505.16946v3
  • 1198 06-26 Process mining-driven modeling and simulation to enhance fault diagnosis in cyber-physical systems Prozess-Mining-gesteuerte Modellierung und Simulation zur Verbesserung der Fehlerdiagnose in cyber-physischen Systemen 由采矿流程驱动的模型和模拟模型和模拟,以加强网络物理系统中的过失诊断 2506.21502v1
  • 1199 06-26 Devising a solution to the problems of Cancer awareness in Telangana Lösung der Probleme des Krebsbewusstseins in Telangana 制定特拉甘纳癌症意识问题解决方案 2506.21500v1
  • 1200 06-26 Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment Multi-Preference Lambda-bewertet Listwise DPO für Dynamic Preference Alignment 多首选项 Lambda 加权列表 DPO 动态首选项一致 2506.19780v2
  • 1201 06-26 One Model to Forecast Them All and in Entity Distributions Bind Them Ein Modell, um sie zu prognostizieren Alles und in Entity-Distributionen Bind Them 预测所有实体和实体分配的模型之一 2501.15499v2
  • 1202 06-26 Prompting with Phonemes: Enhancing LLMs’ Multilinguality for Non-Latin Script Languages Mit Phonemes: Mehrsprachigkeit von LLMs für nicht-lateinische Script-Sprachen verbessern 以电话提示:提高LLMS的非拉丁文拼写语言多重语言质量 2411.02398v3
  • 1203 06-26 Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection Zuverlässige Erkennung von leerem Raum: Bedingte markierte Punktprozesse für Objekterkennung 争取可靠地探测空空空间:物体探测的有条件定点过程 2506.21486v1
  • 1204 06-26 Evaluation of Traffic Signals for Daily Traffic Pattern Bewertung von Verkehrssignalen für das tägliche Verkehrsmuster 对每日交通模式交通信号的评价 2506.21469v1
  • 1205 06-26 In-Context Learning Strategies Emerge Rationally In-Context Learning Strategies Emerge Rational 新兴动力 2506.17859v2
  • 1206 06-26 Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage Optimierung der Runge-Kutta-Methoden der 4. Ordnung: Dynamischer heuristischer Ansatz für Effizienz und geringen Speicher 优化第四阶极龙格-库塔方法:高效和低储存的动态超光速方法 2506.21465v1
  • 1207 06-26 Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs Capacity-Constrained Online-Lernen mit Verzögerungen: Scheduling Frameworks und Trade-offs bedauern 受能力制约的有延误的在线学习:时间安排框架和悔恨取舍 2503.19856v2
  • 1208 06-26 Aligning Spoken Dialogue Models from User Interactions Ausrichten von gesprochenen Dialogmodellen aus Benutzerinteraktionen 校对用户互动中的口语对话框模型 2506.21463v1
  • 1209 06-26 A Keyword-Based Technique to Evaluate Broad Question Answer Script Eine Keyword-basierte Technik zur Bewertung von Broad Question Answer Script 用于评价广泛问答脚本的关键字技术 2506.21461v1
  • 1210 06-26 Wild refitting for black box prediction Wilde Nachrüstung für Black Box Vorhersage 黑盒预测的野生改造 2506.21460v1
  • 1211 06-26 Fake it till You Make it: Reward Modeling as Discriminative Prediction Verfälschen Sie es, bis Sie es: Belohnung Modellieren als diskriminative Vorhersage 假称直到你做出它: 奖赏模型作为有偏见的预测 2506.13846v2
  • 1212 06-26 Measurement to Meaning: A Validity-Centered Framework for AI Evaluation Messung zur Bedeutung: Ein gültigkeitszentrierter Rahmen für die AI-Bewertung 衡量到意义:AI评价的有效性-中心框架 2505.10573v4
  • 1213 06-26 PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries PARALLELPROMPT: Parallelität aus großen Sprachmodellfragen extrahieren PARALELPROPT:从大语言模式查询中提取平行论 2506.18728v2
  • 1214 06-26 Towards an Optimal Control Perspective of ResNet Training Auf dem Weg zu einer optimalen Steuerungsperspektive der ResNet-Schulung 建立ResNet培训最佳控制视角 2506.21453v1
  • 1215 06-26 A Comprehensive Dataset for Underground Miner Detection in Diverse Scenario Ein umfassender Datensatz für die Untertage-Miner-Erkennung in unterschiedlichen Szenarien 不同情景下地下矿工探测综合数据集 2506.21451v1
  • 1216 06-26 Learnable Adaptive Time-Frequency Representation via Differentiable Short-Time Fourier Transform Lernbare adaptive Zeit-Frequenz-Darstellung über differenzierbare Kurzzeit Fourier-Transformation 通过有区别的短时四轮式变换,通过有区别的短时四轮式变换, 2506.21440v1
  • 1217 06-26 New Bounds for Sparse Variational Gaussian Processes Neue Grenzen für Sparse Variational Gaussian Prozesse 偏偏多高斯进程的新界口 2502.08730v2
  • 1218 06-26 Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations Erklärbarkeit großer Sprachmodelle mit SMILE: Statistische Modell-agnostische Interpretierbarkeit mit lokalen Erklärungen 使用SMILE解释大语言模型的可解释性:统计模型 – – 与当地解释的可解释性 2505.21657v3
  • 1219 06-26 Graph Neural Network for Neutrino Physics Event Reconstruction Graph Neural Netzwerk für Neutrino Physik Ereignis Rekonstruktion 中子物理事件重建神经网络 2403.11872v2
  • 1220 06-26 The Sample Complexity of Learning Lipschitz Operators with respect to Gaussian Measures Die Probenkomplexität von Lipschitz-Betreibern in Bezug auf Gaussische Maßnahmen Gaussian措施方面学习利普施茨经营者的抽样复杂性 2410.23440v3
  • 1221 06-26 Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort Deception Detection in dyadischen Austauschen mit multimodalem maschinellem Lernen: Eine Studie über eine schwedische Kohorte 利用多式机器学习的多式机器交流中的欺骗感检测:瑞典教区研究 2506.21429v1
  • 1222 06-26 Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning Flow-based Single-Step-Abschluss für effizientes und expressives politisches Lernen 以流动为基础的单一步骤完成高效和明确政策学习 2506.21427v1
  • 1223 06-26 TracLLM: A Generic Framework for Attributing Long Context LLMs TracLLM: Ein generisches Rahmenwerk für die Zuweisung von LLMs mit langem Kontext TracLLM: 长期LMLM授标通用框架 2506.04202v3
  • 1224 06-26 Continual Learning as Computationally Constrained Reinforcement Learning Kontinuierliches Lernen als Computationally Constrained Reinforcement Learning 持续学习作为计算限制的训练强化学习 2307.04345v3
  • 1225 06-26 Improving Stochastic Cubic Newton with Momentum Verbesserung der stochastischen Kubik Newton mit Momentum 快速改善斯托卡立方立方牛顿 2410.19644v2
  • 1226 06-26 Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager-Machlup Functional Aktionsminimierung trifft auf generative Modellierung: Effizientes Transition Path Sampling mit der Onsager-Machlup Funktion 行动最优化符合产生模型的生成模型:与Onsager-Machlup 职能进行高效率过渡道路抽样 2504.18506v3
  • 1227 06-26 Distributed Cross-Channel Hierarchical Aggregation for Foundation Models Verteilte Cross-Channel Hierarchische Aggregation für Stiftungsmodelle 基金会模型分布式跨河道分道分道分道分道分道分道分道分道分道分道分道分道 2506.21411v1
  • 1228 06-26 Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference Skalierbare Bayesische Low-Rank-Anpassung von großen Sprachmodellen über stochastische Variations-Subraum-Inferenz 通过Stochastic变异性子空间推断,对大语言模型进行可缩放的Bayesian低Rank 2506.21408v1
  • 1229 06-26 Representation Learning of Lab Values via Masked AutoEncoders Darstellung Lernen von Laborwerten über Maskierte AutoEncoder 通过蒙面自动编码器学习实验室价值 2501.02648v3
  • 1230 06-26 Temporal-Aware Graph Attention Network for Cryptocurrency Transaction Fraud Detection Temporal-Aware Graph Aufmerksamkeit Netzwerk für Kryptowährung Transaktion Betrugserkennung 加密货币交易欺诈侦查实时警示图关注网络 2506.21382v1
  • 1231 06-26 HARPT: A Corpus for Analyzing Consumers’ Trust and Privacy Concerns in Mobile Health Apps HARPT: Ein Corpus für die Analyse des Vertrauens und der Datenschutzbelange der Verbraucher in mobilen Gesundheits-Apps HARPT: 分析移动保健应用程序中消费者信任和隐私问题的一个公司 2506.19268v2
  • 1232 06-26 Pay Attention to Small Weights Achten Sie auf kleine Gewichte 关注小体重 2506.21374v1
  • 1233 06-26 Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application Latent Diffusion Modellbasierter Denoisierungsempfänger für 6G Semantische Kommunikation: Von der stochastischen Differentialtheorie zur Anwendung 用于 6G 语义通讯: 从斯托卡差异理论到应用的 6G 语义通讯的 以 DEM 为基础的前传播模型模型 2506.05710v2
  • 1234 06-26 MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators MAx-DNN: Mehrstufige Arithmetik-Annäherung für energieeffiziente DNN-Hardwarebeschleuniger MAX-DNN: 能源高效 DNN 硬件加速器的多级自动测量近似法 2506.21371v1
  • 1235 06-26 rQdia: Regularizing Q-Value Distributions With Image Augmentation rQdia: Regularisieren der Q-Value-Distributionen mit Bildvergrößerung rQdia: 以图像放大方式规范 Q- 价值发行 2506.21367v1
  • 1236 06-26 SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning SMMILE: Ein sachverständiger Benchmark für multimodales medizinisches In-Context-Lernen SMMILE:多模式医学内书学习专家开发基准 2506.21355v1
  • 1237 06-26 Lipschitz Bounds for Persistent Laplacian Eigenvalues under One-Simplex Insertions Lipschitz Bounds für persistente Laplacian Eigenwerte unter One-Simplex-Insertionen 在单质插入下用于持久性拉板电极值的 Lipschitz Bounds 2506.21352v1
  • 1238 06-26 On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory Über die Fähigkeit tiefer Netzwerke, Symmetrien aus Daten zu lernen: Eine neurale Kerneltheorie 深网络从数据中学习对称的深网络能力:神经核心理论 2412.11521v2
  • 1239 06-26 Learning Value of Information towards Joint Communication and Control in 6G V2X Lernwert von Informationen zur gemeinsamen Kommunikation und Kontrolle in 6G V2X 6G V2X 6G 6G V2X 6G 6G 6G V2X 6G 6G 6G 2505.06978v2
  • 1240 06-26 PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based Attacks PuriDefense: Randomized Local Implizite Adversarial Purification for Defending Black-Box Query-based Attacks 防御:保护黑箱质疑式袭击的随机本地秘密对抗性净化 2401.10586v2
  • 1241 06-26 Regret Bounds for Robust Online Decision Making Bedauern Sie Grenzen für robuste Online-Entscheidungsfindung 对强有力的在线决策感到遗憾 2504.06820v2
  • 1242 06-26 DynamicBench: Evaluating Real-Time Report Generation in Large Language Models DynamicBench: Bewertung der Echtzeit-Berichtserstellung in großen Sprachmodellen 动态 bench:评价以大语言模式编制实时报告的情况 2506.21343v1
  • 1243 06-26 AGTCNet: A Graph-Temporal Approach for Principled Motor Imagery EEG Classification AGTCNet: Ein graphisch-zeitlicher Ansatz für die Klassifikation der Primärmotorik EEG AGTCNet: 固定机动图像电子EEG分类的图表-临时方法 2506.21338v1
  • 1244 06-26 A Scalable Quantum Neural Network for Approximate SRBB-Based Unitary Synthesis Ein skalierbares Quantum-Neural-Netzwerk für annähernde SRBB-basierte Einheitssynthese 近似基于SRBB的单一合成的可缩放量量子神经网络 2412.03083v2
  • 1245 06-26 ScaleGNN: Towards Scalable Graph Neural Networks via Adaptive High-order Neighboring Feature Fusion ScaleGNN: Auf dem Weg zu skalierbaren Graphen-Neuralnetzwerken über adaptive High-Order Neighboring Feature Fusion SASGNN:通过适应性高顺序相邻相邻地貌融合,走向可缩放的图形神经网络 2504.15920v4
  • 1246 06-26 Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts Latent Prototype Routing: Erzielen einer nahezu perfekten Lastabgleichung in Mixture-of-Experts 原型原型路由:在混合专家中实现近效果负载平衡 2506.21328v1
  • 1247 06-26 Stochastic Quantum Spiking Neural Networks with Quantum Memory and Local Learning Stochastische Quantum-Spiking-Neuralnetzwerke mit Quantengedächtnis und lokalem Lernen 具有量子内存和本地学习的实测量量谱剖析神经网络 2506.21324v1
  • 1248 06-26 On Uniform Weighted Deep Polynomial approximation Auf einheitliche Gewichte tiefe Polynom-Annäherung 统一加权深多元近似值 2506.21306v1
  • 1249 06-26 Context-Aware Doubly-Robust Semi-Supervised Learning Kontext-Bewusst Doppel-Robust Semi-überwachtes Lernen Doubly-Robust半监督学习 2502.15577v2
  • 1250 06-26 Semantic Scene Graph for Ultrasound Image Explanation and Scanning Guidance Semantische Szenegrafik für Ultrasound-Bilderklärung und Scan-Anleitung 超声超声图像解释和扫描指导的语义谱图 2506.19683v2
  • 1251 06-26 Exploring Adapter Design Tradeoffs for Low Resource Music Generation Erforschung von Adapter-Design-Tradeoffs für Low Resource Music Generation 探索用于低资源音乐制作的适应设计取舍 2506.21298v1
  • 1252 06-26 Devil’s Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols Teufelshand: Daten vergiften Angriffe auf lokal private Graphen-Lernprotokolle 魔鬼之手:对本地私人图案学习程序的数据毒害攻击 2506.09803v2
  • 1253 06-26 Improved seeding strategies for k-means and k-GMM Verbesserte Saatstrategien für k-Mittel und k-GMM 改进k-手段和k-GMM和k-GMM的播种战略 2506.21291v1
  • 1254 06-26 Small Encoders Can Rival Large Decoders in Detecting Groundedness Kleine Encoder können große Decoder bei der Erkennung von Erdlichkeit rivalisieren 在地面探测中能够使大型分离器在探测地面时发生迭接 2506.21288v1
  • 1255 06-26 Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling Energy Matching: Zusammenführen von Flow Matching- und Energy-Based-Modellen für die Generative Modellierung 能源匹配:统一流动匹配和以能源为基础的生成模型模型 2504.10612v4
  • 1256 06-26 Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution Hypersphärische Variations-Autoencoder mit effizienter sphärischer Cauchy-Distribution 使用高效球道球道配送的超球变异自动编码器 2506.21278v1
  • 1257 06-26 Lagrangian Index Policy for Restless Bandits with Average Reward Lagrangian Index Policy for Restless Bandits with Average Reward 以平均回报率衡量的无休无休止强盗拉格朗加指数政策 2412.12641v2
  • 1258 06-26 A GREAT Architecture for Edge-Based Graph Problems Like TSP Eine großartige Architektur für Edge-Based Graph Probleme wie TSP 象TSP那样的边缘图表问题大建筑 2408.16717v2
  • 1259 06-26 These Are Not All the Features You Are Looking For: A Fundamental Bottleneck in Supervised Pretraining Diese sind nicht alle Funktionen, die Sie suchen: Ein grundlegender Engpass in überwachten Pretraining 这些不是所有你正在寻找的特征: 受监督预科班的基本瓶颈。 2506.18221v2
  • 1260 06-26 DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster DiLoCoX: Ein kommunikationsarmer groß angelegter Ausbildungsrahmen für dezentralisierte Cluster DILOCOX:权力下放小组的低通信大范围培训框架 2506.21263v1
  • 1261 06-26 Simulating Hard Attention Using Soft Attention Simulation der harten Aufmerksamkeit mit weicher Aufmerksamkeit 使用软关注模拟硬关注 2412.09925v2
  • 1262 06-26 Wavelet Diffusion Neural Operator Wavelet Diffusions-Neuraloperator Wavelet 扩散神经操作员 2412.04833v3
  • 1263 06-26 Radio Map Estimation via Latent Domain Plug-and-Play Denoising Radiokarte Schätzung über Latent Domain Plug-and-Play Denoising 通过Latent Domain Plug 和 Play Disoising 无线电地图估计 2501.13472v2
  • 1264 06-26 From On-chain to Macro: Assessing the Importance of Data Source Diversity in Cryptocurrency Market Forecasting Von der On-Chain zum Makro: Bewertung der Bedeutung der Datenquellenvielfalt in der Kryptowährungsmarktprognose 从连网到宏观:评估数据来源多样性在加密货币市场预测中的重要性 2506.21246v1
  • 1265 06-26 Zero-Shot Learning for Obsolescence Risk Forecasting Zero-Shot-Lernen für Obsoleszenz-Risikoprognosen 用于悬浮风险预测的零热学习 2506.21240v1
  • 1266 06-26 Capturing Style in Author and Document Representation Stil in der Autor- und Dokumentdarstellung erfassen 在作者和文件代表中获取样式 2407.13358v2
  • 1267 06-26 Rapid Gyroscope Calibration: A Deep Learning Approach Schnelle Gyroskop-Kalibrierung: Ein tiefer Lernansatz 快速热波校准:深学习方法 2409.00488v3
  • 1268 06-26 Complexity-aware fine-tuning Komplexitätsbewusste Feinabstimmung 复杂度认知微调 2506.21220v1
  • 1269 06-26 Balancing Privacy, Robustness, and Efficiency in Machine Learning Ausbalancierende Privatsphäre, Robustheit und Effizienz im maschinellen Lernen 平衡隐私、强健和机器学习效率 2312.14712v3
  • 1270 06-26 Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? Kausale Vernunft in großen Sprachmodellen enthüllen: Realität oder Mirage? 大语言模型中未解的因果理由:现实还是幻影? 2506.21215v1
  • 1271 06-26 Unsupervised Learning for Optimal Transport plan prediction between unbalanced graphs Unüberwachtes Lernen für optimale Verkehrsplanungsvorhersage zwischen unausgewogenen Graphen 未受监督的优化交通学习计划预测 2506.12025v2
  • 1272 06-26 LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey LLM-basierte human-agente Kooperations- und Interaktionssysteme: Eine Umfrage 以LLM为基础的人类-机构协作和互动系统:调查 2505.00753v4
  • 1273 06-26 Seal Your Backdoor with Variational Defense Versiegeln Sie Ihre Hintertür mit abwechslungsreicher Verteidigung 以不同防御方式密封你的后门 2503.08829v2
  • 1274 06-26 Artificial Delegates Resolve Fairness Issues in Perpetual Voting with Partial Turnout Künstliche Delegierte lösen Fairness-Probleme bei der ständigen Abstimmung mit teilweiser Wahlbeteiligung 持部分投票票的永久表决中的人造代表解决公平问题 2506.21186v1
  • 1275 06-26 PCF-Grasp: Converting Point Completion to Geometry Feature to Enhance 6-DoF Grasp PCF-Grasp: Umwandlung der Punktvervollständigung in Geometrie-Feature zur Verbesserung der 6-DoF-Grasp PCF-格拉斯普:将完成点转换成几何特征,以加强6-DoF格拉斯普 2504.16320v2
  • 1276 06-26 Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4 Performance-Verbesserung der räumlichen semantischen Segmentierung mit angereicherten Audio-Features und agentenbasierter Fehlerkorrektur für DCASE 2025 Challenge Task 4 DASAS 2025年挑战任务4,具有浓缩音频特征和以代理物为基础的错误更正的 空间语义分离的性能改进 2506.21174v1
  • 1277 06-26 Variational Supervised Contrastive Learning Variationelles Überwachtes Kontrastuelles Lernen 差异监督反舞弊学习 2506.07413v2
  • 1278 06-26 Moderating the Generalization of Score-based Generative Model Moderierung der Generalisierung des Score-basierten Generativen Modells 简化基于记分制的通用创制模式 2412.07229v2
  • 1279 06-26 Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning Metis-RISE: RL fördert und verbessert multimodales Reasoning Model Learning Metis-RISE: RL 激励和SFT加强多模式理由示范学习 2506.13056v2
  • 1280 06-26 Self-Regulated Neurogenesis for Online Data-Incremental Learning Selbstregulierte Neurogenese für Online-Daten-Inkrementelles Lernen 在线数据强化学习自调节神经源 2403.14684v2
  • 1281 06-26 Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design Vielfältige Mini-Batch-Auswahl in Verstärkungs-Lernen für effiziente chemische Exploration in de novo Drug Design 为在新药设计中进行高效化学勘探而加强学习的多样化小型批次选择 2506.21158v1
  • 1282 06-26 Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation Schätzung von transformerbasierten räumlich-zeitlichen kontrafaktischen Ergebnissen 以变换器为基础的空间-时-时-时-反事实结果估计 2506.21154v1
  • 1283 06-26 A Novel Federated Learning-Based IDS for Enhancing UAVs Privacy and Security Ein neuartiges, lernbasiertes IDS zur Verbesserung der Privatsphäre und Sicherheit von UAVs 旨在加强无人驾驶航空器隐私和安全的新联邦学习型新学习型ISDS 2312.04135v3
  • 1284 06-26 Linearity-based neural network compression Linearitätsbasierte neuronale Netzwerkkompression 线性神经网络压缩 2506.21146v1
  • 1285 06-26 Personalized Federated Learning via Dual-Prompt Optimization and Cross Fusion Personalisiertes Federated Learning durch Dual-Prompt-Optimierung und Cross Fusion 通过双速优化和交叉融合进行个性化联邦学习 2506.21144v1
  • 1286 06-26 Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks Generative Adversarial Evasion und Out-of-Distribution-Detection für UAV-Cyber-Attacks 无人驾驶航空飞行器网络设备生成反向疏散和分配外探测 2506.21142v1
  • 1287 06-26 Multi-convex Programming for Discrete Latent Factor Models Prototyping Multi-convex-Programmierung für diskrete Latent Factor Models Prototyping Discrete 后端因数模型的多contex 编程程序 2504.01431v2
  • 1288 06-26 DBConformer: Dual-Branch Convolutional Transformer for EEG Decoding DBConformer: Doppel-Branch-Konvolutionstransformator für EEG-Dekodierung DBCon前导体: EEG 解码的双相相相电变异变异器 2506.21140v1
  • 1289 06-26 Solving Inverse Problem for Multi-armed Bandits via Convex Optimization Inverses Problem für mehrarmige Banditen durch Convex-Optimierung lösen 通过 Convex 优化解决多武装强盗的反向问题 2501.18945v3
  • 1290 06-26 NaLaFormer: Norm-Aware Linear Attention for Transformer Models NaLaFormer: Norm-Aware Lineare Aufmerksamkeit für Transformer-Modelle NaLaFormer: 变形模型的诺姆- Aware 线性注意 2506.21137v1
  • 1291 06-26 Inverse Reinforcement Learning via Convex Optimization Inverse Verstärkungs-Lernen über Convex-Optimierung 通过Convex优化化进行反强化学习 2501.15957v2
  • 1292 06-26 Curriculum-Guided Antifragile Reinforcement Learning for Secure UAV Deconfliction under Observation-Space Attacks Curriculum-geführtes Antifragiles Verstärkungslernen für sichere UAV-Dekonfliktion unter Beobachtungs-Raumangriffen 在观测-空间攻击下安全无人驾驶飞行器消除冲突课程-指导反脆弱强化学习 2506.21129v1
  • 1293 06-26 Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments Robuster Policy-Switch für Antifragiles Verstärkungslernen für UAV-Deconfliction in Adversarial Environments 在逆向环境中为无人驾驶航空器消除冲突而进行抗脆弱强化学习的强有力政策转换 2506.21127v1
  • 1294 06-26 Pushing Trade-Off Boundaries: Compact yet Effective Remote Sensing Change Detection Trade-Off-Grenzen drücken: Kompakte und dennoch effektive Fernerkundungs-Änderungserkennung 推进贸易-开放边界:结合但有效的遥感变化探测 2506.21109v1
  • 1295 06-26 Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges Unhaltbar: Unpaarte Single-Cell-Multi-Perturbation-Schätzung durch Dual Conditional Diffusion Implizite Brücken 不持久: 由双条件分解隐形桥进行无压单细胞多扰动估计 2506.21107v1
  • 1296 06-26 Learning to Skip the Middle Layers of Transformers Lernen, die mittleren Schichten der Transformer zu überspringen 学习跳过变换器的中层 2506.21103v1
  • 1297 06-26 Interpretable Hierarchical Concept Reasoning through Attention-Guided Graph Learning Interpretierbares Hierarchisches Konzept durch aufmerksamkeitsorientiertes Graphenlernen 通过引人指导图表学习推理的可解释的等级概念 2506.21102v1
  • 1298 06-26 FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation FeDa4Fair: Client-Level-Federated Datasets für die Fairness-Bewertung FeDa4fair:公平评价客户-联邦数据集 2506.21095v1
  • 1299 06-26 Chain-of-Thought Enhanced Shallow Transformers for Wireless Symbol Detection Chain-of-Thought verbesserte Shallow Transformer für drahtlose Symbolerkennung 用于无线探测无线符号探测的 研究链强化浅ow变压器 2506.21093v1
  • 1300 06-26 CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions CovDocker: Benchmarking Covalent Drug Design mit Aufgaben, Datensätzen und Lösungen CovDocker:用任务、数据集和解决办法确定共价药物设计基准 2506.21085v1
  • 1301 06-26 EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception EgoAdapt: Adaptive multisensorische Destillation und politisches Lernen für eine effiziente egozentrische Wahrnehmung EgoAdapt: 适应性多感性蒸馏和政策学习,促进高效率的以地球为中心感知 2506.21080v1
  • 1302 06-26 Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games Homogenisierung von Multi-Agent-Learning-Dynamik in Finite-State Markov Spiele 在Finite- State-Markov运动会中多剂学习动态的同质化 2506.21079v1
  • 1303 06-26 Enhancing LLM Tool Use with High-quality Instruction Data from Knowledge Graph Verbesserung der LLM-Tool-Nutzung mit hochwertigen Instruktionsdaten aus Wissensgrafik 利用来自知识图的高质量教学数据加强LLM工具的使用 2506.21071v1
  • 1304 06-26 SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations SDE Matching: Skalierbares und simulationsfreies Training latenter stochastischer Differentialgleichungen SDE 匹配:可缩放和模拟无模拟的静态碎裂差异等量模拟培训 2502.02472v3
  • 1305 06-26 FedDAA: Dynamic Client Clustering for Concept Drift Adaptation in Federated Learning FedDAA: Dynamisches Client-Clustering für Konzept Drift-Anpassung im Federated Learning FedDAA: 联邦学习中适应概念的动态客户集群组合 2506.21054v1
  • 1306 06-26 Sharp concentration of uniform generalization errors in binary linear classification Scharfe Konzentration von einheitlichen Verallgemeinerungsfehlern in der binären linearen Klassifikation 二进线线性分类中统一一般化误差的集中程度 2505.16713v2
  • 1307 06-26 Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling Verbesserung der Diffusions-basierten Bildbearbeitung Treue durch Anleitung und Planung 通过指导和日程安排改进基于传播的图像编辑信仰 2506.21045v1
  • 1308 06-26 Efficient Skill Discovery via Regret-Aware Optimization Effiziente Skill Discovery durch regret-aware Optimierung 通过Regret-Aware 优化发现高效技能 2506.21044v1
  • 1309 06-26 Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning Strenge Subgoal Execution: Zuverlässige Langzeitplanung im Hierarchischen Stärkungslernen 严格次级目标执行:在等级强化学习中可靠的长期规划 2506.21039v1
  • 1310 06-26 DKGCM: A Spatio-Temporal Prediction Model for Traffic Flow by Fusing Spatial Node Clustering Method and Fourier Bidirectional Mamba Mechanism DKGCM: Ein räumlich-zeitliches Vorhersagemodell für den Verkehrsfluss durch das Verschmelzen von Raumknoten-Clustering-Methode und Fourier Bidirektional Mamba-Mechanismus DKGCM:通过使用空间节点集束法和Fourier双向双向Mamba机制进行交通流动的SPA-临时预测模型 2507.01982v1
  • 1311 06-26 RL-Selector: Reinforcement Learning-Guided Data Selection via Redundancy Assessment RL-Selector: Verstärkte lernorientierte Datenauswahl über Redundanzbewertung RL-选择者:通过裁员评估甄选强化学习指导数据 2506.21037v1
  • 1312 06-26 An Information-Theoretic Analysis for Federated Learning under Concept Drift Eine informationstheoretische Analyse für das Federated Learning unter Konzept Drift 根据 “ 漂流概念 “ 进行的联邦学习信息理论分析 2506.21036v1
  • 1313 06-26 SceneGenAgent: Precise Industrial Scene Generation with Coding Agent SceneGenAgent: Präzise industrielle Szenegenerierung mit Coding Agent SceneGenerAgenti: 精密工业场景与编码剂生成 2410.21909v3
  • 1314 06-26 Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning Little By Little: Kontinuierliches Lernen über selbsttätiges Sparse Mixture-of-Rank Adaptives Lernen 小小小小:通过自发的微小混血体适应性学习不断学习 2506.21035v1
  • 1315 06-26 PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling PCDVQ: Verbesserung der Vector Quantization für große Sprachmodelle über Polar Coordinate Entkopplung PCDVQ:通过极地协调脱钩,加强大语言模型的矢量量化 2506.05432v2
  • 1316 06-26 TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence TRIDENT: Tri-Modal Molecular Representative Learning mit taxonomischen Anmerkungen und lokaler Korrespondenz 三模式分子代表性学习,具有分类说明和当地通讯 2506.21028v1
  • 1317 06-26 Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems Mischung von Experten-augmented Deep Unfolding für Aktivitätserkennung in IRS-gestützten Systemen IRS辅助系统中活动探测专家加固深载混合体 2502.20183v2
  • 1318 06-26 HybridQ: Hybrid Classical-Quantum Generative Adversarial Network for Skin Disease Image Generation HybridQ: Hybrid-Klassisch-Quantum Generatives Adversariales Netzwerk für die Bildgenerierung von Hauterkrankungen CCF: 皮肤疾病成像生成的混合古金-量反反转网络 2506.21015v1
  • 1319 06-26 Efficient Image Generation with Variadic Attention Heads Effiziente Bildgenerierung mit verschiedenen Aufmerksamkeitsköpfen 高效的图像生成,由Variadic关注组织负责人负责 2211.05770v3
  • 1320 06-26 Proximal Point Method for Online Saddle Point Problem Proximale Point-Methode für Online-Sättelpunkt-Problem 在线搭配点问题的近点方法 2407.04591v3
  • 1321 06-26 Review learning: Real world validation of privacy preserving continual learning across medical institutions Review learning: Echte Welt-Validierung der Privatsphäre Erhaltung kontinuierlichen Lernens in medizinischen Einrichtungen 审查学习:维护各医疗机构持续学习的隐私的真实世界验证 2210.09394v2
  • 1322 06-26 Distilling Normalizing Flows Destillieren von Normalisierungsströmen 保持正常流动 2506.21003v1
  • 1323 06-26 Genetic Algorithm with Innovative Chromosome Patterns in the Breeding Process Genetischer Algorithmus mit innovativen Chromosomenmustern im Zuchtprozess 育种过程中创新性染色体模式的遗传数值 2501.18184v3
  • 1324 06-26 Leveraging Unlabeled Audio-Visual Data in Speech Emotion Recognition using Knowledge Distillation Nutzung unmarkierter Audio-Visual-Daten in der Sprach-Emotionserkennung mittels Wissensdestillation 利用知识蒸馏在语音情感识别中利用未贴标签的视听数据 2507.00055v1
  • 1325 06-26 Bridging the Gap Between Approximation and Learning via Optimal Approximation by ReLU MLPs of Maximal Regularity Überbrückung der Lücke zwischen Annäherung und Lernen durch Optimale Annäherung durch ReLU MLPs der Maximalregularität 通过最大合规性RELU MLP,通过最佳接近缩小接近与学习之间的差距 2409.12335v4
  • 1326 06-26 SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes SharpZO: Hybrid Sharpness-Aware Vision Sprachmodell Prompt Tuning via Forward-Only Passes SharpZO: 混合尖锐-敏锐视觉语言模型,通过前向-单行道快速调试 2506.20990v1
  • 1327 06-26 Can Gradient Descent Simulate Prompting? Kann Gradient Descent Simulate Prompting? 梯子源模拟能刺激吗? 2506.20989v1
  • 1328 06-26 Split-Merge: A Difference-based Approach for Dominant Eigenvalue Problem Split-Merge: Ein unterschiedsbasierter Ansatz für das Dominante Eigenwertproblem Split-Merge:对支配性电子价值问题采取基于差异的办法 2501.15131v2
  • 1329 06-26 Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations Generalisierte Tensor-basierte Parameter-Effizient Feinsteuerung über Lie Group Transformationen 通过 “ 谎言集团变形 “ 进行通用的Tensor基准参数有效精美调整 2504.00851v2
  • 1330 06-26 Explainable quantum regression algorithm with encoded data structure Erklärbarer Quantenregressionsalgorithmus mit kodierter Datenstruktur 具有编码数据结构的可解释量子回归算法 2307.03334v5
  • 1331 06-26 EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora Erarag: Effiziente und inkrementelle retrieval Augmented Generation für wachsende Corpora EraRAG: 增长企业的高效和递增回收增量增殖型增殖型增殖型增殖型增殖型增殖型增殖型 2506.20963v1
  • 1332 06-26 Antibody Design and Optimization with Multi-scale Equivariant Graph Diffusion Models for Accurate Complex Antigen Binding Antikörper-Design und Optimierung mit mehrstufigen äquivarianten Graphen-Diffusions-Modellen für präzise, komplexe Antigen-Bindung 防反体设计和优化,采用多种规模等同图形扩散模型,用于准确的复合抗原装订 2506.20957v1
  • 1333 06-26 Model State Arithmetic for Machine Unlearning Modell Staat Arithmetik für Maschine Unlearning 机器脱修示范国 2506.20941v1
  • 1334 06-26 Forecasting Geopolitical Events with a Sparse Temporal Fusion Transformer and Gaussian Process Hybrid: A Case Study in Middle Eastern and U.S. Conflict Dynamics Prognose geopolitischer Ereignisse mit einem spare Temporal Fusion Transformer und Gaußschen Prozesshybrid: Eine Fallstudie in Nahost und US-Konfliktdynamik 以松散的时空融合变异器和高斯进程混合器预测地缘政治事件:中东和美国冲突动态案例研究 2506.20935v1
  • 1335 06-26 Lower Bounds on the Size of Markov Equivalence Classes Untere Grenzen auf der Größe der Markov-Äquivalenzklassen 马克夫等等效类大小的下下界界圈 2506.20933v1
  • 1336 06-26 Extremely Simple Streaming Forest Extrem einfach streaming Wald 极简单流流森林 2110.08483v7
  • 1337 06-26 Quantum Reinforcement Learning Trading Agent for Sector Rotation in the Taiwan Stock Market Quantum-Verstärkung-Learning-Trading-Agent für Sektor-Rotation auf dem Aktienmarkt Taiwan 台湾股市部门轮换的量级强化学习贸易代理 2506.20930v1
  • 1338 06-26 Active Learning for Manifold Gaussian Process Regression Aktives Lernen für manifolde Gaußsche Prozessregression Gaussian 进程倒退的 Manifide Gaussian 正在学习 2506.20928v1
  • 1339 06-26 Interpretable Representation Learning for Additive Rule Ensembles Interpretable Representative Learning for Additive Rule Ensembles 补充规则会议的解释性代表性学习 2506.20927v1
  • 1340 06-26 LLM-guided Chemical Process Optimization with a Multi-Agent Approach LLM-geführte chemische Prozessoptimierung mit einem Multi-Agent-Ansatz LLM-LLM-制导化学过程 优化采用多机构办法 2506.20921v1
  • 1341 06-26 Machine learning of microstructure–property relationships in materials leveraging microstructure representation from foundational vision transformers Maschinelles Lernen von Mikrostruktur-Eigenschaftsbeziehungen in Materialien, die die Mikrostrukturdarstellung von grundlegenden Vision-Transformatoren nutzen 利用基础视觉变压器代表微观结构的材料中微型结构-财产关系 2501.18637v2
  • 1342 06-26 Explainable AI for Radar Resource Management: Modified LIME in Deep Reinforcement Learning Erklärbare KI für Radar-Ressourcenmanagement: Modifizierte LIME im Deep Reinforcement Learning 用于雷达资源管理的可解释的AIAI:深强化学习中修改的LIME 2506.20916v1
  • 1343 06-26 ZKPROV: A Zero-Knowledge Approach to Dataset Provenance for Large Language Models ZKPROV: Ein Null-Knowledge-Ansatz zur Datensatzprovenz für große Sprachmodelle ZKPROV:大语言模型数据集验证零知识化办法 2506.20915v1
  • 1344 06-26 Faster Fixed-Point Methods for Multichain MDPs Schnellere Fixed-Point-Methoden für Multichain-MDPs 《多链 MDP快速固定点方法》 2506.20910v1
  • 1345 06-26 Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL Optimale Single-Policy-Probenkomplexität und transiente Abdeckung für durchschnittlich reward Offline-RL 平均离岸平均回报率的 最佳单一政策抽样复杂程度和中度覆盖率 2506.20904v1
  • 1346 06-26 Graph-Structured Feedback Multimodel Ensemble Online Conformal Prediction Graph-strukturiertes Feedback Multimodel Ensemble Online Conformal Prediction 多模型组合在线非正式预测 2506.20898v1

Article 0

Title@2025-07-03 (4): Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory

Title: Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory Point3R: Streaming der 3D-Rekonstruktion mit explizitem Raumzeigerspeicher Point3R: 利用明确的空间指内存进行三维重建 2507.02863v1

Authors (4): Yuqi Wu, Wenzhao Zheng, Jie Zhou, Jiwen Lu

Dense 3D scene reconstruction from an ordered sequence or unordered image collections is a critical step when bringing research in computer vision into practical scenarios. Following the paradigm introduced by DUSt3R, which unifies an image pair densely into a shared coordinate system, subsequent methods maintain an implicit memory to achieve dense 3D reconstruction from more images. However, such implicit memory is limited in capacity and may suffer from information loss of earlier frames. We propose Point3R, an online framework targeting dense streaming 3D reconstruction. To be specific, we maintain an explicit spatial pointer memory directly associated with the 3D structure of the current scene. Each pointer in this memory is assigned a specific 3D position and aggregates scene information nearby in the global coordinate system into a changing spatial feature. Information extracted from the latest frame interacts explicitly with this pointer memory, enabling dense integration of the current observation into the global coordinate system. We design a 3D hierarchical position embedding to promote this interaction and design a simple yet effective fusion mechanism to ensure that our pointer memory is uniform and efficient. Our method achieves competitive or state-of-the-art performance on various tasks with low training costs. Code is available at: https://github.com/YkiWu/Point3R.

DUSt3R 将一个聚在一起的图像配对合并成一个共享的坐标系统, 其后的方法保持隐含的记忆, 以便从更多的图像中实现密度的 3D 重建。 然而, 这种隐含的记忆在容量上是有限的, 并可能因先前框架的信息丢失而受到影响。 我们提议Point3R, 一个针对密集的3D 重建流的在线框架。 具体地说, 我们保留一个与当前场景的3D 结构直接相关的明确的空间指针记忆。 这个记忆中的每个指针都指定了一个特定的 3D 位置和综合场景信息在全球坐标系统附近的一个变化的空间特征。 从最新框架提取的信息与这个指针记忆明确互动, 使得当前观测与全球坐标系统紧密结合。 我们设计了一个 3D 等级位置, 以推广这种互动, 并设计一个简单而有效的融合机制, 以确保我们的指向记忆是统一和有效的。 我们的方法在低培训成本的各种任务上实现了竞争性或状态。


Article 1

Title@2025-07-03 (4): Answer Matching Outperforms Multiple Choice for Language Model Evaluation

Title: Answer Matching Outperforms Multiple Choice for Language Model Evaluation Antwort Matching Outperforms Multiple Choice für Sprachmodell-Bewertung 语言模式评价的多种选择 2507.02856v1

Authors (5): Nikhil Chandak, Shashwat Goel, Ameya Prabhu, Moritz Hardt, Jonas Geiping

Multiple choice benchmarks have long been the workhorse of language model evaluation because grading multiple choice is objective and easy to automate. However, we show multiple choice questions from popular benchmarks can often be answered without even seeing the question. These shortcuts arise from a fundamental limitation of discriminative evaluation not shared by evaluations of the model’s free-form, generative answers. Until recently, there appeared to be no viable, scalable alternative to multiple choice–but, we show that this has changed. We consider generative evaluation via what we call answer matching: Give the candidate model the question without the options, have it generate a free-form response, then use a modern language model with the reference answer to determine if the response matches the reference. To compare the validity of different evaluation strategies, we annotate MMLU-Pro and GPQA-Diamond to obtain human grading data, and measure the agreement of each evaluation approach. We find answer matching using recent models–even small ones–achieves near-perfect agreement, in the range of inter-annotator agreement. In contrast, both multiple choice evaluation and using LLM-as-a-judge without reference answers aligns poorly with human grading. Improving evaluations via answer matching is not merely a conceptual concern: the rankings of several models change significantly when evaluating their free-form responses with answer matching. In light of these findings, we discuss how to move the evaluation ecosystem from multiple choice to answer matching.

长期以来,多种选择基准一直是语言模式评估的工序,因为分级的多重选择是客观的,而且容易自动化。然而,我们展示了大众基准的多重选择问题,往往可以在不看到问题的情况下解答。这些捷径来自对模式自由形式和基因化答案的评价所没有分享的歧视性评价的根本局限性。直到最近,似乎没有可行的、可伸缩的替代方法可以替代多重选择-但是,我们证明这已经发生了变化。我们通过我们所谓的答案匹配来考虑基因化评估。我们考虑通过我们所谓的“答案匹配”来匹配我们所谓的“组合性评估 ” : 给候选人模型提供没有选项的问题,让它产生一个自由的响应,然后使用具有参考答案的现代语言模型来确定答复是否与参考匹配。为了比较不同评价战略的有效性,我们对MML-Pro和GPQA-Dimmond进行了批注,以获取人类分类数据,衡量每一种评价方法的一致。我们发现答案与最近的模型甚至小型的生态系统协议相匹配,在内部协议中,它产生一个自由的响应性回应,而我们只是用LM-as-as-as-ad ad com ad com ad ad ad ad ad ad commission the missuement the mind real des iming the mind the mind iming the mind the mind the mind iming mind im im im im im im imation imation im impridemental imation im im im imation imation imation impridementmentmentmentmentald imations


Article 2

Title@2025-07-03 (4): Urban Region Pre-training and Prompting: A Graph-based Approach

Title: Urban Region Pre-training and Prompting: A Graph-based Approach Urban Region Pre-Training und Prompting: Ein graphischer Ansatz 培训前和推动:基于图表的方法 2408.05920v4

Authors (8): Jiahui Jin, Yifan Song, Dong Kan, Haojia Zhu, Xiangguo Sun, Zhicheng Li, Xigang Sun, Jinghui Zhang

Urban region representation is crucial for various urban downstream tasks. However, despite the proliferation of methods and their success, acquiring general urban region knowledge and adapting to different tasks remains challenging. Existing work pays limited attention to the fine-grained functional layout semantics in urban regions, limiting their ability to capture transferable knowledge across regions. Further, inadequate handling of the unique features and relationships required for different downstream tasks may also hinder effective task adaptation. In this paper, we propose a $\textbf{G}$raph-based $\textbf{U}$rban $\textbf{R}$egion $\textbf{P}$re-training and $\textbf{P}$rompting framework ($\textbf{GURPP}$) for region representation learning. Specifically, we first construct an urban region graph and develop a subgraph-centric urban region pre-training model to capture the heterogeneous and transferable patterns of entity interactions. This model pre-trains knowledge-rich region embeddings using contrastive learning and multi-view learning methods. To further refine these representations, we design two graph-based prompting methods: a manually-defined prompt to incorporate explicit task knowledge and a task-learnable prompt to discover hidden knowledge, which enhances the adaptability of these embeddings to different tasks. Extensive experiments on various urban region prediction tasks and different cities demonstrate the superior performance of our framework.

城市地区代表性对于各种城市下游任务至关重要。然而,尽管方法的泛滥及其成功,但获得一般城市地区知识并适应不同任务仍具有挑战性。现有工作对城市地区精细的功能布局语义框架($\ textbf{GURP}$)的关注有限,限制了它们获取跨区域可转让知识的能力。此外,对不同下游任务所需的独特特点和关系的处理不当,可能也会妨碍有效的任务适应。在本文件中,我们提议以$textbf{G}$raph为基础的美元($\ textbf{U}$_textbf{R}$_rbrb{R}$gion \(gent $\ textbf{P}retragation $@textbf{P}$rmpting 框架($\\ textlebf{GURP}\))为城市代表学习提供了限制。此外,我们首先建造一个城市区域图示图,并开发一个以子图为中心的城市互动模式,利用对比学习和多视角学习方法将知识嵌入。我们要设计两个更精确的快速的快速的快速的预测任务,以提升、更精确的、更精确的、更精确的、更精确的、更深的、更精确的、更深的、更深的、更深的、更深的都市任务,以展示的、更深的、更细的、更细的、更细的、更细的、更细的、更细的都市任务。


Article 3

Title@2025-07-03 (4): MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs

Title: MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs MOTIF: Modulares Denken durch Verstärkung Feinabstimmung in LLMs MOTIF:通过强化微调在LLM中进行模块思考 2507.02851v1

Authors (2): Purbesh Mitra, Sennur Ulukus

Recent advancements in the reasoning capabilities of large language models (LLMs) show that employing group relative policy optimization (GRPO) algorithm for reinforcement learning (RL) training allows the models to use more thinking/reasoning tokens for generating better responses. However, LLMs can generate only a finite amount of tokens while maintaining attention to the previously generated tokens. This limit, also known as the context size of an LLM, is a bottleneck in LLM reasoning with arbitrarily large number of tokens. To think beyond the limit of context size, an LLM must employ a modular thinking strategy to reason over multiple rounds. In this work, we propose $\textbf{MOTIF: Modular Thinking via Reinforcement Finetuning}$ – an RL training method for generating thinking tokens in multiple rounds, effectively allowing the model to think with additional context size. We trained the open-source model Qwen2.5-3B-Instruct on GSM8K dataset via parameter efficient fine-tuning and tested its accuracy on MATH500 and AIME2024 benchmarks. Our experiments show 3.8\% and 3.3\% improvements over vanilla GRPO based training in the respective benchmarks. Furthermore, this improvement was achieved with only 15\% of samples, thus demonstrating sample efficiency of MOTIF. Our code and models are available at https://github.com/purbeshmitra/MOTIF and https://huggingface.co/purbeshmitra/MOTIF, respectively.

大型语言模型(LLMS)推理能力最近的进步表明,使用集体相对政策优化算法(GROP)进行强化学习(RL)培训,使模型能够使用更多的思考/推理符号来产生更好的回应。然而,LLMS只能产生有限的象征性,同时保持对先前产生的符号的注意。这一限制,又称为LLM的上下文大小,是LLM推理中的一个瓶颈,它含有任意大量符号。要超越范围范围,LMM必须采用模块式思维战略,在多个回合中解释。在这项工作中,我们提议用$textbf{MOIF:通过强化微调进行模思考,这是在多回合中生成思维符号的一种有限数量的培训方法,实际上允许模型以额外的上下文尺寸进行思考。我们通过高效的参数微调和测试了MATHTH500和AIME2024基准的精确度。我们实验显示的是3.88和3.3_BEGROBS的改进,因此在15个样本中展示了我们现有的MAF/MO标准。


Article 4

Title@2025-07-03 (4): LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users

Title: LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users LLM Hypnose: Nutzung des Benutzerfeedbacks für unautorisierte Wissensinjektion für alle Benutzer LLM Hypnisis:利用用户反馈,为所有用户提供未经授权知识注射 2507.02850v1

Authors (4): Almog Hilel, Idan Shenfeld, Leshem Choshen, Jacob Andreas

We describe a vulnerability in language models (LMs) trained with user feedback, whereby a single user can persistently alter LM knowledge and behavior given only the ability to provide prompts and upvote / downvote feedback on LM outputs. To implement the attack, the attacker prompts the LM to stochastically output either a “poisoned” or benign response, then upvotes the poisoned response or downvotes the benign one. When feedback signals are used in a subsequent preference tuning behavior, LMs exhibit increased probability of producing poisoned responses even in contexts without malicious prompts. We show that this attack can be used to (1) insert factual knowledge the model did not previously possess, (2) modify code generation patterns in ways that introduce exploitable security flaws, and (3) inject fake financial news. Our finding both identifies a new qualitative feature of language model preference tuning (showing that it even highly restricted forms of preference data can be used to exert fine-grained control over behavior), and a new attack mechanism for LMs trained with user feedback (extending work on pretraining-time data poisoning and deployment-time prompt injection).

我们用用户反馈来描述语言模型中的脆弱之处,根据这些模型,一个用户可以持续改变LM的知识和行为,因为只有能够提供有关LM产出的提示和上投/下投反馈。为了实施袭击,攻击者促使LM以“受污染”或良性反应的方式进行随机输出,然后将中毒反应或下投良性反应。当反馈信号用于随后的偏好调控行为时,LMS显示即使在没有恶意提示的情况下也更有可能产生中毒反应。我们表明,这次攻击可以被用来:(1) 插入该模型以前没有的事实知识,(2) 修改代码生成模式,以引入可以利用的安全缺陷,(3) 输入假金融新闻。我们发现,语言模式偏好调整的新定性特征(显示甚至高度有限的偏好形式数据可以用来对行为进行精细控制),以及经过用户反馈培训的LMS新的攻击机制(延长培训前数据中毒和部署时即时及时注射工作)。


Article 5

Title@2025-07-03 (4): MvHo-IB: Multi-View Higher-Order Information Bottleneck for Brain Disorder Diagnosis

Title: MvHo-IB: Multi-View Higher-Order Information Bottleneck for Brain Disorder Diagnosis MvHo-IB: Multi-View Higher-Order Informationen Engpass für Gehirnerkrankungen Diagnose MvHO-IB:脑紊乱诊断多视高端信息瓶颈 2507.02847v1

Authors (3): Kunyu Zhang, Qiang Li, Shujian Yu

Recent evidence suggests that modeling higher-order interactions (HOIs) in functional magnetic resonance imaging (fMRI) data can enhance the diagnostic accuracy of machine learning systems. However, effectively extracting and utilizing HOIs remains a significant challenge. In this work, we propose MvHo-IB, a novel multi-view learning framework that integrates both pairwise interactions and HOIs for diagnostic decision-making, while automatically compressing task-irrelevant redundant information. MvHo-IB introduces several key innovations: (1) a principled method that combines O-information from information theory with a matrix-based Renyi alpha-order entropy estimator to quantify and extract HOIs, (2) a purpose-built Brain3DCNN encoder to effectively utilize these interactions, and (3) a new multi-view learning information bottleneck objective to enhance representation learning. Experiments on three benchmark fMRI datasets demonstrate that MvHo-IB achieves state-of-the-art performance, significantly outperforming previous methods, including recent hypergraph-based techniques. The implementation of MvHo-IB is available at https://github.com/zky04/MvHo-IB.

最近的证据表明,在功能磁共振成像(fMRI)数据中建模高阶互动(HOI)可提高机器学习系统的诊断准确性,然而,有效提取和利用HOI仍是一个重大挑战。在这项工作中,我们提议MvHo-IB,这是一个新的多视角学习框架,将双向互动和HOI用于诊断决策,同时自动压缩与任务相关的冗余信息。 MvHo-IB引入了几项关键创新:(1) 一种原则性方法,将信息理论中的O信息与基于矩阵的Renyi alpha-serectropy Extrapy entrapy Exminator相结合,以量化和提取HOIs,(2) 一种专门制造的Melean3DCNNN 编码器,以有效利用这些互动,(3) 一个新的多视角学习信息瓶颈目标,以加强代表性学习。关于三个基准的FMMRI数据集的实验表明,MvHo-IB实现了最先进的性能,大大超过以往的方法,包括最近的高射法技术。MvHO-IB的落实MvHO-IB,可在https://github.com/zky/vky/Ms.


Article 6

Title@2025-07-03 (4): Evaluating Frontier Models for Stealth and Situational Awareness

Title: Evaluating Frontier Models for Stealth and Situational Awareness Bewertung von Frontier-Modellen für Stealth und Situationsbewusstsein 评价隐形和了解情况的边界模式 2505.01420v4

Authors (9): Mary Phuong, Roland S. Zimmermann, Ziyue Wang, David Lindner, Victoria Krakovna, Sarah Cogan, Allan Dafoe, Lewis Ho, Rohin Shah

Recent work has demonstrated the plausibility of frontier AI models scheming – knowingly and covertly pursuing an objective misaligned with its developer’s intentions. Such behavior could be very hard to detect, and if present in future advanced systems, could pose severe loss of control risk. It is therefore important for AI developers to rule out harm from scheming prior to model deployment. In this paper, we present a suite of scheming reasoning evaluations measuring two types of reasoning capabilities that we believe are prerequisites for successful scheming: First, we propose five evaluations of ability to reason about and circumvent oversight (stealth). Second, we present eleven evaluations for measuring a model’s ability to instrumentally reason about itself, its environment and its deployment (situational awareness). We demonstrate how these evaluations can be used as part of a scheming inability safety case: a model that does not succeed on these evaluations is almost certainly incapable of causing severe harm via scheming in real deployment. We run our evaluations on current frontier models and find that none of them show concerning levels of either situational awareness or stealth.

最近的工作表明,前沿AI模型的构思 – – 知情和秘密地追求与其开发者的意图不相符的目标,是可信的。这种行为可能很难检测到,如果在未来的先进系统中存在,则可能造成严重的控制风险。因此,AI开发商必须排除模型部署前的构思的伤害。在本文中,我们提出一套基于两种推理能力的系统化推理评价,我们认为它们是成功构思的先决条件:首先,我们提议对合理和规避监督的能力进行五项评价。第二,我们提出11项评价,以衡量模型对自己、环境及其部署进行工具性解释的能力(情景意识)。我们证明这些评价如何被作为规划无能安全案例的一部分:在实际部署中无法成功进行这些评价的模式几乎肯定无法通过规划造成严重损害。我们对现有前沿模型进行评估,发现其中没有一项显示对形势的认识水平或隐形情况。


Article 7

Title@2025-07-03 (4): LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding

Title: LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding LLM-getriebene Behandlung Wirkung Schätzung unter Inferenzzeit Text Confounding LLM-驱动处理效应估算 2507.02843v1

Authors (4): Yuchen Ma, Dennis Frauen, Jonas Schweisthal, Stefan Feuerriegel

Estimating treatment effects is crucial for personalized decision-making in medicine, but this task faces unique challenges in clinical practice. At training time, models for estimating treatment effects are typically trained on well-structured medical datasets that contain detailed patient information. However, at inference time, predictions are often made using textual descriptions (e.g., descriptions with self-reported symptoms), which are incomplete representations of the original patient information. In this work, we make three contributions. (1) We show that the discrepancy between the data available during training time and inference time can lead to biased estimates of treatment effects. We formalize this issue as an inference time text confounding problem, where confounders are fully observed during training time but only partially available through text at inference time. (2) To address this problem, we propose a novel framework for estimating treatment effects that explicitly accounts for inference time text confounding. Our framework leverages large language models together with a custom doubly robust learner to mitigate biases caused by the inference time text confounding. (3) Through a series of experiments, we demonstrate the effectiveness of our framework in real-world applications.

估计治疗效果对于医学中个人化决策至关重要,但这一任务在临床实践中面临独特的挑战。在培训时间,估计治疗效果的模型通常在结构完善的医疗数据集中培训,其中包括详细的病人信息。然而,在推论时间,往往使用文字描述(如自我报告症状的描述)作出预测,这些描述没有完全反映最初病人的信息。在这项工作中,我们作出三项贡献。(1) 我们表明,培训时间和推论时间期间提供的数据之间的差异可能导致对治疗效果的偏差估计。我们将此问题正式确定为一种推断时间案文所困扰的问题,在培训时间里完全观察了那些混杂的人,但只能通过推断时间的文字提供部分数据。(2) 为解决这一问题,我们提出了一个新的框架,用以估计治疗效果,明确说明时间推论的文字构造。我们的框架利用了大型语言模型和一种定制的双重强健健的学习者,以减轻因推算时间文本所造成偏差。(3) 通过一系列实验,我们展示了我们框架在现实世界中的有效性。


Article 8

Title@2025-07-03 (4): StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason

Title: StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason StepHint: Mehrstufige stufenweise Hinweise stärken das Lernen zur Vernunft 步进提示:多级分步骤将强化学习提升到合理 2507.02841v1

Authors (7): Kaiyi Zhang, Ang Lv, Jinpeng Li, Yongbo Wang, Feng Wang, Haoyuan Hu, Rui Yan

Reinforcement learning with verifiable rewards (RLVR) is a promising approach for improving the complex reasoning abilities of large language models (LLMs). However, current RLVR methods face two significant challenges: the near-miss reward problem, where a small mistake can invalidate an otherwise correct reasoning process, greatly hindering training efficiency; and exploration stagnation, where models tend to focus on solutions within their comfort zone,'' lacking the motivation to explore potentially more effective alternatives. To address these challenges, we propose StepHint, a novel RLVR algorithm that utilizes multi-level stepwise hints to help models explore the solution space more effectively. StepHint generates valid reasoning chains from stronger models and partitions these chains into reasoning steps using our proposed adaptive partitioning method. The initial few steps are used as hints, and simultaneously, multiple-level hints (each comprising a different number of steps) are provided to the model. This approach directs the model's exploration toward a promising solution subspace while preserving its flexibility for independent exploration. By providing hints, StepHint mitigates the near-miss reward problem, thereby improving training efficiency. Additionally, the external reasoning pathways help the model develop better reasoning abilities, enabling it to move beyond itscomfort zone’’ and mitigate exploration stagnation. StepHint outperforms competitive RLVR enhancement methods across six mathematical benchmarks, while also demonstrating superior generalization and excelling over baselines on out-of-domain benchmarks.

以可核查的奖励加强学习(RLVR)是提高大型语言模型复杂推理能力的一个很有希望的方法。然而,目前的RLVR方法面临两大挑战:一是近乎的奖赏问题,一个小错误可能使原本正确的推理过程失效,大大妨碍培训效率;二是探索停滞,其中模型往往侧重于“舒适地带”内的解决办法,缺乏探索可能更加有效的替代方案的积极性。为了应对这些挑战,我们提议Stephint,这是一个新的RLVR算法,它利用多层次的分级提示帮助模型更有效地探索解决方案空间。StepHint从较强的模型中产生有效的推理链,用我们提议的适应性分解法将这些链分成为推理步骤。最初的几个步骤被用作提示,同时向模型提供多重提示(每个步骤由不同的步骤组成),该方法引导模型的探索走向一个有希望的通用解决方案子空间,同时保持其独立探索的灵活性。通过提供提示, StepHper 减轻近乎失败的奖赏问题,从而提高培训效率。此外,外部推理学路径将它推向更高的升级,同时推导出其升级,同时推向更高的区域。


Article 9

Title@2025-07-03 (4): From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents

Title: From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents Von der Web-Suche in Richtung Agentic Deep Research: Incentivizing Search with Reasoning Agents 从网络搜索到代理深层研究:激励使用理性代理进行搜索 2506.18959v3

Authors (23): Weizhi Zhang, Yangning Li, Yuanchen Bei, Junyu Luo, Guancheng Wan, Liangwei Yang, Chenxuan Xie, Yuyao Yang, Wei-Chieh Huang, Chunyu Miao, Henry Peng Zou, Xiao Luo, Yusheng Zhao, Yankai Chen, Chunkit Chan, Peilin Zhou, Xinyang Zhang, Chenwei Zhang, Jingbo Shang, Ming Zhang, Yangqiu Song, Irwin King, Philip S. Yu

Information retrieval is a cornerstone of modern knowledge acquisition, enabling billions of queries each day across diverse domains. However, traditional keyword-based search engines are increasingly inadequate for handling complex, multi-step information needs. Our position is that Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research. These systems transcend conventional information search techniques by tightly integrating autonomous reasoning, iterative retrieval, and information synthesis into a dynamic feedback loop. We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn. We also introduce a test-time scaling law to formalize the impact of computational depth on reasoning and search. Supported by benchmark results and the rise of open-source implementations, we demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking. All the related resources, including industry products, research papers, benchmark datasets, and open-source implementations, are collected for the community in https://github.com/DavidZWZ/Awesome-Deep-Research.

我们的立场是,具有推理和代理能力的大型语言模型(LLMS)正在引入一个新的范式,称为 “ 干深研究 “ 。这些系统通过紧密整合自主推理、迭代检索和信息合成,超越常规信息搜索技术,形成动态反馈循环。我们追踪从静态网络搜索到互动、代理系统的变化,这些系统计划、探索和学习。我们还引入了测试-时间缩放法,以正式确定计算深度对推理和搜索的影响。我们借助基准结果和开放源执行的崛起,证明 “ 干深研究 “ 不仅大大超越了现有方法,而且还准备成为未来信息搜索的主要范式。所有相关资源,包括工业产品、研究论文、基准数据集和开放源实施,都在https://github.com/DavidZZZ/Awesome-Deep-Research中为社区收集了所有相关资源,包括工业产品、研究文件、基准数据集、基准数据集和开放源实施。


Article 10

Title@2025-07-03 (4): Membership Inference Attacks as Privacy Tools: Reliability, Disparity and Ensemble

Title: Membership Inference Attacks as Privacy Tools: Reliability, Disparity and Ensemble Mitgliedschafts-Inferenz Angriffe als Datenschutz-Tools: Zuverlässigkeit, Disparität und Ensemble 以成员攻击作为隐私工具:可靠性、差异性和共性 2506.13972v2

Authors (6): Zhiqi Wang, Chengyu Zhang, Yuetian Chen, Nathalie Baracaldo, Swanand Kadhe, Lei Yu

Membership inference attacks (MIAs) pose a significant threat to the privacy of machine learning models and are widely used as tools for privacy assessment, auditing, and machine unlearning. While prior MIA research has primarily focused on performance metrics such as AUC, accuracy, and TPR@low FPR - either by developing new methods to enhance these metrics or using them to evaluate privacy solutions - we found that it overlooks the disparities among different attacks. These disparities, both between distinct attack methods and between multiple instantiations of the same method, have crucial implications for the reliability and completeness of MIAs as privacy evaluation tools. In this paper, we systematically investigate these disparities through a novel framework based on coverage and stability analysis. Extensive experiments reveal significant disparities in MIAs, their potential causes, and their broader implications for privacy evaluation. To address these challenges, we propose an ensemble framework with three distinct strategies to harness the strengths of state-of-the-art MIAs while accounting for their disparities. This framework not only enables the construction of more powerful attacks but also provides a more robust and comprehensive methodology for privacy evaluation.

成员攻击(MIAs)对机器学习模式的隐私构成重大威胁,并被广泛用作隐私评估、审计和机器不学习的工具。虽然以前MIA的研究主要侧重于性能衡量标准,如AUC、准确性和TPR@low FPR-或者通过开发新方法加强这些衡量标准,或者利用这些新方法评价隐私解决方案-我们发现它忽视了不同袭击之间的差异。这些差异,无论是不同的攻击方法之间还是同一方法的多重即时性,都对MIA作为隐私评价工具的可靠性和完整性产生重要影响。在本文件中,我们通过基于覆盖面和稳定性分析的新框架系统地调查这些差异。广泛的实验揭示了MIAs、其潜在原因及其对隐私评价的更广泛影响。为应对这些挑战,我们提出了一个包含三个不同战略的组合框架,以利用最新MIA的优势,同时考虑其差异。这个框架不仅能够构建更强大的攻击,而且还为隐私评价提供了更有力和全面的方法。


Article 11

Title@2025-07-03 (4): ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning

Title: ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning ExPO: Entsperren harter Vernunft mit selbsterklärungsgeführtem Verstärkungslernen ExPO: 以自我剥削指导强化学习来解锁困难理由 2507.02834v1

Authors (4): Ruiyang Zhou, Shuozhe Li, Amy Zhang, Liu Leqi

Recent advances in large language models have been driven by reinforcement learning (RL)-style post-training, which improves reasoning by optimizing model outputs based on reward or preference signals. GRPO-style approaches implement this by using self-generated samples labeled by an outcome-based verifier. However, these methods depend heavily on the model’s initial ability to produce positive samples. They primarily refine what the model already knows (distribution sharpening) rather than enabling the model to solve problems where it initially fails. This limitation is especially problematic in early-stage RL training and on challenging reasoning tasks, where positive samples are unlikely to be generated. To unlock reasoning ability in such settings, the model must explore new reasoning trajectories beyond its current output distribution. Such exploration requires access to sufficiently good positive samples to guide the learning. While expert demonstrations seem like a natural solution, we find that they are often ineffective in RL post-training. Instead, we identify two key properties of effective positive samples: they should (1) be likely under the current policy, and (2) increase the model’s likelihood of predicting the correct answer. Based on these insights, we propose $\textbf{Self-Explanation Policy Optimization (ExPO)}$-a simple and modular framework that generates such samples by conditioning on the ground-truth answer. ExPO enables efficient exploration and guides the model to produce reasoning trajectories more aligned with its policy than expert-written CoTs, while ensuring higher quality than its own (incorrect) samples. Experiments show that ExPO improves both learning efficiency and final performance on reasoning benchmarks, surpassing expert-demonstration-based methods in challenging settings such as MATH level-5, where the model initially struggles the most.

大型语言模型的最近进步是由强化学习(RL)式的后培训驱动的,这种培训通过优化基于奖赏或优惠信号的模型产出来改进推理。GROPO式的方法通过使用由基于结果的核查者贴上标签的自产样本来实施。然而,这些方法在很大程度上取决于模型最初生产积极样本的能力。它们主要是完善模型已经知道的(分配的精锐化),而不是使模型能够解决最初失败的问题。这种限制在RL早期质量培训和具有挑战性推理任务中特别成问题,因为不可能产生正面的样本。为了在这种环境下释放推理能力,模型必须探索超出其当前产出分布范围的新推理轨。这种探索需要获得足够好的正面样本来指导学习。专家演示似乎是一种自然解决方案,但我们发现它们往往在RL后培训中无效。相反,我们找出有效正选样本的两个关键属性:它们应该基于当前政策的基础(1) , 改进模型, 而不是用于预测正确答案的可能性。基于这些更高级的推理能力, 我们提议在最高级的推理学前的推算中, 和最精确的推算的推算方法显示这样的推算。


Article 12

Title@2025-07-03 (4): DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift

Title: DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift DNN-basierte Vorkodierung in RIS-gestützten mmWave MIMO-Systemen mit praktischer Phasenverschiebung 以DNN为基础的RIS-援助毫米 wave MIMO系统实际阶段调整前编码 2507.02824v1

Authors (6): Po-Heng Chou, Ching-Wen Chen, Wan-Jen Huang, Walid Saad, Yu Tsao, Ronald Y. Chang

In this paper, the precoding design is investigated for maximizing the throughput of millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems with obstructed direct communication paths. In particular, a reconfigurable intelligent surface (RIS) is employed to enhance MIMO transmissions, considering mmWave characteristics related to line-of-sight (LoS) and multipath effects. The traditional exhaustive search (ES) for optimal codewords in the continuous phase shift is computationally intensive and time-consuming. To reduce computational complexity, permuted discrete Fourier transform (DFT) vectors are used for finding codebook design, incorporating amplitude responses for practical or ideal RIS systems. However, even if the discrete phase shift is adopted in the ES, it results in significant computation and is time-consuming. Instead, the trained deep neural network (DNN) is developed to facilitate faster codeword selection. Simulation results show that the DNN maintains sub-optimal spectral efficiency even as the distance between the end-user and the RIS has variations in the testing phase. These results highlight the potential of DNN in advancing RIS-aided systems.

在本文中,对预先编码的设计进行了调查,以尽量扩大具有阻断直接通信路径的千兆瓦(mmWave)多输入多输出(MIMO)系统的吞吐量。特别是,使用可重新配置的智能表面(RIS)来增强IMO的传输,同时考虑到与视线(LOS)和多路径效应有关的毫米Wave特性。在连续阶段转换中,对最佳编码词进行传统的彻底搜索(ES)是计算密集和耗时的。为了降低计算复杂性,即使终端用户与RIS之间的距离,DNN仍然使用离散的Fourier变异(DFT)矢量器来寻找代码手册设计,包括实用或理想的RIS系统的振动反应。然而,即使分散的阶段转换在ES中被采用,它也会导致大量计算和耗时。相反,经过训练的深神经网络(DNNN)是为了促进更快的编码选择。模拟结果显示,即使在测试阶段中,DNNU的光谱效率也是随着终端用户与RIS之间的距离而变化。这些结果突出DNIS的潜能。


Article 13

Title@2025-07-03 (4): SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model

Title: SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model SynapseRoute: Ein Auto-Routen-Schaltrahmen für das Dual-State Large Language Model SynapseRoute:关于两州大语言模式的自动运行切换框架 2507.02822v1

Authors (12): Wencheng Zhang, Shiqin Qiao, Lingjie Luo, Yinfeng Li, Chuanyang Zheng, Qian Xu, Meng Li, Yong Gui, Yijun He, Jianing Qiu, Jindong Hong, Jiankai Sun

With the widespread adoption of large language models (LLMs) in practical applications, selecting an appropriate model requires balancing not only performance but also operational cost. The emergence of reasoning-capable models has further widened the cost gap between “thinking” (high reasoning) and “non-thinking” (fast, low-cost) modes. In this work, we reveal that approximately 58% of medical questions can be accurately answered by the non-thinking mode alone, without requiring the high-cost reasoning process. This highlights a clear dichotomy in problem complexity and suggests that dynamically routing queries to the appropriate mode based on complexity could optimize accuracy, cost-efficiency, and overall user experience. Based on this, we further propose SynapseRoute, a machine learning-based dynamic routing framework that intelligently assigns input queries to either thinking or non-thinking modes. Experimental results on several medical datasets demonstrate that SynapseRoute not only improves overall accuracy (0.8390 vs. 0.8272) compared to the thinking mode alone but also reduces inference time by 36.8% and token consumption by 39.66%. Importantly, qualitative analysis indicates that over-reasoning on simpler queries can lead to unnecessary delays and even decreased accuracy, a pitfall avoided by our adaptive routing. Finally, this work further introduces the Accuracy-Inference-Token (AIT) index to comprehensively evaluate the trade-offs among accuracy, latency, and token cost.

随着在实际应用中广泛采用大型语言模型(LLMs),选择一个合适的模型不仅需要平衡业绩,还需要平衡操作成本。具有推理能力的模型的出现进一步扩大了“思考”(高推理)和“非思考”(快速、低成本)模式之间的成本差距。在这项工作中,我们发现大约58%的医疗问题可以通过非思考模式来准确解答,而不必要求高成本推理程序。这凸显了问题复杂性的明显分化,并表明根据复杂性对适当模式进行不言而喻的查询可以优化准确性、成本效益和总体用户经验。基于这一点,我们进一步建议SyapseRoute(一个基于机器学习的动态路由框架),一个明智地将信息输入到思维模式或非思考模式的动态(快速、低成本)。 几个医学数据集的实验结果显示,SyapseRoute不仅提高了总体准确性(0.8390 vs. 0.8772),而且仅与思维模式相比,而且还会将准确性降低准确性的时间减少36.8%和象征消费,甚至减少39.66%。最后,质量分析表明,这种不必要、更精确性会降低。


Article 14

Title@2025-07-03 (4): Measurement as Bricolage: Examining How Data Scientists Construct Target Variables for Predictive Modeling Tasks

Title: Measurement as Bricolage: Examining How Data Scientists Construct Target Variables for Predictive Modeling Tasks Messung als Bricolage: Untersuchung, wie Datenwissenschaftler Zielvariablen für vorausschauende Modellierungsaufgaben konstruieren 计量为 “ 饼干测量 “ :审查数据科学家如何构建预测模拟任务的目标变量 2507.02819v1

Authors (5): Luke Guerdan, Devansh Saxena, Stevie Chancellor, Zhiwei Steven Wu, Kenneth Holstein

Data scientists often formulate predictive modeling tasks involving fuzzy, hard-to-define concepts, such as the “authenticity” of student writing or the “healthcare need” of a patient. Yet the process by which data scientists translate fuzzy concepts into a concrete, proxy target variable remains poorly understood. We interview fifteen data scientists in education (N=8) and healthcare (N=7) to understand how they construct target variables for predictive modeling tasks. Our findings suggest that data scientists construct target variables through a bricolage process, involving iterative negotiation between high-level measurement objectives and low-level practical constraints. Data scientists attempt to satisfy five major criteria for a target variable through bricolage: validity, simplicity, predictability, portability, and resource requirements. To achieve this, data scientists adaptively use problem (re)formulation strategies, such as swapping out one candidate target variable for another when the first fails to meet certain criteria (e.g., predictability), or composing multiple outcomes into a single target variable to capture a more holistic set of modeling objectives. Based on our findings, we present opportunities for future HCI, CSCW, and ML research to better support the art and science of target variable construction.

数据科学家往往制定预测性模型任务,涉及模糊、难以确定的概念,例如学生写作的“真实性”或病人的“保健需要”等“学生写作的“真实性”或病人的“保健需要”等概念。然而,数据科学家将模糊概念转化为具体替代目标变量的过程仍然不甚为人理解。我们采访了教育(N=8)和保健(N=7)领域的15名数据科学家,以了解他们如何为预测性模型任务构建目标变量。我们的调查结果表明,数据科学家通过一个粗略过程,即高层次测量目标和低层次实际限制之间的迭代谈判,来构建目标变量。数据科学家试图通过“bricolage”满足目标变量的五项主要标准:有效性、简洁性、可预测性、可移植性和资源要求。为了实现这一目标,数据科学家将适应性地使用问题(重新)成型战略,例如当第一个目标不符合某些标准(e.g.可预测性)时,将一个候选目标变量转换成另一个目标变量,或者将多重结果合成成一个单一的目标变量,以捕捉到更全面的模型目标。基于我们的研究结果,我们为未来的HCI、CS和MCW的可变式研究目标。


Article 15

Title@2025-07-03 (4): Replicable Distribution Testing

Title: Replicable Distribution Testing Reproduzierbare Verteilungsprüfung 可复制分发测试 2507.02814v1

Authors (5): Ilias Diakonikolas, Jingyi Gao, Daniel Kane, Sihan Liu, Christopher Ye

We initiate a systematic investigation of distribution testing in the framework of algorithmic replicability. Specifically, given independent samples from a collection of probability distributions, the goal is to characterize the sample complexity of replicably testing natural properties of the underlying distributions. On the algorithmic front, we develop new replicable algorithms for testing closeness and independence of discrete distributions. On the lower bound front, we develop a new methodology for proving sample complexity lower bounds for replicable testing that may be of broader interest. As an application of our technique, we establish near-optimal sample complexity lower bounds for replicable uniformity testing – answering an open question from prior work – and closeness testing.

我们开始系统调查算法复制性框架内的分布测试。 具体地说, 从一系列概率分布的独立的样本来看, 目标是确定可复制性测试基础分布的自然特性的样本复杂性。 在算法前方, 我们开发了新的可复制的算法, 用于测试离散分布的近距离和独立性。 在下界前方, 我们开发了一种新的方法, 用于证明样本复杂性的较低范围, 用于进行可能具有更广泛兴趣的可复制性测试。 作为我们技术的应用, 我们为可复制的统一性测试设定了近乎最佳的样本复杂性, 用于可复制性的统一性测试( 回答先前工作中的一个公开问题) 和近距离测试。


Article 16

Title@2025-07-03 (4): In-Training Multicalibrated Survival Analysis for Healthcare via Constrained Optimization

Title: In-Training Multicalibrated Survival Analysis for Healthcare via Constrained Optimization In-Training Multikalibrierte Überlebensanalyse für das Gesundheitswesen durch eingeschränkte Optimierung 通过 “ 限制最佳化 “ ,对通过 “ 限制优化 “ 进行的 “ 多校制 “ 多校准生存分析 “ 进行的 “ 多校准 “ 培训中 “ 多校准 “ 健康生存分析 “ 2507.02807v1

Authors (2): Thiti Suttaket, Stanley Kok

Survival analysis is an important problem in healthcare because it models the relationship between an individual’s covariates and the onset time of an event of interest (e.g., death). It is important for survival models to be well-calibrated (i.e., for their predicted probabilities to be close to ground-truth probabilities) because badly calibrated systems can result in erroneous clinical decisions. Existing survival models are typically calibrated at the population level only, and thus run the risk of being poorly calibrated for one or more minority subpopulations. We propose a model called GRADUATE that achieves multicalibration by ensuring that all subpopulations are well-calibrated too. GRADUATE frames multicalibration as a constrained optimization problem, and optimizes both calibration and discrimination in-training to achieve a good balance between them. We mathematically prove that the optimization method used yields a solution that is both near-optimal and feasible with high probability. Empirical comparisons against state-of-the-art baselines on real-world clinical datasets demonstrate GRADUATE’s efficacy. In a detailed analysis, we elucidate the shortcomings of the baselines vis-a-vis GRADUATE’s strengths.

在医疗保健方面,生存分析是一个重要问题,因为它模拟了个人同变体和感兴趣事件(例如死亡)的起始时间之间的关系。生存模型必须精确校准(即其预测的概率接近地面的精确度概率),因为校准不当的系统可能导致错误的临床决定。现有的生存模型通常只在人口一级进行校准,从而有可能对一个或多个少数群体子人口进行错误校准。我们提议了一个称为GRADUATE的模型,通过确保所有子人口都有良好的校准来实现多重校准。GRADUATE将多重校准作为限制的优化问题,并优化培训中的校准和歧视,以达到良好的平衡。我们用数学证明,使用优化方法可以产生一种解决办法,既接近最优化,又可行,而且概率很高。我们建议一种称为GRADUATE(GDUATE)的状态基线对比,从而实现多重校准。在一项详细的分析中,我们对GRADUATE(GRAUATE)的效能作了详细的分析,我们从基准中找出了GRAUATE(GRATIAT)的缺点。


Article 17

Title@2025-07-03 (4): Transferrable Surrogates in Expressive Neural Architecture Search Spaces

Title: Transferrable Surrogates in Expressive Neural Architecture Search Spaces Übertragbare Surrogate in Expressive Neural Architecture Suchplätze 显性神经结构搜索空间中的可转让代孕人 2504.12971v3

Authors (8): Shiwen Qin, Gabriela Kadlecová, Martin Pilát, Shay B. Cohen, Roman Neruda, Elliot J. Crowley, Jovita Lukasik, Linus Ericsson

Neural architecture search (NAS) faces a challenge in balancing the exploration of expressive, broad search spaces that enable architectural innovation with the need for efficient evaluation of architectures to effectively search such spaces. We investigate surrogate model training for improving search in highly expressive NAS search spaces based on context-free grammars. We show that i) surrogate models trained either using zero-cost-proxy metrics and neural graph features (GRAF) or by fine-tuning an off-the-shelf LM have high predictive power for the performance of architectures both within and across datasets, ii) these surrogates can be used to filter out bad architectures when searching on novel datasets, thereby significantly speeding up search and achieving better final performances, and iii) the surrogates can be further used directly as the search objective for huge speed-ups.

神经结构搜索(NAS)在平衡探索能够实现建筑创新的表达式、宽广搜索空间与有效搜索这些空间的建筑设计高效评估需要之间,面临挑战。我们调查替代模型培训,以改善基于无上下文语法的高度表达式NAS搜索空间的搜索。我们显示,i)替代模型,要么是使用零成本代用光量和神经图形特征(GRAF)培训的,要么是对现成LM进行微调,对于数据集内部和跨数据集的建筑性能具有高预测力。 ii)这些替代机器人可用于在搜索新数据集时过滤坏建筑,从而大大加快搜索并实现更好的最终性能。 iii) 替代模型可以进一步直接用作巨大超高速的搜索目标。


Article 18

Title@2025-07-03 (4): Learning to Coordinate Bidders in Non-Truthful Auctions

Title: Learning to Coordinate Bidders in Non-Truthful Auctions Lernen, Bieter in nicht-wahrheitsgemäßen Auktionen zu koordinieren 学习如何在非真话拍卖中协调投标人 2507.02801v1

Authors (2): Hu Fu, Tao Lin

In non-truthful auctions such as first-price and all-pay auctions, the independent strategic behaviors of bidders, with the corresponding equilibrium notion – Bayes Nash equilibria – are notoriously difficult to characterize and can cause undesirable outcomes. An alternative approach to designing better auction systems is to coordinate the bidders: let a mediator make incentive-compatible recommendations of correlated bidding strategies to the bidders, namely, implementing a Bayes correlated equilibrium (BCE). The implementation of BCE, however, requires knowledge of the distribution of bidders’ private valuations, which is often unavailable. We initiate the study of the sample complexity of learning Bayes correlated equilibria in non-truthful auctions. We prove that the BCEs in a large class of non-truthful auctions, including first-price and all-pay auctions, can be learned with a polynomial number $\tilde O(\frac{n}{\varepsilon^2})$ of samples from the bidders’ value distributions. Our technique is a reduction to the problem of estimating bidders’ expected utility from samples, combined with an analysis of the pseudo-dimension of the class of all monotone bidding strategies of bidders.

在非真实的拍卖中,如第一价和全价拍卖,投标人的独立战略行为,加上相应的均衡概念 – – Bays Nash equilibria – – 很难定性,而且可能造成不可取的结果。设计更好的拍卖制度的另一种办法是协调投标人:让调解人就相关的投标战略向投标人提出与奖励兼容的建议,即实施一个贝耶斯关联的均衡(BCE)。然而,实施BCE需要了解投标人的私人估值的分配情况,而这种分配往往无法获得。我们开始研究在非真实拍卖中学习贝耶斯相关电子平衡的抽样复杂性。我们证明,在大规模非真实拍卖中,包括第一价格和全价拍卖中的BCE可以学习,从投标人所有价值分配的样本的多盘数 $\ tilde O (\ frac{nvarepslon%2} 中学习。我们的方法是减少从抽样中估算投标人的预期效用的问题,同时进行模拟投标人战略的单一分析。


Article 19

Title@2025-07-03 (4): Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance

Title: Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance Selbstlenkende, nicht-lineare, räumlich selektive Filter zur effizienten Extraktion von beweglichen Lautsprechern unter schwacher Anleitung 在薄弱指导下有效抽取流动发言者的自制深不滑空间选择过滤器 2507.02791v1

Authors (4): Jakob Kienegger, Alina Mannanova, Huajian Fang, Timo Gerkmann

Recent works on deep non-linear spatially selective filters demonstrate exceptional enhancement performance with computationally lightweight architectures for stationary speakers of known directions. However, to maintain this performance in dynamic scenarios, resource-intensive data-driven tracking algorithms become necessary to provide precise spatial guidance conditioned on the initial direction of a target speaker. As this additional computational overhead hinders application in resource-constrained scenarios such as real-time speech enhancement, we present a novel strategy utilizing a low-complexity tracking algorithm in the form of a particle filter instead. Assuming a causal, sequential processing style, we introduce temporal feedback to leverage the enhanced speech signal of the spatially selective filter to compensate for the limited modeling capabilities of the particle filter. Evaluation on a synthetic dataset illustrates how the autoregressive interplay between both algorithms drastically improves tracking accuracy and leads to strong enhancement performance. A listening test with real-world recordings complements these findings by indicating a clear trend towards our proposed self-steering pipeline as preferred choice over comparable methods.

最近关于深非线性空间选择性过滤器的工程显示,对于已知方向的固定发言者来说,计算式轻量级结构的改进效果非常出色。然而,为了在动态情景中保持这种表现,有必要采用资源密集型数据驱动跟踪算法,以目标发言者最初的方向为条件提供精确的空间指导。由于这一额外的计算间接成本阻碍了资源限制情景的应用,例如实时语音增强,我们提出了一个新颖的战略,以粒子过滤器的形式使用低复杂性跟踪算法。假设一种因果、顺序处理风格,我们引入时间反馈,利用空间选择性过滤器的强化语音信号,以弥补粒子过滤器有限的建模能力。对合成数据集的评估表明,两种算法之间的自动反向相互作用如何极大地改进了跟踪准确性并导致强大的增强性能。用真实世界记录进行监听测试补充了这些结果,表明我们提议的自调管道的明显趋势是选择比可比较的方法。


Article 20

Title@2025-07-03 (4): GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling

Title: GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling GPAS: Beschleunigung der Konvergenz des LLM-Vortrainings durch Gradient-Preserving Activation Scaling GPAS:通过 “ 渐进式保留动力扩增 “ 加速汇集LLM预备训练 2506.22049v2

Authors (15): Tianhao Chen, Xin Xu, Zijing Liu, Pengxiang Li, Xinyuan Song, Ajay Kumar Jaiswal, Fan Zhang, Jishan Hu, Yang Wang, Hao Chen, Shizhe Diao, Shiwei Liu, Yu Li, Lu Yin, Can Yang

Modern Large Language Models, such as the LLaMA, Qwen and DeepSeek series, predominantly adopt the Pre-LayerNorm (Pre-LN) Transformer architecture. While being stable during pretraining and scalable to large model sizes, Pre-LN suffers from an exponential growth in activation variance across layers, causing the shortcut to dominate over sub-layer outputs in the residual connection and limiting the learning capacity of deeper layers. To mitigate this issue, we propose Gradient-Preserving Activation Scaling (GPAS), a simple technique that can be used in combination with existing approaches. GPAS works by scaling down the intermediate activations while keeping their gradients unchanged. This leaves information in the activations intact, and avoids the gradient vanishing problem associated with gradient downscaling. Extensive experiments across various model sizes from 71M to 1B show that GPAS achieves consistent performance gains. Beyond enhancing Pre-LN Transformers, GPAS also shows promise in improving alternative architectures such as Sandwich-LN and DeepNorm, demonstrating its versatility and potential for improving training dynamics in a wide range of settings. Our code is available at https://github.com/dandingsky/GPAS.

LLAMA、Quen和DeepSeek等现代大型语言模型,主要采用LLAMA、Quen和DeepSeek系列,主要采用LayerNorm (Pre-LN) 变异器结构。在培训前稳定且可缩至大模型大小的同时,LN前会因不同层引发差异的指数性增长而受害,导致在剩余连接方面对次层输出的捷径占支配地位,并限制了更深层的学习能力。为了缓解这一问题,我们建议采用一种简单技术,即渐进式保留活化增强(GPAS),这一技术可以与现有方法相结合使用。GPASS通过缩小中间激活,同时保持其梯度不变。这使激活中的信息保持不变,避免了与梯度降幅缩小相关的梯度问题逐渐消失。从71M到1B的各种模型规模的广泛实验表明,GOS取得了一致的绩效收益。除了加强LN前变异器外,GOS还显示出改进其他结构如桑威奇-LN和Deep Norm(D)的希望改进其他结构,表明其多功能和潜力,在广泛的环境中改进培训动态。我们的代码可在 http://G。


Article 21

Title@2025-07-03 (4): Understanding and Improving Length Generalization in Recurrent Models

Title: Understanding and Improving Length Generalization in Recurrent Models Verständnis und Verbesserung der Längenverallgemeinerung in rekurrenten Modellen 经常模式中的了解和改进时间长度的概述 2507.02782v1

Authors (2): Ricardo Buitrago Ruiz, Albert Gu

Recently, recurrent models such as state space models and linear attention have become popular due to their linear complexity in the sequence length. Thanks to their recurrent nature, in principle they can process arbitrarily long sequences, but their performance sometimes drops considerably beyond their training context lengths-i.e. they fail to length generalize. In this work, we provide comprehensive empirical and theoretical analysis to support the unexplored states hypothesis, which posits that models fail to length generalize when during training they are only exposed to a limited subset of the distribution of all attainable states (i.e. states that would be attained if the recurrence was applied to long sequences). Furthermore, we investigate simple training interventions that aim to increase the coverage of the states that the model is trained on, e.g. by initializing the state with Gaussian noise or with the final state of a different input sequence. With only 500 post-training steps ($\sim 0.1\%$ of the pre-training budget), these interventions enable length generalization for sequences that are orders of magnitude longer than the training context (e.g. $2k\longrightarrow 128k$) and show improved performance in long context tasks, thus presenting a simple and efficient way to enable robust length generalization in general recurrent models.

最近,国家空间模型和线性关注等经常性模型因其在序列长度上的线性复杂性而变得流行。由于其经常性的性质,它们原则上可以任意处理长序列,但其性能有时会大大超过其培训范围的长度,也就是说,它们没有做到一般化。在这项工作中,我们提供全面的经验和理论分析,以支持未探索的状态假设,这种假设假设认为,模型在培训期间仅仅接触到所有可达到的状态分布的有限子集(即如果将重犯应用到长序列中将达到的状态)时,它们没有超时化。 此外,我们调查简单的培训干预措施,目的是扩大该模式所培训的状态的覆盖范围,例如,以高斯噪音或不同输入序列的最后状态为初始化状态。由于只有500个培训后步骤(培训前预算的0.1美元),这些干预能够使比培训范围更长的顺序(例如,2krightrow 128k$)能够实现超时的长度。此外,我们调查了简单的培训干预措施,从而在一般情况下展示了更稳健的经常性模式。


Article 22

Title@2025-07-03 (4): Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs

Title: Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs Selbstkorrektionsbank: Enthüllung und Adressierung des Selbstkorrektions-Blindflecks in LLMs 自我校正法官:在LLMs中披露和处理自我校正的盲人点 2507.02778v1

Authors (1): Ken Tsui

Although large language models (LLMs) have become transformative, they still make mistakes and can explore unproductive reasoning paths. Self-correction is an important capability for a trustworthy LLM, particularly an autoregressive LLM. While LLMs can identify error in user input, they exhibit a systematic ‘Self-Correction Blind Spot’ - failing to correct identical error in their own outputs. To systematically study this phenomenon, we introduce Self-Correction Bench, a systematic framework to measure this phenomenon through controlled error injection at three complexity levels. Testing 14 models, we find an average 64.5% blind spot rate. We find multiple evidences that this limitation relates to training data composition: human training demonstrations predominantly show error-free responses rather than error-correction sequences, unlike RL-trained models that learn error correction through outcome feedback. Remarkably, simply appending “Wait” reduces blind spots by 89.3%, suggesting that the capability exists but requires activation. Our work highlights a critical limitation in current LLMs and offers potential avenues for improving their reliability and trustworthiness.

尽管大型语言模型(LLMS)已经转型,但它们仍然会犯错误,并且可以探索非生产性的推理路径。自我校正是值得信赖的LLM(LLM)的重要能力,特别是自动递减的LLM。虽然LLMS可以识别用户输入中的错误,但它们展示了系统化的“自我校正盲点 ” — — 无法纠正自己的产出中的相同错误。为了系统地研究这一现象,我们引入了自我校正法官,这是一个系统化的框架,通过在三个复杂层次的受控错误注入来测量这一现象。测试了14个模型,我们发现平均64.5%的盲点率。我们发现了许多证据,证明这一限制与数据构成有关:人类培训演示主要显示无误反应而不是错误校正序列,而不像受RL培训的模型那样通过结果反馈来学习错误校正。值得注意的是,仅仅附加“等待”就能将盲点减少89.3%,这表明存在这种能力,但需要激活。我们的工作突出了当前LMS的关键性限制,并且提供了提高可靠性和可信度的潜在途径。


Article 23

Title@2025-07-03 (4): LLM-Powered Prediction of Hyperglycemia and Discovery of Behavioral Treatment Pathways from Wearables and Diet

Title: LLM-Powered Prediction of Hyperglycemia and Discovery of Behavioral Treatment Pathways from Wearables and Diet LLM-Powered Vorhersage von Hyperglykämie und Entdeckung von Verhaltenstherapie Wege von Wearables und Diät LLM 法力预测超血糖和从穿衣和饮食中发现的行为处理途径 2503.03935v2

Authors (7): Abdullah Mamun, Asiful Arefeen, Susan B. Racette, Dorothy D. Sears, Corrie M. Whisner, Matthew P. Buman, Hassan Ghasemzadeh

Postprandial hyperglycemia, marked by the blood glucose level exceeding the normal range after consuming a meal, is a critical indicator of progression toward type 2 diabetes in people with prediabetes and in healthy individuals. A key metric for understanding blood glucose dynamics after eating is the postprandial area under the curve (AUC). Predicting postprandial AUC in advance based on a person’s lifestyle factors, such as diet and physical activity level, and explaining the factors that affect postprandial blood glucose could allow an individual to adjust their lifestyle accordingly to maintain normal glucose levels. In this study, we developed an explainable machine learning solution, GlucoLens, that takes sensor-driven inputs and uses advanced data processing, large language models, and trainable machine learning models to predict postprandial AUC and hyperglycemia from diet, physical activity, and recent glucose patterns. We used data obtained from wearables in a five-week clinical trial of 10 adults who worked full-time to develop and evaluate the proposed computational model that integrates wearable sensing, multimodal data, and machine learning. Our machine learning model takes multimodal data from wearable activity and glucose monitoring sensors, along with food and work logs, and provides an interpretable prediction of the postprandial glucose pattern. Our GlucoLens system achieves a normalized root mean squared error (NRMSE) of 0.123 in its best configuration. On average, the proposed technology provides a 16% better performance level compared to the comparison models. Additionally, our technique predicts hyperglycemia with an accuracy of 73.3% and an F1 score of 0.716 and recommends different treatment options to help avoid hyperglycemia through diverse counterfactual explanations. Code available: https://github.com/ab9mamun/GlucoLens.

以食用一顿饭后血糖水平超过正常范围为标志的血压16后高血糖高血糖,是显示有糖尿病的人和健康人中2型糖尿病发展到2型的关键指标。饮食后了解血糖动态的关键衡量标准是曲线下的血浆后地区(AUC)。根据一个人的生活方式因素,如饮食和体育活动水平,提前预测AUC之后的血浆后高血糖;解释影响过去血糖正常水平的因素:73种血糖低血糖可以使一个人相应地调整自己的生活方式,以保持正常的血糖水平。在这个研究中,我们开发了一个可以解释的机器学习解决方案,GlucoLens, 使用传感器驱动的投入,使用先进的数据处理、大语言模型和可训练的机器学习模型,以预测饮食后期、物理活动和最近的葡萄糖糖糖模式。我们利用了从磨损数据得出的数据,10个全职成年人5周的血糖指数临床试验,开发并评价了拟议的计算模型,将模型和模型纳入我们可磨损的Glusmodal 数据, 数据和Moral IML 数据,通过我们可以学习的机算数据,通过一个可以学习的模型, 数据和系统提供一种可磨损数据,通过一种可磨损数据,通过一个可磨损的数据和机机算数据,通过一个可以提供一种可以学习的系统,通过一个可以进行一种可磨损的数据和感测算数据。


Article 24

Title@2025-07-03 (4): KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs

Title: KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs KERAP: Ein wissensbasierter Ansatz für genaue Null-Shot-Diagnose-Vorhersage mit Multi-Agent LLMs KERRAP: 利用多种试剂LLMs进行准确零光诊断预测的知识强化理由说明方法 2507.02773v1

Authors (8): Yuzhang Xie, Hejie Cui, Ziyang Zhang, Jiaying Lu, Kai Shu, Fadi Nahab, Xiao Hu, Carl Yang

Medical diagnosis prediction plays a critical role in disease detection and personalized healthcare. While machine learning (ML) models have been widely adopted for this task, their reliance on supervised training limits their ability to generalize to unseen cases, particularly given the high cost of acquiring large, labeled datasets. Large language models (LLMs) have shown promise in leveraging language abilities and biomedical knowledge for diagnosis prediction. However, they often suffer from hallucinations, lack structured medical reasoning, and produce useless outputs. To address these challenges, we propose KERAP, a knowledge graph (KG)-enhanced reasoning approach that improves LLM-based diagnosis prediction through a multi-agent architecture. Our framework consists of a linkage agent for attribute mapping, a retrieval agent for structured knowledge extraction, and a prediction agent that iteratively refines diagnosis predictions. Experimental results demonstrate that KERAP enhances diagnostic reliability efficiently, offering a scalable and interpretable solution for zero-shot medical diagnosis prediction.

医学诊断预测在疾病检测和个性化保健方面发挥着关键作用。虽然对机器学习模式(ML)在这项工作中被广泛采用,但它们对监督培训的依赖限制了其推广到隐形病例的能力,特别是鉴于获取有标签的大型数据集的成本很高。大型语言模型(LLMs)在利用语言能力和生物医学知识进行诊断预测方面显示了希望。然而,这些模型往往存在幻觉,缺乏结构化的医疗推理,并产生无用的产物。为了应对这些挑战,我们建议采用知识图表(KG)强化的推理方法,通过多剂结构改进基于LLM的诊断预测。我们的框架包括属性绘图的连接剂、结构知识提取的检索剂以及反复完善诊断预测的预测剂。实验结果表明,KERAP提高了诊断可靠性,为零速诊断诊断预测提供了可扩展和可解释的解决方案。


Article 25

Title@2025-07-03 (4): Grounding Intelligence in Movement

Title: Grounding Intelligence in Movement Erdungsintelligenz in der Bewegung 移动中的地面情报 2507.02771v1

Authors (6): Melanie Segado, Felipe Parodi, Jordan K. Matelsky, Michael L. Platt, Eva B. Dyer, Konrad P. Kording

Recent advances in machine learning have dramatically improved our ability to model language, vision, and other high-dimensional data, yet they continue to struggle with one of the most fundamental aspects of biological systems: movement. Across neuroscience, medicine, robotics, and ethology, movement is essential for interpreting behavior, predicting intent, and enabling interaction. Despite its core significance in our intelligence, movement is often treated as an afterthought rather than as a rich and structured modality in its own right. This reflects a deeper fragmentation in how movement data is collected and modeled, often constrained by task-specific goals and domain-specific assumptions. But movement is not domain-bound. It reflects shared physical constraints, conserved morphological structures, and purposeful dynamics that cut across species and settings. We argue that movement should be treated as a primary modeling target for AI. It is inherently structured and grounded in embodiment and physics. This structure, often allowing for compact, lower-dimensional representations (e.g., pose), makes it more interpretable and computationally tractable to model than raw, high-dimensional sensory inputs. Developing models that can learn from and generalize across diverse movement data will not only advance core capabilities in generative modeling and control, but also create a shared foundation for understanding behavior across biological and artificial systems. Movement is not just an outcome, it is a window into how intelligent systems engage with the world.

机器学习的最近进步极大地提高了我们模拟语言、视觉和其他高维数据的能力,但是它们仍然在与生物系统最基本的一个方面进行斗争:运动。在神经科学、医学、机器人和神学方面,运动对于解释行为、预测意图和促成互动至关重要。尽管运动在我们的智力中具有核心意义,但它往往被视为事后思考而不是丰富和结构化的方式。这反映出在如何收集和模拟移动数据方面更加支离破碎,往往受到任务特定目标和领域特定假设的限制。但是移动不是受域限制的。它反映了共同的物理限制、受保护的形态结构以及跨越物种和环境的有目的的动态。我们认为,运动应该作为AI的主要模型目标对待。它具有内在的结构和基础。这种结构往往允许简单、低维度的表述(例如,布局),使得它比原始、高维度的感官投入更容易解释和计算为模型。开发能够从不同运动中学习的模型和一般化的形态结构以及贯穿物种和各种环境的有目的的动态动态。我们说,移动应该被当作一个基本的模型和先导力系统。它是如何创造出一个不光学的模型和先进制。


Article 26

Title@2025-07-03 (4): Avoiding Catastrophe in Online Learning by Asking for Help

Title: Avoiding Catastrophe in Online Learning by Asking for Help Vermeiden von Katastrophen im Online-Lernen, indem Sie um Hilfe bitten 通过请求帮助避免在线学习中的灾难 2402.08062v5

Authors (3): Benjamin Plaut, Hanlin Zhu, Stuart Russell

Most learning algorithms with formal regret guarantees assume that all mistakes are recoverable and essentially rely on trying all possible behaviors. This approach is problematic when some mistakes are “catastrophic”, i.e., irreparable. We propose an online learning problem where the goal is to minimize the chance of catastrophe. Specifically, we assume that the payoff in each round represents the chance of avoiding catastrophe in that round and try to maximize the product of payoffs (the overall chance of avoiding catastrophe) while allowing a limited number of queries to a mentor. We also assume that the agent can transfer knowledge between similar inputs. We first show that in general, any algorithm either queries the mentor at a linear rate or is nearly guaranteed to cause catastrophe. However, in settings where the mentor policy class is learnable in the standard online model, we provide an algorithm whose regret and rate of querying the mentor both approach 0 as the time horizon grows. Although our focus is the product of payoffs, we provide matching bounds for the typical additive regret. Conceptually, if a policy class is learnable in the absence of catastrophic risk, it is learnable in the presence of catastrophic risk if the agent can ask for help.

多数具有正式遗憾保证的学习算法都假定所有错误都是可以收回的,并且基本上依赖尝试所有可能的行为。 当有些错误是“灾难性的”,也就是不可弥补的,这个方法就有问题。 我们提出一个在线学习问题,目标是最大限度地减少灾难的可能性。 具体地说, 我们假设每一回合的回报是避免这一回合的灾难的机会, 并尽量扩大补偿的产物( 避免灾难的总机会), 同时允许向导师提出数量有限的询问。 我们还假设代理商可以在类似投入之间转让知识。 我们首先显示, 一般来说, 任何算法要么以线性速度询问导师, 要么几乎可以保证造成灾难。 但是, 在导师政策课可以在标准在线模型中学习的情况下, 我们提供一种算法, 其遗憾和速度是随着时间范围的增长, 向导师们询问0. 两种方法。 尽管我们的重点是回报的产物, 我们为典型的累加遗憾提供了相应的界限。 从概念上看, 如果在没有灾难性风险的情况下, 政策阶级可以学习的话, 在灾难性风险的存在中是可以学习的。


Article 27

Title@2025-07-03 (4): MaizeField3D: A Curated 3D Point Cloud and Procedural Model Dataset of Field-Grown Maize from a Diversity Panel

Title: MaizeField3D: A Curated 3D Point Cloud and Procedural Model Dataset of Field-Grown Maize from a Diversity Panel MaizeField3D: Ein kuratierter 3D-Punkt-Cloud- und Verfahrensmodell-Datensatz von Feld-Grown Maize aus einem Diversity-Panel Maize Fire3D:来自多样性小组的3D点云和实地增长磁场程序模型数据集 2503.07813v3

Authors (9): Elvis Kimara, Mozhgan Hadadi, Jackson Godbersen, Aditya Balu, Talukder Jubery, Yawei Li, Adarsh Krishnamurthy, Patrick S. Schnable, Baskar Ganapathysubramanian

The development of artificial intelligence (AI) and machine learning (ML) based tools for 3D phenotyping, especially for maize, has been limited due to the lack of large and diverse 3D datasets. 2D image datasets fail to capture essential structural details such as leaf architecture, plant volume, and spatial arrangements that 3D data provide. To address this limitation, we present MaizeField3D (https://baskargroup.github.io/MaizeField3D/), a curated dataset of 3D point clouds of field-grown maize plants from a diverse genetic panel, designed to be AI-ready for advancing agricultural research. Our dataset includes 1,045 high-quality point clouds of field-grown maize collected using a terrestrial laser scanner (TLS). Point clouds of 520 plants from this dataset were segmented and annotated using a graph-based segmentation method to isolate individual leaves and stalks, ensuring consistent labeling across all samples. This labeled data was then used for fitting procedural models that provide a structured parametric representation of the maize plants. The leaves of the maize plants in the procedural models are represented using Non-Uniform Rational B-Spline (NURBS) surfaces that were generated using a two-step optimization process combining gradient-free and gradient-based methods. We conducted rigorous manual quality control on all datasets, correcting errors in segmentation, ensuring accurate leaf ordering, and validating metadata annotations. The dataset also includes metadata detailing plant morphology and quality, alongside multi-resolution subsampled point cloud data (100k, 50k, 10k points), which can be readily used for different downstream computational tasks. MaizeField3D will serve as a comprehensive foundational dataset for AI-driven phenotyping, plant structural analysis, and 3D applications in agricultural research.

人造智能(AI)和基于机器学习(ML)工具的开发有限,原因是缺乏大型和多样化的3D数据集。 2D图像数据集未能捕捉3D数据所提供的基本结构细节,如叶叶结构、植物体积和空间安排。为了应对这一限制,我们介绍了Maize Field3D(https://baskargroup.github.io/Maize Field3D/),这是一套由不同基因板组成的3D点元元元的云,来自外地的玉米植物,用于推进农业研究。我们的数据集包括1 045个使用地面激光扫描仪(TLS)收集的地产玉米点的高质量云。我们用基于图表的分解法将520个植物云分解个体叶和木质,确保对所有样品进行一致的标签。这个标签数据随后用于完善程序模型,为玉米厂提供结构化的解析性分数。在程序模型中,玉米厂的剩余部分也用于使用Sildiral-deal StrealS-deal Streal mailal 基础, 和Sildal-deal-deal-deal-deal Studal 基础,我们使用了Sildal-deal-deal-deal-deal-deal-deal-maildal-maild-deal-deal-maildal disal disal 。


Article 28

Title@2025-07-03 (4): Contextual Online Pricing with (Biased) Offline Data

Title: Contextual Online Pricing with (Biased) Offline Data Kontextuelle Online-Preise mit (Biased) Offline-Daten 在线在线在线在线上上上上下线数据(比亚) 2507.02762v1

Authors (3): Yixuan Zhang, Ruihao Zhu, Qiaomin Xie

We study contextual online pricing with biased offline data. For the scalar price elasticity case, we identify the instance-dependent quantity $\delta^2$ that measures how far the offline data lies from the (unknown) online optimum. We show that the time length $T$, bias bound $V$, size $N$ and dispersion $\lambda_{\min}(\hat{\Sigma})$ of the offline data, and $\delta^2$ jointly determine the statistical complexity. An Optimism-in-the-Face-of-Uncertainty (OFU) policy achieves a minimax-optimal, instance-dependent regret bound $\tilde{\mathcal{O}}\big(d\sqrt{T} \wedge (V^2T + \frac{dT}{\lambda_{\min}(\hat{\Sigma}) + (N \wedge T) \delta^2})\big)$. For general price elasticity, we establish a worst-case, minimax-optimal rate $\tilde{\mathcal{O}}\big(d\sqrt{T} \wedge (V^2T + \frac{dT }{\lambda_{\min}(\hat{\Sigma})})\big)$ and provide a generalized OFU algorithm that attains it. When the bias bound $V$ is unknown, we design a robust variant that always guarantees sub-linear regret and strictly improves on purely online methods whenever the exact bias is small. These results deliver the first tight regret guarantees for contextual pricing in the presence of biased offline data. Our techniques also transfer verbatim to stochastic linear bandits with biased offline data, yielding analogous bounds.

我们用偏差的离线数据来研究背景在线定价。 对于卡路里价格弹性案例, 我们确定以实例为基础的数量 $\ delta=2$, 用来衡量离线数据从( 未知的) 在线最佳数据离线的距离。 我们显示时间长度 $T, 偏差约束 $V$, 大小和分散 $lambdamin} (\ hat=Sigma} 离线数据, $delta=2$ 共同决定统计复杂性。 最乐观的直线价格( OFU) 政策实现了微缩- 最佳的、 依例的遗憾绑 $\\ mathcal{ Obig( d\qrt} \ wwedge (V2T+\ frafc{dT} 的大小) , 直径直径直的數值( 直径直數值) 和直径直數值的數值 。


Article 29

Title@2025-07-03 (4): Quantifying the Cross-sectoral Intersecting Discrepancies within Multiple Groups Using Latent Class Analysis Towards Fairness

Title: Quantifying the Cross-sectoral Intersecting Discrepancies within Multiple Groups Using Latent Class Analysis Towards Fairness Quantifizierung der sektorübergreifenden Intersecting-Diskrepanzen innerhalb mehrerer Gruppen mittels Latent Class-Analyse zur Fairness 量化多个群体内部的跨部门交叉差异,利用编内分类分析进行量化,以实现公平 2407.03133v4

Authors (5): Yingfang Yuan, Kefan Chen, Mehdi Rizvi, Lynne Baillie, Wei Pang

The growing interest in fair AI development is evident. The ‘‘Leave No One Behind’’ initiative urges us to address multiple and intersecting forms of inequality in accessing services, resources, and opportunities, emphasising the significance of fairness in AI. This is particularly relevant as an increasing number of AI tools are applied to decision-making processes, such as resource allocation and service scheme development, across various sectors such as health, energy, and housing. Therefore, exploring joint inequalities in these sectors is significant and valuable for thoroughly understanding overall inequality and unfairness. This research introduces an innovative approach to quantify cross-sectoral intersecting discrepancies among user-defined groups using latent class analysis. These discrepancies can be used to approximate inequality and provide valuable insights to fairness issues. We validate our approach using both proprietary and public datasets, including both EVENS and Census 2021 (England & Wales) datasets, to examine cross-sectoral intersecting discrepancies among different ethnic groups. We also verify the reliability of the quantified discrepancy by conducting a correlation analysis with a government public metric. Our findings reveal significant discrepancies both among minority ethnic groups and between minority ethnic groups and non-minority ethnic groups, emphasising the need for targeted interventions in policy-making processes. Furthermore, we demonstrate how the proposed approach can provide valuable insights into ensuring fairness in machine learning systems.

显然,对公平的大赦国际发展的兴趣日益浓厚。““在获得服务、资源和机会方面“一个一个都不能少”的倡议敦促我们解决在获得服务、资源和机会方面的多重和交叉的不平等形式,强调在大赦国际中公平的重要性。这特别相关,因为越来越多的AI工具应用于决策进程,例如保健、能源和住房等各个部门的资源分配和服务计划制定。因此,探讨这些部门之间的共同不平等对于透彻了解总体不平等和不公平情况意义重大和宝贵。这一研究提出了一种创新办法,用潜在的阶级分析来量化用户定义群体之间的跨部门交叉差异。这些差异可用于估计不平等,并为公平问题提供宝贵的见解。我们确认我们使用专有和公共数据集的方法,包括EVES和2021年人口普查(英格兰和威尔士)数据集,以审查不同族裔群体之间的跨部门交叉差异。我们还通过与政府的公用指标进行相关分析,核实量化差异的可靠性。我们的调查结果显示,少数群体群体之间以及少数群体族裔群体和非少数群体之间的重大差异,可以用来估计不平等问题,并为公平问题提供宝贵的见解。我们确认,在制订政策时,如何以有针对性地强调我们如何进行有针对性地学习。


Article 30

Title@2025-07-03 (4): Fast and Simplex: 2-Simplicial Attention in Triton

Title: Fast and Simplex: 2-Simplicial Attention in Triton Schnell und Simplex: 2-Simplizielle Aufmerksamkeit in Triton 快速和简单: Tritton 中的 2 简化注意 2507.02754v1

Authors (8): Aurko Roy, Timothy Chou, Sai Surya Duvvuri, Sijia Chen, Jiecao Yu, Xiaodong Wang, Manzil Zaheer, Rohan Anil

Recent work has shown that training loss scales as a power law with both model size and the number of tokens, and that achieving compute-optimal models requires scaling model size and token count together. However, these scaling laws assume an infinite supply of data and apply primarily in compute-bound settings. As modern large language models increasingly rely on massive internet-scale datasets, the assumption that they are compute-bound is becoming less valid. This shift highlights the need for architectures that prioritize token efficiency. In this work, we investigate the use of the 2-simplicial Transformer, an architecture that generalizes standard dot-product attention to trilinear functions through an efficient Triton kernel implementation. We demonstrate that the 2-simplicial Transformer achieves better token efficiency than standard Transformers: for a fixed token budget, similarly sized models outperform their dot-product counterparts on tasks involving mathematics, coding, reasoning, and logic. We quantify these gains by demonstrating that $2$-simplicial attention changes the exponent in the scaling laws for knowledge and reasoning tasks compared to dot product attention.

最近的工作表明,培训损失尺度作为具有模型大小和象征数的动力法,而实现计算最佳模型需要缩放模型大小和象征性计算。然而,这些缩放法假定了无限的数据供应,并主要适用于计算范围内的设置。随着现代大型语言模型日益依赖大规模互联网规模的数据集,它们被计算成可计量数据集的假设也越来越不那么有效。这一转变凸显出需要优先使用象征效率的架构。在这项工作中,我们调查了2个简易变换器的使用情况,该变换器通过高效的Triton内核执行,将标准点产品注意力普遍化为三线函数的架构。我们证明,2个简化变换器比标准变换器的代号效率更高:对于固定的象征性预算,其类似大小的模型在涉及数学、编码、推理和逻辑的任务上超越了其副产品对应方。我们通过证明,2美元的微调注意改变了知识和推理学任务的规模法律比点产品注意。


Article 31

Title@2025-07-03 (4): Towards a Novel Measure of User Trust in XAI Systems

Title: Towards a Novel Measure of User Trust in XAI Systems Auf dem Weg zu einem neuartigen Maß für das Vertrauen der Nutzer in XAI-Systeme 争取对XAI系统用户信任的新措施 2405.05766v2

Authors (7): Miquel Miró-Nicolau, Gabriel Moyà-Alcover, Antoni Jaume-i-Capó, Manuel González-Hidalgo, Adel Ghazel, Maria Gemma Sempere Campello, Juan Antonio Palmer Sancho

The increasing reliance on Deep Learning models, combined with their inherent lack of transparency, has spurred the development of a novel field of study known as eXplainable AI (XAI) methods. These methods seek to enhance the trust of end-users in automated systems by providing insights into the rationale behind their decisions. This paper presents a novel trust measure in XAI systems, allowing their refinement. Our proposed metric combines both performance metrics and trust indicators from an objective perspective. To validate this novel methodology, we conducted three case studies showing an improvement respect the state-of-the-art, with an increased sensitiviy to different scenarios.

日益依赖深学习模式,加上它们固有的透明度不足,促使开发了一个新的研究领域,称为可氧化的AI(XAI)方法,这些方法力求通过提供对其决定理由的深入了解,增强最终用户对自动化系统的信任。本文件介绍了对XAI系统的一种新的信任度量,允许其完善。我们提议的衡量标准从客观角度将业绩衡量标准和信任指标结合起来。为了验证这一新方法,我们进行了三个案例研究,显示对最新技术的改进,增加了对不同情景的敏感度。


Article 32

Title@2025-07-03 (4): Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics

Title: Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics Lineare Aufmerksamkeit mit globalem Kontext: Ein mehrpoliger Aufmerksamkeitsmechanismus für Vision und Physik 全球一线关注:视觉和物理多极关注机制 2507.02748v1

Authors (4): Alex Colagrande, Paul Caillon, Eva Feillet, Alexandre Allauzen

Transformers have become the de facto standard for a wide range of tasks, from image classification to physics simulations. Despite their impressive performance, the quadratic complexity of standard Transformers in both memory and time with respect to the input length makes them impractical for processing high-resolution inputs. Therefore, several variants have been proposed, the most successful relying on patchification, downsampling, or coarsening techniques, often at the cost of losing the finest-scale details. In this work, we take a different approach. Inspired by state-of-the-art techniques in $n$-body numerical simulations, we cast attention as an interaction problem between grid points. We introduce the Multipole Attention Neural Operator (MANO), which computes attention in a distance-based multiscale fashion. MANO maintains, in each attention head, a global receptive field and achieves linear time and memory complexity with respect to the number of grid points. Empirical results on image classification and Darcy flows demonstrate that MANO rivals state-of-the-art models such as ViT and Swin Transformer, while reducing runtime and peak memory usage by orders of magnitude. We open source our code for reproducibility at https://github.com/AlexColagrande/MANO.

在从图像分类到物理模拟等一系列任务中,变异器已成为从图像分类到物理模拟等广泛任务的实际标准。尽管其表现令人印象深刻,标准变异器在记忆和时间上的四重复杂度使得它们在处理高分辨率投入时不切实际。因此,提出了若干变式,其中最成功地依赖补装、下取样或粗化技术,往往以失去最精细为代价。在这项工作中,我们采取了不同的做法。在美元体格模拟中最先进的技术的启发下,我们把注意力作为电网点之间的交互问题。我们引入多极注意神经操作员(MANO),该操作员以远程、多尺度的方式计算注意力。在每一个注意上都保持一个全球可容纳的字段,并在网点数方面实现直线时间和记忆的复杂性。关于图像分类和达西流的实证结果显示,MANO公司与Swin变换器等最先进模型相对,同时减少运行时间和高峰记忆用量/MANSLA/MANSLA 的开放源代码。


Article 33

Title@2025-07-03 (4): Early Signs of Steganographic Capabilities in Frontier LLMs

Title: Early Signs of Steganographic Capabilities in Frontier LLMs Frühe Anzeichen von Steganographischen Fähigkeiten in Frontier LLMs 边疆长长体动物能力早期信号 2507.02737v1

Authors (5): Artur Zolkowski, Kei Nishimura-Gasparian, Robert McCarthy, Roland S. Zimmermann, David Lindner

Monitoring Large Language Model (LLM) outputs is crucial for mitigating risks from misuse and misalignment. However, LLMs could evade monitoring through steganography: Encoding hidden information within seemingly benign generations. In this paper, we evaluate the steganography capabilities in frontier LLMs to better understand the risk they pose. We focus on two types of steganography: passing encoded messages and performing encoded reasoning. We find that current models are unable to encode short messages in their outputs without a monitor noticing under standard affordances. They can succeed, however, if given additional affordances such as using an unmonitored scratchpad and coordinating on what encoding scheme to use. We additionally find early signs that models can perform basic encoded reasoning in a simple state-tracking problem. This includes some ability to reason with their own and pre-defined schemes, including encoding schemes such as Hexadecimal. Despite this, they can rarely hide reasoning subtly within a cover task to fool a monitor. Overall, our results indicate that current LLMs exhibit nascent steganographic capabilities. While these capabilities are likely insufficient to bypass well-designed monitors at present, this could change in the future.

监测大语言模型(LLM) 输出对于减少误用和错配的风险至关重要。 但是, LLMs 可以通过星座学逃避监测: 将隐蔽的信息编码在看似友好的几代人中。 在本文中, 我们评估了边界LLMs 中的隐蔽信息能力, 以更好地了解它们构成的风险。 我们关注两种类型的星座学: 传递编码信息, 并进行编码推理。 我们发现, 目前的模型无法在它们的输出中编码短信息, 而没有在标准价格下进行显示。 但是, 如果给它们额外费用, 如使用未监测的抓取和协调编码方法, 它们可以成功。 我们还发现一些早期迹象, 显示模型可以在简单的状态跟踪问题中进行基本的编码推理。 这包括使用自己和预设的编码方法, 包括 HexaDecimal 等编码方法的某些能力。 尽管如此, 它们很难在掩盖任务中隐含精细的推理。 总的来说, 我们的结果表明, 目前的LLMS 显示, 将展示新生的视觉能力显示, 而这些能力可能不足以绕过。


Article 34

Title@2025-07-03 (4): Classification by Separating Hypersurfaces: An Entropic Approach

Title: Classification by Separating Hypersurfaces: An Entropic Approach Klassifizierung durch Trennen von Hyperoberflächen: Ein entropischer Ansatz 按分离超表层分类: 2507.02732v1

Authors (3): Argimiro Arratia, Mahmoud El Daou, Henryk Gzyl

We consider the following classification problem: Given a population of individuals characterized by a set of attributes represented as a vector in ${\mathbb R}^N$, the goal is to find a hyperplane in ${\mathbb R}^N$ that separates two sets of points corresponding to two distinct classes. This problem, with a history dating back to the perceptron model, remains central to machine learning. In this paper we propose a novel approach by searching for a vector of parameters in a bounded $N$-dimensional hypercube centered at the origin and a positive vector in ${\mathbb R}^M$, obtained through the minimization of an entropy-based function defined over the space of unknown variables. The method extends to polynomial surfaces, allowing the separation of data points by more complex decision boundaries. This provides a robust alternative to traditional linear or quadratic optimization techniques, such as support vector machines and gradient descent. Numerical experiments demonstrate the efficiency and versatility of the method in handling diverse classification tasks, including linear and non-linear separability.

我们考虑了以下分类问题:鉴于以美元为单位的一组属性为矢量的人群,我们考虑分类问题:考虑到以美元为单位的一组个人特征,目标是找到以美元为单位的高空飞机,将两组与两个不同类别相对应的点分离出来。这个问题的历史可追溯到天启模型,仍然是机器学习的核心。在本文件中,我们提出一种新的方法,在以源点为单位、正矢量为单位、以美元为单位、以正值为单位、以千兆瓦为单位的参数中寻找一个矢量为单位、以美元为单位的矢量为单位、以千兆元为单位、以千兆元为单位、以千兆元为单位、以千兆元为单位、以千兆元为单位、以未知变量空间为单位、以千兆元为单位的函数为单位的超高空格。该方法延伸至多面,允许通过更复杂的决定界限将数据点分离。这为传统的线性或二次优化技术提供了强有力的替代方法,如支持矢量传机和梯基系。数字实验显示了处理不同分类任务的方法的效率和多功能,包括线性和非线性分离性。


Article 35

Title@2025-07-03 (4): Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving

Title: Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving Bourbaki: Selbsterzeugte und zielkonditionierte MDPs für Theorem-Proving Bourbaki:为理论证明提供自创和有目标、有条件的MDP 2507.02726v1

Authors (6): Matthieu Zimmer, Xiaotong Ji, Rasul Tutunov, Anthony Bordg, Jun Wang, Haitham Bou Ammar

Reasoning remains a challenging task for large language models (LLMs), especially within the logically constrained environment of automated theorem proving (ATP), due to sparse rewards and the vast scale of proofs. These challenges are amplified in benchmarks like PutnamBench, which contains university-level problems requiring complex, multi-step reasoning. To address this, we introduce self-generated goal-conditioned MDPs (sG-MDPs), a new framework in which agents generate and pursue their subgoals based on the evolving proof state. Given this more structured generation of goals, the resulting problem becomes more amenable to search. We then apply Monte Carlo Tree Search (MCTS)-like algorithms to solve the sG-MDP, instantiating our approach in Bourbaki (7B), a modular system that can ensemble multiple 7B LLMs for subgoal generation and tactic synthesis. On PutnamBench, Bourbaki (7B) solves 26 problems, achieving new state-of-the-art results with models at this scale.

对于大型语言模型(LLMS),特别是在自动理论验证(ATP)的逻辑制约环境中,由于奖赏稀少和证据规模庞大,原因仍然是一项艰巨的任务。这些挑战在PutnamBench(PutnamBench)等基准中被放大,因为PutnamBench(PutnanBench)包含大学层面的问题,需要复杂、多步推理。为了解决这个问题,我们引入了自我生成的、有目标条件的MDP(sG-MDPs)(sG-MDPs)(sG-MDPs)(s-MDPs-MDPs),这是一个新的框架,根据不断演变的证明状态产生和追求其次级目标。鉴于这种结构化的生成,由此产生的问题就更容易被搜索。我们随后运用类似于MTS(MTS)的计算法来解决SG-MDP(MDP),在布尔巴基(7B)这一模块系统中可以包含多个7BLMs(s)子目标生成和策略合成。在Putnamnamben bech, Bouraki(7B)中, Bourbaki(7B)解决了26个问题,在这个规模模型中实现新状态的结果。


Article 36

Title@2025-07-03 (4): Hierarchical Multi-Label Contrastive Learning for Protein-Protein Interaction Prediction Across Organisms

Title: Hierarchical Multi-Label Contrastive Learning for Protein-Protein Interaction Prediction Across Organisms Hierarchisches Multi-Label Kontrastives Lernen für Protein-Protein-Interaktionsvorhersage über Organismen hinweg 高等级多标签差异学习,促进各种组织间蛋白因-蛋白因-蛋白质-蛋白因互动预测 2507.02724v1

Authors (5): Shiyi Liu, Buwen Liang, Yuetong Fang, Zixuan Jiang, Renjing Xu

Recent advances in AI for science have highlighted the power of contrastive learning in bridging heterogeneous biological data modalities. Building on this paradigm, we propose HIPPO (HIerarchical Protein-Protein interaction prediction across Organisms), a hierarchical contrastive framework for protein-protein interaction(PPI) prediction, where protein sequences and their hierarchical attributes are aligned through multi-tiered biological representation matching. The proposed approach incorporates hierarchical contrastive loss functions that emulate the structured relationship among functional classes of proteins. The framework adaptively incorporates domain and family knowledge through a data-driven penalty mechanism, enforcing consistency between the learned embedding space and the intrinsic hierarchy of protein functions. Experiments on benchmark datasets demonstrate that HIPPO achieves state-of-the-art performance, outperforming existing methods and showing robustness in low-data regimes. Notably, the model demonstrates strong zero-shot transferability to other species without retraining, enabling reliable PPI prediction and functional inference even in less characterized or rare organisms where experimental data are limited. Further analysis reveals that hierarchical feature fusion is critical for capturing conserved interaction determinants, such as binding motifs and functional annotations. This work advances cross-species PPI prediction and provides a unified framework for interaction prediction in scenarios with sparse or imbalanced multi-species data.

AI在科学方面最近的进展突出表明了在弥合多种生物数据模式方面进行对比性学习的力量。基于这一范例,我们提议了HIPPO(跨生物体的Hierararical Protein-Protein互动预测),这是蛋白质-蛋白互动(PPI)预测的分级对比性框架,通过多层次的生物代表比对,蛋白序列及其等级特征相互一致。拟议方法包含与蛋白质功能类别结构关系相似的等级对比性损失功能。框架通过数据驱动的处罚机制适应性地将域和家庭知识纳入其中,加强所学的嵌入空间与蛋白功能功能功能分级之间的一致性。基准数据集实验表明,HIPPO实现了最新业绩,优于现有方法,显示了低数据体系的稳健性。值得注意的是,该模型显示,在没有再培训的情况下,对其他物种的零发转移能力很强,使得可靠的PPPI预测和功能性推导,甚至在实验数据有限的较不具有特征或稀有特征的有机生物中。进一步分析表明,等级特征融合对于捕捉到保存的互动决定因素至关重要,例如具有约束力的模型和功能性预测。


Article 37

Title@2025-07-03 (4): On Characterizations for Language Generation: Interplay of Hallucinations, Breadth, and Stability

Title: On Characterizations for Language Generation: Interplay of Hallucinations, Breadth, and Stability Über Charakterisierungen für die Sprachgenerierung: Interplay von Halluzinationen, Breadth und Stabilität 语言生成特征:幻觉、面包和稳定之间的相互作用 2412.18530v2

Authors (3): Alkis Kalavasis, Anay Mehrotra, Grigoris Velegkas

We study language generation in the limit - introduced by Kleinberg and Mullainathan [KM24] - building on classical works of Gold [Gol67] and Angluin [Ang79]. [KM24]’s main result is an algorithm for generating from any countable language collection in the limit. While their algorithm eventually generates unseen strings from the target language $K$, it sacrifices coverage or breadth, i.e., its ability to generate a rich set of strings. Recent work introduces different notions of breadth and explores when generation with breadth is possible, leaving a full characterization of these notions open. Our first set of results settles this by characterizing generation for existing notions of breadth and their natural extensions. Interestingly, our lower bounds are very flexible and hold for many performance metrics beyond breadth - for instance, showing that, in general, it is impossible to train generators which achieve a higher perplexity or lower hallucination rate for $K$ compared to other languages. Next, we study language generation with breadth and stable generators - algorithms that eventually stop changing after seeing an arbitrary but finite number of strings - and prove unconditional lower bounds for such generators, strengthening the results of [KMV25] and demonstrating that generation with many existing notions of breadth becomes equally hard, when stability is required. This gives a separation for generation with approximate breadth, between stable and unstable generators, highlighting the rich interplay between breadth, stability, and consistency in language generation.

克莱伯格和穆莱纳坦(KM24)在古典Gold[Gol67]和安格鲁因[Ang79]古典作品的基础上,在极限范围内,我们研究语言的生成,这是克莱伯格和穆莱纳坦[KM24]提出的。[KM24]的主要结果是从任何可计算的语言收集中产生一个算法。虽然他们的算法最终从目标语言中产生看不见的字符串,但是它牺牲了覆盖面或广度,即它能够产生一套丰富的字符串。最近的工作引入了不同宽度的概念,并在有可能产生宽度时探索这些概念,从而对这些概念的完整定性开放。我们的第一套结果通过将现有宽度概念及其自然延伸的生成特征化来解决这个问题。有趣的是,我们的下限非常灵活,而且对超出广度的许多性语言的生成量值都持有。 举例说,一般来说,不可能对发电机进行更难理解或更低的错觉觉悟率。 其次,我们用宽度和更稳定的语言生成方法最终在看到任意但有限的字符串数之后停止改变。 并且证明,我们较低的广度的频度是,这种稳定的生成的深度的频率和深度的生成的深度使得稳定的生成更加稳定,这种稳定的生成变得更加稳定。


Article 38

Title@2025-07-03 (4): Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification

Title: Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification Gradient-Based Model Fingerprinting für LLM Ähnlichkeitserkennung und Familienklassifizierung LLM相似性探测和家庭分类的渐进式样指纹 2506.01631v2

Authors (3): Zehao Wu, Yanjie Zhao, Haoyu Wang

As Large Language Models (LLMs) become integral software components in modern applications, unauthorized model derivations through fine-tuning, merging, and redistribution have emerged as critical software engineering challenges. Unlike traditional software where clone detection and license compliance are well-established, the LLM ecosystem lacks effective mechanisms to detect model lineage and enforce licensing agreements. This gap is particularly problematic when open-source model creators, such as Meta’s LLaMA, require derivative works to maintain naming conventions for attribution, yet no technical means exist to verify compliance. To fill this gap, treating LLMs as software artifacts requiring provenance tracking, we present TensorGuard, a gradient-based fingerprinting framework for LLM similarity detection and family classification. Our approach extracts model-intrinsic behavioral signatures by analyzing gradient responses to random input perturbations across tensor layers, operating independently of training data, watermarks, or specific model formats. TensorGuard supports the widely-adopted safetensors format and constructs high-dimensional fingerprints through statistical analysis of gradient features. These fingerprints enable two complementary capabilities: direct pairwise similarity assessment between arbitrary models through distance computation, and systematic family classification of unknown models via the K-Means clustering algorithm with domain-informed centroid initialization using known base models. Experimental evaluation on 58 models comprising 8 base models and 50 derivatives across five model families (Llama, Qwen, Gemma, Phi, Mistral) demonstrates 94% classification accuracy under our centroid-initialized K-Means clustering.

随着大型语言模型(LLMS)成为现代应用中综合软件组成部分,未经授权的模型通过微调、合并和再分配得出,已成为关键的软件工程挑战。与克隆检测和许可证合规性已经确立的传统软件不同,LLM生态系统缺乏检测模型线和执行许可证协议的有效机制。当开源模型创建者(如Meta’s LalaMA)需要派生工程来维持归属的命名公约,而没有技术手段来核查其遵守情况时,这一差距尤其成问题。为了填补这一空白,将LLMs作为需要源头跟踪的软件工艺品处理,我们提出了TensorGuard,这是一个基于梯度的指纹框架,用于LLM的类似性检测和家庭分类。我们的方法提取了模型的模型,即:通过分析对多层随机输入的梯度反应以及实施许可证协议协议协议协议协议协议。 独立于培训数据、水标记或具体模型,支持广泛采用的安全传感器格式,并通过对梯度特征进行统计分析来建立高维度指纹。这些指纹使两种互补能力成为:直接对准的类似性QQ,50个任意性模型之间,通过KML模型,通过基础的原始模型进行任意性模型,通过已知的模型进行基础计算。


Article 39

Title@2025-07-03 (4): A Comprehensive Machine Learning Framework for Micromobility Demand Prediction

Title: A Comprehensive Machine Learning Framework for Micromobility Demand Prediction Umfassendes Machine Learning Framework für die Nachfragevorhersage der Mikromobilität 微观流动需求预测综合机械学习框架 2507.02715v1

Authors (3): Omri Porat, Michael Fire, Eran Ben-Elia

Dockless e-scooters, a key micromobility service, have emerged as eco-friendly and flexible urban transport alternatives. These services improve first and last-mile connectivity, reduce congestion and emissions, and complement public transport for short-distance travel. However, effective management of these services depends on accurate demand prediction, which is crucial for optimal fleet distribution and infrastructure planning. While previous studies have focused on analyzing spatial or temporal factors in isolation, this study introduces a framework that integrates spatial, temporal, and network dependencies for improved micromobility demand forecasting. This integration enhances accuracy while providing deeper insights into urban micromobility usage patterns. Our framework improves demand prediction accuracy by 27 to 49% over baseline models, demonstrating its effectiveness in capturing micromobility demand patterns. These findings support data-driven micromobility management, enabling optimized fleet distribution, cost reduction, and sustainable urban planning.

这是一种关键的微移动服务,它已成为生态友好型和灵活的城市交通替代物,可以改善第一英里和最后一英里的连通性,减少拥堵和排放,补充短途旅行的公共交通,然而,这些服务的有效管理取决于准确的需求预测,这对最佳车队分配和基础设施规划至关重要。虽然先前的研究侧重于分析孤立的空间或时间因素,但本研究引入了一个框架,将空间、时间和网络依赖性结合起来,以改进微移动需求预测。这种整合提高了准确性,同时更深入地了解城市微移动使用模式。我们的框架要求比基线模型提高27至49%的预测准确性,表明其在捕捉微移动需求模式方面的有效性。这些研究结果支持了数据驱动的微移动管理,有利于优化车队分配、降低成本和可持续的城市规划。


Article 40

Title@2025-07-03 (4): A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control

Title: A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control Eine Forget-and-Grow-Strategie für tiefe Verstärkungs-Lernskalierung in kontinuierlicher Kontrolle 连续控制深入加强学习的 “ 遗忘与增长 “ 战略 2507.02712v1

Authors (6): Zilin Kang, Chenyuan Hu, Yu Luo, Zhecheng Yuan, Ruijie Zheng, Huazhe Xu

Deep reinforcement learning for continuous control has recently achieved impressive progress. However, existing methods often suffer from primacy bias, a tendency to overfit early experiences stored in the replay buffer, which limits an RL agent’s sample efficiency and generalizability. In contrast, humans are less susceptible to such bias, partly due to infantile amnesia, where the formation of new neurons disrupts early memory traces, leading to the forgetting of initial experiences. Inspired by this dual processes of forgetting and growing in neuroscience, in this paper, we propose Forget and Grow (FoG), a new deep RL algorithm with two mechanisms introduced. First, Experience Replay Decay (ER Decay) “forgetting early experience”, which balances memory by gradually reducing the influence of early experiences. Second, Network Expansion, “growing neural capacity”, which enhances agents’ capability to exploit the patterns of existing data by dynamically adding new parameters during training. Empirical results on four major continuous control benchmarks with more than 40 tasks demonstrate the superior performance of FoG against SoTA existing deep RL algorithms, including BRO, SimBa, and TD-MPC2.

然而,现有方法往往受到首要偏见的影响,这种倾向是过度利用存储在重播缓冲中的早期经验,限制了RL剂的样本效率和一般性。相比之下,人类较少受到这种偏见的影响,部分原因是幼稚的失忆症,因为新神经元的形成扰乱了早期记忆的痕迹,导致人们忘记了最初的经验。在这种忘记和在神经科学中成长的双重过程的启发下,我们在本文件中提出“忘记和成长”(FoG),这是一种新的深层次RL算法,采用两种机制。首先,经验再玩Decay(ER Dedeay)“忘记早期经验”,通过逐步减少早期经验的影响来平衡记忆。第二,网络扩展,“神经能力增长”,通过在培训期间积极增加新的参数,提高代理人利用现有数据模式的能力。关于四个主要的连续控制基准的实证结果,有40多项任务,表明FoG对SoTA的高级RL算法的优异性,包括BRO、Simba和TD-MPC2。


Article 41

Title@2025-07-03 (4): Fluid Democracy in Federated Data Aggregation

Title: Fluid Democracy in Federated Data Aggregation Fluide Demokratie in Federated Data Aggregation 联邦数据聚合中流体民主 2507.02710v1

Authors (2): Aditya Vema Reddy Kesari, Krishna Reddy Kesari

Federated learning (FL) mechanisms typically require each client to transfer their weights to a central server, irrespective of how useful they are. In order to avoid wasteful data transfer costs from clients to the central server, we propose the use of consensus based protocols to identify a subset of clients with most useful model weights at each data transfer step. First, we explore the application of existing fluid democracy protocols to FL from a performance standpoint, comparing them with traditional one-person-one-vote (also known as 1p1v or FedAvg). We propose a new fluid democracy protocol named viscous-retained democracy that always does better than 1p1v under the same assumptions as existing fluid democracy protocols while also not allowing for influence accumulation. Secondly, we identify weaknesses of fluid democracy protocols from an adversarial lens in terms of their dependence on topology and/ or number of adversaries required to negatively impact the global model weights. To this effect, we propose an algorithm (FedVRD) that dynamically limits the effect of adversaries while minimizing cost by leveraging the delegation topology.

联邦学习(FL)机制通常要求每个客户将其权重转移至中央服务器,而不论其用途如何。为避免数据从客户向中央服务器转移成本的浪费,我们提议使用基于共识的协议,确定每个数据传输步骤中具有最有用模型权重的客户子组。首先,我们从业绩角度探索现有流动民主协议对FL的适用,将其与传统的单人一票(又称1p1v或FedAvg)进行比较。我们提议了一个新的流动民主协议,名为 “ 维持民主的直线民主 “ ,在与现有流动民主协议相同的假设下,总比1p1v要好,但不允许影响力积累。第二,我们从对立的视角中找出流动民主协议的弱点,即它们依赖表层和/或对负面影响全球模型权重所需的对手人数。为此,我们提议一种算法(FedVRD),在利用代表团权重来降低成本的同时,动态限制对手的影响。


Article 42

Title@2025-07-03 (4): Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions

Title: Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions Multi-Agenten-Verstärkungs-Lernen für dynamische Preise in Lieferketten: Benchmarking Strategic Agent Verhaltensmuster unter realistisch simulierten Marktbedingungen 供应链动态定价多机构强化学习:在现实模拟市场条件下确定战略代理行为基准 2507.02698v1

Authors (4): Thomas Hazenberg, Yao Ma, Seyed Sahand Mohammadi Ziabari, Marijn van Rijswijk

This study investigates how Multi-Agent Reinforcement Learning (MARL) can improve dynamic pricing strategies in supply chains, particularly in contexts where traditional ERP systems rely on static, rule-based approaches that overlook strategic interactions among market actors. While recent research has applied reinforcement learning to pricing, most implementations remain single-agent and fail to model the interdependent nature of real-world supply chains. This study addresses that gap by evaluating the performance of three MARL algorithms: MADDPG, MADQN, and QMIX against static rule-based baselines, within a simulated environment informed by real e-commerce transaction data and a LightGBM demand prediction model. Results show that rule-based agents achieve near-perfect fairness (Jain’s Index: 0.9896) and the highest price stability (volatility: 0.024), but they fully lack competitive dynamics. Among MARL agents, MADQN exhibits the most aggressive pricing behaviour, with the highest volatility and the lowest fairness (0.5844). MADDPG provides a more balanced approach, supporting market competition (share volatility: 9.5 pp) while maintaining relatively high fairness (0.8819) and stable pricing. These findings suggest that MARL introduces emergent strategic behaviour not captured by static pricing rules and may inform future developments in dynamic pricing.

这项研究调查了多机构强化学习(MARL)如何在供应链中改进动态定价战略,特别是在传统企业资源规划系统依赖静态、有章可循、忽视市场行为者之间战略互动的基于规则的方法的情况下,传统企业资源规划系统如何在供应链中改进动态定价战略,尽管最近的研究对定价应用了强化学习,但大多数实施仍然是单一的代理,未能模拟现实世界供应链的相互依存性质。这项研究通过在模拟环境中,根据真实电子商务交易数据和灯光GBM需求预测模型,评估三个MADPG、MADQQN和QMIX等MADQL算法的绩效,以静态规则基准为基础,在模拟环境中,在支持市场竞争(9.5 pp):保持相对高的公平性(0.8819)和最高的价格稳定性(挥发性:0.024)的同时,多数执行者仍然完全缺乏竞争动态。在MARL代理中,MADQQN展示了最具侵略性的定价行为,波动性和最低的公平性(0.5844)。MADPG提供了一种更为平衡的方法,支持市场竞争(9.5 pp),同时保持相对较高的公平性(0.8819)和稳定的价格价格动态发展。


Article 43

Title@2025-07-03 (4): RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes

Title: RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes RLHGNN: Verstärkung Lernorientiertes Heterogenes Graph Neuronales Netzwerk für die nächste Aktivitätsvorhersage in Geschäftsprozessen RLHGNN: 业务流程下一个活动预测的强化学习驱动的异质图形神经网络 2507.02690v1

Authors (6): Jiaxing Wang, Yifeng Yu, Jiahan Song, Bin Cao, Jing Fan, Ji Zhang

Next activity prediction represents a fundamental challenge for optimizing business processes in service-oriented architectures such as microservices environments, distributed enterprise systems, and cloud-native platforms, which enables proactive resource allocation and dynamic service composition. Despite the prevalence of sequence-based methods, these approaches fail to capture non-sequential relationships that arise from parallel executions and conditional dependencies. Even though graph-based approaches address structural preservation, they suffer from homogeneous representations and static structures that apply uniform modeling strategies regardless of individual process complexity characteristics. To address these limitations, we introduce RLHGNN, a novel framework that transforms event logs into heterogeneous process graphs with three distinct edge types grounded in established process mining theory. Our approach creates four flexible graph structures by selectively combining these edges to accommodate different process complexities, and employs reinforcement learning formulated as a Markov Decision Process to automatically determine the optimal graph structure for each specific process instance. RLHGNN then applies heterogeneous graph convolution with relation-specific aggregation strategies to effectively predict the next activity. This adaptive methodology enables precise modeling of both sequential and non-sequential relationships in service interactions. Comprehensive evaluation on six real-world datasets demonstrates that RLHGNN consistently outperforms state-of-the-art approaches. Furthermore, it maintains an inference latency of approximately 1 ms per prediction, representing a highly practical solution suitable for real-time business process monitoring applications. The source code is available at https://github.com/Joker3993/RLHGNN.

下一个活动预测是优化服务导向架构(如微观服务环境、分布式企业系统、云型平台等)业务流程的一个根本挑战,这些架构有助于积极主动的资源分配和动态服务构成。尽管以顺序为基础的方法十分普遍,但这些方法未能捕捉平行处决和有条件依赖所产生的非序列关系。即使以图表为基础的方法处理结构保护问题,它们也存在同质表述和静态结构,这些结构适用统一的模型战略,而不论个别程序复杂特性如何。为了应对这些限制,我们引入了RLHGNN,这是一个新颖的框架,将事件日志转换成基于三个不同边缘类型的不同流程流程图解。我们的方法创建了四个灵活的图表结构,有选择地结合这些边框以适应不同流程的复杂性,并利用作为Markov 决策程序开发的强化学习,以自动确定每个特定流程的最佳图形结构。RLHGNNN,然后将混杂的图表组合组合和具体关联的汇总战略用于有效预测下一个活动。这一适应方法可以精确地模拟服务互动中的序列和非序列进程。我们的方法在六个实体/系统内部数据库中持续地显示一个最新的实时数据库。


Article 44

Title@2025-07-03 (4): HAPI: A Model for Learning Robot Facial Expressions from Human Preferences

Title: HAPI: A Model for Learning Robot Facial Expressions from Human Preferences HAPI: Ein Modell zum Lernen von Roboter-Gesichtsausdrücken aus menschlichen Präferenzen HAPI: 人类首选机器人法表达方式学习模式 2503.17046v2

Authors (6): Dongsheng Yang, Qianying Liu, Wataru Sato, Takashi Minato, Chaoran Liu, Shin’ya Nishida

Automatic robotic facial expression generation is crucial for human-robot interaction, as handcrafted methods based on fixed joint configurations often yield rigid and unnatural behaviors. Although recent automated techniques reduce the need for manual tuning, they tend to fall short by not adequately bridging the gap between human preferences and model predictions-resulting in a deficiency of nuanced and realistic expressions due to limited degrees of freedom and insufficient perceptual integration. In this work, we propose a novel learning-to-rank framework that leverages human feedback to address this discrepancy and enhanced the expressiveness of robotic faces. Specifically, we conduct pairwise comparison annotations to collect human preference data and develop the Human Affective Pairwise Impressions (HAPI) model, a Siamese RankNet-based approach that refines expression evaluation. Results obtained via Bayesian Optimization and online expression survey on a 35-DOF android platform demonstrate that our approach produces significantly more realistic and socially resonant expressions of Anger, Happiness, and Surprise than those generated by baseline and expert-designed methods. This confirms that our framework effectively bridges the gap between human preferences and model predictions while robustly aligning robotic expression generation with human affective responses.

自动机器人面部表现器的生成对人类-机器人互动至关重要,因为基于固定组合组合的手工艺方法往往会产生僵硬和非自然的行为。虽然最近的自动化技术减少了人工调试的需要,但是由于未能充分弥合人类偏好与模型预测之间的差距,从而导致由于自由程度有限和观念整合不足而导致的微调和现实表达器不足。在这项工作中,我们提议了一个新型的“从学习到入手”框架,利用人类反馈解决这一差异,并增强机器人面貌的清晰度。具体地说,我们进行对称比较说明,以收集人类偏好数据,并开发人类偏好偏好偏好偏好模型(HAPI)模型,即一个基于Siamese RankNet的改进表达评价的方法。在35-DOF和机器人平台上的Bayesian Optimizmation和在线表达调查中得出的结果表明,我们的方法比基线和专家设计的方法产生更现实和社会共振的“愤怒、幸福和惊喜”的表达器。这证实了我们的框架有效地弥合了人类偏好好和模型预测之间的鸿沟,同时对机器人生成反应进行了有力的调整。


Article 45

Title@2025-07-03 (4): Learning few-step posterior samplers by unfolding and distillation of diffusion models

Title: Learning few-step posterior samplers by unfolding and distillation of diffusion models Lernen der wenigen Schritte hinteren Sampler durch Entfalten und Destillation von Diffusionsmodellen 通过推广模型的推出和蒸馏,学习几步后后后继采样器 2507.02686v1

Authors (3): Charlesquin Kemajou Mbakam, Jonathan Spence, Marcelo Pereyra

Diffusion models (DMs) have emerged as powerful image priors in Bayesian computational imaging. Two primary strategies have been proposed for leveraging DMs in this context: Plug-and-Play methods, which are zero-shot and highly flexible but rely on approximations; and specialized conditional DMs, which achieve higher accuracy and faster inference for specific tasks through supervised training. In this work, we introduce a novel framework that integrates deep unfolding and model distillation to transform a DM image prior into a few-step conditional model for posterior sampling. A central innovation of our approach is the unfolding of a Markov chain Monte Carlo (MCMC) algorithm - specifically, the recently proposed LATINO Langevin sampler (Spagnoletti et al., 2025) - representing the first known instance of deep unfolding applied to a Monte Carlo sampling scheme. We demonstrate our proposed unfolded and distilled samplers through extensive experiments and comparisons with the state of the art, where they achieve excellent accuracy and computational efficiency, while retaining the flexibility to adapt to variations in the forward model at inference time.

在Bayesian计算成像中,传播模型(DMs)已成为强大的图像前置,在这方面,提出了两种主要战略来利用DMs:插管和布局方法,这些方法是零射的,高度灵活,但依赖近似;专用有条件的DMs,通过监督培训,在具体任务方面达到更高的准确性和更快的推导率。在这项工作中,我们引入了一个新颖的框架,将深度演化和模型蒸馏结合起来,在将DM图像转化为后方取样的几步有条件模型。我们方法的一个核心创新是马可夫链Monte Carlo(MC)算法,特别是最近提出的LaTINO Langevin采样器(Spagnoletti等人,2025年)的演练,这是第一个已知的对蒙特卡洛采样计划应用的深度演进的实例。我们通过广泛的实验和与最新技术的比较,展示了我们提议的展和提炼的采样器,在其中实现了极精准和计算效率,同时保持灵活性,以适应远方模型的变化。


Article 46

Title@2025-07-03 (4): Learning Traffic Anomalies from Generative Models on Real-Time Observations

Title: Learning Traffic Anomalies from Generative Models on Real-Time Observations Verkehrsanomalien aus generativen Modellen auf Echtzeit-Beobachtungen lernen 实时观测生成模型的学习交通异常现象 2502.01391v3

Authors (2): Fotis I. Giasemis, Alexandros Sopasakis

Accurate detection of traffic anomalies is crucial for effective urban traffic management and congestion mitigation. We use the Spatiotemporal Generative Adversarial Network (STGAN) framework combining Graph Neural Networks and Long Short-Term Memory networks to capture complex spatial and temporal dependencies in traffic data. We apply STGAN to real-time, minute-by-minute observations from 42 traffic cameras across Gothenburg, Sweden, collected over several months in 2020. The images are processed to compute a flow metric representing vehicle density, which serves as input for the model. Training is conducted on data from April to November 2020, and validation is performed on a separate dataset from November 14 to 23, 2020. Our results demonstrate that the model effectively detects traffic anomalies with high precision and low false positive rates. The detected anomalies include camera signal interruptions, visual artifacts, and extreme weather conditions affecting traffic flow.

对交通异常现象的准确检测对于有效的城市交通管理和减少交通拥堵至关重要,我们利用将图形神经网络和长期短期内存网络相结合的 “ 斯帕蒂奥时效反向网络 “ 框架来捕捉交通数据中复杂的空间和时间依赖性。我们用STGAN对2020年几个月来收集的瑞典哥德堡各地42个交通摄像头进行实时、每分钟的观测。这些图像经过处理,以计算代表车辆密度的流量指标,作为模型的投入。从2020年4月至11月,对数据进行了培训,从2020年11月14日至23日,对一个单独的数据集进行了验证。我们的结果显示,该模型以高精确度和低假正率有效检测了交通异常现象。所检测到的异常现象包括摄像信号中断、视觉文物和影响交通流量的极端天气条件。


Article 47

Title@2025-07-03 (4): Detection of Disengagement from Voluntary Quizzes: An Explainable Machine Learning Approach in Higher Distance Education

Title: Detection of Disengagement from Voluntary Quizzes: An Explainable Machine Learning Approach in Higher Distance Education Detection of Disengagement from Voluntary Quizzes: Ein erklärbarer maschineller Lernansatz in der höheren Fernlehre 探测自愿Quizzes脱离自愿Quizzes:高远程教育中可解释的机械学习方法 2507.02681v1

Authors (7): Behnam Parsaeifard, Christof Imhof, Tansu Pancar, Ioan-Sorin Comsa, Martin Hlosta, Nicole Bergamin, Per Bergamin

Students disengaging from their tasks can have serious long-term consequences, including academic drop-out. This is particularly relevant for students in distance education. One way to measure the level of disengagement in distance education is to observe participation in non-mandatory exercises in different online courses. In this paper, we detect student disengagement in the non-mandatory quizzes of 42 courses in four semesters from a distance-based university. We carefully identified the most informative student log data that could be extracted and processed from Moodle. Then, eight machine learning algorithms were trained and compared to obtain the highest possible prediction accuracy. Using the SHAP method, we developed an explainable machine learning framework that allows practitioners to better understand the decisions of the trained algorithm. The experimental results show a balanced accuracy of 91\%, where about 85\% of disengaged students were correctly detected. On top of the highly predictive performance and explainable framework, we provide a discussion on how to design a timely intervention to minimise disengagement from voluntary tasks in online learning.

学生脱离任务可能会造成严重的长期后果,包括辍学。这与远程教育学生特别相关。衡量远程教育脱离程度的一个方法就是观察参加不同在线课程的非强制性练习的情况。在本文中,我们发现学生脱离了远程大学四学期42个课程的非强制性测验。我们仔细确定了可以从Moodle提取和处理的最丰富的学生日志数据。然后,8个机器学习算法经过培训,比较后获得了尽可能高的预测准确性。我们使用SHAP方法开发了一个可解释的机器学习框架,使实践者能够更好地了解经过培训的算法的决定。实验结果显示,91个的准确性是平衡的,其中大约85个已脱离的学生得到了正确检测。除了高度预测性的表现和解释性框架之外,我们还就如何设计及时的干预措施,以尽量减少在线学习中自愿任务脱离的情况进行了讨论。


Article 48

Title@2025-07-03 (4): Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs

Title: Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs Einbetten-basierter Federated Data Sharing über differentially Private Conditional VAEs 通过差异私人条件性自愿和有条件的 嵌入式基于嵌入式的联邦联邦数据分享 2507.02671v1

Authors (3): Francesco Di Salvo, Hanh Huyen My Nguyen, Christian Ledig

Deep Learning (DL) has revolutionized medical imaging, yet its adoption is constrained by data scarcity and privacy regulations, limiting access to diverse datasets. Federated Learning (FL) enables decentralized training but suffers from high communication costs and is often restricted to a single downstream task, reducing flexibility. We propose a data-sharing method via Differentially Private (DP) generative models. By adopting foundation models, we extract compact, informative embeddings, reducing redundancy and lowering computational overhead. Clients collaboratively train a Differentially Private Conditional Variational Autoencoder (DP-CVAE) to model a global, privacy-aware data distribution, supporting diverse downstream tasks. Our approach, validated across multiple feature extractors, enhances privacy, scalability, and efficiency, outperforming traditional FL classifiers while ensuring differential privacy. Additionally, DP-CVAE produces higher-fidelity embeddings than DP-CGAN while requiring $5{\times}$ fewer parameters.

深入学习(DL)已经实现了医疗成像的革命性,然而,其采用却受到数据稀缺和隐私条例的限制,限制了对多种数据集的获取。联邦学习(FL)能够提供分散化的培训,但具有较高的通信成本,而且往往局限于单一的下游任务,降低了灵活性。我们通过差异私人(DP)基因化模型提出数据分享方法。我们通过采用基础模型,提取紧凑、信息嵌入、减少冗余和降低计算间接费用,客户合作培训一个差异化的私人条件变异自动编码器(DP-CVAE),以模拟全球的隐私数据分配,支持不同的下游任务。我们的方法通过多种地物提取器验证,加强隐私、可扩展性和效率,超越传统的FL分类器,同时确保不同的隐私。此外,DP-CVAE生成比DP-CAN(DP-CGAN)更高的纤维化嵌入层,同时需要5美元的时间,减少参数。


Article 49

Title@2025-07-03 (4): Guided Generation for Developable Antibodies

Title: Guided Generation for Developable Antibodies Geführte Generation für entwickelbare Antikörper 可开发抗体的代人制 2507.02670v1

Authors (4): Siqi Zhao, Joshua Moller, Porfi Quintero-Cadena, Lood van Niekerk

Therapeutic antibodies require not only high-affinity target engagement, but also favorable manufacturability, stability, and safety profiles for clinical effectiveness. These properties are collectively called `developability’. To enable a computational framework for optimizing antibody sequences for favorable developability, we introduce a guided discrete diffusion model trained on natural paired heavy- and light-chain sequences from the Observed Antibody Space (OAS) and quantitative developability measurements for 246 clinical-stage antibodies. To steer generation toward biophysically viable candidates, we integrate a Soft Value-based Decoding in Diffusion (SVDD) Module that biases sampling without compromising naturalness. In unconstrained sampling, our model reproduces global features of both the natural repertoire and approved therapeutics, and under SVDD guidance we achieve significant enrichment in predicted developability scores over unguided baselines. When combined with high-throughput developability assays, this framework enables an iterative, ML-driven pipeline for designing antibodies that satisfy binding and biophysical criteria in tandem.

治疗抗体不仅需要高亲近目标接触,而且需要有利于临床有效性的人工制造、稳定性和安全性剖面。这些特性统称为“可开发性”。为了能够建立一个优化抗体序列的计算框架,以优化有利于开发的抗体序列,我们引入了一种有指导的离散扩散模型,对观测的抗体空间(OAS)的天然配对重链和轻链序列和246个临床阶段抗体的定量可开发性测量进行了培训。为了引导生成向生物物理上可行的候选体,我们整合了一个基于软价值的分解模块,该模块在不损及自然特性的情况下进行偏向采样。在未经控制的采样中,我们的模型复制了自然组合和核准的治疗方法的全球特征,在SVDDD指导下,我们在预测的可开发性分数与无导基线相比实现重大增益。当与高通性开发性分析相结合时,这一框架使得一个迭代的、由ML驱动的管道能够同时设计符合约束性和生物物理标准的抗体。


Article 50

Title@2025-07-03 (4): OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding

Title: OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding OmniDraft: Ein Cross-Vocabulary, Online Adaptive Drafter für die gerätespezifische Dekodierung 总括草案:跨词汇、在线在线可调适性套用投机下限设计图纸 2507.02659v1

Authors (7): Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Shaojie Zhuo, Chen Feng, Yicheng Lin, Chenzheng Su, Xiaopeng Zhang

Speculative decoding generally dictates having a small, efficient draft model that is either pretrained or distilled offline to a particular target model series, for instance, Llama or Qwen models. However, within online deployment settings, there are two major challenges: 1) usage of a target model that is incompatible with the draft model; 2) expectation of latency improvements over usage and time. In this work, we propose OmniDraft, a unified framework that enables a single draft model to operate with any target model and adapt dynamically to user data. We introduce an online n-gram cache with hybrid distillation fine-tuning to address the cross-vocabulary mismatch across draft and target models; and further improve decoding speed by leveraging adaptive drafting techniques. OmniDraft is particularly suitable for on-device LLM applications where model cost, efficiency and user customization are the major points of contention. This further highlights the need to tackle the above challenges and motivates the \textit{``one drafter for all’’} paradigm. We showcase the proficiency of the OmniDraft framework by performing online learning on math reasoning, coding and text generation tasks. Notably, OmniDraft enables a single Llama-68M model to pair with various target models including Vicuna-7B, Qwen2-7B and Llama3-8B models for speculative decoding; and additionally provides up to 1.5-2x speedup.

在网上部署环境中,存在两大挑战:(1) 使用与模型草案不相符的目标模型;(2) 期望在使用和时间上提高延缓度。在这项工作中,我们提出Omnisuster,这是一个统一框架,使单一的模型草案能够与任何目标模型一起运作,并动态地适应用户数据。我们推出一个带有混合蒸馏微调的在线 ng 缓存,同时进行混合蒸馏微调,以解决草案和目标模型之间的交叉蒸汽错配;通过利用适应性起草技术进一步提高解码速度。在模型成本、效率和用户定制是主要争议点的地方,Ommicomicomicomi of特别适合LM应用程序。这进一步突显了应对上述挑战的必要性,并激励所有模式的Textitone起草者更新。我们通过在线学习数学推理学模型、计算模型、数字组合和指标生成LOma-B(包括数字、数字分析和指标生成),展示了OmniMI草案框架的精度。


Article 51

Title@2025-07-03 (4): Fair Deepfake Detectors Can Generalize

Title: Fair Deepfake Detectors Can Generalize Faire Deepfake-Detektoren können generalisieren 公平深假探测器可普遍化 2507.02645v1

Authors (6): Harry Cheng, Ming-Hui Liu, Yangyang Guo, Tianyi Wang, Liqiang Nie, Mohan Kankanhalli

Deepfake detection models face two critical challenges: generalization to unseen manipulations and demographic fairness among population groups. However, existing approaches often demonstrate that these two objectives are inherently conflicting, revealing a trade-off between them. In this paper, we, for the first time, uncover and formally define a causal relationship between fairness and generalization. Building on the back-door adjustment, we show that controlling for confounders (data distribution and model capacity) enables improved generalization via fairness interventions. Motivated by this insight, we propose Demographic Attribute-insensitive Intervention Detection (DAID), a plug-and-play framework composed of: i) Demographic-aware data rebalancing, which employs inverse-propensity weighting and subgroup-wise feature normalization to neutralize distributional biases; and ii) Demographic-agnostic feature aggregation, which uses a novel alignment loss to suppress sensitive-attribute signals. Across three cross-domain benchmarks, DAID consistently achieves superior performance in both fairness and generalization compared to several state-of-the-art detectors, validating both its theoretical foundation and practical effectiveness.

深假检测模型面临两大挑战:在人口群体中普遍采用无形操纵和人口公平。然而,现有方法往往表明,这两个目标之间有着内在的矛盾,暴露了它们之间的权衡。在本文件中,我们首次发现并正式界定了公平和笼统之间的因果关系。在后门调整的基础上,我们表明,控制混杂者(数据分布和模型能力)有助于通过公平干预改进普遍性。我们基于这一洞察力,提出了人口属性不敏感干预探测(DAID),这是一个插头和游戏框架,由以下组成:一) 人口意识数据再平衡,利用反等值加权和分组特征正常化来抵消分布偏差;二) 人口统计学特征汇总,利用新的调整损失来抑制敏感归属信号。在三个跨领域基准上,DAID始终在公平和普遍化两方面都取得优异性业绩,与若干州级探测器相比,验证其理论基础和实际效力。


Article 52

Title@2025-07-03 (4): On Efficient Bayesian Exploration in Model-Based Reinforcement Learning

Title: On Efficient Bayesian Exploration in Model-Based Reinforcement Learning Effiziente bayesische Exploration im modellbasierten Ausbau-Lernen 关于在示范强化学习中高效探索贝耶斯人 2507.02639v1

Authors (3): Alberto Caron, Chris Hicks, Vasilios Mavroudis

In this work, we address the challenge of data-efficient exploration in reinforcement learning by examining existing principled, information-theoretic approaches to intrinsic motivation. Specifically, we focus on a class of exploration bonuses that targets epistemic uncertainty rather than the aleatoric noise inherent in the environment. We prove that these bonuses naturally signal epistemic information gains and converge to zero once the agent becomes sufficiently certain about the environment’s dynamics and rewards, thereby aligning exploration with genuine knowledge gaps. Our analysis provides formal guarantees for IG-based approaches, which previously lacked theoretical grounding. To enable practical use, we also discuss tractable approximations via sparse variational Gaussian Processes, Deep Kernels and Deep Ensemble models. We then outline a general framework - Predictive Trajectory Sampling with Bayesian Exploration (PTS-BE) - which integrates model-based planning with information-theoretic bonuses to achieve sample-efficient deep exploration. We empirically demonstrate that PTS-BE substantially outperforms other baselines across a variety of environments characterized by sparse rewards and/or purely exploratory tasks.

在这项工作中,我们通过研究现有有原则的、信息理论的内在动机方法,应对数据高效的探索在强化学习方面面临的挑战。具体地说,我们侧重于一类探索奖金,该奖金针对的是感知性不确定性,而不是环境中固有的感知性噪音。我们证明,这些奖金自然发出感知性信息收益的信号,一旦代理人对环境动态和回报有充分的确定,从而将探索与真正的知识差距结合起来,我们的分析为基于IG的方法提供了正式的保障,这些方法以前缺乏理论基础。为了便于实际使用,我们还讨论了通过稀疏的变异高山进程、深核心和深聚合模型的可切近似值。我们随后概述了一个总体框架――预测性与巴伊斯探索(PTS-BE)的采样性取样性取样性探索(PTS-BE)相结合的模型规划和信息理论性奖赏实现样本高效的深层探索。我们从经验上证明,PTS-BE在以微奖赏和/或纯探索性任务为特征的各种环境中,大大超越了其他基线。


Article 53

Title@2025-07-03 (4): High-Order Deep Meta-Learning with Category-Theoretic Interpretation

Title: High-Order Deep Meta-Learning with Category-Theoretic Interpretation High-Order Deep Meta-Learning mit kategorie-theoretischer Interpretation 具有分类理论判读的高等级深层元元学习 2507.02634v1

Authors (1): David H. Mguni

We introduce a new hierarchical deep learning framework for recursive higher-order meta-learning that enables neural networks (NNs) to construct, solve, and generalise across hierarchies of tasks. Central to this approach is a generative mechanism that creates \emph{virtual tasks} – synthetic problem instances designed to enable the meta-learner to learn \emph{soft constraints} and unknown generalisable rules across related tasks. Crucially, this enables the framework to generate its own informative, task-grounded datasets thereby freeing machine learning (ML) training from the limitations of relying entirely on human-generated data. By actively exploring the virtual point landscape and seeking out tasks lower-level learners find difficult, the meta-learner iteratively refines constraint regions. This enhances inductive biases, regularises the adaptation process, and produces novel, unanticipated tasks and constraints required for generalisation. Each meta-level of the hierarchy corresponds to a progressively abstracted generalisation of problems solved at lower levels, enabling a structured and interpretable learning progression. By interpreting meta-learners as category-theoretic \emph{functors} that generate and condition a hierarchy of subordinate learners, we establish a compositional structure that supports abstraction and knowledge transfer across progressively generalised tasks. The category-theoretic perspective unifies existing meta-learning models and reveals how learning processes can be transformed and compared through functorial relationships, while offering practical design principles for structuring meta-learning. We speculate this architecture may underpin the next generation of NNs capable of autonomously generating novel, instructive tasks and their solutions, thereby advancing ML towards general artificial intelligence.

我们引入一个新的等级深层次学习框架, 用于递归更高层次的高级元学习, 使神经网络( NNS) 能够构建、 解决和概括跨任务等级的神经网络( NNS) 。 这种方法的核心是创建 emph{ 虚拟任务} 的基因化机制。 合成问题实例, 目的是让元learner 能够学习 emph{ 软约束} 和相关任务之间未知的通俗规则。 关键是, 使这个框架能够产生自己的信息丰富、 任务背景的数据集, 从而让机器学习( ML) 摆脱完全依赖人造数据的局限性。 通过积极探索虚拟点环境, 寻找较低层次的学习者发现困难的任务, 元- 元leaner 迭代迭代为限制区域。 这可以强化感性偏见, 规范适应适应进程, 并产生新的、 意想不到的任务和制约。 等级的每个元级级级对应的是, 在较低层次上逐步抽象地概括解决问题, 能够构建一个结构, 和可解释的学习解决方案的进化的进化的进化。 通过解释的代代代代数结构, 向, 向一个变的代级结构的代代代代代结构的代代结构, 形成一个不断进化的代进进进进进进进化的代进进进的代进的代进的代进的代进的代进的代进的代进的代进的代进的 , , , 结构, 制的代进的代进的变的代进进进进进进进制的进制的进进制的进制的进制的进制的进制的进制的进制的进制的进制的进制的进制的进制的进制的进制的进制的进制的进制的进制的进制的进制的进的进制的进制的进制结构, , , , 制的进制的进制的进制的进制的进制的进制的进制的进制的进制的进制的进制的进制的变制的进制的进制的进制的进制的进制的进制的进制的进制的进制的进制的进制


Article 54

Title@2025-07-03 (4): Medical Data Pecking: A Context-Aware Approach for Automated Quality Evaluation of Structured Medical Data

Title: Medical Data Pecking: A Context-Aware Approach for Automated Quality Evaluation of Structured Medical Data Medical Data Pecking: Ein kontextorientierter Ansatz zur automatisierten Qualitätsbewertung strukturierter medizinischer Daten 医疗数据分级:结构医疗数据自动质量评价的内幕意识方法 2507.02628v1

Authors (4): Irena Girshovitz, Atai Ambus, Moni Shahar, Ran Gilad-Bachrach

Background: The use of Electronic Health Records (EHRs) for epidemiological studies and artificial intelligence (AI) training is increasing rapidly. The reliability of the results depends on the accuracy and completeness of EHR data. However, EHR data often contain significant quality issues, including misrepresentations of subpopulations, biases, and systematic errors, as they are primarily collected for clinical and billing purposes. Existing quality assessment methods remain insufficient, lacking systematic procedures to assess data fitness for research. Methods: We present the Medical Data Pecking approach, which adapts unit testing and coverage concepts from software engineering to identify data quality concerns. We demonstrate our approach using the Medical Data Pecking Tool (MDPT), which consists of two main components: (1) an automated test generator that uses large language models and grounding techniques to create a test suite from data and study descriptions, and (2) a data testing framework that executes these tests, reporting potential errors and coverage. Results: We evaluated MDPT on three datasets: All of Us (AoU), MIMIC-III, and SyntheticMass, generating 55-73 tests per cohort across four conditions. These tests correctly identified 20-43 non-aligned or non-conforming data issues. We present a detailed analysis of the LLM-generated test suites in terms of reference grounding and value accuracy. Conclusion: Our approach incorporates external medical knowledge to enable context-sensitive data quality testing as part of the data analysis workflow to improve the validity of its outcomes. Our approach tackles these challenges from a quality assurance perspective, laying the foundation for further development such as additional data modalities and improved grounding methods.

电子健康记录(EHRs)用于流行病学研究和人工智能(AI)培训的情况正在迅速增加;结果的可靠性取决于EHR数据的准确性和完整性;然而,EHR数据往往包含重大的质量问题,包括主要为临床和计费目的收集的子群、偏差和系统错误的误报;现有质量评估方法仍然不够充分,缺乏评估数据是否适合研究的系统程序;方法:我们介绍了医疗数据截取方法,该方法调整了单位敏感度测试和覆盖概念,从软件工程到查明数据质量问题;我们用医疗数据截取工具(MDPT)展示了我们的方法,该方法由两个主要部分组成:(1) 自动测试生成器,使用大语言模型和地基技术,建立数据和研究描述的测试套件;(2) 数据测试框架,用于进行这些测试,报告潜在的错误和覆盖面;结果:我们用三个数据集对MCDPT进行了评估:Allus(AU)、MIMI-III和SyntheticMass,在四种情况下对每组进行55-73次测试,这四个条件下,这些测试方法包括:(1) 正确查明了我们30版的外部数据精确度数据测试结果的准确度分析。


Article 55

Title@2025-07-03 (4): A Matrix Variational Auto-Encoder for Variant Effect Prediction in Pharmacogenes

Title: A Matrix Variational Auto-Encoder for Variant Effect Prediction in Pharmacogenes Ein Matrix-Varianten-Auto-Encoder für Variant-Effekt-Vorhersage in Pharmakogenen 药用植物可变效应预测矩阵变式自动编码器 2507.02624v1

Authors (6): Antoine Honoré, Borja Rodríguez Gálvez, Yoomi Park, Yitian Zhou, Volker M. Lauschke, Ming Xiao

Variant effect predictors (VEPs) aim to assess the functional impact of protein variants, traditionally relying on multiple sequence alignments (MSAs). This approach assumes that naturally occurring variants are fit, an assumption challenged by pharmacogenomics, where some pharmacogenes experience low evolutionary pressure. Deep mutational scanning (DMS) datasets provide an alternative by offering quantitative fitness scores for variants. In this work, we propose a transformer-based matrix variational auto-encoder (matVAE) with a structured prior and evaluate its performance on 33 DMS datasets corresponding to 26 drug target and ADME proteins from the ProteinGym benchmark. Our model trained on MSAs (matVAE-MSA) outperforms the state-of-the-art DeepSequence model in zero-shot prediction on DMS datasets, despite using an order of magnitude fewer parameters and requiring less computation at inference time. We also compare matVAE-MSA to matENC-DMS, a model of similar capacity trained on DMS data, and find that the latter performs better on supervised prediction tasks. Additionally, incorporating AlphaFold-generated structures into our transformer model further improves performance, achieving results comparable to DeepSequence trained on MSAs and finetuned on DMS. These findings highlight the potential of DMS datasets to replace MSAs without significant loss in predictive performance, motivating further development of DMS datasets and exploration of their relationships to enhance variant effect prediction.

变异效果预测器(VePs)旨在评估蛋白质变异体的功能影响,传统上依赖多个序列对齐(MSAs) 。 这种方法假设自然发生的变异体是合适的,这是药用基因组的假设,一些药用基因体经历了低进化压力。 深突变扫描(DMS)数据集提供了一个替代方法,为变异体提供量性健康分数。 在这项工作中,我们提议基于变异体基矩阵的自动变异体自动变异变异器(matVAE),配有结构化的事先评估,并评价其在33个DMS数据集上与26个药物目标和ProteinGym基准的ADME蛋白质相匹配的性能。 我们所培训的关于MIS模型(matVAE-MSA)的模型(matVA-MSA)的模型优于该模型,优于DMIS的先进进化性能模型模型,在DRAMS的深度分析中,改进了AA级变异性结果。


Article 56

Title@2025-07-03 (4): L-VAE: Variational Auto-Encoder with Learnable Beta for Disentangled Representation

Title: L-VAE: Variational Auto-Encoder with Learnable Beta for Disentangled Representation L-VAE: Variationaler Auto-Encoder mit erlernbarer Beta zur entfremdeten Darstellung L-VAE: 具有拆分代表的可学习贝塔变量自动编码器 2507.02619v1

Authors (3): Hazal Mogultay Ozcan, Sinan Kalkan, Fatos T. Yarman-Vural

In this paper, we propose a novel model called Learnable VAE (L-VAE), which learns a disentangled representation together with the hyperparameters of the cost function. L-VAE can be considered as an extension of \b{eta}-VAE, wherein the hyperparameter, \b{eta}, is empirically adjusted. L-VAE mitigates the limitations of \b{eta}-VAE by learning the relative weights of the terms in the loss function to control the dynamic trade-off between disentanglement and reconstruction losses. In the proposed model, the weight of the loss terms and the parameters of the model architecture are learned concurrently. An additional regularization term is added to the loss function to prevent bias towards either reconstruction or disentanglement losses. Experimental analyses show that the proposed L-VAE finds an effective balance between reconstruction fidelity and disentangling the latent dimensions. Comparisons of the proposed L-VAE against \b{eta}-VAE, VAE, ControlVAE, DynamicVAE, and {\sigma}-VAE on datasets, such as dSprites, MPI3D-complex, Falcor3D, and Isaac3D reveals that L-VAE consistently provides the best or the second best performances measured by a set of disentanglement metrics. Moreover, qualitative experiments on CelebA dataset, confirm the success of the L-VAE model for disentangling the facial attributes.

在本文中,我们提出了一个名为“可学习VAE(L-VAE)”的新模型,该模型在成本函数的超参数下学习了分解的表达方式。L-VAE可被视为对超参数或偏差损失的延伸,而超参数,即{eta}-VAE是经经验调整的。L-VAE通过学习损失函数中术语的相对权重来减轻了损失函数中术语的局限性,以控制分解与重建损失之间的动态权衡。在拟议的模型中,损失条件和模型结构参数的比重是同时学习的。在损失函数中增加一个正规化术语,以防止对重建或偏差损失的偏差。实验分析表明,拟议的L-VAE在重建忠性和分解潜在维度之间找到有效的平衡。 L-VAE拟议模型与累进性交易(b{eta}-VAE)、VAE、控制VAE、动态VAVAE和经测量的MA3-DALA数据中的最佳性能、最佳的MAS-DAFAL3和经测量的MA-DE数据,提供了最佳数据。


Article 57

Title@2025-07-03 (4): Interpreting Graph Inference with Skyline Explanations

Title: Interpreting Graph Inference with Skyline Explanations Graphenableitung mit Skyline-Erläuterungen interpretieren 与天线解释解释的图表推理 2505.07635v2

Authors (4): Dazhuo Qiu, Haolai Che, Arijit Khan, Yinghui Wu

Inference queries have been routinely issued to graph machine learning models such as graph neural networks (GNNs) for various network analytical tasks. Nevertheless, GNNs outputs are often hard to interpret comprehensively. Existing methods typically compromise to individual pre-defined explainability measures (such as fidelity), which often leads to biased, ``one-sided’’ interpretations. This paper introduces skyline explanation, a new paradigm that interprets GNN output by simultaneously optimizing multiple explainability measures of users’ interests. (1) We propose skyline explanations as a Pareto set of explanatory subgraphs that dominate others over multiple explanatory measures. We formulate skyline explanation as a multi-criteria optimization problem, and establish its hardness results. (2) We design efficient algorithms with an onion-peeling approach, which strategically prioritizes nodes and removes unpromising edges to incrementally assemble skyline explanations. (3) We also develop an algorithm to diversify the skyline explanations to enrich the comprehensive interpretation. (4) We introduce efficient parallel algorithms with load-balancing strategies to scale skyline explanation for large-scale GNN-based inference. Using real-world and synthetic graphs, we experimentally verify our algorithms’ effectiveness and scalability.

常规的推论查询是用来绘制各种网络分析任务的图形神经网络(GNNs)等机器学习模型的图解,然而,GNNs的产出往往难以全面解释。现有的方法通常会损害个别预先界定的解释性措施(如忠诚),这往往会导致偏向,“单面”解释。本文介绍了一个解释GNN输出的新模式,即通过同时优化对用户利益的多种解释性措施来解释GNN输出的新模式。(1) 我们提出天空线解释,作为一套解释性子图集,在多种解释性措施中支配他人。我们将天空线解释作为多标准优化问题,并确立其硬性性结果。(2) 我们设计高效的算法,采用在线线方法,从战略上确定节点的优先次序,并消除不鼓励的边缘,逐步组合上天线解释性解释。(3) 我们还制定一种使天线解释多样化的算法,以丰富全面解释。(4)我们采用高效的平行算法,采用负向平衡战略,以缩小以大规模GNNN为主的直径的天线解释。我们用现实和合成性图表进行实验性核查。


Article 58

Title@2025-07-03 (4): Orientation-Aware Sparse Tensor PCA for Efficient Unsupervised Feature Selection

Title: Orientation-Aware Sparse Tensor PCA for Efficient Unsupervised Feature Selection Orientation-Aware Sparse Tensor PCA für effiziente, unüberwachte Feature-Auswahl 用于高效、不受监督地物选择的五氯苯甲醚 2407.16985v3

Authors (5): Junjing Zheng, Xinyu Zhang, Weidong Jiang, Xiangfeng Qiu, Mingjian Ren

Recently, introducing Tensor Decomposition (TD) techniques into unsupervised feature selection (UFS) has been an emerging research topic. A tensor structure is beneficial for mining the relations between different modes and helps relieve the computation burden. However, while existing methods exploit TD to preserve the data tensor structure, they do not consider the influence of data orientation and thus have difficulty in handling orientation-specific data such as time series. To solve the above problem, we utilize the orientation-dependent tensor-tensor product from Tensor Singular Value Decomposition based on *M-product (T-SVDM) and extend the one-dimensional Sparse Principal Component Analysis (SPCA) to a tensor form. The proposed sparse tensor PCA model can constrain sparsity at the specified mode and yield sparse tensor principal components, enhancing flexibility and accuracy in learning feature relations. To ensure fast convergence and a flexible description of feature correlation, we develop a convex version specially designed for general UFS tasks and propose an efficient slice-by-slice algorithm that performs dual optimization in the transform domain. Experimental results on real-world datasets demonstrate the effectiveness and remarkable computational efficiency of the proposed method for tensor data of diverse structures over the state-of-the-art. When transform axes align with feature distribution patterns, our method is promising for various applications. The codes related to our proposed methods and the experiments are available at https://github.com/zjj20212035/STPCA.git.

最近,将Tensor分解(TD)技术引入不受监督的特性选择(UFS)已成为一个新兴研究课题。高压结构有利于挖掘不同模式之间的关系,并有助于减轻计算负担。然而,虽然现有方法利用TD来保存数据分解结构,但没有考虑到数据方向的影响,因此难以处理特定方向的数据,如时间序列。为了解决上述问题,我们利用Tensor Singulal值分解(UFS)的定向导导导导导导导速速器产品120 ,基于 *M 产品(T-SVDM)和将单维版主元元元分析(SPCA)扩展至一个高压形式。拟议的稀散式高压元分析模式可以限制特定模式的松散性,产生稀散式主元结构,提高学习特性关系的灵活性和准确性。为了确保快速趋同和灵活地描述特征相关关系,我们专门为一般UFS-S-S-S-S-S-Singal-serval 值任务设计了一个配方版本,建议一种高效的切分切算法,在变换域内进行双重优化。在现实-S-al-al-al-al-al-al-al-al-al-al-al-al-al-s-s-al-al-al-s-sxxxxxxxxx-sxxxxxxxxxxxxxxxx 的计算方法上,在拟议的数据转换法系中,我们Slx-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-sxxxxxxxxxxxxxxxxxxxxxx-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-s-sx-s-sx-sxxxxxxxx-s-s-sxxxx-


Article 59

Title@2025-07-03 (4): Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation

Title: Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation Lost in Latent Space: Eine empirische Studie von Latent Diffusionsmodellen für die Emulation von Physik 失落于隐时空:物理模拟的中子扩散模型经验研究 2507.02608v1

Authors (6): François Rozet, Ruben Ohana, Michael McCabe, Gilles Louppe, François Lanusse, Shirley Ho

The steep computational cost of diffusion models at inference hinders their use as fast physics emulators. In the context of image and video generation, this computational drawback has been addressed by generating in the latent space of an autoencoder instead of the pixel space. In this work, we investigate whether a similar strategy can be effectively applied to the emulation of dynamical systems and at what cost. We find that the accuracy of latent-space emulation is surprisingly robust to a wide range of compression rates (up to 1000x). We also show that diffusion-based emulators are consistently more accurate than non-generative counterparts and compensate for uncertainty in their predictions with greater diversity. Finally, we cover practical design choices, spanning from architectures to optimizers, that we found critical to train latent-space emulators.

在图像和视频生成方面,这种计算缺陷是通过在自动编码器的潜伏空间而不是像素空间中生成的。在这项工作中,我们调查是否可以以何种成本有效地将类似的战略应用于模拟动态系统。我们发现,潜空模拟的准确性对于广泛的压缩率(高达1000x)是惊人的。我们还表明,基于扩散的模拟器比非遗传的模拟器始终更加准确,并弥补其预测中的不确定性。最后,我们涵盖了从结构到优化的实用设计选择,我们发现这些选择对于培养潜空模拟器至关重要。


Article 60

Title@2025-07-03 (4): Down with the Hierarchy: The ‘H’ in HNSW Stands for “Hubs”

Title: Down with the Hierarchy: The ‘H’ in HNSW Stands for “Hubs” Nieder mit der Hierarchie: Das ‘H’ in HNSW steht für “Hubs” 打倒等级制度: HNSW的”H”代表”Hubs” 2412.01940v3

Authors (3): Blaise Munyampirwa, Vihan Lakshman, Benjamin Coleman

Driven by recent breakthrough advances in neural representation learning, approximate near-neighbor (ANN) search over vector embeddings has emerged as a critical computational workload. With the introduction of the seminal Hierarchical Navigable Small World (HNSW) algorithm, graph-based indexes have established themselves as the overwhelmingly dominant paradigm for efficient and scalable ANN search. As the name suggests, HNSW searches a layered hierarchical graph to quickly identify neighborhoods of similar points to a given query vector. But is this hierarchy even necessary? A rigorous experimental analysis to answer this question would provide valuable insights into the nature of algorithm design for ANN search and motivate directions for future work in this increasingly crucial domain. We conduct an extensive benchmarking study covering more large-scale datasets than prior investigations of this question. We ultimately find that a flat navigable small world graph graph retains all of the benefits of HNSW on high-dimensional datasets, with latency and recall performance essentially \emph{identical} to the original algorithm but with less memory overhead. Furthermore, we go a step further and study \emph{why} the hierarchy of HNSW provides no benefit in high dimensions, hypothesizing that navigable small world graphs contain a well-connected, frequently traversed ``highway” of hub nodes that maintain the same purported function as the hierarchical layers. We present compelling empirical evidence that the \emph{Hub Highway Hypothesis} holds for real datasets and investigate the mechanisms by which the highway forms. The implications of this hypothesis may also provide future research directions in developing enhancements to graph-based ANN search.

以图表为基础的指数在近期神经代表学的突破性进步的驱动下,对矢量嵌入的近邻(ANN)搜索近似于近邻(ANN)近似是一个重要的计算工作量。随着引入原始的分级导航小世界(HNSW)算法,基于图形的指数已经确定自己是高效且可缩放的ANNN搜索的压倒性模式。正如名称所示,HNSW搜索一个分层的等级图,以快速识别与给定的查询矢量相近的相邻点。但这一等级是否必要?为回答这一问题而进行严格的实验分析,将会为ANNNE搜索的算法设计提供宝贵的洞察力,并激励今后在这个越来越关键的领域开展工作的方向方向。我们开展了一项广泛的基准研究,涉及比以前对这一问题的调查更多的大规模数据集。我们最终发现,一个平流的小型世界图图在高维值数据集上保留了HNSWW的所有好处, 其含着基本值和回顾性业绩,但基本上与最初的算法相比,但记忆要少得多。此外,我们还要一步和研究一个更深层次的O级的O级研究。


Article 61

Title@2025-07-03 (4): Alleviating Attack Data Scarcity: SCANIA’s Experience Towards Enhancing In-Vehicle Cyber Security Measures

Title: Alleviating Attack Data Scarcity: SCANIA’s Experience Towards Enhancing In-Vehicle Cyber Security Measures Benachteiligung von Angriffsdaten: SCANIAs Erfahrung zur Verbesserung von Cybersicherheitsmaßnahmen im Fahrzeug 减轻攻击数据稀缺性:SCANIA在加强车辆内部网络安全措施方面的经验 2507.02607v1

Authors (5): Frida Sundfeldt, Bianca Widstam, Mahshid Helali Moghadam, Kuo-Yun Liang, Anders Vesterberg

The digital evolution of connected vehicles and the subsequent security risks emphasize the critical need for implementing in-vehicle cyber security measures such as intrusion detection and response systems. The continuous advancement of attack scenarios further highlights the need for adaptive detection mechanisms that can detect evolving, unknown, and complex threats. The effective use of ML-driven techniques can help address this challenge. However, constraints on implementing diverse attack scenarios on test vehicles due to safety, cost, and ethical considerations result in a scarcity of data representing attack scenarios. This limitation necessitates alternative efficient and effective methods for generating high-quality attack-representing data. This paper presents a context-aware attack data generator that generates attack inputs and corresponding in-vehicle network log, i.e., controller area network (CAN) log, representing various types of attack including denial of service (DoS), fuzzy, spoofing, suspension, and replay attacks. It utilizes parameterized attack models augmented with CAN message decoding and attack intensity adjustments to configure the attack scenarios with high similarity to real-world scenarios and promote variability. We evaluate the practicality of the generated attack-representing data within an intrusion detection system (IDS) case study, in which we develop and perform an empirical evaluation of two deep neural network IDS models using the generated data. In addition to the efficiency and scalability of the approach, the performance results of IDS models, high detection and classification capabilities, validate the consistency and effectiveness of the generated data as well. In this experience study, we also elaborate on the aspects influencing the fidelity of the data to real-world scenarios and provide insights into its application.

接通车辆的数字演进和随后的安全风险强调,迫切需要实施车辆内网络安全措施,如入侵探测和反应系统。攻击情景的持续推进进一步突出表明,需要建立适应性检测机制,以探测不断变化的、未知的和复杂的威胁。有效利用ML驱动的技术可以帮助应对这一挑战。但是,由于安全、成本和道德因素,对测试车辆实施不同攻击情景的制约导致缺少代表攻击情景的数据。这种限制要求以高效和有效的替代方法生成高质量的攻击代表数据。本文介绍了一种环境觉察攻击数据生成器,生成攻击投入和与车辆网络日志相对应,即控制区网络日志,代表各种类型的攻击,包括拒绝服务(Do),模糊、潜伏、暂停和重弹攻击。由于安全、成本和道德因素等因素,对测试车辆实施不同攻击情景的制约,导致袭击情景的解析和强度调整,从而使得袭击情景与现实世界情景高度相似,并促进变异性。我们评估了在入侵探测系统内生成的攻击数据的实际数据的实际数据,并评估了这一系统测试和数据测算的准确性,我们利用了这一系统测算结果的测试结果,并更新了数据,从而评估了对数据进行了数据评估。


Article 62

Title@2025-07-03 (4): De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks

Title: De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks De-AntiFake: Die Protective Perturbations gegen Stimmen-Klonangriffe neu denken De-Antifake:重新思考对语音克隆攻击进行保护性干扰 2507.02606v1

Authors (5): Wei Fan, Kejiang Chen, Chang Liu, Weiming Zhang, Nenghai Yu

The rapid advancement of speech generation models has heightened privacy and security concerns related to voice cloning (VC). Recent studies have investigated disrupting unauthorized voice cloning by introducing adversarial perturbations. However, determined attackers can mitigate these protective perturbations and successfully execute VC. In this study, we conduct the first systematic evaluation of these protective perturbations against VC under realistic threat models that include perturbation purification. Our findings reveal that while existing purification methods can neutralize a considerable portion of the protective perturbations, they still lead to distortions in the feature space of VC models, which degrades the performance of VC. From this perspective, we propose a novel two-stage purification method: (1) Purify the perturbed speech; (2) Refine it using phoneme guidance to align it with the clean speech distribution. Experimental results demonstrate that our method outperforms state-of-the-art purification methods in disrupting VC defenses. Our study reveals the limitations of adversarial perturbation-based VC defenses and underscores the urgent need for more robust solutions to mitigate the security and privacy risks posed by VC. The code and audio samples are available at https://de-antifake.github.io.

语音生成模型的迅速发展提高了与语音克隆有关的隐私和安全关切。最近的研究通过引入对抗性扰动,调查了破坏未经授权的语音克隆的问题。然而,坚定的打击者可以减少这些保护性扰动,并成功执行VC。在本研究中,我们根据包括扰动净化在内的现实威胁模型,对这些保护性扰动进行首次系统评估。我们的研究结果显示,虽然现有的净化方法可以抵消相当一部分保护性扰动,但它们仍然导致VC模型地貌空间的扭曲,从而降低VC的性能。我们从这个角度提出一个新的两阶段净化方法:(1) 净化四周语言;(2) 使用电话指导将其与清洁语言分布相协调。实验结果显示,我们的方法在破坏VC防御方面超越了最新净化方法。我们的研究揭示了以对抗性扰动为基础的VC防御的局限性,并强调迫切需要更稳健的解决方案来减轻VC.C.debio提出的安全和隐私风险。


Article 63

Title@2025-07-03 (4): StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models

Title: StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models StructTransform: Eine skalierbare Angriffsfläche für sicherheitsorientierte große Sprachmodelle 条形变形:用于安全-统一大语言模型的可缩放攻击表面 2502.11853v2

Authors (5): Shehel Yoosuf, Temoor Ali, Ahmed Lekssays, Mashael AlSabah, Issa Khalil

In this work, we present a series of structure transformation attacks on LLM alignment, where we encode natural language intent using diverse syntax spaces, ranging from simple structure formats and basic query languages (e.g., SQL) to new novel spaces and syntaxes created entirely by LLMs. Our extensive evaluation shows that our simplest attacks can achieve close to a 90% success rate, even on strict LLMs (such as Claude 3.5 Sonnet) using SOTA alignment mechanisms. We improve the attack performance further by using an adaptive scheme that combines structure transformations along with existing content transformations, resulting in over 96% ASR with 0% refusals. To generalize our attacks, we explore numerous structure formats, including syntaxes purely generated by LLMs. Our results indicate that such novel syntaxes are easy to generate and result in a high ASR, suggesting that defending against our attacks is not a straightforward process. Finally, we develop a benchmark and evaluate existing safety-alignment defenses against it, showing that most of them fail with 100% ASR. Our results show that existing safety alignment mostly relies on token-level patterns without recognizing harmful concepts, highlighting and motivating the need for serious research efforts in this direction. As a case study, we demonstrate how attackers can use our attack to easily generate a sample malware and a corpus of fraudulent SMS messages, which perform well in bypassing detection.

在这项工作中,我们展示了一系列对LLM校正的结构转型攻击,我们用不同的语法空间对自然语言意图进行编码,从简单的结构格式和基本查询语言(例如SQL)到完全由LLMs创造的新新空间和语法。我们的广泛评估表明,我们最简单的攻击可以达到近90%的成功率,即使是在严格的LLMs(例如Claude 3.5 Sonnet)上,我们使用SOTA校正机制。我们利用一个适应性计划,将结构转型与现有内容转换相结合,从而进一步改进攻击性能。结果显示,96%以上的ASR与0%的拒绝率相结合。为了概括我们的攻击,我们探索了多种结构格式,包括纯粹由LMS创造的语法。我们的结果表明,这种新的语法很容易产生并导致高ASR率的成功率,这表明,捍卫我们的攻击并不是一个简单的过程。最后,我们制定了一个基准,并评估了现有的安全比对它的防御,表明它们大多与100%的ASR不合格。我们的结果表明,现有的安全比对攻击的依赖主要是依靠象征性的打击模式,而我们没有认识到危险的研究的反射法。


Article 64

Title@2025-07-03 (4): Padé Approximant Neural Networks for Enhanced Electric Motor Fault Diagnosis Using Vibration and Acoustic Data

Title: Padé Approximant Neural Networks for Enhanced Electric Motor Fault Diagnosis Using Vibration and Acoustic Data Padé Ungefähre neurale Netzwerke für verbesserte elektrische Motorfehlerdiagnose mit Vibration und akustischen Daten 利用振动和声频数据增强电动机动车故障诊断 2507.02599v1

Authors (2): Sertac Kilickaya, Levent Eren

Purpose: The primary aim of this study is to enhance fault diagnosis in induction machines by leveraging the Pad'e Approximant Neuron (PAON) model. While accelerometers and microphones are standard in motor condition monitoring, deep learning models with nonlinear neuron architectures offer promising improvements in diagnostic performance. This research addresses the question: Can Pad'e Approximant Neural Networks (Pad'eNets) outperform conventional Convolutional Neural Networks (CNNs) and Self-Organized Operational Neural Networks (Self-ONNs) in diagnosing electrical and mechanical faults using vibration and acoustic data? Methods: We evaluate and compare the diagnostic capabilities of three deep learning architectures: one-dimensional CNNs, Self-ONNs, and Pad'eNets. These models are tested on the University of Ottawa’s publicly available constant-speed induction motor datasets, which include both vibration and acoustic sensor data. The Pad'eNet model is designed to introduce enhanced nonlinearity and is compatible with unbounded activation functions such as Leaky ReLU. Results and Conclusion: Pad'eNets consistently outperformed the baseline models, achieving diagnostic accuracies of 99.96%, 98.26%, 97.61%, and 98.33% for accelerometers 1, 2, 3, and the acoustic sensor, respectively. The enhanced nonlinearity of Pad'eNets, together with their compatibility with unbounded activation functions, significantly improves fault diagnosis performance in induction motor condition monitoring.

目的 : 本研究的主要目的是通过使用 Pad'e Approximant Neuron (PaON) 模型,加强感应机器的缺陷诊断。 虽然加速计和麦克风是运动状态监测的标准, 与非线性神经结构的深学习模型为诊断性能提供了有希望的改进。 本研究针对的问题是: Can Pad'e Approximant NeuralNets(Pad'eNets)优于常规的 Convolutional Neural 网络(CNNs)和自机能操作神经网络(Of-ONNs), 使用振动和声学数据对电动和机械故障进行诊断。 方法: 我们评估和比较了三种深度学习结构的诊断能力: 一维CNN、 自 ONNs 和 Pad\ eNets。 这些模型在渥太华大学公开提供的常速感应感应电动电动电动电动电动电动机数据集中测试, 。 Pad\eNet 模型旨在强化非线性能和非线性能诊断功能一起兼容, 。


Article 65

Title@2025-07-03 (4): SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks

Title: SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks SURE-VQA: Systematisches Verständnis der Robustheitsbewertung in medizinischen VQA-Aufgaben SNU-VQA: 系统了解医疗VQA任务中的强力评价 2411.19688v3

Authors (7): Kim-Celine Kahl, Selen Erkan, Jeremias Traub, Carsten T. Lüth, Klaus Maier-Hein, Lena Maier-Hein, Paul F. Jaeger

Vision-Language Models (VLMs) have great potential in medical tasks, like Visual Question Answering (VQA), where they could act as interactive assistants for both patients and clinicians. Yet their robustness to distribution shifts on unseen data remains a key concern for safe deployment. Evaluating such robustness requires a controlled experimental setup that allows for systematic insights into the model’s behavior. However, we demonstrate that current setups fail to offer sufficiently thorough evaluations. To address this gap, we introduce a novel framework, called SURE-VQA, centered around three key requirements to overcome current pitfalls and systematically analyze VLM robustness: 1) Since robustness on synthetic shifts does not necessarily translate to real-world shifts, it should be measured on real-world shifts that are inherent to the VQA data; 2) Traditional token-matching metrics often fail to capture underlying semantics, necessitating the use of large language models (LLMs) for more accurate semantic evaluation; 3) Model performance often lacks interpretability due to missing sanity baselines, thus meaningful baselines should be reported that allow assessing the multimodal impact on the VLM. To demonstrate the relevance of this framework, we conduct a study on the robustness of various Fine-Tuning (FT) methods across three medical datasets with four types of distribution shifts. Our study highlights key insights into robustness: 1) No FT method consistently outperforms others in robustness, and 2) robustness trends are more stable across FT methods than across distribution shifts. Additionally, we find that simple sanity baselines that do not use the image data can perform surprisingly well and confirm LoRA as the best-performing FT method on in-distribution data. Code is provided at https://github.com/IML-DKFZ/sure-vqa.

视觉语言模型(VLMS)在医疗任务方面具有巨大的潜力,比如视觉问答(VQA),他们可以作为病人和临床医生的互动助手。然而,对传播隐蔽数据的变化的稳健性仍然是安全部署的一个关键问题。评估这种稳健性需要一种有控制的实验设置,以便能够系统了解模型的行为。然而,我们证明,目前的设置未能提供足够彻底的评价。为了弥补这一差距,我们引入了一个新颖的框架,称为SUIS-VQA,围绕三个关键要求,以克服当前陷阱和系统分析VLM的稳健性:1由于合成转变的稳健性并不必然转化为真实世界的变化,因此Z应该用VQAAA数据固有的真实性变化来衡量。 2 传统象征性匹配度指标往往无法捕捉基本的语义学,因此需要使用大语言模型(LMS)来进行更准确的语义评估;3 模型性变化往往无法确认我们缺乏理智的基线,因此有意义的基线应该报告,因此,能够评估合成转变性趋势对动态流的高度性影响。


Article 66

Title@2025-07-03 (4): Revisiting Active Learning under (Human) Label Variation

Title: Revisiting Active Learning under (Human) Label Variation Aktives Lernen unter (menschlichen) Label-Varianten 在(人)标签标签变换下重新审查积极学习 2507.02593v1

Authors (6): Cornelia Gruber, Helen Alber, Bernd Bischl, Göran Kauermann, Barbara Plank, Matthias Aßenmacher

Access to high-quality labeled data remains a limiting factor in applied supervised learning. While label variation (LV), i.e., differing labels for the same instance, is common, especially in natural language processing, annotation frameworks often still rest on the assumption of a single ground truth. This overlooks human label variation (HLV), the occurrence of plausible differences in annotations, as an informative signal. Similarly, active learning (AL), a popular approach to optimizing the use of limited annotation budgets in training ML models, often relies on at least one of several simplifying assumptions, which rarely hold in practice when acknowledging HLV. In this paper, we examine foundational assumptions about truth and label nature, highlighting the need to decompose observed LV into signal (e.g., HLV) and noise (e.g., annotation error). We survey how the AL and (H)LV communities have addressed – or neglected – these distinctions and propose a conceptual framework for incorporating HLV throughout the AL loop, including instance selection, annotator choice, and label representation. We further discuss the integration of large language models (LLM) as annotators. Our work aims to lay a conceptual foundation for HLV-aware active learning, better reflecting the complexities of real-world annotation.

虽然标签差异(LV),即同一情况的不同标签是常见的,但在自然语言处理中,批注框架往往仍然以单一地面真理的假设为基础。这忽略了人类标签差异(HLV),说明中出现可信的差异,作为信息信号。同样,积极学习(AL),是一种在培训ML模型中优化使用有限注解预算的流行方法,通常依赖至少几个简化假设中的一个,在承认HLV时,这些假设在实践中很少存在。我们在本文件中,审查关于真相和标签性质的基本假设,强调将观察到的LV分解成信号(如HLV)和噪音(如注解错误)的必要性。我们调查AL和(H)LV社区如何处理这些区别,或忽略了这些区别,并提议一个概念框架,将HLV纳入整个LV循环,包括实例选择、说明选择和标签代表。我们进一步讨论了将大型语言模型(LLLMM)整合成一个更能反映全球概念复杂性的大型学习基础。我们的工作的目的是为HLV奠定一个更好的学习基础。


Article 67

Title@2025-07-03 (4): Scalable Interconnect Learning in Boolean Networks

Title: Scalable Interconnect Learning in Boolean Networks Skalierbares Interconnect-Lernen in Booleschen Netzwerken 布尔网络中的可缩放互连学习 2507.02585v1

Authors (3): Fabian Kresse, Emily Yu, Christoph H. Lampert

Learned Differentiable Boolean Logic Networks (DBNs) already deliver efficient inference on resource-constrained hardware. We extend them with a trainable, differentiable interconnect whose parameter count remains constant as input width grows, allowing DBNs to scale to far wider layers than earlier learnable-interconnect designs while preserving their advantageous accuracy. To further reduce model size, we propose two complementary pruning stages: an SAT-based logic equivalence pass that removes redundant gates without affecting performance, and a similarity-based, data-driven pass that outperforms a magnitude-style greedy baseline and offers a superior compression-accuracy trade-off.

不同的布尔逻辑网络(DBNs)已经对受资源限制的硬件做出了有效的推论。 我们扩展它们, 并配有可训练的、可区分的互连, 其参数计数随着输入宽度的增加而保持不变, 允许DBNs在保持其有利准确性的同时, 将范围扩大到比先前的可学习互连设计更广得多的层次。 为了进一步缩小模型大小, 我们提议两个互补的运行阶段: 基于 SAT 的逻辑等同通道, 在不影响性能的情况下清除冗余的门, 以及类似基于数据的、以数据驱动的通关口, 该通关口优于规模式的贪婪基线, 并提供更高级的压缩- 准确性交易 。


Article 68

Title@2025-07-03 (4): SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings

Title: SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings SoccerDiffusion: Auf dem Weg zum Lernen von End-to-End Humanoid Robot Soccer aus Gameplay-Aufnahmen 足球传播:从游戏游戏录音录制中学习最终到最终人类的机器人足球 2504.20808v2

Authors (5): Florian Vahl, Jörn Griepenburg, Jan Gutsche, Jasper Güldenstein, Jianwei Zhang

This paper introduces SoccerDiffusion, a transformer-based diffusion model designed to learn end-to-end control policies for humanoid robot soccer directly from real-world gameplay recordings. Using data collected from RoboCup competitions, the model predicts joint command trajectories from multi-modal sensor inputs, including vision, proprioception, and game state. We employ a distillation technique to enable real-time inference on embedded platforms that reduces the multi-step diffusion process to a single step. Our results demonstrate the model’s ability to replicate complex motion behaviors such as walking, kicking, and fall recovery both in simulation and on physical robots. Although high-level tactical behavior remains limited, this work provides a robust foundation for subsequent reinforcement learning or preference optimization methods. We release the dataset, pretrained models, and code under: https://bit-bots.github.io/SoccerDiffusion

本文介绍一个基于变压器的传播模型 SocerDifulation, 这是一种基于变压器的传播模型, 旨在直接从真实世界游戏记录中学习人类机器人足球的端到端控制政策。 模型使用从机器人比赛中收集的数据, 预测了来自多式传感器输入( 包括视觉、 自行感知和游戏状态) 的联合指挥轨迹。 我们使用蒸馏技术, 在嵌入平台上实时推断, 将多步扩散进程降低到一个步骤 。 我们的结果表明该模型有能力在模拟和物理机器人中复制复杂的动作行为, 如行走、 踢踢脚和摔倒。 尽管高级战术行为仍然有限, 这项工作为随后的强化学习或偏好优化方法提供了坚实的基础 。 我们发布数据集、 预先训练的模型和代码 : https://bit-bots.github.io/socerDifulation


Article 69

Title@2025-07-03 (4): Towards an Explainable Comparison and Alignment of Feature Embeddings

Title: Towards an Explainable Comparison and Alignment of Feature Embeddings Auf dem Weg zu einem erklärbaren Vergleich und der Ausrichtung von Feature-Embeddings 争取实现可解释的比较和地貌隐含物的一致 2506.06231v2

Authors (3): Mohammad Jalali, Bahar Dibaei Nia, Farzan Farnia

While several feature embedding models have been developed in the literature, comparisons of these embeddings have largely focused on their numerical performance in classification-related downstream applications. However, an interpretable comparison of different embeddings requires identifying and analyzing mismatches between sample groups clustered within the embedding spaces. In this work, we propose the \emph{Spectral Pairwise Embedding Comparison (SPEC)} framework to compare embeddings and identify their differences in clustering a reference dataset. Our approach examines the kernel matrices derived from two embeddings and leverages the eigendecomposition of the difference kernel matrix to detect sample clusters that are captured differently by the two embeddings. We present a scalable implementation of this kernel-based approach, with computational complexity that grows linearly with the sample size. Furthermore, we introduce an optimization problem using this framework to align two embeddings, ensuring that clusters identified in one embedding are also captured in the other model. We provide numerical results demonstrating the SPEC’s application to compare and align embeddings on large-scale datasets such as ImageNet and MS-COCO. The project page is available at https://mjalali.github.io/SPEC/.

虽然文献中已开发了若干嵌入模型,但这些嵌入模型的比较主要侧重于其在分类相关下游应用中的数值性能。然而,要对不同嵌入进行可解释的比较,就需要查明和分析嵌入空间内各抽样组群之间的不匹配情况。在这项工作中,我们提议采用计算复杂性随着样本大小而直线增长的计算方法。此外,我们提出一个优化问题,利用这个框架来比较嵌入数据,确保从两个嵌入中找出的集群也被纳入另一个模型。我们提供了数字结果,表明SEC在比较和校准大型数据组群群群中的应用情况,例如图像/网络和MS-CO。在 MAGL/CO 上可以找到的 MAGALI/CO 。


Article 70

Title@2025-07-03 (4): Transformers Don’t Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability

Title: Transformers Don’t Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability Transformer benötigen keine Layernorm zur Inferenzzeit: Skalierung der Layernorm Entfernung zu GPT-2 XL und die Implikationen für mechanistische Interpretierbarkeit 突变器不需要在推断时间使用层内线: 将层内线缩放去除到 GPT-2 XL 和对机械可解释性的影响 2507.02559v1

Authors (5): Luca Baroni, Galvin Khara, Joachim Schaeffer, Marat Subkhankulov, Stefan Heimersheim

Layer-wise normalization (LN) is an essential component of virtually all transformer-based large language models. While its effects on training stability are well documented, its role at inference time is poorly understood. Additionally, LN layers hinder mechanistic interpretability by introducing additional nonlinearities and increasing the interconnectedness of individual model components. Here, we show that all LN layers can be removed from every GPT-2 model with only a small increase in validation loss (e.g. +0.03 cross-entropy loss for GPT-2 XL). Thus, LN cannot play a substantial role in language modeling. We find that the amount of fine-tuning data needed for LN removal grows sublinearly with model parameters, suggesting scaling to larger models is feasible. We release a suite of LN-free GPT-2 models on Hugging Face. Furthermore, we test interpretability techniques on LN-free models. Direct logit attribution now gives the exact direct effect of individual components, while the accuracy of attribution patching does not significantly improve. We also confirm that GPT-2’s “confidence neurons” are inactive in the LN-free models. Our work clarifies the role of LN layers in language modeling, showing that GPT-2-class models can function without LN layers. We hope that our LN-free analogs of the GPT-2 family of models will enable more precise interpretability research and improve our understanding of language models.

层与层的正常化( LN) 是几乎所有基于变压器的大型语言模型的基本组成部分。 虽然它对于培训稳定性的影响都有详细记载, 但它在推论时间的作用却不易理解。 此外, LN 层通过引入额外的非线性和增加单个模型组件的相互关联性来阻碍机械解释性。 在这里, 我们显示, 所有LN 层都可以从每个GPT-2模型中去除, 仅小幅增加校准损失( 例如, GPT-2 XL ) 。 因此, LN 无法在语言建模中发挥实质性作用。 我们发现, LN 去除所需的微调数据数量以模型参数增长为亚线性, 这表明向更大的模型扩展是可行的。 我们在Hugging Face上推出一套无LN- PT-2模型的可解释性技术。 我们测试无LN- 2模型的可解释性技术。 直接对单个构件的准确性影响, 而分解的准确性拼写性并不显著改善语言模型。 我们还确认, GPT-2 N 的模型“ 神经- N N ” 和 LPT 类模型的不动性模型在L- 级解释性模型中不动。


Article 71

Title@2025-07-03 (4): Fading memory and the convolution theorem

Title: Fading memory and the convolution theorem Das verblassende Gedächtnis und das Konvolutionstheorem 过时的记忆和变幻的定理 2408.07386v3

Authors (2): Juan-Pablo Ortega, Florian Rossmannek

Several topological and analytical notions of continuity and fading memory for causal and time-invariant filters are introduced, and the relations between them are analyzed. A significant generalization of the convolution theorem that establishes the equivalence between the fading memory property and the availability of convolution representations of linear filters is proved. This result extends a previous similar characterization to a complete array of weighted norms in the definition of the fading memory property. Additionally, the main theorem shows that the availability of convolution representations can be characterized, at least when the codomain is finite-dimensional, not only by the fading memory property but also by the reunion of two purely topological notions that are called minimal continuity and minimal fading memory property. Finally, when the input space and the codomain of a linear functional are Hilbert spaces, it is shown that minimal continuity and the minimal fading memory property guarantee the existence of interesting embeddings of the associated reproducing kernel Hilbert spaces.

对因果和时间变化过滤器的连续性和消退记忆的数个表层和分析概念进行了分析,分析了它们之间的关系。一个显著的演进理论的概括化,确定了消退的内存属性与线性过滤器的演化表达的可用性之间的等同性。这一结果将以前的类似特征扩展为在消退的内存属性定义中一系列完整的加权规范。此外,主要理论表明,至少当共存属性为有限维度时,可以说明共存表示的可用性,不仅包括消退的内存属性,而且还包括两个纯粹的表面学概念的结合,即最小的连续性和最小的消退的内存属性。最后,当输入空间和线性功能的共性为Hilbert空间时,可以证明最小的连续性和最小的消减内存属性保证了相关再生的内核的Hilbert空间存在有趣的内嵌。


Article 72

Title@2025-07-03 (4): Adaptive Probabilistic ODE Solvers Without Adaptive Memory Requirements

Title: Adaptive Probabilistic ODE Solvers Without Adaptive Memory Requirements Adaptive probabilistische ODE-Lösemittel ohne adaptive Speicheranforderungen 无适应性内存要求的适应性概率性ODE溶剂 2410.10530v2

Authors (1): Nicholas Krämer

Despite substantial progress in recent years, probabilistic solvers with adaptive step sizes can still not solve memory-demanding differential equations – unless we care only about a single point in time (which is far too restrictive; we want the whole time series). Counterintuitively, the culprit is the adaptivity itself: Its unpredictable memory demands easily exceed our machine’s capabilities, making our simulations fail unexpectedly and without warning. Still, dropping adaptivity would abandon years of progress, which can’t be the answer. In this work, we solve this conundrum. We develop an adaptive probabilistic solver with fixed memory demands building on recent developments in robust state estimation. Switching to our method (i) eliminates memory issues for long time series, (ii) accelerates simulations by orders of magnitude through unlocking just-in-time compilation, and (iii) makes adaptive probabilistic solvers compatible with scientific computing in JAX.

尽管近些年来取得了巨大进展,但具有适应性步数大小的概率求解者仍然无法解决记忆-需求差异方程式问题 – – 除非我们只关心一个时间点(这个时间点太过严格;我们需要整个时间序列 ) 。 反直觉说,罪魁祸首是适应性本身:其不可预测的记忆要求很容易超过我们的机器的能力,使我们的模拟出乎意料地失败。然而,降低适应性会放弃多年的进步,这不可能是答案。在这项工作中,我们解决了这一难题。我们开发了一个适应性概率求解器,其固定的记忆需求建立在最近动态的稳健状态估计上。我们的方法(一) 切换到(一) 消除长期记忆问题, (二) 通过及时编集加速规模的模拟, (三) 使适应性概率解密与JAX的科学计算兼容。


Article 73

Title@2025-07-03 (4): Flow Matching on Lie Groups

Title: Flow Matching on Lie Groups Flow Passend zu Lie-Gruppen 与 Lis Group 匹配 2504.00494v2

Authors (2): Finn M. Sherry, Bart M. N. Smets

Flow Matching (FM) is a recent generative modelling technique: we aim to learn how to sample from distribution $\mathfrak{X}_1$ by flowing samples from some distribution $\mathfrak{X}_0$ that is easy to sample from. The key trick is that this flow field can be trained while conditioning on the end point in $\mathfrak{X}_1$: given an end point, simply move along a straight line segment to the end point (Lipman et al. 2022). However, straight line segments are only well-defined on Euclidean space. Consequently, Chen and Lipman (2023) generalised the method to FM on Riemannian manifolds, replacing line segments with geodesics or their spectral approximations. We take an alternative point of view: we generalise to FM on Lie groups by instead substituting exponential curves for line segments. This leads to a simple, intrinsic, and fast implementation for many matrix Lie groups, since the required Lie group operations (products, inverses, exponentials, logarithms) are simply given by the corresponding matrix operations. FM on Lie groups could then be used for generative modelling with data consisting of sets of features (in $\mathbb{R}^n$) and poses (in some Lie group), e.g. the latent codes of Equivariant Neural Fields (Wessels et al. 2025).

滚动匹配( FM) 是最近的一项基因建模技术 : 我们的目标是学习如何通过从某些分销中流出 $\ mathfrak{X1$的样本从分布中采样 $\ mathfrak{X0$, 这很容易取样。 关键诀窍是, 这个流场可以接受培训, 同时在端点上调整 $\ mathfrak{X1$: 给定一个终点, 只需沿着直线段移动到终点( Lipman等人 等人 2022) 。 然而, 直线线段只在 Euclidean 空间上定义得清楚。 因此, Chen 和 Lipman (2023年) 将方法推广到Riemannian 平台上的调频段, 用地标或其光谱近似来替换线段的端点 。 我们从另一个观点是: 我们将流线组的调频调, 而不是替换线段的指数曲线曲线曲线曲线曲线 。 这导致许多矩阵 Lie Lie 组的快速实施简单、 、 和快速执行, 因为所需的利组的操作( 产品、 指数、 指数、 等值、 和正值) 的底值的基底值的模型组成。


Article 74

Title@2025-07-03 (4): Interleaved Gibbs Diffusion: Generating Discrete-Continuous Data with Implicit Constraints

Title: Interleaved Gibbs Diffusion: Generating Discrete-Continuous Data with Implicit Constraints Interleaved Gibbs Diffusion: Diskrete-kontinuierliche Daten mit impliziten Einschränkungen generieren Gibbs Interlebert Gibbs 扩散:产生具有隐含限制的分散持续数据 2502.13450v2

Authors (5): Gautham Govind Anil, Sachin Yadav, Dheeraj Nagaraj, Karthikeyan Shanmugam, Prateek Jain

We introduce Interleaved Gibbs Diffusion (IGD), a novel generative modeling framework for discrete-continuous data, focusing on problems with important, implicit and unspecified constraints in the data. Most prior works on discrete and discrete-continuous diffusion assume a factorized denoising distribution, which can hinder the modeling of strong dependencies between random variables in such problems. We empirically demonstrate a significant improvement in 3-SAT performance out of the box by switching to a Gibbs-sampling style discrete diffusion model which does not assume factorizability. Motivated by this, we introduce IGD which generalizes discrete time Gibbs sampling type Markov chain for the case of discrete-continuous generation. IGD allows for seamless integration between discrete and continuous denoisers while theoretically guaranteeing exact reversal of a suitable forward process. Further, it provides flexibility in the choice of denoisers, allows conditional generation via state-space doubling and inference time refinement. Empirical evaluations on three challenging generation tasks - molecule structures, layouts and tabular data - demonstrate state-of-the-art performance. Notably, IGD achieves state-of-the-art results without relying on domain-specific inductive biases like equivariant diffusion or auxiliary losses. We explore a wide range of modeling, and interleaving strategies along with hyperparameters in each of these problems.

我们引入了不同Gibbs Difulation(IGD),这是一个用于不同连续数据的新颖的基因模型框架,侧重于数据中重要、隐含和未具体说明的限制问题;大多数以前关于离散和离散连续扩散的工程假设了一种因子分解分布,这可能会妨碍在此类问题随机变量之间形成高度依赖性的模型;我们从经验上表明,3SAT的性能在盒外有了显著改进,转而采用不包含因素的Gibs抽样分散传播模式。为此,我们引入了IGD, 将离散、隐含和不连续的生成的离散采样类型Markov链的通用化时间采样。IGD允许离散和连续连续分散扩散的折射体之间无缝地进行分解分布,同时理论上保证了适当前向进程的准确逆转。此外,它提供了选择缩放量器的灵活性,允许通过州-空间的翻倍和误差时间调整来有条件的生成。对三种具有挑战性的生成任务――分子结构、布局和表式数据――展示了不同时间-展示了不同时间的GGPl度的模型和模型中每个磁场间变变换的模型的模型的模型的模型的模型和跨轨结果。


Article 75

Title@2025-07-03 (4): AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench

Title: AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench KI-Forscher für maschinelles Lernen: Suche, Exploration und Generalisierung in der MLE-Bench AI 机器学习研究代理机构:MLE-Bench的搜索、探索和普及 2507.02554v1

Authors (25): Edan Toledo, Karen Hambardzumyan, Martin Josifoski, Rishi Hazra, Nicolas Baldwin, Alexis Audran-Reiss, Michael Kuchnik, Despoina Magka, Minqi Jiang, Alisia Maria Lupidi, Andrei Lupu, Roberta Raileanu, Kelvin Niu, Tatiana Shavrina, Jean-Christophe Gagnon-Audet, Michael Shvartsman, Shagun Sodhani, Alexander H. Miller, Abhishek Charnalia, Derek Dunfield, Carole-Jean Wu, Pontus Stenetorp, Nicola Cancedda, Jakob Nicolaus Foerster, Yoram Bachrach

AI research agents are demonstrating great potential to accelerate scientific progress by automating the design, implementation, and training of machine learning models. We focus on methods for improving agents’ performance on MLE-bench, a challenging benchmark where agents compete in Kaggle competitions to solve real-world machine learning problems. We formalize AI research agents as search policies that navigate a space of candidate solutions, iteratively modifying them using operators. By designing and systematically varying different operator sets and search policies (Greedy, MCTS, Evolutionary), we show that their interplay is critical for achieving high performance. Our best pairing of search strategy and operator set achieves a state-of-the-art result on MLE-bench lite, increasing the success rate of achieving a Kaggle medal from 39.6% to 47.7%. Our investigation underscores the importance of jointly considering the search strategy, operator design, and evaluation methodology in advancing automated machine learning.

AI研究代理机构通过机械学习模型的设计、实施和培训自动化,展示了加快科学进步的巨大潜力。我们侧重于提高MLE-Bench代理机构业绩的方法,MLE-Bench是一个具有挑战性的基准,在这个基准中,代理机构在Kaggle竞赛中竞争解决现实世界机器学习问题。我们正式将AI研究代理机构作为探索政策的搜索政策,该政策将探索各种候选解决方案的空间,并使用操作者反复修改这些解决方案。我们设计并系统性地不同不同的操作者和搜索政策(Greedy、MCTS、Evolutionary),表明它们的相互作用对于取得高性能至关重要。我们最好的搜索战略和操作者配对组合在MLE-Bench利特上取得了最先进的成果,将获得Kagle牌的成功率从39.6%提高到47.7%。我们的调查强调了共同考虑搜索战略、操作者设计和评估方法对于推进自动机器学习的重要性。


Article 76

Title@2025-07-03 (4): Position: A Theory of Deep Learning Must Include Compositional Sparsity

Title: Position: A Theory of Deep Learning Must Include Compositional Sparsity Position: Eine Theorie des tiefen Lernens muss kompositorische Sparsamkeit einschließen 位置:深深学习理论必须包含构成平等 2507.02550v1

Authors (4): David A. Danhofer, Davide D’Ascenzo, Rafael Dubach, Tomaso Poggio

Overparametrized Deep Neural Networks (DNNs) have demonstrated remarkable success in a wide variety of domains too high-dimensional for classical shallow networks subject to the curse of dimensionality. However, open questions about fundamental principles, that govern the learning dynamics of DNNs, remain. In this position paper we argue that it is the ability of DNNs to exploit the compositionally sparse structure of the target function driving their success. As such, DNNs can leverage the property that most practically relevant functions can be composed from a small set of constituent functions, each of which relies only on a low-dimensional subset of all inputs. We show that this property is shared by all efficiently Turing-computable functions and is therefore highly likely present in all current learning problems. While some promising theoretical insights on questions concerned with approximation and generalization exist in the setting of compositionally sparse functions, several important questions on the learnability and optimization of DNNs remain. Completing the picture of the role of compositional sparsity in deep learning is essential to a comprehensive theory of artificial, and even general, intelligence.

然而,关于指导DNN的学习动态的基本原则的开放问题依然存在。在本立场文件中,我们认为,DNN能够利用目标功能的构成结构稀疏结构来推动其成功。因此,DNN能够利用最实际相关的功能可以由一小套组成功能构成的财产,每个功能仅依赖于所有投入的低维子集。我们表明,所有高效的图灵可转换功能都拥有这一属性,因此在目前的所有学习问题中都极有可能存在这种属性。虽然在构成稀疏功能的设置中存在一些关于近似和概括问题的有希望的理论见解,但关于DNNN的可学习性和优化的一些重要问题仍然存在。了解和优化成份性在深层次学习中的作用的描述对于人造甚至一般智慧的全面理论至关重要。


Article 77

Title@2025-07-03 (4): Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling

Title: Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling Offline-Verstärkung Lernen für das Lernen zum Versand für Job Shop Scheduling 离线强化学习,学习到派到招工商店日程安排的调度 2409.10589v4

Authors (3): Jesse van Remmerden, Zaharah Bukhsh, Yingqian Zhang

The Job Shop Scheduling Problem (JSSP) is a complex combinatorial optimization problem. While online Reinforcement Learning (RL) has shown promise by quickly finding acceptable solutions for JSSP, it faces key limitations: it requires extensive training interactions from scratch leading to sample inefficiency, cannot leverage existing high-quality solutions from traditional methods like Constraint Programming (CP), and require simulated environments to train in, which are impracticable to build for complex scheduling environments. We introduce Offline Learned Dispatching (Offline-LD), an offline reinforcement learning approach for JSSP, which addresses these limitations by learning from historical scheduling data. Our approach is motivated by scenarios where historical scheduling data and expert solutions are available or scenarios where online training of RL approaches with simulated environments is impracticable. Offline-LD introduces maskable variants of two Q-learning methods, namely, Maskable Quantile Regression DQN (mQRDQN) and discrete maskable Soft Actor-Critic (d-mSAC), that are able to learn from historical data, through Conservative Q-Learning (CQL). Moreover, we present a novel entropy bonus modification for d-mSAC, for maskable action spaces. Moreover, we introduce a novel reward normalization method for JSSP in an offline RL setting. Our experiments demonstrate that Offline-LD outperforms online RL on both generated and benchmark instances when trained on only 100 solutions generated by CP. Notably, introducing noise to the expert dataset yields comparable or superior results to using the expert dataset, with the same amount of instances, a promising finding for real-world applications, where data is inherently noisy and imperfect.

工作商店调度问题(JSSP)是一个复杂的组合优化问题。 在线强化学习(RL)通过快速为 JSSP 找到可接受的解决方案而显示出希望, 但它面临一些关键限制: 它需要从零到零的广泛培训互动, 导致抽样效率低下, 无法利用诸如 Constrain 编程(CP) 等传统方法的现有高质量解决方案, 需要模拟环境来培训, 无法为复杂的排程环境构建。 我们为 JSSP 引入了“ 脱线快递( 脱线- LD) ” , 这是一种脱线强化学习方法, 通过历史时间安排数据, 解决这些局限性。 我们的方法是: 具有可比较性时间安排的数据和专家解决方案, 具有可比较性的数据和专家, 用于当前 RR-LS 的 RR- Ralder 数据生成, 用于当前 R- R- R- R- R- R- Ral- Ral- seral 的实时数据生成。


Article 78

Title@2025-07-03 (4): Higher-Order Singular-Value Derivatives of Rectangular Real Matrices

Title: Higher-Order Singular-Value Derivatives of Rectangular Real Matrices Hohe Ordnung Singular-Value Derivate der rechteckigen Realmatrizen 矩形真实矩阵的高端圆形单值衍生物 2506.03764v3

Authors (3): Róisín Luo, James McDermott, Colm O’Riordan

We present a theoretical framework for deriving the general $n$-th order Fr'echet derivatives of singular values in real rectangular matrices, by leveraging reduced resolvent operators from Kato’s analytic perturbation theory for self-adjoint operators. Deriving closed-form expressions for higher-order derivatives of singular values is notoriously challenging through standard matrix-analysis techniques. To overcome this, we treat a real rectangular matrix as a compact operator on a finite-dimensional Hilbert space, and embed the rectangular matrix into a block self-adjoint operator so that non-symmetric perturbations are captured. Applying Kato’s asymptotic eigenvalue expansion to this construction, we obtain a general, closed-form expression for the infinitesimal $n$-th order spectral variations. Specializing to $n=2$ and deploying on a Kronecker-product representation with matrix convention yield the Hessian of a singular value, not found in literature. By bridging abstract operator-theoretic perturbation theory with matrices, our framework equips researchers with a practical toolkit for higher-order spectral sensitivity studies in random matrix applications (e.g., adversarial perturbation in deep learning).

我们提出了一个理论框架,用于利用卡托分析性扰动理论中较低的固态操作员,在真实矩形矩阵中得出普通值n-n-nord Fr'echet衍生物Fr'echet衍生物,为自联操作者利用卡托分析性扰动理论中减少的固态操作员。通过标准的矩阵分析技术,为超值较高序列衍生物的封闭式表达式提出了臭名昭著的挑战。要克服这一点,我们把真正的矩形矩阵视为一个在有限维的希尔伯特空间上的紧凑操作员,并将矩形矩阵矩阵嵌入一个块的自我联合操作器,以便捕捉到非对称扰动的扰动。将卡托的无对称振动性扰动值扩展应用应用应用到该构造中,我们获得了无限量值的封闭式表达式表达物。我们专门使用$=2美元,并在Kronecker-producal 上部署一个基质公约代表器,使Hesian具有在文献中找不到的单值。通过将抽象操作性操作者-神学透透透性透性过突扰扰,我们的框架在矩阵中使研究人员在高光学研究中安装中能够进行。


Article 79

Title@2025-07-03 (4): Assessing Quantum Advantage for Gaussian Process Regression

Title: Assessing Quantum Advantage for Gaussian Process Regression Bewertung des Quantenvorteils für Gaussian Process Regression 评估高山进程倒退的量度优势 2505.22502v2

Authors (3): Dominic Lowe, M. S. Kim, Roberto Bondesan

Gaussian Process Regression is a well-known machine learning technique for which several quantum algorithms have been proposed. We show here that in a wide range of scenarios these algorithms show no exponential speedup. We achieve this by rigorously proving that the condition number of a kernel matrix scales at least linearly with the matrix size under general assumptions on the data and kernel. We additionally prove that the sparsity and Frobenius norm of a kernel matrix scale linearly under similar assumptions. The implications for the quantum algorithms runtime are independent of the complexity of loading classical data on a quantum computer and also apply to dequantised algorithms. We supplement our theoretical analysis with numerical verification for popular kernels in machine learning.

Gausian 进程回归是一种众所周知的机器学习技术,已经为此提出了几种量子算法。 我们在这里显示,在多种假设中,这些算法没有显示指数加速。 我们通过严格证明内核矩阵尺度的条件数至少与数据和内核一般假设下的矩阵大小线性比较。 我们进一步证明,在类似假设下,线性内核矩阵尺度的宽度和Frobenius规范。 量子算法运行时间的影响与在量子计算机上装载古典数据的复杂性无关,也适用于分解算法。 我们用机器学习中流行的内核的数值来补充我们的理论分析。


Article 80

Title@2025-07-03 (4): Ordinality in Discrete-level Question Difficulty Estimation: Introducing Balanced DRPS and OrderedLogitNN

Title: Ordinality in Discrete-level Question Difficulty Estimation: Introducing Balanced DRPS and OrderedLogitNN Normalität in Diskrete-Level-Frage Schwierigkeitsgrad Schätzung: Einführung ausgewogene DRPS und bestelltLogitNN 不同层次问题难以估计的常态:采用平衡的DRPS和有秩序LogitNN 2507.00736v2

Authors (3): Arthur Thuy, Ekaterina Loginova, Dries F. Benoit

Recent years have seen growing interest in Question Difficulty Estimation (QDE) using natural language processing techniques. Question difficulty is often represented using discrete levels, framing the task as ordinal regression due to the inherent ordering from easiest to hardest. However, the literature has neglected the ordinal nature of the task, relying on classification or discretized regression models, with specialized ordinal regression methods remaining unexplored. Furthermore, evaluation metrics are tightly coupled to the modeling paradigm, hindering cross-study comparability. While some metrics fail to account for the ordinal structure of difficulty levels, none adequately address class imbalance, resulting in biased performance assessments. This study addresses these limitations by benchmarking three types of model outputs – discretized regression, classification, and ordinal regression – using the balanced Discrete Ranked Probability Score (DRPS), a novel metric that jointly captures ordinality and class imbalance. In addition to using popular ordinal regression methods, we propose OrderedLogitNN, extending the ordered logit model from econometrics to neural networks. We fine-tune BERT on the RACE++ and ARC datasets and find that OrderedLogitNN performs considerably better on complex tasks. The balanced DRPS offers a robust and fair evaluation metric for discrete-level QDE, providing a principled foundation for future research.

近些年来,人们对使用自然语言处理技术的难问题估计(QDE)的兴趣日益浓厚。问题困难往往使用离散的层次,将任务定为从最容易到最难的内在顺序的折叠式退缩。然而,文献忽视了任务的常规性质,依靠的是分类或离散的回归模型,而专门或非常规回归方法仍未得到探讨。此外,评价指标与模型模式紧密结合,妨碍了交叉研究的可比性。虽然有些指标没有说明困难水平的或非常规结构,但没有适当解决导致偏差的业绩评估的阶级不平衡问题。本项研究通过将三种模型产出 – – 离散回归、分类和折叠式退缩 – 基准化为基准来应对这些局限性,使用平衡的分级概率回归模型(DRPS),这是一个新颖的衡量标准,共同捕捉或偏差性和阶级不平衡。除了使用大众或非常规回归方法外,我们提议有秩序的LogiedLogitNNNN,将订购的对逻辑模型模型从生态测量网络扩大到神经测量网络,结果评估有偏颇。我们用精细的BERER 进行一个更稳的系统化的ADERDRADAR 进行一个更稳健的系统,并进行一个更稳妥的精确的系统基础。


Article 81

Title@2025-07-03 (4): A Square Peg in a Square Hole: Meta-Expert for Long-Tailed Semi-Supervised Learning

Title: A Square Peg in a Square Hole: Meta-Expert for Long-Tailed Semi-Supervised Learning Eine quadratische Peg in einem quadratischen Loch: Meta-Experte für langes semi-überwachtes Lernen 广场洞中的一个广场塞格:长期辍学的半模拟学习的元专家 2505.16341v2

Authors (2): Yaxin Hou, Yuheng Jia

This paper studies the long-tailed semi-supervised learning (LTSSL) with distribution mismatch, where the class distribution of the labeled training data follows a long-tailed distribution and mismatches with that of the unlabeled training data. Most existing methods introduce auxiliary classifiers (experts) to model various unlabeled data distributions and produce pseudo-labels, but the expertises of various experts are not fully utilized. We observe that different experts are good at predicting different intervals of samples, e.g., long-tailed expert is skilled in samples located in the head interval and uniform expert excels in samples located in the medium interval. Therefore, we propose a dynamic expert assignment module that can estimate the class membership (i.e., head, medium, or tail class) of samples, and dynamically assigns suitable expert to each sample based on the estimated membership to produce high-quality pseudo-label in the training phase and produce prediction in the testing phase. We also theoretically reveal that integrating different experts’ strengths will lead to a smaller generalization error bound. Moreover, we find that the deeper features are more biased toward the head class but with more discriminative ability, while the shallower features are less biased but also with less discriminative ability. We, therefore, propose a multi-depth feature fusion module to utilize different depth features to mitigate the model bias. Our method demonstrates its effectiveness through comprehensive experiments on the CIFAR-10-LT, STL-10-LT, and SVHN-LT datasets across various settings. The code is available at https://github.com/yaxinhou/Meta-Expert.

本文研究的是长尾半监督学习(LTSSL)与分布不匹配(LTSSL)的长尾半监督学习(LTSSL),在分布不匹配的情况下,标签培训数据的等级分布经过长尾分发,与未贴标签的培训数据不匹配。大多数现有方法都采用辅助分类人员(专家)来模拟各种未贴标签的数据分发和制作假标签,但各专家的专门知识没有得到充分利用。我们发现,不同的专家擅长预测不同样本的间隔期,例如,长尾专家精通位于头间隔的样品,而统一的专家则精通中间隔期的样品。因此,我们提出一个动态专家分配模块,可以估计各类样本的类别成员(即头部、中层或尾类)的分布和与未贴标签的培训数据不匹配。根据估计成员对每个样本进行模拟的适当专家进行模拟,以便在培训阶段产生高质量的假标签,并在测试阶段作出预测。我们从理论上发现,整合不同专家的优势将导致更小的一般错误。此外,我们发现,更深层的特征更偏向上偏向头层的样本,但更深层的S-LFLF 能力显示,我们更低的模型显示,低的深度的深度分析性模型是低的深度分析性模型显示,而甚深层的深度的深度的深度的深度的深度的深度,而低的深度的深度的深度的深度的深度的深度的深度的深度的深度的深度的深度,我们的深度的深度的深度,我们使用。


Article 82

Title@2025-07-03 (4): RetrySQL: text-to-SQL training with retry data for self-correcting query generation

Title: RetrySQL: text-to-SQL training with retry data for self-correcting query generation RetrySQL: Text-zu-SQL-Training mit Retry-Daten zur Selbstkorrektur der Abfragegenerierung 重新trySQL: 文本到 SQL 培训, 并重试数据, 用于自校查询生成 2507.02529v1

Authors (5): Alicja Rączkowska, Riccardo Belluzzo, Piotr Zieliński, Joanna Baran, Paweł Olszewski

The text-to-SQL task is an active challenge in Natural Language Processing. Many existing solutions focus on using black-box language models extended with specialized components within customized end-to-end text-to-SQL pipelines. While these solutions use both closed-source proprietary language models and coding-oriented open-source models, there is a lack of research regarding SQL-specific generative models. At the same time, recent advancements in self-correcting generation strategies show promise for improving the capabilities of existing architectures. The application of these concepts to the text-to-SQL task remains unexplored. In this paper, we introduce RetrySQL, a new approach to training text-to-SQL generation models. We prepare reasoning steps for reference SQL queries and then corrupt them to create retry data that contains both incorrect and corrected steps, divided with a special token. We continuously pre-train an open-source coding model with this data and demonstrate that retry steps yield an improvement of up to 4 percentage points in both overall and challenging execution accuracy metrics, compared to pre-training without retry data. Additionally, we confirm that supervised fine-tuning with LoRA is ineffective for learning from retry data and that full-parameter pre-training is a necessary requirement for that task. We showcase that the self-correcting behavior is learned by the model and the increase in downstream accuracy metrics is a result of this additional skill. Finally, we incorporate RetrySQL-trained models into the full text-to-SQL pipeline and showcase that they are competitive in terms of execution accuracy with proprietary models that contain orders of magnitude more parameters. RetrySQL demonstrates that self-correction can be learned in the text-to-SQL task and provides a novel way of improving generation accuracy for SQL-oriented language models.

文本到 SQL 任务在自然语言处理中是一项积极的挑战。 许多现有解决方案侧重于在定制的端到端文本到SQL 管道中使用黑箱语言模型,在专用的端到端文本到SQL 管道中扩展专门组件。 虽然这些解决方案使用封闭源自有文字语言模型和编码导向开放源模式,但缺乏关于SQL 特定基因化模型的研究。 同时,自更正生成战略的近期进展显示了提高现有结构能力的前景。 这些概念对文本到SQL 任务的应用仍未被解析。 在本文件中,我们引入了RetrySQL ,这是培训文本到SQL 生成模型模式的新方法。我们为引用SQL 的查询准备推理步骤,然后腐蚀它们创建包含不正确和校正步骤的重试数据。 我们不断将一个开源代码模型的预演化模型与这些数据的预演进, 并且显示,在总体和具有挑战性的执行精确度的版本中,RtryS-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-r-l-r-r-r-r-r-r-r-r-r-l-l-l-l-l-r-r-r-r-r-r-r-r-r-l-l-l-l-l-l-r-r-l-l-l-l-l-l-l-l-l-l-l-l-l-l-l-l-l-l-l-l-l-l


Article 83

Title@2025-07-03 (4): Deep learning four decades of human migration

Title: Deep learning four decades of human migration Tiefes Lernen von vier Jahrzehnten menschlicher Migration 四十年人类移徙的深入学习 2506.22821v2

Authors (2): Thomas Gaskin, Guy J. Abel

We present a novel and detailed dataset on origin-destination annual migration flows and stocks between 230 countries and regions, spanning the period from 1990 to the present. Our flow estimates are further disaggregated by country of birth, providing a comprehensive picture of migration over the last 35 years. The estimates are obtained by training a deep recurrent neural network to learn flow patterns from 18 covariates for all countries, including geographic, economic, cultural, societal, and political information. The recurrent architecture of the neural network means that the entire past can influence current migration patterns, allowing us to learn long-range temporal correlations. By training an ensemble of neural networks and additionally pushing uncertainty on the covariates through the trained network, we obtain confidence bounds for all our estimates, allowing researchers to pinpoint the geographic regions most in need of additional data collection. We validate our approach on various test sets of unseen data, demonstrating that it significantly outperforms traditional methods estimating five-year flows while delivering a significant increase in temporal resolution. The model is fully open source: all training data, neural network weights, and training code are made public alongside the migration estimates, providing a valuable resource for future studies of human migration.

我们从1990年至今对230个国家和地区的原籍国-目的地年度移徙流动和存量提出了一套新颖和详细的数据,涵盖期间为1990年至今,我们的流动估计数按出生国进一步分列,提供了过去35年来移徙的全面情况;通过培训一个深层次的经常性神经网络,学习所有国家18个共变体的流动模式,包括地理、经济、文化、社会和政治信息;神经网络的经常性结构意味着整个过去都能够影响目前的移徙模式,使我们能够学习长期的时间相关性;通过培训一批神经网络,并通过经过培训的网络进一步推移共变的不确定性,我们获得了我们所有估计数的信心,使研究人员能够确定最需要额外数据收集的地理区域;我们验证了我们对各种未见数据测试集的做法,表明它大大超越了估计五年流动的传统方法,同时大大提高了时间分辨率;模型是完全开放的来源:所有培训数据、神经网络重量和培训代码都与移徙估计数一起公开,为今后的人类移徙研究提供了宝贵的资源。


Article 84

Title@2025-07-03 (4): Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling

Title: Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling Selbstüberwachte Rahmenbedingungen für die Lautsprecherverifizierung durch Bootstrapped Positive Sampling 通过推动积极抽样,自我监督的演讲人核查框架 2501.17772v3

Authors (2): Theo Lepage, Reda Dehak

Recent developments in Self-Supervised Learning (SSL) have demonstrated significant potential for Speaker Verification (SV), but closing the performance gap with supervised systems remains an ongoing challenge. SSL frameworks rely on anchor-positive pairs, constructed from segments of the same audio utterance. Hence, positives have channel characteristics similar to those of their corresponding anchors, even with extensive data-augmentation. Therefore, this positive sampling strategy is a fundamental limitation as it encodes too much information regarding the recording source in the learned representations. This article introduces Self-Supervised Positive Sampling (SSPS), a bootstrapped technique for sampling appropriate and diverse positives in SSL frameworks for SV. SSPS samples positives close to their anchor in the representation space, assuming that these pseudo-positives belong to the same speaker identity but correspond to different recording conditions. This method consistently demonstrates improvements in SV performance on VoxCeleb benchmarks when applied to major SSL frameworks, including SimCLR, SwAV, VICReg, and DINO. Using SSPS, SimCLR and DINO achieve 2.57% and 2.53% EER on VoxCeleb1-O, respectively. SimCLR yields a 58% relative reduction in EER, getting comparable performance to DINO with a simpler training framework. Furthermore, SSPS lowers intra-class variance and reduces channel information in speaker representations while exhibiting greater robustness without data-augmentation.

自我监督学习(SSL)的近期发展表明,发言人核查(SV)具有巨大的潜力,但缩小监督系统的业绩差距仍然是一项持续的挑战。SSL框架依靠在同一音频表达区段内建造的固定正对对,因此,正对具有类似其相应锚的频道特征,即使有广泛的数据增强。因此,这一积极的抽样战略是一个根本性的局限性,因为它在学术陈述中将太多关于记录来源的信息编码为太多,因此,这篇文章引入了自我监督积极抽样技术(SSPS),这是SVSSSSSSSSSSSS框架内适当和不同阳性标本的取样技术。SSPS样本正对接近其在代表空间内锚的固定正对对对对对对,假定这些假正对属于同一发言者身份,但与不同的记录条件相对应。这种方法一贯表明,在应用主要SSLS框架时,包括SimCLR、SWAVA、VIVREg和DINONO,在SSPSSPS、SML和DINONO框架内较稳定度2.57%和可比较的 EERC框架内,在SER-RALA 上分别降低2.57%的SER-RA 和SER-RLISLILA 和SER-RA 的SISLA AL AL AL AL AL AL AL AL AL AL AL AL AL AL AL AL AL AL 。


Article 85

Title@2025-07-03 (4): Reconsidering the energy efficiency of spiking neural networks

Title: Reconsidering the energy efficiency of spiking neural networks Über die Energieeffizienz neuronaler Netze nachdenken 重新考虑突发神经网络的能源效率 2409.08290v2

Authors (3): Zhanglu Yan, Zhenyu Bai, Weng-Fai Wong

Spiking Neural Networks (SNNs) promise higher energy efficiency over conventional Quantized Artificial Neural Networks (QNNs) due to their event-driven, spike-based computation. However, prevailing energy evaluations often oversimplify, focusing on computational aspects while neglecting critical overheads like comprehensive data movement and memory access. Such simplifications can lead to misleading conclusions regarding the true energy benefits of SNNs. This paper presents a rigorous re-evaluation. We establish a fair baseline by mapping rate-encoded SNNs with $T$ timesteps to functionally equivalent QNNs with $\lceil \log_2(T+1) \rceil$ bits. This ensures both models have comparable representational capacities, as well has similar hardware requirement, enabling meaningful energy comparisons. We introduce a detailed analytical energy model encompassing core computation and data movement (sparse and dense activations, weights). Using this model, we systematically explore a wide parameter space, including intrinsic network characteristics ($T$, spike rate $s_r$, QNN sparsity $\gamma$, model size $N$, weight bit-level) and hardware characteristics (memory system and network-on-chip). Our analysis identifies specific operational regimes where SNNs genuinely offer superior energy efficiency. For example, under typical neuromorphic hardware conditions, SNNs with moderate time windows ($T \in [5,10]$) require an average spike rate ($s_r$) below 6.4% to outperform equivalent QNNs. These insights guide the design of genuinely energy-efficient neural network solutions.

Spik Spik Neal 网络(SNNS) 与常规的量化人工神经网络相比,能带来更高的能源效率。但是,当前的能源评估往往过于简单化,侧重于计算方面,而忽视了诸如全面数据移动和记忆存取等关键间接费用。这种简化可以导致对SNS的真正能源效益得出误导性的结论。本文件将提供一个严格的重新评估。我们通过绘制以速率编码的SNNS(以美元计)为中度基准,以美元为时间级,以等值的QNNNNS(以美元计)为等值,以美元为单位,以美元为单位,以美元为单位,以美元为单位,以美元为单位,以美元为单位,以美元为单位,以美元为单位,以美元为单位,用美元为单位,以美元为单位,以美元为单位,用美元为单位,用美元为单位,用美元为单位,以美元为单位,以美元为单位,以美元为单位,以美元为单位,以美元为单位,以内,以内,以内基数,以内,以内,以内,以内,以内,以内,以内,以内,以内价,以内,以内,以内,以内,以内,以内,以内,以内,以内,以内,以内,以内,以内,以内,以内,以内,以内,以内,以内,以内,以内,以内,以内。


Article 86

Title@2025-07-03 (4): Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs

Title: Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs Sprachenübergreifendes Reisen: Benchmarking Cross-Lingual Consistency in multimodalen LLMs 跨语言旅行:多模式LLM中跨语言一致基准 2505.15075v2

Authors (5): Hao Wang, Pinzhi Huang, Jihan Yang, Saining Xie, Daisuke Kawahara

The rapid evolution of multimodal large language models (MLLMs) has significantly enhanced their real-world applications. However, achieving consistent performance across languages, especially when integrating cultural knowledge, remains a significant challenge. To better assess this issue, we introduce two new benchmarks: KnowRecall and VisRecall, which evaluate cross-lingual consistency in MLLMs. KnowRecall is a visual question answering benchmark designed to measure factual knowledge consistency in 15 languages, focusing on cultural and historical questions about global landmarks. VisRecall assesses visual memory consistency by asking models to describe landmark appearances in 9 languages without access to images. Experimental results reveal that state-of-the-art MLLMs, including proprietary ones, still struggle to achieve cross-lingual consistency. This underscores the need for more robust approaches that produce truly multilingual and culturally aware models.

多式联运大型语言模式(MLLMs)的迅速演变大大加强了其实际应用,然而,在各种语言之间取得一致的成绩,特别是在融合文化知识方面,仍是一项重大挑战。为了更好地评估这一问题,我们引入了两个新基准:知识回召和Vis回召,评估MLLMs的跨语言一致性。Know Recreme是一个直观问题,用来衡量15种语言的实际知识一致性,重点是有关全球里程碑的文化问题和历史问题。VisRecall 评估了视觉记忆的一致性,要求模型描述9种语言的标志性外观,但没有图像。实验结果显示,最先进的MLLLMs,包括专有的MLLMs,仍然难以实现跨语言的一致性。这突出表明,需要采取更强有力的方法,产生真正的多语言和文化意识模式。


Article 87

Title@2025-07-03 (4): Self-Guided Process Reward Optimization with Redefined Step-wise Advantage for Process Reinforcement Learning

Title: Self-Guided Process Reward Optimization with Redefined Step-wise Advantage for Process Reinforcement Learning Selbstgesteuerte Prozess-Reward-Optimierung mit neu definiertem Schrittweiser Vorteil für Prozess-Verstärkungs-Lernen 自指导流程向上优化,具有重新定义的逐步改进的流程强化学习优势 2507.01551v2

Authors (8): Wu Fei, Hao Kong, Shuxian Liang, Yang Lin, Yibo Yang, Jing Tang, Lei Chen, Xiansheng Hua

Process Reinforcement Learning~(PRL) has demonstrated considerable potential in enhancing the reasoning capabilities of Large Language Models~(LLMs). However, introducing additional process reward models incurs substantial computational overhead, and there is no unified theoretical framework for process-level advantage estimation. To bridge this gap, we propose \textbf{S}elf-Guided \textbf{P}rocess \textbf{R}eward \textbf{O}ptimization~(\textbf{SPRO}), a novel framework that enables process-aware RL through two key innovations: (1) we first theoretically demonstrate that process rewards can be derived intrinsically from the policy model itself, and (2) we introduce well-defined cumulative process rewards and \textbf{M}asked \textbf{S}tep \textbf{A}dvantage (\textbf{MSA}), which facilitates rigorous step-wise action advantage estimation within shared-prompt sampling groups. Our experimental results demonstrate that SPRO outperforms vaniila GRPO with 3.4x higher training efficiency and a 17.5\% test accuracy improvement. Furthermore, SPRO maintains a stable and elevated policy entropy throughout training while reducing the average response length by approximately $1/3$, evidencing sufficient exploration and prevention of reward hacking. Notably, SPRO incurs no additional computational overhead compared to outcome-supervised RL methods such as GRPO, which benefit industrial implementation.

(PRL) 在提高大语言模型(LLM)的推理能力方面具有相当大的潜力。然而,引入额外的流程奖励模式会产生大量的计算间接费用,而且没有统一的流程一级优势估计理论框架。为了缩小这一差距,我们提议了\ textbf{S}S}elf-Guided\ textbff{P}spourse\ textbf{R}ward\ textbf{R}ward\ textbf{O}pimization~(\ textbf{Spropro}),这是一个通过两项关键创新使进程能够达到REL(L)的新框架。 但是,引入额外的流程奖励模式,我们首先从理论上表明,流程奖励可以从政策模式本身中产生内在的衍生收益,而且没有为流程累积收益提供统一的理论框架框架。 我们提出了定义的累积累积过程奖赏,并提出了明确的累积累积过程奖赏和确定\ textbf{S_BAR_BAR__BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BBAR_要通过持续上多少BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BARBAR_BAR_BAR_BAR_要持续上多少BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BAR_BBAR_BAR_


Article 88

Title@2025-07-03 (4): TFOC-Net: A Short-time Fourier Transform-based Deep Learning Approach for Enhancing Cross-Subject Motor Imagery Classification

Title: TFOC-Net: A Short-time Fourier Transform-based Deep Learning Approach for Enhancing Cross-Subject Motor Imagery Classification TFOC-Net: Ein kurzzeitiger Fourier-Transform-basierter Deep-Learning-Ansatz zur Verbesserung der Klassifizierung bereichsübergreifender Motorbilder TFOC-Net:一个短时间短时间的四倍变换式深学习方法,以加强跨主题机动机图象分类 2507.02510v1

Authors (4): Ahmed G. Habashi, Ahmed M. Azab, Seif Eldawlatly, Gamal M. Aly

Cross-subject motor imagery (CS-MI) classification in brain-computer interfaces (BCIs) is a challenging task due to the significant variability in Electroencephalography (EEG) patterns across different individuals. This variability often results in lower classification accuracy compared to subject-specific models, presenting a major barrier to developing calibration-free BCIs suitable for real-world applications. In this paper, we introduce a novel approach that significantly enhances cross-subject MI classification performance through optimized preprocessing and deep learning techniques. Our approach involves direct classification of Short-Time Fourier Transform (STFT)-transformed EEG data, optimized STFT parameters, and a balanced batching strategy during training of a Convolutional Neural Network (CNN). This approach is uniquely validated across four different datasets, including three widely-used benchmark datasets leading to substantial improvements in cross-subject classification, achieving 67.60% on the BCI Competition IV Dataset 1 (IV-1), 65.96% on Dataset 2A (IV-2A), and 80.22% on Dataset 2B (IV-2B), outperforming state-of-the-art techniques. Additionally, we systematically investigate the classification performance using MI windows ranging from the full 4-second window to 1-second windows. These results establish a new benchmark for generalizable, calibration-free MI classification in addition to contributing a robust open-access dataset to advance research in this domain.

大脑-计算机界面(BCIs)的跨主题机动图象(CS-MI)分类是一项具有挑战性的任务,因为不同个人之间的电子脑图谱模式差异很大,这种差异往往导致分类准确性比特定主题模型低,对开发适合现实世界应用的无校准BCI形成主要障碍。在本文件中,我们采用了一种新颖的方法,通过优化预处理和深层学习技术,大大提高跨主题的MI分类性能。我们的方法包括直接分类短时Fourier变换(STFT)的EEEG数据、优化STFT参数和在培训进化神经网络(CNN)期间平衡的分批战略。这种方法在四个不同的数据集中,包括三个广泛使用的基准数据集,导致交叉分类的大幅改进,在BCI竞争四类数据集1(IV-1)上实现了67.60%的成绩,在数据集2A(IV-2A)上实现了65.96%,在数据集2B(IV-2B)上实现了80.22%的分级,在全州至州级标准窗口上比级全面升级,在MI中建立了新的标准。


Article 89

Title@2025-07-03 (4): The Choice of Normalization Influences Shrinkage in Regularized Regression

Title: The Choice of Normalization Influences Shrinkage in Regularized Regression Die Wahl der Normalisierung beeinflusst Schrumpfung bei regularisierter Regression 正常化的影响在正规化后退过程中的缩减 2501.03821v3

Authors (2): Johan Larsson, Jonas Wallin

Regularized models are often sensitive to the scales of the features in the data and it has therefore become standard practice to normalize (center and scale) the features before fitting the model. But there are many different ways to normalize the features and the choice may have dramatic effects on the resulting model. In spite of this, there has so far been no research on this topic. In this paper, we begin to bridge this knowledge gap by studying normalization in the context of lasso, ridge, and elastic net regression. We focus on binary features and show that their class balances (proportions of ones) directly influences the regression coefficients and that this effect depends on the combination of normalization and regularization methods used. We demonstrate that this effect can be mitigated by scaling binary features with their variance in the case of the lasso and standard deviation in the case of ridge regression, but that this comes at the cost of increased variance of the coefficient estimates. For the elastic net, we show that scaling the penalty weights, rather than the features, can achieve the same effect. Finally, we also tackle mixes of binary and normal features as well as interactions and provide some initial results on how to normalize features in these cases.

常规化模型往往对数据特征的尺度敏感,因此,它已成为标准做法,在设计模型之前使特征(中心和规模)正常化(中心和规模),但有许多不同的方式使特征和选择正常化,可能会对由此形成的模型产生巨大影响。尽管如此,迄今为止尚未就此专题进行过研究。在本文件中,我们开始通过研究拉索、脊和弹性网回归方面的正常化来弥补这一知识差距。我们侧重于二进制特征,并表明其等级平衡(两者的比例)直接影响到回归系数,而这种影响取决于所采用的正常化和正规化方法的结合。我们证明,通过在斜坡回归的情况下,通过放大二进制特征和标准偏差,可以减轻这一影响,但这样做的代价是系数估计差异的增加。关于弹性网,我们表明,在调整处罚重量而不是特征方面,可以取得同样的效果。最后,我们还处理二进制和正常特征的混合,作为互动和一些初步结果,这些案例如何正常化。


Article 90

Title: IndianBailJudgments-1200: A Multi-Attribute Dataset for Legal NLP on Indian Bail Orders IndianBailJudgments-1200: Ein Multi-Attribut-Datensatz für legale NLP auf indischen Bail-Aufträgen IndianBailJail Judgments-12000:印度保释令法律国家保护程序多属性数据集 2507.02506v1

Authors (2): Sneha Deshmukh, Prathmesh Kamble

Legal NLP remains underdeveloped in regions like India due to the scarcity of structured datasets. We introduce IndianBailJudgments-1200, a new benchmark dataset comprising 1200 Indian court judgments on bail decisions, annotated across 20+ attributes including bail outcome, IPC sections, crime type, and legal reasoning. Annotations were generated using a prompt-engineered GPT-4o pipeline and verified for consistency. This resource supports a wide range of legal NLP tasks such as outcome prediction, summarization, and fairness analysis, and is the first publicly available dataset focused specifically on Indian bail jurisprudence.

在印度等地区,由于缺少结构化数据集,国家法律援助方案仍然不发达。我们引入了印度Bail Judgments-1200(IndianBailJudgments-1200)这一新的基准数据集,其中包括1200个印度法院关于保释判决的判决,20+属性的附加说明,包括保释结果、国际刑法委员会各科、犯罪类型和法律推理。说明是利用迅速设计的GPT-4o(GPT-4o)管道生成的,并核实一致性。这一资源支持了国家法律援助方案范围广泛的法律任务,如结果预测、总结和公平分析,也是第一个专门侧重于印度保释判例的公开数据集。


Article 91

Title@2025-07-03 (4): Continual Gradient Low-Rank Projection Fine-Tuning for LLMs

Title: Continual Gradient Low-Rank Projection Fine-Tuning for LLMs Kontinuierliche gradiente Low-Rank-Projektion Feintuning für LLMs LLMM 连续渐进式低射频低射射精度 2507.02503v1

Authors (4): Chenxu Wang, Yilin Lyu, Zicheng Sun, Liping Jing

Continual fine-tuning of Large Language Models (LLMs) is hampered by the trade-off between efficiency and expressiveness. Low-Rank Adaptation (LoRA) offers efficiency but constrains the model’s ability to learn new tasks and transfer knowledge due to its low-rank nature and reliance on explicit parameter constraints. We propose GORP (Gradient LOw Rank Projection) for Continual Learning, a novel training strategy that overcomes these limitations by synergistically combining full and low-rank parameters and jointly updating within a unified low-rank gradient subspace. GORP expands the optimization space while preserving efficiency and mitigating catastrophic forgetting. Extensive experiments on continual learning benchmarks demonstrate GORP’s superior performance compared to existing state-of-the-art approaches. Code is available at https://github.com/Wcxwcxw/GORP.

大语言模型(LLMs)的持续微调受到效率与表达性之间的权衡的阻碍。低兰克适应(LORA)提供了效率,但因其级别低和依赖明确的参数限制,限制了该模型学习新任务和转让知识的能力。我们提议为持续学习提供GORP(高Low Rank Rank Projection),这是一项新颖的培训战略,通过将全部和低级别参数协同结合起来并在统一的低级别梯度子空间内联合更新来克服这些限制。GORP扩大了优化空间,同时保持效率和减轻灾难性的遗忘。关于持续学习基准的广泛实验显示了GORP相对于现有最新方法的优异性表现。守则见https://github.com/Wcxwxw/GARP。


Article 92

Title@2025-07-03 (4): Online Conformal Prediction with Efficiency Guarantees

Title: Online Conformal Prediction with Efficiency Guarantees Online-Konforme Vorhersage mit Effizienzgarantien 具有效率保障的在线非正式预测 2507.02496v1

Authors (1): Vaidehi Srinivas

We study the problem of conformal prediction in a novel online framework that directly optimizes efficiency. In our problem, we are given a target miscoverage rate $\alpha > 0$, and a time horizon $T$. On each day $t \le T$ an algorithm must output an interval $I_t \subseteq [0, 1]$, then a point $y_t \in [0, 1]$ is revealed. The goal of the algorithm is to achieve coverage, that is, $y_t \in I_t$ on (close to) a $(1 - \alpha)$-fraction of days, while maintaining efficiency, that is, minimizing the average volume (length) of the intervals played. This problem is an online analogue to the problem of constructing efficient confidence intervals. We study this problem over arbitrary and exchangeable (random order) input sequences. For exchangeable sequences, we show that it is possible to construct intervals that achieve coverage $(1 - \alpha) - o(1)$, while having length upper bounded by the best fixed interval that achieves coverage in hindsight. For arbitrary sequences however, we show that any algorithm that achieves a $\mu$-approximation in average length compared to the best fixed interval achieving coverage in hindsight, must make a multiplicative factor more mistakes than $\alpha T$, where the multiplicative factor depends on $\mu$ and the aspect ratio of the problem. Our main algorithmic result is a matching algorithm that can recover all Pareto-optimal settings of $\mu$ and number of mistakes. Furthermore, our algorithm is deterministic and therefore robust to an adaptive adversary. This gap between the exchangeable and arbitrary settings is in contrast to the classical online learning problem. In fact, we show that no single algorithm can simultaneously be Pareto-optimal for arbitrary sequences and optimal for exchangeable sequences. On the algorithmic side, we give an algorithm that achieves the near-optimal tradeoff between the two cases.

我们在一个直接优化效率的新在线框架中研究符合性预测问题。 在我们的问题中, 我们得到一个目标错误的覆盖率 $\ alpha > 0美元, 以及一个时间范围$T$。 在每天$t\le T$上, 一个算法必须输出一个间隔 $t\ subseteq [0, 1] 美元, 然后一个点$t\ t\ in [0, 1] 。 算法的目标是实现覆盖, 即 $y_ t\ t\ in I_ t$( 接近) 美元, 以( 接近) 美元( 1 - oppha) 的算法比率。 在保持效率的同时, 一个目标错误的错误是 $1 - 美元, 以最小值计算。 这个问题是一个在线的序列模拟, 以构建高效信任间隔的问题。 在任意和交换( 随机) 输入( 随机) 输入的顺序中, 我们展示的是, 有可能在( 1 -\ a) ma) r) 交易到(l) ral) rial rial rial rial deal deal deal ral 中, lax lax lax ex 。


Article 93

Title@2025-07-03 (4): Reliability-Adjusted Prioritized Experience Replay

Title: Reliability-Adjusted Prioritized Experience Replay Reliability-Adjusted Prioritized Experience Replay 调整了可靠性调整后确定优先经验重述 2506.18482v2

Authors (3): Leonard S. Pleiss, Tobias Sutter, Maximilian Schiffer

Experience replay enables data-efficient learning from past experiences in online reinforcement learning agents. Traditionally, experiences were sampled uniformly from a replay buffer, regardless of differences in experience-specific learning potential. In an effort to sample more efficiently, researchers introduced Prioritized Experience Replay (PER). In this paper, we propose an extension to PER by introducing a novel measure of temporal difference error reliability. We theoretically show that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER. We further present empirical results showing that ReaPER outperforms PER across various environment types, including the Atari-10 benchmark.

经验重现有助于从网上强化学习机构以往的经验中获取数据效率高的学习经验。传统上,经验统一地从重现缓冲中取样,而不论具体经验的学习潜力不同。为了更有效地取样,研究人员采用了优先经验重现(PER)。在本文中,我们建议通过采用时间差误误可靠性的新尺度来扩展PER。我们理论上表明,由此产生的过渡选择算法Reforce-调整后重现优先经验重现(Reapper)比PER(Reapper)更有效率的学习。我们进一步介绍了经验结果,表明Reapper在包括Atari-10基准在内的各种环境类型中优于PER。


Article 94

Title@2025-07-03 (4): MC-INR: Efficient Encoding of Multivariate Scientific Simulation Data using Meta-Learning and Clustered Implicit Neural Representations

Title: MC-INR: Efficient Encoding of Multivariate Scientific Simulation Data using Meta-Learning and Clustered Implicit Neural Representations MC-INR: Effiziente Kodierung von multivariaten wissenschaftlichen Simulationsdaten mit Meta-Learning und clusterierten Impliziten Neuraldarstellungen MC-INR:利用元学习和集成隐隐性神经图示对多变量科学模拟数据进行有效编码 2507.02494v1

Authors (5): Hyunsoo Son, Jeonghyun Noh, Suemin Jeon, Chaoli Wang, Won-Ki Jeong

Implicit Neural Representations (INRs) are widely used to encode data as continuous functions, enabling the visualization of large-scale multivariate scientific simulation data with reduced memory usage. However, existing INR-based methods face three main limitations: (1) inflexible representation of complex structures, (2) primarily focusing on single-variable data, and (3) dependence on structured grids. Thus, their performance degrades when applied to complex real-world datasets. To address these limitations, we propose a novel neural network-based framework, MC-INR, which handles multivariate data on unstructured grids. It combines meta-learning and clustering to enable flexible encoding of complex structures. To further improve performance, we introduce a residual-based dynamic re-clustering mechanism that adaptively partitions clusters based on local error. We also propose a branched layer to leverage multivariate data through independent branches simultaneously. Experimental results demonstrate that MC-INR outperforms existing methods on scientific data encoding tasks.

隐性神经图示(INRs)被广泛用于将数据编码为连续功能,使大规模多变科学模拟数据具有可视化功能,减少记忆使用,然而,基于INR的现有方法面临三个主要限制:(1) 复杂结构的不灵活代表,(2) 主要侧重于单一可变数据,(3) 依赖结构化网格。因此,在应用到复杂的真实世界数据集时,其性能会下降。为了解决这些限制,我们提议了一个新的神经网络框架,即MC-INR,它处理非结构化网格上的多变性数据。它将元学习和集群结合起来,以便能够灵活地编码复杂的结构。为了进一步改进绩效,我们采用了基于残余的动态群集机制,根据地方错误进行适应性分配。我们还提议了一个分支层,通过独立的分支利用多变性数据。实验结果表明,MC-INR比科学数据编码任务的现有方法要好得多。


Article 95

Title@2025-07-03 (4): Deep Transfer Learning for Kidney Cancer Diagnosis

Title: Deep Transfer Learning for Kidney Cancer Diagnosis Deep Transfer Learning für Nierenkrebs-Diagnose 肾脏癌症诊断深入转移学习 2408.04318v2

Authors (8): Yassine Habchi, Hamza Kheddar, Yassine Himeur, Mohamed Chahine Ghanem, Abdelkrim Boukabou, Shadi Atalla, Wathiq Mansoor, Hussain Al-Ahmad

Incurable diseases continue to pose major challenges to global healthcare systems, with their prevalence shaped by lifestyle, economic, social, and genetic factors. Among these, kidney disease remains a critical global health issue, requiring ongoing research to improve early diagnosis and treatment. In recent years, deep learning (DL) has shown promise in medical imaging and diagnostics, driving significant progress in automatic kidney cancer (KC) detection. However, the success of DL models depends heavily on the availability of high-quality, domain-specific datasets, which are often limited and expensive to acquire. Moreover, DL models demand substantial computational power and storage, restricting their real-world clinical use. To overcome these barriers, transfer learning (TL) has emerged as an effective approach, enabling the reuse of pre-trained models from related domains to enhance KC diagnosis. This paper presents a comprehensive survey of DL-based TL frameworks for KC detection, systematically reviewing key methodologies, their advantages, and limitations, and analyzing their practical performance. It further discusses challenges in applying TL to medical imaging and highlights emerging trends that could influence future research. This review demonstrates the transformative role of TL in precision medicine, particularly oncology, by improving diagnostic accuracy, lowering computational demands, and supporting the integration of AI-powered tools in healthcare. The insights provided offer valuable guidance for researchers and practitioners, paving the way for future advances in KC diagnostics and personalized treatment strategies.

无可救药的疾病继续给全球保健系统带来重大挑战,其流行程度取决于生活方式、经济、社会和遗传因素,其中肾病仍然是一个重要的全球卫生问题,需要不断进行研究,以改进早期诊断和治疗;近年来,深入学习(DL)在医学成像和诊断方面显示出希望,推动自动肾癌检测取得重大进展;然而,DL模型的成功在很大程度上取决于能否获得高质量、具体领域的数据集,这些数据往往有限,而且获取费用昂贵;此外,DL模型要求大量的计算能力和储存,限制其真实世界临床用途;为克服这些障碍,转移学习(TL)已成为一种有效办法,使相关领域经过预先训练的模型得以再利用,以加强KC诊断;该文件对基于DL的肾癌检测TL框架进行了全面调查,系统地审查关键方法、其优点和局限性,并分析其实际业绩;它进一步讨论了在将TL应用到医疗成像和储存方面所面临的挑战,并着重指出可能影响未来研究的新趋势;为了克服这些障碍,转移学习(TL)已成为一种有效的方法,使相关领域经过培训的模型得以重新使用,从而改进了对AIC进行精确的诊断和诊断工具的先进性分析。


Article 96

Title@2025-07-03 (4): Learning Massive-scale Partial Correlation Networks in Clinical Multi-omics Studies with HP-ACCORD

Title: Learning Massive-scale Partial Correlation Networks in Clinical Multi-omics Studies with HP-ACCORD Learning Massive-scale Partial Correlation Networks in klinischen Multi-Omics-Studien mit HP-ACCORD 与HP-ACCORD在临床多工程学研究中学习大规模大规模局部关联网络 2412.11554v3

Authors (6): Sungdong Lee, Joshua Bang, Youngrae Kim, Hyungwon Choi, Sang-Yun Oh, Joong-Ho Won

Graphical model estimation from multi-omics data requires a balance between statistical estimation performance and computational scalability. We introduce a novel pseudolikelihood-based graphical model framework that reparameterizes the target precision matrix while preserving the sparsity pattern and estimates it by minimizing an $\ell_1$-penalized empirical risk based on a new loss function. The proposed estimator maintains estimation and selection consistency in various metrics under high-dimensional assumptions. The associated optimization problem allows for a provably fast computation algorithm using a novel operator-splitting approach and communication-avoiding distributed matrix multiplication. A high-performance computing implementation of our framework was tested using simulated data with up to one million variables, demonstrating complex dependency structures similar to those found in biological networks. Leveraging this scalability, we estimated a partial correlation network from a dual-omic liver cancer data set. The co-expression network estimated from the ultrahigh-dimensional data demonstrated superior specificity in prioritizing key transcription factors and co-activators by excluding the impact of epigenetic regulation, thereby highlighting the value of computational scalability in multi-omic data analysis.

从多组数据得出的图形模型估计要求统计估计性能和计算可缩放性之间的平衡。我们引入了一个新的假象基样图形模型框架,对目标精度矩阵进行重新校准,同时保持宽度模式,并根据新的损失函数,尽量减少一个$\ell_1$$1美元的统一经验风险来估计和估计。提议的估算员在高维假设下维持各种计量的估算和选择一致性。相关的优化问题使得能够使用新的操作员分裂法和通信避免分布式矩阵乘数来快速计算算法。我们框架的高性计算执行率是用最多100万变量的模拟数据测试的,显示了与生物网络中发现的相似的复杂依赖性结构。利用这一缩放性,我们估计了从一个双组的肝癌数据集中得出的部分相关性网络。从超高维数据中估算出的共同表达网络在确定关键校正系数和共活化因素的优先次序时表现出了高度的特异性,排除了后遗传调节的影响,从而突出了多组数据分析中的计算可缩性值。


Article 97

Title@2025-07-03 (4): Variational Kolmogorov-Arnold Network

Title: Variational Kolmogorov-Arnold Network Unterschiedliches Kolmogorov-Arnold-Netzwerk 变化科尔莫戈罗夫-阿诺尔德网络 2507.02466v1

Authors (3): Francesco Alesiani, Henrik Christiansen, Federico Errica

Kolmogorov Arnold Networks (KANs) are an emerging architecture for building machine learning models. KANs are based on the theoretical foundation of the Kolmogorov-Arnold Theorem and its expansions, which provide an exact representation of a multi-variate continuous bounded function as the composition of a limited number of univariate continuous functions. While such theoretical results are powerful, their use as a representation learning alternative to a multi-layer perceptron (MLP) hinges on the ad-hoc choice of the number of bases modeling each of the univariate functions. In this work, we show how to address this problem by adaptively learning a potentially infinite number of bases for each univariate function during training. We therefore model the problem as a variational inference optimization problem. Our proposal, called InfinityKAN, which uses backpropagation, extends the potential applicability of KANs by treating an important hyperparameter as part of the learning process.

Kolmogorov Arnold Networks (KANs) 是建立机器学习模型的新兴架构。 KANs 是基于Kolmogorov-Arnold Theorem及其扩展的理论基础, 它提供了多种变量连续封闭功能的精确表达方式, 作为数量有限的单体连续功能的构成。 虽然这种理论结果是强大的, 但它们作为多层次透视器( MLP) 的代言学习的替代方法, 取决于对每个单体功能建模的基数的随机选择。 在这项工作中, 我们展示了如何通过适应性地学习每个单体功能的潜在无限的基数来解决这一问题。 因此, 我们把这个问题作为变异推力优化问题模型。 我们的提案称为InfinityKAN, 使用反向调整, 将重要的超参数作为学习过程的一部分, 扩大了KANs的潜在适用性。


Article 98

Title@2025-07-03 (4): Quantum-enhanced causal discovery for a small number of samples

Title: Quantum-enhanced causal discovery for a small number of samples Quantenverstärkte kausale Entdeckung für eine kleine Anzahl von Proben 少数样本的量吨量强化因果发现 2501.05007v2

Authors (6): Yu Terada, Ken Arai, Yu Tanaka, Yota Maeda, Hiroshi Ueno, Hiroyuki Tezuka

The discovery of causal relations from observed data has attracted significant interest from disciplines such as economics, social sciences, and biology. In practical applications, considerable knowledge of the underlying systems is often unavailable, and real data are usually associated with nonlinear causal structures, which makes the direct use of most conventional causality analysis methods difficult. This study proposes a novel quantum Peter-Clark (qPC) algorithm for causal discovery that does not require any assumptions about the underlying model structures. Based on conditional independence tests in a class of reproducing kernel Hilbert spaces characterized by quantum circuits, the proposed algorithm can explore causal relations from the observed data drawn from arbitrary distributions. We conducted systematic experiments on fundamental graphs of causal structures, demonstrating that the qPC algorithm exhibits better performance, particularly with smaller sample sizes compared to its classical counterpart. Furthermore, we proposed a novel optimization approach based on Kernel Target Alignment (KTA) for determining hyperparameters of quantum kernels. This method effectively reduced the risk of false positives in causal discovery, enabling more reliable inference. Our theoretical and experimental results demonstrate that the quantum algorithm can empower classical algorithms for accurate inference in causal discovery, supporting them in regimes where classical algorithms typically fail. In addition, the effectiveness of this method was validated using the datasets on Boston housing prices, heart disease, and biological signaling systems as real-world applications. These findings highlight the potential of quantum-based causal discovery methods in addressing practical challenges, particularly in small-sample scenarios, where traditional approaches have shown significant limitations.

从观察到的数据中发现的因果关系引起了经济学、社会科学和生物学等学科的极大兴趣。在实际应用中,对基本系统的大量知识往往缺乏,而实际数据通常与非线性因果关系结构相关,这使得直接使用最传统的因果关系分析方法成为困难。本研究报告提出了一个新的量子Peter-Clark(qPC)因果发现算法,这一算法不需要对基本模型结构作任何假设。根据在以量子回路为特点的再生产内核希尔伯特空间类别中进行的有条件独立测试,拟议的算法可以探索从任意分布的观测数据中得出的因果关系。我们对因果关系结构的基本图表进行了系统化实验,表明qPC算法表现得更好,特别是与其古典对应方相比,样本规模较小。此外,我们提出了基于Kernel 目标校准(QTA)的因果发现新算法,这一方法有效地减少了因果发现中假正数的风险,使得更可靠的推论。我们的理论和实验结果表明,小量算法可以增强因果性算法的因果性算算法,在典型的生物级发现中,典型的因果性研究中,其结果显示其误判算法显示,其误差率性分析方法显示,其结果的误差值是。在标准。


Article 99

Title@2025-07-03 (4): Universal Collection of Euclidean Invariants between Pairs of Position-Orientations

Title: Universal Collection of Euclidean Invariants between Pairs of Position-Orientations Universelle Sammlung von Euklidischen Invarianten zwischen Paaren von Positionsorientierungen 位置对等位置对等之间通用的欧立地异异异物收集 2504.03299v2

Authors (3): Gijs Bellaard, Bart M. N. Smets, Remco Duits

Euclidean E(3) equivariant neural networks that employ scalar fields on position-orientation space M(3) have been effectively applied to tasks such as predicting molecular dynamics and properties. To perform equivariant convolutional-like operations in these architectures one needs Euclidean invariant kernels on M(3) x M(3). In practice, a handcrafted collection of invariants is selected, and this collection is then fed into multilayer perceptrons to parametrize the kernels. We rigorously describe an optimal collection of 4 smooth scalar invariants on the whole of M(3) x M(3). With optimal we mean that the collection is independent and universal, meaning that all invariants are pertinent, and any invariant kernel is a function of them. We evaluate two collections of invariants, one universal and one not, using the PONITA neural network architecture. Our experiments show that using a collection of invariants that is universal positively impacts the accuracy of PONITA significantly.

Euclidean E(3) E(3) 等离子神经网络在位置导向空间上使用斜体字段 M(3) 已被有效地应用于预测分子动态和特性等任务。在这些结构中,为了进行等同进化操作,需要在 M(3) x M(3) 上进行异变内核。在实践中,选择了人工制作的异变体集,然后将这一集放入多层透视器,以对内核进行对称。我们严格描述全M(3) x M(3) 的4个平滑变异体的最佳集集。我们最理想的意思是,该集是独立和普遍的,这意味着所有异变体都是相关的,任何异性内核是它们的函数。我们用 PONITA 神经网络结构评估两种异变体集,一个是普遍的,一个不是的。我们的实验表明,使用通用的异变体集对 PONITA的准确性产生了显著影响。


Article 100

Title@2025-07-03 (4): Roto-Translation Invariant Metrics on Position-Orientation Space

Title: Roto-Translation Invariant Metrics on Position-Orientation Space Roto-Translation Invariante Metrics auf Positions-Orientierungsraum 定位空间的旋转变换变化计量器 2504.03309v2

Authors (2): Gijs Bellaard, Bart M. N. Smets

Riemannian metrics on the position-orientation space M(3) that are roto-translation group SE(3) invariant play a key role in image analysis tasks like enhancement, denoising, and segmentation. These metrics enable roto-translation equivariant algorithms, with the associated Riemannian distance often used in implementation. However, computing the Riemannian distance is costly, which makes it unsuitable in situations where constant recomputation is needed. We propose the mav (minimal angular velocity) distance, defined as the Riemannian length of a geometrically meaningful curve, as a practical alternative. We see an application of the mav distance in geometric deep learning. Namely, neural networks architectures such as PONITA, relies on geometric invariants to create their roto-translation equivariant model. The mav distance offers a trainable invariant, with the parameters that determine the Riemannian metric acting as learnable weights. In this paper we: 1) classify and parametrize all SE(3) invariant metrics on M(3), 2) describes how to efficiently calculate the mav distance, and 3) investigate if including the mav distance within PONITA can positively impact its accuracy in predicting molecular properties.

位置方向空间 M(3) 上的里曼尼测量仪是SE(3) 变异的旋转翻译组,在图像分析任务中发挥着关键作用,如增强、去除、分解等。这些测量仪使旋转翻转等等变异算法能够与相联的里曼尼距离相适应,但计算里曼距离成本高昂,因此在需要不断进行回译的情况下不适宜。我们提议将马夫(最小角速度)距离定义为具有几何意义的曲线的里曼长度,作为一种实用的替代方法。我们看到在几何深学习中应用马夫距离。即神经网络结构,如PONNITA,依靠几何几何变异性来创建其转转译等变异模型。马夫距离提供了一种可训练的变性参数,用以确定里曼度测量仪的参数作为可学习的重量。在这个文件中,我们:1) 分类和对应所有SE(3) 的里格度度度度测量指标,包括M(3) 3) 预测其距离内如何精确地计算出其距离,如果能正确计算出MTA 内磁度,那么精确度的内测测测测算出其内的影响。


Article 101

Title@2025-07-03 (4): Red grape detection with accelerated artificial neural networks in the FPGA’s programmable logic

Title: Red grape detection with accelerated artificial neural networks in the FPGA’s programmable logic Rote Traubenerkennung mit beschleunigten künstlichen neuronalen Netzwerken in der programmierbaren Logik des FPGA FPGA的可编程逻辑的红葡萄探测与加速人工神经网络 2507.02443v1

Authors (5): Sandro Costa Magalhães, Marco Almeida, Filipe Neves dos Santos, António Paulo Moreira, Jorge Dias

Robots usually slow down for canning to detect objects while moving. Additionally, the robot’s camera is configured with a low framerate to track the velocity of the detection algorithms. This would be constrained while executing tasks and exploring, making robots increase the task execution time. AMD has developed the Vitis-AI framework to deploy detection algorithms into FPGAs. However, this tool does not fully use the FPGAs’ PL. In this work, we use the FINN architecture to deploy three ANNs, MobileNet v1 with 4-bit quantisation, CNV with 2-bit quantisation, and CNV with 1-bit quantisation (BNN), inside an FPGA’s PL. The models were trained on the RG2C dataset. This is a self-acquired dataset released in open access. MobileNet v1 performed better, reaching a success rate of 98 % and an inference speed of 6611 FPS. In this work, we proved that we can use FPGAs to speed up ANNs and make them suitable for attention mechanisms.

机器人在移动时通常会慢下来, 以便让罐头在移动时检测物体 。 此外, 机器人的相机配置低框架速率, 以跟踪检测算法的速度 。 这将在任务执行和探索时受到限制, 使机器人增加任务执行时间 。 AMD 开发了 Vitis- AI 框架, 将检测算法部署到 FPGAs 中。 但是, 这个工具没有完全使用 FPGAs PL 。 在这项工作中, 我们使用 FINN 结构来部署 3 个 ANN、 MobilNet v1 和 4 位四位方位数的 ANN、 CPV 和 CNV 和 1 位四位量化 (BNNN) 。 这些模型在 FPGA 的 PL 中被训练为 RG2C 数据集 。 这是在开放访问中释放的自取数据集 。 MOPNet v1 效果更好, 达到 98 和 611 FPS 的推断速度 。 在这项工作中, 我们证明我们可以使用 FPGA 来加速加速 。


Article 102

Title@2025-07-03 (4): A Deep Learning Powered Numerical Relativity Surrogate for Binary Black Hole Waveforms

Title: A Deep Learning Powered Numerical Relativity Surrogate for Binary Black Hole Waveforms Eine tief lernfähige numerische Relativitätsüberlagerung für Binary Black Hole Waveforms 二进制黑洞波形的深学习动力数字相对相对性替代工具 2412.06946v2

Authors (9): Osvaldo Gramaxo Freitas, Anastasios Theodoropoulos, Nino Villanueva, Tiago Fernandes, Solange Nunes, José A. Font, Antonio Onofre, Alejandro Torres-Forné, José D. Martin-Guerrero

Gravitational-wave approximants are essential for gravitational-wave astronomy, allowing the coverage binary black hole parameter space for inference or match filtering without costly numerical relativity (NR) simulations, but generally trading some accuracy for computational efficiency. To reduce this trade-off, NR surrogate models can be constructed using interpolation within NR waveform space. We present a 2-stage training approach for neural network-based NR surrogate models. Initially trained on approximant-generated waveforms and then fine-tuned with NR data, these dual-stage artificial neural surrogate (\texttt{DANSur}) models offer rapid and competitively accurate waveform generation, generating millions in under 20ms on a GPU while keeping mean mismatches with NR around $10^{-4}$. Implemented in the \textsc{bilby} framework, we show they can be used for parameter estimation tasks.

引力波相近器对于引力波波天文学至关重要,允许覆盖的双黑洞参数参数空间用于在没有昂贵的数值相对性模拟的情况下进行推导或匹配过滤,但一般地以某种精确度进行计算效率交易。为了减少这种权衡,可以使用NR波形空间的内插来建造NR代孕模型。我们为以神经网络为基础的NR代孕模型提出了一个两阶段培训方法。最初对准氧化生成的波形进行了培训,然后对NR数据进行了微调调整,这些双级人造神经外形模型(\ textt{danSur})提供了快速和有竞争力的精确波形生成,在GPUP上产生不到20米的百万个波形,同时与NR保持平均的不匹配值大约10-4美元。我们在\ textsc{bilby}框架中执行,我们显示这些模型可用于参数评估任务。


Article 103

Title@2025-07-03 (4): Unsupervised Cognition

Title: Unsupervised Cognition Unüberwachte Kognition 未受监督的认知 2409.18624v3

Authors (5): Alfredo Ibias, Hector Antona, Guillem Ramirez-Miranda, Enric Guinovart, Eduard Alarcon

Unsupervised learning methods have a soft inspiration in cognition models. To this day, the most successful unsupervised learning methods revolve around clustering samples in a mathematical space. In this paper we propose a primitive-based, unsupervised learning approach for decision-making inspired by a novel cognition framework. This representation-centric approach models the input space constructively as a distributed hierarchical structure in an input-agnostic way. We compared our approach with both current state-of-the-art unsupervised learning classification, with current state-of-the-art small and incomplete datasets classification, and with current state-of-the-art cancer type classification. We show how our proposal outperforms previous state-of-the-art. We also evaluate some cognition-like properties of our proposal where it not only outperforms the compared algorithms (even supervised learning ones), but it also shows a different, more cognition-like, behaviour.

不受监督的学习方法在认知模型中具有软的灵感。 直到今天, 最成功的不受监督的学习方法围绕数学空间的组合样本进行。 在本文中, 我们提出一种原始的、不受监督的学习方法, 用于受创新的认知框架启发的决策。 这种以代表为中心的方法将输入空间建为具有建设性的空间, 以输入- 不可知性的方式作为分布性的等级结构。 我们比较了我们的方法, 与目前最先进的不受监督的学习分类, 与目前最先进的小型和不完全的数据集分类, 以及目前最先进的癌症类型分类。 我们展示了我们的提案如何超越了以前的状态。 我们还评估了我们提案中的一些类似认知的属性, 其中不仅超越了比较的算法( 即使是受监督的学习算法) , 而且还显示了一种不同、 更认知相似的行为 。


Article 104

Title@2025-07-03 (4): Anatomical Foundation Models for Brain MRIs

Title: Anatomical Foundation Models for Brain MRIs Anatomische Grundlagenmodelle für Hirn-MRTs 脑吸管机模型解剖基础模型 2408.07079v4

Authors (4): Carlo Alberto Barbano, Matteo Brunello, Benoit Dufumier, Marco Grangetto

Deep Learning (DL) in neuroimaging has become increasingly relevant for detecting neurological conditions and neurodegenerative disorders. One of the most predominant biomarkers in neuroimaging is represented by brain age, which has been shown to be a good indicator for different conditions, such as Alzheimer’s Disease. Using brain age for weakly supervised pre-training of DL models in transfer learning settings has also recently shown promising results, especially when dealing with data scarcity of different conditions. On the other hand, anatomical information of brain MRIs (e.g. cortical thickness) can provide important information for learning good representations that can be transferred to many downstream tasks. In this work, we propose AnatCL, an anatomical foundation model for brain MRIs that i.) leverages anatomical information in a weakly contrastive learning approach, and ii.) achieves state-of-the-art performances across many different downstream tasks. To validate our approach we consider 12 different downstream tasks for the diagnosis of different conditions such as Alzheimer’s Disease, autism spectrum disorder, and schizophrenia. Furthermore, we also target the prediction of 10 different clinical assessment scores using structural MRI data. Our findings show that incorporating anatomical information during pre-training leads to more robust and generalizable representations. Pre-trained models can be found at: https://github.com/EIDOSLAB/AnatCL.

神经成像中的深度学习(DL)对神经成像中的神经条件和神经退化性失调的探测越来越重要。神经成像中最主要的生物标志之一是大脑年龄,这已证明是各种条件(如阿尔茨海默氏病)的良好指标。利用大脑年龄在转移学习环境中对DL模型进行监管不力的预培训,最近也显示出有希望的结果,特别是在处理不同条件的数据稀缺时。另一方面,大脑MRIS的解剖信息(例如皮层厚度)可以提供重要信息,用于学习可以转移到许多下游任务的良好表现。在这项工作中,我们提议AmatCL,即大脑MMSIs的解剖基础模型(i),以较弱的对比性学习方法利用解剖学信息,以及(ii)在很多不同的下游任务中实现最新水平表现。为了验证我们的方法,我们考虑12个不同的下游任务,用于诊断诸如阿尔茨海默氏病、自闭谱障碍和先质-内分泌等等不同条件。此外,我们提议AATCLCLCL-I-IDREnimanimatial 的模型,我们还利用了10项的结构性分析结果,以更稳性的预测,在一般数据分析中发现10项前的临床分析结果。我们在一般数据分析。我们找到了-ADLILIBI-ADR-ADR-ADIB-ADI-A中发现。我们发现的临床-ADADR-ADAR)。


Article 105

Title@2025-07-03 (4): The Artificial Scientist – in-transit Machine Learning of Plasma Simulations

Title: The Artificial Scientist – in-transit Machine Learning of Plasma Simulations Der Künstliche Wissenschaftler – in-transit maschinelles Lernen von Plasmasimulationen 人造科学家 – – Plasma模拟模拟的中转机器学习 2501.03383v3

Authors (22): Jeffrey Kelling, Vicente Bolea, Michael Bussmann, Ankush Checkervarty, Alexander Debus, Jan Ebert, Greg Eisenhauer, Vineeth Gutta, Stefan Kesselheim, Scott Klasky, Vedhas Pandit, Richard Pausch, Norbert Podhorszki, Franz Poschel, David Rogers, Jeyhun Rustamov, Steve Schmerler, Ulrich Schramm, Klaus Steiniger, Rene Widera, Anna Willmann, Sunita Chandrasekaran

Increasing HPC cluster sizes and large-scale simulations that produce petabytes of data per run, create massive IO and storage challenges for analysis. Deep learning-based techniques, in particular, make use of these amounts of domain data to extract patterns that help build scientific understanding. Here, we demonstrate a streaming workflow in which simulation data is streamed directly to a machine-learning (ML) framework, circumventing the file system bottleneck. Data is transformed in transit, asynchronously to the simulation and the training of the model. With the presented workflow, data operations can be performed in common and easy-to-use programming languages, freeing the application user from adapting the application output routines. As a proof-of-concept we consider a GPU accelerated particle-in-cell (PIConGPU) simulation of the Kelvin- Helmholtz instability (KHI). We employ experience replay to avoid catastrophic forgetting in learning from this non-steady process in a continual manner. We detail challenges addressed while porting and scaling to Frontier exascale system.

深度学习技术,特别是利用这些数量的域内数据来提取有助于建立科学理解的模式。在这里,我们展示了一个流流工作流程,模拟数据直接流到一个机器学习(ML)框架,绕过文件系统瓶颈。数据在中转过程中不同步地转换为模拟和培训模型。随着所提供的工作流程,数据操作可以用通用和易用的编程语言进行,使应用程序用户不必适应应用程序输出常规。作为证据,我们考虑对Kelvin-Helmholtz 进行GPU加速细胞中的粒子模拟(PIConGPU),我们利用经验再玩,避免在从非稳定的进程中不断学习过程中灾难性地遗忘。我们详细介绍了在移植和扩展到前沿外观系统时遇到的挑战。


Article 106

Title@2025-07-03 (4): Is Complex Query Answering Really Complex?

Title: Is Complex Query Answering Really Complex? Ist die Antwort auf komplexe Abfragen wirklich komplex? 复杂查询回答是否真的复杂? 2410.12537v3

Authors (7): Cosimo Gregucci, Bo Xiong, Daniel Hernandez, Lorenzo Loconte, Pasquale Minervini, Steffen Staab, Antonio Vergari

Complex query answering (CQA) on knowledge graphs (KGs) is gaining momentum as a challenging reasoning task. In this paper, we show that the current benchmarks for CQA might not be as complex as we think, as the way they are built distorts our perception of progress in this field. For example, we find that in these benchmarks, most queries (up to 98% for some query types) can be reduced to simpler problems, e.g., link prediction, where only one link needs to be predicted. The performance of state-of-the-art CQA models decreases significantly when such models are evaluated on queries that cannot be reduced to easier types. Thus, we propose a set of more challenging benchmarks composed of queries that require models to reason over multiple hops and better reflect the construction of real-world KGs. In a systematic empirical investigation, the new benchmarks show that current methods leave much to be desired from current CQA methods.

关于知识图表(KGs)的复杂解答(CQA)与具有挑战性的推理任务一样,正在获得势头。在本文中,我们表明目前CQA的基准可能不会像我们想象的那样复杂,因为这些基准的建立方式扭曲了我们对该领域进展的看法。例如,我们发现在这些基准中,大多数查询(某些查询类型高达98%)可以简化为简单的问题,例如,链接预测,只需要预测一个链接。在对无法简化的查询类型进行评估时,最先进的CQA模型的性能会显著下降。因此,我们提出了一套更具挑战性的基准,其中包括需要模型来解释多重跳跃和更好地反映现实世界KGs的构建。在系统化的经验性调查中,新的基准显示,目前的方法比目前的CQA方法要好得多。


Article 107

Title@2025-07-03 (4): Determination Of Structural Cracks Using Deep Learning Frameworks

Title: Determination Of Structural Cracks Using Deep Learning Frameworks Bestimmung von strukturellen Rissen mittels Deep Learning Frameworks 利用深学习框架确定结构裂缝 2507.02416v1

Authors (3): Subhasis Dasgupta, Jaydip Sen, Tuhina Halder

Structural crack detection is a critical task for public safety as it helps in preventing potential structural failures that could endanger lives. Manual detection by inexperienced personnel can be slow, inconsistent, and prone to human error, which may compromise the reliability of assessments. The current study addresses these challenges by introducing a novel deep-learning architecture designed to enhance the accuracy and efficiency of structural crack detection. In this research, various configurations of residual U-Net models were utilized. These models, due to their robustness in capturing fine details, were further integrated into an ensemble with a meta-model comprising convolutional blocks. This unique combination aimed to boost prediction efficiency beyond what individual models could achieve. The ensemble’s performance was evaluated against well-established architectures such as SegNet and the traditional U-Net. Results demonstrated that the residual U-Net models outperformed their predecessors, particularly with low-resolution imagery, and the ensemble model exceeded the performance of individual models, proving it as the most effective. The assessment was based on the Intersection over Union (IoU) metric and DICE coefficient. The ensemble model achieved the highest scores, signifying superior accuracy. This advancement suggests way for more reliable automated systems in structural defects monitoring tasks.

对公众安全来说,结构裂缝检测是一项至关重要的任务,因为它有助于防止可能危及生命的潜在结构故障。没有经验的人员人工检测可能缓慢、不连贯,容易发生人为错误,这可能会损害评估的可靠性。当前研究通过引入新的深层次学习结构来应对这些挑战,目的是提高结构裂缝检测的准确性和效率。在这项研究中,利用了余下U-Net模型的各种配置,这些模型由于在捕捉细细节方面表现强劲,被进一步纳入一个由共产集团组成的元模型的组合中。这种独特的组合旨在提高预测效率,超出单个模型所能达到的程度。对共同体的性能进行了评估,根据诸如SegNet和传统的U-Net等成熟结构对共同体的性能进行了评估。结果显示,残余U-Net模型比其前身,特别是低分辨率图像的性能要强,组合模型超过了单个模型的性能,证明它是最有效的。评估基于Intercle over Culation(IoU) 指标和 DICE 系数。该元模型实现了最高分级的成绩,表明在结构上更精确度的改进。


Article 108

Title@2025-07-03 (4): ForceGrip: Reference-Free Curriculum Learning for Realistic Grip Force Control in VR Hand Manipulation

Title: ForceGrip: Reference-Free Curriculum Learning for Realistic Grip Force Control in VR Hand Manipulation ForceGrip: Reference-Free Curriculum Learning for Realistic Grip Force Control in VR Hand Manipulation ForgGrip: VR 手动操纵中现实力力控的无参考课程学习 2503.08061v4

Authors (6): DongHeun Han, Byungmin Kim, RoUn Lee, KyeongMin Kim, Hyoseok Hwang, HyeongYeop Kang

Realistic Hand manipulation is a key component of immersive virtual reality (VR), yet existing methods often rely on kinematic approach or motion-capture datasets that omit crucial physical attributes such as contact forces and finger torques. Consequently, these approaches prioritize tight, one-size-fits-all grips rather than reflecting users’ intended force levels. We present ForceGrip, a deep learning agent that synthesizes realistic hand manipulation motions, faithfully reflecting the user’s grip force intention. Instead of mimicking predefined motion datasets, ForceGrip uses generated training scenarios-randomizing object shapes, wrist movements, and trigger input flows-to challenge the agent with a broad spectrum of physical interactions. To effectively learn from these complex tasks, we employ a three-phase curriculum learning framework comprising Finger Positioning, Intention Adaptation, and Dynamic Stabilization. This progressive strategy ensures stable hand-object contact, adaptive force control based on user inputs, and robust handling under dynamic conditions. Additionally, a proximity reward function enhances natural finger motions and accelerates training convergence. Quantitative and qualitative evaluations reveal ForceGrip’s superior force controllability and plausibility compared to state-of-the-art methods. Demo videos are available as supplementary material and the code is provided at https://han-dongheun.github.io/ForceGrip.

现实手操纵是隐蔽虚拟现实的一个关键组成部分,然而,现有方法往往依赖运动方法或运动抓取数据集,这些数据集忽略了关键的物理特征,例如接触力和手指火把等关键物理特征。因此,这些方法优先考虑紧凑、一刀切的紧握,而不是反映用户预期的武力水平。我们提出ForceGrip,这是一个深层次的学习代理,它综合了现实的手动操纵动作,忠实地反映了用户的握力意图。除了模仿预先定义的运动数据集外,ForceGrip利用生成了培训假想方案,调整物体形状、手腕运动和触发输入流,以广泛物理互动的方式挑战代理人。为了有效地学习这些复杂的任务,我们采用了三阶段的课程学习框架,包括手指定位、自用适应和动态稳定。这一渐进式战略确保了稳定的手脚接触、基于用户投入的适应性武力控制,以及在动态条件下的严格处理。此外,近距离奖励功能会增强自然手指动作,加速培训的趋同。定量和定性评价显示Forgriphe-ripriphe-immoditionalfortial-deflifal-de-defolvioltitude-de-de-folty-folty-fortitution-folty-forlence


Article 109

Title@2025-07-03 (4): EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks

Title: EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks EquiTabPFN: Eine Ziel-Permutation Equivariant Prior Fitted Networks EquiTabPFN: 目标-变异等同先前适合的网络 2502.06684v2

Authors (3): Michael Arbel, David Salinas, Frank Hutter

Recent foundational models for tabular data, such as TabPFN, excel at adapting to new tasks via in-context learning, but remain constrained to a fixed, pre-defined number of target dimensions-often necessitating costly ensembling strategies. We trace this constraint to a deeper architectural shortcoming: these models lack target equivariance, so that permuting target dimension orderings alters their predictions. This deficiency gives rise to an irreducible “equivariance gap”, an error term that introduces instability in predictions. We eliminate this gap by designing a fully target-equivariant architecture-ensuring permutation invariance via equivariant encoders, decoders, and a bi-attention mechanism. Empirical evaluation on standard classification benchmarks shows that, on datasets with more classes than those seen during pre-training, our model matches or surpasses existing methods while incurring lower computational overhead.

TabPFN等最近表格数据的基本模型通过理论内学习适应新的任务,非常擅长于适应新的任务,但仍然受制于固定的、预先确定的目标维度数量,往往需要花费昂贵的组合战略。我们将这一制约因素追溯到更深的建筑缺陷:这些模型缺乏目标等同性,因此目标维度的排列改变其预测。这一缺陷导致一个不可减少的“差差”的错误术语,在预测中造成不稳定。我们通过equivariant 编码器、解码器和双向注意机制设计一个完全的目标-等同结构确保变异性,从而消除了这一差距。关于标准分类基准的实证评估表明,在比培训前所见的更多等级的数据集上,我们的模型匹配或超过现有的方法,同时导致较低的计算间接费用。


Article 110

Title@2025-07-03 (4): Significativity Indices for Agreement Values

Title: Significativity Indices for Agreement Values Signifikativitätsindizes für Vereinbarungswerte 协议价值的量化指数 2504.15325v2

Authors (4): Alberto Casagrande, Francesco Fabris, Rossano Girometti, Roberto Pagliarini

Agreement measures, such as Cohen’s kappa or intraclass correlation, gauge the matching between two or more classifiers. They are used in a wide range of contexts from medicine, where they evaluate the effectiveness of medical treatments and clinical trials, to artificial intelligence, where they can quantify the approximation due to the reduction of a classifier. The consistency of different classifiers to a golden standard can be compared simply by using the order induced by their agreement measure with respect to the golden standard itself. Nevertheless, labelling an approach as good or bad exclusively by using the value of an agreement measure requires a scale or a significativity index. Some quality scales have been proposed in the literature for Cohen’s kappa, but they are mainly na"ive, and their boundaries are arbitrary. This work proposes a general approach to evaluate the significativity of any agreement value between two classifiers and introduces two significativity indices: one dealing with finite data sets, the other one handling classification probability distributions. Moreover, this manuscript addresses the computational challenges of evaluating such indices and proposes some efficient algorithms for their evaluation.

协议措施,如 Cohen 的 kappa 或 类内关联, 测量两个或两个以上分类器之间的匹配。 它们被广泛使用, 从医学, 评估医学和临床试验的效果, 到人工智能, 他们可以量化近似值, 因为分类器减少。 不同的分类器与黄金标准的一致性, 只需使用他们协议措施在黄金标准本身方面引起的顺序即可比较。 然而, 仅仅使用协议措施的价值将一种方法标为好或坏, 需要使用一个比例尺或符号性指数。 文献中为 Cohen 的 kappa 提出了一些质量尺度, 但这些尺度主要是天性, 其界限是任意的。 这项工作提出了一个总体方法, 来评估两个分类器之间任何协议值的符号性, 并引入两个符号性指数: 一个处理有限的数据集, 另一个处理分类概率分布。 此外, 这份手稿还述及评估这些指数的计算挑战, 并为评估这些指数提出一些高效的算法 。


Article 111

Title@2025-07-03 (4): S2FGL: Spatial Spectral Federated Graph Learning

Title: S2FGL: Spatial Spectral Federated Graph Learning S2FGL: Raumspektrales Federiertes Graphenlernen S2FGL: 空间光谱联邦图表学习 2507.02409v1

Authors (6): Zihan Tan, Suyuan Huang, Guancheng Wan, Wenke Huang, He Li, Mang Ye

Federated Graph Learning (FGL) combines the privacy-preserving capabilities of federated learning (FL) with the strong graph modeling capability of Graph Neural Networks (GNNs). Current research addresses subgraph-FL only from the structural perspective, neglecting the propagation of graph signals on spatial and spectral domains of the structure. From a spatial perspective, subgraph-FL introduces edge disconnections between clients, leading to disruptions in label signals and a degradation in the class knowledge of the global GNN. From a spectral perspective, spectral heterogeneity causes inconsistencies in signal frequencies across subgraphs, which makes local GNNs overfit the local signal propagation schemes. As a result, spectral client drifts occur, undermining global generalizability. To tackle the challenges, we propose a global knowledge repository to mitigate label signal disruption and a frequency alignment to address spectral client drifts. The combination of spatial and spectral strategies forms our framework S2FGL. Extensive experiments on multiple datasets demonstrate the superiority of S2FGL. The code is available at https://github.com/Wonder7racer/S2FGL.git.

联邦图表学习(FGL)将联邦学习(FL)的隐私保护能力与图形神经网络(GNNS)的强烈图形模型模型能力相结合。当前的研究仅从结构角度处理子光子FL,忽视了该结构空间和光谱域图信号的传播。从空间角度看,子光子-FL引入客户之间的边缘断裂,导致标签信号中断和全球GN的班级知识退化。从光谱角度看,光谱异质导致各子集的信号频率不一致,使得本地GNN超出本地信号传播计划的适宜性。结果,光谱客户的漂移会发生,破坏全球通用性。为了应对挑战,我们提议建立一个全球知识库,以减少标签信号中断和频率调整,以应对光谱客户的漂移。空间和光谱战略的结合构成我们S2FGL.关于多个数据集的广泛实验的框架,显示了S2FGL的优越性。代码见https://github.com/Wenderassar/S2FL.F。


Article 112

Title@2025-07-03 (4): Attention-based clustering

Title: Attention-based clustering Aufmerksamkeitsbasiertes Clustering 以关注为主的集群集群 2505.13112v2

Authors (3): Rodrigo Maulen-Soto, Claire Boyer, Pierre Marion

Transformers have emerged as a powerful neural network architecture capable of tackling a wide range of learning tasks. In this work, we provide a theoretical analysis of their ability to automatically extract structure from data in an unsupervised setting. In particular, we demonstrate their suitability for clustering when the input data is generated from a Gaussian mixture model. To this end, we study a simplified two-head attention layer and define a population risk whose minimization with unlabeled data drives the head parameters to align with the true mixture centroids. This phenomenon highlights the ability of attention-based layers to capture underlying distributional structure. We further examine an attention layer with key, query, and value matrices fixed to the identity, and show that, even without any trainable parameters, it can perform in-context quantization, revealing the surprising capacity of transformer-based methods to adapt dynamically to input-specific distributions.

变异器已经成为一个强大的神经网络架构,能够应对广泛的学习任务。 在这项工作中,我们提供了对其在无人监督的环境中自动从数据中提取结构结构的能力的理论分析。 特别是, 当输入数据来自高斯混合模型时, 我们证明它们适合分组。 为此, 我们研究一个简化的双头关注层, 并定义一个人口风险, 其使用无标签数据最小化会驱动头参数与真正的混合类固醇相匹配。 这一现象凸显了基于关注的层捕捉基本分布结构的能力。 我们进一步检查一个关注层, 其关键、 查询和价值矩阵固定在身份上, 并显示即使没有任何可培训的参数, 它也可以在文本中进行量化, 揭示基于变异器方法的惊人能力, 以动态方式适应特定投入分布。


Article 113

Title@2025-07-03 (4): Benchmarking Akan ASR Models Across Domain-Specific Datasets: A Comparative Evaluation of Performance, Scalability, and Adaptability

Title: Benchmarking Akan ASR Models Across Domain-Specific Datasets: A Comparative Evaluation of Performance, Scalability, and Adaptability Benchmarking Akan ASR-Modelle über Domain-spezifische Datensätze: Eine vergleichende Bewertung von Leistung, Skalierbarkeit und Anpassungsfähigkeit 确定Akan ASR模型基准的全域具体数据集:业绩比较评价、可缩放性和可调适性 2507.02407v1

Authors (8): Mark Atta Mensah, Isaac Wiafe, Akon Ekpezu, Justice Kwame Appati, Jamal-Deen Abdulai, Akosua Nyarkoa Wiafe-Akenten, Frank Ernest Yeboah, Gifty Odame

Most existing automatic speech recognition (ASR) research evaluate models using in-domain datasets. However, they seldom evaluate how they generalize across diverse speech contexts. This study addresses this gap by benchmarking seven Akan ASR models built on transformer architectures, such as Whisper and Wav2Vec2, using four Akan speech corpora to determine their performance. These datasets encompass various domains, including culturally relevant image descriptions, informal conversations, biblical scripture readings, and spontaneous financial dialogues. A comparison of the word error rate and character error rate highlighted domain dependency, with models performing optimally only within their training domains while showing marked accuracy degradation in mismatched scenarios. This study also identified distinct error behaviors between the Whisper and Wav2Vec2 architectures. Whereas fine-tuned Whisper Akan models led to more fluent but potentially misleading transcription errors, Wav2Vec2 produced more obvious yet less interpretable outputs when encountering unfamiliar inputs. This trade-off between readability and transparency in ASR errors should be considered when selecting architectures for low-resource language (LRL) applications. These findings highlight the need for targeted domain adaptation techniques, adaptive routing strategies, and multilingual training frameworks for Akan and other LRLs.

大多数现有的自动语音识别(ASR)研究都评估了使用内域数据集的模型,但很少评价它们如何对不同演讲背景加以概括。本研究通过对建立在变压器结构上,例如Whisper和Wav2Vec2的7个Akan ASR模型进行基准化来弥补这一差距,使用4个Akan 语音识别(ASR)研究来确定其性能。这些数据集包含不同的领域,包括文化上相关的图像描述、非正式对话、圣经读取和自发的财务对话。对单词误差率和字符误差率的比较突出显示了域依赖性,模型只在培训领域最优化地运行,同时显示不匹配情景中明显的准确性退化。本研究还查明了Whisper和Wav2Vec2结构之间的明显错误行为。而微调Whisper Akan模型导致更多流利,但可能产生误导性的校正错误。当遇到不熟悉的投入时,Wav2Vec2生成了更明显但解释性更小的产出。在ASR的可读性和透明度错误之间,在选择低资源语言结构(LKAN)应用和适应性应用领域技术时,这些结论强调其他适应性框架需要。


Article 114

Title@2025-07-03 (4): Improving Consistency in Vehicle Trajectory Prediction Through Preference Optimization

Title: Improving Consistency in Vehicle Trajectory Prediction Through Preference Optimization Verbesserung der Konsistenz in der Fahrzeug-Trajektorie-Vorhersage durch Preference-Optimierung 通过优先优化提高车辆轨迹预测的一致性 2507.02406v1

Authors (7): Caio Azevedo, Lina Achaji, Stefano Sabatini, Nicola Poerio, Grzegorz Bartyzel, Sascha Hornauer, Fabien Moutarde

Trajectory prediction is an essential step in the pipeline of an autonomous vehicle. Inaccurate or inconsistent predictions regarding the movement of agents in its surroundings lead to poorly planned maneuvers and potentially dangerous situations for the end-user. Current state-of-the-art deep-learning-based trajectory prediction models can achieve excellent accuracy on public datasets. However, when used in more complex, interactive scenarios, they often fail to capture important interdependencies between agents, leading to inconsistent predictions among agents in the traffic scene. Inspired by the efficacy of incorporating human preference into large language models, this work fine-tunes trajectory prediction models in multi-agent settings using preference optimization. By taking as input automatically calculated preference rankings among predicted futures in the fine-tuning process, our experiments–using state-of-the-art models on three separate datasets–show that we are able to significantly improve scene consistency while minimally sacrificing trajectory prediction accuracy and without adding any excess computational requirements at inference time.

有关其周围物剂移动的不准确或不一致的预测导致计划不周和最终使用者可能面临危险的情况。目前最先进的深学习轨迹预测模型可以在公共数据集中实现极佳的准确性。然而,如果在更复杂、互动的假设中使用,它们往往无法捕捉各种物剂之间重要的相互依存关系,导致交通现场物剂之间的预测不一致。由于将人类偏好纳入大型语言模型的功效,这种工作微调轨迹预测模型在使用偏好优化的多试剂环境中产生效果,因此在微调过程中自动计算预测的未来的偏好等级,我们在三个不同的数据集中实验使用最先进的模型,可以大大改进现场的一致性,同时尽可能降低轨迹预测的准确性,在推断时不增加任何超量的计算要求。


Article 115

Title@2025-07-03 (4): XGeM: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation

Title: XGeM: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation XGeM: Ein Multi-Prompt-Stiftungsmodell für multimodale medizinische Datengenerierung XGeM:多式医疗数据多式生成多式医疗多模式基金会模式 2501.04614v3

Authors (8): Daniele Molino, Francesco Di Feola, Eliodoro Faiella, Deborah Fazzini, Domiziana Santucci, Linlin Shen, Valerio Guarrasi, Paolo Soda

The adoption of Artificial Intelligence in medical imaging holds great promise, yet it remains hindered by challenges such as data scarcity, privacy concerns, and the need for robust multimodal integration. While recent advances in generative modeling have enabled high-quality synthetic data generation, existing approaches are often limited to unimodal, unidirectional synthesis and therefore lack the ability to jointly synthesize multiple modalities while preserving clinical consistency. To address this challenge, we introduce XGeM, a 6.77-billion-parameter multimodal generative model designed to support flexible, any-to-any synthesis between medical data modalities. XGeM constructs a shared latent space via contrastive learning and introduces a novel Multi-Prompt Training strategy, enabling conditioning on arbitrary subsets of input modalities. This design allows the model to adapt to heterogeneous clinical inputs and generate multiple outputs jointly, preserving both semantic and structural coherence. We extensively validate XGeM: first we benchmark it against five competitors on the MIMIC-CXR dataset, a state-of-the-art dataset for multi-view Chest X-ray and radiological report generation. Secondly, we perform a Visual Turing Test with expert radiologists to assess the realism and clinical relevance of the generated data, ensuring alignment with real-world scenarios. Finally, we show how XGeM can support key medical data challenges such as anonymization, class imbalance, and data scarcity, underscoring its utility as a foundation model for medical data synthesis. Project page is at https://cosbidev.github.io/XGeM/.

在医学成像中采用人工智能很有希望,但是仍然受到诸如数据稀缺、隐私问题和健全多式联运一体化需要等挑战的阻碍。尽管最近在基因模型方面的进步使高质量的合成数据生成成为了高质量的合成数据,但现有方法往往仅限于单式、单向合成,因此缺乏在保持临床一致性的同时联合综合多种模式的能力。为了应对这一挑战,我们引入了XGeM,这是一个6.77亿公吨的多式联运变色模型,旨在支持医学数据模式之间的灵活、任何合成。XGeM通过对比学习构建一个共享的潜在空间,并引入了一个新的多功能培训战略,允许对任意的输入模式进行调控,但这种设计使模型能够适应各种临床投入和联合生成多种产出,同时保持语义和结构的一致性。我们广泛验证XGeM:首先我们用MIMIC-CXR数据集的5个竞争者作为基准,一个用于多视图X光/辐射报告生成的州级合成数据集级数据集。第二,我们用VeVal-Producal数据库来评估其真实的实用性数据库和最终数据定位。


Article 116

Title@2025-07-03 (4): Wildlife Target Re-Identification Using Self-supervised Learning in Non-Urban Settings

Title: Wildlife Target Re-Identification Using Self-supervised Learning in Non-Urban Settings Wildlife Target Re-Identification mit selbstüberwachtem Lernen in nicht-Urban-Einstellungen 在非城市环境中利用自我监督的学习重新确定野生动物目标 2507.02403v1

Authors (2): Mufhumudzi Muthivhi, Terence L. van Zyl

Wildlife re-identification aims to match individuals of the same species across different observations. Current state-of-the-art (SOTA) models rely on class labels to train supervised models for individual classification. This dependence on annotated data has driven the curation of numerous large-scale wildlife datasets. This study investigates self-supervised learning Self-Supervised Learning (SSL) for wildlife re-identification. We automatically extract two distinct views of an individual using temporal image pairs from camera trap data without supervision. The image pairs train a self-supervised model from a potentially endless stream of video data. We evaluate the learnt representations against supervised features on open-world scenarios and transfer learning in various wildlife downstream tasks. The analysis of the experimental results shows that self-supervised models are more robust even with limited data. Moreover, self-supervised features outperform supervision across all downstream tasks. The code is available here https://github.com/pxpana/SSLWildlife.

野生生物再识别旨在在不同观测中与同一物种的个人相匹配。 目前最新的艺术模型(SOTA)依靠类标签来培训受监督的个人分类模式。 这种依赖附加说明的数据促使大量大规模野生生物数据集的整理。 本研究调查了野生生物再识别的自监督学习自我监督学习(SSL ) 。 我们自动从没有监督的相机捕捉数据中提取两种不同的观点, 使用时间图像配对的个人。 图像配对从潜在的无尽视频数据流中培养一种自监督模型。 我们评估了在开放世界情景中学习到的外观特征, 并在各种野生生物下游任务中传授学习。 对实验结果的分析显示,即使有有限的数据, 自监督模型也更加健全。 此外, 自我监督的功能在所有下游任务中都超越了监督功能。 代码可在这里查阅 https://github.com/pxpana/SSLWildfile。


Article 117

Title@2025-07-03 (4): TABNet: A Triplet Augmentation Self-Recovery Framework with Boundary-Aware Pseudo-Labels for Medical Image Segmentation

Title: TABNet: A Triplet Augmentation Self-Recovery Framework with Boundary-Aware Pseudo-Labels for Medical Image Segmentation TABNet: Ein Triplet Augmentation Self-Recovery Framework mit Boundary-Aware Pseudo-Labors für die Segmentierung medizinischer Bilder TABNet:三联增强自我恢复框架,配有医疗图像分割的边界-软件软件-标签 2507.02399v1

Authors (8): Peilin Zhang, Shaouxan Wua, Jun Feng, Zhuo Jin, Zhizezhang Gao, Jingkun Chen, Yaqiong Xing, Xiao Zhang

Background and objective: Medical image segmentation is a core task in various clinical applications. However, acquiring large-scale, fully annotated medical image datasets is both time-consuming and costly. Scribble annotations, as a form of sparse labeling, provide an efficient and cost-effective alternative for medical image segmentation. However, the sparsity of scribble annotations limits the feature learning of the target region and lacks sufficient boundary supervision, which poses significant challenges for training segmentation networks. Methods: We propose TAB Net, a novel weakly-supervised medical image segmentation framework, consisting of two key components: the triplet augmentation self-recovery (TAS) module and the boundary-aware pseudo-label supervision (BAP) module. The TAS module enhances feature learning through three complementary augmentation strategies: intensity transformation improves the model’s sensitivity to texture and contrast variations, cutout forces the network to capture local anatomical structures by masking key regions, and jigsaw augmentation strengthens the modeling of global anatomical layout by disrupting spatial continuity. By guiding the network to recover complete masks from diverse augmented inputs, TAS promotes a deeper semantic understanding of medical images under sparse supervision. The BAP module enhances pseudo-supervision accuracy and boundary modeling by fusing dual-branch predictions into a loss-weighted pseudo-label and introducing a boundary-aware loss for fine-grained contour refinement. Results: Experimental evaluations on two public datasets, ACDC and MSCMR seg, demonstrate that TAB Net significantly outperforms state-of-the-art methods for scribble-based weakly supervised segmentation. Moreover, it achieves performance comparable to that of fully supervised methods.

医学图象分解是各种临床应用的核心任务。然而,获取大规模、充分附加说明的医疗图象数据集既耗时又费钱。作为稀释标签的一种形式,缩略图为医疗图象分解提供了高效且具有成本效益的替代方法。然而,缩略图解析的宽度限制了目标区域的特征学习,并缺乏足够的边界监督,对培训分解网络构成重大挑战。方法:我们提议TAB Net,这是一个新颖的、不受严密监督的医疗图象分解框架,由两个关键组成部分组成:三重增强自我恢复模块和边界观测假标签监督模块。TAS 模块通过三种互补强化战略强化特征学习:强度转换提高了模型对纹质和对比变异的敏感性,切断网络通过掩蔽关键区域来捕捉本地解剖结构,而缩略图增强结构则通过破坏空间连续性来强化全球解剖图理结构的模型。 指导网络从多重智能自我分析自我恢复的缩缩略图象仪表,TAS 将深度分析模型化的精确性模型引入双轨的模型,通过深度分析模型分析模型分析模型分析模型进行更深的深度分析。 IM路段,通过深度的深度的深度分析,使SLSLSLSLSLS-RIMS-S-LS-S-S-S-S-S-S-S-S-S-SD-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-R-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-


Article 118

Title@2025-07-03 (4): Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement

Title: Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement Hinterer Übergang Modellierung für unüberwachte Diffusions-basierte Sprachverbesserung 未经监督的基于传播的语音增强的外向过渡建模 2507.02391v1

Authors (4): Mostafa Sadeghi, Jean-Eudes Ayilo, Romain Serizel, Xavier Alameda-Pineda

We explore unsupervised speech enhancement using diffusion models as expressive generative priors for clean speech. Existing approaches guide the reverse diffusion process using noisy speech through an approximate, noise-perturbed likelihood score, combined with the unconditional score via a trade-off hyperparameter. In this work, we propose two alternative algorithms that directly model the conditional reverse transition distribution of diffusion states. The first method integrates the diffusion prior with the observation model in a principled way, removing the need for hyperparameter tuning. The second defines a diffusion process over the noisy speech itself, yielding a fully tractable and exact likelihood score. Experiments on the WSJ0-QUT and VoiceBank-DEMAND datasets demonstrate improved enhancement metrics and greater robustness to domain shifts compared to both supervised and unsupervised baselines.

我们利用传播模型来探索未经监督的语音增强,作为清洁言语的表达式基因前缀; 现有方法通过近似、噪音隔热的可能性评分来指导反向传播过程,同时通过交换超参数进行无条件评分; 在这项工作中,我们提出两种替代算法,直接模拟传播状态的有条件反向过渡分布。 第一种方法以原则方式将先前的传播与观察模型结合起来,从而消除超参数调的需要。 第二种方法界定了在吵闹的言语本身上的传播过程,产生了完全可移植和准确的概率评分。 WSJ0-QUT 和 VoiceBank-DEMAND 数据集的实验表明,相对于监督和不受监督的基线而言,改进了增强度度和对域转移的可靠性。


Article 119

Title@2025-07-03 (4): MInCo: Mitigating Information Conflicts in Distracted Visual Model-based Reinforcement Learning

Title: MInCo: Mitigating Information Conflicts in Distracted Visual Model-based Reinforcement Learning MInCo: Informationskonflikte im distraktierten visuellen Modell-basierten Verstärkungslernen abmildern MINCO: 减缓视觉模型化强化学习中的信息冲突 2504.04164v3

Authors (7): Shiguang Sun, Hanbo Zhang, Zeyang Liu, Xinrui Yang, Lipeng Wan, Xingyu Chen, Xuguang Lan

Existing visual model-based reinforcement learning (MBRL) algorithms with observation reconstruction often suffer from information conflicts, making it difficult to learn compact representations and hence result in less robust policies, especially in the presence of task-irrelevant visual distractions. In this paper, we first reveal that the information conflicts in current visual MBRL algorithms stem from visual representation learning and latent dynamics modeling with an information-theoretic perspective. Based on this finding, we present a new algorithm to resolve information conflicts for visual MBRL, named MInCo, which mitigates information conflicts by leveraging negative-free contrastive learning, aiding in learning invariant representation and robust policies despite noisy observations. To prevent the dominance of visual representation learning, we introduce time-varying reweighting to bias the learning towards dynamics modeling as training proceeds. We evaluate our method on several robotic control tasks with dynamic background distractions. Our experiments demonstrate that MInCo learns invariant representations against background noise and consistently outperforms current state-of-the-art visual MBRL methods. Code is available at https://github.com/ShiguangSun/minco.

通过观察重建的现有视觉模型强化学习(MBRL)算法往往受到信息冲突的影响,使得难以学习压缩的表述方式,从而导致不那么稳健的政策,特别是在存在与任务相关的视觉分流的情况下。在本文中,我们首先揭示,当前视觉模型强化算法中的信息冲突源于以信息理论视角建模的视觉代表学习和潜在动态模型。基于这一发现,我们为视觉模型MBRL(名为MINCO)提出了一个新的算法以解决信息冲突,它通过利用消极的对比学习来缓解信息冲突,帮助学习变异代表制和强健健的政策,尽管观察吵闹。为了防止视觉代表学习的主导地位,我们引入时间的加权,将学习偏重于动态模型作为培训的收益。我们评估了我们用动态背景分散的几项机器人控制任务的方法。我们的实验表明,MInCo对背景噪音学习了异性表达方式,并持续超越了当前状态的视觉MBRL方法。代码可在 https://github.com/ShigwangSun/min 。


Article 120

Title@2025-07-03 (4): Sparse Gaussian Processes: Structured Approximations and Power-EP Revisited

Title: Sparse Gaussian Processes: Structured Approximations and Power-EP Revisited Sparse Gaussian Processes: Strukturierte Annäherungen und Power-EP Revisited 粗浅的高斯进程:结构相近和电源-EP 重新审视 2507.02377v1

Authors (2): Thang D. Bui, Michalis K. Titsias

Inducing-point-based sparse variational Gaussian processes have become the standard workhorse for scaling up GP models. Recent advances show that these methods can be improved by introducing a diagonal scaling matrix to the conditional posterior density given the inducing points. This paper first considers an extension that employs a block-diagonal structure for the scaling matrix, provably tightening the variational lower bound. We then revisit the unifying framework of sparse GPs based on Power Expectation Propagation (PEP) and show that it can leverage and benefit from the new structured approximate posteriors. Through extensive regression experiments, we show that the proposed block-diagonal approximation consistently performs similarly to or better than existing diagonal approximations while maintaining comparable computational costs. Furthermore, the new PEP framework with structured posteriors provides competitive performance across various power hyperparameter settings, offering practitioners flexible alternatives to standard variational approaches.

基于点的微小变异高山进程已成为扩大GP模型的标准工作马。 最近的进展表明,这些方法可以通过引入一个对等缩放矩阵来改进,即根据诱导点对有条件的后部密度引入一个对等缩放矩阵。本文件首先考虑对缩放矩阵采用区块对角结构的扩展,可以明显地收紧变差下限。然后我们重新审视以Power Expectation proparation(PEP)为基础的稀释组合组合组合组合组合的统一框架,并表明它能够发挥杠杆作用并受益于新的结构化近似后部。通过广泛的回归实验,我们表明,拟议的块对角近似值在保持可比计算成本的同时始终与现有的对角近似或更好。此外,带有结构化后部的新的PEPEP框架在各种电源超光度环境中都具有竞争性性能,为从业者提供了标准变异方法的灵活替代方法。


Article 121

Title@2025-07-03 (4): Gateformer: Advancing Multivariate Time Series Forecasting through Temporal and Variate-Wise Attention with Gated Representations

Title: Gateformer: Advancing Multivariate Time Series Forecasting through Temporal and Variate-Wise Attention with Gated Representations Gateformer: Multivariate Zeitreihen durch Temporale und Variate-Wise-Achtung mit gezahnten Darstellungen prognostizieren Gatefex:通过时空和蒸汽预报推进多变时间序列 2505.00307v3

Authors (2): Yu-Hsiang Lan, Eric K. Oermann

There has been a recent surge of interest in time series modeling using the Transformer architecture. However, forecasting multivariate time series with Transformer presents a unique challenge as it requires modeling both temporal (cross-time) and variate (cross-variate) dependencies. While Transformer-based models have gained popularity for their flexibility in capturing both sequential and cross-variate relationships, it is unclear how to best integrate these two sources of information in the context of the Transformer architecture while optimizing for both performance and efficiency. We re-purpose the Transformer architecture to effectively model both cross-time and cross-variate dependencies. Our approach begins by embedding each variate independently into a variate-wise representation that captures its cross-time dynamics, and then models cross-variate dependencies through attention mechanisms on these learned embeddings. Gating operations in both cross-time and cross-variate modeling phases regulate information flow, allowing the model to focus on the most relevant features for accurate predictions. Our method achieves state-of-the-art performance across 13 real-world datasets and can be seamlessly integrated into other Transformer-based and LLM-based forecasters, delivering performance improvements up to 20.7\% over original models. Code is available at this repository: https://github.com/nyuolab/Gateformer.

最近对使用变换器结构的时间序列建模的兴趣激增。然而,与变换器一起预测多变时间序列是一个独特的挑战,因为它需要同时建模时间(跨时间)和变异(跨变量)依赖。虽然以变换器为基础的模型在捕捉相继和跨变量关系方面的灵活性越来越受欢迎,但不清楚如何在变换器结构中最好地整合这两个信息来源,同时优化性能和效率。我们重新利用变换器结构有效地建模跨时间和跨变量依赖性。我们的方法是将每种变异独立嵌入一个可捕捉其跨时间动态的变异式代表制,然后通过这些学习嵌入的注意机制来建模跨变量依赖性。将跨时间和跨变量建模阶段的操作进行调节信息流动,使该模型能够侧重于准确预测的最相关特征。我们的方法是在整个13个实体数据集和跨变量的跨时间和跨变量依赖性。我们的方法首先将每一种变异式模型独立嵌入一个反映其跨时间动态的表达式(跨时间)和变异性代表制(跨时间),然后通过这些学习嵌的嵌化机制建模型来模拟跨变量依赖性模型。


Article 122

Title@2025-07-03 (4): Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning

Title: Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning Gute Darstellung, bessere Erklärung: Rolle konvolutionärer neuraler Netzwerke bei transformerbasierter Fernerkundung Bildunterschriften 良好的表述,更好的解释:进化神经网络在以变换器为基础的遥感图像描述中的作用 2502.16095v2

Authors (4): Swadhin Das, Saarthak Gupta, Kamal Kumar, Raksha Sharma

Remote Sensing Image Captioning (RSIC) is the process of generating meaningful descriptions from remote sensing images. Recently, it has gained significant attention, with encoder-decoder models serving as the backbone for generating meaningful captions. The encoder extracts essential visual features from the input image, transforming them into a compact representation, while the decoder utilizes this representation to generate coherent textual descriptions. Recently, transformer-based models have gained significant popularity due to their ability to capture long-range dependencies and contextual information. The decoder has been well explored for text generation, whereas the encoder remains relatively unexplored. However, optimizing the encoder is crucial as it directly influences the richness of extracted features, which in turn affects the quality of generated captions. To address this gap, we systematically evaluate twelve different convolutional neural network (CNN) architectures within a transformer-based encoder framework to assess their effectiveness in RSIC. The evaluation consists of two stages: first, a numerical analysis categorizes CNNs into different clusters, based on their performance. The best performing CNNs are then subjected to human evaluation from a human-centric perspective by a human annotator. Additionally, we analyze the impact of different search strategies, namely greedy search and beam search, to ensure the best caption. The results highlight the critical role of encoder selection in improving captioning performance, demonstrating that specific CNN architectures significantly enhance the quality of generated descriptions for remote sensing images. By providing a detailed comparison of multiple encoders, this study offers valuable insights to guide advances in transformer-based image captioning models.

遥感图像显示( RSIC) 是从遥感图像中生成有意义的描述的过程。 最近, 解码器的解码器已经得到了人们的极大关注, 编码器的解码器模型是生成有意义标题的骨干。 编码器从输入图像中提取基本的视觉特征, 将其转换成一个缩略语, 而解码器则利用这种表达方式来生成一致的文字描述。 最近, 以变压器为基础的模型因其能够捕捉远程依赖性和背景信息而获得了显著的受欢迎程度。 解码器的解码器已经很好地探索了文本生成, 而编码器的解码器仍然相对没有被探索。 然而, 优化多码器的解码器是关键的关键, 因为它直接影响到提取的功能的丰富性, 这反过来影响到所生成的字幕的质量。 为了缩小这一差距, 我们系统评估了12个不同的革命神经网络结构, 以基于变压器的编码框架来评估其在 。 评估分为两个阶段: 首先, 将CNN 的解码器的解析器的解析器的解析器根据它们的性表现, , , 最精确的解析器的解析器的解析器的解析器的精略 , , 从人类的精细的精度的精度, 分析, 从人类的精细分析, 分析, 从人类的精细分析, 从人类的精细分析, 分析, 从人类的精细的精细分析, 分析, 分析, 分析, 分析, 分析, 从人类的精细到 分析, 分析, 分析, 分析, 分析, 分析, 分析, 从人类的 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析, 分析,


Article 123

Title@2025-07-03 (4): Exploring the Integration of Large Language Models in Industrial Test Maintenance Processes

Title: Exploring the Integration of Large Language Models in Industrial Test Maintenance Processes Erforschung der Integration von großen Sprachmodellen in industrielle Testwartungsprozesse 探索将大语言模型纳入工业试验维护工艺 2409.06416v2

Authors (6): Jingxiong Liu, Ludvig Lemner, Linnea Wahlgren, Gregory Gay, Nasser Mohammadiha, Joakim Wennerberg

Much of the cost and effort required during the software testing process is invested in performing test maintenance - the addition, removal, or modification of test cases to keep the test suite in sync with the system-under-test or to otherwise improve its quality. Tool support could reduce the cost - and improve the quality - of test maintenance by automating aspects of the process or by providing guidance and support to developers. In this study, we explore the capabilities and applications of large language models (LLMs) - complex machine learning models adapted to textual analysis - to support test maintenance. We conducted a case study at Ericsson AB where we explore the triggers that indicate the need for test maintenance, the actions that LLMs can take, and the considerations that must be made when deploying LLMs in an industrial setting. We also propose and demonstrate a multi-agent architecture that can predict which tests require maintenance following a change to the source code. Collectively, these contributions advance our theoretical and practical understanding of how LLMs can be deployed to benefit industrial test maintenance processes.

软件测试过程中所需的大部分成本和精力都用于进行测试维护 – – 添加、删除或修改测试案例,使测试套件与测试中的系统保持同步,或以其他方式提高测试套件的质量; 工具支持可以降低测试维护的成本,并提高质量,办法是使过程的各方面自动化,或向开发者提供指导和支持; 在这项研究中,我们探讨了大型语言模型(LLMs) – – 适合于文字分析的复杂机器学习模型 – – 的能力和应用,以支持测试维护; 我们在Ericsson AB进行了一项案例研究,我们在该研究中探讨了显示需要测试维护的触发因素、LLMs可以采取的行动,以及在工业环境中部署LMs时必须作出的考虑; 我们还提出并展示了一种多工具结构,可以预测在源码改变后哪些测试需要维护; 共同而言,这些贡献提高了我们对如何部署LMs以有利于工业测试维护过程的理论和实践理解。


Article 124

Title@2025-07-03 (4): Deep Reinforcement Learning-Based DRAM Equalizer Parameter Optimization Using Latent Representations

Title: Deep Reinforcement Learning-Based DRAM Equalizer Parameter Optimization Using Latent Representations Tiefenverstärkte DRAM-Equalizer-Parameter-Optimierung mit latenten Darstellungen 利用延迟代表方式优化深度强化学习-基于深强化的 DRAM 平准参数 2507.02365v1

Authors (2): Muhammad Usama, Dong Eui Chang

Equalizer parameter optimization for signal integrity in high-speed Dynamic Random Access Memory systems is crucial but often computationally demanding or model-reliant. This paper introduces a data-driven framework employing learned latent signal representations for efficient signal integrity evaluation, coupled with a model-free Advantage Actor-Critic reinforcement learning agent for parameter optimization. The latent representation captures vital signal integrity features, offering a fast alternative to direct eye diagram analysis during optimization, while the reinforcement learning agent derives optimal equalizer settings without explicit system models. Applied to industry-standard Dynamic Random Access Memory waveforms, the method achieved significant eye-opening window area improvements: 42.7\% for cascaded Continuous-Time Linear Equalizer and Decision Feedback Equalizer structures, and 36.8\% for Decision Feedback Equalizer-only configurations. These results demonstrate superior performance, computational efficiency, and robust generalization across diverse Dynamic Random Access Memory units compared to existing techniques. Core contributions include an efficient latent signal integrity metric for optimization, a robust model-free reinforcement learning strategy, and validated superior performance for complex equalizer architectures.

高速动态随机存取内存系统中信号完整性的等效参数优化至关重要,但通常在计算上要求很高或依赖模型。本文件引入了一个数据驱动框架,采用为高效信号完整性评价而学习的潜在潜在信号表示法,同时引入了一个为参数优化而学习的无模型的优劣动画强化剂。潜在表示法捕捉了关键信号完整性特征,为在优化过程中直接进行眼图分析提供了快速的替代方法,而强化学习剂则在没有明确系统模型的情况下获得了最佳平衡器设置。该方法适用于工业标准动态随机存取内存波形,实现了显著的开放窗口区域改进:42.7用于连续线性线性反应和决定反馈平准结构,36.8用于决策反馈平准配置。这些结果表明,与现有技术相比,各种动态随机存取内存单元的性效果优异、计算效率和稳健通用性。核心贡献包括高效的潜在信号完整性指标用于优化、强的无模型强化学习战略,并验证了复杂的平衡器结构的优性性性。


Article 125

Title@2025-07-03 (4): Offline Reinforcement Learning with Penalized Action Noise Injection

Title: Offline Reinforcement Learning with Penalized Action Noise Injection Offline-Verstärkung Lernen mit Penalized Action Noise Injection 用惩罚性行动噪音注射加强离线强化学习 2507.02356v1

Authors (2): JunHyeok Oh, Byung-Jun Lee

Offline reinforcement learning (RL) optimizes a policy using only a fixed dataset, making it a practical approach in scenarios where interaction with the environment is costly. Due to this limitation, generalization ability is key to improving the performance of offline RL algorithms, as demonstrated by recent successes of offline RL with diffusion models. However, it remains questionable whether such diffusion models are necessary for highly performing offline RL algorithms, given their significant computational requirements during inference. In this paper, we propose Penalized Action Noise Injection (PANI), a method that simply enhances offline learning by utilizing noise-injected actions to cover the entire action space, while penalizing according to the amount of noise injected. This approach is inspired by how diffusion models have worked in offline RL algorithms. We provide a theoretical foundation for this method, showing that offline RL algorithms with such noise-injected actions solve a modified Markov Decision Process (MDP), which we call the noisy action MDP. PANI is compatible with a wide range of existing off-policy and offline RL algorithms, and despite its simplicity, it demonstrates significant performance improvements across various benchmarks.

离线强化学习(RL)只使用固定数据集优化政策,使其成为在与环境互动费用昂贵的情况下的一种实用方法。由于这一限制,一般化能力是改善离线RL算法绩效的关键,最近离线RL在扩散模型方面的成功证明了这一点。然而,鉴于在推断过程中对高性能离线 RL 算法的重大计算要求,这种扩散模型对于高性能离线 RL 算法是否必要仍然值得怀疑。在本文件中,我们提议了惩罚性行动噪音注射(PANI),这种方法仅通过使用噪音注入行动覆盖整个行动空间来增强离线学习,同时根据注入的噪音数量进行处罚。这一方法受到扩散模型在离线RL 算法中如何发挥作用的启发。我们为这种方法提供了一个理论基础,表明带有这种噪音输入行动的离线式RL算法解决了修正的Markov 决策程序(MDP),我们称之为噪音行动MDP。PNI与现有的离线和离线性RL算法的广泛范围相容,尽管其简单,但它展示了各种明显的业绩。


Article 126

Title@2025-07-03 (4): Text-Aware Image Restoration with Diffusion Models

Title: Text-Aware Image Restoration with Diffusion Models Text-Aware Bildrestaurierung mit Diffusionsmodellen 带有传播模型的文本软件图像恢复 2506.09993v2

Authors (9): Jaewon Min, Jin Hyeon Kim, Paul Hyunbin Cho, Jaeeun Lee, Jihye Park, Minkyu Park, Sangpil Kim, Hyunhee Park, Seungryong Kim

Image restoration aims to recover degraded images. However, existing diffusion-based restoration methods, despite great success in natural image restoration, often struggle to faithfully reconstruct textual regions in degraded images. Those methods frequently generate plausible but incorrect text-like patterns, a phenomenon we refer to as text-image hallucination. In this paper, we introduce Text-Aware Image Restoration (TAIR), a novel restoration task that requires the simultaneous recovery of visual contents and textual fidelity. To tackle this task, we present SA-Text, a large-scale benchmark of 100K high-quality scene images densely annotated with diverse and complex text instances. Furthermore, we propose a multi-task diffusion framework, called TeReDiff, that integrates internal features from diffusion models into a text-spotting module, enabling both components to benefit from joint training. This allows for the extraction of rich text representations, which are utilized as prompts in subsequent denoising steps. Extensive experiments demonstrate that our approach consistently outperforms state-of-the-art restoration methods, achieving significant gains in text recognition accuracy. See our project page: https://cvlab-kaist.github.io/TAIR/

然而,现有的基于传播的恢复方法尽管在自然图像恢复方面取得了巨大成功,但往往努力在退化图像中忠实地重建文字区域。这些方法经常产生貌似合理但不正确的文本模式,我们称之为文字图像幻觉。在本文中,我们引入了文本-软件图像恢复(TAIR),这是一项需要同时恢复视觉内容和文字真实性的新颖的恢复任务。为了完成这项任务,我们提出了SA-Text,这是100K高品质图像的大规模基准,其高品质图像的高度注释,有多种复杂的文字实例。此外,我们提议了一个称为TereDiff的多任务式传播框架,将扩散模型的内部特征纳入文本定位模块,使这两个组成部分都能从联合培训中受益。这样可以提取丰富的文字表达方式,这些表达方式在随后的脱色步骤中被迅速使用。广泛的实验表明,我们的方法始终超越了最新艺术恢复方法,在文本识别中取得了显著的准确性。见我们的项目页面:https://cvlab-kaist.giubth.


Article 127

Title@2025-07-03 (4): Privacy-Preserving Operating Room Workflow Analysis using Digital Twins

Title: Privacy-Preserving Operating Room Workflow Analysis using Digital Twins Datenschutz-Bewahrung von Betriebsraum-Workflow-Analysen mit digitalen Zwillingen 利用数字双对进行工作流量分析 2504.12552v2

Authors (10): Alejandra Perez, Han Zhang, Yu-Chun Ku, Lalithkumar Seenivasan, Roger Soberanis, Jose L. Porras, Richard Day, Jeff Jopling, Peter Najjar, Mathias Unberath

The operating room (OR) is a complex environment where optimizing workflows is critical to reduce costs and improve patient outcomes. While computer vision approaches for automatic recognition of perioperative events can identify bottlenecks for OR optimization, privacy concerns limit the use of OR videos for automated event detection. We propose a two-stage pipeline for privacy-preserving OR video analysis and event detection. First, we leverage vision foundation models for depth estimation and semantic segmentation to generate de-identified Digital Twins (DT) of the OR from conventional RGB videos. Second, we employ the SafeOR model, a fused two-stream approach that processes segmentation masks and depth maps for OR event detection. Evaluation on an internal dataset of 38 simulated surgical trials with five event classes shows that our DT-based approach achieves performance on par with – and sometimes better than – raw RGB video-based models for OR event detection. Digital Twins enable privacy-preserving OR workflow analysis, facilitating the sharing of de-identified data across institutions and potentially enhancing model generalizability by mitigating domain-specific appearance differences.

操作室(OR)是一个复杂的环境,最优化工作流程对于降低成本和改善患者结果至关重要。自动识别周期性活动的计算机远景方法可以确定OR优化的瓶颈,但隐私问题限制了使用OR视频进行自动事件探测。我们提议了隐私保护视频分析和事件探测的两阶段管道。首先,我们利用深度估计和语义分割的愿景基础模型,从常规 RGB 视频中产生脱钩的OR数字双胞胎。第二,我们采用安全OR模式,即双流混合方法,用于处理分解遮罩和用于检测OR事件深度地图。对38个模拟外科手术试验的内部数据集进行分为5个活动类别的评价表明,我们基于DT的方法与原始 RGB 视频模型相当,有时比原始RGB 视频模型更好,用于检测OR 事件。数字双子能够进行隐私保护OR 工作流程分析,促进各机构间分享脱钩数据,并有可能通过减少特定区域外观差异而提高模型的通用性。


Article 128

Title@2025-07-03 (4): UniNet: A Unified Multi-granular Traffic Modeling Framework for Network Security

Title: UniNet: A Unified Multi-granular Traffic Modeling Framework for Network Security UniNet: Ein einheitliches Multi-Granular Traffic Modeling Framework für Netzwerksicherheit UniNet:网络安全统一多重点交通示范框架 2503.04174v2

Authors (3): Binghui Wu, Dinil Mon Divakaran, Mohan Gurusamy

As modern networks grow increasingly complex–driven by diverse devices, encrypted protocols, and evolving threats–network traffic analysis has become critically important. Existing machine learning models often rely only on a single representation of packets or flows, limiting their ability to capture the contextual relationships essential for robust analysis. Furthermore, task-specific architectures for supervised, semi-supervised, and unsupervised learning lead to inefficiencies in adapting to varying data formats and security tasks. To address these gaps, we propose UniNet, a unified framework that introduces a novel multi-granular traffic representation (T-Matrix), integrating session, flow, and packet-level features to provide comprehensive contextual information. Combined with T-Attent, a lightweight attention-based model, UniNet efficiently learns latent embeddings for diverse security tasks. Extensive evaluations across four key network security and privacy problems–anomaly detection, attack classification, IoT device identification, and encrypted website fingerprinting–demonstrate UniNet’s significant performance gain over state-of-the-art methods, achieving higher accuracy, lower false positive rates, and improved scalability. By addressing the limitations of single-level models and unifying traffic analysis paradigms, UniNet sets a new benchmark for modern network security.

随着各种装置、加密协议和不断变化的威胁-网络交通分析的日益复杂驱动的现代网络变得日益复杂,各种装置、加密协议和不断变化的威胁-网络交通分析已变得极其重要。现有的机器学习模式往往只依赖一个组合或流动的单一代表,从而限制了它们捕捉对稳健分析至关重要的背景关系的能力。此外,监督、半监督和不受监督的学习任务特定结构导致适应不同数据格式和安全任务方面的效率低下。为弥补这些差距,我们提议UniNet是一个统一框架,采用新的多组合交通代表(T-Matrix),整合会话、流量和包级功能,以提供全面的背景资料。与T-Aent、轻量度关注模式相结合,UniNet高效地学习了各种安全任务的潜在嵌入。对四个关键网络的安全和隐私问题进行了广泛的评价,如异常检测、攻击分类、IoT装置识别和加密网站指纹-验证United United United的特征识别,在州-艺术方法上取得了显著的业绩收益,实现了更高的准确性积极率,并改进了现代通信网络的可缩缩略性。通过统一性模式分析,统一了单一级的安全模式。


Article 129

Title@2025-07-03 (4): DeltaSHAP: Explaining Prediction Evolutions in Online Patient Monitoring with Shapley Values

Title: DeltaSHAP: Explaining Prediction Evolutions in Online Patient Monitoring with Shapley Values DeltaSHAP: Erklären von Vorhersageentwicklungen bei der Online-Patientenüberwachung mit Shapley-Werten DelsaSHAP: 解释在有阴影值的在线患者监测中的预测演变 2507.02342v1

Authors (4): Changhun Kim, Yechan Mun, Sangchul Hahn, Eunho Yang

This study proposes DeltaSHAP, a novel explainable artificial intelligence (XAI) algorithm specifically designed for online patient monitoring systems. In clinical environments, discovering the causes driving patient risk evolution is critical for timely intervention, yet existing XAI methods fail to address the unique requirements of clinical time series explanation tasks. To this end, DeltaSHAP addresses three key clinical needs: explaining the changes in the consecutive predictions rather than isolated prediction scores, providing both magnitude and direction of feature attributions, and delivering these insights in real time. By adapting Shapley values to temporal settings, our approach accurately captures feature coalition effects. It further attributes prediction changes using only the actually observed feature combinations, making it efficient and practical for time-sensitive clinical applications. We also introduce new evaluation metrics to evaluate the faithfulness of the attributions for online time series, and demonstrate through experiments on online patient monitoring tasks that DeltaSHAP outperforms state-of-the-art XAI methods in both explanation quality as 62% and computational efficiency as 33% time reduction on the MIMIC-III decompensation benchmark. We release our code at https://github.com/AITRICS/DeltaSHAP.

这项研究提出了三角洲SHAP(DeltaSHAP),这是专门为在线病人监测系统设计的新颖的、可解释的人工智能(XAI)算法。在临床环境中,发现导致病人风险演变的原因对于及时干预至关重要,但现有的XAI方法未能满足临床时间序列解释任务的独特要求。为此,三角洲SHAP满足了三大临床需求:解释连续预测的变化,而不是孤立的预测分数,提供特征属性归属的大小和方向,并实时提供这些洞察力。通过将沙普利值调整到时间设置,我们的方法准确地捕捉了特征联盟效应。它进一步将预测变化仅使用实际观测到的特征组合,使其对时间敏感的临床应用具有效率和实用性。我们还采用了新的评价指标,以评估在线时间序列属性的准确性,并通过在线病人监测任务实验证明,DeltaSHAP在解释质量方面超过了62%,计算效率为MIMIMIC-III脱混基准减少了33%。我们在 https://github.com/AICS/DSAD.


Article 130

Title@2025-07-03 (4): Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy

Title: Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Skywork-Reward-V2:通过人类-AI协同增强优先数据曲线 2507.01352v2

Authors (12): Chris Yuhao Liu, Liang Zeng, Yuzhen Xiao, Jujie He, Jiacai Liu, Chaojie Wang, Rui Yan, Wei Shen, Fuxiang Zhang, Jiacheng Xu, Yang Liu, Yahui Zhou

Despite the critical role of reward models (RMs) in reinforcement learning from human feedback (RLHF), current state-of-the-art open RMs perform poorly on most existing evaluation benchmarks, failing to capture the spectrum of nuanced and sophisticated human preferences. Even approaches that incorporate advanced training techniques have not yielded meaningful performance improvements. We hypothesize that this brittleness stems primarily from limitations in preference datasets, which are often narrowly scoped, synthetically labeled, or lack rigorous quality control. To address these challenges, we present a large-scale preference dataset comprising 40 million preference pairs, named SynPref-40M. To enable data curation at scale, we design a human-AI synergistic two-stage pipeline that leverages the complementary strengths of human annotation quality and AI scalability. In this pipeline, humans provide verified annotations, while large language models perform automatic curation based on human guidance. Training on this preference mixture, we introduce Skywork-Reward-V2, a suite of eight reward models ranging from 0.6B to 8B parameters, trained on a carefully curated subset of 26 million preference pairs from SynPref-40M. We demonstrate that Skywork-Reward-V2 is versatile across a wide range of capabilities, including alignment with human preferences, objective correctness, safety, resistance to stylistic biases, and best-of-N scaling, achieving state-of-the-art performance across seven major reward model benchmarks. Ablation studies confirm that the effectiveness of our approach stems not only from data scale but also from high-quality curation. The Skywork-Reward-V2 series represents substantial progress in open reward models, highlighting the untapped potential of existing preference datasets and demonstrating how human-AI curation synergy can unlock significantly higher data quality.

尽管奖励模式(RMs)在从人类反馈(RLHF)中强化学习方面发挥着关键作用,但目前最先进的公开RMs在多数现有评价基准方面表现不佳,未能捕捉到细微和复杂的人类偏好范围。即使采用先进培训技术的做法也没有带来有意义的绩效改进。我们推测,这种微弱主要来自优惠数据集的局限性,这些数据集往往范围狭窄,贴上合成标签,或缺乏严格的质量控制。为了应对这些挑战,我们提出了一个由4,000万对特惠的SynPref-40M组成的大规模公开优惠数据集。为了能够在规模上调整数据,我们设计了人类-AI两阶段的协同管道,利用了人类注解质量和AI可伸缩的互补优势。在这个管道中,人类提供经核实的说明,而大型语言模型则根据人文指南进行自动缩放。我们引入了Skywork-Reward-V2的组合,这是一套从0.6B到8B的奖赏基准,从Syprepref-ral-rol-rolation sload supal press relity sqreal lader lax lax lax lax lax lax ex s real dal dal lax lax lax lax a lax lax lax lax lax lax a lax lax


Article 131

Title@2025-07-03 (4): SecAlign: Defending Against Prompt Injection with Preference Optimization

Title: SecAlign: Defending Against Prompt Injection with Preference Optimization SecAlign: Abwehr gegen prompte Injektion mit Preference-Optimierung Secallign: 以优先优化防范迅速注射 2410.05451v3

Authors (6): Sizhe Chen, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, David Wagner, Chuan Guo

Large language models (LLMs) are becoming increasingly prevalent in modern software systems, interfacing between the user and the Internet to assist with tasks that require advanced language understanding. To accomplish these tasks, the LLM often uses external data sources such as user documents, web retrieval, results from API calls, etc. This opens up new avenues for attackers to manipulate the LLM via prompt injection. Adversarial prompts can be injected into external data sources to override the system’s intended instruction and instead execute a malicious instruction. To mitigate this vulnerability, we propose a new defense called SecAlign based on the technique of preference optimization. Our defense first constructs a preference dataset with prompt-injected inputs, secure outputs (ones that respond to the legitimate instruction), and insecure outputs (ones that respond to the injection). We then perform preference optimization on this dataset to teach the LLM to prefer the secure output over the insecure one. This provides the first known method that reduces the success rates of various prompt injections to <10%, even against attacks much more sophisticated than ones seen during training. This indicates our defense generalizes well against unknown and yet-to-come attacks. Also, SecAlign models are still practical with similar utility to the one before defensive training in our evaluations. Our code is at https://github.com/facebookresearch/SecAlign

大型语言模型(LLMS)在现代软件系统中日益流行,用户与互联网之间相互连接,协助完成需要高级语言理解的任务。为完成这些任务,LLM经常使用外部数据源,如用户文件、网络检索、API电话等。这为攻击者通过迅速注射操作LLM开辟了新的途径。反向提示可以注入外部数据源,以推翻系统预定的指令,而代之以恶意教学。为了减轻这种脆弱性,我们提议以优化优惠技术为基础,建立一个称为SecAlign的新防御系统。我们的防御系统首先以快速输入的投入、安全输出(符合合法指令的输出)和不安全输出(响应注射的输出)来构建首选数据集。然后我们优化这一数据集,让LLM更喜欢安全输出而不是不安全的输出。这提供了第一种已知的方法,将各种快速注入的成功率降低到 < 10%,甚至针对比培训期间看到的更复杂的攻击。这说明我们的防御系统在实际搜索之前仍然对未知的和防御性模型进行一般的通用的搜索。Secregial-tocom攻击中,Segration a pregration with vitutional deview of we preficial fal destral deview of extitutional deview and expecidudududustrational exmentaltituductions.


Article 132

Title@2025-07-03 (4): Path Planning using a One-shot-sampling Skeleton Map

Title: Path Planning using a One-shot-sampling Skeleton Map Pfadplanung mit einer One-Shot-Sampling-Skeleton-Karte 使用单点抽样的 Skeleton 地图进行路径规划 2507.02328v1

Authors (3): Gabriel O. Flores-Aquino, Octavio Gutierrez-Frias, Juan Irving Vasquez

Path planning algorithms aim to compute a collision-free path, and many works focus on finding the optimal distance path. However, for some applications, a more suitable approach is to balance response time, safety of the paths, and path length. In this context, a skeleton map is a useful tool in graph-based schemes, as it provides an intrinsic representation of free configuration space. However, skeletonization algorithms are very resource-intensive, being primarily oriented towards image processing tasks. We propose an efficient path-planning methodology that finds safe paths within an acceptable processing time. This methodology leverages a Deep Denoising Auto-Encoder (DDAE) based on U-Net architecture to compute a skeletonized version of the navigation map, which we refer to as SkelUnet. The SkelUnet network facilitates exploration of the entire workspace through one-shot sampling (OSS), as opposed to the iterative process used by exact algorithms or the probabilistic sampling process. SkelUnet is trained and tested on a dataset consisting of 12,500 bi-dimensional dungeon maps. The motion planning methodology is evaluated in a simulation environment for an Unmanned Aerial Vehicle (UAV) using 250 previously unseen maps, and assessed with various navigation metrics to quantify the navigability of the computed paths. The results demonstrate that using SkelUnet to construct a roadmap offers significant advantages, such as connecting all regions of free workspace, providing safer paths, and reducing processing times. These characteristics make this method particularly suitable for mobile service robots in structured environments.

路径规划算法旨在计算无碰撞路径,许多工作的重点是寻找最佳距离路径。然而,对于某些应用,更合适的方法是平衡响应时间、路径安全和路径长度。在这方面,骨架地图是基于图形的图法中有用的工具,因为它提供了自由配置空间的内在代表。然而,骨架化算法非常资源密集,主要面向图像处理任务。我们建议一种有效的路径规划方法,在可接受的处理时间内找到安全路径。这一方法利用基于U-Net结构的深 Denoising Auto-Encoder(DADEE)来计算导航图的骨架化版本,我们称之为SkelUnet。SkelUnet网络为整个工作空间的探索提供了有用的工具,因为它提供了一张直径抽样(OSSOSS),而不是精确算法或概率取样程序所使用的迭接进程。SkelUnet在一个由12 500双维的地名录构成的数据集中找到安全路径。在模拟环境中评估运动规划方法,用以将一个不正规的导航路径(Skeloyal road road road road road) 区域进行更精确的流程,并评估,以Sklibilalalal roadtravelil 和Skal roadal roadal roaddal roadal 提供各种系统,以Skildalbalbildalbildalbildalbildalbildaldaldaldaldal 。


Article 133

Title@2025-07-03 (4): High-Performance Reinforcement Learning on Spot: Optimizing Simulation Parameters with Distributional Measures

Title: High-Performance Reinforcement Learning on Spot: Optimizing Simulation Parameters with Distributional Measures Leistungsstarkes Stärkungslernen vor Ort: Simulationsparameter mit distributiven Maßnahmen optimieren 高绩效加强现场学习:以分配措施优化模拟参数 2504.17857v3

Authors (4): AJ Miller, Fangzhou Yu, Michael Brauckmann, Farbod Farshidian

This work presents an overview of the technical details behind a high performance reinforcement learning policy deployment with the Spot RL Researcher Development Kit for low level motor access on Boston Dynamics Spot. This represents the first public demonstration of an end to end end reinforcement learning policy deployed on Spot hardware with training code publicly available through Nvidia IsaacLab and deployment code available through Boston Dynamics. We utilize Wasserstein Distance and Maximum Mean Discrepancy to quantify the distributional dissimilarity of data collected on hardware and in simulation to measure our sim2real gap. We use these measures as a scoring function for the Covariance Matrix Adaptation Evolution Strategy to optimize simulated parameters that are unknown or difficult to measure from Spot. Our procedure for modeling and training produces high quality reinforcement learning policies capable of multiple gaits, including a flight phase. We deploy policies capable of over 5.2ms locomotion, more than triple Spots default controller maximum speed, robustness to slippery surfaces, disturbance rejection, and overall agility previously unseen on Spot. We detail our method and release our code to support future work on Spot with the low level API.

这项工作概述了高性能强化学习政策部署背后的技术细节,与波士顿动力点上低级发动机接入的Spot RL 研究者开发工具包一起,为波士顿动力点上低级发动机设置了高性能强化学习政策,这是第一次公开展示结束在Spot硬件上部署的强化学习政策,该培训守则通过Nvidia IsaacLab公开提供,部署守则通过波士顿动力站提供。我们利用Wasserstein距离和最大中值差异来量化硬件和模拟所收集的数据的分布差异,以衡量我们的模拟差距。我们利用这些措施作为变量矩阵适应进化战略的评分功能,优化从点上未知或难以测量的模拟参数。我们的建模和培训程序产生了高质量的强化学习政策,能够有多个格子,包括飞行阶段。我们部署的政策能够超过5.2Ms Locotion,超过三倍点默认控制器的最大速度,对滑滑透表面的坚固性,扰动阻力,以及先前在点上看不见的总体灵活性。我们详细介绍了我们的方法,并公布了我们的代码,以支持今后与低级API的点的工作。


Article 134

Title@2025-07-03 (4): Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach

Title: Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach Ausrichten von gefrorenen LLMs durch Verstärkungslernen: Ein iteratives Reweight-then-Optimize-Ansatz 通过强化学习将冻结的LLMs与 “ 强化学习:一种过渡性再加权再优化方法 “ 相匹配 2506.17828v2

Authors (9): Xinnan Zhang, Chenliang Li, Siliang Zeng, Jiaxiang Li, Zhongruo Wang, Kaixiang Lin, Songtao Lu, Alfredo Garcia, Mingyi Hong

Aligning large language models (LLMs) with human preferences usually requires fine-tuning methods such as RLHF and DPO. These methods directly optimize the model parameters, so they cannot be used in test-time to improve model performance, nor are they applicable when the model weights are not accessible. In contrast, test-time methods sidestep weight updates by leveraging reward functions to guide and improve output quality. However, they incur high inference costs, and their one-shot guidance is often based on imperfect reward or value functions, leading to suboptimal outputs. In this work, we present a method named Iterative Reweight-then-Optimize (IRO), a reinforcement learning (RL) framework that performs RL-style alignment of the (frozen) base model without touching its parameters. During training, each iteration (i) samples candidates from the base model, (ii) resamples using current value functions, and (iii) trains a new lightweight value function that guides the next decoding pass. At test time, the value functions are used to guide the base model generation via a search-based optimization process. Notably, users can apply IRO to align a model on their own dataset, similar to OpenAI’s reinforcement fine-tuning (RFT), but without requiring access to the model weights.

将大型语言模型(LLMS)与人类偏好对齐通常要求微调方法,如 RLHF 和 DPO 。 这些方法直接优化模型参数。 这些方法直接优化模型参数, 因而无法在测试时用于改进模型性能, 当模型重量无法使用时, 也不适用这些参数。 相反, 测试时间方法通过利用奖赏功能来更新边际权重, 以引导和改进产出质量, 产生高的推论成本, 他们的单向指导通常基于不完善的奖赏或价值功能, 导致次优化产出。 在这项工作中, 我们提出了一个名为“ 迭代超重时优化( IRO) ” 的方法, 一个强化学习( RL) 框架, 用来在不触动模型参数的情况下进行 RL 型对齐 。 相比之下, 测试期间, 测试对象( i) 利用当前价值函数进行抽样, 以及 (iii) 培训一个新的轻度值值值值值值值值函数, 引导下一个解码。 测试时, 将值函数用于指导基础模型生成, 模型生成, 通过基于基于搜索的搜索- RFSBSE , 。 用户可以应用到 IP AS AS 。


Article 135

Title@2025-07-03 (4): Transformer-based EEG Decoding: A Survey

Title: Transformer-based EEG Decoding: A Survey Transformer-basierte EEG-Dekodierung: Eine Umfrage 以变换器为基础的电子EEG编码:调查 2507.02320v1

Authors (2): Haodong Zhang, Hongqi Li

Electroencephalography (EEG) is one of the most common signals used to capture the electrical activity of the brain, and the decoding of EEG, to acquire the user intents, has been at the forefront of brain-computer/machine interfaces (BCIs/BMIs) research. Compared to traditional EEG analysis methods with machine learning, the advent of deep learning approaches have gradually revolutionized the field by providing an end-to-end long-cascaded architecture, which can learn more discriminative features automatically. Among these, Transformer is renowned for its strong handling capability of sequential data by the attention mechanism, and the application of Transformers in various EEG processing tasks is increasingly prevalent. This article delves into a relevant survey, summarizing the latest application of Transformer models in EEG decoding since it appeared. The evolution of the model architecture is followed to sort and organize the related advances, in which we first elucidate the fundamentals of the Transformer that benefits EEG decoding and its direct application. Then, the common hybrid architectures by integrating basic Transformer with other deep learning techniques (convolutional/recurrent/graph/spiking neural netwo-rks, generative adversarial networks, diffusion models, etc.) is overviewed in detail. The research advances of applying the modified intrinsic structures of customized Transformer have also been introduced. Finally, the current challenges and future development prospects in this rapidly evolving field are discussed. This paper aims to help readers gain a clear understanding of the current state of Transformer applications in EEG decoding and to provide valuable insights for future research endeavors.

与传统的电子计算机/机械界面(BCIS/BMIS)研究相比,深层学习方法的出现通过提供端到端的长层结构逐渐使这个领域发生革命,这种结构可以自动学习更具有歧视性的特征。其中,变异器因其通过关注机制处理连续数据的能力很强而闻名,而变异器在各种EEEG处理任务中应用各种变异器的帮助性能力越来越普遍。这篇文章进入了相关调查中,总结了电子计算机/机器界面(BCIS/BIS/BIS)自出现以来的变异器模型的最新应用情况。模型的演变有助于整理和整理相关进展,我们首先阐述了变异器的基本原理,有利于EEEG的分解及其直接应用。然后,普通混合结构通过将基本变异变器与其他深学习技术整合起来(变异器/子/变异变异器/变异式变异式变异数据应用各种未来变异数据应用) 变异变异变变变异器研究结构的当前变异/变异性变变异性变异性变变变变变变变变变变式研究结构, 变变变变式研究结构的变变变变变变变变变变变变变变变变变变式研究结构的变变变式研究结构的变变变变变变变变变式研究结构结构结构结构结构结构结构结构结构结构结构在基因变型结构结构结构结构的变变变变变式研究结构结构结构研究结构结构结构结构结构结构的变变变式结构结构中也应用了基因变变变变变变变变变变变式研究结构的变变变变式结构的变变变变变变变变变变变变变变变式结构的变式结构的变变式结构的变变变变变变变变变变变变式结构结构结构的变变变式结构结构结构结构结构结构结构结构结构结构结构的变式结构的变变变变式结构的变变变变变变变变变变变变变变变变变变变变变变式结构的变变变变变变变式结构变变变变变变变变变变变变变式结构,在变变变变变变变变变变变变


Article 136

Title@2025-07-03 (4): Generalization vs. Specialization under Concept Shift

Title: Generalization vs. Specialization under Concept Shift Verallgemeinerung vs. Spezialisierung unter Konzeptverschiebung 普遍化与概念转变下的专业化 2409.15582v2

Authors (3): Alex Nguyen, David J. Schwab, Vudtiwat Ngampruetikorn

Machine learning models are often brittle under distribution shift, i.e., when data distributions at test time differ from those during training. Understanding this failure mode is central to identifying and mitigating safety risks of mass adoption of machine learning. Here we analyze ridge regression under concept shift – a form of distribution shift in which the input-label relationship changes at test time. We derive an exact expression for prediction risk in the thermodynamic limit. Our results reveal nontrivial effects of concept shift on generalization performance, including a phase transition between weak and strong concept shift regimes and nonmonotonic data dependence of test performance even when double descent is absent. Our theoretical results are in good agreement with experiments based on transformers pretrained to solve linear regression; under concept shift, too long context length can be detrimental to generalization performance of next token prediction. Finally, our experiments on MNIST and FashionMNIST suggest that this intriguing behavior is present also in classification problems.

机器学习模型往往在分布式转换时是易碎的, 也就是说, 当测试时的数据分布与培训期间的数据分布不同时。 了解这一失败模式对于确定和减轻大规模采用机器学习的安全风险至关重要。 在这里, 我们分析概念转变下的山脊回归 – – 一种在测试时输入标签关系变化的分布式转变形式。 我们得出热力极限预测风险的准确表达方式。 我们的结果显示概念转变对一般化性能的非边际效应, 包括弱强概念转变制度之间的阶段过渡, 以及测试性能的非热量数据依赖性, 即使不存在双向下降。 我们的理论结果与基于变压器的实验是完全一致的; 在概念转变下, 过长的背景长度可能不利于下一个象征性预测的概括性表现。 最后, 我们关于MNIST 和 FashionMNIST的实验表明, 在分类问题中也存在这种诱人的行为 。


Article 137

Title@2025-07-03 (4): Improving Constrained Generation in Language Models via Self-Distilled Twisted Sequential Monte Carlo

Title: Improving Constrained Generation in Language Models via Self-Distilled Twisted Sequential Monte Carlo Verbesserung der eingeschränkten Generierung in Sprachmodellen durch selbstdestillierte Twisted Sequential Monte Carlo 通过自行筛选的蒙蒙特卡洛顺序改进语言模型中受约束的一代 2507.02315v1

Authors (3): Sooyeon Kim, Giung Nam, Juho Lee

Recent work has framed constrained text generation with autoregressive language models as a probabilistic inference problem. Among these, Zhao et al. (2024) introduced a promising approach based on twisted Sequential Monte Carlo, which incorporates learned twist functions and twist-induced proposals to guide the generation process. However, in constrained generation settings where the target distribution concentrates on outputs that are unlikely under the base model, learning becomes challenging due to sparse and uninformative reward signals. We show that iteratively refining the base model through self-distillation alleviates this issue by making the model progressively more aligned with the target, leading to substantial gains in generation quality.

最近的工作将限制的文本生成设计成自动递减语言模型,作为概率推论问题,其中赵等人(2024年)在扭曲的 “ 顺序 “ 蒙特卡洛(Conte Carlo)的基础上引入了有希望的方法,其中包括学习的扭曲功能和扭曲引发的建议,以指导生成过程;然而,在目标分配集中于基础模型下不可能实现的产出的受限制的生成环境中,由于微弱和不具有信息意义的奖励信号,学习变得具有挑战性。 我们表明,通过自我蒸馏反复地完善基础模型可以缓解这一问题,使模型与目标更加一致,导致生产质量的大幅提高。


Article 138

Title@2025-07-03 (4): Holistic Continual Learning under Concept Drift with Adaptive Memory Realignment

Title: Holistic Continual Learning under Concept Drift with Adaptive Memory Realignment Ganzheitliches kontinuierliches Lernen unter Konzept Drift mit adaptiver Memory-Neuausrichtung 适应性记忆调整概念下的全面持续学习 2507.02310v1

Authors (4): Alif Ashrafee, Jedrzej Kozal, Michal Wozniak, Bartosz Krawczyk

Traditional continual learning methods prioritize knowledge retention and focus primarily on mitigating catastrophic forgetting, implicitly assuming that the data distribution of previously learned tasks remains static. This overlooks the dynamic nature of real-world data streams, where concept drift permanently alters previously seen data and demands both stability and rapid adaptation. We introduce a holistic framework for continual learning under concept drift that simulates realistic scenarios by evolving task distributions. As a baseline, we consider Full Relearning (FR), in which the model is retrained from scratch on newly labeled samples from the drifted distribution. While effective, this approach incurs substantial annotation and computational overhead. To address these limitations, we propose Adaptive Memory Realignment (AMR), a lightweight alternative that equips rehearsal-based learners with a drift-aware adaptation mechanism. AMR selectively removes outdated samples of drifted classes from the replay buffer and repopulates it with a small number of up-to-date instances, effectively realigning memory with the new distribution. This targeted resampling matches the performance of FR while reducing the need for labeled data and computation by orders of magnitude. To enable reproducible evaluation, we introduce four concept-drift variants of standard vision benchmarks: Fashion-MNIST-CD, CIFAR10-CD, CIFAR100-CD, and Tiny-ImageNet-CD, where previously seen classes reappear with shifted representations. Comprehensive experiments on these datasets using several rehearsal-based baselines show that AMR consistently counters concept drift, maintaining high accuracy with minimal overhead. These results position AMR as a scalable solution that reconciles stability and plasticity in non-stationary continual learning environments.

传统持续学习方法将保留知识作为优先事项,主要侧重于减轻灾难性的遗忘,隐含地假设以前学到的任务的数据分布仍然静止不变。这忽略了真实世界数据流的动态性质,即概念的移动永久地改变以往看到的数据,要求稳定和迅速适应。我们引入了一个整体框架,用于在概念流下不断学习,通过不断变化的任务分布来模拟现实的情景。作为一个基线,我们考虑全面再学习(FR),模型从零开始在漂移分布的新标签样本上重新训练,虽然这种做法有效,但需要大量注解和计算间接费用。为克服这些局限性,我们提议了适应记忆重新定位的准确性(AMR),这是一个较轻的替代方法,为基于排练的学习者配备了一种漂流的适应机制。 AM(AMR)有选择地将过时的漂流阶级样本从重新放缓冲中去除,并用少量的最新实例对它进行再现,有效地将记忆与新分布相匹配。这些有目标的重新解读,同时减少对标签数据和计算以数量顺序进行计算。我们提议了一个轻量的缩缩缩缩的缩缩缩缩缩缩缩缩缩的模型,我们引入了四级的模型概念:在不断变动的模型中,在不断变动的模型中,在不断变动的模型中,在不断变动的模型中,在不断变动的模型中,在不断变动的轨道上显示的缩缩缩缩缩缩缩缩缩缩缩缩的模型中将显示的缩的缩缩缩缩缩缩缩缩缩的模型中,在前的模型中,在前的缩略图中,在前的模型中,在前的模型中,在前的模型中,在前的模型中,在前的模型中,在前的模型中,在前的缩缩缩缩缩缩缩缩的缩的缩的缩的缩的缩的缩的缩的缩的缩缩的缩的缩的缩的缩。


Article 139

Title@2025-07-03 (4): Distributional Soft Actor-Critic with Diffusion Policy

Title: Distributional Soft Actor-Critic with Diffusion Policy Verteilungs-Soft-Actor-Kritik mit Diffusionspolitik 配发软软软动作- 带有传播政策批评器 2507.01381v2

Authors (9): Tong Liu, Yinuo Wang, Xujie Song, Wenjun Zou, Liangfa Chen, Likun Wang, Bin Shuai, Jingliang Duan, Shengbo Eben Li

Reinforcement learning has been proven to be highly effective in handling complex control tasks. Traditional methods typically use unimodal distributions, such as Gaussian distributions, to model the output of value distributions. However, unimodal distribution often and easily causes bias in value function estimation, leading to poor algorithm performance. This paper proposes a distributional reinforcement learning algorithm called DSAC-D (Distributed Soft Actor Critic with Diffusion Policy) to address the challenges of estimating bias in value functions and obtaining multimodal policy representations. A multimodal distributional policy iteration framework that can converge to the optimal policy was established by introducing policy entropy and value distribution function. A diffusion value network that can accurately characterize the distribution of multi peaks was constructed by generating a set of reward samples through reverse sampling using a diffusion model. Based on this, a distributional reinforcement learning algorithm with dual diffusion of the value network and the policy network was derived. MuJoCo testing tasks demonstrate that the proposed algorithm not only learns multimodal policy, but also achieves state-of-the-art (SOTA) performance in all 9 control tasks, with significant suppression of estimation bias and total average return improvement of over 10% compared to existing mainstream algorithms. The results of real vehicle testing show that DSAC-D can accurately characterize the multimodal distribution of different driving styles, and the diffusion policy network can characterize multimodal trajectories.

实践证明,强化学习在处理复杂的控制任务方面非常有效,传统方法通常使用单式分配方式,如高森分配方式,以模拟价值分配的产出。然而,单式分配经常和容易导致价值函数估计的偏差,导致算法性能差。本文建议采用一个分配式强化学习算法,称为DSAC-D(分散软动作与传播政策),以解决在估计价值功能和获得多式联运政策表述方面存在的偏差方面的挑战。通过引入政策激化和价值分配功能,建立了可与最佳政策趋同的多式联运政策循环框架。通过使用一个推广模式进行反向抽样抽样,建立了能够准确描述多峰分配情况的传播值网络。在此基础上,提出了一种分配式强化学习算法,将价值网络和政策网络的双重传播结合起来。 MuJoCo测试任务表明,拟议的算法不仅可以学习多式联运政策,还可以在所有9项控制任务中实现州-艺术(SOTA)业绩。通过大幅抑制对多种峰值分配的分布方式进行精确的传播,并且对目前机动式分配模式模式进行真正的分析,可以准确地显示现有10个汽车主流分析结果。


Article 140

Title@2025-07-03 (4): Neural CRNs: A Natural Implementation of Learning in Chemical Reaction Networks

Title: Neural CRNs: A Natural Implementation of Learning in Chemical Reaction Networks Neurale CRNs: Eine natürliche Umsetzung des Lernens in chemischen Reaktionsnetzwerken CRNs:化学反应网络学习的自然应用 2409.00034v3

Authors (2): Rajiv Teja Nagipogu, John H. Reif

This work introduces Neural CRNs, a general-purpose chemical neural network framework that embeds learning directly into mass-action chemical reaction systems. Unlike prior approaches that chemically implement and compose discrete neural computations, Neural CRNs adopt an analog computing approach, where both forward and backward passes of learning are implemented as continuous-time evolutions of molecular concentrations. Such an analog formulation naturally aligns with the analog nature of chemical kinetics, yielding concise circuits and practicable reactions. We demonstrate this efficiency by constructing a streamlined supervised learning procedure executable in just two sequential stages. We then implement several learning circuits to demonstrate the framework’s linear and nonlinear modeling capabilities and to validate its learning procedure. These circuits are implemented entirely using unimolecular and bimolecular reactions, avoiding the complexity of higher-order chemistries. In summary, Neural CRNs offer a compact, scalable, and autonomous framework for biochemical learning, opening new avenues for adaptive computing in synthetic biology, bioengineering, and biomedicine.

这项工作引入了Neal CRNs,这是一个通用化学神经网络框架,直接将学习嵌入大规模行动化学反应系统,与以往化学实施和组成离散神经计算的方法不同,神经CRNs采用了模拟计算方法,即作为分子浓度的连续时间演进,采用前向和后向学习途径,这种模拟配方自然地与化学动能学的模拟性质相一致,产生简便的电路和切实反应。我们通过建立一个简化的监管学习程序,在两个相继阶段即可执行,来证明这一效率。我们随后实施了若干学习电路,以展示框架的线性和非线性建模能力,并验证其学习程序。这些电路完全使用非分子和双分子反应,避免更高级化学反应的复杂性。简而言之,Neural CRNs为生物化学学习提供了一个紧凑、可扩展和自主的框架,为合成生物学、生物工程和生物医学的适应性计算开辟了新的途径。


Article 141

Title@2025-07-03 (4): Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient

Title: Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient Bypass Back-Propagation: Optimierungsbasiertes Structural Pruning für große Sprachmodelle über Policy Gradient Bypass 后回通信:通过 “ 政策梯度 “ 优化基于优化的结构结构,为大语言模式提供缓冲 2406.10576v3

Authors (5): Yuan Gao, Zujing Liu, Weizhong Zhang, Bo Du, Gui-Song Xia

Recent Large-Language Models (LLMs) pruning methods typically operate at the post-training phase without the expensive weight finetuning, however, their pruning criteria often rely on heuristically hand-crafted metrics, potentially leading to suboptimal performance. We instead propose a novel optimization-based structural pruning that learns the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model. To preserve efficiency, our method eliminates the back-propagation through the LLM per se during optimization, requiring only the forward pass of the LLM. We achieve this by learning an underlying Bernoulli distribution to sample binary pruning masks, where we decouple the Bernoulli parameters from LLM loss, facilitating efficient optimization via policy gradient estimator without back-propagation. Thus, our method can 1) support global and heterogeneous pruning (i.e., automatically determine different redundancy for different layers), and 2) optionally initialize with a metric-based method (for our Bernoulli distributions). Extensive experiments conducted on LLaMA, LLaMA-2, LLaMA-3, Vicuna, and Mistral models using the C4 and WikiText2 datasets demonstrate the promising performance of our method in efficiency and effectiveness. Code is available at https://github.com/ethanygao/backprop-free_LLM_pruning.

最近的大型语言模型(LLMS)修剪方法通常在培训后阶段运作,没有昂贵的重量微调,但是,它们的修剪标准往往依赖于超自然手工制作的模量,可能导致业绩不理想。我们建议采用新的优化结构裁剪方法,直接通过优化修剪模型的丢失,在概率空间学习修剪面罩。为了保持效率,我们的方法在优化期间消除了通过LLM 本身的反向调整,只需要LM的向前传。 我们通过学习Bernoulli向模范双向模范遮罩的配送实现这一点,我们从LLM损失中分解Bernoulli参数,通过政策梯度估测仪促进高效优化,而无需反调。因此,我们的方法可以(1) 支持全球和混杂的修剪裁(即自动确定不同层次的不同冗余),以及(2) 采用基于标准的方法(我们的Bernoulli分发),我们通过Bernoulli来做到这一点。 在LMAMA、LAMA 2 和LMA 2 数据展示具有前景性的方法中,在Misabia-LMA 2中进行广泛的试验。


Article 142

Title@2025-07-03 (4): Kernel Density Bayesian Inverse Reinforcement Learning

Title: Kernel Density Bayesian Inverse Reinforcement Learning Kernel Dichte Bayesian Inverse Verstärkung Lernen 内核冷密度贝耶斯反强化学习 2303.06827v4

Authors (6): Aishwarya Mandyam, Didong Li, Jiayu Yao, Diana Cai, Andrew Jones, Barbara E. Engelhardt

Inverse reinforcement learning (IRL) methods infer an agent’s reward function using demonstrations of expert behavior. A Bayesian IRL approach models a distribution over candidate reward functions, capturing a degree of uncertainty in the inferred reward function. This is critical in some applications, such as those involving clinical data. Typically, Bayesian IRL algorithms require large demonstration datasets, which may not be available in practice. In this work, we incorporate existing domain-specific data to achieve better posterior concentration rates. We study a common setting in clinical and biological applications where we have access to expert demonstrations and known reward functions for a set of training tasks. Our aim is to learn the reward function of a new test task given limited expert demonstrations. Existing Bayesian IRL methods impose restrictions on the form of input data, thus limiting the incorporation of training task data. To better leverage information from training tasks, we introduce kernel density Bayesian inverse reinforcement learning (KD-BIRL). Our approach employs a conditional kernel density estimator, which uses the known reward functions of the training tasks to improve the likelihood estimation across a range of reward functions and demonstration samples. Our empirical results highlight KD-BIRL’s faster concentration rate in comparison to baselines, particularly in low test task expert demonstration data regimes. Additionally, we are the first to provide theoretical guarantees of posterior concentration for a Bayesian IRL algorithm. Taken together, this work introduces a principled and theoretically grounded framework that enables Bayesian IRL to be applied across a variety of domains.

反强化学习( IRL) 方法用专家行为演示来推断代理人的奖赏功能。 巴伊西亚的IRL 方法模拟候选人奖赏功能的分布,在推断奖赏功能中捕捉到一定程度的不确定性。 这在某些应用中至关重要, 如临床数据。 典型地, 巴伊西亚的IRL 算法需要大型的示范数据集, 在实践中可能没有这些数据。 在这项工作中, 我们整合了现有的特定领域数据, 以达到更好的事后集结率。 我们研究了临床和生物应用领域的共同背景。 我们研究的是临床和生物应用领域, 在那里, 我们能够获得专家演示任务的专家演示和已知的奖赏功能。 我们的目的是学习新的测试任务的奖赏功能, 在有限的专家演示演示演示活动中, 现有的巴伊西亚的IR 方法对输入数据的形式施加限制, 从而限制了培训任务数据的整合。 为了更好地利用培训任务中的信息, 我们引入了内核密度的贝亚反强化学习( KD- BIR) 。 我们的方法使用一个有条件的内核深度深度测测测测测度框架, 利用已知的培训任务应用的深度测算功能, 使得我们测算中测算的测算的测算结果的测算为我们测算的测算的测算率的测算结果, 。


Article 143

Title@2025-07-03 (4): REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models

Title: REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models REINFORCE++: Effizienter RLHF-Algorithmus mit Robustheit sowohl für Prompt- als auch für Reward-Modelle REINFORCE++: 高效的RLHF对快速模型和奖励模型具有强力的测算法 2501.03262v4

Authors (14): Jian Hu, Xibin Wu, Wei Shen, Jason Klein Liu, Zilin Zhu, Weixun Wang, Songlin Jiang, Haoran Wang, Hao Chen, Bin Chen, Weikai Fang, Xianyu, Yu Cao, Haotian Xu

Large Language Models (LLMs) fine-tuned via Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) significantly improve the alignment of human-AI values and further raise the upper bound of AI capabilities, particularly in reasoning-intensive, long-context Chain-of-Thought (long-CoT) tasks. However, existing RLHF (or RLVR) frameworks commonly face challenges such as inference bottlenecks and complexity barriers, restricting their accessibility for newcomers. To bridge this gap, we introduce \textbf{OpenRLHF}, a user-friendly, scalable, and easy-to-learn open-source RLHF framework built upon Ray, vLLM, DeepSpeed, and HuggingFace Transformers, featuring a simplified design, clear code structure, and comprehensive documentation to facilitate entry for researchers and practitioners. Experimental results show that OpenRLHF achieves superior training efficiency with speedups ranging from 1.22x to 1.68x across different model sizes compared to state-of-the-art frameworks, while requiring significantly fewer lines of code for implementation. OpenRLHF is publicly available at https://github.com/OpenRLHF/OpenRLHF, and has already been adopted by leading institutions to accelerate RLHF research and learning.

大型语言模型(LLM)通过 “ 人类反馈强化学习(RLHF) “ 和 “ 可验证的奖励强化学习(RLVR) “ 进行微调,大大改进了人类-AI价值的一致性,进一步提高了AI能力的上限,特别是在推理密集型、长文本链(Long-CoT)任务方面。但是,现有的RLHF(或RLVR)框架通常面临一些挑战,如推论瓶颈和复杂障碍,限制了新来者进入。为了缩小这一差距,我们引入了 “ 人类-AI “ 的强化学习(LLLHF),这是一个方便用户、可扩展和易读的开放源的开放RLHF框架,在Ray、vLLM、DeepSpeed和Hugg Face变异体(LFace Grofters)上建了一个方便研究人员进入的简化设计、清晰的代码结构和综合文件。实验结果表明,OpreloadRHF在与州-Rart框架相比,在不同的模型规模从1.22到1.6x-RHFMFML实施方面实现了优优优优优优等标准。


Article 144

Title@2025-07-03 (4): DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning

Title: DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning DoMIX: Ein effizientes Framework zur Nutzung von Domain-Wissen im Feintuning DoMIX:一个在微调中利用域知识的有效框架 2507.02302v1

Authors (3): Dohoon Kim, Donghun Kang, Taesup Moon

Domain-Adaptive Pre-training (DAP) has recently gained attention for its effectiveness in fine-tuning pre-trained models. Building on this, continual DAP has been explored to develop pre-trained models capable of incrementally incorporating different domain datasets. However, existing continual DAP methods face several limitations: (1) high computational cost and GPU memory usage during training; (2) sensitivity to incremental data order; and (3) providing a single, generalized model for all end tasks, which contradicts the essence of DAP. In this paper, we propose DoMIX, a novel approach that addresses these challenges by leveraging LoRA modules, a representative parameter-efficient fine-tuning (PEFT) method. Our approach enables efficient and parallel domain-adaptive pre-training that is robust to domain order and effectively utilizes accumulated knowledge to provide tailored pre-trained models for specific tasks. We also demonstrate that our method can be extended beyond the DAP setting to standard LLM fine-tuning scenarios. Code is available at https://github.com/dohoonkim-ai/DoMIX.

最近,在经过培训的模型的微调方面,培训前的适应性(DAP)取得了成效,这一点最近引起了人们的注意。在此基础上,对持续的培训前模式进行了探索,以开发能够逐步纳入不同领域数据集的经过培训的模型,然而,现有的持续的培训前模式面临若干限制:(1) 培训期间的计算成本高和GPU记忆使用率高;(2) 对增量数据顺序的敏感性;(3) 为所有最终任务提供单一的通用模式,这与培训前模式的本质相矛盾。我们在本文件中提议DoMIX,这是通过利用具有代表性的参数效率微调(PEFT)模块来应对这些挑战的一种新颖方法。我们的方法使得高效和平行的对域适应性培训前方法能够对域的秩序进行有力利用,并有效地利用积累的知识为具体任务提供经过专门培训的模型。我们还表明,我们的方法可以超越DAP设置的范围,扩大到标准的LM微调情景。代码可在 https://github.com/dohoonkim-ai/DoMIX查阅。


Article 145

Title@2025-07-03 (4): Implicit Counterfactual Data Augmentation for Robust Learning

Title: Implicit Counterfactual Data Augmentation for Robust Learning Implizite gegenfaktische Datenvergrößerung für robustes Lernen 强力学习所需的反事实数据放大 2304.13431v3

Authors (3): Xiaoling Zhou, Ou Wu, Michael K. Ng

Machine learning models are prone to capturing the spurious correlations between non-causal attributes and classes, with counterfactual data augmentation being a promising direction for breaking these spurious associations. However, generating counterfactual data explicitly poses a challenge, and incorporating augmented data into the training process decreases training efficiency. This study proposes an Implicit Counterfactual Data Augmentation (ICDA) method to remove spurious correlations and make stable predictions. Specifically, first, a novel sample-wise augmentation strategy is developed that generates semantically and counterfactually meaningful deep features with distinct augmentation strength for each sample. Second, we derive an easy-to-compute surrogate loss on the augmented feature set when the number of augmented samples becomes infinite. Third, two concrete schemes are proposed, including direct quantification and meta-learning, to derive the key parameters for the robust loss. In addition, ICDA is explained from a regularization perspective, revealing its capacity to improve intra-class compactness and augment margins at both class and sample levels. Extensive experiments have been conducted across various biased learning scenarios covering both image and text datasets, demonstrating that ICDA consistently enhances the generalization and robustness performance of popular networks.

机器学习模型容易捕捉非因果属性和类别之间的虚假关联,反事实数据增强是打破这些虚假协会的一个大有希望的方向。然而,生成反事实数据显然构成挑战,并将扩大的数据纳入培训过程会降低培训效率。本研究报告提出了隐含反事实数据增强方法,以消除虚假关联和作出稳定的预测。具体地说,首先,开发了新型的样本增强战略,在每样样本中产生具有明显增强力的精细精细特征。第二,我们在扩大的样本数量无限时,对强化的功能设定的功能进行简单化的代谢损失。第三,提出了两个具体计划,包括直接量化和元学习,以得出稳健损失的关键参数。此外,ICDA从正规化角度解释其提高本级内部缩缩缩缩和增加等级和样本级间距的能力。在包括图像和文本数据集在内的各种有偏差的学习假设中进行了广泛的实验,表明ICDA始终加强通用网络的一般性和稳健性。


Article 146

Title@2025-07-03 (4): Knowledge Graph-Based Explainable and Generalized Zero-Shot Semantic Communications

Title: Knowledge Graph-Based Explainable and Generalized Zero-Shot Semantic Communications Wissensgraphisch erklärbare und generalisierte zero-shot semantische Kommunikation 基于知识的图表可解释和通用的零热语义通信 2507.02291v1

Authors (5): Zhaoyu Zhang, Lingyi Wang, Wei Wu, Fuhui Zhou, Qihui Wu

Data-driven semantic communication is based on superficial statistical patterns, thereby lacking interpretability and generalization, especially for applications with the presence of unseen data. To address these challenges, we propose a novel knowledge graph-enhanced zero-shot semantic communication (KGZS-SC) network. Guided by the structured semantic information from a knowledge graph-based semantic knowledge base (KG-SKB), our scheme provides generalized semantic representations and enables reasoning for unseen cases. Specifically, the KG-SKB aligns the semantic features in a shared category semantics embedding space and enhances the generalization ability of the transmitter through aligned semantic features, thus reducing communication overhead by selectively transmitting compact visual semantics. At the receiver, zero-shot learning (ZSL) is leveraged to enable direct classification for unseen cases without the demand for retraining or additional computational overhead, thereby enhancing the adaptability and efficiency of the classification process in dynamic or resource-constrained environments. The simulation results conducted on the APY datasets show that the proposed KGZS-SC network exhibits robust generalization and significantly outperforms existing SC frameworks in classifying unseen categories across a range of SNR levels.

数据驱动语义通信以表面统计模式为基础,因此缺乏可解释性和概括性,特别是对于有不可见数据存在的应用而言。为应对这些挑战,我们提议建立一个新型知识图形强化零光语义通信(KGZS-SC)网络。在基于知识图形的语义知识库(KG-SKB)提供结构化语义信息的指导下,我们的计划提供了通用语义表达方式,并为隐蔽案例提供了推理。具体地说,KG-SKB将共享类语义嵌入空间的语义特征统一起来,并通过一致的语义特征加强发射器的通用能力,从而通过选择性传输压缩视觉语义来减少通信管理。在接收器上,零光学学习(ZSL)被利用,以便能够直接分类不要求再培训或额外的计算间接成本,从而增强动态或资源紧张环境中分类过程的适应性和效率。在APY数据集上进行的模拟结果表明,拟议的KGZS-SC网络展示了强大的常规化,大大超越了现有星域框架的系统等级。


Article 147

Title@2025-07-03 (4): Optimal strategies to perform multilingual analysis of social content for a novel dataset in the tourism domain

Title: Optimal strategies to perform multilingual analysis of social content for a novel dataset in the tourism domain Optimale Strategien zur mehrsprachigen Analyse sozialer Inhalte für einen neuartigen Datensatz im Tourismusbereich 为旅游领域新数据集的社会内容进行多语种社会内容分析的最佳最佳战略 2311.14727v2

Authors (6): Maxime Masson, Rodrigo Agerri, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, Philippe Roose

The rising influence of social media platforms in various domains, including tourism, has highlighted the growing need for efficient and automated Natural Language Processing (NLP) strategies to take advantage of this valuable resource. However, the transformation of multilingual, unstructured, and informal texts into structured knowledge still poses significant challenges, most notably the never-ending requirement for manually annotated data to train deep learning classifiers. In this work, we study different NLP techniques to establish the best ones to obtain competitive performances while keeping the need for training annotated data to a minimum. To do so, we built the first publicly available multilingual dataset (French, English, and Spanish) for the tourism domain, composed of tourism-related tweets. The dataset includes multilayered, manually revised annotations for Named Entity Recognition (NER) for Locations and Fine-grained Thematic Concepts Extraction mapped to the Thesaurus of Tourism and Leisure Activities of the World Tourism Organization, as well as for Sentiment Analysis at the tweet level. Extensive experimentation comparing various few-shot and fine-tuning techniques with modern language models demonstrate that modern few-shot techniques allow us to obtain competitive results for all three tasks with very little annotation data: 5 tweets per label (15 in total) for Sentiment Analysis, 30 tweets for Named Entity Recognition of Locations and 1K tweets annotated with fine-grained thematic concepts, a highly fine-grained sequence labeling task based on an inventory of 315 classes. We believe that our results, grounded in a novel dataset, pave the way for applying NLP to new domain-specific applications, reducing the need for manual annotations and circumventing the complexities of rule-based, ad-hoc solutions.

社交媒体平台在包括旅游在内的各个领域的影响不断提高,突出表明越来越需要利用这一宝贵资源,制定高效和自动化的自然语言处理(NLP)战略,以利用这一宝贵资源;然而,将多语言、无结构文本和非正式文本转换为结构化知识,这仍构成重大挑战,其中最突出的是,对人工编制附加说明的数据,以培训深层次学习分类人员的要求是永无止尽的。在这项工作中,我们研究不同的NLP技术,以建立最佳数据获得有竞争力的业绩,同时将附加说明的数据培训保持在最低限度。为此,我们为旅游领域建立了第一个公开提供的多语言(法语、英语和西班牙语)数据集(法语、英语和西班牙语),由旅游相关推文构成的旅游领域。该数据集包括多层次、手工修订的实体识别(NER)图示,用于地点和精细化主题概念,用于培训深层次的学习分类。我们研究了不同的NLP技术,以及基于推文的解解决方案。将各种新点和微调整技术与现代语言模型进行了广泛的实验,用现代的精细图解推理学应用技术降低了我们网站上的推算,使得我们得以获得高额标签的SEN的SEN数据,用于所有3号的SEN的排名。


Article 148

Title@2025-07-03 (4): Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization

Title: Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization Prompt Entwirren über Sprachführung und Repräsentation Ausrichtung für Domain-Verallgemeinerung 通过语言指导和代表对称调整,迅速解脱 2507.02288v1

Authors (6): De Cheng, Zhipeng Xu, Xinyang Jiang, Dongsheng Li, Nannan Wang, Xinbo Gao

Domain Generalization (DG) seeks to develop a versatile model capable of performing effectively on unseen target domains. Notably, recent advances in pre-trained Visual Foundation Models (VFMs), such as CLIP, have demonstrated considerable potential in enhancing the generalization capabilities of deep learning models. Despite the increasing attention toward VFM-based domain prompt tuning within DG, the effective design of prompts capable of disentangling invariant features across diverse domains remains a critical challenge. In this paper, we propose addressing this challenge by leveraging the controllable and flexible language prompt of the VFM. Noting that the text modality of VFMs is naturally easier to disentangle, we introduce a novel framework for text feature-guided visual prompt tuning. This framework first automatically disentangles the text prompt using a large language model (LLM) and then learns domain-invariant visual representation guided by the disentangled text feature. However, relying solely on language to guide visual feature disentanglement has limitations, as visual features can sometimes be too complex or nuanced to be fully captured by descriptive text. To address this, we introduce Worst Explicit Representation Alignment (WERA), which extends text-guided visual prompts by incorporating an additional set of abstract prompts. These prompts enhance source domain diversity through stylized image augmentations, while alignment constraints ensure that visual representations remain consistent across both the original and augmented distributions. Experiments conducted on major DG datasets, including PACS, VLCS, OfficeHome, DomainNet, and TerraInc, demonstrate that our proposed method outperforms state-of-the-art DG methods.

常规通用 (DG) 寻求开发一个能够有效在无形目标域上有效运行的多功能模型。 值得注意的是, 诸如 CLIP 等经过事先训练的视觉基础模型( VFM ) 最近的进展表明, 在提高深层学习模型的普及能力方面, 具有相当大的潜力。 尽管对基于 VFM 的域在 DG 内部的快速调整日益重视, 有效设计能够在不同域间分解异变异特性的提示器仍是一个严峻的挑战。 但是, 在本文中, 我们建议通过利用 VFM 的可控性和灵活语言来应对这一挑战。 注意到 VFM 的文本模式自然更容易分解, 我们为文本导导的视觉基础模型引入了一个全新的框架。 这个框架首先自动分解了基于 VFM 的域, 然后学习了由分解的文本特性引导的域异变的视觉代表器。 然而, 仅仅依靠语言来引导视觉特征分解有局限性, 因为视觉特征有时过于复杂或微妙, 无法完全通过描述性文字来完全被解析。 在视觉结构上, 我们引入了最精确的图像格式的分布, , 继续 继续 继续 继续 , 通过视觉结构 展示 继续 继续 继续 方向 继续 , 展示 继续 继续 继续 方向 继续 继续 继续 继续 继续 继续 继续 方向 方向 继续 继续 继续 继续 继续 继续 继续 继续 继续 继续 方向 继续 继续 继续 继续 继续 继续 继续 继续 继续 继续 继续 继续 继续 继续 继续 继续 继续 继续 继续 继续 方向 方向 方向 继续 继续 继续 继续 继续 继续 继续 继续 方向 方向 方向 继续 方向 方向 方向 方向 方向 方向 方向 方向 方向 方向 方向 方向 方向 方向 方向 方向 方向 方向 方向 方向 方向 方向 方向 方向 , , , , , , , , , , , , , , , , 通过 , 通过 , 通过 , 通过 , 通过 展示 , 通过 , 通过 , 通过 , 通过 , 通过 , 通过


Article 149

Title@2025-07-03 (4): Content filtering methods for music recommendation: A review

Title: Content filtering methods for music recommendation: A review Methoden zur Inhaltsfilterung für Musikempfehlungen: Eine Rezension 音乐建议的内容过滤方法:审查 2507.02282v1

Authors (2): Terence Zeng, Abhishek K. Umrawal

Recommendation systems have become essential in modern music streaming platforms, shaping how users discover and engage with songs. One common approach in recommendation systems is collaborative filtering, which suggests content based on the preferences of users with similar listening patterns to the target user. However, this method is less effective on media where interactions are sparse. Music is one such medium, since the average user of a music streaming service will never listen to the vast majority of tracks. Due to this sparsity, there are several challenges that have to be addressed with other methods. This review examines the current state of research in addressing these challenges, with an emphasis on the role of content filtering in mitigating biases inherent in collaborative filtering approaches. We explore various methods of song classification for content filtering, including lyrical analysis using Large Language Models (LLMs) and audio signal processing techniques. Additionally, we discuss the potential conflicts between these different analysis methods and propose avenues for resolving such discrepancies.

在现代音乐流流平台中,建议系统变得至关重要,它决定了用户如何发现和接触歌曲。建议系统的一个共同方法是合作过滤,它显示的内容基于对目标用户具有类似听模式的用户的偏好。然而,这一方法对互动稀少的媒体效果较差。音乐就是这样一种媒介,因为音乐流服务的平均用户绝不会倾听绝大多数曲目。由于这种偏狭性,需要用其他方法解决若干挑战。本审查审查了在应对这些挑战方面的研究现状,重点是内容过滤在减少合作过滤方法所固有的偏见方面的作用。我们探索了用于内容过滤的各种歌曲分类方法,包括使用大语言模型(LLMS)和音频信号处理技术的流学分析。此外,我们讨论了这些不同分析方法之间的潜在冲突,并提出了解决这些差异的途径。


Article 150

Title@2025-07-03 (4): Causal Representation Learning with Generative Artificial Intelligence: Application to Texts as Treatments

Title: Causal Representation Learning with Generative Artificial Intelligence: Application to Texts as Treatments Kausales Repräsentationslernen mit generativer Künstlicher Intelligenz: Anwendung auf Texte als Behandlungen 产生人工智能的因果代表性学习:应用文字作为治疗 2410.00903v3

Authors (2): Kosuke Imai, Kentaro Nakamura

In this paper, we demonstrate how to enhance the validity of causal inference with unstructured high-dimensional treatments like texts, by leveraging the power of generative Artificial Intelligence (GenAI). Specifically, we propose to use a deep generative model such as large language models (LLMs) to efficiently generate treatments and use their internal representation for subsequent causal effect estimation. We show that the knowledge of this true internal representation helps disentangle the treatment features of interest, such as specific sentiments and certain topics, from other possibly unknown confounding features. Unlike existing methods, the proposed GenAI-Powered Inference (GPI) methodology eliminates the need to learn causal representation from the data, and hence produces more accurate and efficient estimates. We formally establish the conditions required for the nonparametric identification of the average treatment effect, propose an estimation strategy that avoids the violation of the overlap assumption, and derive the asymptotic properties of the proposed estimator through the application of double machine learning. Finally, using an instrumental variables approach, we extend the proposed methodology to the settings in which the treatment feature is based on human perception. The proposed GPI methodology is also applicable to text reuse where an LLM is used to regenerate existing texts. We conduct simulation and empirical studies, using the generated text data from an open-source LLM, Llama 3, to illustrate the advantages of our estimator over state-of-the-art causal representation learning algorithms.

在本文件中,我们展示了如何通过利用基因化人工智能(GenAI)的力量,提高因果推断与文本等非结构化高层次处理方法(GenAI)的有效性。具体地说,我们提议使用一个深层次的基因模型,如大型语言模型(LLMS),以有效产生治疗方法,并使用内部代表方法来进行随后的因果关系估计。我们表明,了解这种真正的内部代表方法有助于将感兴趣的治疗特征,例如具体情感和某些议题,与其他可能未知的混杂特征脱钩。与现有方法不同,拟议的GenAI权力推算法(GIGPI)方法消除了从数据中学习因果表述的必要性,从而得出更准确、更高效的估计数。我们正式确定对平均治疗效果进行非参数识别的必要条件,提出避免违反重叠假设的估计战略,并通过应用双机学习来得出拟议估算师的无约束性特性。最后,我们将拟议的方法推广到基于人类认知的处理特征的环境,即从现有LIM法文本到使用现有理论再利用的理论分析法。


Article 151

Title@2025-07-03 (4): It’s Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation

Title: It’s Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation Es ist schwer, normal zu sein: Der Einfluss von Lärm auf die strukturagnostische Abschätzung 很难正常:噪音对结构-不可计量估计的影响 2507.02275v1

Authors (3): Jikai Jin, Lester Mackey, Vasilis Syrgkanis

Structure-agnostic causal inference studies how well one can estimate a treatment effect given black-box machine learning estimates of nuisance functions (like the impact of confounders on treatment and outcomes). Here, we find that the answer depends in a surprising way on the distribution of the treatment noise. Focusing on the partially linear model of \citet{robinson1988root}, we first show that the widely adopted double machine learning (DML) estimator is minimax rate-optimal for Gaussian treatment noise, resolving an open problem of \citet{mackey2018orthogonal}. Meanwhile, for independent non-Gaussian treatment noise, we show that DML is always suboptimal by constructing new practical procedures with higher-order robustness to nuisance errors. These \emph{ACE} procedures use structure-agnostic cumulant estimators to achieve $r$-th order insensitivity to nuisance errors whenever the $(r+1)$-st treatment cumulant is non-zero. We complement these core results with novel minimax guarantees for binary treatments in the partially linear model. Finally, using synthetic demand estimation experiments, we demonstrate the practical benefits of our higher-order robust estimators.

结构- 不可知性因果推论研究 如何很好地估计了黑盒机器对骚扰功能的治疗效果, 黑盒机器对骚扰功能的学习估计( 如混乱者对治疗和结果的影响 ) 。 在此, 我们发现答案以令人惊讶的方式取决于治疗噪音的分布。 聚焦于部分线性模型\ citet{ robinson1988root} , 我们首先显示, 广泛采用的双机学习( DML) 估计值对于高斯治疗噪音来说是最小的速率- 最佳的, 解决了一个公开的骚扰功能问题( 比如混乱者对治疗和结果的影响 ) 。 同时, 对于独立的非加萨治疗噪音, 我们发现DML 总是不优化, 通过建立具有更高调度的干扰误差的新的实用程序。 这些程序使用结构- 不可知性累积性估测器来达到 $- 的对核糖分值的灵敏度差值, 只要$( +1) makey- mest comme- complain adationalimestational adviewal imal impalimalimalimalimpalimpalimpal impalimpal impalimpal impalimpalimpalimpal impalimpalimpalimpalimpalimpalimpalimpal impalimpal impalimpalimpalimpalimpalimpal impalpalimationsess 需要, 我们。


Article 152

Title@2025-07-03 (4): Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop

Title: Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge bei CVPR 2025 MEIS Workshop 基准的可通用二手操纵:2025年欧洲气象和气象科学研究所讲习班上的机器人双臂双臂合作挑战 2506.23351v2

Authors (99): Tianxing Chen, Kaixuan Wang, Zhaohui Yang, Yuhao Zhang, Zanxin Chen, Baijun Chen, Wanxi Dong, Ziyuan Liu, Dong Chen, Tianshuo Yang, Haibao Yu, Xiaokang Yang, Yusen Qin, Zhiqiang Xie, Yao Mu, Ping Luo, Tian Nian, Weiliang Deng, Yiheng Ge, Yibin Liu, Zixuan Li, Dehui Wang, Zhixuan Liang, Haohui Xie, Rijie Zeng, Yunfei Ge, Peiqing Cong, Guannan He, Zhaoming Han, Ruocheng Yin, Jingxiang Guo, Lunkai Lin, Tianling Xu, Hongzhe Bi, Xuewu Lin, Tianwei Lin, Shujie Luo, Keyu Li, Ziyan Zhao, Ke Fan, Heyang Xu, Bo Peng, Wenlong Gao, Dongjiang Li, Feng Jin, Hui Shen, Jinming Li, Chaowei Cui, Yu Chen, Yaxin Peng, Lingdong Zeng, Wenlong Dong, Tengfei Li, Weijie Ke, Jun Chen, Erdemt Bao, Tian Lan, Tenglong Liu, Jin Yang, Huiping Zhuang, Baozhi Jia, Shuai Zhang, Zhengfeng Zou, Fangheng Guan, Tianyi Jia, Ke Zhou, Hongjiu Zhang, Yating Han, Cheng Fang, Yixian Zou, Chongyang Xu, Qinglun Zhang, Shen Cheng, Xiaohe Wang, Ping Tan, Haoqiang Fan, Shuaicheng Liu, Jiaheng Chen, Chuxuan Huang, Chengliang Lin, Kaijun Luo, Boyu Yue, Yi Liu, Jinyu Chen, Zichang Tan, Liming Deng, Shuo Xu, Zijian Cai, Shilong Yin, Hao Wang, Hongshan Liu, Tianyang Li, Long Shi, Ran Xu, Huilin Xu, Zhengquan Zhang, Congsheng Xu, Jinchang Yang, Feng Xu

Embodied Artificial Intelligence (Embodied AI) is an emerging frontier in robotics, driven by the need for autonomous systems that can perceive, reason, and act in complex physical environments. While single-arm systems have shown strong task performance, collaborative dual-arm systems are essential for handling more intricate tasks involving rigid, deformable, and tactile-sensitive objects. To advance this goal, we launched the RoboTwin Dual-Arm Collaboration Challenge at the 2nd MEIS Workshop, CVPR 2025. Built on the RoboTwin Simulation platform (1.0 and 2.0) and the AgileX COBOT-Magic Robot platform, the competition consisted of three stages: Simulation Round 1, Simulation Round 2, and a final Real-World Round. Participants totally tackled 17 dual-arm manipulation tasks, covering rigid, deformable, and tactile-based scenarios. The challenge attracted 64 global teams and over 400 participants, producing top-performing solutions like SEM and AnchorDP3 and generating valuable insights into generalizable bimanual policy learning. This report outlines the competition setup, task design, evaluation methodology, key findings and future direction, aiming to support future research on robust and generalizable bimanual manipulation policies. The Challenge Webpage is available at https://robotwin-benchmark.github.io/cvpr-2025-challenge/.

人工智能(Embodied AI)是机器人的新兴前沿,其驱动力是需要能够感知、理性和在复杂的物理环境中行动的自主系统。虽然单臂系统已经表现出很强的任务性能,但协作双臂系统对于处理涉及僵硬、变形和触摸敏感物体的更复杂的任务至关重要。为推进这一目标,我们在第二次MEIS研讨会上发起了机器人双臂双臂协作挑战(CVPR 2025 2025)。在RoboTwin模拟平台(1.0和2.0)和AgileX COBOT-Magic机器人平台(AgilX COBOT-Magic机器人平台)上建立起来,竞争由三个阶段组成:模拟回合1、模拟回合2和最后现实世界回合。参与者完全处理了17项双臂操纵任务,包括僵硬、变形和触角假设。这项挑战吸引了64个全球团队和400多名参与者,产生了像SEM和AnchorDP3这样的最优秀的解决方案,并对通用双体政策学习产生了宝贵的见解。这份报告概述了竞争设置、强有力设计任务设计、关键结果以及未来战略评估方法,这是未来研究的基础研究,目的是要达到总目标。


Article 153

Title@2025-07-03 (4): Explainable AI for Comprehensive Risk Assessment for Financial Reports: A Lightweight Hierarchical Transformer Network Approach

Title: Explainable AI for Comprehensive Risk Assessment for Financial Reports: A Lightweight Hierarchical Transformer Network Approach Erklärbare KI für umfassende Risikobewertung für Finanzberichte: Ein leichter Hierarchischer Transformer-Netzwerkansatz 用于财务报告综合风险评估的可解释的AIAI:轻量级高压变压器网络方法 2506.23767v2

Authors (2): Xue Wen Tan, Stanley Kok

Every publicly traded U.S. company files an annual 10-K report containing critical insights into financial health and risk. We propose Tiny eXplainable Risk Assessor (TinyXRA), a lightweight and explainable transformer-based model that automatically assesses company risk from these reports. Unlike prior work that relies solely on the standard deviation of excess returns (adjusted for the Fama-French model), which indiscriminately penalizes both upside and downside risk, TinyXRA incorporates skewness, kurtosis, and the Sortino ratio for more comprehensive risk assessment. We leverage TinyBERT as our encoder to efficiently process lengthy financial documents, coupled with a novel dynamic, attention-based word cloud mechanism that provides intuitive risk visualization while filtering irrelevant terms. This lightweight design ensures scalable deployment across diverse computing environments with real-time processing capabilities for thousands of financial documents which is essential for production systems with constrained computational resources. We employ triplet loss for risk quartile classification, improving over pairwise loss approaches in existing literature by capturing both the direction and magnitude of risk differences. Our TinyXRA achieves state-of-the-art predictive accuracy across seven test years on a dataset spanning 2013-2024, while providing transparent and interpretable risk assessments. We conduct comprehensive ablation studies to evaluate our contributions and assess model explanations both quantitatively by systematically removing highly attended words and sentences, and qualitatively by examining explanation coherence. The paper concludes with findings, practical implications, limitations, and future research directions. Our code is available at https://github.com/Chen-XueWen/TinyXRA.

每个公开交易的美国公司都每年提交一份10K报告,其中载有对金融健康和风险的批判性洞察力。我们建议使用细易易碎的风险评估器(TinyXRA),这是一个轻巧和可解释的基于变压器的模型,自动评估这些报告给公司带来的风险。与以前完全依赖超额回报标准偏差(Fama-Forish模式调整后)的工作不同,前者不分青红皂白地惩罚上下游风险,TinyXRA包含对风险四分法的扭曲性、kurtsisis和Sortino比率,以便进行更全面的风险评估。我们利用TiniyBERT作为我们的编码师来有效处理冗长的金融文件,以及一个全新的动态、基于关注的字彩虹机制,提供直观风险的视觉化,同时过滤无关的术语。这种轻重的设计确保了在各种计算环境中的可升级部署,具有实时处理能力,这对计算资源有限的生产系统至关重要。我们使用三重损失模型进行风险四分法的分类,改进现有文献中的双向损失方法,通过捕捉风险方向和程度的数值解释。我们TinX公司的研究在2013年/20的准确度研究中,同时进行一项测试分析分析,我们的数据分析,对未来数据的准确性分析,然后通过测试评估,我们的数据分析,我们的数据在2013年的准确性分析,通过测试和持续了我们的行为评估,我们的数据分析分析,我们的数据分析,我们的数据分析,通过测试分析,然后进行。


Article 154

Title@2025-07-03 (4): Neural Graph Matching Improves Retrieval Augmented Generation in Molecular Machine Learning

Title: Neural Graph Matching Improves Retrieval Augmented Generation in Molecular Machine Learning Neural Graph Matching verbessert retrieval Augmented Generation im molekularen maschinellen Lernen 在分子机学习中改进回流增强一代的神经图相匹配 2502.17874v2

Authors (4): Runzhong Wang, Rui-Xi Wang, Mrunali Manjrekar, Connor W. Coley

Molecular machine learning has gained popularity with the advancements of geometric deep learning. In parallel, retrieval-augmented generation has become a principled approach commonly used with language models. However, the optimal integration of retrieval augmentation into molecular machine learning remains unclear. Graph neural networks stand to benefit from clever matching to understand the structural alignment of retrieved molecules to a query molecule. Neural graph matching offers a compelling solution by explicitly modeling node and edge affinities between two structural graphs while employing a noise-robust, end-to-end neural network to learn affinity metrics. We apply this approach to mass spectrum simulation and introduce MARASON, a novel model that incorporates neural graph matching to enhance a fragmentation-based neural network. Experimental results highlight the effectiveness of our design, with MARASON achieving 28% top-1 accuracy, a substantial improvement over the non-retrieval state-of-the-art accuracy of 19%. Moreover, MARASON outperforms both naive retrieval-augmented generation methods and traditional graph matching approaches. Code is publicly available at https://github.com/coleygroup/ms-pred

分子机学习随着几何深深学习的进步而越来越受欢迎。 与此同时,检索增强的一代已成为语言模型常用的一种原则性方法。然而,将检索增强的最佳整合到分子机学习中仍然不明确。图表神经网络从智能匹配中受益,以理解已回收的分子与查询分子的结构对齐。神经图形匹配提供了令人信服的解决方案,明确建模两个结构图形之间的节点和边缘近似,同时使用噪音-机器人、端到端神经网络学习亲近度度度。我们将这一方法应用于大规模频谱模拟并引入MARASON,这是一个包含神经图匹配的新模式,以强化基于碎裂的神经网络。实验结果突显了我们设计的有效性,MARASON实现了28%的顶层-1精确度,大大改进了19 %的非检索状态-艺术精确度。此外,MAARSON超越了天性检索-放大生成方法和传统图形匹配方法。代码在 https://github.com/coleygroup/s 上公开提供。


Article 155

Title@2025-07-03 (4): NLP4Neuro: Sequence-to-sequence learning for neural population decoding

Title: NLP4Neuro: Sequence-to-sequence learning for neural population decoding NLP4Neuro: Sequence-to-Sequence-Lernen für neuronale Populationsdekodierung NLP4Neuro:神经人口解码的序列到序列学习 2507.02264v1

Authors (7): Jacob J. Morra, Kaitlyn E. Fouke, Kexin Hang, Zichen He, Owen Traubert, Timothy W. Dunn, Eva A. Naumann

Delineating how animal behavior arises from neural activity is a foundational goal of neuroscience. However, as the computations underlying behavior unfold in networks of thousands of individual neurons across the entire brain, this presents challenges for investigating neural roles and computational mechanisms in large, densely wired mammalian brains during behavior. Transformers, the backbones of modern large language models (LLMs), have become powerful tools for neural decoding from smaller neural populations. These modern LLMs have benefited from extensive pre-training, and their sequence-to-sequence learning has been shown to generalize to novel tasks and data modalities, which may also confer advantages for neural decoding from larger, brain-wide activity recordings. Here, we present a systematic evaluation of off-the-shelf LLMs to decode behavior from brain-wide populations, termed NLP4Neuro, which we used to test LLMs on simultaneous calcium imaging and behavior recordings in larval zebrafish exposed to visual motion stimuli. Through NLP4Neuro, we found that LLMs become better at neural decoding when they use pre-trained weights learned from textual natural language data. Moreover, we found that a recent mixture-of-experts LLM, DeepSeek Coder-7b, significantly improved behavioral decoding accuracy, predicted tail movements over long timescales, and provided anatomically consistent highly interpretable readouts of neuron salience. NLP4Neuro demonstrates that LLMs are highly capable of informing brain-wide neural circuit dissection.

描述动物行为如何从神经活动中产生神经活动是神经科学的一个基本目标。 然而,随着整个大脑上千个神经神经元组成的网络对基本行为进行计算,这给在行为过程中调查大型、密集有线哺乳动物大脑中的神经作用和计算机制带来了挑战。 现代大型语言模型(LLMS)的骨干 — — 变异器 — — 已经成为来自较小神经群的神经解码功能的强大工具。 这些现代LMS从广泛的培训前学习中受益,其从序列到序列的学习被显示为新颖的任务和数据模式,这也可能会给神经从大型、整个大脑活动记录中解码带来优势。 在这里,我们展示了对现出LMLMLM的系统评估,以解码来自整个大脑人群的行为,称为NLP4NNUUROR,我们用来测试同时的钙成像和行为记录。通过NLP4NUFNU,我们发现LMLM公司在从更大型的内层、更清晰的内分解、更清晰的内变的内核数据,我们从最近对内变的内变的内变的内变的内变的内变。


Article 156

Title@2025-07-03 (4): MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

Title: MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent MemAgent: Umgestalten von Langkontext-LLM mit Multi-Conv RL-basierten Speicheragenten MemerAgent: 与基于多Conv RL的内存代理重塑长文本LLM 2507.02259v1

Authors (11): Hongli Yu, Tinghong Chen, Jiangtao Feng, Jiangjie Chen, Weinan Dai, Qiying Yu, Ya-Qin Zhang, Wei-Ying Ma, Jingjing Liu, Mingxuan Wang, Hao Zhou

Despite improvements by length extrapolation, efficient attention and memory modules, handling infinitely long documents with linear complexity without performance degradation during extrapolation remains the ultimate challenge in long-text processing. We directly optimize for long-text tasks in an end-to-end fashion and introduce a novel agent workflow, MemAgent, which reads text in segments and updates the memory using an overwrite strategy. We extend the DAPO algorithm to facilitate training via independent-context multi-conversation generation. MemAgent has demonstrated superb long-context capabilities, being able to extrapolate from an 8K context trained on 32K text to a 3.5M QA task with performance loss < 5% and achieves 95%+ in 512K RULER test.

尽管在长度外推法、高效关注和记忆模块方面有所改进,但在外推过程中处理线性复杂且不出现性能退化的无限长文件仍然是长文本处理过程中的终极挑战。我们直接优化以端到端的方式执行长文本任务,并引入新的代理工作流程MemAgenti,该流程以部分方式阅读文字,并使用覆盖式战略更新记忆。我们扩展了DAPO算法,以通过独立通文多变量生成促进培训。MeAgency展示了超强长文本能力,能够从经过32K文本培训的8K环境推断为3.5M QA任务,其性能损失 < 5%,并在512K RULER测试中达到95。


Article 157

Title@2025-07-03 (4): Uncertainty-aware Reward Design Process

Title: Uncertainty-aware Reward Design Process Unsicheres Design von Belohnungen 不确定性-对不确定性的认知度-奖励设计流程 2507.02256v1

Authors (4): Yang Yang, Xiaolu Zhou, Bosong Ding, Miao Xin

Designing effective reward functions is a cornerstone of reinforcement learning (RL), yet it remains a challenging process due to the inefficiencies and inconsistencies inherent in conventional reward engineering methodologies. Recent advances have explored leveraging large language models (LLMs) to automate reward function design. However, their suboptimal performance in numerical optimization often yields unsatisfactory reward quality, while the evolutionary search paradigm demonstrates inefficient utilization of simulation resources, resulting in prohibitively lengthy design cycles with disproportionate computational overhead. To address these challenges, we propose the Uncertainty-aware Reward Design Process (URDP), a novel framework that integrates large language models to streamline reward function design and evaluation in RL environments. URDP quantifies candidate reward function uncertainty based on self-consistency analysis, enabling simulation-free identification of ineffective reward components while discovering novel reward components. Furthermore, we introduce uncertainty-aware Bayesian optimization (UABO), which incorporates uncertainty estimation to significantly enhance hyperparameter configuration efficiency. Finally, we construct a bi-level optimization architecture by decoupling the reward component optimization and the hyperparameter tuning. URDP orchestrates synergistic collaboration between the reward logic reasoning of the LLMs and the numerical optimization strengths of the Bayesian Optimization. We conduct a comprehensive evaluation of URDP across 35 diverse tasks spanning three benchmark environments. Our experimental results demonstrate that URDP not only generates higher-quality reward functions but also achieves significant improvements in the efficiency of automated reward design compared to existing approaches.

设计有效的奖赏职能是强化学习的基石,然而,由于常规奖赏工程方法存在效率低和不一致的内在差异,它仍然是一个具有挑战性的进程。最近的进展探索了利用大型语言模型(LLMs)实现奖赏功能设计自动化,然而,在数字优化方面,其业绩不理想,往往产生不令人满意的奖赏质量,而进化搜索模式显示模拟资源利用效率低下,导致设计周期过长,计算管理费用过高,造成过高的计算间接费用;为应对这些挑战,我们建议采用不确定性奖赏设计进程(URDP),这是一个新颖的框架,将大型语言模型整合起来,以简化RLE工程方法的奖赏职能设计和评价。URDP根据自我一致性分析,量化候选人奖赏功能的不确定性,在发现新奖赏部分的同时,允许不进行模拟性地确定无效的奖赏组成部分。此外,我们引入了不确定性评估周期优化(UABO)的优化(UAB),其中包括不确定性估算,以大幅提高超度配置效率。最后,我们通过拆分奖赏部分的优化和超度调整,建立一个双级优化机制。URDP(UDP)在对35级设计的逻辑上,还进行大幅的优化。


Article 158

Title@2025-07-03 (4): Listwise Preference Alignment Optimization for Tail Item Recommendation

Title: Listwise Preference Alignment Optimization for Tail Item Recommendation Listwise Preference Alignment Optimization for Tail Item Empfehlung 用于反尾项目建议的优先对齐优化列表 2507.02255v1

Authors (7): Zihao Li, Chao Yang, Tong Zhang, Yakun Chen, Xianzhi Wang, Guandong Xu, Daoyi Dong

Preference alignment has achieved greater success on Large Language Models (LLMs) and drawn broad interest in recommendation research. Existing preference alignment methods for recommendation either require explicit reward modeling or only support pairwise preference comparison. The former directly increases substantial computational costs, while the latter hinders training efficiency on negative samples. Moreover, no existing effort has explored preference alignment solutions for tail-item recommendation. To bridge the above gaps, we propose LPO4Rec, which extends the Bradley-Terry model from pairwise comparison to listwise comparison, to improve the efficiency of model training. Specifically, we derive a closed form optimal policy to enable more efficient and effective training without explicit reward modeling. We also present an adaptive negative sampling and reweighting strategy to prioritize tail items during optimization and enhance performance in tail-item recommendations. Besides, we theoretically prove that optimizing the listwise preference optimization (LPO) loss is equivalent to maximizing the upper bound of the optimal reward. Our experiments on three public datasets show that our method outperforms 10 baselines by a large margin, achieving up to 50% performance improvement while reducing 17.9% GPU memory usage when compared with direct preference optimization (DPO) in tail-item recommendation. Our code is available at https://github.com/Yuhanleeee/LPO4Rec.

在大语言模型(LLMs)上,参照点调整取得了更大的成功,并在建议研究中引起了广泛的兴趣。现有的建议优惠调整方法要么需要明确的奖赏模型,要么只是支持配对式优惠比较。前者直接增加大量的计算成本,而后者妨碍对负面样本的培训效率。此外,目前没有努力探索尾项建议中的优先调整办法。为了缩小上述差距,我们提议LPO4Rec,将布拉德利-Terray模型从对口比较扩大到列表比较,以提高示范培训的效率。具体地说,我们制定了一种封闭形式的最佳政策,以便在没有明确奖赏模型的情况下进行更有效率和更有效的培训。我们还提出了一个适应性的负抽样和重新加权战略,以便在优化时优先处理尾项,提高尾项建议的执行效率。此外,我们理论上证明,最优化列表式的优惠(LPO)损失相当于最大限度地实现最佳奖赏的上限。我们在三个公共数据集上的实验显示,我们的方法比10个基线大幅度,达到50%的绩效改进,同时在与直接优惠时减少17.9%的GPU记忆使用率(MAGPO/Requ)中,我们的尾项建议。


Article 159

Title@2025-07-03 (4): Circuit-tuning: A Mechanistic Approach for Identifying Parameter Redundancy and Fine-tuning Neural Networks

Title: Circuit-tuning: A Mechanistic Approach for Identifying Parameter Redundancy and Fine-tuning Neural Networks Schaltungs-Tuning: Mechanistischer Ansatz zur Identifizierung von Parameter Redundanz und Feinsteuerung neuraler Netzwerke 电路调控:确定参数冗余和精微调整神经网络的机械化方法 2502.06106v2

Authors (4): Yueyan Li, Wenhao Gao, Caixia Yuan, Xiaojie Wang

The study of mechanistic interpretability aims to reverse-engineer a model to explain its behaviors. While recent studies have focused on the static mechanism of a certain behavior, the learning dynamics inside a model remain to be explored. In this work, we develop an interpretable fine-tuning method for analyzing the mechanism behind learning. We first introduce the concept of node-level intrinsic dimensionality to describe the learning process of a model in a computational graph. Based on our theory, we propose circuit-tuning, a two-stage algorithm that iteratively builds the minimal subgraph for a specific task and updates the key parameters in a heuristic way. Experimental results confirm the existence of the intrinsic dimensionality at the node level and demonstrate the effectiveness of our method for transparent and interpretable fine-tuning. We visualize and analyze the circuits before, during, and after fine-tuning, providing new insights into the self-organization mechanism of a neural network in the learning process.

机械学解释性研究旨在逆向设计一种模型来解释其行为。 虽然最近的研究侧重于某种行为的静态机制, 模型内的学习动态仍有待探索。 在这项工作中, 我们开发了一种可解释的微调方法来分析学习后的机制。 我们首先引入了节点的内在维度概念来描述计算图中模型的学习过程。 我们根据我们的理论, 提出了电路调节, 这是一种两阶段的算法, 迭接地为特定任务构建了最起码的子集, 并以超常方式更新了关键参数。 实验结果证实节点一级存在内在的维度, 并展示了我们透明、 可解释的微调方法的有效性。 我们视觉化和分析了计算图之前、 期间 和 之后 的电路, 为学习过程中神经网络的自我组织机制提供了新的洞察力。


Article 160

Title@2025-07-03 (4): Adapter-Enhanced Semantic Prompting for Continual Learning

Title: Adapter-Enhanced Semantic Prompting for Continual Learning Adapter-erweiterte semantische Prompting für kontinuierliches Lernen 增强适应性强化的持续学习的语义提示 2412.11074v3

Authors (8): Baocai Yin, Ji Zhao, Huajie Jiang, Ningning Hou, Yongli Hu, Amin Beheshti, Ming-Hsuan Yang, Yuankai Qi

Continual learning (CL) enables models to adapt to evolving data streams. A major challenge of CL is catastrophic forgetting, where new knowledge will overwrite previously acquired knowledge. Traditional methods usually retain the past data for replay or add additional branches in the model to learn new knowledge, which has high memory requirements. In this paper, we propose a novel lightweight CL framework, Adapter-Enhanced Semantic Prompting (AESP), which integrates prompt tuning and adapter techniques. Specifically, we design semantic-guided prompts to enhance the generalization ability of visual features and utilize adapters to efficiently fuse the semantic information, aiming to learn more adaptive features for the continual learning task. Furthermore, to choose the right task prompt for feature adaptation, we have developed a novel matching mechanism for prompt selection. Extensive experiments on three CL datasets demonstrate that our approach achieves favorable performance across multiple metrics, showing its potential for advancing CL.

持续学习(CL)使模型能够适应不断演变的数据流。 CL(CL)的主要挑战之一是灾难性的忘记,因为新的知识将覆盖先前获得的知识。传统方法通常保留过去的数据,用于重放或增加模型中的更多分支,以学习具有高记忆要求的新知识。在本文中,我们提议了一个新型的轻巧CL框架,即适应-增强语义提示(AESP),将快速调试和适配技术结合起来。具体地说,我们设计语义引导提示,以提高视觉特征的通用能力,并利用适应器有效地整合语义信息,目的是学习更适应性更强的特征,以适应持续学习任务。此外,为了选择适合特征适应的正确任务,我们开发了一个用于迅速选择的新型匹配机制。在三个 CL数据集上的广泛实验表明,我们的方法在多个计量中取得了优异的性能,显示了其推进 CL的潜力。


Article 161

Title@2025-07-03 (4): Transfer Learning for Matrix Completion

Title: Transfer Learning for Matrix Completion Transfer-Lernen für Matrix-Vervollständigung 用于完成矩阵表完成的转移学习 2507.02248v1

Authors (2): Dali Liu, Haolei Weng

In this paper, we explore the knowledge transfer under the setting of matrix completion, which aims to enhance the estimation of a low-rank target matrix with auxiliary data available. We propose a transfer learning procedure given prior information on which source datasets are favorable. We study its convergence rates and prove its minimax optimality. Our analysis reveals that with the source matrices close enough to the target matrix, out method outperforms the traditional method using the single target data. In particular, we leverage the advanced sharp concentration inequalities introduced in \cite{brailovskaya2024universality} to eliminate a logarithmic factor in the convergence rate, which is crucial for proving the minimax optimality. When the relevance of source datasets is unknown, we develop an efficient detection procedure to identify informative sources and establish its selection consistency. Simulations and real data analysis are conducted to support the validity of our methodology.

在本文中,我们探索了在矩阵完成设置下的知识转让,其目的是加强对现有辅助数据的低级目标矩阵的估算;我们建议了一种转让学习程序,事先提供哪些源数据集是有利的信息;我们研究了其趋同率,并证明了其微量最佳性;我们的分析表明,由于源矩阵离目标矩阵足够近,方法比使用单一目标数据的传统方法要好;特别是,我们利用在\cite{brailovskaya2024普遍性中引入的高级急性浓度不平等,消除趋同率中的对数系数,这对于证明微量最大最佳性至关重要;当源数据集的相关性不明时,我们制定高效的检测程序,以查明信息源并确立其选择一致性;进行模拟和真实数据分析,以支持我们的方法的有效性。


Article 162

Title@2025-07-03 (4): Order Acquisition Under Competitive Pressure: A Rapidly Adaptive Reinforcement Learning Approach for Ride-Hailing Subsidy Strategies

Title: Order Acquisition Under Competitive Pressure: A Rapidly Adaptive Reinforcement Learning Approach for Ride-Hailing Subsidy Strategies Auftragserwerb unter Wettbewerbsdruck: Ein schnell adaptiver Lernansatz zur Stärkung der Reit-Hailing-Subsidy-Strategien 在竞争压力下取得订单:快速适应性强化学习办法,用于骑乘-快乐补贴战略 2507.02244v1

Authors (6): Fangzhou Shi, Xiaopeng Ke, Xinye Xiong, Kexin Meng, Chang Men, Zhengdan Zhu

The proliferation of ride-hailing aggregator platforms presents significant growth opportunities for ride-service providers by increasing order volume and gross merchandise value (GMV). On most ride-hailing aggregator platforms, service providers that offer lower fares are ranked higher in listings and, consequently, are more likely to be selected by passengers. This competitive ranking mechanism creates a strong incentive for service providers to adopt coupon strategies that lower prices to secure a greater number of orders, as order volume directly influences their long-term viability and sustainability. Thus, designing an effective coupon strategy that can dynamically adapt to market fluctuations while optimizing order acquisition under budget constraints is a critical research challenge. However, existing studies in this area remain scarce. To bridge this gap, we propose FCA-RL, a novel reinforcement learning-based subsidy strategy framework designed to rapidly adapt to competitors’ pricing adjustments. Our approach integrates two key techniques: Fast Competition Adaptation (FCA), which enables swift responses to dynamic price changes, and Reinforced Lagrangian Adjustment (RLA), which ensures adherence to budget constraints while optimizing coupon decisions on new price landscape. Furthermore, we introduce RideGym, the first dedicated simulation environment tailored for ride-hailing aggregators, facilitating comprehensive evaluation and benchmarking of different pricing strategies without compromising real-world operational efficiency. Experimental results demonstrate that our proposed method consistently outperforms baseline approaches across diverse market conditions, highlighting its effectiveness in subsidy optimization for ride-hailing service providers.

搭便车聚合物平台的激增为搭便车服务提供者带来了巨大的增长机会,增加了订货量和商品总价值。在大多数载车聚集物平台上,提供较低票价的服务提供商在上市上名列较高,因此更有可能被乘客选中。这一竞争性排名机制为服务提供商采取优惠券战略提供了强大的激励,这种优惠券战略可以降低价格,以确保更多的订单,因为订单量直接影响到其长期可行性和可持续性。因此,设计有效的优惠券战略,既能动态地适应市场波动,又在预算限制下优化采购订单是一项关键的研究挑战。然而,这一领域现有的研究仍然很少。为弥补这一差距,我们提议了一个全新的强化学习型补贴战略框架,旨在迅速适应竞争者的价格调整。我们的方法结合了两种关键技术:快速竞争适应,从而能够对动态的价格变化作出迅速反应,以及加强拉格兰加公司调整(拉格朗公司),确保坚持预算限制,同时优化在新的价格形势下实现购买票价决定。此外,我们提出了一个不因地调整成本而调整的、不因地调整而调整的、不因地调整的市场标准而调整的、不因地调整的、不因地调整的、在市场上调价价比价制而导致的首的、不因地调整后,我们提出了一个专门的模拟了一种专门的模拟的市场标准价比价比价制环境。


Article 163

Title@2025-07-03 (4): EigenLoRAx: Recycling Adapters to Find Principal Subspaces for Resource-Efficient Adaptation and Inference

Title: EigenLoRAx: Recycling Adapters to Find Principal Subspaces for Resource-Efficient Adaptation and Inference EigenLoRAx: Recycling-Adapter, um wesentliche Subräume für ressourceneffiziente Anpassung und Schlussfolgerung zu finden EigenLORAx:为资源效率适应和推断寻找主要子空间的再循环适应器 2502.04700v4

Authors (4): Prakhar Kaushik, Ankit Vaidya, Shravan Chaudhari, Alan Yuille

The rapid growth of large models has raised concerns about their environmental impact and equity in accessibility due to significant computational costs. Low-Rank Adapters (LoRA) offer a lightweight solution for finetuning large models, resulting in an abundance of publicly available adapters tailored to diverse domains. We ask: Can these pretrained adapters be leveraged to further streamline adaptation to new tasks while addressing these challenges? We introduce EigenLoRAx, a parameter-efficient finetuning method that recycles existing adapters to create a principal subspace aligned with their shared domain knowledge which can be further augmented with orthogonal basis vectors in low-resource scenarios. This enables rapid adaptation to new tasks by learning only lightweight coefficients on the principal components of the subspace-eliminating the need to finetune entire adapters. EigenLoRAx requires significantly fewer parameters and memory, improving efficiency for both training and inference. Our method demonstrates strong performance across diverse domains and tasks, offering a scalable for edge-based applications, personalization, and equitable deployment of large models in resource-constrained environments.

大型模型的迅速增长引起了人们对其环境影响的关切,以及由于计算成本的高昂而实现无障碍的公平性。低射线适应器(LORA)为微调大型模型提供了一个轻量化的解决方案,导致大量针对不同领域的公开可调适的适应器。我们问:能否利用这些事先经过培训的适应器进一步简化适应新任务,同时应对这些挑战?我们引入了Eigen LoRAx,这是一种具有参数效率的微调方法,使现有适应器循环利用,以创造与其共享的域知识相适应的主要次空间,在低资源情景下,可与正向矢量矢量相进一步扩展。这能够通过只学习小空间主要组成部分的轻量系数来快速适应新任务,从而消除微微调整整个适应器的需要。Eigen LoRAx需要大大减少参数和记忆,提高培训和推断的效率。我们的方法显示,在不同的领域和任务中都表现得力强,为边缘应用、个性化和在资源紧张的环境中公平部署大型模型提供了可扩展的尺度。


Article 164

Title@2025-07-03 (4): VERBA: Verbalizing Model Differences Using Large Language Models

Title: VERBA: Verbalizing Model Differences Using Large Language Models VERBA: Verbalisieren von Modellunterschieden mit großen Sprachmodellen VERBA: 使用大语言模式 2507.02241v1

Authors (3): Shravan Doda, Shashidhar Reddy Javaji, Zining Zhu

In the current machine learning landscape, we face a “model lake” phenomenon: Given a task, there is a proliferation of trained models with similar performances despite different behavior. For model users attempting to navigate and select from the models, documentation comparing model pairs is helpful. However, for every $N$ models there could be $O(N^2)$ pairwise comparisons, a number prohibitive for the model developers to manually perform pairwise comparisons and prepare documentations. To facilitate fine-grained pairwise comparisons among models, we introduced $\textbf{VERBA}$. Our approach leverages a large language model (LLM) to generate verbalizations of model differences by sampling from the two models. We established a protocol that evaluates the informativeness of the verbalizations via simulation. We also assembled a suite with a diverse set of commonly used machine learning models as a benchmark. For a pair of decision tree models with up to 5% performance difference but 20-25% behavioral differences, $\textbf{VERBA}$ effectively verbalizes their variations with up to 80% overall accuracy. When we included the models’ structural information, the verbalization’s accuracy further improved to 90%. $\textbf{VERBA}$ opens up new research avenues for improving the transparency and comparability of machine learning models in a post-hoc manner.

在目前的机器学习环境中,我们面临一个“模范湖”现象:根据任务,经过训练的模型数量激增,尽管行为不同,也有类似的表现。对于试图浏览和从模型中选择模型的模型用户来说,比较模型配对的文档很有帮助。然而,对于每一个美元模型来说,对于每个模型来说,都可能有美元(N)2,美元对比,模型开发者手动进行配对比较和编制文件的难度很大。为了便于在模型之间进行细微的对比比较,我们引入了美元。我们的方法利用一个大型语言模型(LLLM),通过对两个模型进行抽样抽样抽样,生成模型差异的口述。我们制定了一个协议,通过模拟来评估口述的口述内容。我们还组装了一个套套套,一套通用的机器学习模型作为基准。对于具有5%性能差异但20-25 %行为差异的决策树模型,我们引入了美元对口授(Textb{VERBA) $,有效地用80%的整体精确度来表达这些差异。当我们把模型的结构信息引入时, 将语言分析的准确度提高到90美元。


Article 165

Title@2025-07-03 (4): Bi-modality medical images synthesis by a bi-directional discrete process matching method

Title: Bi-modality medical images synthesis by a bi-directional discrete process matching method Bi-Modalität medizinische Bilder Synthese durch eine bi-direktionale diskrete Prozess-Matching-Methode 以双向离散进程匹配方法对双向单离散过程匹配方法进行双向医学图像合成 2409.03977v3

Authors (3): Zhe Xiong, Qiaoqiao Ding, Xiaoqun Zhang

Recently, medical image synthesis gains more and more popularity, along with the rapid development of generative models. Medical image synthesis aims to generate an unacquired image modality, often from other observed data modalities. Synthesized images can be used for clinical diagnostic assistance, data augmentation for model training and validation or image quality improving. In the meanwhile, the flow-based models are among the successful generative models for the ability of generating realistic and high-quality synthetic images. However, most flow-based models require to calculate flow ordinary different equation (ODE) evolution steps in synthesis process, for which the performances are significantly limited by heavy computation time due to a large number of time iterations. In this paper, we propose a novel flow-based model, namely bi-directional Discrete Process Matching (Bi-DPM) to accomplish the bi-modality image synthesis tasks. Different to other flow matching based models, we propose to utilize both forward and backward ODE flows and enhance the consistency on the intermediate images over a few discrete time steps, resulting in a synthesis process maintaining high-quality generations for both modalities under the guidance of paired data. Our experiments on three datasets of MRI T1/T2 and CT/MRI demonstrate that Bi-DPM outperforms other state-of-the-art flow-based methods for bi-modality image synthesis, delivering higher image quality with accurate anatomical regions.

最近,医学图像合成随着基因化模型的迅速发展而越来越受欢迎。医学图像合成旨在产生一种未获得的图像模式,通常来自其他观察到的数据模式。合成图像可用于临床诊断协助、模型培训和验证或图像质量的提高。与此同时,流动模型是成功生成现实和高质量合成图像的能力的基因化模型之一。然而,大多数流动模型需要计算合成过程中的普通不同方程(ODE)进化步骤,其性能由于大量时间的反复变化而大大限制了大量计算时间。在本文件中,我们提出一个新的流动模型,即双向分解进程匹配(Bi-DPM),以完成双向分散进程和图像合成任务。与其他流动模型不同的是,我们提议利用前向和后向的混合模型流动模型,并在几个离散时间步骤中提高中间图像的一致性,从而在对齐数据的指导下,使两种模式的计算过程的质量都大大受限制。我们提出的新的流动模型模型,即双向分向分解进程(B-DPMM-M-M-M-M-M-M-M-M-M-S-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-M-S-S-S-M-M-S-M-M-S-S-S-S-S-S-S-S-M-M-M-M-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-D-D-D-D-D-R-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S


Article 166

Title@2025-07-03 (4): Escaping Platos Cave: JAM for Aligning Independently Trained Vision and Language Models

Title: Escaping Platos Cave: JAM for Aligning Independently Trained Vision and Language Models Escaping Platons Cave: JAM for Aligning Independently Trained Vision and Language Models 撤离排洞穴:调整独立培训的愿景和语言模式JAM 2507.01201v2

Authors (4): Hyoseo, Yoon, Yisong Yue, Been Kim

Independently trained vision and language models inhabit disjoint representational spaces, shaped by their respective modalities, objectives, and architectures. Yet an emerging hypothesis - the Platonic Representation Hypothesis - suggests that such models may nonetheless converge toward a shared statistical model of reality. This compatibility, if it exists, raises a fundamental question: can we move beyond post-hoc statistical detection of alignment and explicitly optimize for it between such disjoint representations? We cast this Platonic alignment problem as a multi-objective optimization task - preserve each modality’s native structure while aligning for mutual coherence. We introduce the Joint Autoencoder Modulator (JAM) framework that jointly trains modality-specific autoencoders on the latent representations of pre-trained single modality models, encouraging alignment through both reconstruction and cross-modal objectives. By analogy, this framework serves as a method to escape Plato’s Cave, enabling the emergence of shared structure from disjoint inputs. We evaluate this framework across three critical design axes: (i) the alignment objective - comparing contrastive loss (Con), its hard-negative variant (NegCon), and our Spread loss, (ii) the layer depth at which alignment is most effective, and (iii) the impact of foundation model scale on representational convergence. Our findings show that our lightweight Pareto-efficient framework reliably induces alignment, even across frozen, independently trained representations, offering both theoretical insight and practical pathways for transforming generalist unimodal foundations into specialist multimodal models.

经过独立训练的愿景和语言模型位于不同代表空间,由各自的模式、目标和结构决定。但新出现的假设—-平板代表假设假设—-表明这些模型可能仍然会趋向于共同的现实统计模式。如果存在这种兼容性,则提出一个根本问题:我们能否超越热后统计检测对一致性的检测,明确优化这种不一致性代表之间的匹配?我们把这种平板对齐问题作为一个多目标优化任务――维护每种模式的本体结构,同时为相互一致而协调。我们引入了联合自动编码模型(JAM)框架,共同培训特定模式的自动调整模型,以了解事先培训过的单一模式模式模式的潜在表现,鼓励通过重建和交叉模式目标加以调整。类推,这一框架可以作为一种摆脱柏托洞穴的方法,使共同结构能够从不协调的投入中出现。我们从三个关键设计轴对这个框架进行了评估:(一) 调整目标――比较性专家模型(Con),其硬性反差变量(Necon),以及我们横向结构结构结构结构的调整,这显示了我们总体结构结构的深度,显示了我们整个结构结构结构结构结构的深度。


Article 167

Title@2025-07-03 (4): Fast AI Model Splitting over Edge Networks

Title: Fast AI Model Splitting over Edge Networks Schnelle KI-Modellaufspaltung über Edge-Netzwerke 快速 AI 模型在边缘网络上分割 2507.01041v2

Authors (7): Zuguang Li, Wen Wu, Shaohua Wu, Songge Zhang, Ye Wang, Xuemin, Shen

Split learning (SL) has emerged as a computationally efficient approach for artificial intelligence (AI) model training, which can alleviate device-side computational workloads. However, complex AI model architectures pose high computational complexity to obtain the optimal model splitting. In this paper, we represent an arbitrary AI model as a directed acyclic graph (DAG), and then reformulate the optimal model splitting problem as a minimum s-t cut search problem. To solve the problem, we propose a fast DAG-based model splitting algorithm, which restructures the DAG to enable the optimal model splitting identification via a maximum flow method. Theoretical analysis indicates that the proposed algorithm is optimal. Furthermore, considering AI models with block structures, we propose a block-wise model splitting algorithm to reduce computational complexity. The algorithm abstracts each block, i.e., a component consisting of multiple layers, into a single vertex, thereby obtaining the optimal model splitting via a simplified DAG. Extensive experimental results demonstrate that the proposed algorithms can determine the optimal model splitting within milliseconds, as well as reduce training delay by 24.62%-38.95% in dynamic edge networks as compared to the state-of-the-art benchmarks.

分解学习( SL) 已经成为人工智能(AI) 模型培训的一种计算高效的方法,可以减轻设备侧计算工作量。 但是, 复杂的 AI 模型结构在计算上具有很高的复杂性, 以获得最佳模式拆分。 在本文中, 我们代表一个任意的 AI 模型, 作为定向环绕图( DAG) , 然后将最佳模式拆分问题重新定位为最小的点切搜索问题 。 为了解决问题, 我们提议了一个基于 DAG 的快速模式拆分算法, 以重组 DAG 模型, 以便通过最大流法进行最佳模式拆分分辨。 理论分析表明, 拟议的算法是最佳的。 此外, 考虑到 与块结构的 AI 模型, 我们提议了一个分解模型, 以降低计算复杂性。 每个块的算法摘要, 即由多个层组成的一个部分, 通过简化的 DAGAG 获得最佳模式拆分解。 广泛的实验结果显示, 拟议的算法可以确定最佳模式在毫秒内进行最佳的拆分解。 。 以及将动态边缘网络中的培训延迟率减少24.62%- 38% 基准, 。


Article 168

Title@2025-07-03 (4): Privacy-Preserving Quantized Federated Learning with Diverse Precision

Title: Privacy-Preserving Quantized Federated Learning with Diverse Precision Privacy-Preserving Quantized Federated Learning with Diverse Precision 具有多种精密度的隐私保护量化联邦学习 2507.00920v2

Authors (6): Dang Qua Nguyen, Morteza Hashemi, Erik Perrins, Sergiy A. Vorobyov, David J. Love, Taejoon Kim

Federated learning (FL) has emerged as a promising paradigm for distributed machine learning, enabling collaborative training of a global model across multiple local devices without requiring them to share raw data. Despite its advancements, FL is limited by factors such as: (i) privacy risks arising from the unprotected transmission of local model updates to the fusion center (FC) and (ii) decreased learning utility caused by heterogeneity in model quantization resolution across participating devices. Prior work typically addresses only one of these challenges because maintaining learning utility under both privacy risks and quantization heterogeneity is a non-trivial task. In this paper, our aim is therefore to improve the learning utility of a privacy-preserving FL that allows clusters of devices with different quantization resolutions to participate in each FL round. Specifically, we introduce a novel stochastic quantizer (SQ) that is designed to simultaneously achieve differential privacy (DP) and minimum quantization error. Notably, the proposed SQ guarantees bounded distortion, unlike other DP approaches. To address quantization heterogeneity, we introduce a cluster size optimization technique combined with a linear fusion approach to enhance model aggregation accuracy. Numerical simulations validate the benefits of our approach in terms of privacy protection and learning utility compared to the conventional LaplaceSQ-FL algorithm.

联邦学习(FL)已成为分布式机器学习的一个很有希望的范例,使得在不要求他们分享原始数据的情况下,通过多种地方设备对一种全球模式进行协作培训,无需分享原始数据。尽管取得了进步,但联邦学习(FL)受到以下因素的限制:(一) 不受保护地向聚合中心传送当地模型更新信息所产生的隐私风险;(二) 参与装置的模型四分解分辨率差异导致学习效用减少,以往工作通常只解决其中的一个挑战,因为在隐私风险和量化异质化两种情况下都保持学习效用,这是一项非三重性的任务。因此,在本文件中,我们的目标是改进保密FL的学习效用,使具有不同四分解分辨率的各类设备能够参加FL的每一轮活动。具体地说,我们引入了一种新的偏差四分量量的量衡算法(SQ),目的是同时实现差异性隐私(DP)和最小度错误。值得注意的是,拟议的SQQ保证受约束的扭曲性扭曲性(不同于DP的其他方法)是非三重性的任务。在本文中,我们的目标是提高保密性FL的组合规模优化技术的组合,同时结合了对NUIFL的精确度的模拟方法。


Article 169

Title@2025-07-03 (4): PhysicsCorrect: A Training-Free Approach for Stable Neural PDE Simulations

Title: PhysicsCorrect: A Training-Free Approach for Stable Neural PDE Simulations PhysikKorrektur: Ein trainingsfreier Ansatz für stabile neurale PDE-Simulationen 物理校正:稳定神经物质模拟的无培训方法 2507.02227v1

Authors (2): Xinquan Huang, Paris Perdikaris

Neural networks have emerged as powerful surrogates for solving partial differential equations (PDEs), offering significant computational speedups over traditional methods. However, these models suffer from a critical limitation: error accumulation during long-term rollouts, where small inaccuracies compound exponentially, eventually causing complete divergence from physically valid solutions. We present PhysicsCorrect, a training-free correction framework that enforces PDE consistency at each prediction step by formulating correction as a linearized inverse problem based on PDE residuals. Our key innovation is an efficient caching strategy that precomputes the Jacobian and its pseudoinverse during an offline warm-up phase, reducing computational overhead by two orders of magnitude compared to standard correction approaches. Across three representative PDE systems – Navier-Stokes fluid dynamics, wave equations, and the chaotic Kuramoto-Sivashinsky equation – PhysicsCorrect reduces prediction errors by up to 100x while adding negligible inference time (under 5\%). The framework integrates seamlessly with diverse architectures including Fourier Neural Operators, UNets, and Vision Transformers, effectively transforming unstable neural surrogates into reliable simulation tools that bridge the gap between deep learning’s computational efficiency and the physical fidelity demanded by practical scientific applications.

作为解决部分差异方程式(PDE残余物)的强大代孕器,神经网络已成为解决部分差异方程式(PDEs)的强大代孕器,为传统方法提供了巨大的计算速度。然而,这些模型受到一个关键的限制:长期推出期间的错误积累,在长期推出期间,小的不准确性成倍增加,最终导致与物理上有效的解决方案完全脱差。我们提出物理校正,这是一个无培训的校正框架,通过根据PDE残余物将校正编成一个线性反问题,在每个预测步骤中执行PDE的一致性。我们的关键创新是一种高效的缓冲战略,在离线热阶段预先计算Jacobian及其伪反作用,比标准校正方法减少两个数量级的计算间接费用。三种具有代表性的PDE系统 – – 纳维-斯托克液体动态、波方程式和混乱的Kuramotomoto-Sivashinsky方程式 – – 物理校正校正校正校正将预测错误减少100x,同时增加可忽略的推误时间(5)。这个框架与多种结构结构相密结合,包括四级神经操作操作操作、UNet和视野变换变换器,有效地将不稳定的心心心基的心基的内测测制成工具。


Article 170

Title@2025-07-03 (4): DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs

Title: DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs DecoRTL: Ein Laufzeit-Decoding-Framework für RTL-Code-Generierung mit LLMs DecoRTL: 使用LLMs的RTL代码生成运行时间解码框架 2507.02226v1

Authors (3): Mohammad Akyash, Kimia Azar, Hadi Kamali

As one of their many applications, large language models (LLMs) have recently shown promise in automating register transfer level (RTL) code generation. However, conventional LLM decoding strategies, originally designed for natural language, often fail to meet the structural and semantic demands of RTL, leading to hallucinated, repetitive, or invalid code outputs. In this paper, we first investigate the root causes of these decoding failures through an empirical analysis of token-level entropy during RTL generation. Our findings reveal that LLMs exhibit low confidence in regions of structural ambiguity or semantic complexity, showing that standard decoding strategies fail to differentiate between regions requiring determinism (syntax-critical regions) and those that benefit from creative exploratory variability (design-critical regions). Then, to overcome this, we introduce DecoRTL, a novel run-time decoding strategy, that is both syntax-aware and contrastive for RTL code generation. DecoRTL integrates two complementary components: (i) self-consistency sampling, which generates multiple candidates and re-ranks them based on token-level agreement to promote correctness while maintaining diversity; and (ii) syntax-aware temperature adaptation, which classifies tokens by their syntactical and functional roles and adjusts the sampling temperature accordingly, enforcing low temperature for syntax-critical tokens and higher temperature for exploratory ones. Our approach operates entirely at inference time without requiring any additional model fine-tuning. Through evaluations on multiple open-source LLMs using the VerilogEval benchmark, we demonstrate significant improvements in syntactic validity, functional correctness, and output diversity, while the execution overhead (performance overhead) is imperceptible.

作为许多应用之一,大型语言模型(LLMS)最近显示,在注册传输水平(RTL)代码生成自动化方面,大型语言模型(LLM)最近显示出了前景;然而,原本为自然语言设计的常规LLM解码战略往往无法满足RTL的结构和语义要求,导致幻灭、重复或无效代码输出。在本文中,我们首先通过对RTL生成过程中的代币性激素进行实验性分析来调查这些解码失败的根源。我们的调查结果显示,LLMS在结构模糊或语义复杂度区域中表现出了多度的可靠性,表明标准解码战略未能区分需要确定性(合成关键区域)的区域和那些受益于创造性探索性变异(设计关键区域)的区域。为了克服这些差异,我们引入了DecoRTL(一种新型运行时间解码战略),这是对RTL生成的代币种性调和对比性的。


Article 171

Title@2025-07-03 (4): Metric Design != Metric Behavior: Improving Metric Selection for the Unbiased Evaluation of Dimensionality Reduction

Title: Metric Design != Metric Behavior: Improving Metric Selection for the Unbiased Evaluation of Dimensionality Reduction Metrisches Design != Metrisches Verhalten: Verbesserung der Metrischen Auswahl für die unvoreingenommene Bewertung der Dimensionalitätsreduktion 计量设计 ! = 计量行为:改进对减少多维度的无偏见评价的计量选择 2507.02225v1

Authors (3): Jiyeon Bae, Hyeon Jeon, Jinwook Seo

Evaluating the accuracy of dimensionality reduction (DR) projections in preserving the structure of high-dimensional data is crucial for reliable visual analytics. Diverse evaluation metrics targeting different structural characteristics have thus been developed. However, evaluations of DR projections can become biased if highly correlated metrics–those measuring similar structural characteristics–are inadvertently selected, favoring DR techniques that emphasize those characteristics. To address this issue, we propose a novel workflow that reduces bias in the selection of evaluation metrics by clustering metrics based on their empirical correlations rather than on their intended design characteristics alone. Our workflow works by computing metric similarity using pairwise correlations, clustering metrics to minimize overlap, and selecting a representative metric from each cluster. Quantitative experiments demonstrate that our approach improves the stability of DR evaluation, which indicates that our workflow contributes to mitigating evaluation bias.

为了解决这一问题,我们提议一个新的工作流程,通过根据经验相关性而不是仅仅根据预期设计特点进行分组,减少在选择评价指标方面的偏差。我们的工作流是利用对称相关性计算类似指标,将指标组合起来以尽量减少重叠,并从每个组中选择具有代表性的衡量尺度。定量实验表明,我们的方法提高了DR评价的稳定性,这表明我们的工作流程有助于减少评价的偏差。


Article 172

Title@2025-07-03 (4): Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test

Title: Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Wo finden Sie Grokking in LLM Pretraining? Überwachen Sie Memorization-to-Generalization ohne Test 在 LLLM 预修课程中在哪里找到 Grokking ? 监视不试验的记忆化到普及 。 2506.21551v2

Authors (3): Ziyue Li, Chenrui Fan, Tianyi Zhou

Grokking, i.e., test performance keeps improving long after training loss converged, has been recently witnessed in neural network training, making the mechanism of generalization and other emerging capabilities such as reasoning mysterious. While prior studies usually train small models on a few toy or highly-specific tasks for thousands of epochs, we conduct the first study of grokking on checkpoints during one-pass pretraining of a 7B large language model (LLM), i.e., OLMoE. We compute the training loss and evaluate generalization on diverse benchmark tasks, including math reasoning, code generation, and commonsense/domain-specific knowledge retrieval tasks. Our study, for the first time, verifies that grokking still happens in the pretraining of large-scale foundation models, though different data may enter grokking stages asynchronously. We further demystify grokking’s “emergence of generalization” by investigating LLM internal dynamics. Specifically, we find that training samples’ pathways (i.e., expert choices across layers) evolve from random, instance-specific to more structured and shareable between samples. Also, the complexity of a sample’s pathway reduces despite the converged loss. These indicate a memorization-to-generalization “knowledge digestion”, providing a mechanistic explanation of delayed generalization. In the study, we develop two novel metrics to quantify pathway distance and the complexity of a single pathway. We show their ability to predict the generalization improvement on diverse downstream tasks. They are efficient, simple to compute and solely dependent on training data. Hence, they have practical value for pretraining, enabling us to monitor the generalization performance without finetuning and test. Theoretically, we show that more structured pathways reduce model complexity and improve the generalization bound.

在培训损失集中之后,测试性表现一直在改善。我们计算了实际培训损失,并评价了不同基准任务,包括数学推理、代码生成、公元/公元特定知识回收任务等,使得一般化机制和其他新兴能力机制,例如推理神秘。虽然先前的研究通常对数千个时代的一些玩具或高度特定任务进行小型模型培训,但对于在7B大语言模型(LLIM)的单步前训练中,我们首次对检查站进行重整研究。我们通过调查LLOM内部动态,进一步淡化了7B大语言模型(LLM)的“显性化”的“显明性”。我们计算了实际培训损失,并评价了不同基准任务,包括数学推理、代码生成和公元/公元特定知识回收任务。我们的研究首次证实,在大型基础模型模型的预训练前阶段,虽然不同数据可能进入重整阶段,但是仍然会发生怪怪怪。我们进一步淡化,通过调查LLM内部动态, 进一步淡化的“显清” 。我们发现,在一般培训中显示, 标准化的路径改进(即专家对不同层次的选择)从随机、具体变变变换的路径,让我们的内变换的路径,让我们的路径, 显示它们变换的变换的变的路径, 显示了内变变的变的变的内变的变变的变的变的变的变变的“ , 变的变变的变, 变的变的变的变的变的变的变的变的变。


Article 173

Title@2025-07-03 (4): Multivariate de Bruijn Graphs: A Symbolic Graph Framework for Time Series Forecasting

Title: Multivariate de Bruijn Graphs: A Symbolic Graph Framework for Time Series Forecasting Multivariate de Bruijn Graphen: Ein symbolisches Graphen-Framework für die Vorhersage von Zeitreihen 布鲁伊图多变量图:时间序列预测符号图框架 2505.22768v3

Authors (5): Mert Onur Cakiroglu, Idil Bilge Altun, Mehmet Dalkilic, Elham Buxton, Hasan Kurban

Time series forecasting remains a challenging task for foundation models due to temporal heterogeneity, high dimensionality, and the lack of inherent symbolic structure. In this work, we propose DRAGON (Discrete Representation and Augmented Graph encoding Over de BruijN Graphs), a novel encoder that introduces Multivariate de Bruijn Graphs (MdBGs) to bridge the gap between symbolic representations and neural modeling. DRAGON discretizes continuous input sequences and maps them onto a fixed graph structure, enabling dynamic context recovery via graph-based attention. Integrated as an auxiliary module within a dual-branch architecture, DRAGON augments conventional CNN-based encoders with symbolic, structure-aware representations. All code developed for this study is available at: https://github.com/KurbanIntelligenceLab/MultdBG-Time-Series-Library

时间序列预测仍然是基础模型的一项艰巨任务,因为时间差异性、高度多维性和缺乏内在的象征结构。 在这项工作中,我们提议DRAGON(BruijN Graphs上的分层表示和增强图形编码),这是一个新颖的编码器,引进了布鲁伊恩图多变图(MdBGs),以弥合象征性表示和神经建模之间的差距。DRAGON将连续输入序列分解并绘制到固定的图表结构上,从而能够通过基于图形的注意实现动态环境的恢复。DRAGON作为一个辅助模块,将基于CNN的常规编码器整合成一个符号、结构意识的图示。为这项研究制定的所有代码都可在以下网址查阅:https://github.com/KurbanInteligligenceLab/MultdBG-Time-Sery-Library。


Article 174

Title@2025-07-03 (4): Hybrid least squares for learning functions from highly noisy data

Title: Hybrid least squares for learning functions from highly noisy data Hybride kleinste Quadrate für das Lernen von Funktionen aus hochlauten Daten 从高噪音数据中学习函数最小混合方格 2507.02215v1

Authors (4): Ben Adcock, Bernhard Hientzsch, Akil Narayan, Yiming Xu

Motivated by the need for efficient estimation of conditional expectations, we consider a least-squares function approximation problem with heavily polluted data. Existing methods that are powerful in the small noise regime are suboptimal when large noise is present. We propose a hybrid approach that combines Christoffel sampling with certain types of optimal experimental design to address this issue. We show that the proposed algorithm enjoys appropriate optimality properties for both sample point generation and noise mollification, leading to improved computational efficiency and sample complexity compared to existing methods. We also extend the algorithm to convex-constrained settings with similar theoretical guarantees. When the target function is defined as the expectation of a random field, we extend our approach to leverage adaptive random subspaces and establish results on the approximation capacity of the adaptive procedure. Our theoretical findings are supported by numerical studies on both synthetic data and on a more challenging stochastic simulation problem in computational finance.

出于有效估计有条件期望的需要,我们认为,与严重污染的数据相比,最差的功能近似问题与最受污染的数据相提并论。在出现大噪音时,在小噪音制度中具有强大力量的现有方法并不理想。我们提议一种混合方法,将克赖斯特offel抽样与某些类型的最佳实验设计相结合,以解决这一问题。我们表明,拟议的算法在采样点生成和噪声聚变方面都具有适当的最佳性能,从而与现有方法相比,提高了计算效率和抽样复杂性。我们还将算法扩大到具有类似理论保证的受控制的环境。当目标功能被定义为随机字段的预期时,我们扩大我们的方法,利用适应性随机子空间,并确立适应程序近似能力的结果。我们理论发现得到了关于合成数据和计算融资中更具挑战性的随机模拟问题的数字研究的支持。


Article 175

Title@2025-07-03 (4): SciGA: A Comprehensive Dataset for Designing Graphical Abstracts in Academic Papers

Title: SciGA: A Comprehensive Dataset for Designing Graphical Abstracts in Academic Papers SciGA: Ein umfassender Datensatz zur Gestaltung grafischer Abstracts in wissenschaftlichen Papieren SciGA: 用于设计学术论文制图摘要的综合数据集 2507.02212v1

Authors (4): Takuro Kawada, Shunsuke Kitada, Sota Nemoto, Hitoshi Iyatomi

Graphical Abstracts (GAs) play a crucial role in visually conveying the key findings of scientific papers. While recent research has increasingly incorporated visual materials such as Figure 1 as de facto GAs, their potential to enhance scientific communication remains largely unexplored. Moreover, designing effective GAs requires advanced visualization skills, creating a barrier to their widespread adoption. To tackle these challenges, we introduce SciGA-145k, a large-scale dataset comprising approximately 145,000 scientific papers and 1.14 million figures, explicitly designed for supporting GA selection and recommendation as well as facilitating research in automated GA generation. As a preliminary step toward GA design support, we define two tasks: 1) Intra-GA recommendation, which identifies figures within a given paper that are well-suited to serve as GAs, and 2) Inter-GA recommendation, which retrieves GAs from other papers to inspire the creation of new GAs. We provide reasonable baseline models for these tasks. Furthermore, we propose Confidence Adjusted top-1 ground truth Ratio (CAR), a novel recommendation metric that offers a fine-grained analysis of model behavior. CAR addresses limitations in traditional ranking-based metrics by considering cases where multiple figures within a paper, beyond the explicitly labeled GA, may also serve as GAs. By unifying these tasks and metrics, our SciGA-145k establishes a foundation for advancing visual scientific communication while contributing to the development of AI for Science.

图像摘要(GAs)在直观地传达科学论文的关键结论方面发挥着关键作用。虽然最近的研究越来越多地将图1等视觉材料作为事实上的GAs纳入,但其加强科学通信的潜力基本上尚未探索。此外,设计有效的GA需要先进的视觉化技能,为广泛采用这些技能制造障碍。为了应对这些挑战,我们引入了由大约145 000份科学论文和114万数字组成的大型数据集SciGA-145k,这是一个大型数据集,明确旨在支持GA的挑选和建议,并促进自动的GA一代的研究。作为大会设计支持的初步步骤,我们界定了两项任务:(1) GA内部建议,其中确定了适合作为GA的某一文件中的数字;(2) GA之间的建议,其中从其他文件中检索GAs,以激励创建新的GA。我们为这些任务提供了合理的基线模型模型。此外,我们提出了信任调整后头一地面真相比值(CAR),这是对模型行为进行精确分析的新建议指标。CARC处理基于传统排名的衡量标准中的局限性,建议确定了适合作为GA的数值;145,同时考虑将多种数字作为基础,为GA的参考,并明确确定为GAA的标准化的图像基础。


Article 176

Title@2025-07-02 (3): Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer

Title: Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer Latent Chain-of-Thought? Dekodierung des Tiefen-Recurrent Transformers 点解深度- Rent 变换器 2507.02199v1

Authors (5): Wenquan Lu, Yuechuan Yang, Kyle Lee, Yanshu Li, Enqi Liu

Chain-of-thought (CoT) reasoning has enabled transformer-based language models to excel at complex mathematics and multi-step planning. However, in standard decoder-only architectures, these reasoning steps are externalized in natural language, improving interpretability at the cost of efficiency. To capture reasoning that is not easily represented in words, many works have explored recurrent architectures that aim to internalize reasoning in latent space, potentially supporting latent CoT. In this paper, we investigate whether such reasoning structures emerge in Huginn-3.5B, a depth-recurrent Transformer that reuses layers at inference time without increasing parameter count. We examine the model’s internal behavior on arithmetic tasks using a suite of probing techniques including the Logit Lens and Coda Lens. Our findings reveal limited evidence of interpretable latent CoT by tracking rank trajectories of final and intermediate result tokens. Furthermore, we uncover significant probing inconsistencies across recurrent blocks, where the interpretability of hidden states depends heavily on both the layer index and the decoding method. Finally, we empirically show that increasing recurrence depth yields only marginal gains and falls well short of models that explicitly externalize reasoning steps. The code is available at https://github.com/wenquanlu/huginn-latent-cot.

思维链(CoT)推理使得基于变压器的语言模型在复杂的数学和多步规划中能够出类拔萃。然而,在标准的解码器专用结构中,这些推理步骤在自然语言中被外部化,提高了解释效率。为了捕捉不易用文字表达的推理,许多作品探索了旨在将潜在空间的推理内在化的经常性结构,可能支持潜在的COT。在本文件中,我们调查了这种推理结构是否出现在Huginn-3.5B中,这是一个深度周期性变异器,在推断时重复使用层,而不增加参数计数。我们利用包括Logit Lens 和 Coda Lens 在内的一系列试算技术来审查模型的计算内部行为。我们的调查结果显示,通过跟踪最终和中间结果符号的级轨迹,可解释的潜在 CoT 证据有限。 此外,我们发现,在经常区块中,隐藏状态的可解释性严重取决于层次指数和解码方法。最后,我们从经验上表明,不断重复的深度只产生边际收益,而远为外部推理。


Article 177

Title@2025-07-02 (3): Temporally Consistent Koopman Autoencoders for Forecasting Dynamical Systems

Title: Temporally Consistent Koopman Autoencoders for Forecasting Dynamical Systems Vorübergehend konsistent Koopman Autoencoder für die Vorhersage dynamischer Systeme 用于预测动态系统的温度一致 Koopman 自动编译器 2403.12335v3

Authors (5): Indranil Nayak, Ananda Chakrabarty, Mrinal Kumar, Fernando Teixeira, Debdipta Goswami

Absence of sufficiently high-quality data often poses a key challenge in data-driven modeling of high-dimensional spatio-temporal dynamical systems. Koopman Autoencoders (KAEs) harness the expressivity of deep neural networks (DNNs), the dimension reduction capabilities of autoencoders, and the spectral properties of the Koopman operator to learn a reduced-order feature space with simpler, linear dynamics. However, the effectiveness of KAEs is hindered by limited and noisy training datasets, leading to poor generalizability. To address this, we introduce the temporally consistent Koopman autoencoder (tcKAE), designed to generate accurate long-term predictions even with limited and noisy training data. This is achieved through a consistency regularization term that enforces prediction coherence across different time steps, thus enhancing the robustness and generalizability of tcKAE over existing models. We provide analytical justification for this approach based on Koopman spectral theory and empirically demonstrate tcKAE’s superior performance over state-of-the-art KAE models across a variety of test cases, including simple pendulum oscillations, kinetic plasma, and fluid flow data.

缺乏足够高质量的数据往往对高维空间时空动态系统的数据驱动模型形成重大挑战。Koopman Autoencoders(KAEs)利用深神经网络(DNNs)的直观性、自动电算器的维度降低能力以及Koopman操作员的光谱特性,学习一个使用更简单线性动态的减序特性空间。然而,KAE的有效性受到有限和噪音的培训数据集的阻碍,导致普遍性差。为了解决这个问题,我们引入了时间一致的Koopman自动电算器(tcKAE),目的是生成准确的长期预测,即使培训数据有限和紧张。这是通过一致性规范术语实现的,该术语可以在不同时间步骤上执行预测的一致性,从而加强tcKAE相对于现有模型的稳健性和普遍性。我们根据Koopman光谱理论和实验性地展示了TcKAE的优于KAE的状态模型的超强性性能,包括简单的平流数据。


Article 178

Title@2025-07-02 (3): cVLA: Towards Efficient Camera-Space VLAs

Title: cVLA: Towards Efficient Camera-Space VLAs cVLA: Auf dem Weg zu effizienten Kamera-Raum-VLAs cVLA: 努力实现高效的相机-空间VLAs 2507.02190v1

Authors (7): Max Argus, Jelena Bratulic, Houman Masnavi, Maxim Velikanov, Nick Heppert, Abhinav Valada, Thomas Brox

Vision-Language-Action (VLA) models offer a compelling framework for tackling complex robotic manipulation tasks, but they are often expensive to train. In this paper, we propose a novel VLA approach that leverages the competitive performance of Vision Language Models (VLMs) on 2D images to directly infer robot end-effector poses in image frame coordinates. Unlike prior VLA models that output low-level controls, our model predicts trajectory waypoints, making it both more efficient to train and robot embodiment agnostic. Despite its lightweight design, our next-token prediction architecture effectively learns meaningful and executable robot trajectories. We further explore the underutilized potential of incorporating depth images, inference-time techniques such as decoding strategies, and demonstration-conditioned action generation. Our model is trained on a simulated dataset and exhibits strong sim-to-real transfer capabilities. We evaluate our approach using a combination of simulated and real data, demonstrating its effectiveness on a real robotic system.

视觉-语言-动作模型(VLA)模型为解决复杂的机器人操纵任务提供了令人信服的框架,但往往费用昂贵。在本文中,我们建议采用新的VLA方法,利用2D图像上的视觉语言模型(VLM)的竞争性性能,直接推断机器人终端效应器在图像框架坐标中形成。与以往的VLA模型不同的是,VLA模型输出低级别的控制,我们模型预测了轨迹路点,使培训和机器人化神性更有效率。尽管我们的下一代预测结构设计轻巧,但它有效地学习了有意义和可执行的机器人轨迹。我们进一步探索了将深度图像、推论时间技术(如解码战略)和示范化行动生成等纳入深度图像的利用不足的潜力。我们的模型在模拟数据集方面受过培训,并展示了强大的模拟和真实传输能力。我们使用模拟和真实数据组合评估了我们的方法,展示了它在实际机器人系统上的有效性。


Article 179

Title@2025-07-02 (3): KAIROS: Scalable Model-Agnostic Data Valuation

Title: KAIROS: Scalable Model-Agnostic Data Valuation KAIROS: Skalierbare modell-agnostische Datenbewertung 可缩放模型 – – 不可计量数据估价 2506.23799v2

Authors (4): Jiongli Zhu, Parjanya Prajakta Prashant, Alex Cloninger, Babak Salimi

Training data increasingly shapes not only model accuracy but also regulatory compliance and market valuation of AI assets. Yet existing valuation methods remain inadequate: model-based techniques depend on a single fitted model and inherit its biases, while algorithm-based approaches such as Data Shapley require costly retrainings at web scale. Recent Wasserstein-based model-agnostic methods rely on approximations that misrank examples relative to their true leave-one-out (LOO) utility. We introduce KAIROS, a scalable, model-agnostic valuation framework that assigns each example a distributional influence score: its contribution to the Maximum Mean Discrepancy (MMD) between the empirical training distribution and a clean reference set. Unlike Wasserstein surrogates, our MMD-based influence admits a closed-form solution that faithfully approximates the exact LOO ranking within $O(1/N^2)$ error, requires no retraining, and naturally extends to conditional kernels for unified label- and feature-error detection. Moreover, KAIROS supports efficient online updates: when a new batch of size m arrives, all scores can be updated in $O(mN)$ time, delivering up to 50x speedup without compromising ranking quality. Empirical evaluations on noise, mislabeling, and poisoning benchmarks show that KAIROS consistently outperforms state-of-the-art model-, Shapley-, and Wasserstein-based baselines in both accuracy and runtime. We provide rigorous theoretical guarantees, including symmetry for reproducible rankings and density-separation for interpretable thresholds.

培训数据不仅越来越多地影响AI资产的模型准确性,而且越来越多地影响AI资产的监管合规和市场估值。然而,现有的估值方法仍然不够:基于模型的技术依赖于单一的安装模型,并继承其偏差,而基于算法的方法,如数据屏等,则需要在网络规模上进行费用高昂的再培训。最近基于瓦森斯坦的模型认知方法依靠的近似方法,这些近似方法将精确的LOO排排排在$(1/N)2美元错误之内,不需要再培训,并且自然延伸到用于统一基于标签和特性的测算的有条件内核。此外,KAIOS支持高效的在线更新:当新批量的尺寸到达时,所有分数在经验性培训分布和干净的参考组之间(MMD ) 。与瓦瑟斯坦的模拟相比,我们基于MMD的影响力接受了一种封闭式的解决方案,它忠实地接近于LO在$(1/N2)美元范围内的准确的排名,不需要再培训,并且自然延伸到用于统一基于标签和特性的测算。 KAIROS的精确度级更新:当新的一批规模,所有评分在美元的准确度上,可以更新到不断的精确的精确的标定标定质量质量。


Article 180

Title@2025-07-02 (3): Bayesian Invariance Modeling of Multi-Environment Data

Title: Bayesian Invariance Modeling of Multi-Environment Data Bayesische Invarianz Modellierung von Multi-Umwelt-Daten 多种环境数据的贝耶斯因果建模 2506.22675v2

Authors (5): Luhuan Wu, Mingzhang Yin, Yixin Wang, John P. Cunningham, David M. Blei

Invariant prediction [Peters et al., 2016] analyzes feature/outcome data from multiple environments to identify invariant features - those with a stable predictive relationship to the outcome. Such features support generalization to new environments and help reveal causal mechanisms. Previous methods have primarily tackled this problem through hypothesis testing or regularized optimization. Here we develop Bayesian Invariant Prediction (BIP), a probabilistic model for invariant prediction. BIP encodes the indices of invariant features as a latent variable and recover them by posterior inference. Under the assumptions of Peters et al. [2016], the BIP posterior targets the true invariant features. We prove that the posterior is consistent and that greater environment heterogeneity leads to faster posterior contraction. To handle many features, we design an efficient variational approximation called VI-BIP. In simulations and real data, we find that BIP and VI-BIP are more accurate and scalable than existing methods for invariant prediction.

变化预测[Peters 等人,2016年]分析多种环境中的特征/结果数据,以查明与结果具有稳定的预测关系的变量特征。这些特征支持对新环境的概括化,并有助于揭示因果关系机制。以前的方法主要通过假设测试或常规化优化来解决这个问题。我们在这里开发了贝叶斯变化预测的概率模型(BIP),这是变化预测的概率模型。BIP将变化特征的指数编码为潜在变量,并通过事后推断加以恢复。根据Peters等人的假设([2016年]), BIP 海报针对的是真实的变量特征。我们证明,外表是一致的,更大的环境差异性导致更快的外表收缩。为了处理许多特征,我们设计了一个高效的变异近率,称为VI-BIP。在模拟和真实数据中,我们发现BIP和VI-BIP比现有的变量预测方法更准确、更可伸缩。


Article 181

Title@2025-07-02 (3): Tree-based Learning for High-Fidelity Prediction of Chaos

Title: Tree-based Learning for High-Fidelity Prediction of Chaos Baumbasiertes Lernen für hochtreue Vorhersage von Chaos 以树为本的 “ 以树为本的学习:对混乱现象的高成熟预测 “ 的学习 2403.13836v2

Authors (4): Adam Giammarese, Kamal Rana, Erik M. Bollt, Nishant Malik

Model-free forecasting of the temporal evolution of chaotic systems is crucial but challenging. Existing solutions require hyperparameter tuning, significantly hindering their wider adoption. In this work, we introduce a tree-based approach not requiring hyperparameter tuning: TreeDOX. It uses time delay overembedding as explicit short-term memory and Extra-Trees Regressors to perform feature reduction and forecasting. We demonstrate the state-of-the-art performance of TreeDOX using the Henon map, Lorenz and Kuramoto-Sivashinsky systems, and the real-world Southern Oscillation Index.

对混乱系统的时间演变情况进行无模型的预测至关重要,但具有挑战性。现有的解决方案需要超参数的调整,这大大阻碍了其更广泛的采用。在这项工作中,我们引入了一种不要求超参数调整的基于树木的方法:TreaDOX。它使用时间延迟来作为明确的短期记忆和外流回归器来进行特征减少和预测。我们用海农地图、洛伦茨和仓本-西瓦申斯基系统以及真实世界的南方振荡指数来展示TreaDOX的最新性能。


Article 182

Title@2025-07-02 (3): Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis

Title: Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis Analyse und Verbesserung der Speaker-Ähnlichkeitsbewertung für Sprachsynthese 分析和改进议长对发言综述的相似性评估 2507.02176v1

Authors (6): Marc-André Carbonneau, Benjamin van Niekerk, Hugo Seuté, Jean-Philippe Letendre, Herman Kamper, Julian Zaïdi

Modeling voice identity is challenging due to its multifaceted nature. In generative speech systems, identity is often assessed using automatic speaker verification (ASV) embeddings, designed for discrimination rather than characterizing identity. This paper investigates which aspects of a voice are captured in such representations. We find that widely used ASV embeddings focus mainly on static features like timbre and pitch range, while neglecting dynamic elements such as rhythm. We also identify confounding factors that compromise speaker similarity measurements and suggest mitigation strategies. To address these gaps, we propose U3D, a metric that evaluates speakers’ dynamic rhythm patterns. This work contributes to the ongoing challenge of assessing speaker identity consistency in the context of ever-better voice cloning systems. We publicly release our code.

建模语音特征因其多面性而具有挑战性。在基因化语音系统中,对身份的评估往往使用自动语音校验(ASV)嵌入器来评估,这种校验是针对歧视的,而不是特征的。本文调查了在这种表述中反映声音的哪些方面。我们发现,广泛使用的ASV嵌入器主要侧重于静态特征,如小音和小音范围,同时忽略了节奏等动态要素。我们还找出了影响发言者类似测量的混杂因素,并建议了缓解战略。为了弥补这些差距,我们建议了U3D,这是评估发言者动态节奏模式的尺度。这项工作有助于应对当前挑战,即评估发言者身份的一致性,在越来越好的语音克隆系统中。我们公开发布我们的代码。


Article 183

Title@2025-07-02 (3): Hardware and Software Platform Inference

Title: Hardware and Software Platform Inference Schlussfolgerung zur Hardware- und Softwareplattform 硬件和软件平台 2411.05197v2

Authors (5): Cheng Zhang, Hanna Foerster, Robert D. Mullins, Yiren Zhao, Ilia Shumailov

It is now a common business practice to buy access to large language model (LLM) inference rather than self-host, because of significant upfront hardware infrastructure and energy costs. However, as a buyer, there is no mechanism to verify the authenticity of the advertised service including the serving hardware platform, e.g. that it is actually being served using an NVIDIA H100. Furthermore, there are reports suggesting that model providers may deliver models that differ slightly from the advertised ones, often to make them run on less expensive hardware. That way, a client pays premium for a capable model access on more expensive hardware, yet ends up being served by a (potentially less capable) cheaper model on cheaper hardware. In this paper we introduce hardware and software platform inference (HSPI) – a method for identifying the underlying GPU architecture and software stack of a (black-box) machine learning model solely based on its input-output behavior. Our method leverages the inherent differences of various GPU architectures and compilers to distinguish between different GPU types and software stacks. By analyzing the numerical patterns in the model’s outputs, we propose a classification framework capable of accurately identifying the GPU used for model inference as well as the underlying software configuration. Our findings demonstrate the feasibility of inferring GPU type from black-box models. We evaluate HSPI against models served on different real hardware and find that in a white-box setting we can distinguish between different GPUs with between $83.9\%$ and $100\%$ accuracy. Even in a black-box setting we achieve results that are up to 3x higher than random guess accuracy. Our code is available at https://github.com/ChengZhang-98/HSPI.

买入大型语言模型(LLM)的推值目前是一种常见的商业惯例,因为前方硬件基础设施和能源成本相当高,因此购买大语言模型(LLM)的推值比自主主主机要少。然而,作为买方,没有机制核查广告服务的真实性,包括服务硬件平台的真实性,例如它实际上正在使用 NVIDIAHA H100。此外,有报告表明,模型提供者可以提供与所公布的模式略有不同的模型,往往让它们运行于更便宜的硬件。这样,客户会为更昂贵的硬件提供更高级的模型(LLLLM)推断值,而不是自主机,但最终却会得到更廉价的硬件硬件基础设施(潜在能力更弱)更廉价的更精度。在这个文件中,我们引入了硬件和软件平台的推断(HSPI) – – – 一种用来识别基GPU基础架构和软件堆(back框)机器学习模型的唯一方法。我们的方法利用各种GPU/白度模型的内在差异来找出各种GPU/白式结构的内在差异,以区分不同的GP/白价模型类型和软件渣。我们可以区分不同的GPU/格式和软件堆。在不同的标准类型之间,通过分析不同GPUUU的内在差异/结果,通过分析模型模型分析不同的数字模式模型的模型分析不同的数字模式中的数字模式,可以区分不同的不同。在模型的模型的模型的模型中的数字模式中的数字模式的模型中分析数字模式的模型的模型模式模式模式模式模式模式模式模式模式模式的模型的模型的模型的模型的模型模式,我们用来在分析,在模型中可以评估中可以评估中分析不同的框架框架框架框架框架框架框架可以使用。我們用一個比在分析不同的工具中可以使用一個可以使用一個比的更能框架的更的更能框架中,我們的更能框架框架框架框架框架框架框架框架框架框架框架框架框架框架框架框架,我們用一個—— ,我們用一個可以用一個可以用的確可以使用。我們用的比比的比比比比比的確可以使用一個—— ,我們可以使用一個,我們可以使用一個,我們可以使用一個可以使用一個的檢的檢的檢的檢的分比的檢的檢的檢的檢的檢的檢的


Article 184

Title@2025-07-02 (3): A physics informed neural network approach to simulating ice dynamics governed by the shallow ice approximation

Title: A physics informed neural network approach to simulating ice dynamics governed by the shallow ice approximation Ein physikerisch informierter neuronaler Netzwerkansatz zur Simulation der Eisdynamik, die durch die flache Eisannäherung bestimmt wird 利用物理知情神经网络法模拟由浅冰近似值管理的冰动态 2504.08136v2

Authors (2): Kapil Chawla, William Holmes

In this article we develop a Physics Informed Neural Network (PINN) approach to simulate ice sheet dynamics governed by the Shallow Ice Approximation. This problem takes the form of a time-dependent parabolic obstacle problem. Prior work has used this approach to address the stationary obstacle problem and here we extend it to the time dependent problem. Through comprehensive 1D and 2D simulations, we validate the model’s effectiveness in capturing complex free-boundary conditions. By merging traditional mathematical modeling with cutting-edge deep learning methods, this approach provides a scalable and robust solution for predicting temporal variations in ice thickness. To illustrate this approach in a real world setting, we simulate the dynamics of the Devon Ice Cap, incorporating aerogeophysical data from 2000 and 2018.

在本篇文章中,我们开发了一种物理知情神经网络(PINN)法,用于模拟由浅冰接近法管理的冰层动态,这一问题以取决于时间的抛物线障碍问题的形式出现。先前的工作已经用这种方法解决固定障碍问题,我们在此将其扩大到与时间相关的问题。我们通过全面的1D和2D模拟,验证了模型在捕捉复杂的自由边界条件方面的有效性。通过将传统的数学模型与尖端的深层学习方法相结合,这一方法为预测冰厚时间变化提供了可扩展和有力的解决方案。为了在现实环境中说明这一方法,我们模拟了2000年和2018年德文冰盖的动态,纳入了空气地球物理数据。


Article 185

Title@2025-07-02 (3): The unstable formula theorem revisited via algorithms

Title: The unstable formula theorem revisited via algorithms Die instabile Formel Theorem revisited über Algorithmen 通过算法重新检视不稳定的公式定理 2212.05050v3

Authors (2): Maryanthe Malliaris, Shay Moran

This paper is about the surprising interaction of a foundational result from model theory, about stability of theories, with algorithmic stability in learning. First, in response to gaps in existing learning models, we introduce a new statistical learning model, called ``Probably Eventually Correct’’ or PEC. We characterize Littlestone (stable) classes in terms of this model. As a corollary, Littlestone classes have frequent short definitions in a natural statistical sense. In order to obtain a characterization of Littlestone classes in terms of frequent definitions, we build an equivalence theorem highlighting what is common to many existing approximation algorithms, and to the new PEC. This is guided by an analogy to definability of types in model theory, but has its own character. Drawing on these theorems and on other recent work, we present a complete algorithmic analogue of Shelah’s celebrated Unstable Formula Theorem, with algorithmic properties taking the place of the infinite.

本文是关于模型理论基本结果的惊人互动,即理论的稳定性,以及学习中的算法稳定性。首先,为了应对现有学习模型的差距,我们引入了一个新的统计学习模型,称为“可能最终正确 ” 或 PEC。我们用这个模型来描述Littlestone(稳定)类。作为必然结果,Littlestone类从自然统计意义上经常使用简短的定义。为了从频繁的定义中获取对Littlestone类的描述,我们建立了一个等同的理论,突出现有许多近似算法和新的PEC的共同之处。这是以模型理论中类型可定义性为类比的,但有其自己的特性。我们用这些理论和最近的其他工作来描述Shelah所庆祝的不稳定的公式理论的完全的算法模拟,其算法特性取代无限的特征。


Article 186

Title@2025-07-02 (3): Towards Bio-Inspired Robotic Trajectory Planning via Self-Supervised RNN

Title: Towards Bio-Inspired Robotic Trajectory Planning via Self-Supervised RNN Auf dem Weg zu einer bio-inspirierten Roboter-Trajektorienplanung durch selbstüberwachte RNN 走向通过自我监督RNN进行生物激励机器人轨迹规划 2507.02171v1

Authors (3): Miroslav Cibula, Kristína Malinovská, Matthias Kerzel

Trajectory planning in robotics is understood as generating a sequence of joint configurations that will lead a robotic agent, or its manipulator, from an initial state to the desired final state, thus completing a manipulation task while considering constraints like robot kinematics and the environment. Typically, this is achieved via sampling-based planners, which are computationally intensive. Recent advances demonstrate that trajectory planning can also be performed by supervised sequence learning of trajectories, often requiring only a single or fixed number of passes through a neural architecture, thus ensuring a bounded computation time. Such fully supervised approaches, however, perform imitation learning; they do not learn based on whether the trajectories can successfully reach a goal, but try to reproduce observed trajectories. In our work, we build on this approach and propose a cognitively inspired self-supervised learning scheme based on a recurrent architecture for building a trajectory model. We evaluate the feasibility of the proposed method on a task of kinematic planning for a robotic arm. The results suggest that the model is able to learn to generate trajectories only using given paired forward and inverse kinematics models, and indicate that this novel method could facilitate planning for more complex manipulation tasks requiring adaptive solutions.

机器人的轨迹规划被理解为产生一系列联合配置,这些组合将引导机器人代理或其操纵者从初始状态到预期的最终状态,从而完成操纵任务,同时考虑机器人动脉学和环境等限制因素。通常,这是通过基于取样的规划者实现的,这些规划是计算密集的。最近的进展表明,轨迹规划也可以通过对轨迹的受监督序列学习来进行,通常只需要通过神经结构进行单次或固定次数的传球,从而保证一个封闭的计算时间。但是,这种受到充分监督的方法可以进行模仿学习;它们不学习轨迹能否成功达到目标,而是试图复制观察到的轨迹。在我们的工作中,我们以这种方法为基础,提出一个以常态结构为基础、有意识地激发自我监督的学习计划,以建立轨迹模型为基础。我们评估了拟议方法的可行性,这是关于机器人臂的运动规划任务。结果表明,模型能够学会产生轨迹轨迹,但只能使用配对齐的前沿和反向的移动模型,并表明,这种方法可以促进这种新式的模型,并表明需要更精确的调整模型。


Article 187

Title@2025-07-02 (3): Beyond Scale: The Diversity Coefficient as a Data Quality Metric for Variability in Natural Language Data

Title: Beyond Scale: The Diversity Coefficient as a Data Quality Metric for Variability in Natural Language Data Beyond Scale: Der Diversity-Koeffizient als Data Quality Metric für Variabilität in natürlichen Sprachdaten 超越尺度:多样性系数作为衡量自然语言数据可变性的数据质量计量标准 2306.13840v4

Authors (7): Brando Miranda, Alycia Lee, Sudharsan Sundar, Allison Casasola, Rylan Schaeffer, Elyas Obbad, Sanmi Koyejo

Current trends in pre-training Large Language Models (LLMs) primarily focus on the scaling of model and dataset size. While the quality of pre-training data is considered an important factor for training powerful LLMs, it remains a nebulous concept that has not been rigorously characterized. To this end, we propose a formalization of one key aspect of data quality – measuring the variability of natural language data – specifically via a measure we call the diversity coefficient. Our empirical analysis shows that the proposed diversity coefficient aligns with the intuitive properties of diversity and variability, e.g., it increases as the number of latent concepts increases. Then, we measure the diversity coefficient of publicly available pre-training datasets and demonstrate that their formal diversity is high compared to theoretical lower and upper bounds. Finally, we conduct a comprehensive set of controlled interventional experiments with GPT-2 and LLaMAv2 that demonstrate the diversity coefficient of pre-training data characterizes useful aspects of downstream model evaluation performance – totaling 44 models of various sizes (51M to 7B parameters). We conclude that our formal notion of diversity is an important aspect of data quality that captures variability and causally leads to improved evaluation performance.

培训前大语言模型(LLMS)目前的趋势主要侧重于模型和数据集规模的扩大。培训前数据的质量被视为培训强大的LLMS的一个重要因素,但它仍然是一个没有严格定性的模糊概念。为此,我们提议正式确定数据质量的一个关键方面 – – 衡量自然语言数据的变异性 – – 特别是通过我们称之为多样性系数的措施。我们的经验分析表明,拟议的多样性系数与多样性和变异性等直观特性相一致,例如随着潜在概念数量的增加而增加。然后,我们衡量公开提供的训练前数据集的多样性系数,并表明其形式多样性高于理论下限和上限。最后,我们用GPT-2和LLMAv2进行一套全面的有控制的干预实验,以显示培训前数据的多样性系数,这是下游模式评估业绩的有用方面 – – 总计44个不同规模的模型(51M至7B参数)。我们的结论是,我们正式的多样性概念是数据质量的一个重要方面,可以捕捉变异性和因果性地导致评估的改进。


Article 188

Title@2025-07-02 (3): Non-Convex Optimization with Spectral Radius Regularization

Title: Non-Convex Optimization with Spectral Radius Regularization Nicht konvexe Optimierung mit Spektralradius-Regularisierung 非Convex 优化光谱半半径常规化 2102.11210v2

Authors (3): Adam Sandler, Diego Klabjan, Yuan Luo

We develop regularization methods to find flat minima while training deep neural networks. These minima generalize better than sharp minima, yielding models outperforming baselines on real-world test data (which may be distributed differently than the training data). Specifically, we propose a method of regularized optimization to reduce the spectral radius of the Hessian of the loss function. We also derive algorithms to efficiently optimize neural network models and prove that these algorithms almost surely converge. Furthermore, we demonstrate that our algorithm works effectively on applications in different domains, including healthcare. To show that our models generalize well, we introduced various methods for testing generalizability and found that our models outperform comparable baseline models on these tests.

在训练深层神经网络的同时,我们开发了正规化的方法来寻找平坦的微粒。这些微型模型比尖锐的微粒要好,产生模型比真实世界测试数据(其分布可能不同于培训数据)的基线要好。具体地说,我们提出了一种正规化优化的方法,以减少损失函数赫西安人的光谱半径。我们还从中得出算法,以便有效地优化神经网络模型,并证明这些算法几乎肯定会趋同。此外,我们证明我们的算法在包括医疗保健在内的不同领域的应用中有效发挥作用。为了显示我们的模型非常普遍化,我们采用了各种测试通用性的方法,发现我们的模型比这些测试的可比基线模型要强。


Article 189

Title@2025-07-02 (3): Statistical Inference for Responsiveness Verification

Title: Statistical Inference for Responsiveness Verification Statistische Schlussfolgerung zur Überprüfung der Reaktionsfähigkeit 应对性核查的统计推论 2507.02169v1

Authors (5): Seung Hyun Cheon, Meredith Stewart, Bogdan Kulynych, Tsui-Wei Weng, Berk Ustun

Many safety failures in machine learning arise when models are used to assign predictions to people (often in settings like lending, hiring, or content moderation) without accounting for how individuals can change their inputs. In this work, we introduce a formal validation procedure for the responsiveness of predictions with respect to interventions on their features. Our procedure frames responsiveness as a type of sensitivity analysis in which practitioners control a set of changes by specifying constraints over interventions and distributions over downstream effects. We describe how to estimate responsiveness for the predictions of any model and any dataset using only black-box access, and how to use these estimates to support tasks such as falsification and failure probability estimation. We develop algorithms that construct these estimates by generating a uniform sample of reachable points, and demonstrate how they can promote safety in real-world applications such as recidivism prediction, organ transplant prioritization, and content moderation.

当模型用于向人们(通常在借贷、雇用或内容温和等环境下)分配预测时,不考虑个人如何改变其投入时,机器学习中的许多安全性失灵就会发生。在这项工作中,我们引入了一种正式的验证程序,以便预测对其特征的干预反应的响应性。我们的程序将响应性设定为一种敏感性分析,由从业人员通过具体说明干预和下游效应分布方面的制约因素来控制一系列变化。我们描述了如何估计对仅使用黑盒访问的任何模型和数据集的预测的响应性,以及如何使用这些估计数支持伪造和故障概率估计等任务。我们开发了算法,通过生成统一的可达点样本来构建这些估计,并展示它们如何促进真实世界应用的安全,如累犯预测、器官移植优先排序和内容节制。


Article 190

Title@2025-07-02 (3): Quantifying the Importance of Data Alignment in Downstream Model Performance

Title: Quantifying the Importance of Data Alignment in Downstream Model Performance Quantifizierung der Bedeutung der Datenausrichtung in Downstream-Modellleistung 量化数据协调在下游模式绩效中的重要性 2501.08496v3

Authors (7): Krrish Chawla, Aryan Sahai, Mario DePavia, Sudharsan Sundar, Brando Miranda, Elyas Obbad, Sanmi Koyejo

Contrary to the conventional emphasis on dataset size, we explore the role of data alignment – an often overlooked aspect of data quality – in training capable Large Language Models (LLMs). To do so, we use the Task2Vec-based alignment coefficient, a quantitative measure of the similarity between two datasets, to quantify the impact of alignment between training data and evaluation data on downstream performance. In particular, we conduct controlled \textit{interventional} experiments for two settings: 1. the impact of increased alignment coefficients between various pre-training (pt) against evaluation datasets, and 2. the impact of increased alignment coefficients between domain specific fine-tuning (ft) against domain specific evaluation. The domain specific task we explore is Autoformalization – the machine translation task between natural language and code for formal verification. In both settings, we find a strong, predictable negative correlation between the alignment coefficient of a model’s training and evaluation data and the model’s loss/perplexity on the respective downstream task. These findings suggest a re-evaluation of LLM training approaches, demonstrating the relevance of data alignment compared to data quantity, especially in specialized downstream tasks such as Autoformalization.

与传统强调数据集大小相反,我们探索了数据调整的作用 – – 这是数据质量中经常被忽视的一个方面 – – 培训有能力的大型语言模型(LLM)的作用。为此,我们使用基于TH2Vec的校准系数,这是衡量两个数据集之间相似性的量化尺度,以量化培训数据与下游业绩评价数据之间一致的影响。特别是,我们为两个环境进行了控制下对以下两个环境的测试:1. 提高培训前(试用)对评价数据集的校准系数的影响,2. 提高具体领域微调(软)对具体评价的校准系数的影响。我们探讨的领域具体任务是自动正规化 – – 自然语言与正式核查代码之间的机器翻译任务。在这两种情况下,我们发现模型的培训和评价数据的校准系数与模型对各下游任务的损失/难度之间存在强烈、可预测的负相关关系。这些结论表明,对LLM培训方法的重新评价,表明数据与数据数量的相关性,特别是在诸如自动正规化等专门的下游任务中。


Article 191

Title@2025-07-02 (3): Reliable algorithm selection for machine learning-guided design

Title: Reliable algorithm selection for machine learning-guided design Zuverlässige Algorithmusauswahl für maschinelles Lernen-geführtes Design 机器学习制导设计可靠算法选择 2503.20767v2

Authors (2): Clara Fannjiang, Ji Won Park

Algorithms for machine learning-guided design, or design algorithms, use machine learning-based predictions to propose novel objects with desired property values. Given a new design task – for example, to design novel proteins with high binding affinity to a therapeutic target – one must choose a design algorithm and specify any hyperparameters and predictive and/or generative models involved. How can these decisions be made such that the resulting designs are successful? This paper proposes a method for design algorithm selection, which aims to select design algorithms that will produce a distribution of design labels satisfying a user-specified success criterion – for example, that at least ten percent of designs’ labels exceed a threshold. It does so by combining designs’ predicted property values with held-out labeled data to reliably forecast characteristics of the label distributions produced by different design algorithms, building upon techniques from prediction-powered inference. The method is guaranteed with high probability to return design algorithms that yield successful label distributions (or the null set if none exist), if the density ratios between the design and labeled data distributions are known. We demonstrate the method’s effectiveness in simulated protein and RNA design tasks, in settings with either known or estimated density ratios.

用于机器学习指导设计或设计算法的算法。本文建议设计算法选择一种方法,目的是选择设计算法,以产生符合用户指定的成功标准的设计标签分布 – – 例如,至少10%的设计标签超过临界值。在设计新设计任务中,例如设计与治疗目标具有高度关联性的新式蛋白质 – – 必须选择设计算法,并指定任何超参数和预测和/或基因模型。这些决定如何使由此产生的设计获得成功?本文件建议一种设计算法选择方法,目的是选择设计算法,以产生符合用户指定的成功标准的设计标签分布 – – 例如,至少10%的设计标签超过临界值。通过将设计预测的属性值与持有的标签数据结合起来,以可靠地预测不同设计算法产生的标签分布特性,以预测力推理的技术为基础。如果设计与标签数据分布的密度比率为已知,我们用模拟的蛋白质和RNA的密度来证明方法的有效性。


Article 192

Title@2025-07-02 (3): Non-exchangeable Conformal Prediction for Temporal Graph Neural Networks

Title: Non-exchangeable Conformal Prediction for Temporal Graph Neural Networks Nicht austauschbare konforme Vorhersage für zeitliche Graphen-Neural-Netzwerke 时图神经网络的非正规非正式非正式预测 2507.02151v1

Authors (5): Tuo Wang, Jian Kang, Yujun Yan, Adithya Kulkarni, Dawei Zhou

Conformal prediction for graph neural networks (GNNs) offers a promising framework for quantifying uncertainty, enhancing GNN reliability in high-stakes applications. However, existing methods predominantly focus on static graphs, neglecting the evolving nature of real-world graphs. Temporal dependencies in graph structure, node attributes, and ground truth labels violate the fundamental exchangeability assumption of standard conformal prediction methods, limiting their applicability. To address these challenges, in this paper, we introduce NCPNET, a novel end-to-end conformal prediction framework tailored for temporal graphs. Our approach extends conformal prediction to dynamic settings, mitigating statistical coverage violations induced by temporal dependencies. To achieve this, we propose a diffusion-based non-conformity score that captures both topological and temporal uncertainties within evolving networks. Additionally, we develop an efficiency-aware optimization algorithm that improves the conformal prediction process, enhancing computational efficiency and reducing coverage violations. Extensive experiments on diverse real-world temporal graphs, including WIKI, REDDIT, DBLP, and IBM Anti-Money Laundering dataset, demonstrate NCPNET’s capability to ensure guaranteed coverage in temporal graphs, achieving up to a 31% reduction in prediction set size on the WIKI dataset, significantly improving efficiency compared to state-of-the-art methods. Our data and code are available at https://github.com/ODYSSEYWT/NCPNET.

图表神经网络(GNNs)的复合预测为量化不确定性提供了一个充满希望的框架,提高了GNN在高取量应用中的可靠性。然而,现有方法主要侧重于静态图形,忽视了真实世界图形的不断演变的性质。图形结构、节点属性和地面真相标签的时际依赖性违反了标准一致预测方法的基本互换假设,限制了其适用性。为了应对这些挑战,我们在本文件中引入了NCPNNET,这是为时间图形定制的新颖的端对端符合一致的预测框架。我们的方法将符合的预测扩展至动态环境,减轻因时间依赖而导致的统计覆盖违约现象。为此,我们建议采用基于扩散的不兼容性分数,在不断演变的网络中记录表层和时间不确定性。此外,我们开发了一种效率认知优化算法,改进了符合标准的预测程序,提高了计算效率,减少了覆盖面。对各种真实世界范围的网络/时间图进行了广泛的实验,包括WIKI、REDIT、DBLP和IB-ADADADDD数据集,展示了NCPNCPNCPNCPS-Syal-SDRDRDMDRDRDGM-SM-SUDM-SDM-SUDVDGMDMDGMDGM-SUDGMDMDMDMDGMDMDMDMDMSUDMDMDSMDSMSMMDMDMDMDGMDMDGDGDGDGMDGMDGDMDSMDGDGDs 能力, 将大大保证在大幅到大幅降低数据到比基数据到比基数据基数据基数据基数据基数据基数据基数据基数据基数据比基数据基数据基。


Article 193

Title@2025-07-02 (3): ODE$_t$(ODE$_l$): Shortcutting the Time and Length in Diffusion and Flow Models for Faster Sampling

Title: ODE$_t$(ODE$_l$): Shortcutting the Time and Length in Diffusion and Flow Models for Faster Sampling ODE$_t$(ODE$_l$): Verkürzung der Zeit und Länge in Diffusions- und Strömungsmodellen für eine schnellere Probenahme ODE$_t$(ODE$_美元):缩短传播的时间和长度以及更快抽样的流程模型 2506.21714v2

Authors (5): Denis Gudovskiy, Wenzhao Zheng, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer

Recently, continuous normalizing flows (CNFs) and diffusion models (DMs) have been studied using the unified theoretical framework. Although such models can generate high-quality data points from a noise distribution, the sampling demands multiple iterations to solve an ordinary differential equation (ODE) with high computational complexity. Most existing methods focus on reducing the number of time steps during the sampling process to improve efficiency. In this work, we explore a complementary direction in which the quality-complexity tradeoff can be dynamically controlled in terms of time steps and in the length of the neural network. We achieve this by rewiring the blocks in the transformer-based architecture to solve an inner discretized ODE w.r.t. its length. Then, we employ time- and length-wise consistency terms during flow matching training, and as a result, the sampling can be performed with an arbitrary number of time steps and transformer blocks. Unlike others, our ODE$_t$(ODE$_l$) approach is solver-agnostic in time dimension and decreases both latency and memory usage. Compared to the previous state of the art, image generation experiments on CelebA-HQ and ImageNet show a latency reduction of up to 3$\times$ in the most efficient sampling mode, and a FID score improvement of up to 3.5 points for high-quality sampling. We release our code and model weights with fully reproducible experiments.

最近,利用统一的理论框架,对连续的正常流动(CNFs)和传播模型(DMs)进行了研究。虽然这些模型能够从噪音分布中产生高质量的数据点,但抽样要求多重迭代,以解决计算复杂度高的普通差异方程式(ODE),大多数现有方法侧重于减少取样过程中的时间步骤数量,以提高效率。在这项工作中,我们探索一个互补方向,在时间步骤和神经网络的长度方面,可以动态地控制质量兼容性权衡。我们通过在基于变压器的结构中重新对块进行连接,以解决内分解的ODE w.r.t. 长度。然后,我们在流动匹配培训中采用时间和长的一致条件来解决普通差异方程式(ODE),结果,在取样过程中可以任意地减少时间步骤和变异方块。与其他人不同,我们的ODOD_t$(Odemod$_l$)方法在时间上可以解解析,同时降低调和记忆的用量。比起来,在CEREBA和最高级图像升级的升级模式中,将显示一个高比例和高比例的图像升级的升级。


Article 194

Title@2025-07-02 (3): Generating Hypotheses of Dynamic Causal Graphs in Neuroscience: Leveraging Generative Factor Models of Observed Time Series

Title: Generating Hypotheses of Dynamic Causal Graphs in Neuroscience: Leveraging Generative Factor Models of Observed Time Series Generieren von Hypothesen dynamischer Kausalgraphen in der Neurowissenschaft: Nutzung generativer Faktorenmodelle beobachteter Zeitreihen 在神经科学中生成动态因果图的假设:利用观测时间序列的生成因数模型 2505.20697v3

Authors (2): Zachary C. Brown, David Carlson

The field of hypothesis generation promises to reduce costs in neuroscience by narrowing the range of interventional studies needed to study various phenomena. Existing machine learning methods can generate scientific hypotheses from complex datasets, but many approaches assume causal relationships are static over time, limiting their applicability to systems with dynamic, state-dependent behavior, such as the brain. While some techniques attempt dynamic causal discovery through factor models, they often restrict relationships to linear patterns or impose other simplifying assumptions. We propose a novel method that models dynamic graphs as a conditionally weighted superposition of static graphs, where each static graph can capture nonlinear relationships. This approach enables the detection of complex, time-varying interactions between variables beyond linear limitations. Our method improves f1-scores of predicted dynamic causal patterns by roughly 22-28% on average over baselines in some of our experiments, with some improvements reaching well over 60%. A case study on real brain data demonstrates our method’s ability to uncover relationships linked to specific behavioral states, offering valuable insights into neural dynamics.

假设生成领域通过缩小研究各种现象所需的干预研究范围,有望降低神经科学的成本。 现有的机器学习方法可以从复杂的数据集中产生科学假设,但许多方法假设因果关系随时间变化而静止,限制其适用于动态、状态行为如大脑等系统。 虽然有些技术试图通过要素模型进行动态因果发现,但它们往往限制与线性模式的关系,或强加其他简化假设。我们提出了一个新颖的方法,将动态图形模型作为静态图形的有条件加权叠加位置,每个静态图形都能够捕捉非线性关系。这个方法能够探测线性限制以外的变量之间的复杂、时间变化的相互作用。我们的方法使我们的一些实验中预测动态因果模式的F1-核心平均比基线提高了约22-28 % , 某些改进幅度远远超过60%。关于实际大脑数据的案例研究表明我们发现与特定行为状态相关关系的方法,提供了对神经动态的宝贵洞察力。


Article 195

Title@2025-07-02 (3): Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models

Title: Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models Parameter vs FLOPs: Skalierungsgesetze für optimale Sparsamkeit für Mixture-of-Experts Sprachmodelle 参数 vs FLOPs: 用于混合专家语言模型的最佳质量的缩放法律 2501.12370v3

Authors (6): Samira Abnar, Harshay Shah, Dan Busbridge, Alaaeldin Mohamed Elnouby Ali, Josh Susskind, Vimal Thilak

Scaling the capacity of language models has consistently proven to be a reliable approach for improving performance and unlocking new capabilities. Capacity can be primarily defined by two dimensions: the number of model parameters and the compute per example. While scaling typically involves increasing both, the precise interplay between these factors and their combined contribution to overall capacity remains not fully understood. We explore this relationship in the context of sparse Mixture-of-Experts (MoEs), which allow scaling the number of parameters without proportionally increasing the FLOPs per example. We investigate how varying the sparsity level, i.e., the fraction of inactive parameters, impacts model’s performance during pretraining and downstream few-shot evaluation. We find that under different constraints (e.g., parameter size and total training compute), there is an optimal level of sparsity that improves both training efficiency and model performance. These results provide a better understanding of the impact of sparsity in scaling laws for MoEs and complement existing works in this area, offering insights for designing more efficient architectures.

提高语言模型能力一贯证明是改进业绩和释放新能力的可靠方法,能力主要可分为两个方面:模型参数数量和每个实例的计算。虽然比例一般涉及增加两者,但这些因素之间的精确相互作用及其对总体能力的综合贡献仍然不完全理解。我们从稀疏的专家混合公司(MoEs)的角度探讨这种关系,这样可以扩大参数数量,而不会按比例增加FLOPs的每个实例。我们调查广度程度的程度如何不同,即非活动参数的分数、影响模型在培训前和下游几次评估期间的绩效。我们发现,在不同的制约因素下(例如参数大小和总培训计算),存在着最佳的宽度水平,可以提高培训效率和模型性能。这些结果使人们更好地了解了为教育部扩大法律规模的影响,并补充了该领域的现有工作,为设计效率更高的结构提供了见解。


Article 196

Title@2025-07-02 (3): Generative Latent Diffusion for Efficient Spatiotemporal Data Reduction

Title: Generative Latent Diffusion for Efficient Spatiotemporal Data Reduction Generative Latent Diffusion für effiziente Spatiotemporale Datenreduktion 高效的平时数据减少的生成前传播 2507.02129v1

Authors (4): Xiao Li, Liangji Zhu, Anand Rangarajan, Sanjay Ranka

Generative models have demonstrated strong performance in conditional settings and can be viewed as a form of data compression, where the condition serves as a compact representation. However, their limited controllability and reconstruction accuracy restrict their practical application to data compression. In this work, we propose an efficient latent diffusion framework that bridges this gap by combining a variational autoencoder with a conditional diffusion model. Our method compresses only a small number of keyframes into latent space and uses them as conditioning inputs to reconstruct the remaining frames via generative interpolation, eliminating the need to store latent representations for every frame. This approach enables accurate spatiotemporal reconstruction while significantly reducing storage costs. Experimental results across multiple datasets show that our method achieves up to 10 times higher compression ratios than rule-based state-of-the-art compressors such as SZ3, and up to 63 percent better performance than leading learning-based methods under the same reconstruction error.

生成模型在有条件环境下表现出很强的性能,可以被视为一种数据压缩形式,在这种形式中,条件可以是一种压缩,但控制性和重建精确度有限,限制了数据压缩的实际应用。在这项工作中,我们提出了一个高效的潜在扩散框架,通过将一个变式自动编码器与一个有条件扩散模型相结合来弥合这一差距。我们的方法将少量关键框架压缩到潜藏空间,并把它们用作通过基因化内插法重建剩余框架的调节投入,从而消除了每个框架储存潜在代表的需要。这种方法可以精确地进行时空重建,同时大大减少存储成本。多个数据集的实验结果显示,我们的方法比SZ3等基于规则的状态压缩器达到10倍的压缩率,并且比同一重建错误下以学习为基础的方法达到63%的更好性能。


Article 197

Title@2025-07-02 (3): CROP: Circuit Retrieval and Optimization with Parameter Guidance using LLMs

Title: CROP: Circuit Retrieval and Optimization with Parameter Guidance using LLMs CROP: Circuit Retrieval und Optimierung mit Parameterführung mit LLMs CROP: 使用LLMM 提供参数指导的电路检索和优化 2507.02128v1

Authors (8): Jingyu Pan, Isaac Jacobson, Zheng Zhao, Tung-Chieh Chen, Guanglei Zhou, Chen-Chia Chang, Vineet Rashingkar, Yiran Chen

Modern very large-scale integration (VLSI) design requires the implementation of integrated circuits using electronic design automation (EDA) tools. Due to the complexity of EDA algorithms, the vast parameter space poses a huge challenge to chip design optimization, as the combination of even moderate numbers of parameters creates an enormous solution space to explore. Manual parameter selection remains industrial practice despite being excessively laborious and limited by expert experience. To address this issue, we present CROP, the first large language model (LLM)-powered automatic VLSI design flow tuning framework. Our approach includes: (1) a scalable methodology for transforming RTL source code into dense vector representations, (2) an embedding-based retrieval system for matching designs with semantically similar circuits, and (3) a retrieval-augmented generation (RAG)-enhanced LLM-guided parameter search system that constrains the search process with prior knowledge from similar designs. Experiment results demonstrate CROP’s ability to achieve superior quality-of-results (QoR) with fewer iterations than existing approaches on industrial designs, including a 9.9% reduction in power consumption.

现代大型集成(VLSI)设计要求使用电子设计自动化工具实施集成电路。由于EDA算法的复杂性,庞大的参数空间对芯片设计优化提出了巨大的挑战,因为即使是少量参数的结合也为探索提供了巨大的解决方案空间。人工参数选择仍然是工业实践,尽管过度劳累,而且受到专家经验的限制。为解决这一问题,我们介绍了第一个大型语言模型(LLLM)驱动的自动VLSI设计流调框架CROOP。我们的方法包括:(1) 将RTL源码转换成密度矢量表示的可扩展方法;(2) 嵌入式检索系统,用于将设计与语系相似的电路相匹配;(3) 检索型(RAG)加固LLM制导参数搜索系统,该系统以类似设计的知识限制搜索进程。实验结果显示CROP实现高质量(QOR)的能力,其重复率比工业设计的现有方法要小,包括减少9.9%的电力消耗量。


Article 198

Title@2025-07-02 (3): Identifying Systems with Symmetries using Equivariant Autoregressive Reservoir Computers

Title: Identifying Systems with Symmetries using Equivariant Autoregressive Reservoir Computers Identifizierung von Systemen mit Symmetrien mit äquivalenten Autoregressiven Reservoir-Computern 使用等量自动递减储量计算机识别对称系统 2311.09511v4

Authors (5): Fredy Vides, Idelfonso B. R. Nogueira, Gabriela Lopez Gutierrez, Lendy Banegas, Evelyn Flores

The investigation reported in this document focuses on identifying systems with symmetries using equivariant autoregressive reservoir computers. General results in structured matrix approximation theory are presented, exploring a two-fold approach. Firstly, a comprehensive examination of generic symmetry-preserving nonlinear time delay embedding is conducted. This involves analyzing time series data sampled from an equivariant system under study. Secondly, sparse least-squares methods are applied to discern approximate representations of the output coupling matrices. These matrices play a critical role in determining the nonlinear autoregressive representation of an equivariant system. The structural characteristics of these matrices are dictated by the set of symmetries inherent in the system. The document outlines prototypical algorithms derived from the described techniques, offering insight into their practical applications. Emphasis is placed on the significant improvement on structured identification precision when compared to classical reservoir computing methods for the simulation of equivariant dynamical systems.

本文件中所报告的调查重点是利用等式自动递减储油层计算机查明对称系统;在结构矩阵近似理论中提出一般结果,探讨双重方法;首先,对通用对称保存非线性时间延迟嵌入进行全面审查;这涉及分析从正在研究的等式系统中抽样的时间序列数据;其次,使用稀有的最小方位方法来辨别产出组合矩阵的近似表示;这些矩阵在确定一个等式系统的非线性自动递减代表方面发挥着关键作用;这些矩阵的结构特征由系统中固有的对称组合决定;该文件概述了从所述技术中得出的原型算法,提供了对其实际应用的洞察力;强调在模拟等式动态系统时,与典型储油层计算方法相比,结构识别精确性显著提高。


Article 199

Title@2025-07-02 (3): Can Artificial Intelligence solve the blockchain oracle problem? Unpacking the Challenges and Possibilities

Title: Can Artificial Intelligence solve the blockchain oracle problem? Unpacking the Challenges and Possibilities Kann Künstliche Intelligenz das Blockchain-Orakel-Problem lösen? Auspacken der Herausforderungen und Möglichkeiten 人工智能能解决连锁铁链和手链的问题吗? 2507.02125v1

Authors (1): Giulio Caldarelli

The blockchain oracle problem, which refers to the challenge of injecting reliable external data into decentralized systems, remains a fundamental limitation to the development of trustless applications. While recent years have seen a proliferation of architectural, cryptographic, and economic strategies to mitigate this issue, no one has yet fully resolved the fundamental question of how a blockchain can gain knowledge about the off-chain world. In this position paper, we critically assess the role artificial intelligence (AI) can play in tackling the oracle problem. Drawing from both academic literature and practitioner implementations, we examine how AI techniques such as anomaly detection, language-based fact extraction, dynamic reputation modeling, and adversarial resistance can enhance oracle systems. We observe that while AI introduces powerful tools for improving data quality, source selection, and system resilience, it cannot eliminate the reliance on unverifiable off-chain inputs. Therefore, this study supports the idea that AI should be understood as a complementary layer of inference and filtering within a broader oracle design, not a substitute for trust assumptions.

将可靠的外部数据注入分散化的系统的挑战,这一链链条问题,即将可靠的外部数据注入到分散化的系统中的挑战,仍然是对开发无信任应用的根本限制。近年来,尽管建筑、加密和经济战略在减少这一问题上激增,但还没有人完全解决链条如何获得关于离链世界知识这一根本问题。在本立场文件中,我们严格评估人工智能(AI)在解决这个关节问题中的作用。从学术文献和从业人员的实践中,我们研究AI技术,例如异常现象检测、语言事实提取、动态名声建模和对抗性抵制,如何能够加强触电系统。我们注意到,尽管AI为改进数据质量、源选择和系统复原力提供了强有力的工具,但它无法消除对不可核实的离链投入的依赖。因此,本研究报告支持这样一种想法,即应当把AI理解为一个在更宽泛的设计中互补的推论和过滤层,而不是替代信任假设。


Article 200

Title@2025-07-02 (3): Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks

Title: Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks Skalierungskollaps offenbart universelle Dynamiken in berechenbaren neuralen Netzwerken 计算最佳培训神经网络中的缩放 缩放 折叠 Revials 通用动态 2507.02119v1

Authors (5): Shikai Qiu, Lechao Xiao, Andrew Gordon Wilson, Jeffrey Pennington, Atish Agarwala

What scaling limits govern neural network training dynamics when model size and training time grow in tandem? We show that despite the complex interactions between architecture, training algorithms, and data, compute-optimally trained models exhibit a remarkably precise universality. Specifically, loss curves from models of varying sizes collapse onto a single universal curve when training compute and loss are normalized to unity at the end of training. With learning rate decay, the collapse becomes so tight that differences in the normalized curves across models fall below the noise floor of individual loss curves across random seeds, a phenomenon we term supercollapse. We observe supercollapse across learning rate schedules, datasets, and architectures, including transformers trained on next-token prediction, and find it breaks down when hyperparameters are scaled suboptimally, providing a precise and practical indicator of good scaling. We explain these phenomena by connecting collapse to the power-law structure in typical neural scaling laws, and analyzing a simple yet surprisingly effective model of SGD noise dynamics that accurately predicts loss curves across various learning rate schedules and quantitatively explains the origin of supercollapse.

当模型规模和培训时间同步增长时,神经网络培训动态将受到何种程度的限制?我们显示,尽管结构、培训算法和数据之间有着复杂的相互作用,但计算出非常理想的训练有素模型却表现出非常精确的普遍性。具体地说,在培训结束时,当培训计算和损失后,将一个单一的通用曲线稳定下来时,不同规模模型的损失曲线会崩溃成一个单一的通用曲线。随着学习率的衰减,各种模型的标准化曲线的差别变得非常紧密,以至于各种模型的大小都低于随机种子个人损失曲线的噪声底限,一种我们称之为超脱落的现象。我们观察到学习率表、数据集和结构,包括接受过后方预测培训的变异器,在超大尺度缩缩小的时候发现这种曲线会崩溃,提供精确和实用的良好缩放指标。我们通过将典型神经缩放法中的权力法结构与崩溃联系起来来解释这些现象,并分析一个简单但令人惊讶的有效SGD噪音动态模型,准确预测各种学习率表的损失曲线,并量化地解释超缩缩的起源。


Article 201

Title@2025-07-02 (3): TAROT: Targeted Data Selection via Optimal Transport

Title: TAROT: Targeted Data Selection via Optimal Transport TAROT: Gezielte Datenauswahl über optimalen Transport TAROT:通过最佳运输选择目标数据 2412.00420v2

Authors (4): Lan Feng, Fan Nie, Yuejiang Liu, Alexandre Alahi

We propose TAROT, a targeted data selection framework grounded in optimal transport theory. Previous targeted data selection methods primarily rely on influence-based greedy heuristics to enhance domain-specific performance. While effective on limited, unimodal data (i.e., data following a single pattern), these methods struggle as target data complexity increases. Specifically, in multimodal distributions, these heuristics fail to account for multiple inherent patterns, leading to suboptimal data selection. This work identifies two primary factors contributing to this limitation: (i) the disproportionate impact of dominant feature components in high-dimensional influence estimation, and (ii) the restrictive linear additive assumptions inherent in greedy selection strategies. To address these challenges, TAROT incorporates whitened feature distance to mitigate dominant feature bias, providing a more reliable measure of data influence. Building on this, TAROT uses whitened feature distance to quantify and minimize the optimal transport distance between the selected data and target domains. Notably, this minimization also facilitates the estimation of optimal selection ratios. We evaluate TAROT across multiple tasks, including semantic segmentation, motion prediction, and instruction tuning. Results consistently show that TAROT outperforms state-of-the-art methods, highlighting its versatility across various deep learning tasks. Code is available at https://github.com/vita-epfl/TAROT.

我们提议采用TAROT,这是基于最佳运输理论的定向数据选择框架。先前的定向数据选择方法主要依赖基于影响力的贪婪惯性理论来提高特定领域的绩效。尽管TAROT对有限的单一方式数据(即按照单一模式提供的数据)有效,但这些方法随着目标数据复杂性的增加而挣扎。具体地说,在多式联运分销中,这些惯性没有考虑到多种固有模式,导致数据选择不尽理想。这项工作确定了造成这一局限性的两个主要因素:(一) 高维影响估计中主要特征组成部分的不成比例影响,以及(二) 贪婪选择战略中固有的限制性线性添加假设。为了应对这些挑战,TAROT采用了白化特征距离来减少主要特征的偏差,提供了更可靠的数据影响度衡量标准。在此基础上,TAROT使用白色特征距离来量化和尽量减少选定数据和目标领域之间的最佳运输距离。值得注意的是,这种最小化还有助于估计最佳选择比率。我们评估了TAROT的多种任务,包括语系分化、运动预测和指示调整。为了应对这些挑战,TAROT会不断显示其深度定位/FIFF格式上的现有任务。


Article 202

Title@2025-07-02 (3): Parametric Neural Amp Modeling with Active Learning

Title: Parametric Neural Amp Modeling with Active Learning Parametrische Neuralamp-Modellierung mit aktivem Lernen 与积极学习学习建模 2507.02109v1

Authors (4): Florian Grötschla, Luca A. Lanzendörfer, Longxiang Jiao, Roger Wattenhofer

We introduce PANAMA, an active learning framework for the training of end-to-end parametric guitar amp models using a WaveNet-like architecture. With \model, one can create a virtual amp by recording samples that are determined by an active learning strategy to use a minimum amount of datapoints (i.e., amp knob settings). We show that gradient-based optimization algorithms can be used to determine the optimal datapoints to sample, and that the approach helps under a constrained number of samples.

我们引入了利用WaveNet(WaveNet)式结构培训端到端参数吉他模拟模型的积极学习框架PARNAMA(PARNAMA) 。 有了这样的模型,我们可以通过记录样本来创建虚拟模型,样本由积极学习战略决定,以使用最低数量的数据点( 即 amp knob 设置 ) 。 我们显示,基于梯度的优化算法可以用来确定样本的最佳数据点,而且这种方法有助于受限制的样本数量。


Article 203

Title@2025-07-02 (3): Resolving Turbulent Magnetohydrodynamics: A Hybrid Operator-Diffusion Framework

Title: Resolving Turbulent Magnetohydrodynamics: A Hybrid Operator-Diffusion Framework Auflösen der Turbulenten Magnetohydrodynamik: Ein hybrides Operator-Diffusions-Framework 溶解扰动磁磁流流体动力学:混合操作者-扩散框架 2507.02106v1

Authors (3): Semih Kacmaz, E. A. Huerta, Roland Haas

We present a hybrid machine learning framework that combines Physics-Informed Neural Operators (PINOs) with score-based generative diffusion models to simulate the full spatio-temporal evolution of two-dimensional, incompressible, resistive magnetohydrodynamic (MHD) turbulence across a broad range of Reynolds numbers ($\mathrm{Re}$). The framework leverages the equation-constrained generalization capabilities of PINOs to predict coherent, low-frequency dynamics, while a conditional diffusion model stochastically corrects high-frequency residuals, enabling accurate modeling of fully developed turbulence. Trained on a comprehensive ensemble of high-fidelity simulations with $\mathrm{Re} \in {100, 250, 500, 750, 1000, 3000, 10000}$, the approach achieves state-of-the-art accuracy in regimes previously inaccessible to deterministic surrogates. At $\mathrm{Re}=1000$ and $3000$, the model faithfully reconstructs the full spectral energy distributions of both velocity and magnetic fields late into the simulation, capturing non-Gaussian statistics, intermittent structures, and cross-field correlations with high fidelity. At extreme turbulence levels ($\mathrm{Re}=10000$), it remains the first surrogate capable of recovering the high-wavenumber evolution of the magnetic field, preserving large-scale morphology and enabling statistically meaningful predictions.

我们提出了一个混合机器学习框架,将物理-内分神经操作员(PINOs)与基于分数的遗传扩散模型结合起来,以模拟二维、不压缩、抗性磁体动力学(MHD)波动的全面瞬时演化,范围很广,包括Reynolds的数字(美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/美元/


Article 204

Title@2025-07-02 (3): A robust and adaptive MPC formulation for Gaussian process models

Title: A robust and adaptive MPC formulation for Gaussian process models Eine robuste und adaptive MPC-Formulierung für Gaußsche Prozessmodelle Gaussian 过程模型的强有力和适应性的多氯三氯杀螨(MPC)配方 2507.02098v1

Authors (4): Mathieu Dubied, Amon Lahr, Melanie N. Zeilinger, Johannes Köhler

In this paper, we present a robust and adaptive model predictive control (MPC) framework for uncertain nonlinear systems affected by bounded disturbances and unmodeled nonlinearities. We use Gaussian Processes (GPs) to learn the uncertain dynamics based on noisy measurements, including those collected during system operation. As a key contribution, we derive robust predictions for GP models using contraction metrics, which are incorporated in the MPC formulation. The proposed design guarantees recursive feasibility, robust constraint satisfaction and convergence to a reference state, with high probability. We provide a numerical example of a planar quadrotor subject to difficult-to-model ground effects, which highlights significant improvements achieved through the proposed robust prediction method and through online learning.

在本文中,我们提出了一个强健和适应性模型预测控制框架(MPC),用于受受受受受受约束干扰和未建非线性影响的不确定非线性系统影响的非线性系统;我们利用高斯进程(GPs)了解以噪音测量为基础的不确定动态,包括在系统运行期间收集的动态;作为关键贡献,我们利用收缩指标对GP模型作出稳健预测,这些指标已纳入MPC的拟订中;拟议设计保障了循环可行性、强力制约满意度和与参考状态的趋同,概率很高。我们提供了受到难以建模的地面效应影响的平面二次钻研机的数字示例,其中突出了通过拟议的稳健预测方法和在线学习所取得的重大改进。


Article 205

Title@2025-07-02 (3): Asymptotically perfect seeded graph matching without edge correlation (and applications to inference)

Title: Asymptotically perfect seeded graph matching without edge correlation (and applications to inference) Asymptotisch perfekt gesäte Graphen ohne Randkorrelation (und Anwendungen zur Inferenz) 无边缘关联(和推断应用)的种子化图表匹配 2506.02825v2

Authors (4): Tong Qi, Vera Andersson, Peter Viechnicki, Vince Lyzinski

We present the OmniMatch algorithm for seeded multiple graph matching. In the setting of $d$-dimensional Random Dot Product Graphs (RDPG), we prove that under mild assumptions, OmniMatch with $s$ seeds asymptotically and efficiently perfectly aligns $O(s^{\alpha})$ unseeded vertices – for $\alpha<2\wedge d/4$ – across multiple networks even in the presence of no edge correlation. We demonstrate the effectiveness of our algorithm across numerous simulations and in the context of shuffled graph hypothesis testing. In the shuffled testing setting, testing power is lost due to the misalignment/shuffling of vertices across graphs, and we demonstrate the capacity of OmniMatch to correct for misaligned vertices prior to testing and hence recover the lost testing power. We further demonstrate the algorithm on a pair of data examples from connectomics and machine translation.

我们展示了种子多图匹配的 OmniMatch 算法。 在设定 $d$ 的维度随机点产品图(RDPG) 时, 我们证明, 在轻度假设下, 以美元种子进行O( säalpha}) 的OmniMatch , 以未种子的脊椎为单位, 以$alpha < 2\wedge d/4 美元为单位, 跨多个网络, 即便没有边际关联 。 我们展示了我们算法在众多模拟中的有效性, 并在平整的图形假设测试中展示了我们算法的有效性。 在平整的测试设置中, 测试力因图面脊椎的错配/ 麻动而丧失, 我们展示了 OmniMatch 在测试之前纠正错配错的脊椎从而恢复丢失的测试力的能力。 我们进一步展示了从连接组和机器翻译得到的一对一对数据示例的算法 。


Article 206

Title@2025-07-02 (3): Energy-Based Transformers are Scalable Learners and Thinkers

Title: Energy-Based Transformers are Scalable Learners and Thinkers Energiebasierte Transformer sind skalierbare Lernende und Denker 以能源为基础的变换器是可缩放的学习者和思想家 2507.02092v1

Authors (10): Alexi Gladstone, Ganesh Nanduru, Md Mofijul Islam, Peixuan Han, Hyeonjeong Ha, Aman Chadha, Yilun Du, Heng Ji, Jundong Li, Tariq Iqbal

Inference-time computation techniques, analogous to human System 2 Thinking, have recently become popular for improving model performances. However, most existing approaches suffer from several limitations: they are modality-specific (e.g., working only in text), problem-specific (e.g., verifiable domains like math and coding), or require additional supervision/training on top of unsupervised pretraining (e.g., verifiers or verifiable rewards). In this paper, we ask the question “Is it possible to generalize these System 2 Thinking approaches, and develop models that learn to think solely from unsupervised learning?” Interestingly, we find the answer is yes, by learning to explicitly verify the compatibility between inputs and candidate-predictions, and then re-framing prediction problems as optimization with respect to this verifier. Specifically, we train Energy-Based Transformers (EBTs) – a new class of Energy-Based Models (EBMs) – to assign an energy value to every input and candidate-prediction pair, enabling predictions through gradient descent-based energy minimization until convergence. Across both discrete (text) and continuous (visual) modalities, we find EBTs scale faster than the dominant Transformer++ approach during training, achieving an up to 35% higher scaling rate with respect to data, batch size, parameters, FLOPs, and depth. During inference, EBTs improve performance with System 2 Thinking by 29% more than the Transformer++ on language tasks, and EBTs outperform Diffusion Transformers on image denoising while using fewer forward passes. Further, we find that EBTs achieve better results than existing models on most downstream tasks given the same or worse pretraining performance, suggesting that EBTs generalize better than existing approaches. Consequently, EBTs are a promising new paradigm for scaling both the learning and thinking capabilities of models.

与人类系统 2 思维法相似的推断时间计算技术最近为改进模型性能而变得流行。然而,大多数现有方法都受到若干限制:它们与模式有关(例如,仅在文本中工作),与问题有关(例如,数学和编码等可核查的领域),或需要在未经监督的预培训前(例如,核查者或可核查的奖励)的顶部进行更多的监督/培训。在本文中,我们问“是否有可能将这些系统2的思考参数加以推广,并开发仅从未经监督的学习中进行思考的模型?”有趣的是,我们发现答案是肯定的,它们通过明确核实投入和候选人预言的兼容性,与问题有关(例如,数学和编码等可核实的领域),或者在未经监督的预言前导前导前导前导前导前导前导前导前导前导前导前导前导(例如,核查者或核查前导前导前导前导师),我们现有能源模型(EMM(EMM)的能量值值比新的更高,我们发现基于梯级的精化后导流流流流流流流流流流流流流流的能量模型,然后在EB的深度模型中测测测测测到EB期间比EV值比EB的更快, 和不断测测,在EB的变后演程中,在EB的变后演程中,在EB的变后演程中,在EB的变后演期间,在EB的变后演期间,在EV的变后演进进进进进进进。


Article 207

Title@2025-07-02 (3): Sample Complexity Bounds for Linear Constrained MDPs with a Generative Model

Title: Sample Complexity Bounds for Linear Constrained MDPs with a Generative Model Probenkomplexitätsbounds für lineare eingeschränkte MDPs mit einem Generativen Modell 具有创创型模型的线性封闭式微DP样品复杂度宽度 2507.02089v1

Authors (3): Xingtu Liu, Lin F. Yang, Sharan Vaswani

We consider infinite-horizon $\gamma$-discounted (linear) constrained Markov decision processes (CMDPs) where the objective is to find a policy that maximizes the expected cumulative reward subject to expected cumulative constraints. Given access to a generative model, we propose to solve CMDPs with a primal-dual framework that can leverage any black-box unconstrained MDP solver. For linear CMDPs with feature dimension $d$, we instantiate the framework by using mirror descent value iteration (\texttt{MDVI})~\citep{kitamura2023regularization} an example MDP solver. We provide sample complexity bounds for the resulting CMDP algorithm in two cases: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to exactly satisfy the constraint. For (i), we prove that the algorithm can return an $\epsilon$-optimal policy with high probability by using $\tilde{O}\left(\frac{d^2}{(1-\gamma)^4\epsilon^2}\right)$ samples. We note that these results exhibit a near-optimal dependence on both $d$ and $\epsilon$. For (ii), we show that the algorithm requires $\tilde{O}\left(\frac{d^2}{(1-\gamma)^6\epsilon^2\zeta^2}\right)$ samples, where $\zeta$ is the problem-dependent Slater constant that characterizes the size of the feasible region. Finally, we instantiate our framework for tabular CMDPs and show that it can be used to recover near-optimal sample complexities in this setting.

我们考虑无限- holizon $\ gamma $ 折扣( 线性) 限制 Markov 决策程序( CMDPs ) , 目标是找到一个政策, 在预期累积限制的情况下最大限度地实现预期累积奖赏。 在获得基因化模型时, 我们建议用一个原始的框架来解决 CMDPs , 这个框架可以利用任何黑盒不受约束的 MDP 求解器。 对于具有特性维度的线性 CMDPs 来说, 我们通过镜面下下移值( textt{ MDVI}) \ ciep{ kitmura2023 常规化 示例 MDP SDP 解析器。 我们为由此导致的 CMDP 算法提供了样本复杂性 :(i) 放松可行性, 允许小约束性违反, 和 (ii) 严格的可行性, 这个输出政策需要完全满足限制 。 对于 (i) 我们证明, 算法可以返回一个 $\ eplon $- opmatal $ $( mother) $@ rimax $ $x_ $x_ 美元) rass2\\\\\\\\\\\\ gmax lax lax max max max max max max max max max max max max max


Article 208

Title@2025-07-02 (3): Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective

Title: Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective Von Crowdsourced Noisy Labels lernen: Eine Perspektive der Signalverarbeitung 从众包噪音标签学习:信号处理视角 2407.06902v2

Authors (4): Shahana Ibrahim, Panagiotis A. Traganitis, Xiao Fu, Georgios B. Giannakis

One of the primary catalysts fueling advances in artificial intelligence (AI) and machine learning (ML) is the availability of massive, curated datasets. A commonly used technique to curate such massive datasets is crowdsourcing, where data are dispatched to multiple annotators. The annotator-produced labels are then fused to serve downstream learning and inference tasks. This annotation process often creates noisy labels due to various reasons, such as the limited expertise, or unreliability of annotators, among others. Therefore, a core objective in crowdsourcing is to develop methods that effectively mitigate the negative impact of such label noise on learning tasks. This feature article introduces advances in learning from noisy crowdsourced labels. The focus is on key crowdsourcing models and their methodological treatments, from classical statistical models to recent deep learning-based approaches, emphasizing analytical insights and algorithmic developments. In particular, this article reviews the connections between signal processing (SP) theory and methods, such as identifiability of tensor and nonnegative matrix factorization, and novel, principled solutions of longstanding challenges in crowdsourcing – showing how SP perspectives drive the advancements of this field. Furthermore, this article touches upon emerging topics that are critical for developing cutting-edge AI/ML systems, such as crowdsourcing in reinforcement learning with human feedback (RLHF) and direct preference optimization (DPO) that are key techniques for fine-tuning large language models (LLMs).

推动人工智能(AI)和机器学习(ML)进步的主要催化剂之一是提供大规模、可分类化的数据集。一种常用的方法是将数据发送给多个批注者,将批注者制作的标签合并起来,用于下游的学习和推论任务。这种批注过程往往由于各种原因,例如专门知识有限或批注者不可靠等原因,制造噪音标签。因此,众包的核心目标是制定有效减轻这类标签噪音对学习任务的消极影响的方法。这一特别文章介绍了从吵闹的众包标签学习的进展。重点是关键的众包模型及其方法处理,从典型的统计模型到最近的深层次学习方法,强调分析见解和算法发展。特别是,这一文章回顾了信号处理(SP)理论和方法之间的联系,例如感应和不负偏重性矩阵要素化,以及更精细的群包化挑战解决方案 – 展示了聚标本的群包件标签(SP-观点是核心的升级的IML),从而推进了核心的IML。


Article 209

Title@2025-07-02 (3): Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions

Title: Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions Bewertung des Versprechens und der Fälle von LLMs bei Hiring-Entscheidungen 评估LLM女士在雇用决定中的许诺和机会 2507.02087v1

Authors (4): Eitan Anzenberg, Arunava Samajpati, Sivasankaran Chandrasekar, Varun Kacholia

The use of large language models (LLMs) in hiring promises to streamline candidate screening, but it also raises serious concerns regarding accuracy and algorithmic bias where sufficient safeguards are not in place. In this work, we benchmark several state-of-the-art foundational LLMs - including models from OpenAI, Anthropic, Google, Meta, and Deepseek, and compare them with our proprietary domain-specific hiring model (Match Score) for job candidate matching. We evaluate each model’s predictive accuracy (ROC AUC, Precision-Recall AUC, F1-score) and fairness (impact ratio of cut-off analysis across declared gender, race, and intersectional subgroups). Our experiments on a dataset of roughly 10,000 real-world recent candidate-job pairs show that Match Score outperforms the general-purpose LLMs on accuracy (ROC AUC 0.85 vs 0.77) and achieves significantly more equitable outcomes across demographic groups. Notably, Match Score attains a minimum race-wise impact ratio of 0.957 (near-parity), versus 0.809 or lower for the best LLMs, (0.906 vs 0.773 for the intersectionals, respectively). We discuss why pretraining biases may cause LLMs with insufficient safeguards to propagate societal biases in hiring scenarios, whereas a bespoke supervised model can more effectively mitigate these biases. Our findings highlight the importance of domain-specific modeling and bias auditing when deploying AI in high-stakes domains such as hiring, and caution against relying on off-the-shelf LLMs for such tasks without extensive fairness safeguards. Furthermore, we show with empirical evidence that there shouldn’t be a dichotomy between choosing accuracy and fairness in hiring: a well-designed algorithm can achieve both accuracy in hiring and fairness in outcomes.

大型语言模型(LLMS)在招聘时的使用有望简化候选人筛选程序,但也引起了人们对准确性和算法偏差的严重关切。 在这项工作中,我们将一些最先进的基础模型(包括OpenAI、Anthroopicic、Google、Meta和DeepSeek的模型)作为基准,并将它们与我们专有的域别特定招聘模式(Match Scord)相比,用于招聘候选人匹配。我们评估了每个模型的预测准确性(ROC ACUC、 Precision-Recall AUC、F1-Score)和公平性(在宣布的性别、种族和交叉分组之间缺乏足够保障的情况下,截断率和算分析的比重)以及公平性(在宣布的性别、种族、种族和交叉分组之间缺乏足够保障的情况下,我们对最先进的基本基本基本基本基本基本基本基本基本基本基本基本基本基本标准 — — 在招聘过程中,我们的标准(Ox906)和(BLMS)之间可以有效地进行准确性评估。


Article 210

Title@2025-07-02 (3): Non-negative matrix factorization algorithms generally improve topic model fits

Title: Non-negative matrix factorization algorithms generally improve topic model fits Nicht-negative Matrixfaktorisierungsalgorithmen verbessern generell Themenmodell-Anpassungen 非负矩阵非负矩阵乘数化算法一般改进专题模式 2105.13440v3

Authors (4): Peter Carbonetto, Abhishek Sarkar, Zihao Wang, Matthew Stephens

We report on the potential for using algorithms for non-negative matrix factorization (NMF) to improve parameter estimation in topic models. While several papers have studied connections between NMF and topic models, none have suggested leveraging these connections to develop new algorithms for fitting topic models. NMF avoids the “sum-to-one” constraints on the topic model parameters, resulting in an optimization problem with simpler structure and more efficient computations. Building on recent advances in optimization algorithms for NMF, we show that first solving the NMF problem then recovering the topic model fit can produce remarkably better fits, and in less time, than standard algorithms for topic models. While we focus primarily on maximum likelihood estimation, we show that this approach also has the potential to improve variational inference for topic models. Our methods are implemented in the R package fastTopics.

我们报告使用非负矩阵系数化算法改进专题模型参数估计的可能性。虽然有几份论文研究了NMF和专题模型之间的联系,但没有人建议利用这些联系来开发适合专题模型的新算法。NMF避免了专题模型参数的“总对一”限制,从而导致结构更简单、计算效率更高的优化问题。根据NMF优化算法的最近进展,我们表明,首先解决NMF问题,然后恢复专题模型的合适性能可以比专题模型的标准算法更合适,时间更短。虽然我们主要侧重于最大的可能性估计,但我们表明,这一方法还有可能改善专题模型的变异推论。我们在R包快速托运中采用了我们的方法。


Article 211

Title@2025-07-02 (3): Selective Feature Re-Encoded Quantum Convolutional Neural Network with Joint Optimization for Image Classification

Title: Selective Feature Re-Encoded Quantum Convolutional Neural Network with Joint Optimization for Image Classification Selektives Feature neukodiertes Quantum-Konvolutionäres Neuralnetzwerk mit gemeinsamer Optimierung für die Bildklassifizierung 具有图像分类联合优化功能的量子革命神经网络 2507.02086v1

Authors (5): Shaswata Mahernob Sarkar, Sheikh Iftekhar Ahmed, Jishnu Mahmud, Shaikh Anowarul Fattah, Gaurav Sharma

Quantum Machine Learning (QML) has seen significant advancements, driven by recent improvements in Noisy Intermediate-Scale Quantum (NISQ) devices. Leveraging quantum principles such as entanglement and superposition, quantum convolutional neural networks (QCNNs) have demonstrated promising results in classifying both quantum and classical data. This study examines QCNNs in the context of image classification and proposes a novel strategy to enhance feature processing and a QCNN architecture for improved classification accuracy. First, a selective feature re-encoding strategy is proposed, which directs the quantum circuits to prioritize the most informative features, thereby effectively navigating the crucial regions of the Hilbert space to find the optimal solution space. Secondly, a novel parallel-mode QCNN architecture is designed to simultaneously incorporate features extracted by two classical methods, Principal Component Analysis (PCA) and Autoencoders, within a unified training scheme. The joint optimization involved in the training process allows the QCNN to benefit from complementary feature representations, enabling better mutual readjustment of model parameters. To assess these methodologies, comprehensive experiments have been performed using the widely used MNIST and Fashion MNIST datasets for binary classification tasks. Experimental findings reveal that the selective feature re-encoding method significantly improves the quantum circuit’s feature processing capability and performance. Furthermore, the jointly optimized parallel QCNN architecture consistently outperforms the individual QCNN models and the traditional ensemble approach involving independent learning followed by decision fusion, confirming its superior accuracy and generalization capabilities.

量子机器学习(QML)取得了显著进展,其推动因素是最近新颖中度量子(NISQ)装置的改进;利用量子原则,如缠绕和叠加,量子进动神经网络(QCNN),在对量子和古典数据进行分类方面展示了大有希望的成果;这项研究在图像分类方面审查了QCNN, 提出了加强特征处理的新战略,并提出了提高分类准确性能的QCNN结构;首先,提出了选择性特性重新编码战略,指导量子电路确定最传统的信息特征的优先次序,从而有效地浏览希尔伯特空间的关键区域,找到最佳的解决方案空间;第二,设计了新型平行模型QCNN结构,在统一的培训计划范围内,将两种传统方法,即主构件分析(PCA)和Autocle Concorders所提取的特征同时纳入新的平行模型;在培训过程中,联合优化使QCNN公司能够通过补充性特征介绍,更好地相互调整模型参数;为了评估这些方法,全面试验了Hilbert-ML的精度分析,并运用了基础数据分析,从而大大改进了基础分析,从而改进了IMIST-IST-ILS-ILS-IL-I-I-IL-IL-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-IL-B-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I)-I)-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-


Article 212

Title@2025-07-02 (3): GeoAda: Efficiently Finetune Geometric Diffusion Models with Equivariant Adapters

Title: GeoAda: Efficiently Finetune Geometric Diffusion Models with Equivariant Adapters GeoAda: Effizient feintune geometrische Diffusionsmodelle mit äquivalenten Adaptern GeoAda: 配有等差适应器的有效微粒几何扩散模型 2507.02085v1

Authors (6): Wanjia Zhao, Jiaqi Han, Siyi Gu, Mingjian Jiang, James Zou, Stefano Ermon

Geometric diffusion models have shown remarkable success in molecular dynamics and structure generation. However, efficiently fine-tuning them for downstream tasks with varying geometric controls remains underexplored. In this work, we propose an SE(3)-equivariant adapter framework ( GeoAda) that enables flexible and parameter-efficient fine-tuning for controlled generative tasks without modifying the original model architecture. GeoAda introduces a structured adapter design: control signals are first encoded through coupling operators, then processed by a trainable copy of selected pretrained model layers, and finally projected back via decoupling operators followed by an equivariant zero-initialized convolution. By fine-tuning only these lightweight adapter modules, GeoAda preserves the model’s geometric consistency while mitigating overfitting and catastrophic forgetting. We theoretically prove that the proposed adapters maintain SE(3)-equivariance, ensuring that the geometric inductive biases of the pretrained diffusion model remain intact during adaptation. We demonstrate the wide applicability of GeoAda across diverse geometric control types, including frame control, global control, subgraph control, and a broad range of application domains such as particle dynamics, molecular dynamics, human motion prediction, and molecule generation. Empirical results show that GeoAda achieves state-of-the-art fine-tuning performance while preserving original task accuracy, whereas other baselines experience significant performance degradation due to overfitting and catastrophic forgetting.

GeoAda 引入了结构化的适配设计: 控制信号首先通过混合操作器编码,然后由经过培训的选定模型层的可培训副本处理,最后通过分解操作器进行反射,然后进行分解操作器,然后进行等离子化零初始化演动。在这项工作中,我们建议建立一个SE(3) QQQQevariant 适配器框架(GeoAda) , 使在不修改原始模型结构的情况下对受控的基因化任务进行灵活和有参数效率的微调。 GeoAda 引入了一个结构结构结构结构化的调整: 控制信号首先通过混合操作器编码,然后由经过培训的选定预选模型层进行处理,最后通过分解操作器进行反射。 通过微调这些轻重调调调调调器模块,GeoAda 保持模型的几何一致性,同时减轻和灾难性的遗忘性能。 我们从理论上证明,拟议的调整模型的测算模型的精确性能,同时展示了其他恒定型模型的精确度, 模型测测测算结果,并展示了模型的模型的精确性能,并显示其他的精确性能,例如颗粒力的模型测测测测测测测测测测测测测测测力等等。


Article 213

Title@2025-07-02 (3): Adaptive Iterative Soft-Thresholding Algorithm with the Median Absolute Deviation

Title: Adaptive Iterative Soft-Thresholding Algorithm with the Median Absolute Deviation Adaptive iterative Soft-Thresholding-Algorithmus mit der Median Absolute Abweichung 与中位绝对偏离相适应性迭替性软- 悬浮- 悬浮- 悬浮持量 2507.02084v1

Authors (2): Yining Feng, Ivan Selesnick

The adaptive Iterative Soft-Thresholding Algorithm (ISTA) has been a popular algorithm for finding a desirable solution to the LASSO problem without explicitly tuning the regularization parameter $\lambda$. Despite that the adaptive ISTA is a successful practical algorithm, few theoretical results exist. In this paper, we present the theoretical analysis on the adaptive ISTA with the thresholding strategy of estimating noise level by median absolute deviation. We show properties of the fixed points of the algorithm, including scale equivariance, non-uniqueness, and local stability, prove the local linear convergence guarantee, and show its global convergence behavior.

适应性迭代软压力控股ALgorithm(ISTA)一直是一种为LASSO问题寻找理想解决办法的流行算法,而没有明确地调整正规化参数$\lambda$。尽管适应性ISTA是一个成功的实用算法,但几乎没有理论结果。在本文中,我们介绍了关于适应性ISTA的理论分析,其阈值战略是通过中位绝对偏差估计噪音水平。我们显示了算法固定点的特性,包括规模等差、非独一性和当地稳定,证明了本地线性趋同保证,并展示了其全球趋同行为。


Article 214

Title@2025-07-02 (3): Unraveling particle dark matter with Physics-Informed Neural Networks

Title: Unraveling particle dark matter with Physics-Informed Neural Networks Entwirren von Teilchen Dunkle Materie mit physik-informierten Neuronalen Netzwerken 用物理内成形神经网络解除粒子暗物质 2502.17597v2

Authors (3): M. P. Bento, H. B. Câmara, J. F. Seabra

We parametrically solve the Boltzmann equations governing freeze-in dark matter (DM) in alternative cosmologies with Physics-Informed Neural Networks (PINNs), a mesh-free method. Through inverse PINNs, using a single DM experimental point – observed relic density – we determine the physical attributes of the theory, namely power-law cosmologies, inspired by braneworld scenarios, and particle interaction cross sections. The expansion of the Universe in such alternative cosmologies has been parameterized through a switch-like function reproducing the Hubble law at later times. Without loss of generality, we model more realistically this transition with a smooth function. We predict a distinct pair-wise relationship between power-law exponent and particle interactions: for a given cosmology with negative (positive) exponent, smaller (larger) cross sections are required to reproduce the data. Lastly, via Bayesian methods, we quantify the epistemic uncertainty of theoretical parameters found in inverse problems.

我们用一种无网状方法,即物理化神经网络(PINNs)的替代宇宙体中,用一种无网状法的方法,对关于冻结在暗物质(DM)的Boltzmann方程式进行参数处理。我们通过反PINNs,使用一个单一的DM实验点 – – 观察到的遗迹密度 – – 我们确定理论的物理属性,即受branneworld设想的动力法宇宙和粒子交互作用所启发的动力法宇宙和宇宙学的物理属性。在这种替代宇宙体中,宇宙的扩展通过一个类似开关的功能来参数来参数化,随后再生成哈勃法。在不丧失一般性的情况下,我们更现实地模拟这种转变,以一个平稳的功能。我们预测了权力法前言和粒子相互作用之间的一种截然不同的对对式关系:对于一个带有负(正)前奏的宇宙学说,需要较小(大)的跨段来复制数据。最后,我们通过巴耶斯的方法来量化在反问题中发现的理论参数的认知不确定性。


Article 215

Title@2025-07-02 (3): Towards Explaining Deep Neural Network Compression Through a Probabilistic Latent Space

Title: Towards Explaining Deep Neural Network Compression Through a Probabilistic Latent Space Auf dem Weg zur Erklärung der Komprimierung eines tiefen neuralen Netzwerks durch einen probabilistischen Latentenraum 致力于通过概率低端空间解释深神经网络压缩 2403.00155v3

Authors (2): Mahsa Mozafari-Nia, Salimeh Yasaei Sekeh

Despite the impressive performance of deep neural networks (DNNs), their computational complexity and storage space consumption have led to the concept of network compression. While DNN compression techniques such as pruning and low-rank decomposition have been extensively studied, there has been insufficient attention paid to their theoretical explanation. In this paper, we propose a novel theoretical framework that leverages a probabilistic latent space of DNN weights and explains the optimal network sparsity by using the information-theoretic divergence measures. We introduce new analogous projected patterns (AP2) and analogous-in-probability projected patterns (AP3) notions for DNNs and prove that there exists a relationship between AP3/AP2 property of layers in the network and its performance. Further, we provide a theoretical analysis that explains the training process of the compressed network. The theoretical results are empirically validated through experiments conducted on standard pre-trained benchmarks, including AlexNet, ResNet50, and VGG16, using CIFAR10 and CIFAR100 datasets. Through our experiments, we highlight the relationship of AP3 and AP2 properties with fine-tuning pruned DNNs and sparsity levels.

尽管深层神经网络(DNNs)的表现令人印象深刻,但其计算复杂性和存储空间消耗导致网络压缩的概念。虽然对DNN压缩技术,如裁剪和低空分解等,进行了广泛的研究,但对其理论解释不够重视。在本文件中,我们提出一个新的理论框架,利用信息理论差异措施,利用DNN的概率潜值空间解释最佳网络宽度。我们为DNs引入了新的类似预测模式(AP2)和类似的可能性预测模式(AP3),并证明AP3/AP2的网络层属性及其性能之间存在关系。此外,我们提供理论分析,解释压缩网络的培训过程。理论结果通过在标准培训前基准(包括AlexNet、ResNet50和VGG16)上进行的实验,通过使用CIFAR10和CIFAR100数据集,在经验上得到验证。我们通过实验,强调了AP3和AP2的特性与精细调整的DNS和空间水平之间的关系。我们强调AP3和AP2的特性与精细调整的DNS和APs。


Article 216

Title@2025-07-02 (3): A Model-Consistent Data-Driven Computational Strategy for PDE Joint Inversion Problems

Title: A Model-Consistent Data-Driven Computational Strategy for PDE Joint Inversion Problems Eine modellkonsistente datengestützte Computerstrategie für gemeinsame PDE-Inversionsprobleme PDE 联合流入问题数据驱动计算示范战略 2210.09228v3

Authors (2): Kui Ren, Lu Zhang

The task of simultaneously reconstructing multiple physical coefficients in partial differential equations (PDEs) from observed data is ubiquitous in applications. In this work, we propose an integrated data-driven and model-based iterative reconstruction framework for such joint inversion problems where additional data on the unknown coefficients are supplemented for better reconstructions. Our method couples the supplementary data with the PDE model to make the data-driven modeling process consistent with the model-based reconstruction procedure. We characterize the impact of learning uncertainty on the joint inversion results for two typical inverse problems. Numerical evidence is provided to demonstrate the feasibility of using data-driven models to improve the joint inversion of multiple coefficients in PDEs.

从观察到的数据中同时重建部分差异方程中多重物理系数的任务在应用中普遍存在。在这项工作中,我们提议为这类联合倒置问题建立一个综合的数据驱动和基于模型的迭代重建框架,为更好的重建补充关于未知系数的额外数据。我们的方法是将补充数据与PDE模型结合起来,使数据驱动模型进程与基于模型的重建程序相一致。我们说明了学习不确定性对两个典型的反向问题的共同倒置结果的影响。我们提供了数字证据,以证明使用数据驱动模型改进多系数在PDE中联合倒置的可行性。


Article 217

Title@2025-07-02 (3): Privacy-Preserving in Connected and Autonomous Vehicles Through Vision to Text Transformation

Title: Privacy-Preserving in Connected and Autonomous Vehicles Through Vision to Text Transformation Schutz der Privatsphäre in vernetzten und autonomen Fahrzeugen durch Vision zur Texttransformation 通过愿景在连接和自主车辆中保护隐私到文字转换 2506.15854v2

Authors (3): Abdolazim Rezaei, Mehdi Sookhak, Ahmad Patooghy

Connected and Autonomous Vehicles (CAVs) rely on a range of devices that often process privacy-sensitive data. Among these, roadside units play a critical role particularly through the use of AI-equipped (AIE) cameras for applications such as violation detection. However, the privacy risks associated with captured imagery remain a major concern, as such data can be misused for identity theft, profiling, or unauthorized commercial purposes. While traditional techniques such as face blurring and obfuscation have been applied to mitigate privacy risks, individual privacy remains at risk, as individuals can still be tracked using other features such as their clothing. This paper introduces a novel privacy-preserving framework that leverages feedback-based reinforcement learning (RL) and vision-language models (VLMs) to protect sensitive visual information captured by AIE cameras. The main idea is to convert images into semantically equivalent textual descriptions, ensuring that scene-relevant information is retained while visual privacy is preserved. A hierarchical RL strategy is employed to iteratively refine the generated text, enhancing both semantic accuracy and privacy. Evaluation results demonstrate significant improvements in both privacy protection and textual quality, with the Unique Word Count increasing by approximately 77\% and Detail Density by around 50\% compared to existing approaches.

连接和自主车辆(CAVs)往往依赖一系列往往处理对隐私敏感数据的装置,其中路边单位发挥着关键作用,特别是利用AI设备(AIE)相机进行违规检测等应用,但与所摄图像有关的隐私风险仍然是一个重大关切问题,因为此类数据可能被滥用于身份盗窃、貌相或未经授权的商业目的。虽然应用了模糊和模糊等传统技术来降低隐私风险,但个人隐私仍然处于风险之中,因为个人仍然可以使用诸如其服装等其他特征进行跟踪。本文引入了一个新的隐私保护框架,利用基于反馈的强化学习(RL)和视觉语言模型(VLMS)来利用反馈的强化学习(RL)和视觉语言模型(VLMS)来保护AIE相机所捕捉的敏感视觉信息。主要想法是将图像转换成等同的文字描述,确保在维护视觉隐私的同时保留与现场有关的信息。使用等级RL战略来反复完善生成的文本,提高语义性和隐私。评估结果显示隐私保护和文字质量方面的显著改进,将隐私保护和文字质量与现有的50Retriquen和Dretalesn(通过近77xxxxxxxxxxnxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx77xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


Article 218

Title@2025-07-02 (3): Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs

Title: Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs Begründung eines Budgets: Eine Übersicht über adaptive und kontrollierbare Test-Time Compute in LLMs 预算理由:关于LLMM中适应和可控试验时间计算调查 2507.02076v1

Authors (17): Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates

Large language models (LLMs) have rapidly progressed into general-purpose agents capable of solving a broad spectrum of tasks. However, current models remain inefficient at reasoning: they apply fixed inference-time compute regardless of task complexity, often overthinking simple problems while underthinking hard ones. This survey presents a comprehensive review of efficient test-time compute (TTC) strategies, which aim to improve the computational efficiency of LLM reasoning. We introduce a two-tiered taxonomy that distinguishes between L1-controllability, methods that operate under fixed compute budgets, and L2-adaptiveness, methods that dynamically scale inference based on input difficulty or model confidence. We benchmark leading proprietary LLMs across diverse datasets, highlighting critical trade-offs between reasoning performance and token usage. Compared to prior surveys on efficient reasoning, our review emphasizes the practical control, adaptability, and scalability of TTC methods. Finally, we discuss emerging trends such as hybrid thinking models and identify key challenges for future work towards making LLMs more computationally efficient, robust, and responsive to user constraints.

大型语言模型(LLMS)已迅速发展成为能够解决一系列广泛任务的一般用途代理商,然而,目前的模型在推理上仍然效率低下:无论任务复杂程度如何,它们都采用固定的推算时间计算方法,往往在考虑困难程度时过度思考简单的问题。本调查全面审查了高效测试时间计算(TTC)战略,目的是提高LLM推理的计算效率。我们引入了两级分类法,区分L1控制能力、在固定计算预算下运作的方法以及基于投入困难或模型信任度的动态推论方法。我们把主要专有有限责任有限公司作为基准,跨越不同的数据集,突出推理性业绩和象征性使用之间的重大权衡。与以前关于有效推理的调查相比,我们的审查强调TTC方法的实际控制、适应性和可扩缩性。最后,我们讨论了诸如混合思维模型等新出现的趋势,并确定了今后使LMS提高计算效率、稳健和对用户制约作出反应的工作所面临的主要挑战。


Article 219

Title@2025-07-02 (3): Exploring the Design Space of Diffusion Bridge Models

Title: Exploring the Design Space of Diffusion Bridge Models Erforschung des Designraums von Modellen der Diffusionsbrücke 探索传播桥梁模型的设计空间 2410.21553v2

Authors (3): Shaorong Zhang, Yuanbin Cheng, Greg Ver Steeg

Diffusion bridge models and stochastic interpolants enable high-quality image-to-image (I2I) translation by creating paths between distributions in pixel space. However, the proliferation of techniques based on incompatible mathematical assumptions have impeded progress. In this work, we unify and expand the space of bridge models by extending Stochastic Interpolants (SIs) with preconditioning, endpoint conditioning, and an optimized sampling algorithm. These enhancements expand the design space of diffusion bridge models, leading to state-of-the-art performance in both image quality and sampling efficiency across diverse I2I tasks. Furthermore, we identify and address a previously overlooked issue of low sample diversity under fixed conditions. We introduce a quantitative analysis for output diversity and demonstrate how we can modify the base distribution for further improvements.

通过在像素空间的分布间建立路径,传播桥梁模型和随机图象(I2I)能够实现高质量的图像到图像翻译。然而,基于不相容数学假设的技术扩散阻碍了进展。在这项工作中,我们通过扩大具有先决条件、端点调节和优化取样算法的斯托切内托(SIs)来统一和扩大桥梁模型的空间。这些增强扩大了传播桥模型的设计空间,导致在图像质量和取样效率方面在不同I2I任务中取得最新业绩。此外,我们查明并解决以前忽视的一个在固定条件下样本多样性低的问题。我们对产出多样性进行定量分析,并展示我们如何能够为进一步改进而修改基本分布。


Article 220

Title@2025-07-02 (3): HCVR: A Hybrid Approach with Correlation-aware Voting Rules for Feature Selection

Title: HCVR: A Hybrid Approach with Correlation-aware Voting Rules for Feature Selection HCVR: Ein hybrider Ansatz mit korrelationsbewussten Abstimmungsregeln für die Feature Selection HCVR:关于选择地物的具有关联意识的表决规则的混合办法 2507.02073v1

Authors (2): Nikita Bhedasgaonkar, Rushikesh K. Joshi

In this paper, we propose HCVR (Hybrid approach with Correlation-aware Voting Rules), a lightweight rule-based feature selection method that combines Parameter-to-Parameter (P2P) and Parameter-to-Target (P2T) correlations to eliminate redundant features and retain relevant ones. This method is a hybrid of non-iterative and iterative filtering approaches for dimensionality reduction. It is a greedy method, which works by backward elimination, eliminating possibly multiple features at every step. The rules contribute to voting for features, and a decision to keep or discard is made by majority voting. The rules make use of correlation thresholds between every pair of features, and between features and the target. We provide the results from the application of HCVR to the SPAMBASE dataset. The results showed improvement performance as compared to traditional non-iterative (CFS, mRMR and MI) and iterative (RFE, SFS and Genetic Algorithm) techniques. The effectiveness was assessed based on the performance of different classifiers after applying filtering.

在本文中,我们提议采用 “ HCVR “ (与 “ 关联度投票规则 “ 相结合的 “ 湿法 “ ),即基于规则的轻量级特征选择方法,结合参数到参数(P2P)和参数到目标(P2T)的相互关系,以消除冗余特征并保留相关特征。这一方法混合了用于减少维度的不透析和迭代过滤方法。这是一种贪婪方法,通过向后消除,在每一个步骤中消除可能的多重特征。规则有助于对特征的投票,保留或丢弃的决定由多数人投票做出。规则使用每种特征之间以及特征与目标之间的相关阈值。我们提供了将HCVR应用于SPAMBASE数据集的结果。结果显示,与传统的非粘性(CFS、MRMR和MI)和迭代(RFE、SFS和遗传Algorithm)技术相比,效果有所改善。根据应用过滤后不同分类员的绩效进行评估。


Article 221

Title@2025-07-02 (3): Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data

Title: Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data Benign Overfiting ohne Linearität: Neurale Netzwerkklassifikatoren von Gradient Descent für Noisy Linear Data ausgebildet 无线性地穿透穿透: 由梯度下层训练的神经网络分类器, 用于 Noisy 线性数据 2202.05928v5

Authors (3): Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We assume the data comes from well-separated class-conditional log-concave distributions and allow for a constant fraction of the training labels to be corrupted by an adversary. We show that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error. In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.

位于偏僻处的超光层,即当有吵闹的数据时,内插模型非常普遍的现象,首先在经过梯度下降训练的神经网络模型中观察到。为了更好地了解这一经验性观察,我们考虑了在随机初始化后,通过梯度下降对后勤损失进行内插而训练的两层神经网络的普遍误差。我们假设数据来自分离的等级条件对线和内核的分布,并允许不断有一部分的培训标签被对手腐蚀。我们表明,在这种环境下,神经网络显示出无害的超值:它们可以被驱动到零训练错误,完全适合任何吵闹的培训标签,同时达到微量最大最佳测试错误。 与以往关于需要线性或内核预测器的良性超标的工作相比,我们的分析存在于一个模型和学习动态基本上非线性的环境中。


Article 222

Title@2025-07-02 (3): Byzantine-Robust Gossip: Insights from a Dual Approach

Title: Byzantine-Robust Gossip: Insights from a Dual Approach Byzantinisch-Robuster Klatsch: Einblicke aus einem doppelten Ansatz Byzantine-Robust Gossip:从双向办法中透视 2405.03449v2

Authors (3): Renaud Gaucher, Aymeric Dieuleveut, Hadrien Hendrikx

Distributed learning has many computational benefits but is vulnerable to attacks from a subset of devices transmitting incorrect information. This paper investigates Byzantine-resilient algorithms in a decentralized setting, where devices communicate directly in a peer-to-peer manner within a communication network. We leverage the so-called dual approach for decentralized optimization and propose a Byzantine-robust algorithm. We provide convergence guarantees in the average consensus subcase, discuss the potential of the dual approach beyond this subcase, and re-interpret existing algorithms using the dual framework. Lastly, we experimentally show the soundness of our method.

分布式学习有许多计算上的好处,但很容易受到传递不正确信息的一组装置的攻击。本文调查了分散环境中的拜占庭抗逆算法,这些装置在通信网络内以对等方式直接通信。我们利用所谓的双重方法进行分散优化,并提出了拜占庭-野蛮算法。我们在平均共识子案例中提供趋同保证,讨论这一子案例之外的双重方法的潜力,并利用双重框架重新解释现有的算法。最后,我们实验地展示了我们方法的健全性。


Article 223

Title@2025-07-02 (3): SPACE-SUIT: An Artificial Intelligence Based Chromospheric Feature Extractor and Classifier for SUIT

Title: SPACE-SUIT: An Artificial Intelligence Based Chromospheric Feature Extractor and Classifier for SUIT SPACE-SUIT: Ein auf Künstlicher Intelligenz basierender chromosphärischer Feature Extractor und Klassifikator für SUIT 天基:人造情报基础铬化层地物提取器和SUIT分类器 2412.08589v2

Authors (9): Pranava Seth, Vishal Upendran, Megha Anand, Janmejoy Sarkar, Soumya Roy, Priyadarshan Chaki, Pratyay Chowdhury, Borishan Ghosh, Durgesh Tripathi

The Solar Ultraviolet Imaging Telescope(SUIT) onboard Aditya-L1 is an imager that observes the solar photosphere and chromosphere through observations in the wavelength range of 200-400 nm. A comprehensive understanding of the plasma and thermodynamic properties of chromospheric and photospheric morphological structures requires a large sample statistical study, necessitating the development of automatic feature detection methods. To this end, we develop the feature detection algorithm SPACE-SUIT: Solar Phenomena Analysis and Classification using Enhanced vision techniques for SUIT, to detect and classify the solar chromospheric features to be observed from SUIT’s Mg II k filter. Specifically, we target plage regions, sunspots, filaments, and off-limb structures. SPACE uses YOLO, a neural network-based model to identify regions of interest. We train and validate SPACE using mock-SUIT images developed from Interface Region Imaging Spectrometer(IRIS) full-disk mosaic images in Mg II k line, while we also perform detection on Level-1 SUIT data. SPACE achieves an approximate precision of 0.788, recall 0.863 and MAP of 0.874 on the validation mock SUIT FITS dataset. Given the manual labeling of our dataset, we perform “self-validation” by applying statistical measures and Tamura features on the ground truth and predicted bounding boxes. We find the distributions of entropy, contrast, dissimilarity, and energy to show differences in the features. These differences are qualitatively captured by the detected regions predicted by SPACE and validated with the observed SUIT images, even in the absence of labeled ground truth. This work not only develops a chromospheric feature extractor but also demonstrates the effectiveness of statistical metrics and Tamura features for distinguishing chromospheric features, offering independent validation for future detection schemes.

Aditya-L1 上的太阳紫外线成像望远镜(SUIT)是一个图像仪,通过波长范围200-400纳米的观测观察太阳光谱和色圈。 全面了解铬和光球形态结构的等离子体和热动力特性需要大量的抽样统计研究,需要开发自动特征检测方法。 为此,我们开发了地貌检测算法空间-SUIT:太阳光谱分析和分类,使用SUIT增强的视觉技术,探测和分类太阳色谱特征,以便从SUIT Mg II k过滤器的对比中观测到太阳色谱和色圈。 具体地说,我们针对冰川区域、太阳点、紫色线和光谱形态结构结构的全面了解。 利用以神经网络为基础的模型模型模型,我们仅使用模型SUIVIIT的图像来为界面区域定位SUIRI(IRIS) 进行完全分解的变色色色的图像,但是在MG II k 线上,我们还在SIRIMI 的精确度数据中进行检测, 通过S ASIMAL IMAL 数据 IML 的精确数据 显示这些数据,我们通过SAL IMAL IMAL IMAL 的精确数据 的测量数据,我们通过SAL IMAL IMAL 的测量数据 的测量数据显示了这些数据 的测量数据 的测量数据 的测量数据 的模型 数据 数据,这些数据 的测量数据显示这些数据。


Article 224

Title@2025-07-02 (3): How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

Title: How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Wie gut versteht GPT-4o Vision? Bewertung multimodaler Basismodelle auf Standard Computer Vision Aufgaben GPT-4o GPT-4o如何理解愿景?评估标准计算机愿景任务多模式基金会模式 2507.01955v1

Authors (6): Rahul Ramachandran, Ali Garjani, Roman Bachmann, Andrei Atanov, Oğuzhan Fatih Kar, Amir Zamir

Multimodal foundation models, such as GPT-4o, have recently made remarkable progress, but it is not clear where exactly these models stand in terms of understanding vision. In this paper, we benchmark the performance of popular multimodal foundation models (GPT-4o, o4-mini, Gemini 1.5 Pro and Gemini 2.0 Flash, Claude 3.5 Sonnet, Qwen2-VL, Llama 3.2) on standard computer vision tasks (semantic segmentation, object detection, image classification, depth and surface normal prediction) using established datasets (e.g., COCO, ImageNet and its variants, etc). The main challenges to performing this are: 1) most models are trained to output text and cannot natively express versatile domains, such as segments or 3D geometry, and 2) many leading models are proprietary and accessible only at an API level, i.e., there is no weight access to adapt them. We address these challenges by translating standard vision tasks into equivalent text-promptable and API-compatible tasks via prompt chaining to create a standardized benchmarking framework. We observe that 1) the models are not close to the state-of-the-art specialist models at any task. However, 2) they are respectable generalists; this is remarkable as they are presumably trained on primarily image-text-based tasks. 3) They perform semantic tasks notably better than geometric ones. 4) While the prompt-chaining techniques affect performance, better models exhibit less sensitivity to prompt variations. 5) GPT-4o performs the best among non-reasoning models, securing the top position in 4 out of 6 tasks, 6) reasoning models, e.g. o3, show improvements in geometric tasks, and 7) a preliminary analysis of models with native image generation, like the latest GPT-4o, shows they exhibit quirks like hallucinations and spatial misalignments.

GPT-4o等多式基础模型最近取得了显著进展,但尚不清楚这些模型在理解愿景方面的确切位置。在本文中,我们用固定的数据集(例如,COCO、图像网及其变异器等)衡量了流行型多式联运基础模型(GPT-4o、o4-mini、Gemini1.5 Pro和Gemini 2.0 Flash、Claude 3.5 Sonnet、Qwen2-VL、Llama 3.2)在标准计算机愿景任务(静态空间分解、物体探测、图像分类、深度和表面正常预测)方面的性能。 使用固定的数据集(例如,COCOCO、图像网及其变异器等 ) 。 实现这一点的主要挑战是:(1) 多数模型都经过了输出文本文本培训,无法本地显示多功能域,例如区段或3D几度和2) 。 许多主要模型都是专有的, 只能在 APIPI 级别上进行调整。 我们通过将标准视觉任务转换成对应的文本-可探测和AP- 比较的模型, 通过快速的模型来建立标准化基准框架。我们观察的变换的模型, 3, 发现这些模型是不甚甚甚甚甚不甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚于专业的G- 任务。


Article 225

Title@2025-07-02 (3): Test-Time Scaling with Reflective Generative Model

Title: Test-Time Scaling with Reflective Generative Model Test-Zeit-Skalierung mit reflektierendem Generativem Modell 具有反反思考生成模型的试验时间缩放 2507.01951v1

Authors (11): Zixiao Wang, Yuxin Wang, Xiaorui Wang, Mengting Xing, Jie Gao, Jianjun Xu, Guangcan Liu, Chenhui Jin, Zhuo Wang, Shengzhuo Zhang, Hongtao Xie

We introduce our first reflective generative model MetaStone-S1, which obtains OpenAI o3’s performance via the self-supervised process reward model (SPRM). Through sharing the backbone network and using task-specific heads for next token prediction and process scoring respectively, SPRM successfully integrates the policy model and process reward model(PRM) into a unified interface without extra process annotation, reducing over 99% PRM parameters for efficient reasoning. Equipped with SPRM, MetaStone-S1 is naturally suitable for test time scaling (TTS), and we provide three reasoning effort modes (low, medium, and high), based on the controllable thinking length. Moreover, we empirically establish a scaling law that reveals the relationship between total thinking computation and TTS performance. Experiments demonstrate that our MetaStone-S1 achieves comparable performance to OpenAI-o3-mini’s series with only 32B parameter size. To support the research community, we have open-sourced MetaStone-S1 at https://github.com/MetaStone-AI/MetaStone-S1.

我们引入了第一个反光基因模型MetaStone-S1, 通过自我监督的进程奖励模式(SPRM)获得了OpenAI O3的性能。通过共享骨干网络,并使用特定任务头分别进行下一轮象征性的预测和进程评分,SPRM成功地将政策模型和进程奖励模式(PRM)整合到一个统一的界面中,而没有额外的过程批注,为有效推理降低了99%的PRM参数。与SPRM设备安装的MetaStone-S1自然适合测试时间缩放(TTS),我们提供了三种基于可控制思维长度的推理努力模式(低、中高)。此外,我们通过经验性地制定了一个显示总体思维计算与 TTS 性业绩之间关系的比例法。实验表明,我们的MetStone-S1 取得了仅32B参数大小的Oi- Mini系列的类似性能。为了支持研究界,我们在 http://github.com/MetaStone-AI/MetaStonone-S1 。


Article 226

Title@2025-07-02 (3): Characterizing control between interacting subsystems with deep Jacobian estimation

Title: Characterizing control between interacting subsystems with deep Jacobian estimation Charakterisierung der Kontrolle zwischen interagierenden Teilsystemen mit tiefer Jacobian-Schätzung 确定具有深度 Jacobian 估计的交互子系统之间的控制特性 2507.01946v1

Authors (6): Adam J. Eisen, Mitchell Ostrow, Sarthak Chandra, Leo Kozachkov, Earl K. Miller, Ila R. Fiete

Biological function arises through the dynamical interactions of multiple subsystems, including those between brain areas, within gene regulatory networks, and more. A common approach to understanding these systems is to model the dynamics of each subsystem and characterize communication between them. An alternative approach is through the lens of control theory: how the subsystems control one another. This approach involves inferring the directionality, strength, and contextual modulation of control between subsystems. However, methods for understanding subsystem control are typically linear and cannot adequately describe the rich contextual effects enabled by nonlinear complex systems. To bridge this gap, we devise a data-driven nonlinear control-theoretic framework to characterize subsystem interactions via the Jacobian of the dynamics. We address the challenge of learning Jacobians from time-series data by proposing the JacobianODE, a deep learning method that leverages properties of the Jacobian to directly estimate it for arbitrary dynamical systems from data alone. We show that JacobianODEs outperform existing Jacobian estimation methods on challenging systems, including high-dimensional chaos. Applying our approach to a multi-area recurrent neural network (RNN) trained on a working memory selection task, we show that the “sensory” area gains greater control over the “cognitive” area over learning. Furthermore, we leverage the JacobianODE to directly control the trained RNN, enabling precise manipulation of its behavior. Our work lays the foundation for a theoretically grounded and data-driven understanding of interactions among biological subsystems.

生物功能来自多个子系统的动态互动, 包括大脑区域之间的动态互动, 基因监管网络内部, 以及更多。 理解这些系统的共同方法是模拟每个子系统的动态, 并描述它们之间的交流。 另一种方法是通过控制理论的透镜: 子子系统如何相互控制。 这种方法涉及推断子系统之间控制的方向性、 强度和背景调控。 但是, 理解子系统控制的方法一般是线性, 无法充分描述非线性复杂系统促成的丰富背景效应。 为了缩小这一差距, 我们设计了一个数据驱动的非线性控制理论框架, 通过动态的雅各布人来描述子系统的动态互动。 我们应对从时间序列数据中学习雅各布人的挑战, 这种深层次的学习方法利用雅各布人的特性来直接估计子系统之间任意的动态系统。 我们显示, 雅各布纳德模式在挑战性系统(包括高维度的混杂状态)下, 将我们的方法应用到多区域经常性的神经网络( RNNE) 来描述子系统互动特征。 我们展示了“我们所训练的正确理解的系统基础, 学习了“ 我们的轨道基础” 学习了“ 我们的生理基础的系统 的系统, 学习了我们“ 我们的生理基础的“ 我们的轨道” 的系统, 我们的“ 学习的“ 我们的“ 基础” 学习了“ 我们的“ 我们的” 基础的“ 的” 的“ 我们的“ 我们的“ 的” 的” 的” 的“ 我们的“ 的“ 的” 的” 的” 的” 的“ 的” 的” 的” 的“ 的” 的” 的” 的“ 的” 的” 的“我们的“ 的” 的” 的“ 的” 的” 的“基础基础的“ 的“ 的“ 的“ 的“ 的“ 的” 的” 的” 的” 的” 的” 的“基础的“ 的“ 的” 的” 的” 的” 的“基础的“ 的“ 的“ 的“ 的“ 的“ 的“ 的“ 的“ 的“ 的“ 的“ 的“ 的“


Article 227

Title@2025-07-02 (3): SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars

Title: SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars SpecCLIP: Richten und Übersetzen spektroskopischer Messungen für Sterne spectCLIP: 恒星光谱测量的对齐和转换 2507.01939v1

Authors (9): Xiaosheng Zhao, Yang Huang, Guirong Xue, Xiao Kong, Jifeng Liu, Xiaoyu Tang, Timothy C. Beers, Yuan-Sen Ting, A-Li Luo

In recent years, large language models (LLMs) have transformed natural language understanding through vast datasets and large-scale parameterization. Inspired by this success, we present SpecCLIP, a foundation model framework that extends LLM-inspired methodologies to stellar spectral analysis. Stellar spectra, akin to structured language, encode rich physical and chemical information about stars. By training foundation models on large-scale spectral datasets, our goal is to learn robust and informative embeddings that support diverse downstream applications. As a proof of concept, SpecCLIP involves pre-training on two spectral types–LAMOST low-resolution and Gaia XP–followed by contrastive alignment using the CLIP (Contrastive Language-Image Pre-training) framework, adapted to associate spectra from different instruments. This alignment is complemented by auxiliary decoders that preserve spectrum-specific information and enable translation (prediction) between spectral types, with the former achieved by maximizing mutual information between embeddings and input spectra. The result is a cross-spectrum framework enabling intrinsic calibration and flexible applications across instruments. We demonstrate that fine-tuning these models on moderate-sized labeled datasets improves adaptability to tasks such as stellar-parameter estimation and chemical-abundance determination. SpecCLIP also enhances the accuracy and precision of parameter estimates benchmarked against external survey data. Additionally, its similarity search and cross-spectrum prediction capabilities offer potential for anomaly detection. Our results suggest that contrastively trained foundation models enriched with spectrum-aware decoders can advance precision stellar spectroscopy.

近年来,大型语言模型(LLMS)通过庞大的数据集和大规模参数化改变了自然语言理解。受这一成功启发,我们展示了SpecCLIP,这是一个基础模型框架,将受LLM启发的方法扩展至星光分析。Starr光谱与结构化语言相似,对关于恒星的丰富物理和化学信息进行编码。通过对大型光谱数据集的基础模型进行培训,我们的目标是学习支持不同下游应用的强大和知情的嵌入。作为概念的证明,SpecCLIP涉及两个光谱类型LAMOST低分辨率和Gaia XP的预先培训,其基础框架是利用CLIP(Contrastem Stateal-Imaage Stregy)框架将LLLLLMSFSF(CR)推导出对比性调整方法,使其适应于不同仪器的相联光谱化光谱化数据。这种校准由辅助的解算器进行补充,使光谱型类型之间能够通过存储和输入光谱光谱谱谱谱化的相互信息。结果显示我们精度精确的精确的精确的精确的精确精确精确度估算基础基础基础基础基础基础,并显示其内部校正和弹性的精确度评估工具,使这些精确度的校正的校正的校正。我们的校正的校正的校正和弹性的校正的校正。


Article 228

Title@2025-07-02 (3): A first-order method for nonconvex-nonconcave minimax problems under a local Kurdyka-Łojasiewicz condition

Title: A first-order method for nonconvex-nonconcave minimax problems under a local Kurdyka-Łojasiewicz condition Eine Methode erster Ordnung für nicht konvex-nichtkonkave Minimax-Probleme unter einem lokalen Kurdyka-Łojasiewicz Zustand 当地Kurdyka-ojasiewicz状态下非convex-nonconcoln非conculev 迷你 问题的第一阶方法 2507.01932v1

Authors (2): Zhaosong Lu, Xiangyuan Wang

We study a class of nonconvex-nonconcave minimax problems in which the inner maximization problem satisfies a local Kurdyka-{\L}ojasiewicz (KL) condition that may vary with the outer minimization variable. In contrast to the global KL or Polyak-{\L}ojasiewicz (PL) conditions commonly assumed in the literature – which are significantly stronger and often too restrictive in practice – this local KL condition accommodates a broader range of practical scenarios. However, it also introduces new analytical challenges. In particular, as an optimization algorithm progresses toward a stationary point of the problem, the region over which the KL condition holds may shrink, resulting in a more intricate and potentially ill-conditioned landscape. To address this challenge, we show that the associated maximal function is locally H"older smooth. Leveraging this key property, we develop an inexact proximal gradient method for solving the minimax problem, where the inexact gradient of the maximal function is computed by applying a proximal gradient method to a KL-structured subproblem. Under mild assumptions, we establish complexity guarantees for computing an approximate stationary point of the minimax problem.

我们研究的是一类非convex-nonconcoln-nonconcolent 迷你轴问题,其中内部最大化问题满足了本地 Kurdyka- {L}jasiewicz (KL) 的条件,这种条件可能随外部最小化变量而变化。与文献中通常假设的全球 KL 或 Polliak- {L} ojasiewicz (PL) 条件形成对比 – – 这些条件在实践上大大加强,而且往往过于严格 – – 这个本地的 KL 条件适应了范围更广的实用情景。然而,它也带来了新的分析挑战。特别是,由于优化算法在问题固定点上的进展, KL 条件所在的区域可能会缩小,导致更加复杂和可能条件较差的景观。为了应对这一挑战,我们证明相关的最大最大功能是本地 H"older “ 平稳。我们利用这一关键属性开发了一种非异性准梯度梯度的梯度方法来解决微质量问题, 在那里,最大函数的异性梯度梯度梯度会通过将准梯度方法计算,通过对KL- 结构子子子子的亚轴计算,我们建立了一个最复杂度计算。


Article 229

Title@2025-07-02 (3): Exploring a Hybrid Deep Learning Approach for Anomaly Detection in Mental Healthcare Provider Billing: Addressing Label Scarcity through Semi-Supervised Anomaly Detection

Title: Exploring a Hybrid Deep Learning Approach for Anomaly Detection in Mental Healthcare Provider Billing: Addressing Label Scarcity through Semi-Supervised Anomaly Detection Erforschung eines hybriden Deep-Learning-Ansatzes für Anomalienerkennung in der psychischen Gesundheitsversorgung Anbieter Rechnungsstellung: Adressierung von Label Scarcity durch semi-überwachte Anomalienerkennung 探索在精神保健提供者的账单中以异性检测为异性探索的混合深学习方法:通过半超异性检测解决标签的稀缺问题 2507.01924v1

Authors (3): Samirah Bakker, Yao Ma, Seyed Sahand Mohammadi Ziabari

The complexity of mental healthcare billing enables anomalies, including fraud. While machine learning methods have been applied to anomaly detection, they often struggle with class imbalance, label scarcity, and complex sequential patterns. This study explores a hybrid deep learning approach combining Long Short-Term Memory (LSTM) networks and Transformers, with pseudo-labeling via Isolation Forests (iForest) and Autoencoders (AE). Prior work has not evaluated such hybrid models trained on pseudo-labeled data in the context of healthcare billing. The approach is evaluated on two real-world billing datasets related to mental healthcare. The iForest LSTM baseline achieves the highest recall (0.963) on declaration-level data. On the operation-level data, the hybrid iForest-based model achieves the highest recall (0.744), though at the cost of lower precision. These findings highlight the potential of combining pseudo-labeling with hybrid deep learning in complex, imbalanced anomaly detection settings.

心理保健账单的复杂性导致异常现象,包括欺诈。虽然机器学习方法被用于异常现象的检测,但它们往往与阶级不平衡、标签稀缺和复杂的相继模式作斗争。本研究探索一种混合的深层次学习方法,结合长期短期内存(LSTM)网络和变异器,通过隔离森林(iForest)和Autoencoders(AE)进行假标签。先前的工作没有评估在保健账单方面接受假标签数据培训的这种混合模型。这种方法在与心理健康有关的两个真实世界的账单数据集上进行了评估。 iForest LSTM基线在宣言一级数据上取得了最高回顾率(0.963)。在操作一级数据上,基于混合的iForest模型取得了最高回顾率(0.744),尽管成本较低。这些研究结果强调了在复杂、不平衡的异常检测环境中将伪标签与混合深层次学习相结合的可能性。


Article 230

Title@2025-07-02 (3): Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models

Title: Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models Gradient-Adaptive Policy Optimization: Auf dem Weg zu einer multi-objektiven Ausrichtung großer Sprachmodelle 渐进式政策优化:实现大语言模式多目标一致 2507.01915v1

Authors (6): Chengao Li, Hanyu Zhang, Yunkun Xu, Hongyan Xue, Xiang Ao, Qing He

Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful technique for aligning large language models (LLMs) with human preferences. However, effectively aligning LLMs with diverse human preferences remains a significant challenge, particularly when they are conflict. To address this issue, we frame human value alignment as a multi-objective optimization problem, aiming to maximize a set of potentially conflicting objectives. We introduce Gradient-Adaptive Policy Optimization (GAPO), a novel fine-tuning paradigm that employs multiple-gradient descent to align LLMs with diverse preference distributions. GAPO adaptively rescales the gradients for each objective to determine an update direction that optimally balances the trade-offs between objectives. Additionally, we introduce P-GAPO, which incorporates user preferences across different objectives and achieves Pareto solutions that better align with the user’s specific needs. Our theoretical analysis demonstrates that GAPO converges towards a Pareto optimal solution for multiple objectives. Empirical results on Mistral-7B show that GAPO outperforms current state-of-the-art methods, achieving superior performance in both helpfulness and harmlessness.

从人类反馈中强化学习(RLHF)已成为使大型语言模型(LLMs)与人类偏好相匹配的有力技术,然而,将LLMs与人类不同偏好有效结合仍然是一个重大挑战。为了解决这一问题,我们把人类价值结合作为一个多目标优化问题,目的是最大限度地实现一系列可能相互冲突的目标。我们引入了一种新型微调模式,即 “ 渐进-偏向政策优化 “ (GAPO),它使用多级血统使LLMs与不同偏爱分布相匹配。GAPO对每个目标的梯度进行了适应性调整,以确定最佳平衡目标之间取舍的更新方向。此外,我们引入了P-GAPO,它将用户的偏好纳入不同目标,并实现更符合用户具体需要的Pareto解决方案。我们的理论分析表明,GAPO为多个目标的Pareto最佳解决方案趋于一致。Mistral-7B的实证结果显示,GAPO超越了当前的最新方法,在有用性和无害性两方面都取得了优异性业绩。


Article 231

Title@2025-07-02 (3): Advancing Magnetic Materials Discovery – A structure-based machine learning approach for magnetic ordering and magnetic moment prediction

Title: Advancing Magnetic Materials Discovery – A structure-based machine learning approach for magnetic ordering and magnetic moment prediction Advancing Magnetic Materials Discovery – Ein strukturbasierter maschineller Lernansatz für magnetische Ordnung und magnetische Momentvorhersage 推进磁材料发现 – – 一种基于结构的机器学习方法,用于磁定序和磁瞬时预测 2507.01913v1

Authors (3): Apoorv Verma, Junaid Jami, Amrita Bhattacharya

Accurately predicting magnetic behavior across diverse materials systems remains a longstanding challenge due to the complex interplay of structural and electronic factors and is pivotal for the accelerated discovery and design of next-generation magnetic materials. In this work, a refined descriptor is proposed that significantly improves the prediction of two critical magnetic properties – magnetic ordering (Ferromagnetic vs. Ferrimagnetic) and magnetic moment per atom – using only the structural information of materials. Unlike previous models limited to Mn-based or lanthanide-transition metal compounds, the present approach generalizes across a diverse dataset of 5741 stable, binary and ternary, ferromagnetic and ferrimagnetic compounds sourced from the Materials Project. Leveraging an enriched elemental vector representation and advanced feature engineering, including nonlinear terms and reduced matrix sparsity, the LightGBM-based model achieves an accuracy of 82.4% for magnetic ordering classification and balanced recall across FM and FiM classes, addressing a key limitation in prior studies. The model predicts magnetic moment per atom with a correlation coefficient of 0.93, surpassing the Hund’s matrix and orbital field matrix descriptors. Additionally, it accurately estimates formation energy per atom, enabling assessment of both magnetic behavior and material stability. This generalized and computationally efficient framework offers a robust tool for high-throughput screening of magnetic materials with tailored properties.

由于结构和电子因素的复杂相互作用,准确预测不同材料系统的磁性行为仍然是长期挑战,因为结构和电子因素的复杂相互作用,对于加速发现和设计下一代磁材料至关重要。在这项工作中,提议改进描述,大大改进对两种关键磁性特性的预测 – – 磁定序(Ferromaganitic 诉Ferrimaganitic)和原子的磁瞬时 – – 仅使用材料的结构信息。与以前局限于Mn基或lanthanide过渡金属化合物的模型不同,目前的方法在5741稳定、二进制和制成、铁磁磁和硫磁变化合物的多种数据集中广泛归纳,这些数据集来自材料项目,加速发现和设计。利用浓缩的元素矢量矢量表示和高级地工程,包括非线性条件和减少的矩阵宽度,基于 LightGBMMM的模型在磁定序分类和平衡回顾方面达到82.4%的准确度,解决了先前研究中的一个关键限制。模型预测每个原子磁时点,其相关系数为0.93、双向、铁磁磁磁性磁性磁定式的比值分析模型和地面结构的精确地分析模型,从而得出了稳定的磁定式材料的精确的模型和实地分析结果。


Article 232

Title@2025-07-02 (3): High-Layer Attention Pruning with Rescaling

Title: High-Layer Attention Pruning with Rescaling Hochebene Aufmerksamkeit Pruning mit Rescaling 高关注度 以降降降为缓冲 2507.01900v1

Authors (2): Songtao Liu, Peng Liu

Pruning is a highly effective approach for compressing large language models (LLMs), significantly reducing inference latency. However, conventional training-free structured pruning methods often employ a heuristic metric that indiscriminately removes some attention heads across all pruning layers, without considering their positions within the network architecture. In this work, we propose a novel pruning algorithm that strategically prunes attention heads in the model’s higher layers. Since the removal of attention heads can alter the magnitude of token representations, we introduce an adaptive rescaling parameter that calibrates the representation scale post-pruning to counteract this effect. We conduct comprehensive experiments on a wide range of LLMs, including LLaMA3.1-8B, Mistral-7B-v0.3, Qwen2-7B, and Gemma2-9B. Our evaluation includes both generation and discriminative tasks across 27 datasets. The results consistently demonstrate that our method outperforms existing structured pruning methods. This improvement is particularly notable in generation tasks, where our approach significantly outperforms existing baselines.

在压缩大型语言模型(LLMS)时,普鲁宁是一种非常有效的方法,可以大幅降低推导延迟度。然而,常规的无培训结构裁剪方法通常使用一种超光度测量法,不加考虑地将一些注意力排出所有运行层,而没有考虑其在网络结构中的位置。在这项工作中,我们建议一种新型的裁剪算法,在战略上将注意力排入模型较高层中。由于转移注意力头可以改变象征性表达的大小,我们引入了适应性调整参数,校准代表比例的调整后调整,以抵消这一效应。我们在广泛的LLLMS中进行了全面实验,包括LLAMA3.1-8B、Mistral-7B-v0.3、Qwen2-7B和Gemma2-9B。我们的评价包括27个数据集的生成和歧视性任务。结果始终表明,我们的方法超过了现有的结构化的运行方法。这种改进在生成任务中特别显著地突出,我们的方法大大超过现有的基准。


Article 233

Title@2025-07-02 (3): Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?

Title: Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data? Rekursive Trainingsschleifen in LLMs: Wie modulieren Trainingsdateneigenschaften die Verteilungsverschiebung in generierten Daten? LLMM中的递归培训循环:培训数据特性如何调整生成数据的分布变化? 2504.03814v3

Authors (5): Grgur Kovač, Jérémy Perez, Rémy Portelas, Peter Ford Dominey, Pierre-Yves Oudeyer

Large language models (LLMs) are increasingly used in the creation of online content, creating feedback loops as subsequent generations of models will be trained on this synthetic data. Such loops were shown to lead to distribution shifts - models misrepresenting the true underlying distributions of human data (also called model collapse). However, how human data properties affect such shifts remains poorly understood. In this paper, we provide the first empirical examination of the effect of such properties on the outcome of recursive training. We first confirm that using different human datasets leads to distribution shifts of different magnitudes. Through exhaustive manipulation of dataset properties combined with regression analyses, we then identify a set of properties predicting distribution shift magnitudes. Lexical diversity is found to amplify these shifts, while semantic diversity and data quality mitigate them. Furthermore, we find that these influences are highly modular: data scrapped from a given internet domain has little influence on the content generated for another domain. Finally, experiments on political bias reveal that human data properties affect whether the initial bias will be amplified or reduced. Overall, our results portray a novel view, where different parts of internet may undergo different types of distribution shift.

大型语言模型(LLMS)越来越多地用于创建在线内容,创建反馈循环,因为随后几代模型将接受有关这一合成数据的培训。这种循环显示会导致分布变化 — — 扭曲人类数据真实基本分布的模型(也称为模型崩溃 ) 。然而,人类数据属性如何影响这种变化仍然不甚为人理解。在本文中,我们首次对此类属性对循环培训结果的影响进行了经验性审查。我们首先确认,使用不同的人类数据集会导致不同规模的分布变化。我们通过对数据集属性的详尽操作,加上回归分析,我们随后确定了一套预测分布变化大小的属性。发现,超文本多样性可以扩大这些变化,同时,语义多样性和数据质量可以减轻这些变化。此外,我们发现这些影响是高度模块化的:从特定互联网域中分离的数据对另一个域生成的内容没有多大影响。最后,关于政治偏见的实验表明,人类数据属性会影响最初的偏差是否会扩大或缩小。总体而言,我们的结果描绘了一种新颖的视角,因为互联网的不同部分可能会发生不同类型的分布变化。


Article 234

Title@2025-07-02 (3): STEM Diffraction Pattern Analysis with Deep Learning Networks

Title: STEM Diffraction Pattern Analysis with Deep Learning Networks STEM Diffraktionsmusteranalyse mit Deep Learning Networks STEM 利用深学习网络分析违规模式 2507.01889v1

Authors (7): Sebastian Wissel, Jonas Scheunert, Aaron Dextre, Shamail Ahmed, Andreas Bayer, Kerstin Volz, Bai-Xiang Xu

Accurate grain orientation mapping is essential for understanding and optimizing the performance of polycrystalline materials, particularly in energy-related applications. Lithium nickel oxide (LiNiO$_{2}$) is a promising cathode material for next-generation lithium-ion batteries, and its electrochemical behaviour is closely linked to microstructural features such as grain size and crystallographic orientations. Traditional orientation mapping methods–such as manual indexing, template matching (TM), or Hough transform-based techniques–are often slow and noise-sensitive when handling complex or overlapping patterns, creating a bottleneck in large-scale microstructural analysis. This work presents a machine learning-based approach for predicting Euler angles directly from scanning transmission electron microscopy (STEM) diffraction patterns (DPs). This enables the automated generation of high-resolution crystal orientation maps, facilitating the analysis of internal microstructures at the nanoscale. Three deep learning architectures–convolutional neural networks (CNNs), Dense Convolutional Networks (DenseNets), and Shifted Windows (Swin) Transformers–are evaluated, using an experimentally acquired dataset labelled via a commercial TM algorithm. While the CNN model serves as a baseline, both DenseNets and Swin Transformers demonstrate superior performance, with the Swin Transformer achieving the highest evaluation scores and the most consistent microstructural predictions. The resulting crystal maps exhibit clear grain boundary delineation and coherent intra-grain orientation distributions, underscoring the potential of attention-based architectures for analyzing diffraction-based image data. These findings highlight the promise of combining advanced machine learning models with STEM data for robust, high-throughput microstructural characterization.

精确的谷物方向绘图对于理解和优化多晶素材料的性能至关重要,特别是在与能源有关的应用中。氧化镍碱(LiNiO$2}$)是下一代锂离子电池有希望的阴极材料,其电化学行为与谷物大小和晶体方向等微结构特征密切相关。传统方向绘图方法,如手工索引、模板匹配(TM)或人工变换技术结构,在处理复杂或重叠模式时往往缓慢和噪音敏感,造成大规模微结构分析中的瓶颈。这项工作展示了一种基于机器的学习方法,直接从扫描传输电子显微镜(STEM)变形模型(DPs)中预测Euler角度。这可以自动生成高分辨率晶体方向地图,便利纳米级内部结构的分析。三种深学习结构-变异结构网络(CNNs)、高级晶体网络(DESencialNets),以及最高级的直径流数据流流流分析(SIML),以及最清晰的直径直径的直径分析(Slimal-dealal-dealalalalal-dealalalalalal lader lader),用于Syal-de Stal-de Stal-deal-de Stal-deal-deal-deal-deal-deal-deal-deal-deal-deal-deal-deal-deal-deal-dealdal-deal-dealisalmasalmamamamamasal-deal-deal 和Sildal-dealdaldaldalmasal-dealdaldaldaldal-deal-deal-deal-deal-deal-deal-deal-deal-deal-deal-deal-deal-deal-deal-deal-deal-dealalalalalalalalal-Salalalalalalalalalalalalalalalalalalalalalalalalalalalalalalalalalalalalalalalalalalaldaldaldaldaldaldaldaldaldaldalal,这些数据,这些数据,通过Salalalalal


Article 235

Title@2025-07-02 (3): A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs

Title: A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs Ein rechnerisch frugales Open-Source-Stiftungsmodell für Thorax-Erkennung in Lungenkrebs-Screening-Programmen 肺癌筛查方案中胸腔酸疾病检测的计算节节制开源基础模型 2507.01881v1

Authors (16): Niccolò McConnell, Pardeep Vasudev, Daisuke Yamada, Daryl Cheng, Mehran Azimbagirad, John McCabe, Shahab Aslani, Ahmed H. Shahin, Yukun Zhou, The SUMMIT Consortium, Andre Altmann, Yipeng Hu, Paul Taylor, Sam M. Janes, Daniel C. Alexander, Joseph Jacob

Low-dose computed tomography (LDCT) imaging employed in lung cancer screening (LCS) programs is increasing in uptake worldwide. LCS programs herald a generational opportunity to simultaneously detect cancer and non-cancer-related early-stage lung disease. Yet these efforts are hampered by a shortage of radiologists to interpret scans at scale. Here, we present TANGERINE, a computationally frugal, open-source vision foundation model for volumetric LDCT analysis. Designed for broad accessibility and rapid adaptation, TANGERINE can be fine-tuned off the shelf for a wide range of disease-specific tasks with limited computational resources and training data. Relative to models trained from scratch, TANGERINE demonstrates fast convergence during fine-tuning, thereby requiring significantly fewer GPU hours, and displays strong label efficiency, achieving comparable or superior performance with a fraction of fine-tuning data. Pretrained using self-supervised learning on over 98,000 thoracic LDCTs, including the UK’s largest LCS initiative to date and 27 public datasets, TANGERINE achieves state-of-the-art performance across 14 disease classification tasks, including lung cancer and multiple respiratory diseases, while generalising robustly across diverse clinical centres. By extending a masked autoencoder framework to 3D imaging, TANGERINE offers a scalable solution for LDCT analysis, departing from recent closed, resource-intensive models by combining architectural simplicity, public availability, and modest computational requirements. Its accessible, open-source lightweight design lays the foundation for rapid integration into next-generation medical imaging tools that could transform LCS initiatives, allowing them to pivot from a singular focus on lung cancer detection to comprehensive respiratory disease management in high-risk populations.

肺癌筛查(LCS)方案采用的低剂量计算断层成像(LDCT)正在全世界推广。LCS方案预示了同时检测癌症和非癌症早期肺病的代际机会。但这些努力因缺乏对大规模扫描进行解释的放射学家而受阻。这里我们介绍TANGERINE,一个计算节奏、开源的用于体积LDCT分析的计算性硬性、开源视觉基础模型。为广泛获取和快速适应而设计的TANGERINE可以将一系列特定疾病任务从架子上微调出来,其中含有有限的计算资源和培训数据。TANGERINE相对于从抓伤中培训的模型而言,在微调期间表现出快速趋近的趋近趋同,从而要求GPUP小时大幅减少,并显示出强大的标签效率,以微调数据的一小部分实现可比或优性性能。使用自我超强的LGAGERNEEE, 一个自我强化的学习的LCS倡议,包括迄今为止最大的LCS倡议和27个公共数据集, TETERNEEO-直径直径直径直径可获取性快速的快速分类, 一个快速的快速分类分析, 跨14个组织的临床诊断的快速诊断,一个快速诊断分析框架。


Article 236

Title@2025-07-02 (3): Evolving HPC services to enable ML workloads on HPE Cray EX

Title: Evolving HPC services to enable ML workloads on HPE Cray EX Evolving HPC-Dienste, um ML-Workloads auf HPE Cray EX zu ermöglichen 不断演化的HPC服务,使HPE Cray EX 的ML工作量得以完成 2507.01880v1

Authors (13): Stefano Schuppli, Fawzi Mohamed, Henrique Mendonça, Nina Mujkanovic, Elia Palme, Dino Conciatore, Lukas Drescher, Miguel Gila, Pim Witlox, Joost VandeVondele, Maxime Martinasso, Thomas C. Schulthess, Torsten Hoefler

The Alps Research Infrastructure leverages GH200 technology at scale, featuring 10,752 GPUs. Accessing Alps provides a significant computational advantage for researchers in Artificial Intelligence (AI) and Machine Learning (ML). While Alps serves a broad range of scientific communities, traditional HPC services alone are not sufficient to meet the dynamic needs of the ML community. This paper presents an initial investigation into extending HPC service capabilities to better support ML workloads. We identify key challenges and gaps we have observed since the early-access phase (2023) of Alps by the Swiss AI community and propose several technological enhancements. These include a user environment designed to facilitate the adoption of HPC for ML workloads, balancing performance with flexibility; a utility for rapid performance screening of ML applications during development; observability capabilities and data products for inspecting ongoing large-scale ML workloads; a utility to simplify the vetting of allocated nodes for compute readiness; a service plane infrastructure to deploy various types of workloads, including support and inference services; and a storage infrastructure tailored to the specific needs of ML workloads. These enhancements aim to facilitate the execution of ML workloads on HPC systems, increase system usability and resilience, and better align with the needs of the ML community. We also discuss our current approach to security aspects. This paper concludes by placing these proposals in the broader context of changes in the communities served by HPC infrastructure like ours.

阿尔卑斯山研究基础设施在规模上利用GH200技术,拥有10,752个全球动力源;阿尔卑斯山利用阿尔卑斯山为人工智能(AI)和机器学习(ML)的研究人员提供了巨大的计算优势;阿尔卑斯山为科学界提供了巨大的计算优势;阿尔卑斯山为大量科学界服务,但传统的高常委会服务本身不足以满足ML社区的动态需求;本文件对扩大高常委会服务能力以更好地支持最低业务水平工作量进行了初步调查;我们确定了瑞士阿尔卑斯山早期进入阶段(2023年)以来我们观察到的主要挑战和差距,并提出了几项技术改进建议,其中包括一个用户环境,旨在便利采用高常委会处理最低业务负荷工作量,平衡业绩与灵活性;在开发过程中快速业绩筛选应用高常力应用软件的效用;用于检查当前大型最低业务预算单位工作量的可耐用性和数据产品;用于简化所分配的无记记号的审评;用于部署各种工作量的服务飞机基础设施,包括支助和推断服务;以及适应高常委工作量具体需要的储存基础设施;这些加强用户环境环境环境,目的是促进在更大范围内执行高常地管理我们的安全要求。


Article 237

Title@2025-07-02 (3): Towards Foundation Auto-Encoders for Time-Series Anomaly Detection

Title: Towards Foundation Auto-Encoders for Time-Series Anomaly Detection Towards Foundation Auto-Encoder für Zeitreihenanomalienerkennung 建立用于时间-中外异常探测的自动计算器基金会 2507.01875v1

Authors (4): Gastón García González, Pedro Casas, Emilio Martínez, Alicia Fernández

We investigate a novel approach to time-series modeling, inspired by the successes of large pretrained foundation models. We introduce FAE (Foundation Auto-Encoders), a foundation generative-AI model for anomaly detection in time-series data, based on Variational Auto-Encoders (VAEs). By foundation, we mean a model pretrained on massive amounts of time-series data which can learn complex temporal patterns useful for accurate modeling, forecasting, and detection of anomalies on previously unseen datasets. FAE leverages VAEs and Dilated Convolutional Neural Networks (DCNNs) to build a generic model for univariate time-series modeling, which could eventually perform properly in out-of-the-box, zero-shot anomaly detection applications. We introduce the main concepts of FAE, and present preliminary results in different multi-dimensional time-series datasets from various domains, including a real dataset from an operational mobile ISP, and the well known KDD 2021 Anomaly Detection dataset.

我们根据经过培训的大型基础模型的成功经验,研究对时间序列建模的新颖方法。我们引入了FAE(Foundation Auto-Eccoders),这是一个基于变式自动计算器(VAEs)的、用于在时间序列数据中探测异常现象的基础基因-AI模型。根据基础,我们指的是对大量时间序列数据进行预先培训的模型,该模型可以学习对精确建模、预测和探测先前未见数据集异常现象有用的复杂时间模型。 FAE利用VAEs和变幻神经网络(DNNNs)来构建一个用于单向时间序列模型的通用模型,该模型最终可以在框外零发式检测应用中正常运行。我们介绍了FAE的主要概念,并介绍了不同领域不同的多维时间序列数据集的初步结果,包括来自操作的移动ISP的真实数据集,以及众所周知的KDD 2021异常系列探测数据集。


Article 238

Title@2025-07-02 (3): Adapting Probabilistic Risk Assessment for AI

Title: Adapting Probabilistic Risk Assessment for AI Anpassung der probabilistischen Risikobewertung für KI 适应大赦国际的概率风险评估 2504.18536v3

Authors (4): Anna Katariina Wisakanto, Joe Rogero, Avyay M. Casheekar, Richard Mallah

Modern general-purpose artificial intelligence (AI) systems present an urgent risk management challenge, as their rapidly evolving capabilities and potential for catastrophic harm outpace our ability to reliably assess their risks. Current methods often rely on selective testing and undocumented assumptions about risk priorities, frequently failing to make a serious attempt at assessing the set of pathways through which AI systems pose direct or indirect risks to society and the biosphere. This paper introduces the probabilistic risk assessment (PRA) for AI framework, adapting established PRA techniques from high-reliability industries (e.g., nuclear power, aerospace) for the new challenges of advanced AI. The framework guides assessors in identifying potential risks, estimating likelihood and severity bands, and explicitly documenting evidence, underlying assumptions, and analyses at appropriate granularities. The framework’s implementation tool synthesizes the results into a risk report card with aggregated risk estimates from all assessed risks. It introduces three methodological advances: (1) Aspect-oriented hazard analysis provides systematic hazard coverage guided by a first-principles taxonomy of AI system aspects (e.g. capabilities, domain knowledge, affordances); (2) Risk pathway modeling analyzes causal chains from system aspects to societal impacts using bidirectional analysis and incorporating prospective techniques; and (3) Uncertainty management employs scenario decomposition, reference scales, and explicit tracing protocols to structure credible projections with novelty or limited data. Additionally, the framework harmonizes diverse assessment methods by integrating evidence into comparable, quantified absolute risk estimates for lifecycle decisions. We have implemented this as a workbook tool for AI developers, evaluators, and regulators.

现有方法往往依赖选择性测试和无证假设,对风险优先事项进行有选择地记录证据、基本假设和分析,往往未能认真尝试评估AI系统直接或间接对社会和生物圈构成风险的一套途径;本文件介绍了AI框架的概率风险评估,调整了高可靠性行业(如核电、航空航天)的既定PRA技术,以适应先进的AI的新挑战。 框架指导评估员在确定潜在风险、估计可能性和严重程度以及明确记录证据、基本假设和在适当的粮仓分析方面,往往没有认真尝试评估AI系统系统对社会和生物圈构成直接或间接风险的一套途径。本文件介绍了对AI框架的概率风险评估(PRA),对高可靠性行业(如核电、航空航天)的既定PRA技术进行调整,以适应先进的AI系统的新挑战。 框架指导了以下三个方法进展:(1) 以第一原则为指导的系统分类、 能力、 域知识、 承保能力 3 用于将因果链分析从系统到具有可比较性的风险预测,并采用可比较性的证据分析, 将所有评估风险的总体分析, 采用双级分析; 将结果分析,将这一系统进行可靠的统计分析,将分析,将分析,以采用可比较性数据分析,将分析,将分析,将分析,将分析,将评估


Article 239

Title@2025-07-02 (3): Distribution Matching for Self-Supervised Transfer Learning

Title: Distribution Matching for Self-Supervised Transfer Learning Verteilung passend für selbstüberwachtes Transfer-Lernen 自控传输学习配对分布 2502.14424v2

Authors (5): Yuling Jiao, Wensen Ma, Defeng Sun, Hansheng Wang, Yang Wang

In this paper, we propose a novel self-supervised transfer learning method called \underline{\textbf{D}}istribution \underline{\textbf{M}}atching (DM), which drives the representation distribution toward a predefined reference distribution while preserving augmentation invariance. DM results in a learned representation space that is intuitively structured and therefore easy to interpret. Experimental results across multiple real-world datasets and evaluation metrics demonstrate that DM performs competitively on target classification tasks compared to existing self-supervised transfer learning methods. Additionally, we provide robust theoretical guarantees for DM, including a population theorem and an end-to-end sample theorem. The population theorem bridges the gap between the self-supervised learning task and target classification accuracy, while the sample theorem shows that, even with a limited number of samples from the target domain, DM can deliver exceptional classification performance, provided the unlabeled sample size is sufficiently large.

在本文中,我们提出一种新的自我监督的传输学习方法,名为“下线 69bf{Disprition\ underline_textbf{Mcatching (DM) ”,它将代表分布推向预先定义的参考分布,同时保留增殖差异。DM 的结果是一个学习的表达空间,它具有直观结构,因此容易解释。多个现实世界数据集和评价指标的实验结果表明,DM在目标分类任务上比现有的自监督的转移学习方法有竞争力。此外,我们为DM提供了强有力的理论保证,包括一个人口定点和终端到终端样本的定点定点标本。 人口标本缩小了自监督学习任务和目标分类准确性之间的差距,而样本标本则表明,即使目标域的样本数量有限,DM也可以提供特殊的分类性能,只要无标签样本大小足够大。


Article 240

Title@2025-07-02 (3): Low-Perplexity LLM-Generated Sequences and Where To Find Them

Title: Low-Perplexity LLM-Generated Sequences and Where To Find Them Low-Perplexity LLM-generierte Sequenzen und wo sie zu finden sind 低重复性 LLM 生成序列及其查找地点 2507.01844v1

Authors (3): Arthur Wuhrmann, Anastasiia Kucherenko, Andrei Kucharavy

As Large Language Models (LLMs) become increasingly widespread, understanding how specific training data shapes their outputs is crucial for transparency, accountability, privacy, and fairness. To explore how LLMs leverage and replicate their training data, we introduce a systematic approach centered on analyzing low-perplexity sequences - high-probability text spans generated by the model. Our pipeline reliably extracts such long sequences across diverse topics while avoiding degeneration, then traces them back to their sources in the training data. Surprisingly, we find that a substantial portion of these low-perplexity spans cannot be mapped to the corpus. For those that do match, we quantify the distribution of occurrences across source documents, highlighting the scope and nature of verbatim recall and paving a way toward better understanding of how LLMs training data impacts their behavior.

随着大语言模型(LLMs)日益普及,了解具体培训数据如何塑造其产出对于透明度、问责制、隐私和公平性至关重要。为了探索LLMs如何利用和复制其培训数据,我们引入了一种系统的方法,以分析低重复序列为中心,即模型产生的高概率文本范围。我们的管道可靠地从不同主题中提取了如此长的序列,同时避免退化,然后在培训数据中将其追溯到源头。令人惊讶的是,我们发现这些低难度范围中有很大一部分无法绘制成文体。对于与之匹配的文件,我们量化了源文件之间的事件分布,突出了逐字记录的范围和性质,并为更好地了解LLMs培训数据如何影响其行为铺平了一条路。


Article 241

Title@2025-07-02 (3): Automatic Rank Determination for Low-Rank Adaptation via Submodular Function Maximization

Title: Automatic Rank Determination for Low-Rank Adaptation via Submodular Function Maximization Automatische Rangbestimmung für Low-Rank-Anpassung über Submodulare Funktionsmaximierung 通过子模块函数最大化自动确定低 Rank 适应的自动排序 2507.01841v1

Authors (2): Yihang Gao, Vincent Y. F. Tan

In this paper, we propose SubLoRA, a rank determination method for Low-Rank Adaptation (LoRA) based on submodular function maximization. In contrast to prior approaches, such as AdaLoRA, that rely on first-order (linearized) approximations of the loss function, SubLoRA utilizes second-order information to capture the potentially complex loss landscape by incorporating the Hessian matrix. We show that the linearization becomes inaccurate and ill-conditioned when the LoRA parameters have been well optimized, motivating the need for a more reliable and nuanced second-order formulation. To this end, we reformulate the rank determination problem as a combinatorial optimization problem with a quadratic objective. However, solving this problem exactly is NP-hard in general. To overcome the computational challenge, we introduce a submodular function maximization framework and devise a greedy algorithm with approximation guarantees. We derive a sufficient and necessary condition under which the rank-determination objective becomes submodular, and construct a closed-form projection of the Hessian matrix that satisfies this condition while maintaining computational efficiency. Our method combines solid theoretical foundations, second-order accuracy, and practical computational efficiency. We further extend SubLoRA to a joint optimization setting, alternating between LoRA parameter updates and rank determination under a rank budget constraint. Extensive experiments on fine-tuning physics-informed neural networks (PINNs) for solving partial differential equations (PDEs) demonstrate the effectiveness of our approach. Results show that SubLoRA outperforms existing methods in both rank determination and joint training performance.

本文中, 我们提议 SubLORA , 这是基于亚模函数最大化的低兰度适应( LORA) 的等级确定方法 。 与以前的方法, 如 AdaLORA , 依赖损失函数的第一阶( 线性) 近似值的 AdaLORA , 相比, SubLORA 使用第二阶信息 , 通过纳入赫西安矩阵来捕捉潜在的复杂损失场景。 我们显示, 当LORA 参数得到优化后, 线性化变得不准确且条件不完善, 从而需要更可靠和细微的第二阶级配方配制。 为此, 我们重新将级别确定问题作为组合式优化问题, 与四阶目标相反。 但是, 要克服计算挑战, 我们引入一个亚模范最大功能最大化框架, 并设计一个含近似保证的贪婪算法。 我们得出一个足够和必要的条件, 确定等级确定目标为亚比值目标, 并构建一个符合这一条件的封闭式的黑萨基矩阵矩阵配置, 同时保持精确的计算效率。 我们的方法结合了实际的理论精确性成本计算基础, , , 水平的升级的升级的计算方法 , , , 在水平上, 水平的计算方法在水平上, 水平上, 水平的精确的计算, 水平的计算中, 水平上, 我们的精确性平比比级的计算 的 的 的计算 的 的计算 , , , , , , , , , 的 的 , , , , , 的 , , 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 , 我们的 , 我们的 , 我们的 的 , 我们的 , 我们的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的


Article 242

Title@2025-07-02 (3): GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Title: GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning GLM-4.1V-Denken: Auf dem Weg zu einer vielseitigen multimodalen Vernunft mit skalierbarem Stärkungslernen GLM-4.1V-思考:努力以可缩放的强化学习实现多式多式理由 2507.01006v2

Authors (79): GLM-V Team, :, Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, Shuaiqi Duan, Weihan Wang, Yan Wang, Yean Cheng, Zehai He, Zhe Su, Zhen Yang, Ziyang Pan, Aohan Zeng, Baoxu Wang, Boyan Shi, Changyu Pang, Chenhui Zhang, Da Yin, Fan Yang, Guoqing Chen, Jiazheng Xu, Jiali Chen, Jing Chen, Jinhao Chen, Jinghao Lin, Jinjiang Wang, Junjie Chen, Leqi Lei, Letian Gong, Leyi Pan, Mingzhi Zhang, Qinkai Zheng, Sheng Yang, Shi Zhong, Shiyu Huang, Shuyuan Zhao, Siyan Xue, Shangqin Tu, Shengbiao Meng, Tianshu Zhang, Tianwei Luo, Tianxiang Hao, Wenkai Li, Wei Jia, Xin Lyu, Xuancheng Huang, Yanling Wang, Yadong Xue, Yanfeng Wang, Yifan An, Yifan Du, Yiming Shi, Yiheng Huang, Yilin Niu, Yuan Wang, Yuanchang Yue, Yuchen Li, Yutao Zhang, Yuxuan Zhang, Zhanxiao Du, Zhenyu Hou, Zhao Xue, Zhengxiao Du, Zihan Wang, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Minlie Huang, Yuxiao Dong, Jie Tang

We present GLM-4.1V-Thinking, a vision-language model (VLM) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document understanding. We open-source GLM-4.1V-9B-Thinking, which achieves state-of-the-art performance among models of comparable size. In a comprehensive evaluation across 28 public benchmarks, our model outperforms Qwen2.5-VL-7B on nearly all tasks and achieves comparable or even superior performance on 18 benchmarks relative to the significantly larger Qwen2.5-VL-72B. Notably, GLM-4.1V-9B-Thinking also demonstrates competitive or superior performance compared to closed-source models such as GPT-4o on challenging tasks including long document understanding and STEM reasoning, further underscoring its strong capabilities. Code, models and more information are released at https://github.com/THUDM/GLM-4.1V-Thinking.

我们提出GLM-4.1V-Tinking,这是一个旨在推进通用多式联运理解和推理的愿景语言模型(VLM),目的是推进通用多式联运理解和推理;我们在本报告中分享我们在制定以推理为中心的培训框架方面的主要结论;我们首先通过大规模培训前的训练,开发一个具有巨大潜力的有能力的愿景基础模型,这或许为最终业绩设定了上限;然后我们提议用课程抽样强化学习,以释放模型的全部潜力,从而在各种任务中实现全面能力增强,包括STEM问题的解决、视频理解、内容识别、编码、基础化、基于图形的代理商以及长期文件理解;我们首先开发一个能够实现类似规模模型中最新性能的能够;在对28个公共基准的全面评估中,我们的模型在几乎所有任务中都比Qwen2.5-VVV-V-7B,在18个基准中实现可比或甚至优异的绩效,包括GLM-M-M-TF-SIM-SIM-SUL-SUL-S-Silvical Ex、G-Supal-Supal-Supal-Supal-Supal-Silviewslation Suplation Suplick 和G-Suplvial-Supal-Supal-Supal-Supal-Supslislisl)进一步展示等高等高级的高级性工作能力,在高级文件上,进一步展示和高超超超度和高级/Supal-M-SUIM-SUL-SUL-Slvial-Sl-SUL-Sl-S-SL-Sl-S-SL-Sl-S-S-S-SUL-S-S-S-S-S-SUL-SUL-S-S-S-S-S-S-Sl-SUL-SUL-SL-SUL-SUD-S-Sl-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-SUL-S-S-S


Article 243

Title@2025-07-02 (3): Generalization performance of narrow one-hidden layer networks in the teacher-student setting

Title: Generalization performance of narrow one-hidden layer networks in the teacher-student setting Generalisierungsleistung von schmalen ein-versteckten Ebenennetzwerken im Lehrer-Studenten-Setting 师生环境中狭窄的单层网络的普及性业绩 2507.00629v2

Authors (7): Jean Barbier, Federica Gerace, Alessandro Ingrosso, Clarissa Lauditi, Enrico M. Malatesta, Gibbs Nwemadji, Rodrigo Pérez Ortiz

Understanding the generalization abilities of neural networks for simple input-output distributions is crucial to account for their learning performance on real datasets. The classical teacher-student setting, where a network is trained from data obtained thanks to a label-generating teacher model, serves as a perfect theoretical test bed. In this context, a complete theoretical account of the performance of fully connected one-hidden layer networks in the presence of generic activation functions is lacking. In this work, we develop such a general theory for narrow networks, i.e. networks with a large number of hidden units, yet much smaller than the input dimension. Using methods from statistical physics, we provide closed-form expressions for the typical performance of both finite temperature (Bayesian) and empirical risk minimization estimators, in terms of a small number of weight statistics. In doing so, we highlight the presence of a transition where hidden neurons specialize when the number of samples is sufficiently large and proportional to the number of parameters of the network. Our theory accurately predicts the generalization error of neural networks trained on regression or classification tasks with either noisy full-batch gradient descent (Langevin dynamics) or full-batch gradient descent.

理解神经网络在简单输入输出分布方面的普及能力对于在真实数据集中说明其学习性能至关重要。古典师生设置,在这种设置中,一个网络从通过标签生成的教师模型获得的数据中接受培训,可以作为一个完美的理论测试床。在这方面,缺乏一个完整的理论说明,说明在通用激活功能面前完全连接的单层网络的性能。在这项工作中,我们为狭窄网络制定了这样一个一般性理论,即拥有大量隐藏单元的网络,但比输入层面要小得多。我们使用统计物理的方法,为有限的温度(巴耶斯)和实验性风险最小化估计员的典型性能提供了封闭式的表达方式,用少量重量统计来表示。我们为此强调,在样品数量足够大且与网络参数数量成比例的情况下,隐藏的神经元在过渡中具有特殊性。我们的理论准确地预测了在全面冷却的梯度下或全面梯度下进行回归或分类任务培训的神经网络的普遍误差。


Article 244

Title@2025-07-02 (3): Out-of-Distribution Detection Methods Answer the Wrong Questions

Title: Out-of-Distribution Detection Methods Answer the Wrong Questions Out-of-Distribution Detection Methoden beantworten die falschen Fragen 解答错误问题的方法 2507.01831v1

Authors (7): Yucen Lily Li, Daohan Lu, Polina Kirichenko, Shikai Qiu, Tim G. J. Rudner, C. Bayan Bruss, Andrew Gordon Wilson

To detect distribution shifts and improve model safety, many out-of-distribution (OOD) detection methods rely on the predictive uncertainty or features of supervised models trained on in-distribution data. In this paper, we critically re-examine this popular family of OOD detection procedures, and we argue that these methods are fundamentally answering the wrong questions for OOD detection. There is no simple fix to this misalignment, since a classifier trained only on in-distribution classes cannot be expected to identify OOD points; for instance, a cat-dog classifier may confidently misclassify an airplane if it contains features that distinguish cats from dogs, despite generally appearing nothing alike. We find that uncertainty-based methods incorrectly conflate high uncertainty with being OOD, while feature-based methods incorrectly conflate far feature-space distance with being OOD. We show how these pathologies manifest as irreducible errors in OOD detection and identify common settings where these methods are ineffective. Additionally, interventions to improve OOD detection such as feature-logit hybrid methods, scaling of model and data size, epistemic uncertainty representation, and outlier exposure also fail to address this fundamental misalignment in objectives. We additionally consider unsupervised density estimation and generative models for OOD detection, which we show have their own fundamental limitations.

为了检测分布变化和改善模型安全性,许多分配外(OOD)检测方法依靠的是预测性不确定性或经过过分配数据培训的受监督模型的特征。在本文件中,我们严格地重新审查OOD检测程序这一广受欢迎的系列,我们争辩说,这些方法从根本上回答了OOD检测的错误问题。对于这种不匹配现象,没有简单的解决办法,因为仅受过分配类培训的分类员不可能发现OOD点;例如,如果飞机含有区分猫和狗的特征,猫狗的分类方法可能会令人信服地错误地分类。我们发现,基于不确定性的方法错误地将高度不确定性与OOD混为一谈,而基于特征的方法错误地将远的地物空间距离与OOD检测混为一谈。我们表明,这些病理如何表现为OOD检测中不可避免的错误,并找出这些方法无效的常见环境。此外,改进OODD检测的干预措施,例如特征与混合方法,缩小模型和数据大小,发现更多的不确定性说明,外部接触也未能解决其基本的分辨率测量目标。


Article 245

Title@2025-07-02 (3): mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling

Title: mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling mGRADE: Minimales Recurrent-Gating trifft auf Verzögerungskonvolutionen für leichte Sequenzmodellierung mGRADE: 轻量级序建模的最小经常定时满足延迟变数 2507.01829v1

Authors (4): Tristan Torchet, Christian Metzner, Laura Kriener, Melika Payvand

Edge devices for temporal processing demand models that capture both short- and long- range dynamics under tight memory constraints. While Transformers excel at sequence modeling, their quadratic memory scaling with sequence length makes them impractical for such settings. Recurrent Neural Networks (RNNs) offer constant memory but train sequentially, and Temporal Convolutional Networks (TCNs), though efficient, scale memory with kernel size. To address this, we propose mGRADE (mininally Gated Recurrent Architecture with Delay Embedding), a hybrid-memory system that integrates a temporal 1D-convolution with learnable spacings followed by a minimal gated recurrent unit (minGRU). This design allows the convolutional layer to realize a flexible delay embedding that captures rapid temporal variations, while the recurrent module efficiently maintains global context with minimal memory overhead. We validate our approach on two synthetic tasks, demonstrating that mGRADE effectively separates and preserves multi-scale temporal features. Furthermore, on challenging pixel-by-pixel image classification benchmarks, mGRADE consistently outperforms both pure convolutional and pure recurrent counterparts using approximately 20% less memory footprint, highlighting its suitability for memory-constrained temporal processing at the edge. This highlights mGRADE’s promise as an efficient solution for memory-constrained multi-scale temporal processing at the edge.

用于时间处理需求模型的边缘装置,在严格的记忆限制下捕捉短距离和长距离动态。 虽然变异器在序列建模方面十分出色, 其四级记忆缩放和序列长度使这些设置不切实际。 经常神经网络( RNNS) 提供恒定的记忆, 但顺序列列列, 以及时空变迁网络(TCNs) , 虽然效率高, 规模小于内核的内存。 为了解决这个问题, 我们提议 mGRAADE( 带延迟嵌入的最小GED 常规架构) , 混合模版系统, 将时间1D变异与可学习的间隔结合, 并随后有一个最小的锁定的固定的经常单位( minGRU) 。 这个设计让变动层可以实现灵活的延迟嵌入, 捕捉到快速的时间变化, 而经常模块则有效地维持全球环境, 最小的内存中, 我们验证我们的两个合成任务的方法, 表明 mGRADE 有效地分离并保存多尺度的时间特征。 此外, IMADE 持续在精确的后端 和纯的时空级的记忆边缘 , 质级的内存中继器 , , 在大约 20 % 节段的多级的 。


Article 246

Title@2025-07-02 (3): MILP-SAT-GNN: Yet Another Neural SAT Solver

Title: MILP-SAT-GNN: Yet Another Neural SAT Solver MILP-SAT-GNN: Noch ein neuraler SAT Solver MILP-SAT-GNN:然而,另一个神经SAT溶剂 2507.01825v1

Authors (3): Franco Alberto Cardillo, Hamza Khyari, Umberto Straccia

We proposes a novel method that enables Graph Neural Networks (GNNs) to solve SAT problems by leveraging a technique developed for applying GNNs to Mixed Integer Linear Programming (MILP). Specifically, k-CNF formulae are mapped into MILP problems, which are then encoded as weighted bipartite graphs and subsequently fed into a GNN for training and testing. From a theoretical perspective: (i) we establish permutation and equivalence invariance results, demonstrating that the method produces outputs that are stable under reordering of clauses and variables; (ii) we identify a theoretical limitation, showing that for a class of formulae called foldable formulae, standard GNNs cannot always distinguish satisfiable from unsatisfiable instances; (iii) we prove a universal approximation theorem, establishing that with Random Node Initialization (RNI), the method can approximate SAT solving to arbitrary precision on finite datasets, that is, the GNN becomes approximately sound and complete on such datasets. Furthermore, we show that for unfoldable formulae, the same approximation guarantee can be achieved without the need for RNI. Finally, we conduct an experimental evaluation of our approach, which show that, despite the simplicity of the neural architecture, the method achieves promising results.

我们建议一种新颖的方法,使图形神经网络(GNN)能够利用为将GNN应用到混合整线性线性编程(MILP)而开发的一种技术解决SAT问题。 具体地说,K-CNF公式被映入MILP问题,然后以加权双边图形编码,然后输入GNN,用于培训和测试。 从理论角度看:(一) 我们建立偏差和等值差异结果,表明该方法产生的产出在重新排序条款和变量下是稳定的;(二) 我们确定一个理论限制,表明对于被称为折叠式公式的公式类别来说,标准GNNNS公式不能总是区分可作比较的和不可满足的事例;(三) 我们证明一个通用的近似理论,通过随机节点初始化(RNI),这种方法可以近似沙特卫星在限定数据集上任意精确地解决问题,也就是说,GNNNN在这种数据集上变得近似和完整。 此外,我们表明,对于可演化的公式而言,同样的近似的保证可以永远区分为无法令人满意的情况;(三) 我们证明,尽管需要实现一个有希望的实验性的方法,最终能显示我们实现的精确的模型。


Article 247

Title@2025-07-02 (3): TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents

Title: TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents TD-MPC-Opt: Destillieren modellbasierter Multi-Task-Verstärkungs-Learning-Agenten TD-MPC-Opt: 以示范为基础的多任务加强学习机构 2507.01823v1

Authors (2): Dmytro Kuzmenko, Nadiya Shvai

We present a novel approach to knowledge transfer in model-based reinforcement learning, addressing the critical challenge of deploying large world models in resource-constrained environments. Our method efficiently distills a high-capacity multi-task agent (317M parameters) into a compact model (1M parameters) on the MT30 benchmark, significantly improving performance across diverse tasks. Our distilled model achieves a state-of-the-art normalized score of 28.45, surpassing the original 1M parameter model score of 18.93. This improvement demonstrates the ability of our distillation technique to capture and consolidate complex multi-task knowledge. We further optimize the distilled model through FP16 post-training quantization, reducing its size by $\sim$50\%. Our approach addresses practical deployment limitations and offers insights into knowledge representation in large world models, paving the way for more efficient and accessible multi-task reinforcement learning systems in robotics and other resource-constrained applications. Code available at https://github.com/dmytro-kuzmenko/td-mpc-opt.

在基于模型的强化学习中,我们提出了一种新颖的知识转让方法,以解决在资源受限制的环境中部署大型世界模型的重大挑战。我们的方法有效地将一个高容量的多任务代理(317M参数)蒸馏成关于MT30基准的紧凑模型(1M参数),大大提高了各种任务的业绩。我们精炼的模型达到了28.45分,超过了最初的1M参数模型的18.93分。这一改进显示了我们蒸馏技术捕捉和巩固复杂的多任务知识的能力。我们通过FP16培训后四分制进一步优化了蒸馏模型,将其规模缩小了50美元。我们的方法解决了实际的部署限制,并提供了对大型世界模型中知识代表的洞察力,为在机器人和其他资源密集的应用中更有效和方便的多任务强化学习系统铺平了道路。代码见https://github.com/dmytro-kuzmenko/t-mpc-opt。


Article 248

Title@2025-07-02 (3): Deep Reinforcement Learning for Traveling Purchaser Problems

Title: Deep Reinforcement Learning for Traveling Purchaser Problems Deep Verstärkung Learning for Traveling Buyer Probleme 为旅行购买者问题深入强化学习 2404.02476v6

Authors (7): Haofeng Yuan, Rongping Zhu, Wanlu Yang, Shiji Song, Keyou You, Wei Fan, C. L. Philip Chen

The traveling purchaser problem (TPP) is an important combinatorial optimization problem with broad applications. Due to the coupling between routing and purchasing, existing works on TPPs commonly address route construction and purchase planning simultaneously, which, however, leads to exact methods with high computational cost and heuristics with sophisticated design but limited performance. In sharp contrast, we propose a novel approach based on deep reinforcement learning (DRL), which addresses route construction and purchase planning separately, while evaluating and optimizing the solution from a global perspective. The key components of our approach include a bipartite graph representation for TPPs to capture the market-product relations, and a policy network that extracts information from the bipartite graph and uses it to sequentially construct the route. One significant advantage of our framework is that we can efficiently construct the route using the policy network, and once the route is determined, the associated purchasing plan can be easily derived through linear programming, while, by leveraging DRL, we can train the policy network towards optimizing the global solution objective. Furthermore, by introducing a meta-learning strategy, the policy network can be trained stably on large-sized TPP instances, and generalize well across instances of varying sizes and distributions, even to much larger instances that are never seen during training. Experiments on various synthetic TPP instances and the TPPLIB benchmark demonstrate that our DRL-based approach can significantly outperform well-established TPP heuristics, reducing the optimality gap by 40%-90%, and also showing an advantage in runtime, especially on large-sized instances.

旅行购买者问题(TPP)是一个具有广泛应用性的重要组合优化问题。由于路线和采购之间的交错,TPP的现有工程通常同时处理路线建设和采购规划问题,然而,这导致计算成本高、杂费高、设计复杂但绩效有限的精确方法。 形成鲜明对照的是,我们提出了基于深度强化学习的新颖方法(DRL),该方法分别解决路线建设和采购规划问题,同时从全球角度评估和优化解决方案。我们方法的关键组成部分包括两部分图显示贸易点获取市场-产品关系,以及一个政策网络,从双面图中提取信息,并用它来按顺序构建路线。我们框架的一个重大优势是,我们可以使用政策网络高效地构建路线,一旦路线确定,就可以通过线性规划,同时通过利用基于DRL,我们可以对政策网络进行优化全球解决方案目标的培训。此外,通过引入元化学习战略,政策网络可以对大比例为90的TPPP(T-90)优势进行精确的测试,在大型的TRP(T-L)情况下,在大规模的TPP(P)列表中,特别展示了大比例的模型,在大型的分布中可以明显地展示。


Article 249

Title@2025-07-02 (3): LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs

Title: LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs LoRA Feintuning ohne GPUs: Ein CPU-effizientes Meta-Generation-Framework für LLMs LoRA 无GPUs的精细调整:LLMs的CPU-提高功能元元发光框架 2507.01806v1

Authors (3): Reza Arabpour, Haitz Sáez de Ocáriz Borde, Anastasis Kratsios

Low-Rank Adapters (LoRAs) have transformed the fine-tuning of Large Language Models (LLMs) by enabling parameter-efficient updates. However, their widespread adoption remains limited by the reliance on GPU-based training. In this work, we propose a theoretically grounded approach to LoRA fine-tuning designed specifically for users with limited computational resources, particularly those restricted to standard laptop CPUs. Our method learns a meta-operator that maps any input dataset, represented as a probability distribution, to a set of LoRA weights by leveraging a large bank of pre-trained adapters for the Mistral-7B-Instruct-v0.2 model. Instead of performing new gradient-based updates, our pipeline constructs adapters via lightweight combinations of existing LoRAs directly on CPU. While the resulting adapters do not match the performance of GPU-trained counterparts, they consistently outperform the base Mistral model on downstream tasks, offering a practical and accessible alternative to traditional GPU-based fine-tuning.

低兰克适应器(LORAs)通过允许具有参数效率的更新,改变了大语言模型的微调,但是,由于依赖基于GPU的培训,这些模型的广泛采用仍然有限。在这项工作中,我们建议对专门为计算资源有限的用户,特别是仅限于标准膝上型膝上型计算机的用户设计的LORA微调采取基于理论的微调方法。我们的方法学了一个元操作器,该元操作器将代表概率分布的任何输入数据集映射成一组LORA重量,利用大批受过训练的适应器来进行Mistral-7B-Instruct-v0.2模型。我们的管道结构不是进行新的基于梯度的更新,而是通过直接在CPU上对现有LRA的轻量组合来建造适应器。虽然所产生的调整器与受GPUP培训的对应方的性能不匹配,但它们始终比下游任务的基本Mistral模型高,为传统的基于GPU的微调制提供了实用和方便的替代方法。


Article 250

Title@2025-07-02 (3): Towards Decentralized and Sustainable Foundation Model Training with the Edge

Title: Towards Decentralized and Sustainable Foundation Model Training with the Edge Auf dem Weg zu einer dezentralisierten und nachhaltigen Stiftung Modellausbildung mit der Kante 实现权力下放和可持续基础建设的边缘模式培训 2507.01803v1

Authors (5): Leyang Xue, Meghana Madhyastha, Randal Burns, Myungjin Lee, Mahesh K. Marina

Foundation models are at the forefront of AI research, appealing for their ability to learn from vast datasets and cater to diverse tasks. Yet, their significant computational demands raise issues of environmental impact and the risk of centralized control in their development. We put forward a vision towards decentralized and sustainable foundation model training that leverages the collective compute of sparingly used connected edge AI devices. We present the rationale behind our vision, particularly in support of its sustainability benefit. We further outline a set of challenges that need to be addressed to turn this vision into reality.

基金会模式处于大赦国际研究的最前沿,它呼吁它们有能力从庞大的数据集中学习,并适应各种任务。然而,它们重要的计算要求提出了环境影响问题和在发展过程中集中控制的风险。我们提出了分散和可持续的基础模式培训的愿景,利用集体计算零用连接的边缘AI设备。我们提出了我们的愿景背后的理由,特别是支持其可持续性的好处。我们进一步概述了将这一愿景变为现实需要应对的一系列挑战。


Article 251

Title@2025-07-02 (3): The Anatomy of Evidence: An Investigation Into Explainable ICD Coding

Title: The Anatomy of Evidence: An Investigation Into Explainable ICD Coding Die Anatomie der Beweise: Eine Untersuchung zur erklärbaren ICD-Kodierung 证据解剖学:调查可解释的 ICD 编码 2507.01802v1

Authors (5): Katharina Beckh, Elisa Studeny, Sujan Sai Gannamaneni, Dario Antweiler, Stefan Rüping

Automatic medical coding has the potential to ease documentation and billing processes. For this task, transparency plays an important role for medical coders and regulatory bodies, which can be achieved using explainability methods. However, the evaluation of these approaches has been mostly limited to short text and binary settings due to a scarcity of annotated data. Recent efforts by Cheng et al. (2023) have introduced the MDACE dataset, which provides a valuable resource containing code evidence in clinical records. In this work, we conduct an in-depth analysis of the MDACE dataset and perform plausibility evaluation of current explainable medical coding systems from an applied perspective. With this, we contribute to a deeper understanding of automatic medical coding and evidence extraction. Our findings reveal that ground truth evidence aligns with code descriptions to a certain degree. An investigation into state-of-the-art approaches shows a high overlap with ground truth evidence. We propose match measures and highlight success and failure cases. Based on our findings, we provide recommendations for developing and evaluating explainable medical coding systems.

自动医疗编码具有方便文件和计费程序的潜力。对于这项任务,透明度对医疗编码员和监管机构起着重要作用,可以通过解释性方法实现。然而,由于缺少附加说明的数据,对这些方法的评价主要限于短文本和二进制设置。Cheng等人(2023年)最近的努力引入了MDACE数据集,该数据集提供了含有临床记录中代码证据的宝贵资源。在这项工作中,我们对MDACE数据集进行深入分析,并从应用角度对当前可解释的医疗编码系统进行合理性评价。我们以此帮助加深对自动医学编码和证据提取的理解。我们的调查结果显示,地面真相证据与代码描述在某种程度上是一致的。对最新方法的调查显示,与地面真相证据有很大的重叠。我们提出了匹配措施,并强调成功和失败案例。根据我们的调查结果,我们为开发和评价可解释的医疗编码系统提出了建议。


Article 252

Title@2025-07-02 (3): Neural Entropy-stable conservative flux form neural networks for learning hyperbolic conservation laws

Title: Neural Entropy-stable conservative flux form neural networks for learning hyperbolic conservation laws Neurale Entropie-stabile konservative Flussform neuronale Netzwerke zum Lernen hyperbolischer Erhaltungsgesetze 为学习双曲保护法而形成的神经网络 2507.01795v1

Authors (3): Lizuo Liu, Lu Zhang, Anne Gelb

We propose a neural entropy-stable conservative flux form neural network (NESCFN) for learning hyperbolic conservation laws and their associated entropy functions directly from solution trajectories, without requiring any predefined numerical discretization. While recent neural network architectures have successfully integrated classical numerical principles into learned models, most rely on prior knowledge of the governing equations or assume a fixed discretization. Our approach removes this dependency by embedding entropy-stable design principles into the learning process itself, enabling the discovery of physically consistent dynamics in a fully data-driven setting. By jointly learning both the numerical flux function and a corresponding entropy, the proposed method ensures conservation and entropy dissipation, critical for long-term stability and fidelity in the system of hyperbolic conservation laws. Numerical results demonstrate that the method achieves stability and conservation over extended time horizons and accurately captures shock propagation speeds, even without oracle access to future-time solution profiles in the training data.

我们建议建立一个神经可变性保守通量神经网络(NESCFN),直接从解析轨迹中学习双曲保护法及其相关的酶功能,而不需要预先确定数字分解。虽然最近的神经网络结构成功地将古典数字原则纳入了已学模型,但多数依赖对治理方程的先前知识或假设固定离散。我们的方法通过在学习过程中嵌入可变设计原则来消除这种依赖性,从而能够在完全由数据驱动的环境中发现物理一致性的动态。通过共同学习数字通量函数和对应的酶,拟议的方法可以确保保存和消化,这对于超曲保护法体系的长期稳定性和忠诚至关重要。数字结果表明,该方法在延长的时间范围内实现了稳定性和保全,并准确地捕捉了震荡传播速度,即使没有在培训数据中获取未来时间解决方案简介的孔隙。


Article 253

Title@2025-07-02 (3): How Do Vision-Language Models Process Conflicting Information Across Modalities?

Title: How Do Vision-Language Models Process Conflicting Information Across Modalities? Wie verarbeiten Vision-Language-Modelle widersprüchliche Informationen über Modalitäten hinweg? 愿景-语言模型如何以不同方式处理信息冲突问题? 2507.01790v1

Authors (3): Tianze Hua, Tian Yun, Ellie Pavlick

AI models are increasingly required to be multimodal, integrating disparate input streams into a coherent state representation on which subsequent behaviors and actions can be based. This paper seeks to understand how such models behave when input streams present conflicting information. Focusing specifically on vision-language models, we provide inconsistent inputs (e.g., an image of a dog paired with the caption “A photo of a cat”) and ask the model to report the information present in one of the specific modalities (e.g., “What does the caption say / What is in the image?”). We find that models often favor one modality over the other, e.g., reporting the image regardless of what the caption says, but that different models differ in which modality they favor. We find evidence that the behaviorally preferred modality is evident in the internal representational structure of the model, and that specific attention heads can restructure the representations to favor one modality over the other. Moreover, we find modality-agnostic “router heads” which appear to promote answers about the modality requested in the instruction, and which can be manipulated or transferred in order to improve performance across datasets and modalities. Together, the work provides essential steps towards identifying and controlling if and how models detect and resolve conflicting signals within complex multimodal environments.

AI模式日益成为多式模式,将不同的输入流纳入一个连贯的国家代表模式,随后的行为和行动可以以此为基础。本文件试图了解当输入流带来相互矛盾的信息时,这种模式的行为方式如何。 具体侧重于愿景语言模式,我们提供不一致的投入(例如,一条配有“猫的照片”标题的狗的图像),并要求模型报告特定模式之一(例如,“标题表示什么/图像中是什么?” )的信息。我们发现,模式往往偏向于一种模式,例如,报告图像,而不管标题说明什么,但不同的模式不同,它们倾向于哪种模式。我们发现,行为偏好模式在模型的内部代表结构中显而易见,而具体关注负责人可以调整代表结构,赞成一种模式而不是另一种模式。此外,我们发现,模式-无差别的“路径头”似乎有助于回答指令中所要求的模式,而且可以被操纵或转移,以便改进跨数据集和模式的运行方式。我们发现,如果在复杂的情况下,工作提供了必要的步骤,如何在模式和模式中确定和模式内,则如何控制,如何稳定。


Article 254

Title@2025-07-02 (3): SpikeNAS: A Fast Memory-Aware Neural Architecture Search Framework for Spiking Neural Network-based Embedded AI Systems

Title: SpikeNAS: A Fast Memory-Aware Neural Architecture Search Framework for Spiking Neural Network-based Embedded AI Systems SpikeNAS: Ein schnelles Speicher-Aware Neural Architecture Such-Framework für neurale eingebettete KI-Systeme SpikNAS: 以嵌入式AI系统为基础的Spiking神经网络嵌入式人工智能系统快速记忆-智能神经结构搜索框架 2402.11322v4

Authors (2): Rachmad Vidya Wicaksana Putra, Muhammad Shafique

Embedded AI systems are expected to incur low power/energy consumption for solving machine learning tasks, as these systems are usually power constrained (e.g., object recognition task in autonomous mobile agents with portable batteries). These requirements can be fulfilled by Spiking Neural Networks (SNNs), since their bio-inspired spike-based operations offer high accuracy and ultra low-power/energy computation. Currently, most of SNN architectures are derived from Artificial Neural Networks whose neurons’ architectures and operations are different from SNNs, and/or developed without considering memory budgets from the underlying processing hardware of embedded platforms. These limitations hinder SNNs from reaching their full potential in accuracy and efficiency. Toward this, we propose SpikeNAS, a novel fast memory-aware neural architecture search (NAS) framework for SNNs that quickly finds an appropriate SNN architecture with high accuracy under the given memory budgets from targeted embedded systems. To do this, our SpikeNAS employs several key steps: analyzing the impacts of network operations on the accuracy, enhancing the network architecture to improve the learning quality, developing a fast memory-aware search algorithm, and performing quantization. The experimental results show that our SpikeNAS improves the searching time and maintains high accuracy compared to state-of-the-art while meeting the given memory budgets (e.g., 29x, 117x, and 3.7x faster search for CIFAR10, CIFAR100, and TinyImageNet200 respectively, using an Nvidia RTX A6000 GPU machine), thereby quickly providing the appropriate SNN architecture for the memory-constrained embedded AI systems.

由于这些系统通常受到电源限制(例如,用便携式电池自动移动代理器中的物体识别任务),因此,嵌入式的AI系统预计耗电量较低,用于解决机器学习任务。这些要求可由Spiking神经网络(SNNS)满足,因为其生物引发的螺旋型操作提供高精度和超低功能/能量计算。目前,SNN的多数结构来自人工神经元神经网络,其结构和运作与SNNNS不同,和/或开发这些系统通常不考虑嵌入平台基本处理硬件的记忆预算。这些限制阻碍了SNNNNS在准确性和效率方面达到全部潜力。为此,我们建议SpikingNAS(Spiking Neas-aware Neal Compainte),一个为SNNNNNNS(NA-NR)快速智能搜索框架(NA-NR)迅速找到合适的S-RER(S-RA), 快速搜索SDR(S-RA-RA), 快速搜索Silental-ral-ral Ralalalalal) 预算, 和运行(SD-ral-ral-ral-SL) 。


Article 255

Title@2025-07-02 (3): MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining

Title: MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining MuRating: Ein qualitativ hochwertiger Datenauswahlansatz zur Mehrsprachigen Vorschulung großer Sprachmodelle 词汇:多语言大语言模式预科培训的高质量数据选择方法 2507.01785v1

Authors (12): Zhixun Chen, Ping Guo, Wenhan Han, Yifan Zhang, Binbin Liu, Haobin Lin, Fengze Liu, Yan Zhao, Bingni Zhang, Taifeng Wang, Yin Zheng, Meng Fang

Data quality is a critical driver of large language model performance, yet existing model-based selection methods focus almost exclusively on English. We introduce MuRating, a scalable framework that transfers high-quality English data-quality signals into a single rater for 17 target languages. MuRating aggregates multiple English “raters” via pairwise comparisons to learn unified document-quality scores,then projects these judgments through translation to train a multilingual evaluator on monolingual, cross-lingual, and parallel text pairs. Applied to web data, MuRating selects balanced subsets of English and multilingual content to pretrain a 1.2 B-parameter LLaMA model. Compared to strong baselines, including QuRater, AskLLM, DCLM and so on, our approach boosts average accuracy on both English benchmarks and multilingual evaluations, with especially large gains on knowledge-intensive tasks. We further analyze translation fidelity, selection biases, and underrepresentation of narrative material, outlining directions for future work.

数据质量是大语言模型业绩的关键驱动因素,但现有基于模式的选择方法几乎完全以英语为重点。我们引入了“模调”,这是一个可扩缩的框架,将高质量的英语数据质量信号转换成17种目标语言的单一调率器。通过对称比较,将多重英语“拉子”聚合在一起,学习统一的文件质量分数,然后通过翻译将这些判断投射为对单语、跨语言和平行文本对多语言评价员的培训。应用到网络数据,将选择的英语和多语言内容的平衡子集用于预演1.2 B-参数Lama模型。与强大的基线相比,包括Qurater、AskLLM、DCLM等,我们的方法提高了英语基准和多语言评价的平均准确性,特别是在知识密集型任务上取得的巨大成果。我们进一步分析翻译的忠贞、选择偏差和叙述材料的不足,并概述未来工作的方向。


Article 256

Title@2025-07-02 (3): BranchNet: A Neuro-Symbolic Learning Framework for Structured Multi-Class Classification

Title: BranchNet: A Neuro-Symbolic Learning Framework for Structured Multi-Class Classification BranchNet: Ein neuro-symbolisches Lernrahmenwerk für strukturierte Multi-Klassen-Klassifikation 分支网:结构化多类分类的神经-双曲学习框架 2507.01781v1

Authors (2): Dalia Rodríguez-Salas, Christian Riess

We introduce BranchNet, a neuro-symbolic learning framework that transforms decision tree ensembles into sparse, partially connected neural networks. Each branch, defined as a decision path from root to a parent of leaves, is mapped to a hidden neuron, preserving symbolic structure while enabling gradient-based optimization. The resulting models are compact, interpretable, and require no manual architecture tuning. Evaluated on a suite of structured multi-class classification benchmarks, BranchNet consistently outperforms XGBoost in accuracy, with statistically significant gains. We detail the architecture, training procedure, and sparsity dynamics, and discuss the model’s strengths in symbolic interpretability as well as its current limitations, particularly on binary tasks where further adaptive calibration may be beneficial.

我们引入了分流网,这是一个将决策树集合成零散、部分连接的神经网络的神经-同步学习框架。 每一个分支被定义为从树根到树叶父的“决定路径 ” , 被绘制成一个隐藏的神经元, 保存象征性结构, 并同时能够实现基于梯度的优化。 所产生的模型是紧凑的、 可解释的, 不需要手动的架构调整 。 在一套结构化的多级分类基准的基础上进行评估, 分流网一直比XGBoost精确地表现得更好, 并取得了统计上的重大收益 。 我们详细介绍了结构、 培训程序和宽度动态, 并讨论了模型在符号可解释性方面的优势及其目前的局限性, 特别是对于进一步适应校准可能有用的二进制任务而言。


Article 257

Title@2025-07-02 (3): Beating Transformers using Synthetic Cognition

Title: Beating Transformers using Synthetic Cognition Transformer mit synthetischer Kognition schlagen 使用合成认知击打变形器 2504.07619v3

Authors (5): Alfredo Ibias, Miguel Rodriguez-Galindo, Hector Antona, Guillem Ramirez-Miranda, Enric Guinovart

The road to Artificial General Intelligence goes through the generation of context-aware reactive behaviors, where the Transformer architecture has been proven to be the state-of-the-art. However, they still fail to develop reasoning. Recently, a novel approach for developing cognitive architectures, called Synthetic Cognition, has been proposed and implemented to develop instantaneous reactive behavior. In this study, we aim to explore the use of Synthetic Cognition to develop context-aware reactive behaviors. We propose a mechanism to deal with sequences for the recent implementation of Synthetic Cognition, and test it against DNA foundation models in DNA sequence classification tasks. In our experiments, our proposal clearly outperforms the DNA foundation models, obtaining the best score on more benchmark tasks than the alternatives. Thus, we achieve two goals: expanding Synthetic Cognition to deal with sequences, and beating the Transformer architecture for sequence classification.

人造通用智能之路经过了一种有背景觉悟的被动反应行为,在这些行为中,变异器结构已被证明是最先进的。然而,它们仍然未能发展推理。最近,一种开发认知结构的新颖方法,称为合成认知概念,被提出并付诸实施,以发展即时反应行为。在这个研究中,我们旨在探索如何利用合成认知来发展有背景觉悟的被动反应行为。我们提出了一个机制来处理最近实施合成认知的序列,并在DNA序列分类任务中用DNA基础模型测试它。在我们的实验中,我们的提案明显超越了DNA基础模型,在比替代方法更多的基准任务上获得了最好的分数。因此,我们实现了两个目标:扩大合成认知来应对序列,并击打变异器结构来进行序列分类。


Article 258

Title@2025-07-02 (3): Retrieving snow depth distribution by downscaling ERA5 Reanalysis with ICESat-2 laser altimetry

Title: Retrieving snow depth distribution by downscaling ERA5 Reanalysis with ICESat-2 laser altimetry Rückgewinnung der Schneetiefe durch Downscaling ERA5 Reanalyse mit ICESat-2 Laseraltimetry 通过用 ICESat-2 激光测高法进行重新分析,缩小ERA5 的尺度,获取积雪深度分布 2410.17934v2

Authors (3): Zhihao Liu, Simon Filhol, Désirée Treichler

Estimating the variability of seasonal snow cover, in particular snow depth in remote areas, poses significant challenges due to limited spatial and temporal data availability. This study uses snow depth measurements from the ICESat-2 satellite laser altimeter, which are sparse in both space and time, and incorporates them with climate reanalysis data into a downscaling-calibration scheme to produce monthly gridded snow depth maps at microscale (10 m). Snow surface elevation measurements from ICESat-2 along profiles are compared to a digital elevation model to determine snow depth at each point. To efficiently turn sparse measurements into snow depth maps, a regression model is fitted to establish a relationship between the retrieved snow depth and the corresponding ERA5 Land snow depth. This relationship, referred to as subgrid variability, is then applied to downscale the monthly ERA5 Land snow depth data. The method can provide timeseries of monthly snow depth maps for the entire ERA5 time range (since 1950). The validation of downscaled snow depth data was performed at an intermediate scale (100 m x 500 m) using datasets from airborne laser scanning (ALS) in the Hardangervidda region of southern Norway. Results show that snow depth prediction achieved R2 values ranging from 0.74 to 0.88 (post-calibration). The method relies on globally available data and is applicable to other snow regions above the treeline. Though requiring area-specific calibration, our approach has the potential to provide snow depth maps in areas where no such data exist and can be used to extrapolate existing snow surveys in time and over larger areas. With this, it can offer valuable input data for hydrological, ecological or permafrost modeling tasks.

估计季节性雪雪覆盖的变异性,特别是偏远地区的雪深,由于空间和时间数据提供有限,因此带来重大挑战。本研究使用ICESat-2号卫星激光测高仪的雪深测量数据,这些测量数据在时空间和时间上都稀少,并将这些数据与气候再分析数据纳入一个降尺度校准计划,以制作微尺度(10米)的月雪深图;将ICESat-2的雪表层高度测量数据与数字升降模型进行比较,以确定每个点的雪深。为了有效地将稀释的测量数据转换为雪深雪图,安装了一个回归模型,以便在回收的雪层深度和相应的ELA5陆地雪深层之间建立起一种关系。这种关系被称为亚电网变异,然后用于缩小月度ERA5地雪深数据,以便提供整个ERA5时间范围内的月雪深图的时间序列(自1950年起);对降尺度具体的雪深层数据进行校准(100米x500米),利用空中激光扫描(ALS)在Hardangerevia深度上采集(ALS)的雪深层区域建立更多的雪层,从Rangervid Syalal4号区域到全球的Syal-laxxx。


Article 259

Title@2025-07-02 (3): Long-Context Linear System Identification

Title: Long-Context Linear System Identification Langkontext-Linearsystem-Identifikation 长全文线性系统识别 2410.05690v2

Authors (3): Oğuz Kaan Yüksel, Mathieu Even, Nicolas Flammarion

This paper addresses the problem of long-context linear system identification, where the state $x_t$ of a dynamical system at time $t$ depends linearly on previous states $x_s$ over a fixed context window of length $p$. We establish a sample complexity bound that matches the i.i.d. parametric rate up to logarithmic factors for a broad class of systems, extending previous works that considered only first-order dependencies. Our findings reveal a learning-without-mixing phenomenon, indicating that learning long-context linear autoregressive models is not hindered by slow mixing properties potentially associated with extended context windows. Additionally, we extend these results to (i) shared low-rank representations, where rank-regularized estimators improve the dependence of the rates on the dimensionality, and (ii) misspecified context lengths in strictly stable systems, where shorter contexts offer statistical advantages.

本文论述长文本线性系统识别问题,即动态系统在时间上的状态 $x$t美元在时间上依赖前一州 $x$t 美元,在长度的固定背景窗口中则依赖前一州 $x$s$s$s$p$。我们建立了与一大类系统的对数率相匹配的样本复杂性,以对数率为限,扩大以前只考虑一级依赖性的工程。我们的调查结果揭示了一种学习-无混合现象,表明学习长文本线性自动反差模型不会因与扩展背景窗口可能相关的缓慢混合性能而受到阻碍。此外,我们将这些结果推广到(一) 共享的低级别代表面,按等级定级的估测算者可以提高比率对维性的依赖性,以及(二) 在严格稳定的系统中,在较短的环境下,有错误的上下文长度提供统计上的优势。


Article 260

Title@2025-07-02 (3): Interact2Vec – An efficient neural network-based model for simultaneously learning users and items embeddings in recommender systems

Title: Interact2Vec – An efficient neural network-based model for simultaneously learning users and items embeddings in recommender systems Interact2Vec – Ein effizientes neuronales Netzwerk-basiertes Modell zum gleichzeitigen Lernen von Benutzern und Elementen in Empfehlungssysteme Interact2Vec – – 一个有效的神经网络模式,用于同时学习用户和项目嵌入建议系统 2506.22648v2

Authors (2): Pedro R. Pires, Tiago A. Almeida

Over the past decade, recommender systems have experienced a surge in popularity. Despite notable progress, they grapple with challenging issues, such as high data dimensionality and sparseness. Representing users and items as low-dimensional embeddings learned via neural networks has become a leading solution. However, while recent studies show promising results, many approaches rely on complex architectures or require content data, which may not always be available. This paper presents Interact2Vec, a novel neural network-based model that simultaneously learns distributed embeddings for users and items while demanding only implicit feedback. The model employs state-of-the-art strategies that natural language processing models commonly use to optimize the training phase and enhance the final embeddings. Two types of experiments were conducted regarding the extrinsic and intrinsic quality of the model. In the former, we benchmarked the recommendations generated by Interact2Vec’s embeddings in a top-$N$ ranking problem, comparing them with six other recommender algorithms. The model achieved the second or third-best results in 30% of the datasets, being competitive with other recommenders, and has proven to be very efficient with an average training time reduction of 274% compared to other embedding-based models. Later, we analyzed the intrinsic quality of the embeddings through similarity tables. Our findings suggest that Interact2Vec can achieve promising results, especially on the extrinsic task, and is an excellent embedding-generator model for scenarios of scarce computing resources, enabling the learning of item and user embeddings simultaneously and efficiently.

近十年来,推荐者系统经历了受欢迎程度的激增。 尽管取得了显著的进步, 但它们仍然在应对挑战性的问题, 如高数据维度和稀缺性。 代表用户和项目作为通过神经网络学习的低维嵌入器, 已成为一个领先的解决办法。 然而, 尽管最近的研究显示, 许多方法都依赖于复杂的结构或需要内容数据, 可能并非总能得到。 本文展示了基于新颖神经网络的模型Interact2Vec, 这个基于神经网络的新颖模式, 既学习为用户和项目分布的嵌入, 同时又只要求提供隐含的反馈。 该模型使用最先进的战略, 自然语言处理模型通常用来优化培训阶段和加强最终嵌入。 在模型的外部和内在质量方面, 进行了两种实验。 在前者, 我们将Interact2Vec(InteractVec) 的嵌入于一个最高至N$的排名问题中, 比较了它们与其他六个推荐者测算法。 模型实现了第二个或第三个最佳的结果是30%的数据集模型, 与其他推荐者相比, 具有竞争力, 和最后嵌入率的模型比, 我们的排序的内置的内置的模型, 是一个高效的模型, 和升级的内置 。


Article 261

Title@2025-07-02 (3): Improving Consistency Models with Generator-Augmented Flows

Title: Improving Consistency Models with Generator-Augmented Flows Verbesserung von Konsistenzmodellen mit Generator-Augmented Flows 改进发电机加速流动的一致性模式 2406.09570v4

Authors (6): Thibaut Issenhuth, Sangchul Lee, Ludovic Dos Santos, Jean-Yves Franceschi, Chansoo Kim, Alain Rakotomamonjy

Consistency models imitate the multi-step sampling of score-based diffusion in a single forward pass of a neural network. They can be learned in two ways: consistency distillation and consistency training. The former relies on the true velocity field of the corresponding differential equation, approximated by a pre-trained neural network. In contrast, the latter uses a single-sample Monte Carlo estimate of this velocity field. The related estimation error induces a discrepancy between consistency distillation and training that, we show, still holds in the continuous-time limit. To alleviate this issue, we propose a novel flow that transports noisy data towards their corresponding outputs derived from a consistency model. We prove that this flow reduces the previously identified discrepancy and the noise-data transport cost. Consequently, our method not only accelerates consistency training convergence but also enhances its overall performance. The code is available at: https://github.com/thibautissenhuth/consistency_GC.

一致性模型仿照神经网络单一前方传承的基于分数的传播的多步抽样,它们可以用两种方式学习:一致性蒸馏和一致性培训。前者依赖于相应的差异方程式的真正速度场,以经过预先训练的神经网络为近。相比之下,后者使用单一的蒙特卡洛估计该速度场。相关的估计错误导致一致性蒸馏与培训之间出现差异,而我们显示,这种差异仍在持续时间限制中。为了缓解这一问题,我们提议一种新的流动,将噪音数据传送到从一致性模型得出的相应产出上。我们证明,这种流动减少了先前确定的差异和噪音数据传输成本。因此,我们的方法不仅加快了一致性培训的趋同,而且还提高了总体性能。代码见:https://github.com/thibautissenhuth/continenty_GC。


Article 262

Title@2025-07-02 (3): Enhanced Generative Model Evaluation with Clipped Density and Coverage

Title: Enhanced Generative Model Evaluation with Clipped Density and Coverage Verbesserte Generative Modellbewertung mit Clipped Dichte und Abdeckung 具有缩小密度和覆盖率的强化生成模型评价 2507.01761v1

Authors (3): Nicolas Salvy, Hugues Talbot, Bertrand Thirion

Although generative models have made remarkable progress in recent years, their use in critical applications has been hindered by their incapacity to reliably evaluate sample quality. Quality refers to at least two complementary concepts: fidelity and coverage. Current quality metrics often lack reliable, interpretable values due to an absence of calibration or insufficient robustness to outliers. To address these shortcomings, we introduce two novel metrics, Clipped Density and Clipped Coverage. By clipping individual sample contributions and, for fidelity, the radii of nearest neighbor balls, our metrics prevent out-of-distribution samples from biasing the aggregated values. Through analytical and empirical calibration, these metrics exhibit linear score degradation as the proportion of poor samples increases. Thus, they can be straightforwardly interpreted as equivalent proportions of good samples. Extensive experiments on synthetic and real-world datasets demonstrate that Clipped Density and Clipped Coverage outperform existing methods in terms of robustness, sensitivity, and interpretability for evaluating generative models.

尽管基因模型近年来取得了显著进展,但由于无法可靠地评估样本质量,在关键应用中的使用受到阻碍。质量至少是指两个互补概念:忠诚和覆盖面。目前质量指标由于缺乏校准或外部生物的强度不足,往往缺乏可靠的、可解释的数值。为了解决这些缺陷,我们引入了两种新型指标:Clipped Density 和 clipped 覆盖率。通过剪裁单个样本贡献,并为了准确性,近邻球的半透明性,我们的指标防止分配之外的样本偏向总值。通过分析和实验校准,这些指标显示随着低样品比例的增加,线性得分下降。因此,它们可以直截了当地被解释为等量的样本。合成和真实世界数据集方面的广泛实验表明,Clipped Density 和 clipped 覆盖率比现有方法的坚固性、敏感性和可解释性,在评估基因化模型方面超过了现有方法。


Article 263

Title@2025-07-02 (3): AirRadar: Inferring Nationwide Air Quality in China with Deep Neural Networks

Title: AirRadar: Inferring Nationwide Air Quality in China with Deep Neural Networks AirRadar: Rückschlüsse auf landesweite Luftqualität in China mit tiefen neuralen Netzwerken 空中雷达:与深神经网络一起推断中国全国空气质量 2501.13141v2

Authors (9): Qiongyan Wang, Yutong Xia, Siru ZHong, Weichuang Li, Yuankai Wu, Shifen Cheng, Junbo Zhang, Yu Zheng, Yuxuan Liang

Monitoring real-time air quality is essential for safeguarding public health and fostering social progress. However, the widespread deployment of air quality monitoring stations is constrained by their significant costs. To address this limitation, we introduce \emph{AirRadar}, a deep neural network designed to accurately infer real-time air quality in locations lacking monitoring stations by utilizing data from existing ones. By leveraging learnable mask tokens, AirRadar reconstructs air quality features in unmonitored regions. Specifically, it operates in two stages: first capturing spatial correlations and then adjusting for distribution shifts. We validate AirRadar’s efficacy using a year-long dataset from 1,085 monitoring stations across China, demonstrating its superiority over multiple baselines, even with varying degrees of unobserved data. The source code can be accessed at https://github.com/CityMind-Lab/AirRadar.

实时空气质量监测是保障公众健康和促进社会进步的关键,然而,空气质量监测站的广泛部署受到其巨大成本的制约。为了应对这一限制,我们引入了emmph{AirRadar},这是一个深神经网络,目的是利用现有数据,在缺乏监测站的地点准确推断实时空气质量。通过利用可学习的面具标志,AirRadar在未监测区域重建空气质量特征。具体地说,它分两个阶段运作:首先捕捉空间相关性,然后调整分布变化。我们利用中国各地1 085个监测站的长达一年的数据集验证AirRadar的功效,显示其优于多个基线,即使有不同程度的未观测数据。源代码可在https://github.com/CityMind-Lab/AirRadar查阅。


Article 264

Title@2025-07-02 (3): Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs

Title: Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs Visuelle Strukturen helfen Visual Reasoning: Das Bindeproblem in VLMs beheben 视觉结构帮助视觉理性:解决VLM 中具有约束力的问题 2506.22146v2

Authors (7): Amirmohammad Izadi, Mohammad Ali Banayeeanzade, Fatemeh Askari, Ali Rahimiakbar, Mohammad Mahdi Vahedi, Hosein Hasani, Mahdieh Soleymani Baghshah

Despite progress in Vision-Language Models (VLMs), their capacity for visual reasoning is often limited by the \textit{binding problem}: the failure to reliably associate perceptual features with their correct visual referents. This limitation underlies persistent errors in tasks such as counting, visual search, scene description, and spatial relationship understanding. A key factor is that current VLMs process visual features largely in parallel, lacking mechanisms for spatially grounded, serial attention. This paper introduces a simple yet effective intervention: augmenting visual inputs with low-level spatial structures (e.g., horizontal lines) and pairing this with a textual prompt that encourages sequential, spatially-aware parsing. We empirically demonstrate substantial performance improvements across core visual reasoning tasks. Specifically, our method improves GPT-4o visual search accuracy by 25.00%, increases counting accuracy by 26.83%, reduces edit distance error in scene description by 0.32, and enhances performance on spatial relationship tasks by 9.50% on a a 2D synthetic dataset. Furthermore, we find that the visual modification is essential for these gains; purely textual strategies, including Chain-of-Thought prompting, are insufficient and can even degrade performance. Our method enhances binding only with a single-query inference, underscoring the importance of visual input design over purely linguistically-based approaches. These findings suggest that low-level visual structuring is a powerful and underexplored direction for improving compositional visual reasoning and could serve as a general strategy for enhancing VLM performance on spatially grounded tasks.

尽管在视觉语言模型(VLM)方面取得了进展,但是其视觉推理能力往往受到以下问题的限制:没有可靠地将感知特征与其正确的视觉参考文献联系起来。这一限制是计算、视觉搜索、场景描述和空间关系理解等任务中长期错误的基础。一个关键因素是当前VLM过程视觉特征大体上平行,缺乏空间定位和序列关注机制。本文引入了一个简单而有效的干预措施:用低水平空间结构(例如水平线)增加视觉投入,并配以鼓励顺序、空间认知度分辨的文字提示。我们从经验上表明,在核心视觉推理任务中,我们的工作表现显著改善。具体地说,我们的方法将GPT-4视觉搜索精确度提高25.0%,将现场描述中的距离错误增加26.83%,在基于2D的合成数据集中将空间关系任务的绩效提高9.50%。此外,我们发现视觉修改对于这些进步至关重要;纯文本战略(包括链路段)只能用来提高直观判断的精确性能。


Article 265

Title@2025-07-02 (3): Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training

Title: Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training Tuning ohne Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training 无足足迹的注资:LLM培训后可实现的隐私和普遍化的圈子 2507.01752v1

Authors (7): Ismail Labiad, Mathurin Videau, Matthieu Kowalski, Marc Schoenauer, Alessandro Leite, Julia Kempe, Olivier Teytaud

Gradient-based optimization is the workhorse of deep learning, offering efficient and scalable training via backpropagation. However, its reliance on large volumes of labeled data raises privacy and security concerns such as susceptibility to data poisoning attacks and the risk of overfitting. In contrast, black box optimization methods, which treat the model as an opaque function, relying solely on function evaluations to guide optimization, offer a promising alternative in scenarios where data access is restricted, adversarial risks are high, or overfitting is a concern. However, black box methods also pose significant challenges, including poor scalability to high-dimensional parameter spaces, as prevalent in large language models (LLMs), and high computational costs due to reliance on numerous model evaluations. This paper introduces BBoxER, an evolutionary black-box method for LLM post-training that induces an information bottleneck via implicit compression of the training data. Leveraging the tractability of information flow, we provide strong theoretical bounds on generalization, differential privacy, susceptibility to data poisoning attacks, and robustness to extraction attacks. BBoxER operates on top of pre-trained LLMs, offering a lightweight and modular enhancement suitable for deployment in restricted or privacy-sensitive environments, in addition to non-vacuous generalization guarantees. In experiments with LLMs, we demonstrate empirically that Retrofitting methods are able to learn, showing how a few iterations of BBoxER improve performance and generalize well on a benchmark of reasoning datasets. This positions BBoxER as an attractive add-on on top of gradient-based optimization.

渐进式优化是深层次学习的一匹马,通过反向调整提供高效和可扩展的培训。然而,对大量标签数据的依赖引起了隐私和安全方面的关注,如容易发生数据中毒袭击和过度适应的风险。相比之下,黑盒优化方法将模型视为不透明功能,完全依靠功能评估来引导优化,在数据访问受限、对抗风险高或过度适应的情景中提供了一个有希望的替代方案。然而,黑盒方法也带来了重大挑战,包括大语言模型(LLLMS)中普遍存在的高维参数空间的可缩缩缩性差,以及依赖许多模型评估而导致的具有很高的计算成本。本文介绍了BBBoxER,这是LLMM后培训的进化黑盒方法,通过对培训数据进行隐蔽的压缩来造成信息瓶颈。 利用信息流动的可感动性,我们提供了关于一般信息污染攻击的广度、差异性隐私、易感受数据中毒攻击的易感力和强力的理论界限。BoxER公司在经过培训的LMMS顶端上操作,提供了一种不易变的精度的精度的精度的精度的精度测试方法,为我们在一般测试中展示的精度的精度的精度的精度和模度实验环境中展示的精度的精度的精度,以展示了一种不细度的精度。


Article 266

Title@2025-07-02 (3): Time-Series JEPA for Predictive Remote Control under Capacity-Limited Networks

Title: Time-Series JEPA for Predictive Remote Control under Capacity-Limited Networks Zeitreihen-JEPA für vorausschauende Fernsteuerung unter kapazitätsbeschränkten Netzen 在能力限制的网络下进行预测遥控的JEPA时间系列JEPA 2406.04853v2

Authors (3): Abanoub M. Girgis, Alvaro Valcarce, Mehdi Bennis

In remote control systems, transmitting large data volumes (e.g., images, video frames) from wireless sensors to remote controllers is challenging when uplink capacity is limited (e.g., RedCap devices or massive wireless sensor networks). Furthermore, controllers often need only information-rich representations of the original data. To address this, we propose a semantic-driven predictive control combined with a channel-aware scheduling to enhance control performance for multiple devices under limited network capacity. At its core, the proposed framework, coined Time-Series Joint Embedding Predictive Architecture (TS-JEPA), encodes high-dimensional sensory data into low-dimensional semantic embeddings at the sensor, reducing communication overhead. Furthermore, TS-JEPA enables predictive inference by predicting future embeddings from current ones and predicted commands, which are directly used by a semantic actor model to compute control commands within the embedding space, eliminating the need to reconstruct raw data. To further enhance reliability and communication efficiency, a channel-aware scheduling is integrated to dynamically prioritize device transmissions based on channel conditions and age of information (AoI). Simulations on inverted cart-pole systems show that the proposed framework significantly outperforms conventional control baselines in communication efficiency, control cost, and predictive accuracy. It enables robust and scalable control under limited network capacity compared to traditional scheduling schemes.

在远程控制系统中,如果连接能力有限,从无线传感器向远程控制器传输大量数据(例如图像、视频框架)将大量数据(例如图像、视频框架)具有挑战性;此外,控制器通常只需要对原始数据进行信息丰富的表述;为此,我们提议采用语义驱动预测控制,同时采用频道感知列表,以加强在网络能力有限的情况下对多种装置的控制性能。在核心方面,拟议框架、创建的时间-系统联合嵌入式预测架构(TS-JEPA),将高维感应数据编码为传感器低维感官嵌入低维度静态嵌入器,减少通信管理管理。此外,TS-JEPA通过预测从现有数据和预测指令的未来嵌入和预测指令,从而能够预测预测性地推断。 静脉动行为模型直接用于在嵌入空间内对控制指令进行配置,从而消除重建原始数据的需要。为了进一步提高可靠性和通信效率,频道感应安排将高清晰度感测数据纳入基于频道条件和传统通信系统下动态优先级的传输设备,从而在常规控制基线框架中明显地显示稳定的控制能力。


Article 267

Title@2025-07-02 (3): A Real-Time Digital Twin for Type 1 Diabetes using Simulation-Based Inference

Title: A Real-Time Digital Twin for Type 1 Diabetes using Simulation-Based Inference Ein Echtzeit-Digital Twin für Typ-1-Diabetes mit simulationsbasierter Schlussfolgerung 使用模拟推断法为1型糖尿病实时数字双 2507.01740v1

Authors (7): Trung-Dung Hoang, Alceu Bissoto, Vihangkumar V. Naik, Tim Flühmann, Artemii Shlychkov, José Garcia-Tirado, Lisa M. Koch

Accurately estimating parameters of physiological models is essential to achieving reliable digital twins. For Type 1 Diabetes, this is particularly challenging due to the complexity of glucose-insulin interactions. Traditional methods based on Markov Chain Monte Carlo struggle with high-dimensional parameter spaces and fit parameters from scratch at inference time, making them slow and computationally expensive. In this study, we propose a Simulation-Based Inference approach based on Neural Posterior Estimation to efficiently capture the complex relationships between meal intake, insulin, and glucose level, providing faster, amortized inference. Our experiments demonstrate that SBI not only outperforms traditional methods in parameter estimation but also generalizes better to unseen conditions, offering real-time posterior inference with reliable uncertainty quantification.

精确估计生理模型的参数对于实现可靠的数字双胞胎至关重要。 对于1型糖尿病来说,由于葡萄糖-胰岛素相互作用的复杂性,这尤其具有挑战性。基于Markov链链-蒙特卡洛的传统方法与高维参数空间和从推论时从零到零的参数搏斗,使其缓慢和计算成本昂贵。在这项研究中,我们提议基于神经地表内侧估计的模拟推论方法,以有效捕捉进食、胰岛素和葡萄糖水平之间的复杂关系,提供更快、摊销的推论。我们的实验表明,履行机构不仅在参数估计方面超越了传统方法,而且还向不可见的条件概括了较好的情况,提供了可靠的不确定性量化的实时后方推论。


Article 268

Title@2025-07-02 (3): Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives

Title: Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives Extrahieren von interpretierbaren Modellen aus Baumensembles: Computational and Statistical Perspectives 从树形集合中提取解释模型:计算和统计视角 2506.20114v2

Authors (3): Brian Liu, Rahul Mazumder, Peter Radchenko

Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships in the data. We propose an estimator to extract compact sets of decision rules from tree ensembles. The extracted models are accurate and can be manually examined to reveal relationships between the predictors and the response. A key novelty of our estimator is the flexibility to jointly control the number of rules extracted and the interaction depth of each rule, which improves accuracy. We develop a tailored exact algorithm to efficiently solve optimization problems underlying our estimator and an approximate algorithm for computing regularization paths, sequences of solutions that correspond to varying model sizes. We also establish novel non-asymptotic prediction error bounds for our proposed approach, comparing it to an oracle that chooses the best data-dependent linear combination of the rules in the ensemble subject to the same complexity constraint as our estimator. The bounds illustrate that the large-sample predictive performance of our estimator is on par with that of the oracle. Through experiments, we demonstrate that our estimator outperforms existing algorithms for rule extraction.

虽然这些模型在预测方面十分出色,但很难解释,而且可能无法发现数据中的有用关系。我们建议一个估算器,从树群中提取一套紧凑的决策规则。所提取的模型是准确的,可以人工检查,以揭示预测器和响应器之间的关系。我们估计器的一个关键新颖之处是,可以灵活地联合控制所采出的规则的数量和每项规则的相互作用深度,从而提高准确性。我们开发了一种量身定制的精确算法,以有效解决我们估算器背后的优化问题,以及计算正规化路径和符合不同模型大小的解决方案序列的大致算法。我们还为我们拟议的方法设置了新的非机械性预测错误,将它与选择共同对象规则中最依赖于数据的线性组合的神器相比较,与我们估算器相同的复杂性制约。这些界限表明,我们估算器的大型预测性工作业绩与我们目前测算器的测算法相比,显示我们目前测算器的测算法的形状。


Article 269

Title@2025-07-02 (3): Average Calibration Error: A Differentiable Loss for Improved Reliability in Image Segmentation

Title: Average Calibration Error: A Differentiable Loss for Improved Reliability in Image Segmentation Durchschnittlicher Kalibrierungsfehler: Ein differenzierbarer Verlust für verbesserte Zuverlässigkeit in der Bildsegmentierung 平均校准误差:图像分割法可靠性提高的可区别损失 2403.06759v2

Authors (4): Theodore Barfoot, Luis Garcia-Peraza-Herrera, Ben Glocker, Tom Vercauteren

Deep neural networks for medical image segmentation often produce overconfident results misaligned with empirical observations. Such miscalibration, challenges their clinical translation. We propose to use marginal L1 average calibration error (mL1-ACE) as a novel auxiliary loss function to improve pixel-wise calibration without compromising segmentation quality. We show that this loss, despite using hard binning, is directly differentiable, bypassing the need for approximate but differentiable surrogate or soft binning approaches. Our work also introduces the concept of dataset reliability histograms which generalises standard reliability diagrams for refined visual assessment of calibration in semantic segmentation aggregated at the dataset level. Using mL1-ACE, we reduce average and maximum calibration error by 45% and 55% respectively, maintaining a Dice score of 87% on the BraTS 2021 dataset. We share our code here: https://github.com/cai4cai/ACE-DLIRIS

用于医学图像分解的深神经网络往往产生与实验观测结果不相符的过度自信结果。 此类误校、 挑战临床翻译。 我们提议使用边缘L1平均校准错误( mL1- ACE)作为新的辅助损失功能, 来改进像素- 精准校准, 同时又不损害分解质量 。 我们显示, 尽管使用了硬盘, 但这一损失是直接不同的, 却绕过了对近似但不同代用或软宾机方法的需要 。 我们的工作还引入了数据集可靠性直方图概念, 将标准可靠性图表概括为在数据集一级对精细化的语义分解校准校准的可靠性评估。 我们使用 mL1- ACE, 将平均和最大校准错误分别减少45%和55%, 在 BRATS 2021 数据集上维持87%的Dice分数 。 我们在此共享我们的代码 : https://github.com/ cai4cai/ACE- DLIRIS) 。


Article 270

Title@2025-07-02 (3): ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving

Title: ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving ECCV 2024 W-CODA: 1. Workshop zur multimodalen Wahrnehmung und Verständlichkeit von Eckfällen im autonomen Fahren ECCV 2024 W-CODA:第一次关于自主驾驶时对拐角案例的多模式认识和了解的讲习班 2507.01735v1

Authors (14): Kai Chen, Ruiyuan Gao, Lanqing Hong, Hang Xu, Xu Jia, Holger Caesar, Dengxin Dai, Bingbing Liu, Dzmitry Tsishkou, Songcen Xu, Chunjing Xu, Qiang Xu, Huchuan Lu, Dit-Yan Yeung

In this paper, we present details of the 1st W-CODA workshop, held in conjunction with the ECCV 2024. W-CODA aims to explore next-generation solutions for autonomous driving corner cases, empowered by state-of-the-art multimodal perception and comprehension techniques. 5 Speakers from both academia and industry are invited to share their latest progress and opinions. We collect research papers and hold a dual-track challenge, including both corner case scene understanding and generation. As the pioneering effort, we will continuously bridge the gap between frontier autonomous driving techniques and fully intelligent, reliable self-driving agents robust towards corner cases.

在本文中,我们介绍了与ECCV 2024合作举办的第一次W-CODA讲习班的详细情况。W-CODA旨在探索下一代自主驾驶角落案例的解决办法,这种解决办法得到最先进的多式联运观念和理解技术的扶持。5名学术界和工业界的发言者应邀分享最新进展和意见。我们收集研究论文并面临双轨挑战,包括边角案场理解和生成。作为开创性努力,我们将不断弥合边际自主驾驶技术与全智能、可靠的自我驾驶代理人之间的鸿沟。


Article 271

Title@2025-07-02 (3): Leveraging Genetic Algorithms for Efficient Demonstration Generation in Real-World Reinforcement Learning Environments

Title: Leveraging Genetic Algorithms for Efficient Demonstration Generation in Real-World Reinforcement Learning Environments Nutzung genetischer Algorithmen für effiziente Demonstrationsgenerierung in realen Lernumgebungen zur Stärkung der Welt 在现实世界加强学习环境中利用遗传遗传价值促进高效示范的生成 2507.00762v2

Authors (3): Tom Maus, Asma Atamna, Tobias Glasmachers

Reinforcement Learning (RL) has demonstrated significant potential in certain real-world industrial applications, yet its broader deployment remains limited by inherent challenges such as sample inefficiency and unstable learning dynamics. This study investigates the utilization of Genetic Algorithms (GAs) as a mechanism for improving RL performance in an industrially inspired sorting environment. We propose a novel approach in which GA-generated expert demonstrations are used to enhance policy learning. These demonstrations are incorporated into a Deep Q-Network (DQN) replay buffer for experience-based learning and utilized as warm-start trajectories for Proximal Policy Optimization (PPO) agents to accelerate training convergence. Our experiments compare standard RL training with rule-based heuristics, brute-force optimization, and demonstration data, revealing that GA-derived demonstrations significantly improve RL performance. Notably, PPO agents initialized with GA-generated data achieved superior cumulative rewards, highlighting the potential of hybrid learning paradigms, where heuristic search methods complement data-driven RL. The utilized framework is publicly available and enables further research into adaptive RL strategies for real-world applications.

在某些现实世界的工业应用中,强化学习(RL)显示出了巨大的潜力,然而,其更广泛的部署仍然受到效率低下和不稳定学习动态等抽样调查等内在挑战的限制。本研究调查了利用遗传地算法(GAs)作为在工业激励的分类环境中提高RL性能的机制。我们建议一种新颖的办法,即利用GA产生的专家示范来提高政策学习。这些示范已被纳入一个深Q-网络(DQN),为基于经验的学习重新发挥缓冲作用,并用作刺激政策最佳化(PPPPO)代理的热启动轨迹,以加速培训趋同。我们的实验将标准RL培训与基于规则的Heurist、布鲁特力优化和示范数据进行比较,表明GA派的演示大大提高了RL的性能。值得注意的是,以GA产生的数据为首创的PPOs代理取得了高额累积收益,突出了混合学习模式的潜力,其中的超潮搜索方法补充了数据驱动的RL。使用的框架是公开的,并能够进一步研究实际应用的适应性RL战略。


Article 272

Title@2025-07-02 (3): Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach

Title: Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach Token Communication in der Ära der großen Modelle: Ein auf Informationen basierender Ansatz 大模式时代中的 Tokon 通信:基于信息瓶颈的办法 2507.01728v1

Authors (6): Hao Wei, Wanli Ni, Wen Wang, Wenjun Xu, Dusit Niyato, Ping Zhang

This letter proposes UniToCom, a unified token communication paradigm that treats tokens as the fundamental units for both processing and wireless transmission. Specifically, to enable efficient token representations, we propose a generative information bottleneck (GenIB) principle, which facilitates the learning of tokens that preserve essential information while supporting reliable generation across multiple modalities. By doing this, GenIB-based tokenization is conducive to improving the communication efficiency and reducing computational complexity. Additionally, we develop $\sigma$-GenIB to address the challenges of variance collapse in autoregressive modeling, maintaining representational diversity and stability. Moreover, we employ a causal Transformer-based multimodal large language model (MLLM) at the receiver to unify the processing of both discrete and continuous tokens under the next-token prediction paradigm. Simulation results validate the effectiveness and superiority of the proposed UniToCom compared to baselines under dynamic channel conditions. By integrating token processing with MLLMs, UniToCom enables scalable and generalizable communication in favor of multimodal understanding and generation, providing a potential solution for next-generation intelligent communications.

这封信建议 UniToCom , 这是一种统一的象征性通信模式, 将象征性物品视为处理和无线传输的基本单位。 具体地说, 为了能够高效的象征性表示, 我们提议了一个基因化信息瓶颈(GenIB) 原则, 该原则有助于学习保存基本信息的象征物, 同时支持以多种方式进行可靠的生成。 以此为基础, GenIB 的象征物化有助于提高通信效率和降低计算复杂性。 此外, 我们开发了 $\ sigma$- GenIB , 以应对自动递减模型中差异性崩溃的挑战, 维护代表性的多样性和稳定性。 此外, 我们在接收器中采用了一个基于因果变异的多式联运大型语言模型( MLLM ) , 以统一在下一调的预测模式下对离散和连续的象征物的处理。 模拟结果验证了拟议 UniCom 的有效性和优越性, 与动态频道条件下的基线相比较。 通过将象征性处理与 MLLMS 整合, UNToCom 使得可扩展和可推广的通信, 有利于多式理解和生成, 提供下一代智能通信的潜在解决方案。


Article 273

Title@2025-07-02 (3): Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models

Title: Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models Unified Triplet-Level Halluzination Evaluation für große Vision-Sprache Modelle 大型视觉语言模型统一三维级幻觉评价 2410.23114v3

Authors (4): Junjie Wu, Tsz Ting Chung, Kai Chen, Dit-Yan Yeung

Despite the outstanding performance in vision-language reasoning, Large Vision-Language Models (LVLMs) might generate hallucinated contents that do not exist in the given image. Most existing LVLM hallucination benchmarks are constrained to evaluate the object-related hallucinations. However, the potential hallucination on the relations between two objects, i.e., relation hallucination, still lacks investigation. To remedy that, we design a unified framework to measure the object and relation hallucination in LVLMs simultaneously. The core idea of our framework is to evaluate hallucinations via (object, relation, object) triplets extracted from LVLMs’ responses, making it easily generalizable to different vision-language tasks. Based on our framework, we further introduce Tri-HE, a novel Triplet-level Hallucination Evaluation benchmark which can be used to study both object and relation hallucination at the same time. With comprehensive evaluations on Tri-HE, we observe that the relation hallucination issue is even more serious than object hallucination among existing LVLMs, highlighting a previously neglected problem towards reliable LVLMs. Moreover, based on our findings, we design a simple training-free approach that effectively mitigates hallucinations for LVLMs. Our dataset and code for the reproduction of our experiments are available publicly at https://github.com/wujunjie1998/Tri-HE.

尽管在视觉语言推理方面表现出色,大型视觉语言模型(LVLM)可能会产生在特定图像中不存在的幻觉内容。大多数现有的LVLM幻觉基准都不得不评估与目标有关的幻觉。然而,对两个对象之间的关系的潜在幻觉,即关系幻觉,仍然缺乏调查。为了纠正这一点,我们设计了一个统一框架,以同时测量LVLMs中的对象和关系幻觉。我们框架的核心思想是通过LVMs答复中提取的幻觉(对象、关系、对象)三重幻觉,使LVLMS易于将其推广到不同的视觉语言任务中。此外,我们根据我们的框架,进一步引入了Tri-HE,一个全新的三重幻觉评价标准,可以同时用于研究对象和关系幻觉。在对Tri-HE的全面评价中,我们观察到,与LVLMMs之间的关系比目标幻觉问题更为严重,突出了以前被忽视的LVLMS的问题。此外,我们根据我们的研究结果,设计了一个简单的培训/MLVMS/J 有效减少我们现有的数据。


Article 274

Title@2025-07-02 (3): NGAT: A Node-level Graph Attention Network for Long-term Stock Prediction

Title: NGAT: A Node-level Graph Attention Network for Long-term Stock Prediction NGAT: Ein knotenbasiertes Graphen-Achtungsnetzwerk für langfristige Bestandsvorhersage NGAT: 长期库存预测节点图表关注网络 2507.02018v1

Authors (4): Yingjie Niu, Mingchuan Zhao, Valerio Poti, Ruihai Dong

Graph representation learning methods have been widely adopted in financial applications to enhance company representations by leveraging inter-firm relationships. However, current approaches face three key challenges: (1) The advantages of relational information are obscured by limitations in downstream task designs; (2) Existing graph models specifically designed for stock prediction often suffer from excessive complexity and poor generalization; (3) Experience-based construction of corporate relationship graphs lacks effective comparison of different graph structures. To address these limitations, we propose a long-term stock prediction task and develop a Node-level Graph Attention Network (NGAT) specifically tailored for corporate relationship graphs. Furthermore, we experimentally demonstrate the limitations of existing graph comparison methods based on model downstream task performance. Experimental results across two datasets consistently demonstrate the effectiveness of our proposed task and model. The project is publicly available on GitHub to encourage reproducibility and future research.

在金融应用中广泛采用图表代表学习方法,利用公司间关系加强公司代表性,但是,目前的做法面临三个主要挑战:(1) 下游任务设计的局限性掩盖了关系信息的优势;(2) 专门为股票预测设计的现有图表模型往往过于复杂和笼统化程度不高;(3) 公司关系图表的基于经验的构建缺乏对不同图表结构的有效比较;为解决这些局限性,我们提出长期股票预测任务,并开发专为公司关系图表设计的节点图表关注网络(NGAT),此外,我们实验性地展示了基于下游任务业绩模型的现有图表比较方法的局限性;两个数据集的实验结果始终表明我们拟议任务和模型的有效性;该项目在GitHub上公开,以鼓励再生和未来研究。


Article 275

Title@2025-07-02 (3): Revisiting Learning Rate Control

Title: Revisiting Learning Rate Control Überprüfung der Steuerung der Lernquoten 重新审查学习率控制 2507.01724v1

Authors (3): Micha Henheik, Theresa Eimer, Marius Lindauer

The learning rate is one of the most important hyperparameters in deep learning, and how to control it is an active area within both AutoML and deep learning research. Approaches for learning rate control span from classic optimization to online scheduling based on gradient statistics. This paper compares paradigms to assess the current state of learning rate control. We find that methods from multi-fidelity hyperparameter optimization, fixed-hyperparameter schedules, and hyperparameter-free learning often perform very well on selected deep learning tasks but are not reliable across settings. This highlights the need for algorithm selection methods in learning rate control, which have been neglected so far by both the AutoML and deep learning communities. We also observe a trend of hyperparameter optimization approaches becoming less effective as models and tasks grow in complexity, even when combined with multi-fidelity approaches for more expensive model trainings. A focus on more relevant test tasks and new promising directions like finetunable methods and meta-learning will enable the AutoML community to significantly strengthen its impact on this crucial factor in deep learning.

学习率是深层学习中最重要的超参数之一,而如何控制它则是自动ML和深层学习研究中的一个活跃领域。 学习率控制方法从传统的优化到基于梯度统计的在线时间安排。 本文比较了评估学习率控制现状的范式。 我们发现,多信仰超光度优化、固定健康计表和超光度免费学习方法在选定深层学习任务方面往往表现很好,但在各种情况下并不可靠。 这突出表明了在学习率控制中采用算法选择方法的必要性,而这种方法迄今为止已经被自动ML和深层学习社区所忽视。 我们还观察到,随着模型和任务日益复杂,即使与更昂贵模型培训的多信仰方法相结合,超光量优化方法也越来越不那么有效。 侧重于更相关的测试任务和新的有希望的方向,如微调方法和元学习,将使自动ML社群能够大大加强其在深层学习中对这一关键因素的影响。


Article 276

Title@2025-07-02 (3): 15,500 Seconds: Lean UAV Classification Leveraging PEFT and Pre-Trained Networks

Title: 15,500 Seconds: Lean UAV Classification Leveraging PEFT and Pre-Trained Networks 15.500 Sekunden: Lean UAV Classification Leveraging PEFT und Pre-Trained Networks 15,500秒: Lean UAV 分类利用PEFT和训练前网络 2506.11049v2

Authors (3): Andrew P. Berg, Qian Zhang, Mia Y. Wang

Unmanned Aerial Vehicles (UAVs) pose an escalating security concerns as the market for consumer and military UAVs grows. This paper address the critical data scarcity challenges in deep UAV audio classification. We build upon our previous work expanding novel approaches such as: parameter efficient fine-tuning, data augmentation, and pre-trained networks. We achieve performance upwards of 95\% validation accuracy with EfficientNet-B0.

随着消费和军用无人驾驶飞行器市场的扩大,无人驾驶飞行器(无人驾驶飞行器)的安全顾虑日益加剧。本文论述深层无人驾驶飞行器音频分类中的关键数据稀缺挑战。我们以以前的工作为基础,扩大新的方法,例如:参数高效微调、数据增强和预先培训的网络。我们以高效网-B0实现了95的验证准确性。


Article 277

Title@2025-07-02 (3): A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation

Title: A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation Eine Überprüfung der Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation 对贝叶斯不确定因素在深概率图像分割中量化的回顾 2411.16370v4

Authors (5): M. M. A. Valiuddin, R. J. G. van Sloun, C. G. A. Viviers, P. H. N. de With, F. van der Sommen

Advancements in image segmentation play an integral role within the broad scope of Deep Learning-based Computer Vision. Furthermore, their widespread applicability in critical real-world tasks has resulted in challenges related to the reliability of such algorithms. Hence, uncertainty quantification has been extensively studied within this context, enabling the expression of model ignorance (epistemic uncertainty) or data ambiguity (aleatoric uncertainty) to prevent uninformed decision-making. Due to the rapid adoption of Convolutional Neural Network (CNN)-based segmentation models in high-stake applications, a substantial body of research has been published on this very topic, causing its swift expansion into a distinct field. This work provides a comprehensive overview of probabilistic segmentation, by discussing fundamental concepts of uncertainty quantification, governing advancements in the field as well as the application to various tasks. Moreover, literature on both types of uncertainties trace back to four key applications: (1) to quantify statistical inconsistencies in the annotation process due ambiguous images, (2) correlating prediction error with uncertainty, (3) expanding the model hypothesis space for better generalization, and (4) Active Learning. An extensive discussion follows that includes an overview of utilized datasets for each of the applications and evaluation of the available methods. We also highlight challenges related to architectures, uncertainty quantification methods, standardization and benchmarking, and finally end with recommendations for future work such as methods based on single forward passes and models that appropriately leverage volumetric data.

图像分割方面的进步在深层学习计算机愿景的广泛范围内发挥着不可或缺的作用。此外,这些进步在基于深层学习的计算机愿景中具有广泛适用性,在关键的现实世界任务中产生了与这种算法可靠性有关的挑战。因此,在这一背景下,对不确定性量化进行了广泛研究,从而能够表达模型无知(普遍不确定性)或数据模糊性(隐含不确定性),以防止不知情的决策。由于迅速采用以革命神经网络为基础的高吸量应用中的分类模型,因此就这个主题发表了大量的研究报告,导致其迅速扩展到一个不同的领域。这项工作通过讨论不确定性量化的基本概念、规范实地进展和各种任务的应用,全面概述了概率分割。此外,关于这两种不确定性的文献可追溯到四个主要应用:(1) 量化说明过程中的统计不一致性,因为图像模糊性;(2) 将预测错误与不确定性相联系,(3) 扩大模型假设空间,以更好地概括化,以及(4) 积极学习。随后进行的广泛讨论,包括了对已利用的不确定性量化概念的基本概念、实地进展以及各种标准化方法,我们最后还着重指出了各种标准化的方法,以及各种标准化的标准化方法,以及各种标准化的标准化方法,以便适当地评估。


Article 278

Title@2025-07-02 (3): Agent Ideate: A Framework for Product Idea Generation from Patents Using Agentic AI

Title: Agent Ideate: A Framework for Product Idea Generation from Patents Using Agentic AI Agent-Idee: Ein Rahmen für Produkt-Ideen-Erzeugung aus Patenten mit Agent-KI Agent Ideate: 使用Agentic AI 专利产品创意一代框架 2507.01717v1

Authors (4): Gopichand Kanumolu, Ashok Urlana, Charaka Vinayak Kumar, Bala Mallikarjunarao Garlapati

Patents contain rich technical knowledge that can inspire innovative product ideas, yet accessing and interpreting this information remains a challenge. This work explores the use of Large Language Models (LLMs) and autonomous agents to mine and generate product concepts from a given patent. In this work, we design Agent Ideate, a framework for automatically generating product-based business ideas from patents. We experimented with open-source LLMs and agent-based architectures across three domains: Computer Science, Natural Language Processing, and Material Chemistry. Evaluation results show that the agentic approach consistently outperformed standalone LLMs in terms of idea quality, relevance, and novelty. These findings suggest that combining LLMs with agentic workflows can significantly enhance the innovation pipeline by unlocking the untapped potential of business idea generation from patent data.

专利包含丰富的技术知识,可以激发创新产品想法,但获取和解释这种信息仍是一项挑战。这项工作探索了使用大语言模型和自主代理商来开采和从特定专利中产生产品概念。在这项工作中,我们设计了“设计”代理商,这是一个自动产生专利产品商业理念的框架。我们试验了在计算机科学、自然语言处理和材料化学三个领域(计算机科学、自然语言处理和材料化学)的开放源LLM和代理商建筑。评价结果显示,在思想质量、相关性和新颖性方面,该代理商做法一贯优于独立的LMS。这些研究结果表明,将LMS与代理工作流程相结合,通过从专利数据中释放尚未开发的产生商业理念的潜力,可以极大地增强创新管道。


Article 279

Title@2025-07-02 (3): Enhancing Expressivity of Quantum Neural Networks Based on the SWAP test

Title: Enhancing Expressivity of Quantum Neural Networks Based on the SWAP test Steigerung der Expressivität von Quantum-Neuralen Netzwerken auf Basis des SWAP-Tests 根据全部门办法测试,提高量子神经网络的表达性 2506.16938v2

Authors (4): Sebastian Nagies, Emiliano Tolotti, Davide Pastorello, Enrico Blanzieri

Parameterized quantum circuits represent promising architectures for machine learning applications, yet many lack clear connections to classical models, potentially limiting their ability to translate the wide success of classical neural networks to the quantum realm. We examine a specific type of quantum neural network (QNN) built exclusively from SWAP test circuits, and discuss its mathematical equivalence to a classical two-layer feedforward network with quadratic activation functions under amplitude encoding. Our analysis across classical real-world and synthetic datasets reveals that while this architecture can successfully learn many practical tasks, it exhibits fundamental expressivity limitations due to violating the universal approximation theorem, particularly failing on harder problems like the parity check function. To address this limitation, we introduce a circuit modification using generalized SWAP test circuits that effectively implements classical neural networks with product layers. This enhancement enables successful learning of parity check functions in arbitrary dimensions which we analytically argue to be impossible for the original architecture beyond two dimensions regardless of network size. Our results establish a framework for enhancing QNN expressivity through classical task analysis and demonstrate that our SWAP test-based architecture offers broad representational capacity, suggesting potential promise also for quantum learning tasks.

参数化量子电路代表了机器学习应用的有前途的结构,但许多参数缺乏与古典模型的明确连接,从而有可能限制它们将古典神经网络的广泛成功转化为量子领域的能力。我们检查了专门从SWAP测试电路建造的一类特定的量子神经网络(QNN),并讨论了其数学等同性与传统的双层向进化网络,在振幅下具有四层激活功能。我们对古典现实世界和合成数据集的分析表明,虽然这一结构能够成功学习许多实际任务,但由于违反普世近似理论,特别是无法解决像对等检查功能这样的更棘手问题,它表现出根本的表达性局限性。为了解决这一局限性,我们采用了一种使用通用SWAP测试电路来有效地实施带有产品层的古典神经网络(QNNN)的电路路变电路。这种增强使得在任意的维度上成功学习等同性检查功能,我们认为,无论网络大小如何,原始结构不可能超越两个维度。我们的结果建立了一个框架,通过经典任务分析加强QNN的直观性,并表明我们的SWAP测试架构提供了广泛的代表性能力,也承诺进行量级学习任务。


Article 280

Title@2025-07-02 (3): B-PL-PINN: Stabilizing PINN Training with Bayesian Pseudo Labeling

Title: B-PL-PINN: Stabilizing PINN Training with Bayesian Pseudo Labeling B-PL-PINN: Stabilisierendes PINN-Training mit Bayesian Pseudo-Etikettierung B-PL-PINN:与Bayesian Pseudo标签公司一起稳定PINN培训 2507.01714v1

Authors (3): Kevin Innerebner, Franz M. Rohrhofer, Bernhard C. Geiger

Training physics-informed neural networks (PINNs) for forward problems often suffers from severe convergence issues, hindering the propagation of information from regions where the desired solution is well-defined. Haitsiukevich and Ilin (2023) proposed an ensemble approach that extends the active training domain of each PINN based on i) ensemble consensus and ii) vicinity to (pseudo-)labeled points, thus ensuring that the information from the initial condition successfully propagates to the interior of the computational domain. In this work, we suggest replacing the ensemble by a Bayesian PINN, and consensus by an evaluation of the PINN’s posterior variance. Our experiments show that this mathematically principled approach outperforms the ensemble on a set of benchmark problems and is competitive with PINN ensembles trained with combinations of Adam and LBFGS.

针对远期问题的物理学知情培训神经网络(PINNs)往往面临严重的趋同问题,妨碍传播来自那些有明确理想解决办法的地区的信息。Haitsiukevich和Ilin(2023年)提出了一个共同方法,根据(一)共同共识,将每个PINN的积极培训领域扩大到(假体-)标签点附近(假体-),从而确保初始状态的信息能够成功地传播到计算域的内部。在这项工作中,我们建议用Bayesian PINN取代集合,并通过对PINN的后遗差异进行评估达成共识。我们的实验表明,这一有数学原则的方法超越了一组基准问题的组合,并且与经过亚当和LBFGS组合培训的PINNencommbles具有竞争力。


Article 281

Title@2025-07-02 (3): Grower-in-the-Loop Interactive Reinforcement Learning for Greenhouse Climate Control

Title: Grower-in-the-Loop Interactive Reinforcement Learning for Greenhouse Climate Control Grower-in-the-Loop Interaktives Verstärkungslernen für Greenhouse Climate Control 种植者在Loop-Loop 互动强化学习促进温室气候控制 2505.23355v2

Authors (4): Maxiu Xiao, Jianglin Lan, Jingxin Yu, Congcong Sun

Climate control is crucial for greenhouse production as it directly affects crop growth and resource use. Reinforcement learning (RL) has received increasing attention in this field, but still faces challenges, including limited training efficiency and high reliance on initial learning conditions. Interactive RL, which combines human (grower) input with the RL agent’s learning, offers a potential solution to overcome these challenges. However, interactive RL has not yet been applied to greenhouse climate control and may face challenges related to imperfect inputs. Therefore, this paper aims to explore the possibility and performance of applying interactive RL with imperfect inputs into greenhouse climate control, by: (1) developing three representative interactive RL algorithms tailored for greenhouse climate control (reward shaping, policy shaping and control sharing); (2) analyzing how input characteristics are often contradicting, and how the trade-offs between them make grower’s inputs difficult to perfect; (3) proposing a neural network-based approach to enhance the robustness of interactive RL agents under limited input availability; (4) conducting a comprehensive evaluation of the three interactive RL algorithms with imperfect inputs in a simulated greenhouse environment. The demonstration shows that interactive RL incorporating imperfect grower inputs has the potential to improve the performance of the RL agent. RL algorithms that influence action selection, such as policy shaping and control sharing, perform better when dealing with imperfect inputs, achieving 8.4% and 6.8% improvement in profit, respectively. In contrast, reward shaping, an algorithm that manipulates the reward function, is sensitive to imperfect inputs and leads to a 9.4% decrease in profit. This highlights the importance of selecting an appropriate mechanism when incorporating imperfect inputs.

气候控制对温室生产至关重要,因为它直接影响到作物增长和资源使用。强化学习(RL)在这一领域受到越来越多的关注,但仍然面临挑战,包括培训效率有限和高度依赖初始学习条件。交互式RL将人(growwer)的投入与RL代理的学习结合起来,为克服这些挑战提供了潜在的解决办法。然而,互动的RL尚未应用于温室气候控制,并可能面临与不完善的投入有关的挑战。因此,本文件旨在探讨应用互动式RL(RL)和不完善的投入对温室气候控制进行互动RL的可能性和性能,具体做法是:(1) 开发三种具有代表性的互动式RL(RL)算法,专门用于温室气体控制重要投入(升级成型、政策制定和共享);(2) 分析投入特点常常与R(growwer)交错,使种植者的投入难以完善;(3) 提出以神经网络为基础的方法,在有限的投入提供量的情况下,增强互动RL算法的稳健性;(4) 在模拟的温室环境中,对具有不完善投入的三种互动的RL算法进行全面评价。演示表明,将不完善的RL的RL(ralimalevilevalimalevalim )的计算,在改进过程中,这种算值的精定值的精定值的精细性能能能的精细性能能能能改进了精准性能,从而改进了精制成成成成成成成成成成成成成成成能,从而改进了RL,从而改进了RL 改进了RL 改进了Ral性能性能,使RL 改进了RL 。


Article 282

Title@2025-07-02 (3): Relational Causal Discovery with Latent Confounders

Title: Relational Causal Discovery with Latent Confounders Beziehungale kausale Entdeckung mit latenten Gründern 与内部扰乱者发生关系性因果发现 2507.01700v1

Authors (5): Andrea Piras, Matteo Negro, Ragib Ahsan, David Arbour, Elena Zheleva

Estimating causal effects from real-world relational data can be challenging when the underlying causal model and potential confounders are unknown. While several causal discovery algorithms exist for learning causal models with latent confounders from data, they assume that the data is independent and identically distributed (i.i.d.) and are not well-suited for learning from relational data. Similarly, existing relational causal discovery algorithms assume causal sufficiency, which is unrealistic for many real-world datasets. To address this gap, we propose RelFCI, a sound and complete causal discovery algorithm for relational data with latent confounders. Our work builds upon the Fast Causal Inference (FCI) and Relational Causal Discovery (RCD) algorithms and it defines new graphical models, necessary to support causal discovery in relational domains. We also establish soundness and completeness guarantees for relational d-separation with latent confounders. We present experimental results demonstrating the effectiveness of RelFCI in identifying the correct causal structure in relational causal models with latent confounders.

当潜在的因果关系模型和潜在混淆者不为人知时,从真实世界关系数据得出的估计因果关系可能具有挑战性。虽然在与数据的潜在混淆者学习因果关系模型方面存在着若干因果发现算法,但它们假定数据是独立的,分布相同(i.d.),不适合从关联数据中学习。同样,现有的因果发现算法假定因果关系的充分性,这对许多真实世界的数据集是不现实的。为弥补这一差距,我们提议RelFCI,这是与潜在融和者进行关系数据的健全和完整的因果发现算法。我们的工作以快速剖析和再联系构造解析算法为基础,并确定了新的图形模型,以支持在相关领域的因果发现。我们还为与潜在粘结者的关系分离建立了健全和完整的保障。我们介绍了RelFCI在确定与潜在粘结者相关的因果关系模型的正确因果关系结构方面所取得的实验结果。


Article 283

Title@2025-07-02 (3): Variational Graph Convolutional Neural Networks

Title: Variational Graph Convolutional Neural Networks Variationale Graphenkonvolutionale Neurale Netzwerke 变形图 革命神经神经网络 2507.01699v1

Authors (3): Illia Oleksiienko, Juho Kanniainen, Alexandros Iosifidis

Estimation of model uncertainty can help improve the explainability of Graph Convolutional Networks and the accuracy of the models at the same time. Uncertainty can also be used in critical applications to verify the results of the model by an expert or additional models. In this paper, we propose Variational Neural Network versions of spatial and spatio-temporal Graph Convolutional Networks. We estimate uncertainty in both outputs and layer-wise attentions of the models, which has the potential for improving model explainability. We showcase the benefits of these models in the social trading analysis and the skeleton-based human action recognition tasks on the Finnish board membership, NTU-60, NTU-120 and Kinetics datasets, where we show improvement in model accuracy in addition to estimated model uncertainties.

对模型不确定性的估计有助于改善图变网络的解释性和模型的准确性,不确定性也可以用于专家或其他模型核查模型结果的关键应用中,我们在本文件中提议空间和时空图变异神经网络版本,我们估计模型产出和分层注意的不确定性,这有可能改进模型解释性,我们在社会贸易分析以及芬兰董事会成员NTU-60、NTU-120和动因数据集基于骨骼的人类行动识别任务中展示这些模型的效益,我们在那里除了估计模型不确定性外,还显示模型准确性方面的改进。


Article 284

Title@2025-07-02 (3): Dynamic Similarity Graph Construction with Kernel Density Estimation

Title: Dynamic Similarity Graph Construction with Kernel Density Estimation Dynamische Ähnlichkeit Graphenkonstruktion mit Kernel Dichteschätzung 具有内核密度估计的动态相似性图建图 2507.01696v1

Authors (3): Steinar Laenen, Peter Macgregor, He Sun

In the kernel density estimation (KDE) problem, we are given a set $X$ of data points in $\mathbb{R}^d$, a kernel function $k: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}$, and a query point $\mathbf{q} \in \mathbb{R}^d$, and the objective is to quickly output an estimate of $\sum_{\mathbf{x} \in X} k(\mathbf{q}, \mathbf{x})$. In this paper, we consider $\textsf{KDE}$ in the dynamic setting, and introduce a data structure that efficiently maintains the estimates for a set of query points as data points are added to $X$ over time. Based on this, we design a dynamic data structure that maintains a sparse approximation of the fully connected similarity graph on $X$, and develop a fast dynamic spectral clustering algorithm. We further evaluate the effectiveness of our algorithms on both synthetic and real-world datasets.

在内核密度估计(KDE)问题中,我们得到一套以$mathbb{Rd$($mathbb{Rd$)计算的数据点的一套美元美元,一个内核函数美元:\mathbb{Rd\time\mathbb{R}$;一个查询点$\mathbf{ {q}(在\mathbb{Rd$)中,一个查询点$\mathb{Rd$(美元),目标是快速输出一个以$sum=mathbbf{x}\in X} k} k(\mathbf{q}},\mathbf{x}}美元计算的数据点的估计数。在本文中,我们考虑在动态环境中的$\ textsf{KDE}$,并引入一个数据结构,有效地维持一套查询点的估计数,因为数据点会随着时间增加到$X$X美元。在此基础上,我们设计一个动态的数据结构,它将维持一个完全相连的图表的鲜密近似近,并发展一个快速动态的合成的合成光谱数据组合。我们进一步评估了我们的数据的效能。


Article 285

Title@2025-07-02 (3): FAMES: Fast Approximate Multiplier Substitution for Mixed-Precision Quantized DNNs–Down to 2 Bits!

Title: FAMES: Fast Approximate Multiplier Substitution for Mixed-Precision Quantized DNNs–Down to 2 Bits! FAMES: Fast ungefähre Multiplizierer-Substitution für gemischt-Präzision Quantisierte DNNs–Down auf 2 Bits! 快速近似乘数替代混合精度量化 DNNs - 下到 2 位! 2411.18055v3

Authors (4): Yi Ren, Ruge Xu, Xinfei Guo, Weikang Qian

A widely-used technique in designing energy-efficient deep neural network (DNN) accelerators is quantization. Recent progress in this direction has reduced the bitwidths used in DNN down to 2. Meanwhile, many prior works apply approximate multipliers (AppMuls) in designing DNN accelerators to lower their energy consumption. Unfortunately, these works still assume a bitwidth much larger than 2, which falls far behind the state-of-the-art in quantization area and even challenges the meaningfulness of applying AppMuls in DNN accelerators, since a high-bitwidth AppMul consumes much more energy than a low-bitwidth exact multiplier! Thus, an important problem to study is: Can approximate multipliers be effectively applied to quantized DNN models with very low bitwidths? In this work, we give an affirmative answer to this question and present a systematic solution that achieves the answer: FAMES, a fast approximate multiplier substitution method for mixed-precision DNNs. Our experiments demonstrate an average 28.67% energy reduction on state-of-the-art mixed-precision quantized models with bitwidths as low as 2 bits and accuracy losses kept under 1%. Additionally, our approach is up to 300x faster than previous genetic algorithm-based methods.

在设计高能效深神经网络(DNN)加速器(DNN)过程中,广泛使用的技术是量化的。最近在这方面的进展将DNN使用的小字节降低到2。与此同时,许多先前的工程在设计DNN加速器时应用了近似乘数(AppMuls)来降低能源消耗。不幸的是,这些工程仍然假定比2小得多,远远大于2小点,远远落后于量化区域的最新技术,甚至挑战了在DNN加速器中应用AppMuls的有意义性,因为高位位数AppMul的消耗能量远远大于低位位数精确乘数的倍数。因此,一个需要研究的重要问题是:近似乘数乘数可以有效地应用于以非常低位宽度的四点点点点数模型?在这项工作中,我们给出了一个肯定的答案,并提出了一个系统化的解决方案,即:FAMES,一种用于混合精度DNNP的快速近似倍替代方法。我们的实验展示了比低位数-六十七分点二的精度的精度模型,比位数减少了1的精度减少了。我们以前的位位位数模式。


Article 286

Title@2025-07-02 (3): PERTINENCE: Input-based Opportunistic Neural Network Dynamic Execution

Title: PERTINENCE: Input-based Opportunistic Neural Network Dynamic Execution PERTINENCE: Inputbasierte opportunistische neurale Netzwerk-Dynamische Ausführung 以投入为基础的机会神经网络动态执行 2507.01695v1

Authors (3): Omkar Shende, Gayathri Ananthanarayanan, Marcello Traiola

Deep neural networks (DNNs) have become ubiquitous thanks to their remarkable ability to model complex patterns across various domains such as computer vision, speech recognition, robotics, etc. While large DNN models are often more accurate than simpler, lightweight models, they are also resource- and energy-hungry. Hence, it is imperative to design methods to reduce reliance on such large models without significant degradation in output accuracy. The high computational cost of these models is often necessary only for a reduced set of challenging inputs, while lighter models can handle most simple ones. Thus, carefully combining properties of existing DNN models in a dynamic, input-based way opens opportunities to improve efficiency without impacting accuracy. In this work, we introduce PERTINENCE, a novel online method designed to analyze the complexity of input features and dynamically select the most suitable model from a pre-trained set to process a given input effectively. To achieve this, we employ a genetic algorithm to explore the training space of an ML-based input dispatcher, enabling convergence towards the Pareto front in the solution space that balances overall accuracy and computational efficiency. We showcase our approach on state-of-the-art Convolutional Neural Networks (CNNs) trained on the CIFAR-10 and CIFAR-100, as well as Vision Transformers (ViTs) trained on TinyImageNet dataset. We report results showing PERTINENCE’s ability to provide alternative solutions to existing state-of-the-art models in terms of trade-offs between accuracy and number of operations. By opportunistically selecting among models trained for the same task, PERTINENCE achieves better or comparable accuracy with up to 36% fewer operations.

深心神经网络(DNNs)已经变得无处不在,因为它们在计算机视觉、语音识别、机器人等不同领域建模复杂模式的能力非常高。虽然大型DNN模型往往比简单的轻量模型更准确,但它们也是资源和能源缺乏型号。因此,必须设计一些方法,以减少对大型模型的依赖,而不会显著降低产出的准确性。这些模型的高计算成本往往仅对于减少具有挑战性的投入而言是必要的,而较轻的模型则能够处理最简单的投入。因此,以动态的、基于输入的准确性方法将现有DNNN模型的特性仔细结合起来,为提高效率而不影响准确性。在这项工作中,我们引入了PERTIENCE,这是用来分析投入特征复杂性的新型在线方法,并动态地从一个经过预先培训的模型中选择最合适的模型,以便有效地处理给定的投入。为了实现这一点,我们使用基因算算算法探索一个基于ML的更少的投入发送器的培训空间的培训空间,使经过训练的Preto Front 空间在平衡总体准确性和计算效率的升级的模型之间趋于一致。我们所训练的NCNN-FAR网络运行运行中,在经过训练的模型上展示的模型上展示了我们关于州-FAR-FAR-VILIFAL-IL-I-I-I-I-I-I-I-I-I-I-I-I-C-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-ITR-I-C-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I


Article 287

Title@2025-07-02 (3): EP-Diffuser: An Efficient Diffusion Model for Traffic Scene Generation and Prediction via Polynomial Representations

Title: EP-Diffuser: An Efficient Diffusion Model for Traffic Scene Generation and Prediction via Polynomial Representations EP-Diffusor: Ein effizientes Diffusionsmodell für die Verkehrsszenengenerierung und -vorhersage über polynomische Darstellungen EP-Diffuser:通过多边代表制有效传播交通景点生成和预测模式 2504.05422v2

Authors (4): Yue Yao, Mohamed-Khalil Bouzidi, Daniel Goehring, Joerg Reichardt

As the prediction horizon increases, predicting the future evolution of traffic scenes becomes increasingly difficult due to the multi-modal nature of agent motion. Most state-of-the-art (SotA) prediction models primarily focus on forecasting the most likely future. However, for the safe operation of autonomous vehicles, it is equally important to cover the distribution for plausible motion alternatives. To address this, we introduce EP-Diffuser, a novel parameter-efficient diffusion-based generative model designed to capture the distribution of possible traffic scene evolutions. Conditioned on road layout and agent history, our model acts as a predictor and generates diverse, plausible scene continuations. We benchmark EP-Diffuser against two SotA models in terms of accuracy and plausibility of predictions on the Argoverse 2 dataset. Despite its significantly smaller model size, our approach achieves both highly accurate and plausible traffic scene predictions. We further evaluate model generalization ability in an out-of-distribution (OoD) test setting using Waymo Open dataset and show superior robustness of our approach.

随着预测地平线的扩大,预测交通场景的未来演变由于代理运动的多模式性质而变得日益困难。大多数最先进的(SotA)预测模型主要侧重于预测最可能的未来。然而,对于自主车辆的安全运行来说,同样重要的是要覆盖分配,以寻找可信的运动替代办法。为了解决这个问题,我们引入了一个新的参数高效扩散模型,即EP-Diffuser,这是一个新的参数高效扩散模型,旨在捕捉交通场演变的分布。在道路布局和代理历史中,我们的模型起到预测的作用,产生多样化的、可信的场景延续。我们用两种SotA模型来衡量EP-Diffuser在Argoverse 2数据集预测的准确性和可行性。尽管我们的模型规模要小得多,但我们的方法还是实现了高度准确和可信的交通场预测。我们进一步评估了在使用Waymo Om Open数据集进行分配(OoD)测试时的模型普及能力,并展示了我们方法的超强性。


Article 288

Title@2025-07-02 (3): GPT, But Backwards: Exactly Inverting Language Model Outputs

Title: GPT, But Backwards: Exactly Inverting Language Model Outputs GPT, aber zurück: Genaue Invertierung von Sprachmodell-Ausgaben GPT, 但向后: 完全反转语言模型产出 2507.01693v1

Authors (4): Adrians Skapars, Edoardo Manino, Youcheng Sun, Lucas C. Cordeiro

While existing auditing techniques attempt to identify potential unwanted behaviours in large language models (LLMs), we address the complementary forensic problem of reconstructing the exact input that led to an existing LLM output - enabling post-incident analysis and potentially the detection of fake output reports. We formalize exact input reconstruction as a discrete optimisation problem with a unique global minimum and introduce SODA, an efficient gradient-based algorithm that operates on a continuous relaxation of the input search space with periodic restarts and parameter decay. Through comprehensive experiments on LLMs ranging in size from 33M to 3B parameters, we demonstrate that SODA significantly outperforms existing approaches. We succeed in fully recovering 79.5% of shorter out-of-distribution inputs from next-token logits, without a single false positive, but struggle to extract private information from the outputs of longer (15+ token) input sequences. This suggests that standard deployment practices may currently provide adequate protection against malicious use of our method. Our code is available at https://doi.org/10.5281/zenodo.15539879.

虽然现有的审计技术试图在大型语言模型中查明潜在的不想要的行为,但我们在重塑导致现有LLM产出的确切投入时,解决了补充的法证问题 – – 能够进行事件后分析,并有可能发现假产出报告。我们把确切的投入重建正式确定为一个离散的优化问题,具有独特的全球最低要求,并引入了SODA, 这是一种高效的梯度算法,其运作方式是不断放松输入搜索空间,定期重新启动和参数衰减。我们通过对LLLMs进行全面试验,范围从33M到3B参数,证明SODA大大超过现有方法。我们成功地从下对口的登录中完全回收了79.5%的分发外投入,没有单一的虚假的正面,但努力从更长的(15+象征性)输入序列的产出中提取私人信息。这表明,标准部署做法目前可以提供充分的保护,防止恶意使用我们的方法。我们的代码可在https://doi.org/105281/zeno15539879上查阅。


Article 289

Title@2025-07-02 (3): A generative modeling / Physics-Informed Neural Network approach to random differential equations

Title: A generative modeling / Physics-Informed Neural Network approach to random differential equations Eine generative Modellierung / Physik-informierte Neural Network Ansatz zu zufälligen Differentialgleichungen 随机差异方程的基因建模/物理-内建神经网络方法 2507.01687v1

Authors (3): Georgios Arampatzis, Stylianos Katsarakis, Charalambos Makridakis

The integration of Scientific Machine Learning (SciML) techniques with uncertainty quantification (UQ) represents a rapidly evolving frontier in computational science. This work advances Physics-Informed Neural Networks (PINNs) by incorporating probabilistic frameworks to effectively model uncertainty in complex systems. Our approach enhances the representation of uncertainty in forward problems by combining generative modeling techniques with PINNs. This integration enables in a systematic fashion uncertainty control while maintaining the predictive accuracy of the model. We demonstrate the utility of this method through applications to random differential equations and random partial differential equations (PDEs).

将科学机器学习技术与不确定性量化(UQ)相结合,是计算科学中迅速发展的前沿领域,通过纳入概率框架,有效地模拟复杂系统中的不确定性,推动物理进化神经网络(PINNs),我们的方法通过将基因化模型技术与PINNs相结合,增加了前期问题的不确定性的代表性。这种整合能够系统地控制不确定性,同时保持模型的预测准确性。我们通过应用随机差异方程式和随机部分差异方程式,证明了这种方法的实用性。


Article 290

Title@2025-07-02 (3): Enhancing Robustness to Missing Modalities through Clustered Federated Learning

Title: Enhancing Robustness to Missing Modalities through Clustered Federated Learning Verbesserung der Robustheit fehlender Modalitäten durch clustered Federated Learning 通过分组联邦学习增强对失踪模式的有力性 2505.06911v2

Authors (7): Lishan Yang, Wei Emma Zhang, Quan Z. Sheng, Weitong Chen, Lina Yao, Weitong Chen, Ali Shakeri

In the era of big data, data mining has become indispensable for uncovering hidden patterns and insights from vast and complex datasets. The integration of multimodal data sources further enhances its potential. Multimodal Federated Learning (MFL) is a distributed approach that enhances the efficiency and quality of multimodal learning, ensuring collaborative work and privacy protection. However, missing modalities pose a significant challenge in MFL, often due to data quality issues or privacy policies across the clients. In this work, we present MMiC, a framework for Mitigating Modality incompleteness in MFL within the Clusters. MMiC replaces partial parameters within client models inside clusters to mitigate the impact of missing modalities. Furthermore, it leverages the Banzhaf Power Index to optimize client selection under these conditions. Finally, MMiC employs an innovative approach to dynamically control global aggregation by utilizing Markovitz Portfolio Optimization. Extensive experiments demonstrate that MMiC consistently outperforms existing federated learning architectures in both global and personalized performance on multimodal datasets with missing modalities, confirming the effectiveness of our proposed solution.

在海量数据时代,数据挖掘已成为发现大量复杂数据集的隐蔽模式和洞察力所不可或缺的。多式联运数据源的整合进一步增强了其潜力。多式联邦学习是一种分布式方法,提高了多式学习的效率和质量,确保合作工作和隐私保护。然而,缺失模式对MFL构成重大挑战,这往往是由于数据质量问题或客户的隐私政策造成的。在这项工作中,我们介绍了MMIC,这是在集群内减缓MFL不完善模式的框架。MMIC取代了集群内客户模式中的一部分参数,以减轻缺失模式的影响。此外,它利用班扎夫动力指数优化了在这些条件下选择客户的优化。最后,MIC利用Markovitz Portfolio Optimination,采用创新方法来动态控制全球聚合。广泛的实验表明,MMIC始终超越了以缺失模式在多式数据集全球和个人化业绩中的现有联合学习结构,证实了我们提出的解决方案的有效性。


Article 291

Title@2025-07-02 (3): Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling

Title: Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling Mischen von Supervised und Verstärkung Feintuning mit Präfix-Sampling 与前缀抽样混合监管和强化精细推荐 2507.01679v1

Authors (7): Zeyu Huang, Tianhao Cheng, Zihan Qiu, Zili Wang, Yinghui Xu, Edoardo M. Ponti, Ivan Titov

Existing post-training techniques for large language models are broadly categorized into Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT). Each paradigm presents a distinct trade-off: SFT excels at mimicking demonstration data but can lead to problematic generalization as a form of behavior cloning. Conversely, RFT can significantly enhance a model’s performance but is prone to learn unexpected behaviors, and its performance is highly sensitive to the initial policy. In this paper, we propose a unified view of these methods and introduce Prefix-RFT, a hybrid approach that synergizes learning from both demonstration and exploration. Using mathematical reasoning problems as a testbed, we empirically demonstrate that Prefix-RFT is both simple and effective. It not only surpasses the performance of standalone SFT and RFT but also outperforms parallel mixed-policy RFT methods. A key advantage is its seamless integration into existing open-source frameworks, requiring only minimal modifications to the standard RFT pipeline. Our analysis highlights the complementary nature of SFT and RFT, and validates that Prefix-RFT effectively harmonizes these two learning paradigms. Furthermore, ablation studies confirm the method’s robustness to variations in the quality and quantity of demonstration data. We hope this work offers a new perspective on LLM post-training, suggesting that a unified paradigm that judiciously integrates demonstration and exploration could be a promising direction for future research.

大型语言模型的现有培训后技术大致分为监督性美食(SFT)和强化性美食(RFT)等。每种模式都有不同的权衡:SFT在模拟示范数据方面十分出色,但可能导致作为行为克隆的一种形式出现问题。相反,RFT可以显著提高模型的性能,但容易了解出乎意料的行为,其业绩对最初的政策非常敏感。在本文件中,我们建议统一对这些方法的看法,并引入Prefix-RFT,这是一种混合方法,可以同时从示范和探索中学习。我们用数学推理问题作为测试台,从经验上证明,Prefix-RFT既简单又有效,不仅会导致问题化为行为克隆的一种形式。它不仅超越了独立的SFT和RFT的表现,而且超越了平行的混合政策RFT方法。一个关键优势是,它与现有的开放源框架紧密结合,只需要对标准RFT管道进行最低限度的修改。我们的分析强调了SFT和RFT的互补性,并证实Pref-RFT在测试中都有效地统一了一种稳定性研究方向,从而验证出一种未来质量变化的示范方法,从而验证了这种在质量上展示性模型中可以证实一种可靠的示范性研究的新的质量分析。


Article 292

Title@2025-07-02 (3): AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training

Title: AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training AsyncFlow: Asynchrones Streaming RL-Framework für effizientes LLM-Post-Training AsyncFlow: 高效LLM 培训后高效LLM的同步流流RL框架 2507.01663v1

Authors (19): Zhenyu Han, Ansheng You, Haibo Wang, Kui Luo, Guang Yang, Wenqi Shi, Menglong Chen, Sicheng Zhang, Zeshun Lan, Chunshi Deng, Huazhong Ji, Wenjie Liu, Yu Huang, Yixiang Zhang, Chenyi Pan, Jing Wang, Xin Huang, Chunsheng Li, Jianping Wu

Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs). Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks, while task-separated RL frameworks face challenges in complex dataflows and the corresponding resource idling and workload imbalance. Moreover, most existing frameworks are tightly coupled with LLM training or inference engines, making it difficult to support custom-designed engines. To address these challenges, we propose AsyncFlow, an asynchronous streaming RL framework for efficient post-training. Specifically, we introduce a distributed data storage and transfer module that provides a unified data management and fine-grained scheduling capability in a fully streamed manner. This architecture inherently facilitates automated pipeline overlapping among RL tasks and dynamic load balancing. Moreover, we propose a producer-consumer-based asynchronous workflow engineered to minimize computational idleness by strategically deferring parameter update process within staleness thresholds. Finally, the core capability of AsynFlow is architecturally decoupled from underlying training and inference engines and encapsulated by service-oriented user interfaces, offering a modular and customizable user experience. Extensive experiments demonstrate an average of 1.59 throughput improvement compared with state-of-the-art baseline. The presented architecture in this work provides actionable insights for next-generation RL training system designs.

加强学习已成为大型语言模式(LLMS)培训后阶段的关键技术,传统任务分配的RL框架面临重大的可缩放瓶颈,而任务划分的RL框架在复杂的数据流和相应的资源调配和工作量失衡方面面临着挑战,此外,大多数现有框架与LLM培训或推论引擎紧密结合,难以支持定制的引擎。为了应对这些挑战,我们提议AsyncFlow(AsyncFlow),这是高效培训后不同步流流出RL框架。具体地说,我们采用分布式数据存储和传输模块,以完整流方式提供统一的数据管理和细化的列表能力,而任务划分的RL框架则面临复杂的数据流流和相应的资源调配和工作量不平衡;此外,我们提议设计一个基于生产者-消费的无序工作流程,以便从战略上延缓调整参数更新门槛。 最后,AynFlow的核心能力在结构上与基础培训和推导型数据储存模块进行分解,并包罗列了用户模型的下一个定制模型化模型,通过用户模型化模型展示了用户模型的定制设计结构,从而展示了用户模型化的基线结构。


Article 293

Title@2025-07-02 (3): Co-Optimizing Reconfigurable Environments and Policies for Decentralized Multi-Agent Navigation

Title: Co-Optimizing Reconfigurable Environments and Policies for Decentralized Multi-Agent Navigation Co-Optimierung neu konfigurierbarer Umgebungen und Politiken für dezentralisierte Multi-Agent-Navigation 共同优化可重新配置的环境和权力下放多机构导航政策 2403.14583v2

Authors (3): Zhan Gao, Guang Yang, Amanda Prorok

This work views the multi-agent system and its surrounding environment as a co-evolving system, where the behavior of one affects the other. The goal is to take both agent actions and environment configurations as decision variables, and optimize these two components in a coordinated manner to improve some measure of interest. Towards this end, we consider the problem of decentralized multi-agent navigation in a cluttered environment, where we assume that the layout of the environment is reconfigurable. By introducing two sub-objectives – multi-agent navigation and environment optimization – we propose an agent-environment co-optimization problem and develop a coordinated algorithm that alternates between these sub-objectives to search for an optimal synthesis of agent actions and environment configurations; ultimately, improving the navigation performance. Due to the challenge of explicitly modeling the relation between the agents, the environment and their performance therein, we leverage policy gradient to formulate a model-free learning mechanism within the coordinated framework. A formal convergence analysis shows that our coordinated algorithm tracks the local minimum solution of an associated time-varying non-convex optimization problem. Experiments corroborate theoretical findings and show the benefits of co-optimization. Interestingly, the results also indicate that optimized environments can offer structural guidance to de-conflict agents in motion.

这项工作将多试剂系统及其周围环境视为一个共同演变的系统,其中一方的行为会影响另一方的行为。目标是将代理行为和环境配置同时作为决定变量,并以协调的方式优化这两个组成部分,以提高某种程度的兴趣。为此,我们考虑在一种杂乱的环境中分散多试剂导航的问题,我们假设环境的布局是可以重新配置的。通过引入两个次级目标 – – 多试剂导航和环境优化 – – 我们提出一个代理-环境共同优化问题,并开发一种协调的算法,以替代这些次级目标,寻找最佳的代理行为和环境配置综合;最终,改进导航性能。由于明确模拟代理人、环境及其在其中的性能之间的关系的挑战,我们利用政策梯度来制定在协调框架内不使用模型的学习机制。正式的趋同分析表明,我们协调的算法追踪了相关时间分配非convex优化问题的当地最低解决办法。实验证实了理论性结论,并展示了冲突动态推动者提供最佳结构化指导的好处。


Article 294

Title@2025-07-02 (3): Contrastive Learning and Adversarial Disentanglement for Privacy-Aware Task-Oriented Semantic Communication

Title: Contrastive Learning and Adversarial Disentanglement for Privacy-Aware Task-Oriented Semantic Communication Kontrastives Lernen und zweifelhafte Entflechtung für die datenschutzbewusste, aufgabenorientierte semantische Kommunikation 隐私-软件任务导向的语义交流的矛盾学习和反对解脱 2410.22784v3

Authors (3): Omar Erak, Omar Alhussein, Wen Tong

Task-oriented semantic communication systems have emerged as a promising approach to achieving efficient and intelligent data transmission in next-generation networks, where only information relevant to a specific task is communicated. This is particularly important in 6G-enabled Internet of Things (6G-IoT) scenarios, where bandwidth constraints, latency requirements, and data privacy are critical. However, existing methods struggle to fully disentangle task-relevant and task-irrelevant information, leading to privacy concerns and suboptimal performance. To address this, we propose an information-bottleneck inspired method, named CLAD (contrastive learning and adversarial disentanglement). CLAD utilizes contrastive learning to effectively capture task-relevant features while employing adversarial disentanglement to discard task-irrelevant information. Additionally, due to the absence of reliable and reproducible methods to quantify the minimality of encoded feature vectors, we introduce the Information Retention Index (IRI), a comparative metric used as a proxy for the mutual information between the encoded features and the input. The IRI reflects how minimal and informative the representation is, making it highly relevant for privacy-preserving and bandwidth-efficient 6G-IoT systems. Extensive experiments demonstrate that CLAD outperforms state-of-the-art baselines in terms of semantic extraction, task performance, privacy preservation, and IRI, making it a promising building block for responsible, efficient and trustworthy 6G-IoT services.

以任务为导向的语义通信系统已成为在下一代网络中实现高效和智能数据传输的一种有希望的方法,在下一代网络中,只有与具体任务相关的信息才被传递,这在6G驱动的互联网(6G-IoT)情景中特别重要,在6G驱动的Tings(6G-IoT)情景中,带宽限制、延缓要求和数据隐私至关重要;然而,现有方法努力完全分解与任务相关和任务相关的信息,从而导致隐私关切和业绩欠佳。为此,我们提议采用信息保存指数(IRI),作为编码特征和投入之间相互信息的代用比较性指标。 CIRAD利用对比性学习有效地捕捉与任务相关的特征,同时采用对抗性调和调和调调,抛弃与任务有关的信息。此外,由于缺乏可靠和可复制的方法来量化与任务相关的特性矢量的信息保存指数,我们引入了信息保存指数(IRI)。


Article 295

Title@2025-07-02 (3): NegMerge: Sign-Consensual Weight Merging for Machine Unlearning

Title: NegMerge: Sign-Consensual Weight Merging for Machine Unlearning NegMerge: Sign-Consensual Weight Merging für maschinelles Lernen NegMelge: 机器脱学的信号-协同重力合并 2410.05583v2

Authors (3): Hyo Seo Kim, Dongyoon Han, Junsuk Choe

Machine unlearning aims to selectively remove specific knowledge from a trained model. Existing approaches, such as Task Arithmetic, fine-tune the model on the forget set to create a task vector (i.e., a direction in weight space) for subtraction from the original model’s weight. However, their effectiveness is highly sensitive to hyperparameter selection, requiring extensive validation to identify the optimal vector from many fine-tuned candidates. In this paper, we propose a novel method that utilizes all fine-tuned models trained with varying hyperparameters instead of a single selection. Specifically, we aggregate the computed task vectors by retaining only the elements with consistent shared signs. The merged task vector is then negated to induce unlearning on the original model. Evaluations on zero-shot and standard image recognition tasks across twelve datasets and four backbone architectures show that our approach outperforms state-of-the-art methods while requiring similar or fewer computational resources. Code is available at https://github.com/naver-ai/negmerge.

机器不学习的目的是有选择地将特定知识从经过训练的模型中去除。 现有的方法, 如任务自理, 微调模型, 将模型放在为创建任务矢量( 即重量空间方向) 以从原始模型的重量中减去而设定任务矢量( 即重量空间方向) 。 但是, 它们的有效性对于超光度选择非常敏感, 需要从许多经过微调的候选人中广泛验证才能确定最佳矢量。 本文中, 我们建议一种新颖的方法, 利用所有经过不同超光度训练的精细调整模型, 而不是单一选择。 具体地说, 我们通过只保留具有一致共享符号的元素来汇总计算计算计算任务矢量。 合并任务矢量的合并后, 要在原始模型上诱导出不学习 。 12个数据集和4个主干结构的零点和标准图像识别任务评估显示, 我们的方法在需要相似或更少计算资源的情况下, 超越了状态的计算方法。 代码可在 https://github. com/ naver- negmerge/negmerge 上查阅 。


Article 296

Title@2025-07-02 (3): SPoT: Subpixel Placement of Tokens in Vision Transformers

Title: SPoT: Subpixel Placement of Tokens in Vision Transformers SPoT: Subpixel-Platzierung von Token in Vision Transformers SPOT: 视觉变形器中当当量子像素配置 2507.01654v1

Authors (4): Martine Hjelkrem-Tan, Marius Aasan, Gabriel Y. Arteaga, Adín Ramírez Rivera

Vision Transformers naturally accommodate sparsity, yet standard tokenization methods confine features to discrete patch grids. This constraint prevents models from fully exploiting sparse regimes, forcing awkward compromises. We propose Subpixel Placement of Tokens (SPoT), a novel tokenization strategy that positions tokens continuously within images, effectively sidestepping grid-based limitations. With our proposed oracle-guided search, we uncover substantial performance gains achievable with ideal subpixel token positioning, drastically reducing the number of tokens necessary for accurate predictions during inference. SPoT provides a new direction for flexible, efficient, and interpretable ViT architectures, redefining sparsity as a strategic advantage rather than an imposed limitation.

视觉变异器自然会容纳聚变,但标准象征性化方法将特性限制在离散的补丁网格上。 这一限制阻止了模型充分利用稀疏的政权,迫使尴尬的妥协。 我们提议了将托肯斯(Spot)子像素安插(Spot)这个新颖的象征性化战略(Spixel Tokens ) , 持续在图像中标注符号, 有效地绕开基于网格的限制。 通过我们提议的甲骨文指导搜索,我们发现通过理想的次像标定定位可以取得大量绩效收益, 大幅削减了在推论期间准确预测所需的符号数量。 SPoT为灵活、高效和可解释的维肯斯结构提供了新的方向,重新将聚变性重新定义为战略优势而不是强加的限制。


Article 297

Title@2025-07-02 (3): Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems

Title: Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems Sublinearer Bedauern für eine Klasse von linear-Quadratischen Lernproblemen 连续时线性强化学习问题分类的子线性遗憾 2407.17226v5

Authors (3): Yilie Huang, Yanwei Jia, Xun Yu Zhou

We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems for diffusions, where states are scalar-valued and running control rewards are absent but volatilities of the state processes depend on both state and control variables. We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an RL algorithm to learn the optimal policy parameter directly. Our main contributions include the introduction of an exploration schedule and a regret analysis of the proposed algorithm. We provide the convergence rate of the policy parameter to the optimal one, and prove that the algorithm achieves a regret bound of $O(N^{\frac{3}{4}})$ up to a logarithmic factor, where $N$ is the number of learning episodes. We conduct a simulation study to validate the theoretical results and demonstrate the effectiveness and reliability of the proposed algorithm. We also perform numerical comparisons between our method and those of the recent model-based stochastic LQ RL studies adapted to the state- and control-dependent volatility setting, demonstrating a better performance of the former in terms of regret bounds.

我们研究的是一组连续时间线性赤道(LQ)控制扩散问题的强化学习(RL)问题,在这个类别中,国家是计算价值和运行控制奖励,但国家过程的波动取决于状态变量和控制变量。我们采用不依赖模型参数知识或不依赖其估计的模型方法,并设计了一种RL算法,直接学习最佳政策参数。我们的主要贡献包括采用探索时间表和对拟议算法的遗憾分析。我们为最佳的参数提供了政策参数的趋同率,并证明该算法达到了美元(Nfrac{344)至对数系数的遗憾度,而美元是学习事件的数量。我们进行了模拟研究,以验证理论结果,并展示拟议算法的有效性和可靠性。我们还对我们的方法和最近基于模型的LQRL研究方法进行了数字比较,这些方法与依赖状态和控制的波动设置相适应,显示了前一个数值的更好表现。


Article 298

Title@2025-07-02 (3): On the Fundamental Impossibility of Hallucination Control in Large Language Models

Title: On the Fundamental Impossibility of Hallucination Control in Large Language Models Über die grundsätzliche Unmöglichkeit der Halluzinationskontrolle in großen Sprachmodellen 关于大语言模型中幻听控制的基本不可能性 2506.06382v2

Authors (1): Michał P. Karpowicz

We prove that perfect hallucination control in large language models is mathematically impossible. No LLM inference mechanism can simultaneously achieve truthful response generation, semantic information conservation, relevant knowledge revelation, and knowledge-constrained optimality. This impossibility is fundamental, arising from the mathematical structure of information aggregation itself rather than engineering limitations. The proof spans three mathematical frameworks: auction theory, proper scoring theory for probabilistic predictions, and log-sum-exp analysis for transformer architectures. In each setting, we demonstrate that information aggregation creates unavoidable violations of conservation principles. The Jensen gap in transformer probability aggregation provides a direct measure of this impossibility. These results reframe hallucination from an engineering bug to an inevitable mathematical feature of distributed intelligence. There are fundamental trade-offs between truthfulness, knowledge utilization, and response completeness, providing principled foundations for managing rather than eliminating hallucination. This work reveals deep connections between neural network inference, philosophy of knowledge and reasoning, and classical results in game theory and information theory, opening new research directions for developing beneficial AI systems within mathematical constraints.

我们证明大型语言模型的完美幻觉控制在数学上是不可能的。 没有LLM 推论机制能够同时实现真实的反应生成、语义信息保护、相关知识披露和知识限制的最佳性。 这种不可能性是根本的,产生于信息集成本身的数学结构而不是工程限制。 证据包含三个数学框架:拍卖理论、概率预测的适当评分理论以及变压器结构的日志和参数分析。 在每种情况下,我们证明信息汇总都不可避免地违反了保护原则。 变压器概率汇总中的Jensen差距提供了这种不可能性的直接衡量。 这些结果是将幻觉从工程错误重新定位为分布式情报的不可避免的数学特征。 在真实性、知识利用和反应完整性之间有着基本的权衡,为管理而不是消除幻觉提供了原则基础。 这项工作揭示了神经网络的推论、知识和推理哲学以及游戏理论和信息理论的经典结果之间的密切联系,为在数学限制范围内开发有益的AI系统开辟了新的研究方向。


Article 299

Title@2025-07-02 (3): GradMetaNet: An Equivariant Architecture for Learning on Gradients

Title: GradMetaNet: An Equivariant Architecture for Learning on Gradients GradMetaNet: Eine gleichwertige Architektur zum Lernen auf Gradienten GradMetaNet:关于渐进式学习的等同架构 2507.01649v1

Authors (8): Yoav Gelberg, Yam Eitan, Aviv Navon, Aviv Shamsian, Theo, Putterman, Michael Bronstein, Haggai Maron

Gradients of neural networks encode valuable information for optimization, editing, and analysis of models. Therefore, practitioners often treat gradients as inputs to task-specific algorithms, e.g. for pruning or optimization. Recent works explore learning algorithms that operate directly on gradients but use architectures that are not specifically designed for gradient processing, limiting their applicability. In this paper, we present a principled approach for designing architectures that process gradients. Our approach is guided by three principles: (1) equivariant design that preserves neuron permutation symmetries, (2) processing sets of gradients across multiple data points to capture curvature information, and (3) efficient gradient representation through rank-1 decomposition. Based on these principles, we introduce GradMetaNet, a novel architecture for learning on gradients, constructed from simple equivariant blocks. We prove universality results for GradMetaNet, and show that previous approaches cannot approximate natural gradient-based functions that GradMetaNet can. We then demonstrate GradMetaNet’s effectiveness on a diverse set of gradient-based tasks on MLPs and transformers, such as learned optimization, INR editing, and estimating loss landscape curvature.

神经网络的梯度将宝贵的信息编码起来,用于优化、编辑和分析模型。因此,从业者经常将梯度作为特定任务算法的投入,例如用于修剪或优化。最近的工作探索了直接在梯度上运行的学习算法,但使用不是为梯度处理而专门设计的架构,限制了其适用性。在本文件中,我们提出了一个设计结构来加工梯度的结构的原则性方法。我们的方法以三项原则为指导:(1) 保存神经变异对称的等式设计;(2) 处理多个数据点的梯度,以获取曲线信息;(3) 通过一级变异定位进行高效率的梯度代表。基于这些原则,我们引入了格拉德-梅塔网络,这是一个从简单的等离子区块建起的关于梯度学习的新结构。我们证明了格拉德-梅塔网络的普遍性结果,并表明以往的方法无法与Grad-MetaNet能够提供的自然梯度函数相近似。我们随后展示了格拉德-梅塔网络在多种基于梯度的任务中的有效性,通过一级分级-1的变换,例如所学的压和变压。


Article 300

Title@2025-07-02 (3): Dance Dance ConvLSTM

Title: Dance Dance ConvLSTM Tanztanz ConvLSTM 舞蹈舞蹈ConvLSTM 2507.01644v1

Authors (1): Miguel O’Malley

\textit{Dance Dance Revolution} is a rhythm game consisting of songs and accompanying choreography, referred to as charts. Players press arrows on a device referred to as a dance pad in time with steps determined by the song’s chart. In 2017, the authors of Dance Dance Convolution (DDC) developed an algorithm for the automatic generation of \textit{Dance Dance Revolution} charts, utilizing a CNN-LSTM architecture. We introduce Dance Dance ConvLSTM (DDCL), a new method for the automatic generation of DDR charts using a ConvLSTM based model, which improves upon the DDC methodology and substantially increases the accuracy of chart generation.

nan


Article 301

Title@2025-07-02 (3): Initialization Method for Factorization Machine Based on Low-Rank Approximation for Constructing a Corrected Approximate Ising Model

Title: Initialization Method for Factorization Machine Based on Low-Rank Approximation for Constructing a Corrected Approximate Ising Model Initialisierungsmethode für Factorization Machine basierend auf Low-Rank-Annäherung für die Konstruktion eines korrigierten ungefähren Ising-Modells 建筑正确近似误差型号的基于低Rank近似度的保理机的初始化方法 2410.12747v3

Authors (3): Yuya Seki, Hyakka Nakada, Shu Tanaka

This paper presents an initialization method that can approximate a given approximate Ising model with a high degree of accuracy using a factorization machine (FM), a machine learning model. The construction of an Ising models using an FM is applied to black-box combinatorial optimization problems using factorization machine with quantum annealing (FMQA). It is anticipated that the optimization performance of FMQA will be enhanced through an implementation of the warm-start method. Nevertheless, the optimal initialization method for leveraging the warm-start approach in FMQA remains undetermined. Consequently, the present study compares initialization methods based on random initialization and low-rank approximation, and then identifies a suitable one for use with warm-start in FMQA through numerical experiments. Furthermore, the properties of the initialization method by the low-rank approximation for the FM are analyzed using random matrix theory, demonstrating that the approximation accuracy of the proposed method is not significantly influenced by the specific Ising model under consideration. The findings of this study will facilitate advancements of research in the field of black-box combinatorial optimization through the use of Ising machines.

nan


Article 302

Title@2025-07-02 (3): Truthful Elicitation of Imprecise Forecasts

Title: Truthful Elicitation of Imprecise Forecasts Wahre Botschaft von ungenauen Prognosen 以真真真真真真真真真切的易感简易预报 2503.16395v3

Authors (3): Anurag Singh, Siu Lun Chau, Krikamol Muandet

The quality of probabilistic forecasts is crucial for decision-making under uncertainty. While proper scoring rules incentivize truthful reporting of precise forecasts, they fall short when forecasters face epistemic uncertainty about their beliefs, limiting their use in safety-critical domains where decision-makers (DMs) prioritize proper uncertainty management. To address this, we propose a framework for scoring imprecise forecasts – forecasts given as a set of beliefs. Despite existing impossibility results for deterministic scoring rules, we enable truthful elicitation by drawing connection to social choice theory and introducing a two-way communication framework where DMs first share their aggregation rules (e.g., averaging or min-max) used in downstream decisions for resolving forecast ambiguity. This, in turn, helps forecasters resolve indecision during elicitation. We further show that truthful elicitation of imprecise forecasts is achievable using proper scoring rules randomized over the aggregation procedure. Our approach allows DM to elicit and integrate the forecaster’s epistemic uncertainty into their decision-making process, thus improving credibility.

nan


Article 303

Title@2025-07-02 (3): Kernel Recursive Least Squares Dictionary Learning Algorithm

Title: Kernel Recursive Least Squares Dictionary Learning Algorithm Kernel Rekursive Least Squares Wörterbuch Lernen Algorithmen 最小广场字典学习算法 2507.01636v1

Authors (2): Ghasem Alipoor, Karl Skretting

We propose an efficient online dictionary learning algorithm for kernel-based sparse representations. In this framework, input signals are nonlinearly mapped to a high-dimensional feature space and represented sparsely using a virtual dictionary. At each step, the dictionary is updated recursively using a novel algorithm based on the recursive least squares (RLS) method. This update mechanism works with single samples or mini-batches and maintains low computational complexity. Experiments on four datasets across different domains show that our method not only outperforms existing online kernel dictionary learning approaches but also achieves classification accuracy close to that of batch-trained models, while remaining significantly more efficient.

nan


Article 304

Title@2025-07-02 (3): Non-collective Calibrating Strategy for Time Series Forecasting

Title: Non-collective Calibrating Strategy for Time Series Forecasting Nicht-kollektive Kalibrierstrategie für Zeitreihenprognosen 非集体时间序列预测校准战略 2506.03176v2

Authors (7): Bin Wang, Yongqi Han, Minbo Ma, Tianrui Li, Junbo Zhang, Feng Hong, Yanwei Yu

Deep learning-based approaches have demonstrated significant advancements in time series forecasting. Despite these ongoing developments, the complex dynamics of time series make it challenging to establish the rule of thumb for designing the golden model architecture. In this study, we argue that refining existing advanced models through a universal calibrating strategy can deliver substantial benefits with minimal resource costs, as opposed to elaborating and training a new model from scratch. We first identify a multi-target learning conflict in the calibrating process, which arises when optimizing variables across time steps, leading to the underutilization of the model’s learning capabilities. To address this issue, we propose an innovative calibrating strategy called Socket+Plug (SoP). This approach retains an exclusive optimizer and early-stopping monitor for each predicted target within each Plug while keeping the fully trained Socket backbone frozen. The model-agnostic nature of SoP allows it to directly calibrate the performance of any trained deep forecasting models, regardless of their specific architectures. Extensive experiments on various time series benchmarks and a spatio-temporal meteorological ERA5 dataset demonstrate the effectiveness of SoP, achieving up to a 22% improvement even when employing a simple MLP as the Plug (highlighted in Figure 1). Code is available at https://github.com/hanyuki23/SoP.

nan


Article 305

Title@2025-07-02 (3): Tile and Slide : A New Framework for Scaling NeRF from Local to Global 3D Earth Observation

Title: Tile and Slide : A New Framework for Scaling NeRF from Local to Global 3D Earth Observation Tile and Slide : Ein neuer Rahmen für die Skalierung von NeRF von lokaler bis globaler 3D-Erdbeobachtung 平板和幻灯片:从地方向全球3D地球观测扩大内域FF的新框架 2507.01631v1

Authors (4): Camille Billouard, Dawa Derksen, Alexandre Constantin, Bruno Vallet

Neural Radiance Fields (NeRF) have recently emerged as a paradigm for 3D reconstruction from multiview satellite imagery. However, state-of-the-art NeRF methods are typically constrained to small scenes due to the memory footprint during training, which we study in this paper. Previous work on large-scale NeRFs palliate this by dividing the scene into NeRFs. This paper introduces Snake-NeRF, a framework that scales to large scenes. Our out-of-core method eliminates the need to load all images and networks simultaneously, and operates on a single device. We achieve this by dividing the region of interest into NeRFs that 3D tile without overlap. Importantly, we crop the images with overlap to ensure each NeRFs is trained with all the necessary pixels. We introduce a novel $2\times 2$ 3D tile progression strategy and segmented sampler, which together prevent 3D reconstruction errors along the tile edges. Our experiments conclude that large satellite images can effectively be processed with linear time complexity, on a single GPU, and without compromise in quality.

nan


Article 306

Title@2025-07-02 (3): Adapting Rule Representation With Four-Parameter Beta Distribution for Learning Classifier Systems

Title: Adapting Rule Representation With Four-Parameter Beta Distribution for Learning Classifier Systems Anpassung der Regeldarstellung mit Vier-Parameter Beta-Distribution für Lernklassifikatoren 调整教学分类系统有四分之四张贝塔分配的细则代表制 2506.03602v2

Authors (6): Hiroki Shiraishi, Yohei Hayamizu, Tomonori Hashiyama, Keiki Takadama, Hisao Ishibuchi, Masaya Nakata

Rule representations significantly influence the search capabilities and decision boundaries within the search space of Learning Classifier Systems (LCSs), a family of rule-based machine learning systems that evolve interpretable models through evolutionary processes. However, it is very difficult to choose an appropriate rule representation for each problem. Additionally, some problems benefit from using different representations for different subspaces within the input space. Thus, an adaptive mechanism is needed to choose an appropriate rule representation for each rule in LCSs. This article introduces a flexible rule representation using a four-parameter beta distribution and integrates it into a fuzzy-style LCS. The four-parameter beta distribution can form various function shapes, and this flexibility enables our LCS to automatically select appropriate representations for different subspaces. Our rule representation can represent crisp/fuzzy decision boundaries in various boundary shapes, such as rectangles and bells, by controlling four parameters, compared to the standard representations such as trapezoidal ones. Leveraging this flexibility, our LCS is designed to adapt the appropriate rule representation for each subspace. Moreover, our LCS incorporates a generalization bias favoring crisp rules where feasible, enhancing model interpretability without compromising accuracy. Experimental results on real-world classification tasks show that our LCS achieves significantly superior test accuracy and produces more compact rule sets. Our implementation is available at https://github.com/YNU-NakataLab/Beta4-UCS. An extended abstract related to this work is available at https://doi.org/10.36227/techrxiv.174900805.59801248/v1.

nan


Article 307

Title@2025-07-02 (3): Query Complexity of Classical and Quantum Channel Discrimination

Title: Query Complexity of Classical and Quantum Channel Discrimination Abfrage der Komplexität der klassischen und Quantenkanal-Diskriminierung 古典和量道歧视的复杂程度问题 2504.12989v2

Authors (2): Theshani Nuradha, Mark M. Wilde

Quantum channel discrimination has been studied from an information-theoretic perspective, wherein one is interested in the optimal decay rate of error probabilities as a function of the number of unknown channel accesses. In this paper, we study the query complexity of quantum channel discrimination, wherein the goal is to determine the minimum number of channel uses needed to reach a desired error probability. To this end, we show that the query complexity of binary channel discrimination depends logarithmically on the inverse error probability and inversely on the negative logarithm of the (geometric and Holevo) channel fidelity. As a special case of these findings, we precisely characterize the query complexity of discriminating two classical channels and two classical-quantum channels. Furthermore, by obtaining a tighter characterization of the sample complexity of quantum hypothesis testing, including prior probabilities, we provide a more precise characterization of query complexity when the error probability does not exceed a fixed threshold. We also provide lower and upper bounds on the query complexity of binary asymmetric channel discrimination and multiple quantum channel discrimination. For the former, the query complexity depends on the geometric R'enyi and Petz R'enyi channel divergences, while for the latter, it depends on the negative logarithm of the (geometric and Uhlmann) channel fidelity. For multiple channel discrimination, the upper bound scales as the logarithm of the number of channels.

nan


Article 308

Title@2025-07-02 (3): When Less Is More: Binary Feedback Can Outperform Ordinal Comparisons in Ranking Recovery

Title: When Less Is More: Binary Feedback Can Outperform Ordinal Comparisons in Ranking Recovery Wenn weniger ist mehr: Binäre Rückkopplung kann übertreffen Ordinal-Vergleiche in Ranking Recovery 少少少少少:二进反馈在评分回收方面能够超过常规比较 2507.01613v1

Authors (3): Shirong Xu, Jingnan Zhang, Junhui Wang

Paired comparison data, where users evaluate items in pairs, play a central role in ranking and preference learning tasks. While ordinal comparison data intuitively offer richer information than binary comparisons, this paper challenges that conventional wisdom. We propose a general parametric framework for modeling ordinal paired comparisons without ties. The model adopts a generalized additive structure, featuring a link function that quantifies the preference difference between two items and a pattern function that governs the distribution over ordinal response levels. This framework encompasses classical binary comparison models as special cases, by treating binary responses as binarized versions of ordinal data. Within this framework, we show that binarizing ordinal data can significantly improve the accuracy of ranking recovery. Specifically, we prove that under the counting algorithm, the ranking error associated with binary comparisons exhibits a faster exponential convergence rate than that of ordinal data. Furthermore, we characterize a substantial performance gap between binary and ordinal data in terms of a signal-to-noise ratio (SNR) determined by the pattern function. We identify the pattern function that minimizes the SNR and maximizes the benefit of binarization. Extensive simulations and a real application on the MovieLens dataset further corroborate our theoretical findings.

nan


Article 309

Title@2025-07-02 (3): Upper and lower bounds for the Lipschitz constant of random neural networks

Title: Upper and lower bounds for the Lipschitz constant of random neural networks Obere und untere Grenzen für die Lipschitz-Konstante zufälliger neuronaler Netzwerke 随机神经网络Lipschitz常量的上下界限 2311.01356v4

Authors (4): Paul Geuchen, Dominik Stöger, Thomas Telaar, Felix Voigtlaender

Empirical studies have widely demonstrated that neural networks are highly sensitive to small, adversarial perturbations of the input. The worst-case robustness against these so-called adversarial examples can be quantified by the Lipschitz constant of the neural network. In this paper, we study upper and lower bounds for the Lipschitz constant of random ReLU neural networks. Specifically, we assume that the weights and biases follow a generalization of the He initialization, where general symmetric distributions for the biases are permitted. For deep networks of fixed depth and sufficiently large width, our established upper bound is larger than the lower bound by a factor that is logarithmic in the width. In contrast, for shallow neural networks we characterize the Lipschitz constant up to an absolute numerical constant that is independent of all parameters.

nan


Article 310

Title@2025-07-02 (3): Survivability of Backdoor Attacks on Unconstrained Face Recognition Systems

Title: Survivability of Backdoor Attacks on Unconstrained Face Recognition Systems Überlebensfähigkeit von Hintertürangriffen auf ungezwungene Gesichtserkennungssysteme 对不受限制的面对面承认系统的后门攻击的可存活性 2507.01607v1

Authors (5): Quentin Le Roux, Yannick Teglia, Teddy Furon, Philippe Loubet-Moundi, Eric Bourbao

The widespread use of deep learning face recognition raises several security concerns. Although prior works point at existing vulnerabilities, DNN backdoor attacks against real-life, unconstrained systems dealing with images captured in the wild remain a blind spot of the literature. This paper conducts the first system-level study of backdoors in deep learning-based face recognition systems. This paper yields four contributions by exploring the feasibility of DNN backdoors on these pipelines in a holistic fashion. We demonstrate for the first time two backdoor attacks on the face detection task: face generation and face landmark shift attacks. We then show that face feature extractors trained with large margin losses also fall victim to backdoor attacks. Combining our models, we then show using 20 possible pipeline configurations and 15 attack cases that a single backdoor enables an attacker to bypass the entire function of a system. Finally, we provide stakeholders with several best practices and countermeasures.

nan


Article 311

Title@2025-07-02 (3): Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems

Title: Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems Data Agent: Eine ganzheitliche Architektur für die Orchestrierung von Daten+AI-Ökosystemen 数据代号:一个用于管弦化数据+AI生态系统的综合结构 2507.01599v1

Authors (5): Zhaoyan Sun, Jiayi Wang, Xinyang Zhao, Jiachi Wang, Guoliang Li

Traditional Data+AI systems utilize data-driven techniques to optimize performance, but they rely heavily on human experts to orchestrate system pipelines, enabling them to adapt to changes in data, queries, tasks, and environments. For instance, while there are numerous data science tools available, developing a pipeline planning system to coordinate these tools remains challenging. This difficulty arises because existing Data+AI systems have limited capabilities in semantic understanding, reasoning, and planning. Fortunately, we have witnessed the success of large language models (LLMs) in enhancing semantic understanding, reasoning, and planning abilities. It is crucial to incorporate LLM techniques to revolutionize data systems for orchestrating Data+AI applications effectively. To achieve this, we propose the concept of a ‘Data Agent’ - a comprehensive architecture designed to orchestrate Data+AI ecosystems, which focuses on tackling data-related tasks by integrating knowledge comprehension, reasoning, and planning capabilities. We delve into the challenges involved in designing data agents, such as understanding data/queries/environments/tools, orchestrating pipelines/workflows, optimizing and executing pipelines, and fostering pipeline self-reflection. Furthermore, we present examples of data agent systems, including a data science agent, data analytics agents (such as unstructured data analytics agent, semantic structured data analytics agent, data lake analytics agent, and multi-modal data analytics agent), and a database administrator (DBA) agent. We also outline several open challenges associated with designing data agent systems.

nan


Article 312

Title@2025-07-02 (3): Analysis of Muon’s Convergence and Critical Batch Size

Title: Analysis of Muon’s Convergence and Critical Batch Size Analyse der Konvergenz von Muon und der kritischen Batch-Größe Muon的聚合和关键批量大小分析 2507.01598v1

Authors (3): Naoki Sato, Hiroki Naganuma, Hideaki Iiduka

This paper presents a theoretical analysis of Muon, a new optimizer that leverages the inherent matrix structure of neural network parameters. We provide convergence proofs for four practical variants of Muon: with and without Nesterov momentum, and with and without weight decay. We then show that adding weight decay leads to strictly tighter bounds on both the parameter and gradient norms, and we clarify the relationship between the weight decay coefficient and the learning rate. Finally, we derive Muon’s critical batch size minimizing the stochastic first-order oracle (SFO) complexity, which is the stochastic computational cost, and validate our theoretical findings with experiments.

nan


Article 313

Title@2025-07-02 (3): Efficiently Vectorized MCMC on Modern Accelerators

Title: Efficiently Vectorized MCMC on Modern Accelerators Effizient vektorisierte MCMC auf modernen Beschleunigern 现代加速器问题MCMC 2503.17405v2

Authors (4): Hugh Dance, Pierre Glaser, Peter Orbanz, Ryan Adams

With the advent of automatic vectorization tools (e.g., JAX’s $\texttt{vmap}$), writing multi-chain MCMC algorithms is often now as simple as invoking those tools on single-chain code. Whilst convenient, for various MCMC algorithms this results in a synchronization problem – loosely speaking, at each iteration all chains running in parallel must wait until the last chain has finished drawing its sample. In this work, we show how to design single-chain MCMC algorithms in a way that avoids synchronization overheads when vectorizing with tools like $\texttt{vmap}$ by using the framework of finite state machines (FSMs). Using a simplified model, we derive an exact theoretical form of the obtainable speed-ups using our approach, and use it to make principled recommendations for optimal algorithm design. We implement several popular MCMC algorithms as FSMs, including Elliptical Slice Sampling, HMC-NUTS, and Delayed Rejection, demonstrating speed-ups of up to an order of magnitude in experiments.

nan


Article 314

Title@2025-07-02 (3): Autonomous AI Surveillance: Multimodal Deep Learning for Cognitive and Behavioral Monitoring

Title: Autonomous AI Surveillance: Multimodal Deep Learning for Cognitive and Behavioral Monitoring Autonome KI-Überwachung: Multimodales Deep Learning für Kognitives und Verhaltensmonitoring AI 自主监测:认知和行为监测的多式深层学习 2507.01590v1

Authors (6): Ameer Hamza, Zuhaib Hussain But, Umar Arif, Samiya, M. Abdullah Asad, Muhammad Naeem

This study presents a novel classroom surveillance system that integrates multiple modalities, including drowsiness, tracking of mobile phone usage, and face recognition,to assess student attentiveness with enhanced precision.The system leverages the YOLOv8 model to detect both mobile phone and sleep usage,(Ghatge et al., 2024) while facial recognition is achieved through LResNet Occ FC body tracking using YOLO and MTCNN.(Durai et al., 2024) These models work in synergy to provide comprehensive, real-time monitoring, offering insights into student engagement and behavior.(S et al., 2023) The framework is trained on specialized datasets, such as the RMFD dataset for face recognition and a Roboflow dataset for mobile phone detection. The extensive evaluation of the system shows promising results. Sleep detection achieves 97. 42% mAP@50, face recognition achieves 86. 45% validation accuracy and mobile phone detection reach 85. 89% mAP@50. The system is implemented within a core PHP web application and utilizes ESP32-CAM hardware for seamless data capture.(Neto et al., 2024) This integrated approach not only enhances classroom monitoring, but also ensures automatic attendance recording via face recognition as students remain seated in the classroom, offering scalability for diverse educational environments.(Banada,2025)

nan


Article 315

Title@2025-07-02 (3): A Privacy-Preserving Indoor Localization System based on Hierarchical Federated Learning

Title: A Privacy-Preserving Indoor Localization System based on Hierarchical Federated Learning Ein datenschutzerhaltendes Indoor-Lokalisierungssystem basierend auf hierarchischem Federated Learning 基于等级联邦学习的维护隐私室内本地化系统 2507.01581v1

Authors (3): Masood Jan, Wafa Njima, Xun Zhang

Location information serves as the fundamental element for numerous Internet of Things (IoT) applications. Traditional indoor localization techniques often produce significant errors and raise privacy concerns due to centralized data collection. In response, Machine Learning (ML) techniques offer promising solutions by capturing indoor environment variations. However, they typically require central data aggregation, leading to privacy, bandwidth, and server reliability issues. To overcome these challenges, in this paper, we propose a Federated Learning (FL)-based approach for dynamic indoor localization using a Deep Neural Network (DNN) model. Experimental results show that FL has the nearby performance to Centralized Model (CL) while keeping the data privacy, bandwidth efficiency and server reliability. This research demonstrates that our proposed FL approach provides a viable solution for privacy-enhanced indoor localization, paving the way for advancements in secure and efficient indoor localization systems.

nan


Article 316

Title@2025-07-02 (3): Transfer Learning for VLC-based indoor Localization: Addressing Environmental Variability

Title: Transfer Learning for VLC-based indoor Localization: Addressing Environmental Variability Transfer Learning für VLC-basierte Indoor-Localization: Adressierung von Umweltvariabilität 以VLC为基础的室内本地化转让学习:解决环境变化多变性问题 2507.01575v1

Authors (4): Masood Jan, Wafa Njima, Xun Zhang, Alexander Artemenko

Accurate indoor localization is crucial in industrial environments. Visible Light Communication (VLC) has emerged as a promising solution, offering high accuracy, energy efficiency, and minimal electromagnetic interference. However, VLC-based indoor localization faces challenges due to environmental variability, such as lighting fluctuations and obstacles. To address these challenges, we propose a Transfer Learning (TL)-based approach for VLC-based indoor localization. Using real-world data collected at a BOSCH factory, the TL framework integrates a deep neural network (DNN) to improve localization accuracy by 47\%, reduce energy consumption by 32\%, and decrease computational time by 40\% compared to the conventional models. The proposed solution is highly adaptable under varying environmental conditions and achieves similar accuracy with only 30\% of the dataset, making it a cost-efficient and scalable option for industrial applications in Industry 4.0.

nan


Article 317

Title@2025-07-02 (3): Is merging worth it? Securely evaluating the information gain for causal dataset acquisition

Title: Is merging worth it? Securely evaluating the information gain for causal dataset acquisition Lohnt es sich zu verschmelzen? Sichere Auswertung des Informationsgewinns für die kausale Datenerfassung 合并是否值得? 安全评估因果数据集获取的信息收益 2409.07215v3

Authors (5): Jake Fawkes, Lucile Ter-Minassian, Desi Ivanova, Uri Shalit, Chris Holmes

Merging datasets across institutions is a lengthy and costly procedure, especially when it involves private information. Data hosts may therefore want to prospectively gauge which datasets are most beneficial to merge with, without revealing sensitive information. For causal estimation this is particularly challenging as the value of a merge depends not only on reduction in epistemic uncertainty but also on improvement in overlap. To address this challenge, we introduce the first cryptographically secure information-theoretic approach for quantifying the value of a merge in the context of heterogeneous treatment effect estimation. We do this by evaluating the Expected Information Gain (EIG) using multi-party computation to ensure that no raw data is revealed. We further demonstrate that our approach can be combined with differential privacy (DP) to meet arbitrary privacy requirements whilst preserving more accurate computation compared to DP alone. To the best of our knowledge, this work presents the first privacy-preserving method for dataset acquisition tailored to causal estimation. We demonstrate the effectiveness and reliability of our method on a range of simulated and realistic benchmarks. Code is publicly available: https://github.com/LucileTerminassian/causal_prospective_merge.

nan


Article 318

Title@2025-07-02 (3): On the Effect of Ruleset Tuning and Data Imbalance on Explainable Network Security Alert Classifications: a Case-Study on DeepCASE

Title: On the Effect of Ruleset Tuning and Data Imbalance on Explainable Network Security Alert Classifications: a Case-Study on DeepCASE Über die Auswirkungen von Ruleset Tuning und Datenunwucht auf erklärbare Netzsicherheits-Alarmklassifikationen: eine Fallstudie zu DeepCASE 规则图案和数据平衡对可解释的网络安全警报分类的影响:深 CASE案例研究 2507.01571v1

Authors (4): Koen T. W. Teuwen, Sam Baggen, Emmanuele Zambon, Luca Allodi

Automation in Security Operations Centers (SOCs) plays a prominent role in alert classification and incident escalation. However, automated methods must be robust in the presence of imbalanced input data, which can negatively affect performance. Additionally, automated methods should make explainable decisions. In this work, we evaluate the effect of label imbalance on the classification of network intrusion alerts. As our use-case we employ DeepCASE, the state-of-the-art method for automated alert classification. We show that label imbalance impacts both classification performance and correctness of the classification explanations offered by DeepCASE. We conclude tuning the detection rules used in SOCs can significantly reduce imbalance and may benefit the performance and explainability offered by alert post-processing methods such as DeepCASE. Therefore, our findings suggest that traditional methods to improve the quality of input data can benefit automation.

nan


Article 319

Title@2025-07-02 (3): TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design

Title: TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design TRACED: Transition-aware regret Annäherung mit Mitlernbarkeit für Umweltdesign TRACEED: 环境设计中具有共负环境设计共负作用的过渡-意识到遗憾相近情况 2506.19997v2

Authors (6): Geonwoo Cho, Jaegyun Im, Jihwan Lee, Hojun Yi, Sejin Kim, Sundong Kim

Generalizing deep reinforcement learning agents to unseen environments remains a significant challenge. One promising solution is Unsupervised Environment Design (UED), a co-evolutionary framework in which a teacher adaptively generates tasks with high learning potential, while a student learns a robust policy from this evolving curriculum. Existing UED methods typically measure learning potential via regret, the gap between optimal and current performance, approximated solely by value-function loss. Building on these approaches, we introduce the transition prediction error as an additional term in our regret approximation. To capture how training on one task affects performance on others, we further propose a lightweight metric called co-learnability. By combining these two measures, we present Transition-aware Regret Approximation with Co-learnability for Environment Design (TRACED). Empirical evaluations show that TRACED yields curricula that improve zero-shot generalization across multiple benchmarks while requiring up to 2x fewer environment interactions than strong baselines. Ablation studies confirm that the transition prediction error drives rapid complexity ramp-up and that co-learnability delivers additional gains when paired with the transition prediction error. These results demonstrate how refined regret approximation and explicit modeling of task relationships can be leveraged for sample-efficient curriculum design in UED.

nan


Article 320

Title@2025-07-02 (3): Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers

Title: Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers Zeitreihendarstellungen für die Klassifizierung liegen in vorgebildeten Visionstransformatoren versteckt 未受过训练的愿景变异者中隐藏的分类隐隐藏物的 时间序列 2506.08641v2

Authors (5): Simon Roschmann, Quentin Bouniot, Vasilii Feofanov, Ievgen Redko, Zeynep Akata

Time series classification is a fundamental task in healthcare and industry, yet the development of time series foundation models (TSFMs) remains limited by the scarcity of publicly available time series datasets. In this work, we propose Time Vision Transformer (TiViT), a framework that converts time series into images to leverage the representational power of frozen Vision Transformers (ViTs) pretrained on large-scale image datasets. First, we theoretically motivate our approach by analyzing the 2D patching of ViTs for time series, showing that it can increase the number of label-relevant tokens and reduce the sample complexity. Second, we empirically demonstrate that TiViT achieves state-of-the-art performance on standard time series classification benchmarks by utilizing the hidden representations of large OpenCLIP models. We explore the structure of TiViT representations and find that intermediate layers with high intrinsic dimension are the most effective for time series classification. Finally, we assess the alignment between TiViT and TSFM representation spaces and identify a strong complementarity, with further performance gains achieved by combining their features. Our findings reveal a new direction for reusing vision representations in a non-visual domain. Code is available at https://github.com/ExplainableML/TiViT.

nan


Article 321

Title@2025-07-02 (3): Unsupervised Panoptic Interpretation of Latent Spaces in GANs Using Space-Filling Vector Quantization

Title: Unsupervised Panoptic Interpretation of Latent Spaces in GANs Using Space-Filling Vector Quantization Unüberwachte Panoptische Interpretation von Latent Spaces in GANs mittels raumfüllender Vector-Quantisierung 利用空间填空矢量量化,在GANs中无监督地对低空空间进行全光解释 2410.20573v2

Authors (2): Mohammad Hassan Vali, Tom Bäckström

Generative adversarial networks (GANs) learn a latent space whose samples can be mapped to real-world images. Such latent spaces are difficult to interpret. Some earlier supervised methods aim to create an interpretable latent space or discover interpretable directions, which requires exploiting data labels or annotated synthesized samples for training. However, we propose using a modification of vector quantization called space-filling vector quantization (SFVQ), which quantizes the data on a piece-wise linear curve. SFVQ can capture the underlying morphological structure of the latent space, making it interpretable. We apply this technique to model the latent space of pre-trained StyleGAN2 and BigGAN networks on various datasets. Our experiments show that the SFVQ curve yields a general interpretable model of the latent space such that it determines which parts of the latent space correspond to specific generative factors. Furthermore, we demonstrate that each line of the SFVQ curve can potentially refer to an interpretable direction for applying intelligible image transformations. We also demonstrate that the points located on an SFVQ line can be used for controllable data augmentation.

nan


Article 322

Title@2025-07-02 (3): Who Should I Listen To? Adaptive Collaboration in Personalized Federated Learning

Title: Who Should I Listen To? Adaptive Collaboration in Personalized Federated Learning Wem sollte ich zuhören? Adaptive Zusammenarbeit im personalisierten Federated Learning 个人化联邦学习中的适应性协作 2507.00259v2

Authors (4): Amr Abourayya, Jens Kleesiek, Bharat Rao, Michael Kamp

Data heterogeneity is a central challenge in federated learning, and personalized federated learning (PFL) aims to address it by tailoring models to each client’s distribution. Yet many PFL methods fail to outperform local or centralized baselines, suggesting a mismatch between the collaboration they enforce and the structure of the data. We propose an approach based on adaptive collaboration, where clients decide adaptively not only how much to rely on others, but also whom to trust at the level of individual examples. We instantiate this principle in FEDMOSAIC, a federated co-training method in which clients exchange predictions over a shared unlabeled dataset. This enables fine-grained trust decisions that are difficult to achieve with parameter sharing alone. Each client adjusts its loss weighting based on the agreement between private and public data, and contributes to global pseudo-labels in proportion to its estimated per-example confidence. Empirically, FEDMOSAIC improves upon state-of-the-art PFL methods across diverse non-IID settings, and we provide convergence guarantees under standard assumptions. Our results demonstrate the potential of data-aware collaboration for robust and effective personalization.

nan


Article 323

Title@2025-07-02 (3): How Weight Resampling and Optimizers Shape the Dynamics of Continual Learning and Forgetting in Neural Networks

Title: How Weight Resampling and Optimizers Shape the Dynamics of Continual Learning and Forgetting in Neural Networks Wie Gewicht-Resampling und Optimierer die Dynamik des kontinuierlichen Lernens und Vergessens in neuralen Netzwerken gestalten 重力再采样和优化如何塑造神经网络中持续学习和遗忘的动态 2507.01559v1

Authors (4): Lapo Frati, Neil Traft, Jeff Clune, Nick Cheney

Recent work in continual learning has highlighted the beneficial effect of resampling weights in the last layer of a neural network (``zapping”). Although empirical results demonstrate the effectiveness of this approach, the underlying mechanisms that drive these improvements remain unclear. In this work, we investigate in detail the pattern of learning and forgetting that take place inside a convolutional neural network when trained in challenging settings such as continual learning and few-shot transfer learning, with handwritten characters and natural images. Our experiments show that models that have undergone zapping during training more quickly recover from the shock of transferring to a new domain. Furthermore, to better observe the effect of continual learning in a multi-task setting we measure how each individual task is affected. This shows that, not only zapping, but the choice of optimizer can also deeply affect the dynamics of learning and forgetting, causing complex patterns of synergy/interference between tasks to emerge when the model learns sequentially at transfer time.

nan


Article 324

Title@2025-07-02 (3): Learned-Database Systems Security

Title: Learned-Database Systems Security Gelernte Datenbanksysteme Sicherheit 数据库系统安全 2212.10318v4

Authors (5): Roei Schuster, Jin Peng Zhou, Thorsten Eisenhofer, Paul Grubbs, Nicolas Papernot

A learned database system uses machine learning (ML) internally to improve performance. We can expect such systems to be vulnerable to some adversarial-ML attacks. Often, the learned component is shared between mutually-distrusting users or processes, much like microarchitectural resources such as caches, potentially giving rise to highly-realistic attacker models. However, compared to attacks on other ML-based systems, attackers face a level of indirection as they cannot interact directly with the learned model. Additionally, the difference between the attack surface of learned and non-learned versions of the same system is often subtle. These factors obfuscate the de-facto risks that the incorporation of ML carries. We analyze the root causes of potentially-increased attack surface in learned database systems and develop a framework for identifying vulnerabilities that stem from the use of ML. We apply our framework to a broad set of learned components currently being explored in the database community. To empirically validate the vulnerabilities surfaced by our framework, we choose 3 of them and implement and evaluate exploits against these. We show that the use of ML cause leakage of past queries in a database, enable a poisoning attack that causes exponential memory blowup in an index structure and crashes it in seconds, and enable index users to snoop on each others’ key distributions by timing queries over their own keys. We find that adversarial ML is an universal threat against learned components in database systems, point to open research gaps in our understanding of learned-systems security, and conclude by discussing mitigations, while noting that data leakage is inherent in systems whose learned component is shared between multiple parties.

nan


Article 325

Title@2025-07-02 (3): ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention

Title: ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention ZeCO: Null Kommunikation Überkopfsequenz Parallelität für lineare Aufmerksamkeit ZeCO: 线性注意的无通信重叠序列平行主义 2507.01004v2

Authors (9): Yuhong Chou, Zehao Liu, Ruijie Zhu, Xinyi Wan, Tianjian Li, Congying Chu, Qian Liu, Jibin Wu, Zejun Ma

Linear attention mechanisms deliver significant advantages for Large Language Models (LLMs) by providing linear computational complexity, enabling efficient processing of ultra-long sequences (e.g., 1M context). However, existing Sequence Parallelism (SP) methods, essential for distributing these workloads across devices, become the primary bottleneck due to substantial communication overhead. In this paper, we introduce ZeCO (Zero Communication Overhead) sequence parallelism for linear attention models, a new SP method designed to overcome these limitations and achieve end-to-end near-linear scalability for long sequence training. For example, training a model with a 1M sequence length across 64 devices using ZeCO takes roughly the same time as training with an 16k sequence on a single device. At the heart of ZeCO lies All-Scan, a new collective communication primitive. All-Scan provides each SP rank with precisely the initial operator state it requires while maintaining a minimal communication footprint, effectively eliminating communication overhead. Theoretically, we prove the optimaity of ZeCO, showing that it introduces only negligible time and space overhead. Empirically, we compare the communication costs of different sequence parallelism strategies and demonstrate that All-Scan achieves the fastest communication in SP scenarios. Specifically, on 256 GPUs with an 8M sequence length, ZeCO achieves a 60\% speedup compared to the current state-of-the-art (SOTA) SP method. We believe ZeCO establishes a clear path toward efficiently training next-generation LLMs on previously intractable sequence lengths.

nan


Article 326

Title@2025-07-02 (3): AI and Remote Sensing for Resilient and Sustainable Built Environments: A Review of Current Methods, Open Data and Future Directions

Title: AI and Remote Sensing for Resilient and Sustainable Built Environments: A Review of Current Methods, Open Data and Future Directions KI und Fernerkundung für resiliente und nachhaltig gebaute Umgebungen: Eine Übersicht über aktuelle Methoden, offene Daten und zukünftige Richtungen 适应性和可持续建筑环境的AI和遥感:对当前方法、开放数据和未来方向的审查 2507.01547v1

Authors (4): Ubada El Joulani, Tatiana Kalganova, Stergios-Aristoteles Mitoulis, Sotirios Argyroudis

Critical infrastructure, such as transport networks, underpins economic growth by enabling mobility and trade. However, ageing assets, climate change impacts (e.g., extreme weather, rising sea levels), and hybrid threats ranging from natural disasters to cyber attacks and conflicts pose growing risks to their resilience and functionality. This review paper explores how emerging digital technologies, specifically Artificial Intelligence (AI), can enhance damage assessment and monitoring of transport infrastructure. A systematic literature review examines existing AI models and datasets for assessing damage in roads, bridges, and other critical infrastructure impacted by natural disasters. Special focus is given to the unique challenges and opportunities associated with bridge damage detection due to their structural complexity and critical role in connectivity. The integration of SAR (Synthetic Aperture Radar) data with AI models is also discussed, with the review revealing a critical research gap: a scarcity of studies applying AI models to SAR data for comprehensive bridge damage assessment. Therefore, this review aims to identify the research gaps and provide foundations for AI-driven solutions for assessing and monitoring critical transport infrastructures.

nan


Article 327

Title@2025-07-02 (3): MARVIS: Modality Adaptive Reasoning over VISualizations

Title: MARVIS: Modality Adaptive Reasoning over VISualizations MARVIS: Modalität Adaptive Begründung über VISualisierungen MARVIS: 与VIS定性有关的模式调整理由 2507.01544v1

Authors (4): Benjamin Feuer, Lennart Purucker, Oussama Elachqar, Chinmay Hegde

Scientific applications of machine learning often rely on small, specialized models tuned to particular domains. Such models often achieve excellent performance, but lack flexibility. Foundation models offer versatility, but typically underperform specialized approaches, especially on non-traditional modalities and long-tail domains. We propose MARVIS (Modality Adaptive Reasoning over VISualizations), a training-free method that enables even small vision-language models to predict any data modality with high accuracy. MARVIS transforms latent embedding spaces into visual representations and then leverages the spatial and fine-grained reasoning skills of VLMs to successfully interpret and utilize them. MARVIS achieves competitive performance on vision, audio, biological, and tabular domains using a single 3B parameter model, achieving results that beat Gemini by 16\% on average and approach specialized methods, without exposing personally identifiable information (P.I.I.) or requiring any domain-specific training. We open source our code and datasets at https://github.com/penfever/marvis

nan


Article 328

Title@2025-07-02 (3): Parsimonious Gaussian mixture models with piecewise-constant eigenvalue profiles

Title: Parsimonious Gaussian mixture models with piecewise-constant eigenvalue profiles Parsimonische Gauß-Mischmodelle mit stückweise-konstanten Eigenwertprofilen 配有麻片- 恒定等离子值剖面图的帕西莫斯高斯混合混合物模型 2507.01542v1

Authors (4): Tom Szwagier, Pierre-Alexandre Mattei, Charles Bouveyron, Xavier Pennec

Gaussian mixture models (GMMs) are ubiquitous in statistical learning, particularly for unsupervised problems. While full GMMs suffer from the overparameterization of their covariance matrices in high-dimensional spaces, spherical GMMs (with isotropic covariance matrices) certainly lack flexibility to fit certain anisotropic distributions. Connecting these two extremes, we introduce a new family of parsimonious GMMs with piecewise-constant covariance eigenvalue profiles. These extend several low-rank models like the celebrated mixtures of probabilistic principal component analyzers (MPPCA), by enabling any possible sequence of eigenvalue multiplicities. If the latter are prespecified, then we can naturally derive an expectation-maximization (EM) algorithm to learn the mixture parameters. Otherwise, to address the notoriously-challenging issue of jointly learning the mixture parameters and hyperparameters, we propose a componentwise penalized EM algorithm, whose monotonicity is proven. We show the superior likelihood-parsimony tradeoffs achieved by our models on a variety of unsupervised experiments: density fitting, clustering and single-image denoising.

nan


Article 329

Title@2025-07-02 (3): Consistency of Learned Sparse Grid Quadrature Rules using NeuralODEs

Title: Consistency of Learned Sparse Grid Quadrature Rules using NeuralODEs Konsistenz der gelernten Sparse Grid-Quadratur-Regeln mit NeuralODEs 使用神经元的 “ 利用神经元 “ 的 “ 突变网格宽度规则 “ 一致性 2507.01533v1

Authors (3): Hanno Gottschalk, Emil Partow, Tobias J. Riedlinger

This paper provides a proof of the consistency of sparse grid quadrature for numerical integration of high dimensional distributions. In a first step, a transport map is learned that normalizes the distribution to a noise distribution on the unit cube. This step is built on the statistical learning theory of neural ordinary differential equations, which has been established recently. Secondly, the composition of the generative map with the quantity of interest is integrated numerically using the Clenshaw-Curtis sparse grid quadrature. A decomposition of the total numerical error in quadrature error and statistical error is provided. As main result it is proven in the framework of empirical risk minimization that all error terms can be controlled in the sense of PAC (probably approximately correct) learning and with high probability the numerical integral approximates the theoretical value up to an arbitrary small error in the limit where the data set size is growing and the network capacity is increased adaptively.

nan


Article 330

Title@2025-07-02 (3): Chargax: A JAX Accelerated EV Charging Simulator

Title: Chargax: A JAX Accelerated EV Charging Simulator Chargax: Ein JAX beschleunigter EV-Ladesimulator 电荷: JAX 加速 EV 充电模拟器 2507.01522v1

Authors (4): Koen Ponse, Jan Felix Kleuker, Aske Plaat, Thomas Moerland

Deep Reinforcement Learning can play a key role in addressing sustainable energy challenges. For instance, many grid systems are heavily congested, highlighting the urgent need to enhance operational efficiency. However, reinforcement learning approaches have traditionally been slow due to the high sample complexity and expensive simulation requirements. While recent works have effectively used GPUs to accelerate data generation by converting environments to JAX, these works have largely focussed on classical toy problems. This paper introduces Chargax, a JAX-based environment for realistic simulation of electric vehicle charging stations designed for accelerated training of RL agents. We validate our environment in a variety of scenarios based on real data, comparing reinforcement learning agents against baselines. Chargax delivers substantial computational performance improvements of over 100x-1000x over existing environments. Additionally, Chargax’ modular architecture enables the representation of diverse real-world charging station configurations.

nan


Article 331

Title@2025-07-02 (3): Loss Functions in Diffusion Models: A Comparative Study

Title: Loss Functions in Diffusion Models: A Comparative Study Verlustfunktionen bei Diffusionsmodellen: Eine vergleichende Studie 传播模型中的功能损失:比较研究 2507.01516v1

Authors (3): Dibyanshu Kumar, Philipp Vaeth, Magda Gregorová

Diffusion models have emerged as powerful generative models, inspiring extensive research into their underlying mechanisms. One of the key questions in this area is the loss functions these models shall train with. Multiple formulations have been introduced in the literature over the past several years with some links and some critical differences stemming from various initial considerations. In this paper, we explore the different target objectives and corresponding loss functions in detail. We present a systematic overview of their relationships, unifying them under the framework of the variational lower bound objective. We complement this theoretical analysis with an empirical study providing insights into the conditions under which these objectives diverge in performance and the underlying factors contributing to such deviations. Additionally, we evaluate how the choice of objective impacts the model ability to achieve specific goals, such as generating high-quality samples or accurately estimating likelihoods. This study offers a unified understanding of loss functions in diffusion models, contributing to more efficient and goal-oriented model designs in future research.

nan


Article 332

Title@2025-07-02 (3): Mamba Guided Boundary Prior Matters: A New Perspective for Generalized Polyp Segmentation

Title: Mamba Guided Boundary Prior Matters: A New Perspective for Generalized Polyp Segmentation Mamba Guided Boundary Prior Matters: Eine neue Perspektive für die generalisierte Polyp-Segmentierung Mamba 向导边界前事项:普遍聚分法的新视角 2507.01509v1

Authors (4): Tapas K. Dutta, Snehashis Majhi, Deepak Ranjan Nayak, Debesh Jha

Polyp segmentation in colonoscopy images is crucial for early detection and diagnosis of colorectal cancer. However, this task remains a significant challenge due to the substantial variations in polyp shape, size, and color, as well as the high similarity between polyps and surrounding tissues, often compounded by indistinct boundaries. While existing encoder-decoder CNN and transformer-based approaches have shown promising results, they struggle with stable segmentation performance on polyps with weak or blurry boundaries. These methods exhibit limited abilities to distinguish between polyps and non-polyps and capture essential boundary cues. Moreover, their generalizability still falls short of meeting the demands of real-time clinical applications. To address these limitations, we propose SAM-MaGuP, a groundbreaking approach for robust polyp segmentation. By incorporating a boundary distillation module and a 1D-2D Mamba adapter within the Segment Anything Model (SAM), SAM-MaGuP excels at resolving weak boundary challenges and amplifies feature learning through enriched global contextual interactions. Extensive evaluations across five diverse datasets reveal that SAM-MaGuP outperforms state-of-the-art methods, achieving unmatched segmentation accuracy and robustness. Our key innovations, a Mamba-guided boundary prior and a 1D-2D Mamba block, set a new benchmark in the field, pushing the boundaries of polyp segmentation to new heights.

nan


Article 333

Title@2025-07-02 (3): Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling

Title: Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling Skalierung von Flüssig-Resistenz-Netzwerken für eine effiziente Sequenzmodellierung 增强增强流动性恢复力的流动性能力网络,以建立高效序列建模 2505.21717v3

Authors (4): Mónika Farsang, Ramin Hasani, Daniela Rus, Radu Grosu

We present LrcSSM, a $\textit{nonlinear}$ recurrent model that processes long sequences as fast as today’s linear state-space layers. By forcing the state-transition matrix to be diagonal and learned at every step, the full sequence can be solved in parallel with a single prefix-scan, giving $\mathcal{O}(TD)$ time and memory and only $\mathcal{O}(\log T)$ sequential depth, for input-sequence length $T$ and a state dimension $D$. Moreover, LrcSSM offers a formal gradient-stability guarantee that other input-varying systems such as Liquid-S4 and Mamba do not provide. Lastly, for network depth $L$, as the forward and backward passes cost $\Theta(T\,D\,L)$ FLOPs, with its low sequential depth and parameter count $\Theta(D\,L)$, the model follows the compute-optimal scaling law regime ($\beta \approx 0.42$) recently observed for Mamba, outperforming quadratic-attention Transformers at equal compute while avoiding the memory overhead of FFT-based long convolutions. We show that on a series of long-range forecasting tasks, LrcSSM outperforms LRU, S5 and Mamba.

nan


Article 334

Title@2025-07-02 (3): SimBank: from Simulation to Solution in Prescriptive Process Monitoring

Title: SimBank: from Simulation to Solution in Prescriptive Process Monitoring SimBank: Von der Simulation zur Lösung in der Prescriptive Process Monitoring SimBank:从模拟到规范程序监测的解决方案 2506.14772v3

Authors (4): Jakob De Moor, Hans Weytjens, Johannes De Smedt, Jochen De Weerdt

Prescriptive Process Monitoring (PresPM) is an emerging area within Process Mining, focused on optimizing processes through real-time interventions for effective decision-making. PresPM holds significant promise for organizations seeking enhanced operational performance. However, the current literature faces two key limitations: a lack of extensive comparisons between techniques and insufficient evaluation approaches. To address these gaps, we introduce SimBank: a simulator designed for accurate benchmarking of PresPM methods. Modeled after a bank’s loan application process, SimBank enables extensive comparisons of both online and offline PresPM methods. It incorporates a variety of intervention optimization problems with differing levels of complexity and supports experiments on key causal machine learning challenges, such as assessing a method’s robustness to confounding in data. SimBank additionally offers a comprehensive evaluation capability: for each test case, it can generate the true outcome under each intervention action, which is not possible using recorded datasets. The simulator incorporates parallel activities and loops, drawing from common logs to generate cases that closely resemble real-life process instances. Our proof of concept demonstrates SimBank’s benchmarking capabilities through experiments with various PresPM methods across different interventions, highlighting its value as a publicly available simulator for advancing research and practice in PresPM.

nan


Article 335

Title@2025-07-02 (3): Meteoroid stream identification with HDBSCAN unsupervised clustering algorithm

Title: Meteoroid stream identification with HDBSCAN unsupervised clustering algorithm Meteoroid-Stream-Identifikation mit HDBSCAN unüberwachtem Clustering-Algorithmus 与 HDBScAN 不受监督的群集算法的气象流识别 2507.01501v1

Authors (2): Eloy Peña-Asensio, Fabio Ferrari

Accurate identification of meteoroid streams is central to understanding their origins and evolution. However, overlapping clusters and background noise hinder classification, an issue amplified for missions such as ESA’s LUMIO that rely on meteor shower observations to infer lunar meteoroid impact parameters. This study evaluates the performance of the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithm for unsupervised meteoroid stream identification, comparing its outcomes with the established Cameras for All-Sky Meteor Surveillance (CAMS) look-up table method. We analyze the CAMS Meteoroid Orbit Database v3.0 using three feature vectors: LUTAB (CAMS geocentric parameters), ORBIT (heliocentric orbital elements), and GEO (adapted geocentric parameters). HDBSCAN is applied with varying minimum cluster sizes and two cluster selection methods (eom and leaf). To align HDBSCAN clusters with CAMS classifications, the Hungarian algorithm determines the optimal mapping. Clustering performance is assessed via the Silhouette score, Normalized Mutual Information, and F1 score, with Principal Component Analysis further supporting the analysis. With the GEO vector, HDBSCAN confirms 39 meteoroid streams, 21 strongly aligning with CAMS. The ORBIT vector identifies 30 streams, 13 with high matching scores. Less active showers pose identification challenges. The eom method consistently yields superior performance and agreement with CAMS. Although HDBSCAN requires careful selection of the minimum cluster size, it delivers robust, internally consistent clusters and outperforms the look-up table method in statistical coherence. These results underscore HDBSCAN’s potential as a mathematically consistent alternative for meteoroid stream identification, although further validation is needed to assess physical validity.

nan


Article 336

Title@2025-07-02 (3): optimizn: a Python Library for Developing Customized Optimization Algorithms

Title: optimizn: a Python Library for Developing Customized Optimization Algorithms optimizn: eine Python-Bibliothek für die Entwicklung maßgeschneiderter Optimierungsalgorithmen 优化: 用于开发自定制优化算法的 Python 图书馆 2503.00033v2

Authors (2): Akshay Sathiya, Rohit Pandey

Combinatorial optimization problems are prevalent across a wide variety of domains. These problems are often nuanced, their optimal solutions might not be efficiently obtainable, and they may require lots of time and compute resources to solve (they are NP-hard). It follows that the best course of action for solving these problems is to use general optimization algorithm paradigms to quickly and easily develop algorithms that are customized to these problems and can produce good solutions in a reasonable amount of time. In this paper, we present optimizn, a Python library for developing customized optimization algorithms under general optimization algorithm paradigms (simulated annealing, branch and bound). Additionally, optimizn offers continuous training, with which users can run their algorithms on a regular cadence, retain the salient aspects of previous runs, and use them in subsequent runs to potentially produce solutions that get closer and closer to optimality. An earlier version of this paper was peer reviewed and published internally at Microsoft.

nan


Article 337

Title@2025-07-02 (3): Diffusion-based Iterative Counterfactual Explanations for Fetal Ultrasound Image Quality Assessment

Title: Diffusion-based Iterative Counterfactual Explanations for Fetal Ultrasound Image Quality Assessment Diffusionsbasierte iterative Gegenfaktische Erklärungen für fetale Ultraschall-Bildqualitätsbewertung 胎儿超声波图像质量评估的基于传播的替代反事实解释 2403.08700v2

Authors (9): Paraskevas Pegios, Manxi Lin, Nina Weng, Morten Bo Søndergaard Svendsen, Zahra Bashir, Siavash Bigdeli, Anders Nymark Christensen, Martin Tolsgaard, Aasa Feragen

Obstetric ultrasound image quality is crucial for accurate diagnosis and monitoring of fetal health. However, acquiring high-quality standard planes is difficult, influenced by the sonographer’s expertise and factors like the maternal BMI or fetus dynamics. In this work, we explore diffusion-based counterfactual explainable AI to generate realistic, high-quality standard planes from low-quality non-standard ones. Through quantitative and qualitative evaluation, we demonstrate the effectiveness of our approach in generating plausible counterfactuals of increased quality. This shows future promise for enhancing training of clinicians by providing visual feedback and potentially improving standard plane quality and acquisition for downstream diagnosis and monitoring.

nan


Article 338

Title@2025-07-02 (3): On the Trade-off between Flatness and Optimization in Distributed Learning

Title: On the Trade-off between Flatness and Optimization in Distributed Learning Auf dem Trade-off zwischen Flatness und Optimierung im verteilten Lernen 关于分配式学习的平衡与优化的平衡 2406.20006v2

Authors (4): Ying Cao, Zhaoxian Wu, Kun Yuan, Ali H. Sayed

This paper proposes a theoretical framework to evaluate and compare the performance of stochastic gradient algorithms for distributed learning in relation to their behavior around local minima in nonconvex environments. Previous works have noticed that convergence toward flat local minima tend to enhance the generalization ability of learning algorithms. This work discovers three interesting results. First, it shows that decentralized learning strategies are able to escape faster away from local minima and favor convergence toward flatter minima relative to the centralized solution. Second, in decentralized methods, the consensus strategy has a worse excess-risk performance than diffusion, giving it a better chance of escaping from local minima and favoring flatter minima. Third, and importantly, the ultimate classification accuracy is not solely dependent on the flatness of the local minimum but also on how well a learning algorithm can approach that minimum. In other words, the classification accuracy is a function of both flatness and optimization performance. In this regard, since diffusion has a lower excess-risk than consensus, when both algorithms are trained starting from random initial points, diffusion enhances the classification accuracy. The paper examines the interplay between the two measures of flatness and optimization error closely. One important conclusion is that decentralized strategies deliver in general enhanced classification accuracy because they strike a more favorable balance between flatness and optimization performance compared to the centralized solution.

nan


Article 339

Title@2025-07-02 (3): How to Securely Shuffle? A survey about Secure Shufflers for privacy-preserving computations

Title: How to Securely Shuffle? A survey about Secure Shufflers for privacy-preserving computations Wie man sicher Shuffle? Eine Umfrage über Secure Shuffler für die Privatsphäre-Erhaltung Berechnungen 如何安全打乱? 关于用于隐私保护计算的安全打碎器的调查 2507.01487v1

Authors (4): Marc Damie, Florian Hahn, Andreas Peter, Jan Ramon

Ishai et al. (FOCS’06) introduced secure shuffling as an efficient building block for private data aggregation. Recently, the field of differential privacy has revived interest in secure shufflers by highlighting the privacy amplification they can provide in various computations. Although several works argue for the utility of secure shufflers, they often treat them as black boxes; overlooking the practical vulnerabilities and performance trade-offs of existing implementations. This leaves a central question open: what makes a good secure shuffler? This survey addresses that question by identifying, categorizing, and comparing 26 secure protocols that realize the necessary shuffling functionality. To enable a meaningful comparison, we adapt and unify existing security definitions into a consistent set of properties. We also present an overview of privacy-preserving technologies that rely on secure shufflers, offer practical guidelines for selecting appropriate protocols, and outline promising directions for future work.

nan


Article 340

Title@2025-07-02 (3): ManifoldMind: Dynamic Hyperbolic Reasoning for Trustworthy Recommendations

Title: ManifoldMind: Dynamic Hyperbolic Reasoning for Trustworthy Recommendations ManifoldMind: Dynamische hyperbolische Begründung für vertrauenswürdige Empfehlungen MManifulMind: 值得信赖的建议的动态超双曲理由 2507.02014v1

Authors (3): Anoushka Harit, Zhongtian Sun, Suncica Hadzidedic

We introduce ManifoldMind, a probabilistic geometric recommender system for exploratory reasoning over semantic hierarchies in hyperbolic space. Unlike prior methods with fixed curvature and rigid embeddings, ManifoldMind represents users, items, and tags as adaptive-curvature probabilistic spheres, enabling personalised uncertainty modeling and geometry-aware semantic exploration. A curvature-aware semantic kernel supports soft, multi-hop inference, allowing the model to explore diverse conceptual paths instead of overfitting to shallow or direct interactions. Experiments on four public benchmarks show superior NDCG, calibration, and diversity compared to strong baselines. ManifoldMind produces explicit reasoning traces, enabling transparent, trustworthy, and exploration-driven recommendations in sparse or abstract domains.

nan


Article 341

Title@2025-07-02 (3): Optimizing Methane Detection On Board Satellites: Speed, Accuracy, and Low-Power Solutions for Resource-Constrained Hardware

Title: Optimizing Methane Detection On Board Satellites: Speed, Accuracy, and Low-Power Solutions for Resource-Constrained Hardware Optimierung der Methan-Erkennung auf Board-Satelliten: Geschwindigkeit, Genauigkeit und Low-Power-Lösungen für ressourcenbeschränkte Hardware 优化板上卫星的甲烷探测:快速、准确和低功率办法 2507.01472v1

Authors (3): Jonáš Herec, Vít Růžička, Rado Pitoňák

Methane is a potent greenhouse gas, and detecting its leaks early via hyperspectral satellite imagery can help mitigate climate change. Meanwhile, many existing missions operate in manual tasking regimes only, thus missing potential events of interest. To overcome slow downlink rates cost-effectively, onboard detection is a viable solution. However, traditional methane enhancement methods are too computationally demanding for resource-limited onboard hardware. This work accelerates methane detection by focusing on efficient, low-power algorithms. We test fast target detection methods (ACE, CEM) that have not been previously used for methane detection and propose a Mag1c-SAS - a significantly faster variant of the current state-of-the-art algorithm for methane detection: Mag1c. To explore their true detection potential, we integrate them with a machine learning model (U-Net, LinkNet). Our results identify two promising candidates (Mag1c-SAS and CEM), both acceptably accurate for the detection of strong plumes and computationally efficient enough for onboard deployment: one optimized more for accuracy, the other more for speed, achieving up to ~100x and ~230x faster computation than original Mag1c on resource-limited hardware. Additionally, we propose and evaluate three band selection strategies. One of them can outperform the method traditionally used in the field while using fewer channels, leading to even faster processing without compromising accuracy. This research lays the foundation for future advancements in onboard methane detection with minimal hardware requirements, improving timely data delivery. The produced code, data, and models are open-sourced and can be accessed from https://github.com/zaitra/methane-filters-benchmark.

nan


Article 342

Title@2025-07-02 (3): Zero-Incentive Dynamics: a look at reward sparsity through the lens of unrewarded subgoals

Title: Zero-Incentive Dynamics: a look at reward sparsity through the lens of unrewarded subgoals Zero-Incentive Dynamics: ein Blick auf Belohnungssparsität durch die Linse der unerhörten Subgoals 零奖励零激励动态:从未获得回报的次级目标的角度来看待奖励的宽度 2507.01470v1

Authors (2): Yannick Molinghen, Tom Lenaerts

This work re-examines the commonly held assumption that the frequency of rewards is a reliable measure of task difficulty in reinforcement learning. We identify and formalize a structural challenge that undermines the effectiveness of current policy learning methods: when essential subgoals do not directly yield rewards. We characterize such settings as exhibiting zero-incentive dynamics, where transitions critical to success remain unrewarded. We show that state-of-the-art deep subgoal-based algorithms fail to leverage these dynamics and that learning performance is highly sensitive to the temporal proximity between subgoal completion and eventual reward. These findings reveal a fundamental limitation in current approaches and point to the need for mechanisms that can infer latent task structure without relying on immediate incentives.

nan


Article 343

Title@2025-07-02 (3): Cross-platform Smartphone Positioning at Museums

Title: Cross-platform Smartphone Positioning at Museums Plattformübergreifende Smartphone-Positionierung bei Museen 博物馆的跨平台智能手机定位 2507.01469v1

Authors (6): Alessio Ferrato, Fabio Gasparetti, Carla Limongelli, Stefano Mastandrea, Giuseppe Sansonetti, Joaquín Torres-Sospedra

Indoor Positioning Systems (IPSs) hold significant potential for enhancing visitor experiences in cultural heritage institutions. By enabling personalized navigation, efficient artifact organization, and better interaction with exhibits, IPSs can transform the modalities of how individuals engage with museums, galleries and libraries. However, these institutions face several challenges in implementing IPSs, including environmental constraints, technical limits, and limited experimentation. In other contexts, Received Signal Strength (RSS)-based approaches using Bluetooth Low Energy (BLE) and WiFi have emerged as preferred solutions due to their non-invasive nature and minimal infrastructure requirements. Nevertheless, the lack of publicly available RSS datasets that specifically reflect museum environments presents a substantial barrier to developing and evaluating positioning algorithms designed for the intricate spatial characteristics typical of cultural heritage sites. To address this limitation, we present BAR, a novel RSS dataset collected in front of 90 artworks across 13 museum rooms using two different platforms, i.e., Android and iOS. Additionally, we provide an advanced position classification baseline taking advantage of a proximity-based method and $k$-NN algorithms. In our analysis, we discuss the results and offer suggestions for potential research directions.

nan


Article 344

Title@2025-07-02 (3): Adaptive NAD: Online and Self-adaptive Unsupervised Network Anomaly Detector

Title: Adaptive NAD: Online and Self-adaptive Unsupervised Network Anomaly Detector Adaptive NAD: Online- und Self-Adaptive Unüberwachte Netzwerk-Anomaly Detector 适应性NAD: 在线和自适应性不受监督的网络异常探测器 2410.22967v4

Authors (3): Yachao Yuan, Yu Huang, Jin Wang

The widespread usage of the Internet of Things (IoT) has raised the risks of cyber threats, thus developing Anomaly Detection Systems (ADSs) that can adapt to evolving or new attacks is critical. Previous studies primarily focused on offline unsupervised learning methods to safeguard ADSs, which is not applicable in practical real-world applications. Besides, most of them strongly rely on assumptions of known legitimates and fail to satisfy the interpretable requirements in security applications, creating barriers to the adoption in practice. In this paper, we design Adaptive NAD, a general framework to improve and interpret online unsupervised anomaly detection in security domains. An interpretable two-layer anomaly detection strategy is proposed to generate reliable high-confidence pseudo-labels. Then, an online learning scheme is introduced to update Adaptive NAD by a novel threshold calculation technique to adapt to new threats. Experimental results demonstrate that Adaptive NAD achieves more than 5.4%, 23.0%, and 3.2% improvements in SPAUC compared with state-of-the-art solutions on the CIC-Darknet2020, CIC-DoHBrw-2020, and Edge-IIoTset datasets, respectively. The code is released at https://github.com/MyLearnCodeSpace/Adaptive-NAD.

nan


Article 345

Title@2025-07-02 (3): Symbolic identification of tensor equations in multidimensional physical fields

Title: Symbolic identification of tensor equations in multidimensional physical fields Symbolische Identifizierung von Tensorgleichungen in mehrdimensionalen physikalischen Feldern 在多层面物理领域对高方方形的符号识别 2507.01466v1

Authors (4): Tianyi Chen, Hao Yang, Wenjun Ma, Jun Zhang

Recently, data-driven methods have shown great promise for discovering governing equations from simulation or experimental data. However, most existing approaches are limited to scalar equations, with few capable of identifying tensor relationships. In this work, we propose a general data-driven framework for identifying tensor equations, referred to as Symbolic Identification of Tensor Equations (SITE). The core idea of SITE–representing tensor equations using a host-plasmid structure–is inspired by the multidimensional gene expression programming (M-GEP) approach. To improve the robustness of the evolutionary process, SITE adopts a genetic information retention strategy. Moreover, SITE introduces two key innovations beyond conventional evolutionary algorithms. First, it incorporates a dimensional homogeneity check to restrict the search space and eliminate physically invalid expressions. Second, it replaces traditional linear scaling with a tensor linear regression technique, greatly enhancing the efficiency of numerical coefficient optimization. We validate SITE using two benchmark scenarios, where it accurately recovers target equations from synthetic data, showing robustness to noise and small sample sizes. Furthermore, SITE is applied to identify constitutive relations directly from molecular simulation data, which are generated without reliance on macroscopic constitutive models. It adapts to both compressible and incompressible flow conditions and successfully identifies the corresponding macroscopic forms, highlighting its potential for data-driven discovery of tensor equation.

nan


Article 346

Title@2025-07-02 (3): Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Title: Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data Effizientes Online-Verstärktes Lernen Feinsteuerung braucht keine Offline-Daten 不保留离线数据 2412.07762v3

Authors (5): Zhiyuan Zhou, Andy Peng, Qiyang Li, Sergey Levine, Aviral Kumar

The modern paradigm in machine learning involves pre-training on diverse data, followed by task-specific fine-tuning. In reinforcement learning (RL), this translates to learning via offline RL on a diverse historical dataset, followed by rapid online RL fine-tuning using interaction data. Most RL fine-tuning methods require continued training on offline data for stability and performance. However, this is undesirable because training on diverse offline data is slow and expensive for large datasets, and in principle, also limit the performance improvement possible because of constraints or pessimism on offline data. In this paper, we show that retaining offline data is unnecessary as long as we use a properly-designed online RL approach for fine-tuning offline RL initializations. To build this approach, we start by analyzing the role of retaining offline data in online fine-tuning. We find that continued training on offline data is mostly useful for preventing a sudden divergence in the value function at the onset of fine-tuning, caused by a distribution mismatch between the offline data and online rollouts. This divergence typically results in unlearning and forgetting the benefits of offline pre-training. Our approach, Warm-start RL (WSRL), mitigates the catastrophic forgetting of pre-trained initializations using a very simple idea. WSRL employs a warmup phase that seeds the online RL run with a very small number of rollouts from the pre-trained policy to do fast online RL. The data collected during warmup helps ``recalibrate’’ the offline Q-function to the online distribution, allowing us to completely discard offline data without destabilizing the online RL fine-tuning. We show that WSRL is able to fine-tune without retaining any offline data, and is able to learn faster and attains higher performance than existing algorithms irrespective of whether they retain offline data or not.

nan


Article 347

Title@2025-07-02 (3): Tensor Program Optimization for the RISC-V Vector Extension Using Probabilistic Programs

Title: Tensor Program Optimization for the RISC-V Vector Extension Using Probabilistic Programs Tensor-Programmoptimierung für die RISC-Vektorerweiterung mittels probabilistischer Programme 利用概率方案优化RISC-V矢量扩展 2507.01457v1

Authors (3): Federico Nicolas Peccia, Frederik Haxel, Oliver Bringmann

RISC-V provides a flexible and scalable platform for applications ranging from embedded devices to high-performance computing clusters. Particularly, its RISC-V Vector Extension (RVV) becomes of interest for the acceleration of AI workloads. But writing software that efficiently utilizes the vector units of RISC-V CPUs without expert knowledge requires the programmer to rely on the autovectorization features of compilers or hand-crafted libraries like muRISCV-NN. Smarter approaches, like autotuning frameworks, have been missing the integration with the RISC-V RVV extension, thus heavily limiting the efficient deployment of complex AI workloads. In this paper, we present a workflow based on the TVM compiler to efficiently map AI workloads onto RISC-V vector units. Instead of relying on hand-crafted libraries, we integrated the RVV extension into TVM’s MetaSchedule framework, a probabilistic program framework for tensor operation tuning. We implemented different RISC-V SoCs on an FPGA and tuned a wide range of AI workloads on them. We found that our proposal shows a mean improvement of 46% in execution latency when compared against the autovectorization feature of GCC, and 29% against muRISCV-NN. Moreover, the binary resulting from our proposal has a smaller code memory footprint, making it more suitable for embedded devices. Finally, we also evaluated our solution on a commercially available RISC-V SoC implementing the RVV 1.0 Vector Extension and found our solution is able to find mappings that are 35% faster on average than the ones proposed by LLVM. We open-sourced our proposal for the community to expand it to target other RISC-V extensions.

nan


Article 348

Title@2025-07-02 (3): FE-LWS: Refined Image-Text Representations via Decoder Stacking and Fused Encodings for Remote Sensing Image Captioning

Title: FE-LWS: Refined Image-Text Representations via Decoder Stacking and Fused Encodings for Remote Sensing Image Captioning FE-LWS: Verfeinerte Bild-Text-Darstellungen über Decoder Stacking und Fused Encodings für Remote Sensing Image Captioning FE-LWS:通过用于遥感图像说明的解码器堆放和引信编码系统改进图像文本演示文示 2502.09282v2

Authors (2): Swadhin Das, Raksha Sharma

Remote sensing image captioning aims to generate descriptive text from remote sensing images, typically employing an encoder-decoder framework. In this setup, a convolutional neural network (CNN) extracts feature representations from the input image, which then guide the decoder in a sequence-to-sequence caption generation process. Although much research has focused on refining the decoder, the quality of image representations from the encoder remains crucial for accurate captioning. This paper introduces a novel approach that integrates features from two distinct CNN based encoders, capturing complementary information to enhance caption generation. Additionally, we propose a weighted averaging technique to combine the outputs of all GRUs in the stacked decoder. Furthermore, a comparison-based beam search strategy is incorporated to refine caption selection. The results demonstrate that our fusion-based approach, along with the enhanced stacked decoder, significantly outperforms both the transformer-based state-of-the-art model and other LSTM-based baselines.

nan


Article 349

Title@2025-07-02 (3): Rank-1 Matrix Completion with Gradient Descent and Small Random Initialization

Title: Rank-1 Matrix Completion with Gradient Descent and Small Random Initialization Rang-1-Matrix-Vervollständigung mit gradienten Abstieg und kleiner zufälliger Initialisierung RAN-1 矩阵完成, 以渐变源和小随机初始化完成 2212.09396v3

Authors (2): Daesung Kim, Hye Won Chung

The nonconvex formulation of the matrix completion problem has received significant attention in recent years due to its affordable complexity compared to the convex formulation. Gradient Descent (GD) is a simple yet efficient baseline algorithm for solving nonconvex optimization problems. The success of GD has been witnessed in many different problems in both theory and practice when it is combined with random initialization. However, previous works on matrix completion require either careful initialization or regularizers to prove the convergence of GD. In this paper, we study the rank-1 symmetric matrix completion and prove that GD converges to the ground truth when small random initialization is used. We show that in a logarithmic number of iterations, the trajectory enters the region where local convergence occurs. We provide an upper bound on the initialization size that is sufficient to guarantee the convergence, and show that a larger initialization can be used as more samples are available. We observe that the implicit regularization effect of GD plays a critical role in the analysis, and for the entire trajectory, it prevents each entry from becoming much larger than the others.

nan


Article 350

Title@2025-07-02 (3): Machine Learning Based Stress Testing Framework for Indian Financial Market Portfolios

Title: Machine Learning Based Stress Testing Framework for Indian Financial Market Portfolios Machine Learning Based Stress Testing Framework für indische Finanzmarktportfolios 印度金融市场投资组合机械学习压力测试框架 2507.02011v1

Authors (3): Vidya Sagar G, Shifat Ali, Siddhartha P. Chakrabarty

This paper presents a machine learning driven framework for sectoral stress testing in the Indian financial market, focusing on financial services, information technology, energy, consumer goods, and pharmaceuticals. Initially, we address the limitations observed in conventional stress testing through dimensionality reduction and latent factor modeling via Principal Component Analysis and Autoencoders. Building on this, we extend the methodology using Variational Autoencoders, which introduces a probabilistic structure to the latent space. This enables Monte Carlo-based scenario generation, allowing for more nuanced, distribution-aware simulation of stressed market conditions. The proposed framework captures complex non-linear dependencies and supports risk estimation through Value-at-Risk and Expected Shortfall. Together, these pipelines demonstrate the potential of Machine Learning approaches to improve the flexibility, robustness, and realism of financial stress testing.

nan


Article 351

Title@2025-07-02 (3): EdgeLoRA: An Efficient Multi-Tenant LLM Serving System on Edge Devices

Title: EdgeLoRA: An Efficient Multi-Tenant LLM Serving System on Edge Devices EdgeLoRA: Ein effizientes Multi-Tenant LLM Serving System auf Edge-Geräten EdgeloRA:一个高效的长长长长长长长长长长长长长长长长长长边缘装置服务系统 2507.01438v1

Authors (7): Zheyu Shen, Yexiao He, Ziyao Wang, Yuning Zhang, Guoheng Sun, Wanghao Ye, Ang Li

Large Language Models (LLMs) have gained significant attention due to their versatility across a wide array of applications. Fine-tuning LLMs with parameter-efficient adapters, such as Low-Rank Adaptation (LoRA), enables these models to efficiently adapt to downstream tasks without extensive retraining. Deploying fine-tuned LLMs on multi-tenant edge devices offers substantial benefits, such as reduced latency, enhanced privacy, and personalized responses. However, serving LLMs efficiently on resource-constrained edge devices presents critical challenges, including the complexity of adapter selection for different tasks and memory overhead from frequent adapter swapping. Moreover, given the multiple requests in multi-tenant settings, processing requests sequentially results in underutilization of computational resources and increased latency. This paper introduces EdgeLoRA, an efficient system for serving LLMs on edge devices in multi-tenant environments. EdgeLoRA incorporates three key innovations: (1) an adaptive adapter selection mechanism to streamline the adapter configuration process; (2) heterogeneous memory management, leveraging intelligent adapter caching and pooling to mitigate memory operation overhead; and (3) batch LoRA inference, enabling efficient batch processing to significantly reduce computational latency. Comprehensive evaluations using the Llama3.1-8B model demonstrate that EdgeLoRA significantly outperforms the status quo (i.e., llama.cpp) in terms of both latency and throughput. The results demonstrate that EdgeLoRA can achieve up to a 4 times boost in throughput. Even more impressively, it can serve several orders of magnitude more adapters simultaneously. These results highlight EdgeLoRA’s potential to transform edge deployment of LLMs in multi-tenant scenarios, offering a scalable and efficient solution for resource-constrained environments.

nan


Article 352

Title@2025-07-02 (3): Efficient Split Federated Learning for Large Language Models over Communication Networks

Title: Efficient Split Federated Learning for Large Language Models over Communication Networks Effizientes Split-Federated-Learning für große Sprachmodelle über Kommunikationsnetzwerke 针对通信网络的大型语言模式的高效分裂联邦学习 2504.14667v2

Authors (6): Kai Zhao, Zhaohui Yang, Ye Hu, Mingzhe Chen, Chen Zhu, Zhaoyang Zhang

Fine-tuning pre-trained large language models (LLMs) in a distributed manner poses significant challenges on resource-constrained edge networks. To address this challenge, we propose SflLLM, a novel framework that integrates split federated learning with parameter-efficient fine-tuning techniques. By leveraging model splitting and low-rank adaptation (LoRA), SflLLM reduces the computational burden on edge devices. Furthermore, the introduction of a federated server facilitates parallel training and enhances data privacy. To accommodate heterogeneous communication conditions and diverse computational capabilities of edge devices, as well as the impact of LoRA rank selection on model convergence and training cost, we formulate a joint optimization problem of both communication and computation resource. The formulated problem jointly optimizes subchannel allocation, power control, model splitting point selection, and LoRA rank configuration, aimed at minimizing total training delay. An iterative optimization algorithm is proposed to solve this problem efficiently. Specifically, a greedy heuristic is employed for subchannel allocation, the power control subproblem is reformulated as a convex optimization problem using auxiliary variables, and an exhaustive search is adopted for optimal split position and rank selection. Simulation results demonstrate that the proposed SflLLM framework achieves comparable model accuracy while significantly reducing client-side computational requirements. Furthermore, the proposed resource allocation scheme and adaptive LoRA rank selection strategy notably reduce the training latency compared to conventional approaches.

nan


Article 353

Title@2025-07-02 (3): Pensieve Grader: An AI-Powered, Ready-to-Use Platform for Effortless Handwritten STEM Grading

Title: Pensieve Grader: An AI-Powered, Ready-to-Use Platform for Effortless Handwritten STEM Grading Pensieve Grader: Eine KI-Powered, Ready-to-Use Plattform für mühelose handschriftliche STEM-Grading Pensieve grafer: 一个AI授权的无力手写STEM分级的现用平台 2507.01431v1

Authors (4): Yoonseok Yang, Minjune Kim, Marlon Rondinelli, Keren Shao

Grading handwritten, open-ended responses remains a major bottleneck in large university STEM courses. We introduce Pensieve (https://www.pensieve.co), an AI-assisted grading platform that leverages large language models (LLMs) to transcribe and evaluate student work, providing instructors with rubric-aligned scores, transcriptions, and confidence ratings. Unlike prior tools that focus narrowly on specific tasks like transcription or rubric generation, Pensieve supports the entire grading pipeline-from scanned student submissions to final feedback-within a human-in-the-loop interface. Pensieve has been deployed in real-world courses at over 20 institutions and has graded more than 300,000 student responses. We present system details and empirical results across four core STEM disciplines: Computer Science, Mathematics, Physics, and Chemistry. Our findings show that Pensieve reduces grading time by an average of 65%, while maintaining a 95.4% agreement rate with instructor-assigned grades for high-confidence predictions.

nan


Article 354

Title@2025-07-02 (3): Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention

Title: Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention Gradient Short-Circuit: Effiziente Out-of-Distribution-Erkennung über Feature-Intervention 渐进短路:通过地物干预有效外分配探测 2507.01417v1

Authors (3): Jiawei Gu, Ziyue Qiao, Zechao Li

Out-of-Distribution (OOD) detection is critical for safely deploying deep models in open-world environments, where inputs may lie outside the training distribution. During inference on a model trained exclusively with In-Distribution (ID) data, we observe a salient gradient phenomenon: around an ID sample, the local gradient directions for “enhancing” that sample’s predicted class remain relatively consistent, whereas OOD samples–unseen in training–exhibit disorganized or conflicting gradient directions in the same neighborhood. Motivated by this observation, we propose an inference-stage technique to short-circuit those feature coordinates that spurious gradients exploit to inflate OOD confidence, while leaving ID classification largely intact. To circumvent the expense of recomputing the logits after this gradient short-circuit, we further introduce a local first-order approximation that accurately captures the post-modification outputs without a second forward pass. Experiments on standard OOD benchmarks show our approach yields substantial improvements. Moreover, the method is lightweight and requires minimal changes to the standard inference pipeline, offering a practical path toward robust OOD detection in real-world applications.

nan


Article 355

Title@2025-07-02 (3): A deep solver for backward stochastic Volterra integral equations

Title: A deep solver for backward stochastic Volterra integral equations Ein tiefer Löser für rückwärts stochastische Volterra-Integralgleichungen 一个深深解器,用于反向蒸蒸蒸蒸蒸气伏特拉集成方程式 2505.18297v2

Authors (3): Kristoffer Andersson, Alessandro Gnoatto, Camilo Andrés García Trillos

We present the first deep-learning solver for backward stochastic Volterra integral equations (BSVIEs) and their fully-coupled forward-backward variants. The method trains a neural network to approximate the two solution fields in a single stage, avoiding the use of nested time-stepping cycles that limit classical algorithms. For the decoupled case we prove a non-asymptotic error bound composed of an a posteriori residual plus the familiar square root dependence on the time step. Numerical experiments confirm this rate and reveal two key properties: \emph{scalability}, in the sense that accuracy remains stable from low dimension up to 500 spatial variables while GPU batching keeps wall-clock time nearly constant; and \emph{generality}, since the same method handles coupled systems whose forward dynamics depend on the backward solution. These results open practical access to a family of high-dimensional, path-dependent problems in stochastic control and quantitative finance.

nan


Article 356

Title@2025-07-02 (3): Decomposing Prediction Mechanisms for In-Context Recall

Title: Decomposing Prediction Mechanisms for In-Context Recall Zersetzen von Vorhersagemechanismen für In-Context-Recall 内回顾的可预测性机制 2507.01414v1

Authors (6): Sultan Daniels, Dylan Davis, Dhruv Gautam, Wentinn Liao, Gireeja Ranade, Anant Sahai

We introduce a new family of toy problems that combine features of linear-regression-style continuous in-context learning (ICL) with discrete associative recall. We pretrain transformer models on sample traces from this toy, specifically symbolically-labeled interleaved state observations from randomly drawn linear deterministic dynamical systems. We study if the transformer models can recall the state of a sequence previously seen in its context when prompted to do so with the corresponding in-context label. Taking a closer look at this task, it becomes clear that the model must perform two functions: (1) identify which system’s state should be recalled and apply that system to its last seen state, and (2) continuing to apply the correct system to predict the subsequent states. Training dynamics reveal that the first capability emerges well into a model’s training. Surprisingly, the second capability, of continuing the prediction of a resumed sequence, develops much earlier. Via out-of-distribution experiments, and a mechanistic analysis on model weights via edge pruning, we find that next-token prediction for this toy problem involves at least two separate mechanisms. One mechanism uses the discrete symbolic labels to do the associative recall required to predict the start of a resumption of a previously seen sequence. The second mechanism, which is largely agnostic to the discrete symbolic labels, performs a “Bayesian-style” prediction based on the previous token and the context. These two mechanisms have different learning dynamics. To confirm that this multi-mechanism (manifesting as separate phase transitions) phenomenon is not just an artifact of our toy setting, we used OLMo training checkpoints on an ICL translation task to see a similar phenomenon: a decisive gap in the emergence of first-task-token performance vs second-task-token performance.

nan


Article 357

Title@2025-07-02 (3): Evaluating LLM Agent Collusion in Double Auctions

Title: Evaluating LLM Agent Collusion in Double Auctions Bewertung von LLM Agent Collusion in Doppelauktionen 评价双重拍卖中的LLM Agent串通 2507.01413v1

Authors (6): Kushal Agrawal, Verona Teo, Juan J. Vazquez, Sudarsh Kunnavakkam, Vishak Srikanth, Andy Liu

Large language models (LLMs) have demonstrated impressive capabilities as autonomous agents with rapidly expanding applications in various domains. As these agents increasingly engage in socioeconomic interactions, identifying their potential for undesirable behavior becomes essential. In this work, we examine scenarios where they can choose to collude, defined as secretive cooperation that harms another party. To systematically study this, we investigate the behavior of LLM agents acting as sellers in simulated continuous double auction markets. Through a series of controlled experiments, we analyze how parameters such as the ability to communicate, choice of model, and presence of environmental pressures affect the stability and emergence of seller collusion. We find that direct seller communication increases collusive tendencies, the propensity to collude varies across models, and environmental pressures, such as oversight and urgency from authority figures, influence collusive behavior. Our findings highlight important economic and ethical considerations for the deployment of LLM-based market agents.

nan


Article 358

Title@2025-07-02 (3): DeFusion: An Effective Decoupling Fusion Network for Multi-Modal Pregnancy Prediction

Title: DeFusion: An Effective Decoupling Fusion Network for Multi-Modal Pregnancy Prediction DeFusion: Ein effektives Entkopplungs-Fusionsnetzwerk für multi-Modalen Schwangerschaftsvorhersage DFusion:多模式妊娠预测有效脱钩融合网络 2501.04353v2

Authors (6): Xueqiang Ouyang, Jia Wei, Wenjie Huo, Xiaocong Wang, Rui Li, Jianlong Zhou

Temporal embryo images and parental fertility table indicators are both valuable for pregnancy prediction in \textbf{in vitro fertilization embryo transfer} (IVF-ET). However, current machine learning models cannot make full use of the complementary information between the two modalities to improve pregnancy prediction performance. In this paper, we propose a Decoupling Fusion Network called DeFusion to effectively integrate the multi-modal information for IVF-ET pregnancy prediction. Specifically, we propose a decoupling fusion module that decouples the information from the different modalities into related and unrelated information, thereby achieving a more delicate fusion. And we fuse temporal embryo images with a spatial-temporal position encoding, and extract fertility table indicator information with a table transformer. To evaluate the effectiveness of our model, we use a new dataset including 4046 cases collected from Southern Medical University. The experiments show that our model outperforms state-of-the-art methods. Meanwhile, the performance on the eye disease prediction dataset reflects the model’s good generalization. Our code is available at https://github.com/Ou-Young-1999/DFNet.

nan


Article 359

Title@2025-07-02 (3): Coherent Online Road Topology Estimation and Reasoning with Standard-Definition Maps

Title: Coherent Online Road Topology Estimation and Reasoning with Standard-Definition Maps Kohärente Online-Road-Topologie Schätzung und Begründung mit Standard-Definitionskarten 与标准定义地图一致的在线道路地形图估计和理由 2507.01397v1

Authors (5): Khanh Son Pham, Christian Witte, Jens Behley, Johannes Betz, Cyrill Stachniss

Most autonomous cars rely on the availability of high-definition (HD) maps. Current research aims to address this constraint by directly predicting HD map elements from onboard sensors and reasoning about the relationships between the predicted map and traffic elements. Despite recent advancements, the coherent online construction of HD maps remains a challenging endeavor, as it necessitates modeling the high complexity of road topologies in a unified and consistent manner. To address this challenge, we propose a coherent approach to predict lane segments and their corresponding topology, as well as road boundaries, all by leveraging prior map information represented by commonly available standard-definition (SD) maps. We propose a network architecture, which leverages hybrid lane segment encodings comprising prior information and denoising techniques to enhance training stability and performance. Furthermore, we facilitate past frames for temporal consistency. Our experimental evaluation demonstrates that our approach outperforms previous methods by a large margin, highlighting the benefits of our modeling scheme.

nan


Article 360

Title@2025-07-02 (3): Surrogate Modeling via Factorization Machine and Ising Model with Enhanced Higher-Order Interaction Learning

Title: Surrogate Modeling via Factorization Machine and Ising Model with Enhanced Higher-Order Interaction Learning Surrogate Modellierung über Factorization Machine und Ising Model mit erweitertem höhere-Order-Interaktions-Lernen 通过强化高阶互动学习的保理机和Ising模型的代理模拟 2507.01389v1

Authors (6): Anbang Wang, Dunbo Cai, Yu Zhang, Yangqing Huang, Xiangyang Feng, Zhihong Zhang

Recently, a surrogate model was proposed that employs a factorization machine to approximate the underlying input-output mapping of the original system, with quantum annealing used to optimize the resulting surrogate function. Inspired by this approach, we propose an enhanced surrogate model that incorporates additional slack variables into both the factorization machine and its associated Ising representation thereby unifying what was by design a two-step process into a single, integrated step. During the training phase, the slack variables are iteratively updated, enabling the model to account for higher-order feature interactions. We apply the proposed method to the task of predicting drug combination effects. Experimental results indicate that the introduction of slack variables leads to a notable improvement of performance. Our algorithm offers a promising approach for building efficient surrogate models that exploit potential quantum advantages.

nan


Article 361

Title@2025-07-02 (3): Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent

Title: Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent Momentum reduziert nicht stochastische Geräusche bei stochastischem Gradientenabstieg 沙粒渐变后代中不减少蒸汽噪音的动力 2402.02325v5

Authors (2): Naoki Sato, Hideaki Iiduka

For nonconvex objective functions, including those found in training deep neural networks, stochastic gradient descent (SGD) with momentum is said to converge faster and have better generalizability than SGD without momentum. In particular, adding momentum is thought to reduce stochastic noise. To verify this, we estimated the magnitude of gradient noise by using convergence analysis and an optimal batch size estimation formula and found that momentum does not reduce gradient noise. We also analyzed the effect of search direction noise, which is stochastic noise defined as the error between the search direction of the optimizer and the steepest descent direction, and found that it inherently smooths the objective function and that momentum does not reduce search direction noise either. Finally, an analysis of the degree of smoothing introduced by search direction noise revealed that adding momentum offers limited advantage to SGD.

nan


Article 362

Title@2025-07-02 (3): LUSD: Localized Update Score Distillation for Text-Guided Image Editing

Title: LUSD: Localized Update Score Distillation for Text-Guided Image Editing LUSD: Lokalisierte Update-Score-Destillation für textgeführte Bildbearbeitung LUSD: 用于文本引导图像编辑的地方化更新评分蒸馏 2503.11054v2

Authors (6): Worameth Chinchuthakun, Tossaporn Saengja, Nontawat Tritrong, Pitchaporn Rewatbowornwong, Pramook Khungurn, Supasorn Suwajanakorn

While diffusion models show promising results in image editing given a target prompt, achieving both prompt fidelity and background preservation remains difficult. Recent works have introduced score distillation techniques that leverage the rich generative prior of text-to-image diffusion models to solve this task without additional fine-tuning. However, these methods often struggle with tasks such as object insertion. Our investigation of these failures reveals significant variations in gradient magnitude and spatial distribution, making hyperparameter tuning highly input-specific or unsuccessful. To address this, we propose two simple yet effective modifications: attention-based spatial regularization and gradient filtering-normalization, both aimed at reducing these variations during gradient updates. Experimental results show our method outperforms state-of-the-art score distillation techniques in prompt fidelity, improving successful edits while preserving the background. Users also preferred our method over state-of-the-art techniques across three metrics, and by 58-64% overall.

nan


Article 363

Title@2025-07-02 (3): Direct Quantized Training of Language Models with Stochastic Rounding

Title: Direct Quantized Training of Language Models with Stochastic Rounding Direkte Quantisierte Schulung von Sprachmodellen mit stochastischer Rundung 直接量化的语言模式直接量化培训,并进行盘点四舍四入 2412.04787v2

Authors (6): Kaiyan Zhao, Tsuguchika Tabaru, Kenichi Kobayashi, Takumi Honda, Masafumi Yamazaki, Yoshimasa Tsuruoka

Although recent quantized Large Language Models (LLMs), such as BitNet, have paved the way for significant reduction in memory usage during deployment with binary or ternary weights, training these models still demands substantial memory footprints. This is partly because high-precision (i.e., unquantized) weights required for straight-through estimation must be maintained throughout the whole training process. To address this, we explore directly updating the quantized low-precision weights without relying on straight-through estimation during backpropagation, aiming to save memory usage during training. Specifically, we employ a stochastic rounding technique to minimize the information loss caused by the use of low-bit weights throughout training. Experimental results on our LLaMA-structured models of various sizes indicate that (1) training with only low-precision weights is feasible even when they are constrained to ternary values; (2) extending the bit width to 8 bits achieves performance on par with BitNet b1.58; (3) our models remain robust to precision scaling and memory reduction, showing minimal performance degradation when moving from FP32 to lower-memory environments (BF16/FP8); and (4) our models also support inference using ternary weights, showcasing their flexibility in deployment.

nan


Article 364

Title@2025-07-02 (3): Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess

Title: Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess Können große Sprachmodelle eine strategische Begründung entwickeln? 大语言模式能否制定战略理由? 培训后从学习象棋中透视 2507.00726v2

Authors (5): Dongyoon Hwang, Hojoon Lee, Jaegul Choo, Dongmin Park, Jongho Park

While reinforcement learning (RL) for large language models (LLMs) has shown promise in mathematical reasoning, strategic reasoning for LLMs using RL remains largely unexplored. We investigate whether LLMs can develop strategic reasoning capabilities through RL in chess. To this end, we leverage a chess-pretrained action-value network to provide dense reward on the LLM’s output move quality, which can be seen as a form of knowledge distillation. Our experiments show that our distillation-based dense rewards often outperform sparse binary rewards. However, surprisingly, all models plateau far below expert levels. We provide SFT and RL ablations on chess reasoning training and find evidence that this limitation stems from a deficit in the pretrained models’ internal understanding of chess–a deficit which RL alone may not be able to fully overcome.

nan


Article 365

Title@2025-07-02 (3): Feature Reweighting for EEG-based Motor Imagery Classification

Title: Feature Reweighting for EEG-based Motor Imagery Classification Feature-Umgewichtung für EEG-basierte Motor Imagery-Klassifikation 以EEG为基础的机动图象分类特点的重新加权 2308.02515v2

Authors (4): Taveena Lotey, Prateek Keserwani, Debi Prosad Dogra, Partha Pratim Roy

Classification of motor imagery (MI) using non-invasive electroencephalographic (EEG) signals is a critical objective as it is used to predict the intention of limb movements of a subject. In recent research, convolutional neural network (CNN) based methods have been widely utilized for MI-EEG classification. The challenges of training neural networks for MI-EEG signals classification include low signal-to-noise ratio, non-stationarity, non-linearity, and high complexity of EEG signals. The features computed by CNN-based networks on the highly noisy MI-EEG signals contain irrelevant information. Subsequently, the feature maps of the CNN-based network computed from the noisy and irrelevant features contain irrelevant information. Thus, many non-contributing features often mislead the neural network training and degrade the classification performance. Hence, a novel feature reweighting approach is proposed to address this issue. The proposed method gives a noise reduction mechanism named feature reweighting module that suppresses irrelevant temporal and channel feature maps. The feature reweighting module of the proposed method generates scores that reweight the feature maps to reduce the impact of irrelevant information. Experimental results show that the proposed method significantly improved the classification of MI-EEG signals of Physionet EEG-MMIDB and BCI Competition IV 2a datasets by a margin of 9.34% and 3.82%, respectively, compared to the state-of-the-art methods.

nan


Article 366

Title@2025-07-02 (3): Active Measurement: Efficient Estimation at Scale

Title: Active Measurement: Efficient Estimation at Scale Aktive Messung: Effiziente Abschätzung im Maßstab 活动计量: 规模上的有效估算 2507.01372v1

Authors (5): Max Hamilton, Jinlin Lai, Wenlong Zhao, Subhransu Maji, Daniel Sheldon

AI has the potential to transform scientific discovery by analyzing vast datasets with little human effort. However, current workflows often do not provide the accuracy or statistical guarantees that are needed. We introduce active measurement, a human-in-the-loop AI framework for scientific measurement. An AI model is used to predict measurements for individual units, which are then sampled for human labeling using importance sampling. With each new set of human labels, the AI model is improved and an unbiased Monte Carlo estimate of the total measurement is refined. Active measurement can provide precise estimates even with an imperfect AI model, and requires little human effort when the AI model is very accurate. We derive novel estimators, weighting schemes, and confidence intervals, and show that active measurement reduces estimation error compared to alternatives in several measurement tasks.

nan


Article 367

Title@2025-07-02 (3): Activation Reward Models for Few-Shot Model Alignment

Title: Activation Reward Models for Few-Shot Model Alignment Aktivierungs-Reward-Modelle für wenige scharfe Modellausrichtung 微小光模型对齐的激励奖得奖模型 2507.01368v1

Authors (11): Tianning Chai, Chancharik Mitra, Brandon Huang, Gautam Rajendrakumar Gare, Zhiqiu Lin, Assaf Arbelle, Leonid Karlinsky, Rogerio Feris, Trevor Darrell, Deva Ramanan, Roei Herzig

Aligning Large Language Models (LLMs) and Large Multimodal Models (LMMs) to human preferences is a central challenge in improving the quality of the models’ generative outputs for real-world applications. A common approach is to use reward modeling to encode preferences, enabling alignment via post-training using reinforcement learning. However, traditional reward modeling is not easily adaptable to new preferences because it requires a separate reward model, commonly trained on large preference datasets. To address this, we introduce Activation Reward Models (Activation RMs) – a novel few-shot reward modeling method that leverages activation steering to construct well-aligned reward signals using minimal supervision and no additional model finetuning. Activation RMs outperform existing few-shot reward modeling approaches such as LLM-as-a-judge with in-context learning, voting-based scoring, and token probability scoring on standard reward modeling benchmarks. Furthermore, we demonstrate the effectiveness of Activation RMs in mitigating reward hacking behaviors, highlighting their utility for safety-critical applications. Toward this end, we propose PreferenceHack, a novel few-shot setting benchmark, the first to test reward models on reward hacking in a paired preference format. Finally, we show that Activation RM achieves state-of-the-art performance on this benchmark, surpassing even GPT-4o.

nan


Article 368

Title@2025-07-02 (3): Continual Learning with Strategic Selection and Forgetting for Network Intrusion Detection

Title: Continual Learning with Strategic Selection and Forgetting for Network Intrusion Detection Kontinuierliches Lernen mit strategischer Auswahl und Vergessen für Netzwerkintrusionserkennung 利用战略选择和遗忘的不断学习,以探测网络入侵 2412.16264v4

Authors (7): Xinchen Zhang, Running Zhao, Zhihan Jiang, Handi Chen, Yulong Ding, Edith C. H. Ngai, Shuang-Hua Yang

Intrusion Detection Systems (IDS) are crucial for safeguarding digital infrastructure. In dynamic network environments, both threat landscapes and normal operational behaviors are constantly changing, resulting in concept drift. While continuous learning mitigates the adverse effects of concept drift, insufficient attention to drift patterns and excessive preservation of outdated knowledge can still hinder the IDS’s adaptability. In this paper, we propose SSF (Strategic Selection and Forgetting), a novel continual learning method for IDS, providing continuous model updates with a constantly refreshed memory buffer. Our approach features a strategic sample selection algorithm to select representative new samples and a strategic forgetting mechanism to drop outdated samples. The proposed strategic sample selection algorithm prioritizes new samples that cause the `drifted’ pattern, enabling the model to better understand the evolving landscape. Additionally, we introduce strategic forgetting upon detecting significant drift by discarding outdated samples to free up memory, allowing the incorporation of more recent data. SSF captures evolving patterns effectively and ensures the model is aligned with the change of data patterns, significantly enhancing the IDS’s adaptability to concept drift. The state-of-the-art performance of SSF on NSL-KDD and UNSW-NB15 datasets demonstrates its superior adaptability to concept drift for network intrusion detection. The code is released at https://github.com/xinchen930/SSF-Strategic-Selection-and-Forgetting.

nan


Article 369

Title@2025-07-02 (3): Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information

Title: Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information Eng gekoppelte LiDAR-IMU-Leg-Odometrie mit Online-Lektionskinematik, die Fußtaktile Informationen enthält LIDAR-IMU-Leg Odo量度与包含脚触觉信息的在线学习腿脚力学 2506.09548v2

Authors (7): Taku Okawara, Kenji Koide, Aoki Takanose, Shuji Oishi, Masashi Yokozuka, Kentaro Uno, Kazuya Yoshida

In this letter, we present tightly coupled LiDAR-IMU-leg odometry, which is robust to challenging conditions such as featureless environments and deformable terrains. We developed an online learning-based leg kinematics model named the neural leg kinematics model, which incorporates tactile information (foot reaction force) to implicitly express the nonlinear dynamics between robot feet and the ground. Online training of this model enhances its adaptability to weight load changes of a robot (e.g., assuming delivery or transportation tasks) and terrain conditions. According to the \textit{neural adaptive leg odometry factor} and online uncertainty estimation of the leg kinematics model-based motion predictions, we jointly solve online training of this kinematics model and odometry estimation on a unified factor graph to retain the consistency of both. The proposed method was verified through real experiments using a quadruped robot in two challenging situations: 1) a sandy beach, representing an extremely featureless area with a deformable terrain, and 2) a campus, including multiple featureless areas and terrain types of asphalt, gravel (deformable terrain), and grass. Experimental results showed that our odometry estimation incorporating the \textit{neural leg kinematics model} outperforms state-of-the-art works. Our project page is available for further details: https://takuokawara.github.io/RAL2025_project_page/

nan


Article 370

Title@2025-07-02 (3): Efficient Kilometer-Scale Precipitation Downscaling with Conditional Wavelet Diffusion

Title: Efficient Kilometer-Scale Precipitation Downscaling with Conditional Wavelet Diffusion Effiziente Kilometer-Scale Niederschlagsabsenkung mit bedingter Wavelet-Diffusion 与有条件的波子扩散相降幅 2507.01354v1

Authors (5): Chugang Yi, Minghan Yu, Weikang Qian, Yixin Wen, Haizhao Yang

Effective hydrological modeling and extreme weather analysis demand precipitation data at a kilometer-scale resolution, which is significantly finer than the 10 km scale offered by standard global products like IMERG. To address this, we propose the Wavelet Diffusion Model (WDM), a generative framework that achieves 10x spatial super-resolution (downscaling to 1 km) and delivers a 9x inference speedup over pixel-based diffusion models. WDM is a conditional diffusion model that learns the learns the complex structure of precipitation from MRMS radar data directly in the wavelet domain. By focusing on high-frequency wavelet coefficients, it generates exceptionally realistic and detailed 1-km precipitation fields. This wavelet-based approach produces visually superior results with fewer artifacts than pixel-space models, and delivers a significant gains in sampling efficiency. Our results demonstrate that WDM provides a robust solution to the dual challenges of accuracy and speed in geoscience super-resolution, paving the way for more reliable hydrological forecasts.

nan


Article 371

Title@2025-07-02 (3): Dataset Distillation via the Wasserstein Metric

Title: Dataset Distillation via the Wasserstein Metric Datensatz Destillation über die Wasserstein Metric 通过瓦塞斯坦测量仪进行数据集蒸馏 2311.18531v3

Authors (8): Haoyang Liu, Yijiang Li, Tiancheng Xing, Peiran Wang, Vibhu Dalal, Luwei Li, Jingrui He, Haohan Wang

Dataset Distillation (DD) aims to generate a compact synthetic dataset that enables models to achieve performance comparable to training on the full large dataset, significantly reducing computational costs. Drawing from optimal transport theory, we introduce WMDD (Wasserstein Metric-based Dataset Distillation), a straightforward yet powerful method that employs the Wasserstein metric to enhance distribution matching. We compute the Wasserstein barycenter of features from a pretrained classifier to capture essential characteristics of the original data distribution. By optimizing synthetic data to align with this barycenter in feature space and leveraging per-class BatchNorm statistics to preserve intra-class variations, WMDD maintains the efficiency of distribution matching approaches while achieving state-of-the-art results across various high-resolution datasets. Our extensive experiments demonstrate WMDD’s effectiveness and adaptability, highlighting its potential for advancing machine learning applications at scale.

nan


Article 372

Title@2025-07-02 (3): Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations

Title: Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations Aitomia: Ihr intelligenter Assistent für KI-getriebene Atomistische und Quantum Chemical Simulationen Aitomia:您对AI-Driven原子学和量子化学模拟的智能助理 2505.08195v2

Authors (6): Jinming Hu, Hassan Nawaz, Yuting Rui, Lijie Chi, Arif Ullah, Pavlo O. Dral

We have developed Aitomia - a platform powered by AI to assist in performing AI-driven atomistic and quantum chemical (QC) simulations. This evolving intelligent assistant platform is equipped with chatbots and AI agents to help experts and guide non-experts in setting up and running the atomistic simulations, monitoring their computation status, analyzing the simulation results, and summarizing them for the user in text and graphical forms. We achieve these goals by exploiting open-source large language models (LLMs, original and fine-tuned), rule-based agents, and a retrieval-augmented generation (RAG) system. Aitomia leverages the versatility of our MLatom ecosystem, supporting AI-enhanced computational chemistry tasks ranging from ground- to excited-state calculations such as geometry optimizations, thermochemistry, and spectra calculations. Aitomia is the first intelligent assistant publicly accessible online on a cloud computing platform for atomistic simulations of broad scope (Aitomistic Hub at https://aitomistic.xyz), while it may also be deployed locally as described at http://mlatom.com/aitomia. Aitomia is expected to lower the barrier to performing atomistic simulations, democratizing simulations, and accelerating research and development in the relevant fields.

nan


Article 373

Title@2025-07-02 (3): Pre-training Large Memory Language Models with Internal and External Knowledge

Title: Pre-training Large Memory Language Models with Internal and External Knowledge Vorschulung großer Speicher Sprachmodelle mit internem und externem Wissen 具有内部和外部知识的大型记忆语言模型 2505.15962v2

Authors (8): Linxi Zhao, Sofian Zalouk, Christian K. Belardi, Justin Lovelace, Jin Peng Zhou, Kilian Q. Weinberger, Yoav Artzi, Jennifer J. Sun

Neural language models are black-boxes – both linguistic patterns and factual knowledge are distributed across billions of opaque parameters. This entangled encoding makes it difficult to reliably inspect, verify, or update specific facts. We propose a new class of language models, Large Memory Language Models (LMLM) with a pre-training recipe that stores factual knowledge in both internal weights and an external database. Our approach strategically masks externally retrieved factual values from the training loss, thereby teaching the model to perform targeted lookups rather than relying on memorization in model weights. Our experiments demonstrate that LMLMs achieve competitive performance compared to significantly larger, knowledge-dense LLMs on standard benchmarks, while offering the advantages of explicit, editable, and verifiable knowledge bases. This work represents a fundamental shift in how language models interact with and manage factual knowledge.

nan


Article 374

Title@2025-07-02 (3): Backdooring Bias (B^2) into Stable Diffusion Models

Title: Backdooring Bias (B^2) into Stable Diffusion Models Hintertürige Bias (B^2) in stabile Diffusionsmodelle 后门Bisas (B2) 进入稳定扩散模型 2406.15213v3

Authors (4): Ali Naseh, Jaechul Roh, Eugene Bagdasaryan, Amir Houmansadr

Recent advances in large text-conditional diffusion models have revolutionized image generation by enabling users to create realistic, high-quality images from textual prompts, significantly enhancing artistic creation and visual communication. However, these advancements also introduce an underexplored attack opportunity: the possibility of inducing biases by an adversary into the generated images for malicious intentions, e.g., to influence public opinion and spread propaganda. In this paper, we study an attack vector that allows an adversary to inject arbitrary bias into a target model. The attack leverages low-cost backdooring techniques using a targeted set of natural textual triggers embedded within a small number of malicious data samples produced with public generative models. An adversary could pick common sequences of words that can then be inadvertently activated by benign users during inference. We investigate the feasibility and challenges of such attacks, demonstrating how modern generative models have made this adversarial process both easier and more adaptable. On the other hand, we explore various aspects of the detectability of such attacks and demonstrate that the model’s utility remains intact in the absence of the triggers. Our extensive experiments using over 200,000 generated images and against hundreds of fine-tuned models demonstrate the feasibility of the presented backdoor attack. We illustrate how these biases maintain strong text-image alignment, highlighting the challenges in detecting biased images without knowing that bias in advance. Our cost analysis confirms the low financial barrier ($10-$15) to executing such attacks, underscoring the need for robust defensive strategies against such vulnerabilities in diffusion models.

nan


Article 375

Title@2025-07-02 (3): Drug Discovery SMILES-to-Pharmacokinetics Diffusion Models with Deep Molecular Understanding

Title: Drug Discovery SMILES-to-Pharmacokinetics Diffusion Models with Deep Molecular Understanding Drug Discovery SMILES-to-Pharmacokinetics Diffusion Modelle mit tiefem molekularen Verständnis 具有深分子理解的药物发现SMILES到药用动力学扩散模型 2408.07636v2

Authors (3): Bing Hu, Anita Layton, Helen Chen

Artificial intelligence (AI) is increasingly used in every stage of drug development. One challenge facing drug discovery AI is that drug pharmacokinetic (PK) datasets are often collected independently from each other, often with limited overlap, creating data overlap sparsity. Data sparsity makes data curation difficult for researchers looking to answer research questions in poly-pharmacy, drug combination research, and high-throughput screening. We propose Imagand, a novel SMILES-to-Pharmacokinetic (S2PK) diffusion model capable of generating an array of PK target properties conditioned on SMILES inputs. We show that Imagand-generated synthetic PK data closely resembles real data univariate and bivariate distributions, and improves performance for downstream tasks. Imagand is a promising solution for data overlap sparsity and allows researchers to efficiently generate ligand PK data for drug discovery research. Code is available at https://github.com/bing1100/Imagand.

nan


Article 376

Title@2025-07-02 (3): Diffusion Policies for Risk-Averse Behavior Modeling in Offline Reinforcement Learning

Title: Diffusion Policies for Risk-Averse Behavior Modeling in Offline Reinforcement Learning Diffusionsrichtlinien für risikoorientierte Verhaltensmodelle im Offline-Verstärkungslernen 关于在离线强化学习中建立风险与规避行为模拟模型的政策的传播 2403.17646v2

Authors (4): Xiaocong Chen, Siyu Wang, Tong Yu, Lina Yao

Offline reinforcement learning (RL) presents distinct challenges as it relies solely on observational data. A central concern in this context is ensuring the safety of the learned policy by quantifying uncertainties associated with various actions and environmental stochasticity. Traditional approaches primarily emphasize mitigating epistemic uncertainty by learning risk-averse policies, often overlooking environmental stochasticity. In this study, we propose an uncertainty-aware distributional offline RL method to simultaneously address both epistemic uncertainty and environmental stochasticity. We propose a model-free offline RL algorithm capable of learning risk-averse policies and characterizing the entire distribution of discounted cumulative rewards, as opposed to merely maximizing the expected value of accumulated discounted returns. Our method is rigorously evaluated through comprehensive experiments in both risk-sensitive and risk-neutral benchmarks, demonstrating its superior performance.

nan


Article 377

Title@2025-07-02 (3): Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy

Title: Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy Begründung für die Erkennung von Real-World-Events: Scaling-Verstärkung durch adaptive Perplexity-Aware Sampling-Strategie 现实世界事件发现的理由:通过适应性重复-软件抽样战略加强学习 2507.01327v1

Authors (7): Xiaoyun Zhang, Jingqing Ruan, Xing Ma, Yawen Zhu, Jiansong Chen, Ke Zeng, Xunliang Cai

Detecting abnormal events in real-world customer service dialogues is highly challenging due to the complexity of business data and the dynamic nature of customer interactions. Moreover, models must demonstrate strong out-of-domain (OOD) generalization to enable rapid adaptation across different business scenarios and maximize commercial value. In this work, we propose a novel Adaptive Perplexity-Aware Reinforcement Learning (APARL) framework that leverages the advanced reasoning capabilities of large language models for abnormal event detection. APARL introduces a dual-loop dynamic curriculum learning architecture, enabling the model to progressively focus on more challenging samples as its proficiency increases. This design effectively addresses performance bottlenecks and significantly enhances OOD transferability. Extensive evaluations on food delivery dialogue tasks show that our model achieves significantly enhanced adaptability and robustness, attaining the highest F1 score with an average improvement of 17.19\%, and an average improvement of 9.59\% in OOD transfer tests. This method provides a superior solution for industrial deployment of anomaly detection models, contributing to improved operational efficiency and commercial benefits.

nan


Article 378

Title@2025-07-02 (3): SWinMamba: Serpentine Window State Space Model for Vascular Segmentation

Title: SWinMamba: Serpentine Window State Space Model for Vascular Segmentation SWinMamba: Serpentine Fenster Zustand Raummodell für Vaskuläre Segmentierung SWinMamba: 蛇角窗口状态血管分割空间模型 2507.01323v1

Authors (3): Rongchang Zhao, Huanchi Liu, Jian Zhang

Vascular segmentation in medical images is crucial for disease diagnosis and surgical navigation. However, the segmented vascular structure is often discontinuous due to its slender nature and inadequate prior modeling. In this paper, we propose a novel Serpentine Window Mamba (SWinMamba) to achieve accurate vascular segmentation. The proposed SWinMamba innovatively models the continuity of slender vascular structures by incorporating serpentine window sequences into bidirectional state space models. The serpentine window sequences enable efficient feature capturing by adaptively guiding global visual context modeling to the vascular structure. Specifically, the Serpentine Window Tokenizer (SWToken) adaptively splits the input image using overlapping serpentine window sequences, enabling flexible receptive fields (RFs) for vascular structure modeling. The Bidirectional Aggregation Module (BAM) integrates coherent local features in the RFs for vascular continuity representation. In addition, dual-domain learning with Spatial-Frequency Fusion Unit (SFFU) is designed to enhance the feature representation of vascular structure. Extensive experiments on three challenging datasets demonstrate that the proposed SWinMamba achieves superior performance with complete and connected vessels.

nan


Article 379

Title@2025-07-02 (3): ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks

Title: ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks ICLShield: In-Context-Lernen Hintertürangriffe erforschen und abmildern ILLShield:探索和减少内线学习后门攻击 2507.01321v1

Authors (4): Zhiyao Ren, Siyuan Liang, Aishan Liu, Dacheng Tao

In-context learning (ICL) has demonstrated remarkable success in large language models (LLMs) due to its adaptability and parameter-free nature. However, it also introduces a critical vulnerability to backdoor attacks, where adversaries can manipulate LLM behaviors by simply poisoning a few ICL demonstrations. In this paper, we propose, for the first time, the dual-learning hypothesis, which posits that LLMs simultaneously learn both the task-relevant latent concepts and backdoor latent concepts within poisoned demonstrations, jointly influencing the probability of model outputs. Through theoretical analysis, we derive an upper bound for ICL backdoor effects, revealing that the vulnerability is dominated by the concept preference ratio between the task and the backdoor. Motivated by these findings, we propose ICLShield, a defense mechanism that dynamically adjusts the concept preference ratio. Our method encourages LLMs to select clean demonstrations during the ICL phase by leveraging confidence and similarity scores, effectively mitigating susceptibility to backdoor attacks. Extensive experiments across multiple LLMs and tasks demonstrate that our method achieves state-of-the-art defense effectiveness, significantly outperforming existing approaches (+26.02% on average). Furthermore, our method exhibits exceptional adaptability and defensive performance even for closed-source models (e.g., GPT-4).

nan


Article 380

Title@2025-07-02 (3): Fourier Series Guided Design of Quantum Convolutional Neural Networks for Enhanced Time Series Forecasting

Title: Fourier Series Guided Design of Quantum Convolutional Neural Networks for Enhanced Time Series Forecasting Fourier Series Guided Design Quantum Convolutional Neural Networks for Enhanced Time Series Forecasting 用于增强时间序列预测的量子进量神经网络的四ier系列制导设计 2404.15377v3

Authors (4): Sandra Leticia Juárez Osorio, Mayra Alejandra Rivera Ruiz, Andres Mendez-Vazquez, Eduardo Rodriguez-Tello

In this study, we apply 1D quantum convolution to address the task of time series forecasting. By encoding multiple points into the quantum circuit to predict subsequent data, each point becomes a feature, transforming the problem into a multidimensional one. Building on theoretical foundations from prior research, which demonstrated that Variational Quantum Circuits (VQCs) can be expressed as multidimensional Fourier series, we explore the capabilities of different architectures and ansatz. This analysis considers the concepts of circuit expressibility and the presence of barren plateaus. Analyzing the problem within the framework of the Fourier series enabled the design of an architecture that incorporates data reuploading, resulting in enhanced performance. Rather than a strict requirement for the number of free parameters to exceed the degrees of freedom of the Fourier series, our findings suggest that even a limited number of parameters can produce Fourier functions of higher degrees. This highlights the remarkable expressive power of quantum circuits. This observation is also significant in reducing training times. The ansatz with greater expressibility and number of non-zero Fourier coefficients consistently delivers favorable results across different scenarios, with performance metrics improving as the number of qubits increases.

nan


Article 381

Title@2025-07-02 (3): Neural Hamiltonian Operator

Title: Neural Hamiltonian Operator Neural Hamiltonian Betreiber 神经汉密尔顿操作员 2507.01313v1

Authors (1): Qian Qi

Stochastic control problems in high dimensions are notoriously difficult to solve due to the curse of dimensionality. An alternative to traditional dynamic programming is Pontryagin’s Maximum Principle (PMP), which recasts the problem as a system of Forward-Backward Stochastic Differential Equations (FBSDEs). In this paper, we introduce a formal framework for solving such problems with deep learning by defining a \textbf{Neural Hamiltonian Operator (NHO)}. This operator parameterizes the coupled FBSDE dynamics via neural networks that represent the feedback control and an ansatz for the value function’s spatial gradient. We show how the optimal NHO can be found by training the underlying networks to enforce the consistency conditions dictated by the PMP. By adopting this operator-theoretic view, we situate the deep FBSDE method within the rigorous language of statistical inference, framing it as a problem of learning an unknown operator from simulated data. This perspective allows us to prove the universal approximation capabilities of NHOs under general martingale drivers and provides a clear lens for analyzing the significant optimization challenges inherent to this class of models.

nan


Article 382

Title@2025-07-02 (3): DiffusionLight-Turbo: Accelerated Light Probes for Free via Single-Pass Chrome Ball Inpainting

Title: DiffusionLight-Turbo: Accelerated Light Probes for Free via Single-Pass Chrome Ball Inpainting DiffusionLight-Turbo: Beschleunigte Lichtsonden kostenlos über Single-Pass Chrome Ball Inpainting DifulLight-Turbo:通过单塑料铬球涂漆免费快速光探测 2507.01305v1

Authors (6): Worameth Chinchuthakun, Pakkapon Phongthawee, Amit Raj, Varun Jampani, Pramook Khungurn, Supasorn Suwajanakorn

We introduce a simple yet effective technique for estimating lighting from a single low-dynamic-range (LDR) image by reframing the task as a chrome ball inpainting problem. This approach leverages a pre-trained diffusion model, Stable Diffusion XL, to overcome the generalization failures of existing methods that rely on limited HDR panorama datasets. While conceptually simple, the task remains challenging because diffusion models often insert incorrect or inconsistent content and cannot readily generate chrome balls in HDR format. Our analysis reveals that the inpainting process is highly sensitive to the initial noise in the diffusion process, occasionally resulting in unrealistic outputs. To address this, we first introduce DiffusionLight, which uses iterative inpainting to compute a median chrome ball from multiple outputs to serve as a stable, low-frequency lighting prior that guides the generation of a high-quality final result. To generate high-dynamic-range (HDR) light probes, an Exposure LoRA is fine-tuned to create LDR images at multiple exposure values, which are then merged. While effective, DiffusionLight is time-intensive, requiring approximately 30 minutes per estimation. To reduce this overhead, we introduce DiffusionLight-Turbo, which reduces the runtime to about 30 seconds with minimal quality loss. This 60x speedup is achieved by training a Turbo LoRA to directly predict the averaged chrome balls from the iterative process. Inference is further streamlined into a single denoising pass using a LoRA swapping technique. Experimental results that show our method produces convincing light estimates across diverse settings and demonstrates superior generalization to in-the-wild scenarios. Our code is available at https://diffusionlight.github.io/turbo

nan


Article 383

Title@2025-07-02 (3): Empirical Bayes Estimation for Lasso-Type Regularizers: Analysis of Automatic Relevance Determination

Title: Empirical Bayes Estimation for Lasso-Type Regularizers: Analysis of Automatic Relevance Determination Empirische Bayes-Schätzung für Lasso-Typ-Regularisatoren: Analyse der automatischen Relevanzbestimmung Lasso-Type正规化器的经验性贝量估计:自动相关性确定分析 2501.11280v4

Authors (2): Tsukasa Yoshida, Kazuho Watanabe

This paper focuses on linear regression models with non-conjugate sparsity-inducing regularizers such as lasso and group lasso. Although the empirical Bayes approach enables us to estimate the regularization parameter, little is known on the properties of the estimators. In particular, many aspects regarding the specific conditions under which the mechanism of automatic relevance determination (ARD) occurs remain unexplained. In this paper, we derive the empirical Bayes estimators for the group lasso regularized linear regression models with limited parameters. It is shown that the estimators diverge under a specific condition, giving rise to the ARD mechanism. We also prove that empirical Bayes methods can produce the ARD mechanism in general regularized linear regression models and clarify the conditions under which models such as ridge, lasso, and group lasso can do so.

nan


Article 384

Title@2025-07-02 (3): OralBBNet: Spatially Guided Dental Segmentation of Panoramic X-Rays with Bounding Box Priors

Title: OralBBNet: Spatially Guided Dental Segmentation of Panoramic X-Rays with Bounding Box Priors OralBBNet: Spatially Guided Dental Segmentierung von Panorama X-Rays mit Bounding Box Priors 口服BBNet:全景X光空间引导牙科分离 2406.03747v3

Authors (6): Devichand Budagam, Azamat Zhanatuly Imanbayev, Iskander Rafailovich Akhmetov, Aleksandr Sinitca, Sergey Antonov, Dmitrii Kaplun

Teeth segmentation and recognition play a vital role in a variety of dental applications and diagnostic procedures. The integration of deep learning models has facilitated the development of precise and automated segmentation methods. Although prior research has explored teeth segmentation, not many methods have successfully performed tooth segmentation and detection simultaneously. This study presents UFBA-425, a dental dataset derived from the UFBA-UESC dataset, featuring bounding box and polygon annotations for 425 panoramic dental X-rays. In addition, this paper presents the OralBBNet architecture, which is based on the best segmentation and detection qualities of architectures such as U-Net and YOLOv8, respectively. OralBBNet is designed to improve the accuracy and robustness of tooth classification and segmentation on panoramic X-rays by leveraging the complementary strengths of U-Net and YOLOv8. Our approach achieved a 1-3% improvement in mean average precision (mAP) for tooth detection compared to existing techniques and a 15-20% improvement in the dice score for teeth segmentation over state-of-the-art (SOTA) solutions for various tooth categories and 2-4% improvement in the dice score compared to other SOTA segmentation architectures. The results of this study establish a foundation for the wider implementation of object detection models in dental diagnostics.

nan


Article 385

Title@2025-07-02 (3): Far From Sight, Far From Mind: Inverse Distance Weighting for Graph Federated Recommendation

Title: Far From Sight, Far From Mind: Inverse Distance Weighting for Graph Federated Recommendation Weit weg vom Sehen, weit weg vom Denken: Inverse Distanzgewichtung für Graph Federated Empfehlung 远离视觉,远离心智:对 “ 绿联建议 “ 的反距离加权 2507.01285v1

Authors (4): Aymen Rayane Khouas, Mohamed Reda Bouadjenek, Hakim Hacid, Sunil Aryal

Graph federated recommendation systems offer a privacy-preserving alternative to traditional centralized recommendation architectures, which often raise concerns about data security. While federated learning enables personalized recommendations without exposing raw user data, existing aggregation methods overlook the unique properties of user embeddings in this setting. Indeed, traditional aggregation methods fail to account for their complexity and the critical role of user similarity in recommendation effectiveness. Moreover, evolving user interactions require adaptive aggregation while preserving the influence of high-relevance anchor users (the primary users before expansion in graph-based frameworks). To address these limitations, we introduce Dist-FedAvg, a novel distance-based aggregation method designed to enhance personalization and aggregation efficiency in graph federated learning. Our method assigns higher aggregation weights to users with similar embeddings, while ensuring that anchor users retain significant influence in local updates. Empirical evaluations on multiple datasets demonstrate that Dist-FedAvg consistently outperforms baseline aggregation techniques, improving recommendation accuracy while maintaining seamless integration into existing federated learning frameworks.

nan


Article 386

Title@2025-07-02 (3): VLAD: A VLM-Augmented Autonomous Driving Framework with Hierarchical Planning and Interpretable Decision Process

Title: VLAD: A VLM-Augmented Autonomous Driving Framework with Hierarchical Planning and Interpretable Decision Process VLAD: Ein VLM-Augmented Autonomous Driving Framework mit Hierarchical Planning und Interpretable Decision Process VLAD: 具有等级规划和解释性决定程序的VLM增强自主驱动框架 2507.01284v1

Authors (5): Cristian Gariboldi, Hayato Tokida, Ken Kinjo, Yuki Asada, Alexander Carballo

Recent advancements in open-source Visual Language Models (VLMs) such as LLaVA, Qwen-VL, and Llama have catalyzed extensive research on their integration with diverse systems. The internet-scale general knowledge encapsulated within these models presents significant opportunities for enhancing autonomous driving perception, prediction, and planning capabilities. In this paper we propose VLAD, a vision-language autonomous driving model, which integrates a fine-tuned VLM with VAD, a state-of-the-art end-to-end system. We implement a specialized fine-tuning approach using custom question-answer datasets designed specifically to improve the spatial reasoning capabilities of the model. The enhanced VLM generates high-level navigational commands that VAD subsequently processes to guide vehicle operation. Additionally, our system produces interpretable natural language explanations of driving decisions, thereby increasing transparency and trustworthiness of the traditionally black-box end-to-end architecture. Comprehensive evaluation on the real-world nuScenes dataset demonstrates that our integrated system reduces average collision rates by 31.82% compared to baseline methodologies, establishing a new benchmark for VLM-augmented autonomous driving systems.

nan


Article 387

Title@2025-07-02 (3): A Framework for Mining Collectively-Behaving Bots in MMORPGs

Title: A Framework for Mining Collectively-Behaving Bots in MMORPGs Ein Rahmen für den Abbau kollektiv verhaltender Bots in MMORPGs 《MMOMOPGs集体经营的矿床框架》 2501.10461v2

Authors (5): Hyunsoo Kim, Jun Hee Kim, Jaeman Son, Jihoon Song, Eunjo Lee

In MMORPGs (Massively Multiplayer Online Role-Playing Games), abnormal players (bots) using unauthorized automated programs to carry out pre-defined behaviors systematically and repeatedly are commonly observed. Bots usually engage in these activities to gain in-game money, which they eventually trade for real money outside the game. Such abusive activities negatively impact the in-game experiences of legitimate users since bots monopolize specific hunting areas and obtain valuable items. Thus, detecting abnormal players is a significant task for game companies. Motivated by the fact that bots tend to behave collectively with similar in-game trajectories due to the auto-programs, we developed BotTRep, a framework that comprises trajectory representation learning followed by clustering using a completely unlabeled in-game trajectory dataset. Our model aims to learn representations for in-game trajectory sequences so that players with contextually similar trajectories have closer embeddings. Then, by applying DBSCAN to these representations and visualizing the corresponding moving patterns, our framework ultimately assists game masters in identifying and banning bots.

nan


Article 388

Title@2025-07-02 (3): BoltzNCE: Learning Likelihoods for Boltzmann Generation with Stochastic Interpolants and Noise Contrastive Estimation

Title: BoltzNCE: Learning Likelihoods for Boltzmann Generation with Stochastic Interpolants and Noise Contrastive Estimation BoltzNCE: Likelihoods für die Boltzmann-Generation mit stochastischen Interpolanten und Geräuschkontrastabschätzungen BoltzNCE: 学习博尔茨曼一代人与斯托克内刑警的相似之处和噪音对抗估计 2507.00846v2

Authors (4): Rishal Aggarwal, Jacky Chen, Nicholas M. Boffi, David Ryan Koes

Efficient sampling from the Boltzmann distribution defined by an energy function is a key challenge in modeling physical systems such as molecules. Boltzmann Generators tackle this by leveraging Continuous Normalizing Flows that transform a simple prior into a distribution that can be reweighted to match the Boltzmann distribution using sample likelihoods. However, obtaining likelihoods requires computing costly Jacobians during integration, making it impractical for large molecular systems. To overcome this, we propose learning the likelihood of the generated distribution via an energy-based model trained with noise contrastive estimation and score matching. By using stochastic interpolants to anneal between the prior and generated distributions, we combine both the objective functions to efficiently learn the density function. On the alanine dipeptide system, we demonstrate that our method yields free energy profiles and energy distributions comparable to those obtained with exact likelihoods. Additionally, we show that free energy differences between metastable states can be estimated accurately with orders-of-magnitude speedup.

nan


Article 389

Title@2025-07-02 (3): PULSE: Practical Evaluation Scenarios for Large Multimodal Model Unlearning

Title: PULSE: Practical Evaluation Scenarios for Large Multimodal Model Unlearning PULSE: Praktische Bewertungsszenarien für großes multimodales Modell-Unlearning PULSE:大型多模式模式不学习的实际评价设想 2507.01271v1

Authors (5): Tatsuki Kawakami, Kazuki Egashira, Atsuyuki Miyai, Go Irie, Kiyoharu Aizawa

In recent years, unlearning techniques, which are methods for inducing a model to “forget” previously learned information, have attracted attention as a way to address privacy and copyright concerns in large language models (LLMs) and large multimodal models (LMMs). While several unlearning benchmarks have been established for LLMs, a practical evaluation framework for unlearning in LMMs has been less explored. Specifically, existing unlearning benchmark for LMMs considers only scenarios in which the model is required to unlearn fine-tuned knowledge through a single unlearning operation. In this study, we introduce PULSE protocol for realistic unlearning scenarios for LMMs by introducing two critical perspectives: (i) Pre-trained knowledge Unlearning for analyzing the effect across different knowledge acquisition phases and (ii) Long-term Sustainability Evaluation to address sequential requests. We then evaluate existing unlearning methods along these dimensions. Our results reveal that, although some techniques can successfully unlearn knowledge acquired through fine-tuning, they struggle to eliminate information learned during pre-training. Moreover, methods that effectively unlearn a batch of target data in a single operation exhibit substantial performance degradation when the same data are split and unlearned sequentially.

nan


Article 390

Title@2025-07-02 (3): $μ^2$Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation

Title: $μ^2$Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation $μ^2$Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer für Radiologie Report Generation $2 $2 收缩器:用于产生放射学报告的可区别的多规模多式多式调控器 2507.00316v2

Authors (7): Siyou Li, Pengyao Qin, Huanan Wu, Dong Nie, Arun J. Thirunavukarasu, Juntao Yu, Le Zhang

Automated radiology report generation (RRG) aims to produce detailed textual reports from clinical imaging, such as computed tomography (CT) scans, to improve the accuracy and efficiency of diagnosis and provision of management advice. RRG is complicated by two key challenges: (1) inherent complexity in extracting relevant information from imaging data under resource constraints, and (2) difficulty in objectively evaluating discrepancies between model-generated and expert-written reports. To address these challenges, we propose $\mu^2$LLM, a $\underline{\textbf{mu}}$ltiscale $\underline{\textbf{mu}}$ltimodal large language models for RRG tasks. The novel ${\mu}^2$Tokenizer, as an intermediate layer, integrates multi-modal features from the multiscale visual tokenizer and the text tokenizer, then enhances report generation quality through direct preference optimization (DPO), guided by GREEN-RedLlama. Experimental results on four large CT image-report medical datasets demonstrate that our method outperforms existing approaches, highlighting the potential of our fine-tuned $\mu^2$LLMs on limited data for RRG tasks. At the same time, for prompt engineering, we introduce a five-stage, LLM-driven pipeline that converts routine CT reports into paired visual-question-answer triples and citation-linked reasoning narratives, creating a scalable, high-quality supervisory corpus for explainable multimodal radiology LLM. All code, datasets, and models will be publicly available in our official repository. https://github.com/Siyou-Li/u2Tokenizer

nan


Article 391

Title@2025-07-02 (3): Combating Confirmation Bias: A Unified Pseudo-Labeling Framework for Entity Alignment

Title: Combating Confirmation Bias: A Unified Pseudo-Labeling Framework for Entity Alignment Bekämpfung von Konfirmations-Bias: Ein einheitliches Pseudo-Labeling-Rahmenwerk für die Ausrichtung von Unternehmen 打击确认的偏见:统一实体统一化框架 2307.02075v4

Authors (4): Qijie Ding, Jie Yin, Daokun Zhang, Junbin Gao

Entity alignment (EA) aims at identifying equivalent entity pairs across different knowledge graphs (KGs) that refer to the same real-world identity. To circumvent the shortage of seed alignments provided for training, recent EA models utilize pseudo-labeling strategies to iteratively add unaligned entity pairs predicted with high confidence to the seed alignments for model training. However, the adverse impact of confirmation bias during pseudo-labeling has been largely overlooked, thus hindering entity alignment performance. To systematically combat confirmation bias for pseudo-labeling-based entity alignment, we propose a Unified Pseudo-Labeling framework for Entity Alignment (UPL-EA) that explicitly eliminates pseudo-labeling errors to boost the accuracy of entity alignment. UPL-EA consists of two complementary components: (1) Optimal Transport (OT)-based pseudo-labeling uses discrete OT modeling as an effective means to determine entity correspondences and reduce erroneous matches across two KGs. An effective criterion is derived to infer pseudo-labeled alignments that satisfy one-to-one correspondences; (2) Parallel pseudo-label ensembling refines pseudo-labeled alignments by combining predictions over multiple models independently trained in parallel. The ensembled pseudo-labeled alignments are thereafter used to augment seed alignments to reinforce subsequent model training for alignment inference. The effectiveness of UPL-EA in eliminating pseudo-labeling errors is both theoretically supported and experimentally validated. Our extensive results and in-depth analyses demonstrate the superiority of UPL-EA over 15 competitive baselines and its utility as a general pseudo-labeling framework for entity alignment.

nan


Article 392

Title@2025-07-02 (3): Dynamic Matching with Post-allocation Service and its Application to Refugee Resettlement

Title: Dynamic Matching with Post-allocation Service and its Application to Refugee Resettlement Dynamisches Matching mit dem Post-Allocation Service und seiner Anwendung auf die Neuansiedlung von Flüchtlingen 与员额分配服务及其适用于难民重新安置的动态匹配 2410.22992v2

Authors (5): Kirk Bansak, Soonbong Lee, Vahideh Manshadi, Rad Niazadeh, Elisabeth Paulson

Motivated by our collaboration with a major refugee resettlement agency in the U.S., we study a dynamic matching problem where each new arrival (a refugee case) must be matched immediately and irrevocably to one of the static resources (a location with a fixed annual quota). In addition to consuming the static resource, each case requires post-allocation services from a server, such as a translator. Given the uncertainty in service time, a server may not be available at a given time, thus we refer to it as a dynamic resource. Upon matching, the case will wait to avail service in a first-come-first-serve manner. Bursty matching to a location may result in undesirable congestion at its corresponding server. Consequently, the central planner (the agency) faces a dynamic matching problem with an objective that combines the matching reward (captured by pair-specific employment outcomes) with the cost for congestion for dynamic resources and over-allocation for the static ones. Motivated by the observed fluctuations in the composition of refugee pools across the years, we aim to design algorithms that do not rely on distributional knowledge. We develop learning-based algorithms that are asymptotically optimal in certain regimes, easy to interpret, and computationally fast. Our design is based on learning the dual variables of the underlying optimization problem; however, the main challenge lies in the time-varying nature of the dual variables associated with dynamic resources. Our theoretical development brings together techniques from Lyapunov analysis, adversarial online learning, and stochastic optimization. On the application side, when tested on real data from our partner agency and incorporating practical considerations, our method outperforms existing ones making it a viable candidate for replacing the current practice upon experimentation.

nan


Article 393

Title@2025-07-02 (3): Automated Classification of Volcanic Earthquakes Using Transformer Encoders: Insights into Data Quality and Model Interpretability

Title: Automated Classification of Volcanic Earthquakes Using Transformer Encoders: Insights into Data Quality and Model Interpretability Automatisierte Klassifizierung von Vulkan-Erdbeben mit Transformer-Encodern: Einblicke in Datenqualität und Modellinterpretierbarkeit 利用变换器计算器对火山地震进行自动分类:对数据质量和模型解释的透视 2507.01260v1

Authors (5): Y. Suzuki, Y. Yukutake, T. Ohminato, M. Yamasaki, Ahyi Kim

Precisely classifying earthquake types is crucial for elucidating the relationship between volcanic earthquakes and volcanic activity. However, traditional methods rely on subjective human judgment, which requires considerable time and effort. To address this issue, we developed a deep learning model using a transformer encoder for a more objective and efficient classification. Tested on Mount Asama’s diverse seismic activity, our model achieved high F1 scores (0.930 for volcano tectonic, 0.931 for low-frequency earthquakes, and 0.980 for noise), superior to a conventional CNN-based method. To enhance interpretability, attention weight visualizations were analyzed, revealing that the model focuses on key waveform features similarly to human experts. However, inconsistencies in training data, such as ambiguously labeled B-type events with S-waves, were found to influence classification accuracy and attention weight distributions. Experiments addressing data selection and augmentation demonstrated the importance of balancing data quality and diversity. In addition, stations within 3 km of the crater played an important role in improving model performance and interpretability. These findings highlight the potential of Transformer-based models for automated volcanic earthquake classification, particularly in improving efficiency and interpretability. By addressing challenges such as data imbalance and subjective labeling, our approach provides a robust framework for understanding seismic activity at Mount Asama. Moreover, this framework offers opportunities for transfer learning to other volcanic regions, paving the way for enhanced volcanic hazard assessments and disaster mitigation strategies.

nan


Article 394

Title@2025-07-02 (3): AIRES: Accelerating Out-of-Core GCNs via Algorithm-System Co-Design

Title: AIRES: Accelerating Out-of-Core GCNs via Algorithm-System Co-Design AIRES: Beschleunigung von Out-of-Core-GCNs über Algorithm-System Co-Design AIRES:通过Algorithm-System共同设计加速将全球氯化萘超出核心 2507.02006v1

Authors (3): Shakya Jayakody, Youpeng Zhao, Jun Wang

Graph convolutional networks (GCNs) are fundamental in various scientific applications, ranging from biomedical protein-protein interactions (PPI) to large-scale recommendation systems. An essential component for modeling graph structures in GCNs is sparse general matrix-matrix multiplication (SpGEMM). As the size of graph data continues to scale up, SpGEMMs are often conducted in an out-of-core fashion due to limited GPU memory space in resource-constrained systems. Albeit recent efforts that aim to alleviate the memory constraints of out-of-core SpGEMM through either GPU feature caching, hybrid CPU-GPU memory layout, or performing the computation in sparse format, current systems suffer from both high I/O latency and GPU under-utilization issues. In this paper, we first identify the problems of existing systems, where sparse format data alignment and memory allocation are the main performance bottlenecks, and propose AIRES, a novel algorithm-system co-design solution to accelerate out-of-core SpGEMM computation for GCNs. Specifically, from the algorithm angle, AIRES proposes to alleviate the data alignment issues on the block level for matrices in sparse formats and develops a tiling algorithm to facilitate row block-wise alignment. On the system level, AIRES employs a three-phase dynamic scheduling that features a dual-way data transfer strategy utilizing a tiered memory system: integrating GPU memory, GPU Direct Storage (GDS), and host memory to reduce I/O latency and improve throughput. Evaluations show that AIRES significantly outperforms the state-of-the-art methods, achieving up to 1.8x lower latency in real-world graph processing benchmarks.

nan


Article 395

Title@2025-07-01 (2): Embedding-Space Diffusion for Zero-Shot Environmental Sound Classification

Title: Embedding-Space Diffusion for Zero-Shot Environmental Sound Classification Einbettungs-Space-Diffusion für Zero-Shot-Environmental Sound-Klassifikation 内嵌式空间扩散用于零热无害环境分类 2412.03771v2

Authors (3): Ysobel Sims, Alexandre Mendes, Stephan Chalup

Zero-shot learning enables models to generalise to unseen classes by leveraging semantic information, bridging the gap between training and testing sets with non-overlapping classes. While much research has focused on zero-shot learning in computer vision, the application of these methods to environmental audio remains underexplored, with poor performance in existing studies. Generative methods, which have demonstrated success in computer vision, are notably absent from zero-shot environmental sound classification studies. To address this gap, this work investigates generative methods for zero-shot learning in environmental audio. Two successful generative models from computer vision are adapted: a cross-aligned and distribution-aligned variational autoencoder (CADA-VAE) and a leveraging invariant side generative adversarial network (LisGAN). Additionally, we introduced a novel diffusion model conditioned on class auxiliary data. Synthetic embeddings generated by the diffusion model are combined with seen class embeddings to train a classifier. Experiments are conducted on five environmental audio datasets, ESC-50, ARCA23K-FSD, FSC22, UrbanSound8k and TAU Urban Acoustics 2019, and one music classification dataset, GTZAN. Results show that the diffusion model outperforms all baseline methods on average across six audio datasets. This work establishes the diffusion model as a promising approach for zero-shot learning and introduces the first benchmark of generative methods for zero-shot environmental sound classification, providing a foundation for future research.

nan


Article 396

Title@2025-07-01 (2): SFO: Piloting VLM Feedback for Offline RL

Title: SFO: Piloting VLM Feedback for Offline RL SFO: Pilotierung von VLM Feedback für Offline RL SFO: 试验离线RL的VLM反馈 2503.01062v4

Authors (1): Jacob Beck

While internet-scale image and textual data have enabled strong generalization in Vision-Language Models (VLMs), the absence of internet-scale control data has impeded the development of similar generalization in standard reinforcement learning (RL) agents. Although VLMs are fundamentally limited in their ability to solve control tasks due to their lack of action-conditioned training data, their capacity for image understanding allows them to provide valuable feedback in RL tasks by recognizing successful outcomes. A key challenge in Reinforcement Learning from AI Feedback (RLAIF) is determining how best to integrate VLM-derived signals into the learning process. We explore this question in the context of offline RL and introduce a class of methods called sub-trajectory filtered optimization. We identify three key insights. First, trajectory length plays a crucial role in offline RL, as full-trajectory preference learning exacerbates the stitching problem, necessitating the use of sub-trajectories. Second, even in Markovian environments, a non-Markovian reward signal from a sequence of images is required to assess trajectory improvement, as VLMs do not interpret control actions and must rely on visual cues over time. Third, a simple yet effective approach–filtered and weighted behavior cloning–consistently outperforms more complex reinforcement learning from human feedback-based methods. We propose sub-trajectory filtered behavior cloning, a method that leverages VLM feedback on sub-trajectories while incorporating a retrospective filtering mechanism that removes sub-trajectories preceding failures to improve robustness and prevent turbulence. This study is preliminary; we provide initial evidence through evaluations on a toy control domain. Please enjoy our airport puns.

nan


Article 397

Title@2025-07-01 (2): Jump-Start Reinforcement Learning with Self-Evolving Priors for Extreme Monopedal Locomotion

Title: Jump-Start Reinforcement Learning with Self-Evolving Priors for Extreme Monopedal Locomotion Jump-Start Verstärkungslernen mit sich selbst entwickelnden Prioren für extreme monopedale Locomotion 与自我演化前科一起进行跳跃启动强化学习,以进行极端单式移动 2507.01243v1

Authors (6): Ziang Zheng, Guojian Zhan, Shiqi Liu, Yao Lyu, Tao Zhang, Shengbo Eben Li

Reinforcement learning (RL) has shown great potential in enabling quadruped robots to perform agile locomotion. However, directly training policies to simultaneously handle dual extreme challenges, i.e., extreme underactuation and extreme terrains, as in monopedal hopping tasks, remains highly challenging due to unstable early-stage interactions and unreliable reward feedback. To address this, we propose JumpER (jump-start reinforcement learning via self-evolving priors), an RL training framework that structures policy learning into multiple stages of increasing complexity. By dynamically generating self-evolving priors through iterative bootstrapping of previously learned policies, JumpER progressively refines and enhances guidance, thereby stabilizing exploration and policy optimization without relying on external expert priors or handcrafted reward shaping. Specifically, when integrated with a structured three-stage curriculum that incrementally evolves action modality, observation space, and task objective, JumpER enables quadruped robots to achieve robust monopedal hopping on unpredictable terrains for the first time. Remarkably, the resulting policy effectively handles challenging scenarios that traditional methods struggle to conquer, including wide gaps up to 60 cm, irregularly spaced stairs, and stepping stones with distances varying from 15 cm to 35 cm. JumpER thus provides a principled and scalable approach for addressing locomotion tasks under the dual challenges of extreme underactuation and extreme terrains.

nan


Article 398

Title@2025-07-01 (2): Beyond First-Order: Training LLMs with Stochastic Conjugate Subgradients and AdamW

Title: Beyond First-Order: Training LLMs with Stochastic Conjugate Subgradients and AdamW Beyond First-Order: Training LLMs mit stochastischen Konjugat-Subgradienten und AdamW 第一班之后:与斯托卡共产子公司和AdamW公司合办的培训LLM培训 2507.01241v1

Authors (2): Di Zhang, Yihang Zhang

Stochastic gradient-based descent (SGD), have long been central to training large language models (LLMs). However, their effectiveness is increasingly being questioned, particularly in large-scale applications where empirical evidence suggests potential performance limitations. In response, this paper proposes a stochastic conjugate subgradient method together with adaptive sampling tailored specifically for training LLMs. The method not only achieves faster convergence per iteration but also demonstrates improved scalability compared to traditional SGD techniques. It leverages sample complexity analysis to adaptively choose the sample size, employs a stochastic conjugate subgradient approach to determine search directions and utilizing an AdamW-like algorithm to adaptively adjust step sizes. This approach preserves the key advantages of first-order methods while effectively addressing the nonconvexity and non-smoothness inherent in LLMs training. Additionally, we provide a detailed analysis of the advantage of the algorithm. Experimental results show that the proposed method not only maintains, but in many cases surpasses, the scalability of traditional SGD techniques, significantly enhancing both the speed and accuracy of the optimization process.

nan


Article 399

Title@2025-07-01 (2): GenBFA: An Evolutionary Optimization Approach to Bit-Flip Attacks on LLMs

Title: GenBFA: An Evolutionary Optimization Approach to Bit-Flip Attacks on LLMs GenBFA: Ein evolutionärer Optimierungsansatz für Bit-Flip-Angriffe auf LLMs GenBFA:对LMLM的点翻攻击的演进优化办法 2411.13757v4

Authors (7): Sanjay Das, Swastik Bhattacharya, Souvik Kundu, Shamik Kundu, Anand Menon, Arnab Raha, Kanad Basu

Large Language Models (LLMs) have revolutionized natural language processing (NLP), excelling in tasks like text generation and summarization. However, their increasing adoption in mission-critical applications raises concerns about hardware-based threats, particularly bit-flip attacks (BFAs). BFAs, enabled by fault injection methods such as Rowhammer, target model parameters in memory, compromising both integrity and performance. Identifying critical parameters for BFAs in the vast parameter space of LLMs poses significant challenges. While prior research suggests transformer-based architectures are inherently more robust to BFAs compared to traditional deep neural networks, we challenge this assumption. For the first time, we demonstrate that as few as three bit-flips can cause catastrophic performance degradation in an LLM with billions of parameters. Current BFA techniques are inadequate for exploiting this vulnerability due to the difficulty of efficiently identifying critical parameters within the immense parameter space. To address this, we propose AttentionBreaker, a novel framework tailored for LLMs that enables efficient traversal of the parameter space to identify critical parameters. Additionally, we introduce GenBFA, an evolutionary optimization strategy designed to refine the search further, isolating the most critical bits for an efficient and effective attack. Empirical results reveal the profound vulnerability of LLMs to AttentionBreaker. For example, merely three bit-flips (4.129 x 10^-9% of total parameters) in the LLaMA3-8B-Instruct 8-bit quantized (W8) model result in a complete performance collapse: accuracy on MMLU tasks drops from 67.3% to 0%, and Wikitext perplexity skyrockets from 12.6 to 4.72 x 10^5. These findings underscore the effectiveness of AttentionBreaker in uncovering and exploiting critical vulnerabilities within LLM architectures.

nan


Article 400

Title@2025-07-01 (2): Quantum Machine Learning in Transportation: A Case Study of Pedestrian Stress Modelling

Title: Quantum Machine Learning in Transportation: A Case Study of Pedestrian Stress Modelling Quantum Machine Learning in Transportation: Eine Fallstudie zur fußgängerischen Stressmodellierung Qantum 运输机械学习:虫害应激模拟模型案例研究 2507.01235v1

Authors (2): Bara Rababa, Bilal Farooq

Quantum computing has opened new opportunities to tackle complex machine learning tasks, for instance, high-dimensional data representations commonly required in intelligent transportation systems. We explore quantum machine learning to model complex skin conductance response (SCR) events that reflect pedestrian stress in a virtual reality road crossing experiment. For this purpose, Quantum Support Vector Machine (QSVM) with an eight-qubit ZZ feature map and a Quantum Neural Network (QNN) using a Tree Tensor Network ansatz and an eight-qubit ZZ feature map, were developed on Pennylane. The dataset consists of SCR measurements along with features such as the response amplitude and elapsed time, which have been categorized into amplitude-based classes. The QSVM achieved good training accuracy, but had an overfitting problem, showing a low test accuracy of 45% and therefore impacting the reliability of the classification model. The QNN model reached a higher test accuracy of 55%, making it a better classification model than the QSVM and the classic versions.

nan


Article 401

Title@2025-07-01 (2): CAM-NET: An AI Model for Whole Atmosphere with Thermosphere and Ionosphere Extension

Title: CAM-NET: An AI Model for Whole Atmosphere with Thermosphere and Ionosphere Extension CAM-NET: Ein KI-Modell für ganze Atmosphäre mit Thermosphäre und Ionosphärenerweiterung CAM-NET:具有热层和电离层扩展作用的AI全大气模型 2506.19340v3

Authors (2): Jiahui Hu, Wenjun Dong

We present Compressible Atmospheric Model-Network (CAM-NET), an AI model designed to predict neutral atmospheric variables from the Earth’s surface to the ionosphere with high accuracy and computational efficiency. Accurate modeling of the entire atmosphere is critical for understanding the upward propagation of gravity waves, which influence upper-atmospheric dynamics and coupling across atmospheric layers. CAM-NET leverages the Spherical Fourier Neural Operator (SFNO) to capture global-scale atmospheric dynamics while preserving the Earth’s spherical structure. Trained on a decade of datasets from the Whole Atmosphere Community Climate Model with thermosphere and ionosphere eXtension (WACCM-X), CAM-NET demonstrates accuracy comparable to WACCM-X while achieving a speedup of over 1000x in inference time, can provide one year simulation within a few minutes once trained. The model effectively predicts key atmospheric parameters, including zonal and meridional winds, temperature, and time rate of pressure. Inspired by traditional modeling approaches that use external couplers to simulate tracer transport, CAM-NET introduces a modular architecture that explicitly separates tracer prediction from core dynamics. The core backbone of CAM-NET focuses on forecasting primary physical variables (e.g., temperature, wind velocity), while tracer variables are predicted through a lightweight, fine-tuned model. This design allows for efficient adaptation to specific tracer scenarios with minimal computational cost, avoiding the need to retrain the entire model. We have validated this approach on the $O^2$ tracer, demonstrating strong performance and generalization capabilities.

nan


Article 402

Title@2025-07-01 (2): 2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos

Title: 2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos 2HandedAfforder: Präzises Lernen von lebensfähigen Bimanual Affordances aus menschlichen Videos 2HandAfforder: 学习人类视频中可操作的双人经济能力 2503.09320v3

Authors (4): Marvin Heidinger, Snehal Jauhri, Vignesh Prasad, Georgia Chalvatzaki

When interacting with objects, humans effectively reason about which regions of objects are viable for an intended action, i.e., the affordance regions of the object. They can also account for subtle differences in object regions based on the task to be performed and whether one or two hands need to be used. However, current vision-based affordance prediction methods often reduce the problem to naive object part segmentation. In this work, we propose a framework for extracting affordance data from human activity video datasets. Our extracted 2HANDS dataset contains precise object affordance region segmentations and affordance class-labels as narrations of the activity performed. The data also accounts for bimanual actions, i.e., two hands co-ordinating and interacting with one or more objects. We present a VLM-based affordance prediction model, 2HandedAfforder, trained on the dataset and demonstrate superior performance over baselines in affordance region segmentation for various activities. Finally, we show that our predicted affordance regions are actionable, i.e., can be used by an agent performing a task, through demonstration in robotic manipulation scenarios. Project-website: https://sites.google.com/view/2handedafforder

nan


Article 403

Title@2025-07-01 (2): PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning

Title: PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning PAE MobiLLM: Datenschutzerklärung und effizientes LLM Feintuning auf dem mobilen Gerät über Additive Side-Tuning PAE MobillLLM: 通过Additive 侧向托盘对移动设备进行隐私-软件和高效LLM精美调整 2507.01216v1

Authors (10): Xingke Yang, Liang Li, Zhiyi Wan, Sicong Li, Hao Wang, Xiaoqi Qi, Jiang Liu, Tomoaki Ohtsuki, Xin Fu, Miao Pan

There is a huge gap between numerous intriguing applications fostered by on-device large language model (LLM) fine-tuning (FT) from fresh mobile data and the limited resources of a mobile device. While existing server-assisted methods (e.g., split learning or side-tuning) may enable LLM FT on the local mobile device, they suffer from heavy communication burdens of activation transmissions, and may disclose data, labels or fine-tuned models to the server. To address those issues, we develop PAE MobiLLM, a privacy-aware and efficient LLM FT method which can be deployed on the mobile device via server-assisted additive side-tuning. To further accelerate FT convergence and improve computing efficiency, PAE MobiLLM integrates activation caching on the server side, which allows the server to reuse historical activations and saves the mobile device from repeatedly computing forward passes for the recurring data samples. Besides, to reduce communication cost, PAE MobiLLM develops a one-token (i.e., ``pivot’’ token) activation shortcut that transmits only a single activation dimension instead of full activation matrices to guide the side network tuning. Last but not least, PAE MobiLLM introduces the additive adapter side-network design which makes the server train the adapter modules based on device-defined prediction differences rather than raw ground-truth labels. In this way, the server can only assist device-defined side-network computing, and learn nothing about data, labels or fine-tuned models.

nan


Article 404

Title@2025-07-01 (2): Deep Learning-Based Intrusion Detection for Automotive Ethernet: Evaluating & Optimizing Fast Inference Techniques for Deployment on Low-Cost Platform

Title: Deep Learning-Based Intrusion Detection for Automotive Ethernet: Evaluating & Optimizing Fast Inference Techniques for Deployment on Low-Cost Platform Deep Learning-based Intrusion Detection für Automotive Ethernet: Bewertung und Optimierung schneller Inferenztechniken für den Einsatz auf Low-Cost-Plattform 用于汽车以太网的深入学习入侵探测:评估并优化快速推断技术,以便在低成本平台上部署 2507.01208v1

Authors (5): Pedro R. X. Carmo, Igor de Moura, Assis T. de Oliveira Filho, Djamel Sadok, Cleber Zanchettin

Modern vehicles are increasingly connected, and in this context, automotive Ethernet is one of the technologies that promise to provide the necessary infrastructure for intra-vehicle communication. However, these systems are subject to attacks that can compromise safety, including flow injection attacks. Deep Learning-based Intrusion Detection Systems (IDS) are often designed to combat this problem, but they require expensive hardware to run in real time. In this work, we propose to evaluate and apply fast neural network inference techniques like Distilling and Prunning for deploying IDS models on low-cost platforms in real time. The results show that these techniques can achieve intrusion detection times of up to 727 {\mu}s using a Raspberry Pi 4, with AUCROC values of 0.9890.

nan


Article 405

Title@2025-07-01 (2): DGenNO: A Novel Physics-aware Neural Operator for Solving Forward and Inverse PDE Problems based on Deep, Generative Probabilistic Modeling

Title: DGenNO: A Novel Physics-aware Neural Operator for Solving Forward and Inverse PDE Problems based on Deep, Generative Probabilistic Modeling DGenNO: Ein neuartiger, physikbewusster Neuraloperator für die Lösung vorwärts- und inverser PDE-Probleme basierend auf tiefer, generativer probabilistischer Modellierung DGenNO:基于深生概率模型模型的解决前向和反向PDE问题新物理觉醒神经操作员 2502.06250v3

Authors (2): Yaohua Zang, Phaedon-Stelios Koutsourelakis

Solving parametric partial differential equations (PDEs) and associated PDE-based, inverse problems is a central task in engineering and physics, yet existing neural operator methods struggle with high-dimensional, discontinuous inputs and require large amounts of {\em labeled} training data. We propose the Deep Generative Neural Operator (DGenNO), a physics-aware framework that addresses these challenges by leveraging a deep, generative, probabilistic model in combination with a set of lower-dimensional, latent variables that simultaneously encode PDE-inputs and PDE-outputs. This formulation can make use of unlabeled data and significantly improves inverse problem-solving, particularly for discontinuous or discrete-valued input functions. DGenNO enforces physics constraints without labeled data by incorporating as virtual observables, weak-form residuals based on compactly supported radial basis functions (CSRBFs). These relax regularity constraints and eliminate higher-order derivatives from the objective function. We also introduce MultiONet, a novel neural operator architecture, which is a more expressive generalization of the popular DeepONet that significantly enhances the approximating power of the proposed model. These innovations make DGenNO particularly effective for challenging forward and inverse, PDE-based problems, such as those involving multi-phase media. Numerical experiments demonstrate that DGenNO achieves higher accuracy across multiple benchmarks while exhibiting robustness to noise and strong generalization to out-of-distribution cases. Its adaptability, and the ability to handle sparse, noisy data while providing probabilistic estimates, make DGenNO a powerful tool for scientific and engineering applications.

nan


Article 406

Title@2025-07-01 (2): Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits

Title: Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits Faire Algorithmen mit Probing für Multi-Agent Multi-Armed Bandits 多代理多武装强盗验证法的公允算法 2506.14988v2

Authors (3): Tianyi Xu, Jiaxin Liu, Zizhan Zheng

We propose a multi-agent multi-armed bandit (MA-MAB) framework aimed at ensuring fair outcomes across agents while maximizing overall system performance. A key challenge in this setting is decision-making under limited information about arm rewards. To address this, we introduce a novel probing framework that strategically gathers information about selected arms before allocation. In the offline setting, where reward distributions are known, we leverage submodular properties to design a greedy probing algorithm with a provable performance bound. For the more complex online setting, we develop an algorithm that achieves sublinear regret while maintaining fairness. Extensive experiments on synthetic and real-world datasets show that our approach outperforms baseline methods, achieving better fairness and efficiency.

nan


Article 407

Title@2025-07-01 (2): Are Large Brainwave Foundation Models Capable Yet? Insights from Fine-tuning

Title: Are Large Brainwave Foundation Models Capable Yet? Insights from Fine-tuning Sind große Brainwave-Stiftungsmodelle noch praktikabel? Erkenntnisse aus der Feinsteuerung 大型脑波基金会模型有能力吗? 2507.01196v1

Authors (6): Na Lee, Konstantinos Barmpas, Yannis Panagakis, Dimitrios Adamos, Nikolaos Laskaris, Stefanos Zafeiriou

Foundation Models have demonstrated significant success across various domains in Artificial Intelligence (AI), yet their capabilities for brainwave modeling remain unclear. In this paper, we comprehensively evaluate current Large Brainwave Foundation Models (LBMs) through systematic fine-tuning experiments across multiple Brain-Computer Interface (BCI) benchmark tasks, including memory tasks and sleep stage classification. Our extensive analysis shows that state-of-the-art LBMs achieve only marginal improvements (0.9%-1.2%) over traditional deep architectures while requiring significantly more parameters (millions vs thousands), raising important questions about their efficiency and applicability in BCI contexts. Moreover, through detailed ablation studies and Low-Rank Adaptation (LoRA), we significantly reduce trainable parameters without performance degradation, while demonstrating that architectural and training inefficiencies limit LBMs’ current capabilities. Our experiments span both full model fine-tuning and parameter-efficient adaptation techniques, providing insights into optimal training strategies for BCI applications. We pioneer the application of LoRA to LBMs, revealing that performance benefits generally emerge when adapting multiple neural network components simultaneously. These findings highlight the critical need for domain-specific development strategies to advance LBMs, suggesting that current architectures may require redesign to fully leverage the potential of foundation models in brainwave analysis.

nan


Article 408

Title@2025-07-01 (2): Distributional Information Embedding: A Framework for Multi-bit Watermarking

Title: Distributional Information Embedding: A Framework for Multi-bit Watermarking Einbettung von Distributionsinformationen: Ein Rahmen für Multi-Bit-Wasserzeichen 分发信息嵌入式信息:多位标水的框架 2501.16558v2

Authors (5): Haiyun He, Yepeng Liu, Ziqiao Wang, Yongyi Mao, Yuheng Bu

This paper introduces a novel problem, distributional information embedding, motivated by the practical demands of multi-bit watermarking for large language models (LLMs). Unlike traditional information embedding, which embeds information into a pre-existing host signal, LLM watermarking actively controls the text generation process–adjusting the token distribution–to embed a detectable signal. We develop an information-theoretic framework to analyze this distributional information embedding problem, characterizing the fundamental trade-offs among three critical performance metrics: text quality, detectability, and information rate. In the asymptotic regime, we demonstrate that the maximum achievable rate with vanishing error corresponds to the entropy of the LLM’s output distribution and increases with higher allowable distortion. We also characterize the optimal watermarking scheme to achieve this rate. Extending the analysis to the finite-token case with non-i.i.d. tokens, we identify schemes that maximize detection probability while adhering to constraints on false alarm and distortion.

nan


Article 409

Title@2025-07-01 (2): Diffusion Explorer: Interactive Exploration of Diffusion Models

Title: Diffusion Explorer: Interactive Exploration of Diffusion Models Diffusion Explorer: Interaktive Erforschung von Diffusionsmodellen 传播探索:传播模型的互动探索 2507.01178v1

Authors (2): Alec Helbling, Duen Horng Chau

Diffusion models have been central to the development of recent image, video, and even text generation systems. They posses striking geometric properties that can be faithfully portrayed in low-dimensional settings. However, existing resources for explaining diffusion either require an advanced theoretical foundation or focus on their neural network architectures rather than their rich geometric properties. We introduce Diffusion Explorer, an interactive tool to explain the geometric properties of diffusion models. Users can train 2D diffusion models in the browser and observe the temporal dynamics of their sampling process. Diffusion Explorer leverages interactive animation, which has been shown to be a powerful tool for making engaging visualizations of dynamic systems, making it well suited to explaining diffusion models which represent stochastic processes that evolve over time. Diffusion Explorer is open source and a live demo is available at alechelbling.com/Diffusion-Explorer.

nan


Article 410

Title@2025-07-01 (2): Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions

Title: Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions Rewind-to-Delete: Zertifiziertes Maschinen-Entlernen für nicht konvexe Funktionen 倒回到删除: 认证机器不学习非convex函数 2409.09778v5

Authors (2): Siqiao Mu, Diego Klabjan

Machine unlearning algorithms aim to efficiently remove data from a model without retraining it from scratch, in order to remove corrupted or outdated data or respect a user’s right to be forgotten." Certified machine unlearning is a strong theoretical guarantee based on differential privacy that quantifies the extent to which an algorithm erases data from the model weights. In contrast to existing works in certified unlearning for convex or strongly convex loss functions, or nonconvex objectives with limiting assumptions, we propose the first, first-order, black-box (i.e., can be applied to models pretrained with vanilla gradient descent) algorithm for unlearning on general nonconvex loss functions, which unlearns byrewinding” to an earlier step during the learning process before performing gradient descent on the loss function of the retained data points. We prove $(\epsilon, \delta)$ certified unlearning and performance guarantees that establish the privacy-utility-complexity tradeoff of our algorithm, and we prove generalization guarantees for functions that satisfy the Polyak-Lojasiewicz inequality. Finally, we demonstrate the superior performance of our algorithm compared to existing methods, within a new experimental framework that more accurately reflects unlearning user data in practice.

nan


Article 411

Title@2025-07-01 (2): ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features

Title: ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features KonzeptAchtung: Diffusionstransformatoren lernen hochinterpretable Funktionen 概念意图:传播变异器学习高可解释特性 2502.04320v2

Authors (5): Alec Helbling, Tuna Han Salih Meral, Ben Hoover, Pinar Yanardag, Duen Horng Chau

Do the rich representations of multi-modal diffusion transformers (DiTs) exhibit unique properties that enhance their interpretability? We introduce ConceptAttention, a novel method that leverages the expressive power of DiT attention layers to generate high-quality saliency maps that precisely locate textual concepts within images. Without requiring additional training, ConceptAttention repurposes the parameters of DiT attention layers to produce highly contextualized concept embeddings, contributing the major discovery that performing linear projections in the output space of DiT attention layers yields significantly sharper saliency maps compared to commonly used cross-attention maps. ConceptAttention even achieves state-of-the-art performance on zero-shot image segmentation benchmarks, outperforming 15 other zero-shot interpretability methods on the ImageNet-Segmentation dataset. ConceptAttention works for popular image models and even seamlessly generalizes to video generation. Our work contributes the first evidence that the representations of multi-modal DiTs are highly transferable to vision tasks like segmentation.

nan


Article 412

Title@2025-07-01 (2): Squat: Quant Small Language Models on the Edge

Title: Squat: Quant Small Language Models on the Edge Squat: Quant kleine Sprachmodelle am Rand Squt: 边边缘的量化小语言模型 2402.10787v2

Authors (12): Xuan Shen, Peiyan Dong, Zhenglun Kong, Yifan Gong, Changdi Yang, Zhaoyang Han, Yanyue Xie, Lei Lu, Cheng Lyu, Chao Wu, Yanzhi Wang, Pu Zhao

A growing trend has emerged in designing high-quality Small Language Models (SLMs) with a few million parameters. This trend is driven by the increasing concerns over cloud costs, privacy, and latency. Considering that full parameter training is feasible for SLMs on mobile devices, Quantization-Aware Training (QAT) is employed to improve efficiency by reducing computational overhead and memory footprint. However, previous QAT works adopt fine-grained quantization methods to compress models with billions of parameters on GPUs, incompatible with current commodity hardware, such as mobile and edge devices, which relies on Single Instruction Multiple Data (SIMD) instructions. Thus, the generalization of these methods to SLMs on mobile devices is limited. In this paper, we propose Squat method, an effective QAT framework with deployable quantization for SLMs on mobile devices. Specifically, we propose entropy-guided and distribution-aligned distillation to mitigate the distortion of attention information from quantization. Besides, we employ sub-8-bit token adaptive quantization, assigning varying bit widths to different tokens based on their importance. Furthermore, we develop a SIMD-based Multi-Kernel Mixed-Precision (MKMP) multiplier to support sub-8-bit mixed-precision MAC on mobile devices. Our extensive experiments verify the substantial improvements of our method compared to other QAT methods across various datasets. Furthermore, we achieve an on-device speedup of up to 2.37x compared with its FP16 counterparts, signaling a great advancement. Code: https://github.com/shawnricecake/squant

nan


Article 413

Title@2025-07-01 (2): Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?

Title: Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution? Selbstreflektierende Unsicherheiten: Kennen LLMs ihre interne Antwortverteilung? 自我反感的不确定性:LLMs知道他们的内部答案分布吗? 2505.20295v2

Authors (6): Michael Kirchhof, Luca Füger, Adam Goliński, Eeshan Gunesh Dhekane, Arno Blaas, Sinead Williamson

To reveal when a large language model (LLM) is uncertain about a response, uncertainty quantification commonly produces percentage numbers along with the output. But is this all we can do? We argue that in the output space of LLMs, the space of strings, exist strings expressive enough to summarize the distribution over output strings the LLM deems possible. We lay a foundation for this new avenue of uncertainty explication and present SelfReflect, a theoretically-motivated metric to assess how faithfully a string summarizes an LLM’s internal answer distribution. We show that SelfReflect is able to discriminate even subtle differences of candidate summary strings and that it aligns with human judgement, outperforming alternative metrics such as LLM judges and embedding comparisons. With SelfReflect, we investigate a number of self-summarization methods and find that even state-of-the-art reasoning models struggle to explicate their internal uncertainty. But we find that faithful summarizations can be generated by sampling and summarizing. To support the development of this universal form of LLM uncertainties, we publish our metric at https://github.com/apple/ml-selfreflect

nan


Article 414

Title@2025-07-01 (2): Vehicle-group-based Crash Risk Prediction and Interpretation on Highways

Title: Vehicle-group-based Crash Risk Prediction and Interpretation on Highways Risikovorhersage und -interpretation auf Autobahnen auf Fahrzeuggruppenbasis 公路撞车风险预测和解释 2402.12415v3

Authors (5): Tianheng Zhu, Ling Wang, Yiheng Feng, Wanjing Ma, Mohamed Abdel-Aty

Previous studies in predicting crash risks primarily associated the number or likelihood of crashes on a road segment with traffic parameters or geometric characteristics, usually neglecting the impact of vehicles’ continuous movement and interactions with nearby vehicles. Recent technology advances, such as Connected and Automated Vehicles (CAVs) and Unmanned Aerial Vehicles (UAVs) are able to collect high-resolution trajectory data, which enables trajectory-based risk analysis. This study investigates a new vehicle group (VG) based risk analysis method and explores risk evolution mechanisms considering VG features. An impact-based vehicle grouping method is proposed to cluster vehicles into VGs by evaluating their responses to the erratic behaviors of nearby vehicles. The risk of a VG is aggregated based on the risk between each vehicle pair in the VG, measured by inverse Time-to-Collision (iTTC). A Logistic Regression and a Graph Neural Network (GNN) are then employed to predict VG risks using aggregated and disaggregated VG information. Both methods achieve excellent performance with AUC values exceeding 0.93. For the GNN model, GNNExplainer with feature perturbation is applied to identify critical individual vehicle features and their directional impact on VG risks. Overall, this research contributes a new perspective for identifying, predicting, and interpreting traffic risks.

nan


Article 415

Title@2025-07-01 (2): FlashDP: Private Training Large Language Models with Efficient DP-SGD

Title: FlashDP: Private Training Large Language Models with Efficient DP-SGD FlashDP: Privates Training von großen Sprachmodellen mit effizienter DP-SGD 私营部门培训大型语言模式,具有高效率的DP-SGD 2507.01154v1

Authors (6): Liangyu Wang, Junxiao Wang, Jie Ren, Zihang Xiang, David E. Keyes, Di Wang

As large language models (LLMs) increasingly underpin technological advancements, the privacy of their training data emerges as a critical concern. Differential Privacy (DP) serves as a rigorous mechanism to protect this data, yet its integration via Differentially Private Stochastic Gradient Descent (DP-SGD) introduces substantial challenges, primarily due to the complexities of per-sample gradient clipping. Current explicit methods, such as Opacus, necessitate extensive storage for per-sample gradients, significantly inflating memory requirements. Conversely, implicit methods like GhostClip reduce storage needs by recalculating gradients multiple times, which leads to inefficiencies due to redundant computations. This paper introduces FlashDP, an innovative cache-friendly per-layer DP-SGD that consolidates necessary operations into a single task, calculating gradients only once in a fused manner. This approach not only diminishes memory movement by up to \textbf{50\%} but also cuts down redundant computations by \textbf{20\%}, compared to previous methods. Consequently, FlashDP does not increase memory demands and achieves a \textbf{90\%} throughput compared to the Non-DP method on a four-A100 system during the pre-training of the Llama-13B model, while maintaining parity with standard per-layer clipped DP-SGD in terms of accuracy. These advancements establish FlashDP as a pivotal development for efficient and privacy-preserving training of LLMs. FlashDP’s code has been open-sourced in https://github.com/kaustpradalab/flashdp.

nan


Article 416

Title@2025-07-01 (2): BioPars: A Pretrained Biomedical Large Language Model for Persian Biomedical Text Mining

Title: BioPars: A Pretrained Biomedical Large Language Model for Persian Biomedical Text Mining BioPars: Ein vorgebildetes biomedizinisches Großsprachmodell für persischen biomedizinischen Textbergbau BioPars:波斯生物医学材料开采的预先培训的生物医学大语言模型 2506.21567v2

Authors (6): Baqer M. Merzah, Tania Taami, Salman Asoudeh, Saeed Mirzaee, Amir reza Hossein pour, Amir Ali Bengari

Large Language Models (LLMs) have recently gained attention in the life sciences due to their capacity to model, extract, and apply complex biological information. Beyond their classical use as chatbots, these systems are increasingly used for complex analysis and problem-solving in specialized fields, including bioinformatics. First, we introduce BIOPARS-BENCH, a dataset from over 10,000 scientific articles, textbooks, and medical websites. BioParsQA was also introduced to evaluate the proposed model, which consists of 5,231 Persian medical questions and answers. This study then introduces BioPars, a simple but accurate measure designed to assess LLMs for three main abilities: acquiring subject-specific knowledge, interpreting and synthesizing such knowledge, and demonstrating proper evidence. Comparing ChatGPT, Llama, and Galactica, our study highlights their ability to remember and retrieve learned knowledge but also reveals shortcomings in addressing higher-level, real-world questions and fine-grained inferences. These findings indicate the need for further fine-tuning to address the capabilities of LLM in bioinformatics tasks. To our knowledge, BioPars is the first application of LLM in Persian medical QA, especially for generating long answers. Evaluation of four selected medical QA datasets shows that BioPars has achieved remarkable results compared to comparative approaches. The model on BioParsQA achieved a ROUGE-L score of 29.99, which is an improvement over GPT-4 1.0. The model achieved a BERTScore of 90.87 with the MMR method. The MoverScore and BLEURT values were also higher in this model than the other three models. In addition, the reported scores for the model are MoverScore=60.43 and BLEURT=50.78. BioPars is an ongoing project and all resources related to its development will be made available via the following GitHub repository: https://github.com/amirap80/BioPars.

nan


Article 417

Title@2025-07-01 (2): LZ Penalty: An information-theoretic repetition penalty for autoregressive language models

Title: LZ Penalty: An information-theoretic repetition penalty for autoregressive language models LZ-Strafe: Eine informationstheoretische Wiederholungsstrafe für autoregressive Sprachmodelle LZ惩罚:对自动递减语言模式的 信息理论重复处罚 2504.20131v2

Authors (6): Antonio A. Ginart, Naveen Kodali, Jason Lee, Caiming Xiong, Silvio Savarese, John R. Emmons

We introduce the LZ penalty, a penalty specialized for reducing degenerate repetitions in autoregressive language models without loss of capability. The penalty is based on the codelengths in the LZ77 universal lossless compression algorithm. Through the lens of the prediction-compression duality, decoding the LZ penalty has the interpretation of sampling from the residual distribution after removing the information that is highly compressible. We demonstrate the LZ penalty enables state-of-the-art open-source reasoning models to operate with greedy (temperature zero) decoding without loss of capability and without instances of degenerate repetition. Both the industry-standard frequency penalty and repetition penalty are ineffective, incurring degenerate repetition rates of up to 4%.

nan


Article 418

Title@2025-07-01 (2): A Review on Sound Source Localization in Robotics: Focusing on Deep Learning Methods

Title: A Review on Sound Source Localization in Robotics: Focusing on Deep Learning Methods Eine Übersicht über die Lokalisierung von Schallquellen in der Robotik: Fokussierung auf Deep-Learning-Methoden 关于 “ 机器人中无害源本地化:侧重于深层学习方法 “ 的审评 2507.01143v1

Authors (3): Reza Jalayer, Masoud Jalayer, Amirali Baniasadi

Sound source localization (SSL) adds a spatial dimension to auditory perception, allowing a system to pinpoint the origin of speech, machinery noise, warning tones, or other acoustic events, capabilities that facilitate robot navigation, human-machine dialogue, and condition monitoring. While existing surveys provide valuable historical context, they typically address general audio applications and do not fully account for robotic constraints or the latest advancements in deep learning. This review addresses these gaps by offering a robotics-focused synthesis, emphasizing recent progress in deep learning methodologies. We start by reviewing classical methods such as Time Difference of Arrival (TDOA), beamforming, Steered-Response Power (SRP), and subspace analysis. Subsequently, we delve into modern machine learning (ML) and deep learning (DL) approaches, discussing traditional ML and neural networks (NNs), convolutional neural networks (CNNs), convolutional recurrent neural networks (CRNNs), and emerging attention-based architectures. The data and training strategy that are the two cornerstones of DL-based SSL are explored. Studies are further categorized by robot types and application domains to facilitate researchers in identifying relevant work for their specific contexts. Finally, we highlight the current challenges in SSL works in general, regarding environmental robustness, sound source multiplicity, and specific implementation constraints in robotics, as well as data and learning strategies in DL-based SSL. Also, we sketch promising directions to offer an actionable roadmap toward robust, adaptable, efficient, and explainable DL-based SSL for next-generation robots.

nan


Article 419

Title@2025-07-01 (2): Spectral Manifold Harmonization for Graph Imbalanced Regression

Title: Spectral Manifold Harmonization for Graph Imbalanced Regression Spektrale Manifold Harmonisierung für Graph Imbalanced Regression 图I平衡回归的光谱蒙面协调 2507.01132v1

Authors (5): Brenda Nogueira, Gabe Gomes, Meng Jiang, Nitesh V. Chawla, Nuno Moniz

Graph-structured data is ubiquitous in scientific domains, where models often face imbalanced learning settings. In imbalanced regression, domain preferences focus on specific target value ranges representing the most scientifically valuable cases; we observe a significant lack of research. In this paper, we present Spectral Manifold Harmonization (SMH), a novel approach for addressing this imbalanced regression challenge on graph-structured data by generating synthetic graph samples that preserve topological properties while focusing on often underrepresented target distribution regions. Conventional methods fail in this context because they either ignore graph topology in case generation or do not target specific domain ranges, resulting in models biased toward average target values. Experimental results demonstrate the potential of SMH on chemistry and drug discovery benchmark datasets, showing consistent improvements in predictive performance for target domain ranges.

nan


Article 420

Title@2025-07-01 (2): Tensor Decomposition Networks for Fast Machine Learning Interatomic Potential Computations

Title: Tensor Decomposition Networks for Fast Machine Learning Interatomic Potential Computations Tensor-Dekompositionsnetzwerke für schnelles maschinelles Lernen Interatomare Potentialrechnungen 快速机器学习跨原子潜能值计算器的电锯分解网络 2507.01131v1

Authors (9): Yuchao Lin, Cong Fu, Zachary Krueger, Haiyang Yu, Maho Nakata, Jianwen Xie, Emine Kucukbenli, Xiaofeng Qian, Shuiwang Ji

$\rm{SO}(3)$-equivariant networks are the dominant models for machine learning interatomic potentials (MLIPs). The key operation of such networks is the Clebsch-Gordan (CG) tensor product, which is computationally expensive. To accelerate the computation, we develop tensor decomposition networks (TDNs) as a class of approximately equivariant networks whose CG tensor products are replaced by low-rank tensor decompositions, such as the CANDECOMP/PARAFAC (CP) decomposition. With the CP decomposition, we prove (i) a uniform bound on the induced error of $\rm{SO}(3)$-equivariance, and (ii) the universality of approximating any equivariant bilinear map. To further reduce the number of parameters, we propose path-weight sharing that ties all multiplicity-space weights across the $O(L^3)$ CG paths into a single path without compromising equivariance, where $L$ is the maximum angular degree. The resulting layer acts as a plug-and-play replacement for tensor products in existing networks, and the computational complexity of tensor products is reduced from $O(L^6)$ to $O(L^4)$. We evaluate TDNs on PubChemQCR, a newly curated molecular relaxation dataset containing 105 million DFT-calculated snapshots. We also use existing datasets, including OC20, and OC22. Results show that TDNs achieve competitive performance with dramatic speedup in computations.

nan


Article 421

Title@2025-07-01 (2): On Design Principles for Private Adaptive Optimizers

Title: On Design Principles for Private Adaptive Optimizers Über Design-Prinzipien für private Adaptive Optimierer 私人适应性优化剂设计原则 2507.01129v1

Authors (3): Arun Ganesh, Brendan McMahan, Abhradeep Thakurta

The spherical noise added to gradients in differentially private (DP) training undermines the performance of adaptive optimizers like AdaGrad and Adam, and hence many recent works have proposed algorithms to address this challenge. However, the empirical results in these works focus on simple tasks and models and the conclusions may not generalize to model training in practice. In this paper we survey several of these variants, and develop better theoretical intuition for them as well as perform empirical studies comparing them. We find that a common intuition of aiming for unbiased estimates of second moments of gradients in adaptive optimizers is misguided, and instead that a simple technique called scale-then-privatize (which does not achieve unbiased second moments) has more desirable theoretical behaviors and outperforms all other variants we study on a small-scale language model training task. We additionally argue that scale-then-privatize causes the noise addition to better match the application of correlated noise mechanisms which are more desirable to use in practice.

nan


Article 422

Title@2025-07-01 (2): Temporal Chain of Thought: Long-Video Understanding by Thinking in Frames

Title: Temporal Chain of Thought: Long-Video Understanding by Thinking in Frames Temporale Gedankenkette: Langes-Video-Verstehen durch das Denken in Rahmen 时时思维链:通过思考框架来长期理解 2507.02001v1

Authors (5): Anurag Arnab, Ahmet Iscen, Mathilde Caron, Alireza Fathi, Cordelia Schmid

Despite recent advances in Vision-Language Models (VLMs), long-video understanding remains a challenging problem. Although state-of-the-art long-context VLMs can process around 1000 input frames, they still struggle to effectively leverage this sequence length, and succumb to irrelevant distractors within the context window. We present Temporal Chain of Thought, an inference strategy for video question-answering that curates the model’s input context. We use the VLM itself to iteratively identify and extract the most relevant frames from the video, which are then used for answering. We demonstrate how leveraging more computation at inference-time to select the most relevant context leads to improvements in accuracy, in agreement with recent work on inference-time scaling of LLMs. Moreover, we achieve state-of-the-art results on 4 diverse video question-answering datasets, showing consistent improvements with 3 different VLMs. In particular, our method shines on longer videos which would not otherwise fit within the model’s context window: On longer videos of more than 1 hour on LVBench, our approach using a context window of 32K outperforms the same VLM using standard inference with a 700K context window by 2.8 points.

nan


Article 423

Title@2025-07-01 (2): Landslide Detection and Mapping Using Deep Learning Across Multi-Source Satellite Data and Geographic Regions

Title: Landslide Detection and Mapping Using Deep Learning Across Multi-Source Satellite Data and Geographic Regions Erdrutscherkennung und Kartierung mittels Deep Learning in Multi-Source-Satellitendaten und geographischen Regionen 跨多来源卫星数据和地理区域利用深度学习进行滑坡探测和绘图 2507.01123v1

Authors (3): Rahul A. Burange, Harsh K. Shinde, Omkar Mutyalwar

Landslides pose severe threats to infrastructure, economies, and human lives, necessitating accurate detection and predictive mapping across diverse geographic regions. With advancements in deep learning and remote sensing, automated landslide detection has become increasingly effective. This study presents a comprehensive approach integrating multi-source satellite imagery and deep learning models to enhance landslide identification and prediction. We leverage Sentinel-2 multispectral data and ALOS PALSAR-derived slope and Digital Elevation Model (DEM) layers to capture critical environmental features influencing landslide occurrences. Various geospatial analysis techniques are employed to assess the impact of terra in characteristics, vegetation cover, and rainfall on detection accuracy. Additionally, we evaluate the performance of multiple stateof-the-art deep learning segmentation models, including U-Net, DeepLabV3+, and Res-Net, to determine their effectiveness in landslide detection. The proposed framework contributes to the development of reliable early warning systems, improved disaster risk management, and sustainable land-use planning. Our findings provide valuable insights into the potential of deep learning and multi-source remote sensing in creating robust, scalable, and transferable landslide prediction models.

nan


Article 424

Title@2025-07-01 (2): Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning

Title: Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning Warum neurales Netzwerk symbolische Strukturen mit gradient-based Training entdecken kann: Eine algebraische und geometrische Grundlage für neurosymbolische Reasoning 为何神经网络网络能够通过渐进式培训发现象征性结构:一个用于神经元理性的代数和几何基金会 2506.21797v2

Authors (2): Peihao Wang, Zhangyang Wang

We develop a theoretical framework that explains how discrete symbolic structures can emerge naturally from continuous neural network training dynamics. By lifting neural parameters to a measure space and modeling training as Wasserstein gradient flow, we show that under geometric constraints, such as group invariance, the parameter measure $\mu_t$ undergoes two concurrent phenomena: (1) a decoupling of the gradient flow into independent optimization trajectories over some potential functions, and (2) a progressive contraction on the degree of freedom. These potentials encode algebraic constraints relevant to the task and act as ring homomorphisms under a commutative semi-ring structure on the measure space. As training progresses, the network transitions from a high-dimensional exploration to compositional representations that comply with algebraic operations and exhibit a lower degree of freedom. We further establish data scaling laws for realizing symbolic tasks, linking representational capacity to the group invariance that facilitates symbolic solutions. This framework charts a principled foundation for understanding and designing neurosymbolic systems that integrate continuous learning with discrete algebraic reasoning.

nan


Article 425

Title@2025-07-01 (2): A Neural Operator based on Dynamic Mode Decomposition

Title: A Neural Operator based on Dynamic Mode Decomposition Ein neuraler Operator auf Basis von Dynamic Mode Decomposition 基于动态模式分解的神经操作器 2507.01117v1

Authors (4): Nikita Sakovich, Dmitry Aksenov, Ekaterina Pleshakova, Sergey Gataullin

The scientific computation methods development in conjunction with artificial intelligence technologies remains a hot research topic. Finding a balance between lightweight and accurate computations is a solid foundation for this direction. The study presents a neural operator based on the dynamic mode decomposition algorithm (DMD), mapping functional spaces, which combines DMD and deep learning (DL) for spatiotemporal processes efficient modeling. Solving PDEs for various initial and boundary conditions requires significant computational resources. The method suggested automatically extracts key modes and system dynamics using them to construct predictions, reducing computational costs compared to traditional numerical methods. The approach has demonstrated its efficiency through comparative analysis of performance with closest analogues DeepONet and FNO in the heat equation, Laplaces equation, and Burgers equation solutions approximation, where it achieves high reconstruction accuracy.

nan


Article 426

Title@2025-07-01 (2): A LoD of Gaussians: Unified Training and Rendering for Ultra-Large Scale Reconstruction with External Memory

Title: A LoD of Gaussians: Unified Training and Rendering for Ultra-Large Scale Reconstruction with External Memory A LoD of Gaussians: Unified Training und Rendering für Ultra-Large Scale Rekonstruktion mit externem Speicher Gaussian AD: 利用外部记忆进行超大型重建的统一培训和投标 2507.01110v1

Authors (6): Felix Windisch, Lukas Radl, Thomas Köhler, Michael Steiner, Dieter Schmalstieg, Markus Steinberger

Gaussian Splatting has emerged as a high-performance technique for novel view synthesis, enabling real-time rendering and high-quality reconstruction of small scenes. However, scaling to larger environments has so far relied on partitioning the scene into chunks – a strategy that introduces artifacts at chunk boundaries, complicates training across varying scales, and is poorly suited to unstructured scenarios such as city-scale flyovers combined with street-level views. Moreover, rendering remains fundamentally limited by GPU memory, as all visible chunks must reside in VRAM simultaneously. We introduce A LoD of Gaussians, a framework for training and rendering ultra-large-scale Gaussian scenes on a single consumer-grade GPU – without partitioning. Our method stores the full scene out-of-core (e.g., in CPU memory) and trains a Level-of-Detail (LoD) representation directly, dynamically streaming only the relevant Gaussians. A hybrid data structure combining Gaussian hierarchies with Sequential Point Trees enables efficient, view-dependent LoD selection, while a lightweight caching and view scheduling system exploits temporal coherence to support real-time streaming and rendering. Together, these innovations enable seamless multi-scale reconstruction and interactive visualization of complex scenes – from broad aerial views to fine-grained ground-level details.

nan


Article 427

Title@2025-07-01 (2): Geometry-aware 4D Video Generation for Robot Manipulation

Title: Geometry-aware 4D Video Generation for Robot Manipulation Geometry-aware 4D-Video-Generation für Roboter-Manipulation 用于机器人操纵的视频生成 2507.01099v1

Authors (6): Zeyi Liu, Shuang Li, Eric Cousineau, Siyuan Feng, Benjamin Burchfiel, Shuran Song

Understanding and predicting the dynamics of the physical world can enhance a robot’s ability to plan and interact effectively in complex environments. While recent video generation models have shown strong potential in modeling dynamic scenes, generating videos that are both temporally coherent and geometrically consistent across camera views remains a significant challenge. To address this, we propose a 4D video generation model that enforces multi-view 3D consistency of videos by supervising the model with cross-view pointmap alignment during training. This geometric supervision enables the model to learn a shared 3D representation of the scene, allowing it to predict future video sequences from novel viewpoints based solely on the given RGB-D observations, without requiring camera poses as inputs. Compared to existing baselines, our method produces more visually stable and spatially aligned predictions across multiple simulated and real-world robotic datasets. We further show that the predicted 4D videos can be used to recover robot end-effector trajectories using an off-the-shelf 6DoF pose tracker, supporting robust robot manipulation and generalization to novel camera viewpoints.

nan


Article 428

Title@2025-07-01 (2): Proof of a perfect platonic representation hypothesis

Title: Proof of a perfect platonic representation hypothesis Nachweis einer perfekten platonischen Darstellungshypothese 证明完美拉子代表假设的证据 2507.01098v1

Authors (2): Liu Ziyin, Isaac Chuang

In this note, we elaborate on and explain in detail the proof given by Ziyin et al. (2025) of the “perfect” Platonic Representation Hypothesis (PRH) for the embedded deep linear network model (EDLN). We show that if trained with SGD, two EDLNs with different widths and depths and trained on different data will become Perfectly Platonic, meaning that every possible pair of layers will learn the same representation up to a rotation. Because most of the global minima of the loss function are not Platonic, that SGD only finds the perfectly Platonic solution is rather extraordinary. The proof also suggests at least six ways the PRH can be broken. We also show that in the EDLN model, the emergence of the Platonic representations is due to the same reason as the emergence of progressive sharpening. This implies that these two seemingly unrelated phenomena in deep learning can, surprisingly, have a common cause. Overall, the theory and proof highlight the importance of understanding emergent “entropic forces” due to the irreversibility of SGD training and their role in representation learning. The goal of this note is to be instructive and avoid lengthy technical details.

nan


Article 429

Title@2025-07-01 (2): STONet: A neural operator for modeling solute transport in micro-cracked reservoirs

Title: STONet: A neural operator for modeling solute transport in micro-cracked reservoirs STONet: Ein neuronaler Operator zur Modellierung des Löttransports in mikrogekrackten Reservoirs Stonet:在微裂开的储油层中模拟溶液运输的神经操作员 2412.05576v2

Authors (4): Ehsan Haghighat, Mohammad Hesan Adeli, S Mohammad Mousavi, Ruben Juanes

In this work, we introduce a novel neural operator, the Solute Transport Operator Network (STONet), to efficiently model contaminant transport in micro-cracked porous media. STONet’s model architecture is specifically designed for this problem and uniquely integrates an enriched DeepONet structure with a transformer-based multi-head attention mechanism, enhancing performance without incurring additional computational overhead compared to existing neural operators. The model combines different networks to encode heterogeneous properties effectively and predict the rate of change of the concentration field to accurately model the transport process. The training data is obtained using finite element (FEM) simulations by random sampling of micro-fracture distributions and applied pressure boundary conditions, which capture diverse scenarios of fracture densities, orientations, apertures, lengths, and balance of pressure-driven to density-driven flow. Our numerical experiments demonstrate that, once trained, STONet achieves accurate predictions, with relative errors typically below 1% compared with FEM simulations while reducing runtime by approximately two orders of magnitude. This type of computational efficiency facilitates building digital twins for rapid assessment of subsurface contamination risks and optimization of environmental remediation strategies. The data and code for the paper will be published at https://github.com/ehsanhaghighat/STONet.

nan


Article 430

Title@2025-07-01 (2): Description of the Training Process of Neural Networks via Ergodic Theorem : Ghost nodes

Title: Description of the Training Process of Neural Networks via Ergodic Theorem : Ghost nodes Beschreibung des Ausbildungsprozesses von Neuronalen Netzwerken über Ergodic Theorem : Geisterknoten 描述Ergodic定理神经网络培训过程:幽灵节点 2507.01003v1

Authors (2): Eun-Ji Park, Sangwon Yun

Recent studies have proposed interpreting the training process from an ergodic perspective. Building on this foundation we present a unified framework for understanding and accelerating the training of deep neural networks via stochastic gradient descent. By analyzing the geometric landscape of the objective function we introduce a practical diagnostic, the running estimate of the largest Lyapunov exponent, which provably distinguishes genuine convergence toward stable minimizers from mere statistical stabilization near saddle points. We then propose a ghost category extension for standard classifiers that adds auxiliary ghost output nodes so the model gains extra descent directions that open a lateral corridor around narrow loss barriers and enable the optimizer to bypass poor basins during the early training phase. We show that this extension strictly reduces approximation error and that after sufficient convergence the ghost dimensions collapse and the extended model’s invariant law coincides with that of the original and there exists a path in the enlarged parameter space along which the total loss does not increase while the original loss decreases by an arbitrary margin. Taken together these results provide a principled architecture level intervention that accelerates early stage trainability while preserving asymptotic behavior.

nan


Article 431

Title@2025-07-01 (2): SPGD: Steepest Perturbed Gradient Descent Optimization

Title: SPGD: Steepest Perturbed Gradient Descent Optimization SPGD: Steepest Perturbed Gradient Descent Optimization SPGD: 最深的受扰动的渐变人后裔优化 2411.04946v2

Authors (2): Amir M. Vahedi, Horea T. Ilies

Optimization algorithms are pivotal in advancing various scientific and industrial fields but often encounter obstacles such as trapping in local minima, saddle points, and plateaus (flat regions), which makes the convergence to reasonable or near-optimal solutions particularly challenging. This paper presents the Steepest Perturbed Gradient Descent (SPGD), a novel algorithm that innovatively combines the principles of the gradient descent method with periodic uniform perturbation sampling to effectively circumvent these impediments and lead to better solutions whenever possible. SPGD is distinctively designed to generate a set of candidate solutions and select the one exhibiting the steepest loss difference relative to the current solution. It enhances the traditional gradient descent approach by integrating a strategic exploration mechanism that significantly increases the likelihood of escaping sub-optimal local minima and navigating complex optimization landscapes effectively. Our approach not only retains the directed efficiency of gradient descent but also leverages the exploratory benefits of stochastic perturbations, thus enabling a more comprehensive search for global optima across diverse problem spaces. We demonstrate the efficacy of SPGD in solving the 3D component packing problem, an NP-hard challenge. Preliminary results show a substantial improvement over four established methods, particularly on response surfaces with complex topographies and in multidimensional non-convex continuous optimization problems. Comparative analyses with established 2D benchmark functions highlight SPGD’s superior performance, showcasing its ability to navigate complex optimization landscapes. These results emphasize SPGD’s potential as a versatile tool for a wide range of optimization problems.

nan


Article 432

Title@2025-07-01 (2): Diffuse-CLoC: Guided Diffusion for Physics-based Character Look-ahead Control

Title: Diffuse-CLoC: Guided Diffusion for Physics-based Character Look-ahead Control Diffuse-CLoC: Geführte Diffusion für physikbasierte Charakter-Blick-Ahead-Steuerung Diffuse-CLOC:向导扩散,用于基于物理的字符外观控制 2503.11801v2

Authors (8): Xiaoyu Huang, Takara Truong, Yunbo Zhang, Fangzhou Yu, Jean Pierre Sleiman, Jessica Hodgins, Koushil Sreenath, Farbod Farshidian

We present Diffuse-CLoC, a guided diffusion framework for physics-based look-ahead control that enables intuitive, steerable, and physically realistic motion generation. While existing kinematics motion generation with diffusion models offer intuitive steering capabilities with inference-time conditioning, they often fail to produce physically viable motions. In contrast, recent diffusion-based control policies have shown promise in generating physically realizable motion sequences, but the lack of kinematics prediction limits their steerability. Diffuse-CLoC addresses these challenges through a key insight: modeling the joint distribution of states and actions within a single diffusion model makes action generation steerable by conditioning it on the predicted states. This approach allows us to leverage established conditioning techniques from kinematic motion generation while producing physically realistic motions. As a result, we achieve planning capabilities without the need for a high-level planner. Our method handles a diverse set of unseen long-horizon downstream tasks through a single pre-trained model, including static and dynamic obstacle avoidance, motion in-betweening, and task-space control. Experimental results show that our method significantly outperforms the traditional hierarchical framework of high-level motion diffusion and low-level tracking.

nan


Article 433

Title@2025-07-01 (2): Box Pose and Shape Estimation and Domain Adaptation for Large-Scale Warehouse Automation

Title: Box Pose and Shape Estimation and Domain Adaptation for Large-Scale Warehouse Automation Box Pose und Formschätzung und Domain-Anpassung für großformatige Lagerautomatisierung 用于大型仓库自动化的框框和形状估计及域域适应 2507.00984v1

Authors (6): Xihang Yu, Rajat Talak, Jingnan Shi, Ulrich Viereck, Igor Gilitschenski, Luca Carlone

Modern warehouse automation systems rely on fleets of intelligent robots that generate vast amounts of data – most of which remains unannotated. This paper develops a self-supervised domain adaptation pipeline that leverages real-world, unlabeled data to improve perception models without requiring manual annotations. Our work focuses specifically on estimating the pose and shape of boxes and presents a correct-and-certify pipeline for self-supervised box pose and shape estimation. We extensively evaluate our approach across a range of simulated and real industrial settings, including adaptation to a large-scale real-world dataset of 50,000 images. The self-supervised model significantly outperforms models trained solely in simulation and shows substantial improvements over a zero-shot 3D bounding box estimation baseline.

nan


Article 434

Title@2025-07-01 (2): Enhancing LLM Agent Safety via Causal Influence Prompting

Title: Enhancing LLM Agent Safety via Causal Influence Prompting Verbesserung der Sicherheit von LLM-Agenten durch ursächlichen Einfluss 通过原因影响促进增强LLM代理安全 2507.00979v1

Authors (5): Dongyoon Hahm, Woogyeol Jin, June Suk Choi, Sungsoo Ahn, Kimin Lee

As autonomous agents powered by large language models (LLMs) continue to demonstrate potential across various assistive tasks, ensuring their safe and reliable behavior is crucial for preventing unintended consequences. In this work, we introduce CIP, a novel technique that leverages causal influence diagrams (CIDs) to identify and mitigate risks arising from agent decision-making. CIDs provide a structured representation of cause-and-effect relationships, enabling agents to anticipate harmful outcomes and make safer decisions. Our approach consists of three key steps: (1) initializing a CID based on task specifications to outline the decision-making process, (2) guiding agent interactions with the environment using the CID, and (3) iteratively refining the CID based on observed behaviors and outcomes. Experimental results demonstrate that our method effectively enhances safety in both code execution and mobile device control tasks.

nan


Article 435

Title@2025-07-01 (2): Meta-Posterior Consistency for the Bayesian Inference of Metastable System

Title: Meta-Posterior Consistency for the Bayesian Inference of Metastable System Meta-Posterior-Konsistenz für die Bayesische Schlussfolgerung des Metastable-Systems 贝耶斯元元元系统推论的元相近性 2408.01868v2

Authors (2): Zachary P Adams, Sayan Mukherjee

The vast majority of the literature on learning dynamical systems or stochastic processes from time series has focused on stable or ergodic systems, for both Bayesian and frequentist inference procedures. However, most real-world systems are only metastable, that is, the dynamics appear to be stable on some time scale, but are in fact unstable over longer time scales. Consistency of inference for metastable systems may not be possible, but one can ask about metaconsistency: Do inference procedures converge when observations are taken over a large but finite time interval, but diverge on longer time scales? In this paper we introduce, discuss, and quantify metaconsistency in a Bayesian framework. We discuss how metaconsistency can be exploited to efficiently infer a model for a sub-system of a larger system, where inference on the global behavior may require much more data, or there is no theoretical guarantee as to the asymptotic success of inference procedures. We also discuss the relation between metaconsistency and the spectral properties of the model dynamical system in the case of uniformly ergodic and non-ergodic diffusions.

nan


Article 436

Title@2025-07-01 (2): Reasoning as an Adaptive Defense for Safety

Title: Reasoning as an Adaptive Defense for Safety Vernunft als adaptive Verteidigung für die Sicherheit 以 “ 安全适应防卫 “ 为由 2507.00971v1

Authors (4): Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan, Aviral Kumar

Reasoning methods that adaptively allocate test-time compute have advanced LLM performance on easy to verify domains such as math and code. In this work, we study how to utilize this approach to train models that exhibit a degree of robustness to safety vulnerabilities, and show that doing so can provide benefits. We build a recipe called $\textit{TARS}$ (Training Adaptive Reasoners for Safety), a reinforcement learning (RL) approach that trains models to reason about safety using chain-of-thought traces and a reward signal that balances safety with task completion. To build TARS, we identify three critical design choices: (1) a “lightweight” warmstart SFT stage, (2) a mix of harmful, harmless, and ambiguous prompts to prevent shortcut behaviors such as too many refusals, and (3) a reward function to prevent degeneration of reasoning capabilities during training. Models trained with TARS exhibit adaptive behaviors by spending more compute on ambiguous queries, leading to better safety-refusal trade-offs. They also internally learn to better distinguish between safe and unsafe prompts and attain greater robustness to both white-box (e.g., GCG) and black-box attacks (e.g., PAIR). Overall, our work provides an effective, open recipe for training LLMs against jailbreaks and harmful requests by reasoning per prompt.

nan


Article 437

Title@2025-07-01 (2): Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning

Title: Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning Skalierbares Feature-Lernen auf riesigen Wissensgraphen für Downstream Machine Learning 下流机器学习大型知识图的可缩放功能学习 2507.00965v1

Authors (2): Félix Lefebvre, Gaël Varoquaux

Many machine learning tasks can benefit from external knowledge. Large knowledge graphs store such knowledge, and embedding methods can be used to distill it into ready-to-use vector representations for downstream applications. For this purpose, current models have however two limitations: they are primarily optimized for link prediction, via local contrastive learning, and they struggle to scale to the largest graphs due to GPU memory limits. To address these, we introduce SEPAL: a Scalable Embedding Propagation ALgorithm for large knowledge graphs designed to produce high-quality embeddings for downstream tasks at scale. The key idea of SEPAL is to enforce global embedding alignment by optimizing embeddings only on a small core of entities, and then propagating them to the rest of the graph via message passing. We evaluate SEPAL on 7 large-scale knowledge graphs and 46 downstream machine learning tasks. Our results show that SEPAL significantly outperforms previous methods on downstream tasks. In addition, SEPAL scales up its base embedding model, enabling fitting huge knowledge graphs on commodity hardware.

nan


Article 438

Title@2025-07-01 (2): Benchmarking the Discovery Engine

Title: Benchmarking the Discovery Engine Benchmarking der Discovery Engine 设定发现引擎基准 2507.00964v1

Authors (9): Jack Foxabbott, Arush Tagade, Andrew Cusick, Robbie McCorkell, Leo McKee-Reid, Jugal Patel, Jamie Rumbelow, Jessica Rumbelow, Zohreh Shams

The Discovery Engine is a general purpose automated system for scientific discovery, which combines machine learning with state-of-the-art ML interpretability to enable rapid and robust scientific insight across diverse datasets. In this paper, we benchmark the Discovery Engine against five recent peer-reviewed scientific publications applying machine learning across medicine, materials science, social science, and environmental science. In each case, the Discovery Engine matches or exceeds prior predictive performance while also generating deeper, more actionable insights through rich interpretability artefacts. These results demonstrate its potential as a new standard for automated, interpretable scientific modelling that enables complex knowledge discovery from data.

nan


Article 439

Title@2025-07-01 (2): Not All Water Consumption Is Equal: A Water Stress Weighted Metric for Sustainable Computing

Title: Not All Water Consumption Is Equal: A Water Stress Weighted Metric for Sustainable Computing Nicht jeder Wasserverbrauch ist gleich: Ein Wasserdruck-gewichtetes Metric für nachhaltiges Rechnen 并非所有水消耗量都相等:可持续计算中的水应激反应加权计量 2506.22773v2

Authors (3): Yanran Wu, Inez Hua, Yi Ding

Water consumption is an increasingly critical dimension of computing sustainability, especially as AI workloads rapidly scale. However, current water impact assessment often overlooks where and when water stress is more severe. To fill in this gap, we present SCARF, the first general framework that evaluates water impact of computing by factoring in both spatial and temporal variations in water stress. SCARF calculates an Adjusted Water Impact (AWI) metric that considers both consumption volume and local water stress over time. Through three case studies on LLM serving, datacenters, and semiconductor fabrication plants, we show the hidden opportunities for reducing water impact by optimizing location and time choices, paving the way for water-sustainable computing. The code is available at https://github.com/jojacola/SCARF.

nan


Article 440

Title@2025-07-01 (2): Large Language Model Confidence Estimation via Black-Box Access

Title: Large Language Model Confidence Estimation via Black-Box Access Große Sprachmodell-Konfidenzschätzung über Black-Box-Zugriff 通过黑箱访问大语言模型信任度估计 2406.04370v4

Authors (5): Tejaswini Pedapati, Amit Dhurandhar, Soumya Ghosh, Soham Dan, Prasanna Sattigeri

Estimating uncertainty or confidence in the responses of a model can be significant in evaluating trust not only in the responses, but also in the model as a whole. In this paper, we explore the problem of estimating confidence for responses of large language models (LLMs) with simply black-box or query access to them. We propose a simple and extensible framework where, we engineer novel features and train a (interpretable) model (viz. logistic regression) on these features to estimate the confidence. We empirically demonstrate that our simple framework is effective in estimating confidence of Flan-ul2, Llama-13b, Mistral-7b and GPT-4 on four benchmark Q\&A tasks as well as of Pegasus-large and BART-large on two benchmark summarization tasks with it surpassing baselines by even over $10\%$ (on AUROC) in some cases. Additionally, our interpretable approach provides insight into features that are predictive of confidence, leading to the interesting and useful discovery that our confidence models built for one LLM generalize zero-shot across others on a given dataset.

nan


Article 441

Title@2025-07-01 (2): Uncertainty Quantification of Wind Gust Predictions in the Northeast United States: An Evidential Neural Network and Explainable Artificial Intelligence Approach

Title: Uncertainty Quantification of Wind Gust Predictions in the Northeast United States: An Evidential Neural Network and Explainable Artificial Intelligence Approach Ungewissheit Quantifizierung von Windgust-Vorhersagen im Nordosten der Vereinigten Staaten: Ein evidentielles neurales Netzwerk und erklärbarer Ansatz für künstliche Intelligenz 美国东北部风速预测的不确定性量化:证据神经网络和可解释的人工情报方法 2502.00300v2

Authors (5): Israt Jahan, John S. Schreck, David John Gagne, Charlie Becker, Marina Astitha

Machine learning algorithms have shown promise in reducing bias in wind gust predictions, while still underpredicting high gusts. Uncertainty quantification (UQ) supports this issue by identifying when predictions are reliable or need cautious interpretation. Using data from 61 extratropical storms in the Northeastern USA, we introduce evidential neural network (ENN) as a novel approach for UQ in gust predictions, leveraging atmospheric variables from the Weather Research and Forecasting (WRF) model. Explainable AI techniques suggested that key predictive features contributed to higher uncertainty, which correlated strongly with storm intensity and spatial gust gradients. Compared to WRF, ENN demonstrated a 47% reduction in RMSE and allowed the construction of gust prediction intervals without an ensemble, successfully capturing at least 95% of observed gusts at 179 out of 266 stations. From an operational perspective, providing gust forecasts with quantified uncertainty enhances stakeholders’ confidence in risk assessment and response planning for extreme gust events.

nan


Article 442

Title@2025-07-01 (2): Atmospheric model-trained machine learning selection and classification of ultracool TY dwarfs

Title: Atmospheric model-trained machine learning selection and classification of ultracool TY dwarfs Atmosphärische modellgestützte maschinelle Lernauswahl und Klassifizierung von ultracoolen TY-Zwergen 超酷醇TTY矮星的大气示范培训机学习选择和分类 2507.00957v1

Authors (1): Ankit Biswas

The T and Y spectral classes represent the coolest and lowest-mass population of brown dwarfs, yet their census remains incomplete due to limited statistics. Existing detection frameworks are often constrained to identifying M, L, and early T dwarfs, owing to the sparse observational sample of ultracool dwarfs (UCDs) at later types. This paper presents a novel machine learning framework capable of detecting and classifying late-T and Y dwarfs, trained entirely on synthetic photometry from atmospheric models. Utilizing grids from the ATMO 2020 and Sonora Bobcat models, I produce a training dataset over two orders of magnitude larger than any empirical set of >T6 UCDs. Polynomial color relations fitted to the model photometry are used to assign spectral types to these synthetic models, which in turn train an ensemble of classifiers to identify and classify the spectral type of late UCDs. The model is highly performant when validating on both synthetic and empirical datasets, verifying catalogs of known UCDs with object classification metrics >99% and an average spectral type precision within 0.35 +/- 0.37 subtypes. Application of the model to a 1.5 degree region around Pisces and the UKIDSS UDS field results in the discovery of one previously uncatalogued T8.2 candidate, demonstrating the ability of this model-trained approach in discovering faint, late-type UCDs from photometric catalogs.

nan


Article 443

Title@2025-07-01 (2): MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research

Title: MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research MLR-Bench: Bewertung von KI-Agenten auf Open-Ended Machine Learning Research MLR-Bench:评估AI公司在开放式机械学习研究方面的代理机构 2505.19955v2

Authors (10): Hui Chen, Miao Xiong, Yujie Lu, Wei Han, Ailin Deng, Yufei He, Jiaying Wu, Yibo Li, Yue Liu, Bryan Hooi

Recent advancements in AI agents have demonstrated their growing potential to drive and support scientific discovery. In this work, we introduce MLR-Bench, a comprehensive benchmark for evaluating AI agents on open-ended machine learning research. MLR-Bench includes three key components: (1) 201 research tasks sourced from NeurIPS, ICLR, and ICML workshops covering diverse ML topics; (2) MLR-Judge, an automated evaluation framework combining LLM-based reviewers with carefully designed review rubrics to assess research quality; and (3) MLR-Agent, a modular agent scaffold capable of completing research tasks through four stages: idea generation, proposal formulation, experimentation, and paper writing. Our framework supports both stepwise assessment across these distinct research stages, and end-to-end evaluation of the final research paper. We then use MLR-Bench to evaluate six frontier LLMs and an advanced coding agent, finding that while LLMs are effective at generating coherent ideas and well-structured papers, current coding agents frequently (e.g., in 80% of the cases) produce fabricated or invalidated experimental results–posing a major barrier to scientific reliability. We validate MLR-Judge through human evaluation, showing high agreement with expert reviewers, supporting its potential as a scalable tool for research evaluation. We open-source MLR-Bench to help the community benchmark, diagnose, and improve AI research agents toward trustworthy and transparent scientific discovery.

nan


Article 444

Title@2025-07-01 (2): MVP: Winning Solution to SMP Challenge 2025 Video Track

Title: MVP: Winning Solution to SMP Challenge 2025 Video Track MVP: Gewinnlösung für SMP Challenge 2025 Video Track MVP: 赢得SMP挑战2025视频音轨的解决方案 2507.00950v1

Authors (7): Liliang Ye, Yunyao Zhang, Yafeng Wu, Yi-Ping Phoebe Chen, Junqing Yu, Wei Yang, Zikai Song

Social media platforms serve as central hubs for content dissemination, opinion expression, and public engagement across diverse modalities. Accurately predicting the popularity of social media videos enables valuable applications in content recommendation, trend detection, and audience engagement. In this paper, we present Multimodal Video Predictor (MVP), our winning solution to the Video Track of the SMP Challenge 2025. MVP constructs expressive post representations by integrating deep video features extracted from pretrained models with user metadata and contextual information. The framework applies systematic preprocessing techniques, including log-transformations and outlier removal, to improve model robustness. A gradient-boosted regression model is trained to capture complex patterns across modalities. Our approach ranked first in the official evaluation of the Video Track, demonstrating its effectiveness and reliability for multimodal video popularity prediction on social platforms. The source code is available at https://anonymous.4open.science/r/SMPDVideo.

nan


Article 445

Title@2025-07-01 (2): Time Series Foundation Models are Flow Predictors

Title: Time Series Foundation Models are Flow Predictors Time Series Foundation Modelle sind Flow Predictors 时间系列基础模型是流动预测模型 2507.00945v1

Authors (3): Massimiliano Luca, Ciro Beneduce, Bruno Lepri

We investigate the effectiveness of time series foundation models (TSFMs) for crowd flow prediction, focusing on Moirai and TimesFM. Evaluated on three real-world mobility datasets-Bike NYC, Taxi Beijing, and Spanish national OD flows-these models are deployed in a strict zero-shot setting, using only the temporal evolution of each OD flow and no explicit spatial information. Moirai and TimesFM outperform both statistical and deep learning baselines, achieving up to 33% lower RMSE, 39% lower MAE and up to 49% higher CPC compared to state-of-the-art competitors. Our results highlight the practical value of TSFMs for accurate, scalable flow prediction, even in scenarios with limited annotated data or missing spatial context.

nan


Article 446

Title@2025-07-01 (2): RaGNNarok: A Light-Weight Graph Neural Network for Enhancing Radar Point Clouds on Unmanned Ground Vehicles

Title: RaGNNarok: A Light-Weight Graph Neural Network for Enhancing Radar Point Clouds on Unmanned Ground Vehicles RaGNNarok: Ein Licht-Gewichts-Graph-Neural-Netzwerk zur Verbesserung von Radarpunktwolken auf unbemannten Bodenfahrzeugen RaGNNNARok:增强无人驾驶地面飞行器雷达点云的光重图形神经网络 2507.00937v1

Authors (7): David Hunt, Shaocheng Luo, Spencer Hallyburton, Shafii Nillongo, Yi Li, Tingjun Chen, Miroslav Pajic

Low-cost indoor mobile robots have gained popularity with the increasing adoption of automation in homes and commercial spaces. However, existing lidar and camera-based solutions have limitations such as poor performance in visually obscured environments, high computational overhead for data processing, and high costs for lidars. In contrast, mmWave radar sensors offer a cost-effective and lightweight alternative, providing accurate ranging regardless of visibility. However, existing radar-based localization suffers from sparse point cloud generation, noise, and false detections. Thus, in this work, we introduce RaGNNarok, a real-time, lightweight, and generalizable graph neural network (GNN)-based framework to enhance radar point clouds, even in complex and dynamic environments. With an inference time of just 7.3 ms on the low-cost Raspberry Pi 5, RaGNNarok runs efficiently even on such resource-constrained devices, requiring no additional computational resources. We evaluate its performance across key tasks, including localization, SLAM, and autonomous navigation, in three different environments. Our results demonstrate strong reliability and generalizability, making RaGNNarok a robust solution for low-cost indoor mobile robots.

nan


Article 447

Title@2025-07-01 (2): Development and Comparative Evaluation of Three Artificial Intelligence Models (NLP, LLM, JEPA) for Predicting Triage in Emergency Departments: A 7-Month Retrospective Proof-of-Concept

Title: Development and Comparative Evaluation of Three Artificial Intelligence Models (NLP, LLM, JEPA) for Predicting Triage in Emergency Departments: A 7-Month Retrospective Proof-of-Concept Entwicklung und vergleichende Auswertung von drei Modellen der Künstlichen Intelligenz (NLP, LLM, JEPA) zur Vorhersage von Triage in Notfallabteilungen: Eine 7-Monats-Retrospektive Proof-of-Concept 应急部门预测外逃的三个人工情报模型(NLP、LLM、JEPA)的开发和比较评价:7月7日回溯性概念校对 2507.01080v1

Authors (5): Edouard Lansiaux, Ramy Azzouz, Emmanuel Chazard, Amélie Vromant, Eric Wiel

Triage errors, including undertriage and overtriage, are persistent challenges in emergency departments (EDs). With increasing patient influx and staff shortages, the integration of artificial intelligence (AI) into triage protocols has gained attention. This study compares the performance of three AI models [Natural Language Processing (NLP), Large Language Models (LLM), and Joint Embedding Predictive Architecture (JEPA)] in predicting triage outcomes against the FRENCH scale and clinical practice.We conducted a retrospective analysis of a prospectively recruited cohort gathering adult patient triage data over a 7-month period at the Roger Salengro Hospital ED (Lille, France). Three AI models were trained and validated : (1) TRIAGEMASTER (NLP), (2) URGENTIAPARSE (LLM), and (3) EMERGINET (JEPA). Data included demographic details, verbatim chief complaints, vital signs, and triage outcomes based on the FRENCH scale and GEMSA coding. The primary outcome was the concordance of AI-predicted triage level with the FRENCH gold-standard. It was assessed thanks to various indicators : F1-Score, Weighted Kappa, Spearman, MAE, RMSE. The LLM model (URGENTIAPARSE) showed higher accuracy (composite score: 2.514) compared to JEPA (EMERGINET, 0.438) and NLP (TRIAGEMASTER, -3.511), outperforming nurse triage (-4.343). Secondary analyses highlighted the effectiveness of URGENTIAPARSE in predicting hospitalization needs (GEMSA) and its robustness with structured data versus raw transcripts (either for GEMSA prediction or for FRENCH prediction). LLM architecture, through abstraction of patient representations, offers the most accurate triage predictions among tested models. Integrating AI into ED workflows could enhance patient safety and operational efficiency, though integration into clinical workflows requires addressing model limitations and ensuring ethical transparency.

nan


Article 448

Title@2025-07-01 (2): Conformal Inference under High-Dimensional Covariate Shifts via Likelihood-Ratio Regularization

Title: Conformal Inference under High-Dimensional Covariate Shifts via Likelihood-Ratio Regularization Konforme Schlussfolgerung unter hochdimensionalen Kovariate Verschiebungen über Likelihood-Ratio Regularisierung 通过传统-拉蒂奥正规化,在高多样性可变性转变下发生非正式推论 2502.13030v5

Authors (5): Sunay Joshi, Shayan Kiyani, George Pappas, Edgar Dobriban, Hamed Hassani

We consider the problem of conformal prediction under covariate shift. Given labeled data from a source domain and unlabeled data from a covariate shifted target domain, we seek to construct prediction sets with valid marginal coverage in the target domain. Most existing methods require estimating the unknown likelihood ratio function, which can be prohibitive for high-dimensional data such as images. To address this challenge, we introduce the likelihood ratio regularized quantile regression (LR-QR) algorithm, which combines the pinball loss with a novel choice of regularization in order to construct a threshold function without directly estimating the unknown likelihood ratio. We show that the LR-QR method has coverage at the desired level in the target domain, up to a small error term that we can control. Our proofs draw on a novel analysis of coverage via stability bounds from learning theory. Our experiments demonstrate that the LR-QR algorithm outperforms existing methods on high-dimensional prediction tasks, including a regression task for the Communities and Crime dataset, an image classification task from the WILDS repository, and an LLM question-answering task on the MMLU benchmark.

nan


Article 449

Title: Understanding Generalization in Node and Link Prediction Verallgemeinerung in Knoten und Link-Vorhersage verstehen 理解节点和链接中的通用化 2507.00927v1

Authors (3): Antonis Vasileiou, Timo Stoll, Christopher Morris

Using message-passing graph neural networks (MPNNs) for node and link prediction is crucial in various scientific and industrial domains, which has led to the development of diverse MPNN architectures. Besides working well in practical settings, their ability to generalize beyond the training set remains poorly understood. While some studies have explored MPNNs’ generalization in graph-level prediction tasks, much less attention has been given to node- and link-level predictions. Existing works often rely on unrealistic i.i.d.\@ assumptions, overlooking possible correlations between nodes or links, and assuming fixed aggregation and impractical loss functions while neglecting the influence of graph structure. In this work, we introduce a unified framework to analyze the generalization properties of MPNNs in inductive and transductive node and link prediction settings, incorporating diverse architectural parameters and loss functions and quantifying the influence of graph structure. Additionally, our proposed generalization framework can be applied beyond graphs to any classification task under the inductive or transductive setting. Our empirical study supports our theoretical insights, deepening our understanding of MPNNs’ generalization capabilities in these tasks.

nan


Article 450

Title@2025-07-01 (2): HyperFusion: Hierarchical Multimodal Ensemble Learning for Social Media Popularity Prediction

Title: HyperFusion: Hierarchical Multimodal Ensemble Learning for Social Media Popularity Prediction HyperFusion: Hierarchisches multimodales Ensemble Lernen für Social Media Popularity Prediction HyperFusion:社会媒体大众化预测的等级式多模式学习组合组合 2507.00926v1

Authors (7): Liliang Ye, Yunyao Zhang, Yafeng Wu, Yi-Ping Phoebe Chen, Junqing Yu, Wei Yang, Zikai Song

Social media popularity prediction plays a crucial role in content optimization, marketing strategies, and user engagement enhancement across digital platforms. However, predicting post popularity remains challenging due to the complex interplay between visual, textual, temporal, and user behavioral factors. This paper presents HyperFusion, a hierarchical multimodal ensemble learning framework for social media popularity prediction. Our approach employs a three-tier fusion architecture that progressively integrates features across abstraction levels: visual representations from CLIP encoders, textual embeddings from transformer models, and temporal-spatial metadata with user characteristics. The framework implements a hierarchical ensemble strategy combining CatBoost, TabNet, and custom multi-layer perceptrons. To address limited labeled data, we propose a two-stage training methodology with pseudo-labeling and iterative refinement. We introduce novel cross-modal similarity measures and hierarchical clustering features that capture inter-modal dependencies. Experimental results demonstrate that HyperFusion achieves competitive performance on the SMP challenge dataset. Our team achieved third place in the SMP Challenge 2025 (Image Track). The source code is available at https://anonymous.4open.science/r/SMPDImage.

nan


Article 451

Title@2025-07-01 (2): Bridging SFT and DPO for Diffusion Model Alignment with Self-Sampling Preference Optimization

Title: Bridging SFT and DPO for Diffusion Model Alignment with Self-Sampling Preference Optimization Überbrückung von SFT und DPO für Diffusionsmodell-Ausrichtung mit selbstprobender Preference-Optimierung 连接SFT和DPPO的传播模型与自样最佳优化 2410.05255v2

Authors (11): Daoan Zhang, Guangchen Lan, Dong-Jun Han, Wenlin Yao, Xiaoman Pan, Hongming Zhang, Mingxiao Li, Pengcheng Chen, Yu Dong, Christopher Brinton, Jiebo Luo

Existing post-training techniques are broadly categorized into supervised fine-tuning (SFT) and reinforcement learning (RL) methods; the former is stable during training but suffers from limited generalization, while the latter, despite its stronger generalization capability, relies on additional preference data or reward models and carries the risk of reward exploitation. In order to preserve the advantages of both SFT and RL – namely, eliminating the need for paired data and reward models while retaining the training stability of SFT and the generalization ability of RL – a new alignment method, Self-Sampling Preference Optimization (SSPO), is proposed in this paper. SSPO introduces a Random Checkpoint Replay (RCR) strategy that utilizes historical checkpoints to construct paired data, thereby effectively mitigating overfitting. Simultaneously, a Self-Sampling Regularization (SSR) strategy is employed to dynamically evaluate the quality of generated samples; when the generated samples are more likely to be winning samples, the approach automatically switches from DPO (Direct Preference Optimization) to SFT, ensuring that the training process accurately reflects the quality of the samples. Experimental results demonstrate that SSPO not only outperforms existing methods on text-to-image benchmarks, but its effectiveness has also been validated in text-to-video tasks. We validate SSPO across both text-to-image and text-to-video benchmarks. SSPO surpasses all previous approaches on the text-to-image benchmarks and demonstrates outstanding performance on the text-to-video benchmarks.

nan


Article 452

Title@2025-07-01 (2): Persistence Paradox in Dynamic Science

Title: Persistence Paradox in Dynamic Science Persistenzparadox in der dynamischen Wissenschaft 动态科学中的持久性悖论 2506.22729v2

Authors (2): Honglin Bao, Kai Li

Persistence is often regarded as a virtue in science. In this paper, however, we challenge this conventional view by highlighting its contextual nature, particularly how persistence can become a liability during periods of paradigm shift. We focus on the deep learning revolution catalyzed by AlexNet in 2012. Analyzing the 20-year career trajectories of over 5,000 scientists who were active in top machine learning venues during the preceding decade, we examine how their research focus and output evolved. We first uncover a dynamic period in which leading venues increasingly prioritized cutting-edge deep learning developments that displaced relatively traditional statistical learning methods. Scientists responded to these changes in markedly different ways. Those who were previously successful or affiliated with old teams adapted more slowly, experiencing what we term a rigidity penalty - a reluctance to embrace new directions leading to a decline in scientific impact, as measured by citation percentile rank. In contrast, scientists who pursued strategic adaptation - selectively pivoting toward emerging trends while preserving weak connections to prior expertise - reaped the greatest benefits. Taken together, our macro- and micro-level findings show that scientific breakthroughs act as mechanisms that reconfigure power structures within a field.

nan


Article 453

Title@2025-07-01 (2): The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering

Title: The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering Das verborgene Leben von Tokens: Verringerung der Halluzination großer Vision-Sprachenmodelle durch visuelle Informationslenkung 《神的隐秘生命:通过视觉信息指导减少大型视觉语言模型的幻觉》 2502.03628v2

Authors (10): Zhuowei Li, Haizhou Shi, Yunhe Gao, Di Liu, Zhenting Wang, Yuxiao Chen, Ting Liu, Long Zhao, Hao Wang, Dimitris N. Metaxas

Large Vision-Language Models (LVLMs) can reason effectively over both textual and visual inputs, but they tend to hallucinate syntactically coherent yet visually ungrounded contents. In this paper, we investigate the internal dynamics of hallucination by examining the tokens logits ranking throughout the generation process, revealing three key patterns in how LVLMs process information: (1) gradual visual information loss - visually grounded tokens gradually become less favored throughout generation, and (2) early excitation - semantically meaningful tokens achieve peak activation in the layers earlier than the final layer. (3) hidden genuine information - visually grounded tokens though not being eventually decoded still retain relatively high rankings at inference. Based on these insights, we propose VISTA (Visual Information Steering with Token-logit Augmentation), a training-free inference-time intervention framework that reduces hallucination while promoting genuine information. VISTA works by combining two complementary approaches: reinforcing visual information in activation space and leveraging early layer activations to promote semantically meaningful decoding. Compared to existing methods, VISTA requires no external supervision and is applicable to various decoding strategies. Extensive experiments show that VISTA on average reduces hallucination by about 40% on evaluated open-ended generation task, and it consistently outperforms existing methods on four benchmarks across four architectures under three decoding strategies. Code is available at https://github.com/LzVv123456/VISTA.

nan


Article 454

Title@2025-07-01 (2): TABASCO: A Fast, Simplified Model for Molecular Generation with Improved Physical Quality

Title: TABASCO: A Fast, Simplified Model for Molecular Generation with Improved Physical Quality TABASCO: Ein schnelles, vereinfachtes Modell für die molekulare Erzeugung mit verbesserter physikalischer Qualität TABASCO:一个快速简化的提高物理质量的分子生成模型 2507.00899v1

Authors (4): Carlos Vonessen, Charles Harris, Miruna Cretu, Pietro Liò

State-of-the-art models for 3D molecular generation are based on significant inductive biases, SE(3), permutation equivariance to respect symmetry and graph message-passing networks to capture local chemistry, yet the generated molecules still struggle with physical plausibility. We introduce TABASCO which relaxes these assumptions: The model has a standard non-equivariant transformer architecture, treats atoms in a molecule as sequences and reconstructs bonds deterministically after generation. The absence of equivariant layers and message passing allows us to significantly simplify the model architecture and scale data throughput. On the GEOM-Drugs benchmark TABASCO achieves state-of-the-art PoseBusters validity and delivers inference roughly 10x faster than the strongest baseline, while exhibiting emergent rotational equivariance despite symmetry not being hard-coded. Our work offers a blueprint for training minimalist, high-throughput generative models suited to specialised tasks such as structure- and pharmacophore-based drug design. We provide a link to our implementation at github.com/carlosinator/tabasco.

nan


Article 455

Title@2025-07-01 (2): An in depth look at the Procrustes-Wasserstein distance: properties and barycenters

Title: An in depth look at the Procrustes-Wasserstein distance: properties and barycenters Ein tiefer Blick auf die Entfernung von Procrustes-Wasserstein: Eigenschaften und Barycenter 深度查看Procrustes-Wasserstein 距离: 属性和采油器 2507.00894v1

Authors (4): Davide Adamo, Marco Corneli, Manon Vuillien, Emmanuelle Vila

Due to its invariance to rigid transformations such as rotations and reflections, Procrustes-Wasserstein (PW) was introduced in the literature as an optimal transport (OT) distance, alternative to Wasserstein and more suited to tasks such as the alignment and comparison of point clouds. Having that application in mind, we carefully build a space of discrete probability measures and show that over that space PW actually is a distance. Algorithms to solve the PW problems already exist, however we extend the PW framework by discussing and testing several initialization strategies. We then introduce the notion of PW barycenter and detail an algorithm to estimate it from the data. The result is a new method to compute representative shapes from a collection of point clouds. We benchmark our method against existing OT approaches, demonstrating superior performance in scenarios requiring precise alignment and shape preservation. We finally show the usefulness of the PW barycenters in an archaeological context. Our results highlight the potential of PW in boosting 2D and 3D point cloud analysis for machine learning and computational geometry applications.

nan


Article 456

Title@2025-07-01 (2): Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts

Title: Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts Lion Secretly Solves Constrained Optimization: Wie Lyapunov voraussagt 限制优化:如Lyapunov预测 2310.05898v7

Authors (4): Lizhang Chen, Bo Liu, Kaizhao Liang, Qiang Liu

Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As we can expect from the results of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polak, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion’s efficacy. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $|x|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.

nan


Article 457

Title@2025-07-01 (2): Scaling Laws Are Unreliable for Downstream Tasks: A Reality Check

Title: Scaling Laws Are Unreliable for Downstream Tasks: A Reality Check Skalierungsgesetze sind für Downstream-Aufgaben unzuverlässig: Ein Realitätscheck 增强法律对下流任务不可靠:一个现实检查 2507.00885v1

Authors (3): Nicholas Lourie, Michael Y. Hu, Kyunghyun Cho

Downstream scaling laws aim to predict task performance at larger scales from pretraining losses at smaller scales. Whether this prediction should be possible is unclear: some works demonstrate that task performance follows clear linear scaling trends under transformation, whereas others point out fundamental challenges to downstream scaling laws, such as emergence and inverse scaling. In this work, we conduct a meta-analysis of existing data on downstream scaling laws, finding that close fit to linear scaling laws only occurs in a minority of cases: 39% of the time. Furthermore, seemingly benign changes to the experimental setting can completely change the scaling trend. Our analysis underscores the need to understand the conditions under which scaling laws succeed. To fully model the relationship between pretraining loss and downstream task performance, we must embrace the cases in which scaling behavior deviates from linear trends.

nan


Article 458

Title@2025-07-01 (2): A Scalable and Quantum-Accurate Foundation Model for Biomolecular Force Field via Linearly Tensorized Quadrangle Attention

Title: A Scalable and Quantum-Accurate Foundation Model for Biomolecular Force Field via Linearly Tensorized Quadrangle Attention Ein skalierbares und Quantum-akkurates Fundamentalmodell für das biomolekulare Kraftfeld durch lineares Viereck Achtung 通过线性天分矩形注意进行生物分子力场的可缩缩和量子精确基础模型 2507.00884v1

Authors (14): Qun Su, Kai Zhu, Qiaolin Gou, Jintu Zhang, Renling Hu, Yurong Li, Yongze Wang, Hui Zhang, Ziyi You, Linlong Jiang, Yu Kang, Jike Wang, Chang-Yu Hsieh, Tingjun Hou

Accurate atomistic biomolecular simulations are vital for disease mechanism understanding, drug discovery, and biomaterial design, but existing simulation methods exhibit significant limitations. Classical force fields are efficient but lack accuracy for transition states and fine conformational details critical in many chemical and biological processes. Quantum Mechanics (QM) methods are highly accurate but computationally infeasible for large-scale or long-time simulations. AI-based force fields (AIFFs) aim to achieve QM-level accuracy with efficiency but struggle to balance many-body modeling complexity, accuracy, and speed, often constrained by limited training data and insufficient validation for generalizability. To overcome these challenges, we introduce LiTEN, a novel equivariant neural network with Tensorized Quadrangle Attention (TQA). TQA efficiently models three- and four-body interactions with linear complexity by reparameterizing high-order tensor features via vector operations, avoiding costly spherical harmonics. Building on LiTEN, LiTEN-FF is a robust AIFF foundation model, pre-trained on the extensive nablaDFT dataset for broad chemical generalization and fine-tuned on SPICE for accurate solvated system simulations. LiTEN achieves state-of-the-art (SOTA) performance across most evaluation subsets of rMD17, MD22, and Chignolin, outperforming leading models such as MACE, NequIP, and EquiFormer. LiTEN-FF enables the most comprehensive suite of downstream biomolecular modeling tasks to date, including QM-level conformer searches, geometry optimization, and free energy surface construction, while offering 10x faster inference than MACE-OFF for large biomolecules (~1000 atoms). In summary, we present a physically grounded, highly efficient framework that advances complex biomolecular modeling, providing a versatile foundation for drug discovery and related applications.

nan


Article 459

Title@2025-07-01 (2): NN-Former: Rethinking Graph Structure in Neural Architecture Representation

Title: NN-Former: Rethinking Graph Structure in Neural Architecture Representation NN-Former: Graphenstruktur in der Darstellung neuraler Architektur neu denken NN-前:神经结构代表结构的重新思考图结构 2507.00880v1

Authors (5): Ruihan Xu, Haokui Zhang, Yaowei Wang, Wei Zeng, Shiliang Zhang

The growing use of deep learning necessitates efficient network design and deployment, making neural predictors vital for estimating attributes such as accuracy and latency. Recently, Graph Neural Networks (GNNs) and transformers have shown promising performance in representing neural architectures. However, each of both methods has its disadvantages. GNNs lack the capabilities to represent complicated features, while transformers face poor generalization when the depth of architecture grows. To mitigate the above issues, we rethink neural architecture topology and show that sibling nodes are pivotal while overlooked in previous research. We thus propose a novel predictor leveraging the strengths of GNNs and transformers to learn the enhanced topology. We introduce a novel token mixer that considers siblings, and a new channel mixer named bidirectional graph isomorphism feed-forward network. Our approach consistently achieves promising performance in both accuracy and latency prediction, providing valuable insights for learning Directed Acyclic Graph (DAG) topology. The code is available at https://github.com/XuRuihan/NNFormer.

nan


Article 460

Title@2025-07-01 (2): Unsupervised Attributed Dynamic Network Embedding with Stability Guarantees

Title: Unsupervised Attributed Dynamic Network Embedding with Stability Guarantees Unüberwachtes zugeschriebenes dynamisches Netzwerk mit Stabilitätsgarantien einbinden 带有稳定保障的未受监督的属性动态网络 2503.02859v2

Authors (4): Emma Ceccherini, Ian Gallagher, Andrew Jones, Daniel Lawson

Stability for dynamic network embeddings ensures that nodes behaving the same at different times receive the same embedding, allowing comparison of nodes in the network across time. We present attributed unfolded adjacency spectral embedding (AUASE), a stable unsupervised representation learning framework for dynamic networks in which nodes are attributed with time-varying covariate information. To establish stability, we prove uniform convergence to an associated latent position model. We quantify the benefits of our dynamic embedding by comparing with state-of-the-art network representation learning methods on four real attributed networks. To the best of our knowledge, AUASE is the only attributed dynamic embedding that satisfies stability guarantees without the need for ground truth labels, which we demonstrate provides significant improvements for link prediction and node classification.

nan


Article 461

Title@2025-07-01 (2): Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought

Title: Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought Vernunft durch Überlagerung: Eine theoretische Perspektive auf Kette des kontinuierlichen Denkens 由超位推理推理:对持续思考链的理论观点 2505.12514v2

Authors (6): Hanlin Zhu, Shibo Hao, Zhiting Hu, Jiantao Jiao, Stuart Russell, Yuandong Tian

Large Language Models (LLMs) have demonstrated remarkable performance in many applications, including challenging reasoning problems via chain-of-thoughts (CoTs) techniques that generate ``thinking tokens’’ before answering the questions. While existing theoretical works demonstrate that CoTs with discrete tokens boost the capability of LLMs, recent work on continuous CoTs lacks a theoretical understanding of why it outperforms discrete counterparts in various reasoning tasks such as directed graph reachability, a fundamental graph reasoning problem that includes many practical domain applications as special cases. In this paper, we prove that a two-layer transformer with $D$ steps of continuous CoTs can solve the directed graph reachability problem, where $D$ is the diameter of the graph, while the best known result of constant-depth transformers with discrete CoTs requires $O(n^2)$ decoding steps where $n$ is the number of vertices ($D<n$). In our construction, each continuous thought vector is a superposition state that encodes multiple search frontiers simultaneously (i.e., parallel breadth-first search (BFS)), while discrete CoTs must choose a single path sampled from the superposition state, which leads to sequential search that requires many more steps and may be trapped into local solutions. We also performed extensive experiments to verify that our theoretical construction aligns well with the empirical solution obtained via training dynamics. Notably, encoding of multiple search frontiers as a superposition state automatically emerges in training continuous CoTs, without explicit supervision to guide the model to explore multiple paths simultaneously.

nan


Article 462

Title@2025-07-01 (2): Realism in Action: Anomaly-Aware Diagnosis of Brain Tumors from Medical Images Using YOLOv8 and DeiT

Title: Realism in Action: Anomaly-Aware Diagnosis of Brain Tumors from Medical Images Using YOLOv8 and DeiT Realismus in Aktion: Anomalie-Aware Diagnose von Hirntumoren aus medizinischen Bildern mit YOLOv8 und DeiT 现实在行动中:利用YOLOv8和DeiT对医疗图象中的脑肿瘤进行反常-警告诊断 2401.03302v4

Authors (4): Seyed Mohammad Hossein Hashemi, Leila Safari, Mohsen Hooshmand, Amirhossein Dadashzadeh Taromi

Reliable diagnosis of brain tumors remains challenging due to low clinical incidence rates of such cases. However, this low rate is neglected in most of proposed methods. We propose a clinically inspired framework for anomaly-resilient tumor detection and classification. Detection leverages YOLOv8n fine-tuned on a realistically imbalanced dataset (1:9 tumor-to-normal ratio; 30,000 MRI slices from 81 patients). In addition, we propose a novel Patient-to-Patient (PTP) metric that evaluates diagnostic reliability at the patient level. Classification employs knowledge distillation: a Data Efficient Image Transformer (DeiT) student model is distilled from a ResNet152 teacher. The distilled ViT achieves an F1-score of 0.92 within 20 epochs, matching near teacher performance (F1=0.97) with significantly reduced computational resources. This end-to-end framework demonstrates high robustness in clinically representative anomaly-distributed data, offering a viable tool that adheres to realistic situations in clinics.

nan


Article 463

Title@2025-07-01 (2): Template-Fitting Meets Deep Learning: Redshift Estimation Using Physics-Guided Neural Networks

Title: Template-Fitting Meets Deep Learning: Redshift Estimation Using Physics-Guided Neural Networks Template-Fitting trifft auf Deep Learning: Redshift-Schätzung mittels physikgeführter neuraler Netzwerke 使用物理辅助神经网络进行重新定时估计 2507.00866v1

Authors (7): Jonas Chris Ferrao, Dickson Dias, Pranav Naik, Glory D’Cruz, Anish Naik, Siya Khandeparkar, Manisha Gokuldas Fal Dessai

Accurate photometric redshift estimation is critical for observational cosmology, especially in large-scale surveys where spectroscopic measurements are impractical. Traditional approaches include template fitting and machine learning, each with distinct strengths and limitations. We present a hybrid method that integrates template fitting with deep learning using physics-guided neural networks. By embedding spectral energy distribution templates into the network architecture, our model encodes physical priors into the training process. The system employs a multimodal design, incorporating cross-attention mechanisms to fuse photometric and image data, along with Bayesian layers for uncertainty estimation. We evaluate our model on the publicly available PREML dataset, which includes approximately 400,000 galaxies from the Hyper Suprime-Cam PDR3 release, with 5-band photometry, multi-band imaging, and spectroscopic redshifts. Our approach achieves an RMS error of 0.0507, a 3-sigma catastrophic outlier rate of 0.13%, and a bias of 0.0028. The model satisfies two of the three LSST photometric redshift requirements for redshifts below 3. These results highlight the potential of combining physically motivated templates with data-driven models for robust redshift estimation in upcoming cosmological surveys.

nan


Article 464

Title@2025-07-01 (2): Machine Learning-based Early Detection of Potato Sprouting Using Electrophysiological Signals

Title: Machine Learning-based Early Detection of Potato Sprouting Using Electrophysiological Signals Machine Learning-based Early Detection of Potato Sprouting mit elektrophysiologischen Signalen 利用电生理信号早期探测马铃薯流 2507.00862v1

Authors (11): Davide Andreoletti, Aris Marcolongo, Natasa Sarafijanovic Djukic, Julien Roulet, Stefano Billeter, Andrzej Kurenda, Margot Visse-Mansiaux, Brice Dupuis, Carrol Annette Plummer, Beatrice Paoli, Omran Ayoub

Accurately predicting potato sprouting before the emergence of any visual signs is critical for effective storage management, as sprouting degrades both the commercial and nutritional value of tubers. Effective forecasting allows for the precise application of anti-sprouting chemicals (ASCs), minimizing waste and reducing costs. This need has become even more pressing following the ban on Isopropyl N-(3-chlorophenyl) carbamate (CIPC) or Chlorpropham due to health and environmental concerns, which has led to the adoption of significantly more expensive alternative ASCs. Existing approaches primarily rely on visual identification, which only detects sprouting after morphological changes have occurred, limiting their effectiveness for proactive management. A reliable early prediction method is therefore essential to enable timely intervention and improve the efficiency of post-harvest storage strategies, where early refers to detecting sprouting before any visible signs appear. In this work, we address the problem of early prediction of potato sprouting. To this end, we propose a novel machine learning (ML)-based approach that enables early prediction of potato sprouting using electrophysiological signals recorded from tubers using proprietary sensors. Our approach preprocesses the recorded signals, extracts relevant features from the wavelet domain, and trains supervised ML models for early sprouting detection. Additionally, we incorporate uncertainty quantification techniques to enhance predictions. Experimental results demonstrate promising performance in the early detection of potato sprouting by accurately predicting the exact day of sprouting for a subset of potatoes and while showing acceptable average error across all potatoes. Despite promising results, further refinements are necessary to minimize prediction errors, particularly in reducing the maximum observed deviations.

nan


Article 465

Title@2025-07-01 (2): Aligning Learning and Endogenous Decision-Making

Title: Aligning Learning and Endogenous Decision-Making Ausrichten des Lernens und endogene Entscheidungsfindung 协调学习和内部决策 2507.00851v1

Authors (4): Rares Cristian, Pavithra Harsha, Georgia Perakis, Brian Quanz

Many of the observations we make are biased by our decisions. For instance, the demand of items is impacted by the prices set, and online checkout choices are influenced by the assortments presented. The challenge in decision-making under this setting is the lack of counterfactual information, and the need to learn it instead. We introduce an end-to-end method under endogenous uncertainty to train ML models to be aware of their downstream, enabling their effective use in the decision-making stage. We further introduce a robust optimization variant that accounts for uncertainty in ML models – specifically by constructing uncertainty sets over the space of ML models and optimizing actions to protect against worst-case predictions. We prove guarantees that this robust approach can capture near-optimal decisions with high probability as a function of data. Besides this, we also introduce a new class of two-stage stochastic optimization problems to the end-to-end learning framework that can now be addressed through our framework. Here, the first stage is an information-gathering problem to decide which random variable to poll and gain information about before making a second-stage decision based off of it. We present several computational experiments for pricing and inventory assortment/recommendation problems. We compare against existing methods in online learning/bandits/offline reinforcement learning and show our approach has consistent improved performance over these. Just as in the endogenous setting, the model’s prediction also depends on the first-stage decision made. While this decision does not affect the random variable in this setting, it does affect the correct point forecast that should be made.

nan


Article 466

Title@2025-07-01 (2): Quantum Approximate Optimization Algorithm for Spatiotemporal Forecasting of HIV Clusters

Title: Quantum Approximate Optimization Algorithm for Spatiotemporal Forecasting of HIV Clusters Quanten Ungefähre Optimierungsalgorithmus für die räumliche zeitliche Vorhersage von HIV-Clustern 艾滋病毒群集分布时性预测的 量 量 近 优化优化值 2507.00848v1

Authors (5): Don Roosan, Saif Nirzhor, Rubayat Khan, Fahmida Hai, Mohammad Rifat Haidar

HIV epidemiological data is increasingly complex, requiring advanced computation for accurate cluster detection and forecasting. We employed quantum-accelerated machine learning to analyze HIV prevalence at the ZIP-code level using AIDSVu and synthetic SDoH data for 2022. Our approach compared classical clustering (DBSCAN, HDBSCAN) with a quantum approximate optimization algorithm (QAOA), developed a hybrid quantum-classical neural network for HIV prevalence forecasting, and used quantum Bayesian networks to explore causal links between SDoH factors and HIV incidence. The QAOA-based method achieved 92% accuracy in cluster detection within 1.6 seconds, outperforming classical algorithms. Meanwhile, the hybrid quantum-classical neural network predicted HIV prevalence with 94% accuracy, surpassing a purely classical counterpart. Quantum Bayesian analysis identified housing instability as a key driver of HIV cluster emergence and expansion, with stigma exerting a geographically variable influence. These quantum-enhanced methods deliver greater precision and efficiency in HIV surveillance while illuminating critical causal pathways. This work can guide targeted interventions, optimize resource allocation for PrEP, and address structural inequities fueling HIV transmission.

nan


Article 467

Title@2025-07-01 (2): Do Echo Top Heights Improve Deep Learning Nowcasts?

Title: Do Echo Top Heights Improve Deep Learning Nowcasts? Verbessern Echo Top Heights Deep Learning Nowcasts? Echo高层高地能改善深造即时播音吗? 2507.00845v1

Authors (4): Peter Pavlík, Marc Schleiss, Anna Bou Ezzeddine, Viera Rozinajová

Precipitation nowcasting – the short-term prediction of rainfall using recent radar observations – is critical for weather-sensitive sectors such as transportation, agriculture, and disaster mitigation. While recent deep learning models have shown promise in improving nowcasting skill, most approaches rely solely on 2D radar reflectivity fields, discarding valuable vertical information available in the full 3D radar volume. In this work, we explore the use of Echo Top Height (ETH), a 2D projection indicating the maximum altitude of radar reflectivity above a given threshold, as an auxiliary input variable for deep learning-based nowcasting. We examine the relationship between ETH and radar reflectivity, confirming its relevance for predicting rainfall intensity. We implement a single-pass 3D U-Net that processes both the radar reflectivity and ETH as separate input channels. While our models are able to leverage ETH to improve skill at low rain-rate thresholds, results are inconsistent at higher intensities and the models with ETH systematically underestimate precipitation intensity. Three case studies are used to illustrate how ETH can help in some cases, but also confuse the models and increase the error variance. Nonetheless, the study serves as a foundation for critically assessing the potential contribution of additional variables to nowcasting performance.

nan


Article 468

Title@2025-07-01 (2): Fully Differentiable Lagrangian Convolutional Neural Network for Physics-Informed Precipitation Nowcasting

Title: Fully Differentiable Lagrangian Convolutional Neural Network for Physics-Informed Precipitation Nowcasting Ganz unterschiedlich lagrangianisches Konvolutions-Neural-Netzwerk für physikinformierte Ausscheidungs-Nowcasting 完全区别的拉格朗日拉格朗日 物理化降水革命神经网络 2402.10747v2

Authors (4): Peter Pavlík, Martin Výboh, Anna Bou Ezzeddine, Viera Rozinajová

This paper presents a convolutional neural network model for precipitation nowcasting that combines data-driven learning with physics-informed domain knowledge. We propose LUPIN, a Lagrangian Double U-Net for Physics-Informed Nowcasting, that draws from existing extrapolation-based nowcasting methods. It consists of a U-Net that dynamically produces mesoscale advection motion fields, a differentiable semi-Lagrangian extrapolation operator, and an advection-free U-Net capturing the growth and decay of precipitation over time. Using our approach, we successfully implement the Lagrangian convolutional neural network for precipitation nowcasting in a fully differentiable and GPU-accelerated manner. This allows for end-to-end training and inference, including the data-driven Lagrangian coordinate system transformation of the data at runtime. We evaluate the model and compare it with other related AI-based models both quantitatively and qualitatively in an extreme event case study. Based on our evaluation, LUPIN matches and even exceeds the performance of the chosen benchmarks, opening the door for other Lagrangian machine learning models.

nan


Article 469

Title@2025-07-01 (2): Discrete Diffusion in Large Language and Multimodal Models: A Survey

Title: Discrete Diffusion in Large Language and Multimodal Models: A Survey Diskrete Diffusion in großen Sprach- und Multimodalmodellen: Eine Umfrage 大语言和多模式模型的分解传播:调查 2506.13759v2

Authors (3): Runpeng Yu, Qi Li, Xinchao Wang

In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm using full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output controllability, and dynamic, response-aware perception. These capabilities are previously difficult to achieve with AR models. Recently, a growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts, while achieving up to 10x acceleration in inference speed. The advancement of discrete diffusion LLMs and MLLMs has been largely driven by progress in two domains. The first is the development of autoregressive LLMs and MLLMs, which has accumulated vast amounts of data, benchmarks, and foundational infrastructure for training and inference. The second contributing domain is the evolution of the mathematical models underlying discrete diffusion. Together, these advancements have catalyzed a surge in dLLMs and dMLLMs research in early 2025. In this work, we present a comprehensive overview of the research in the dLLM and dMLLM domains. We trace the historical development of dLLMs and dMLLMs, formalize the underlying mathematical frameworks, and categorize representative models. We further analyze key techniques for training and inference, and summarize emerging applications across language, vision-language, and biological domains. We conclude by discussing future directions for research and deployment. Paper collection: https://github.com/LiQiiiii/DLLM-Survey

nan


Article 470

Title@2025-07-01 (2): Stylometry recognizes human and LLM-generated texts in short samples

Title: Stylometry recognizes human and LLM-generated texts in short samples Stylometrie erkennt menschliche und LLM-generierte Texte in kurzen Proben tytylometerm在短样本中确认人类和LLM产生的文本 2507.00838v1

Authors (4): Karol Przystalski, Jan K. Argasiński, Iwona Grabska-Gradzińska, Jeremi K. Ochab

The paper explores stylometry as a method to distinguish between texts created by Large Language Models (LLMs) and humans, addressing issues of model attribution, intellectual property, and ethical AI use. Stylometry has been used extensively to characterise the style and attribute authorship of texts. By applying it to LLM-generated texts, we identify their emergent writing patterns. The paper involves creating a benchmark dataset based on Wikipedia, with (a) human-written term summaries, (b) texts generated purely by LLMs (GPT-3.5/4, LLaMa 2/3, Orca, and Falcon), (c) processed through multiple text summarisation methods (T5, BART, Gensim, and Sumy), and (d) rephrasing methods (Dipper, T5). The 10-sentence long texts were classified by tree-based models (decision trees and LightGBM) using human-designed (StyloMetrix) and n-gram-based (our own pipeline) stylometric features that encode lexical, grammatical, syntactic, and punctuation patterns. The cross-validated results reached a performance of up to .87 Matthews correlation coefficient in the multiclass scenario with 7 classes, and accuracy between .79 and 1. in binary classification, with the particular example of Wikipedia and GPT-4 reaching up to .98 accuracy on a balanced dataset. Shapley Additive Explanations pinpointed features characteristic of the encyclopaedic text type, individual overused words, as well as a greater grammatical standardisation of LLMs with respect to human-written texts. These results show – crucially, in the context of the increasingly sophisticated LLMs – that it is possible to distinguish machine- from human-generated texts at least for a well-defined text type.

nan


Article 471

Title@2025-07-01 (2): Studying and Improving Graph Neural Network-based Motif Estimation

Title: Studying and Improving Graph Neural Network-based Motif Estimation Untersuchung und Verbesserung der graphischen Neuralnetz-basierten Motivationsschätzung 研究和改善图形神经网络基于Motif 估计 2506.15709v2

Authors (3): Pedro C. Vieira, Miguel E. P. Silva, Pedro Manuel Pinto Ribeiro

Graph Neural Networks (GNNs) are a predominant method for graph representation learning. However, beyond subgraph frequency estimation, their application to network motif significance-profile (SP) prediction remains under-explored, with no established benchmarks in the literature. We propose to address this problem, framing SP estimation as a task independent of subgraph frequency estimation. Our approach shifts from frequency counting to direct SP estimation and modulates the problem as multitarget regression. The reformulation is optimised for interpretability, stability and scalability on large graphs. We validate our method using a large synthetic dataset and further test it on real-world graphs. Our experiments reveal that 1-WL limited models struggle to make precise estimations of SPs. However, they can generalise to approximate the graph generation processes of networks by comparing their predicted SP with the ones originating from synthetic generators. This first study on GNN-based motif estimation also hints at how using direct SP estimation can help go past the theoretical limitations that motif estimation faces when performed through subgraph counting.

nan


Article 472

Title@2025-07-01 (2): yProv4ML: Effortless Provenance Tracking for Machine Learning Systems

Title: yProv4ML: Effortless Provenance Tracking for Machine Learning Systems yProv4ML: Müheloses Provenienz-Tracking für maschinelle Lernsysteme yProv4ML: 机器学习系统无穷无尽的证明跟踪 2507.01078v1

Authors (3): Gabriele Padovani, Valentine Anantharaj, Sandro Fiore

The rapid growth of interest in large language models (LLMs) reflects their potential for flexibility and generalization, and attracted the attention of a diverse range of researchers. However, the advent of these techniques has also brought to light the lack of transparency and rigor with which development is pursued. In particular, the inability to determine the number of epochs and other hyperparameters in advance presents challenges in identifying the best model. To address this challenge, machine learning frameworks such as MLFlow can automate the collection of this type of information. However, these tools capture data using proprietary formats and pose little attention to lineage. This paper proposes yProv4ML, a framework to capture provenance information generated during machine learning processes in PROV-JSON format, with minimal code modifications.

nan


Article 473

Title@2025-07-01 (2): Good Enough to Learn: LLM-based Anomaly Detection in ECU Logs without Reliable Labels

Title: Good Enough to Learn: LLM-based Anomaly Detection in ECU Logs without Reliable Labels Gut genug zu lernen: LLM-basierte Anomalieerkennung in ECU-Logs ohne zuverlässige Etiketten 足以学习:无可靠标签的ECU日志中基于LLM的反常检测 2507.01077v1

Authors (3): Bogdan Bogdan, Arina Cazacu, Laura Vasilie

Anomaly detection often relies on supervised or clustering approaches, with limited success in specialized domains like automotive communication systems where scalable solutions are essential. We propose a novel decoder-only Large Language Model (LLM) to detect anomalies in Electronic Control Unit (ECU) communication logs. Our approach addresses two key challenges: the lack of LLMs tailored for ECU communication and the complexity of inconsistent ground truth data. By learning from UDP communication logs, we formulate anomaly detection simply as identifying deviations in time from normal behavior. We introduce an entropy regularization technique that increases model’s uncertainty in known anomalies while maintaining consistency in similar scenarios. Our solution offers three novelties: a decoder-only anomaly detection architecture, a way to handle inconsistent labeling, and an adaptable LLM for different ECU communication use cases. By leveraging the generative capabilities of decoder-only models, we present a new technique that addresses the high cost and error-prone nature of manual labeling through a more scalable system that is able to learn from a minimal set of examples, while improving detection accuracy in complex communication environments.

nan


Article 474

Title@2025-07-01 (2): LangTime: A Language-Guided Unified Model for Time Series Forecasting with Proximal Policy Optimization

Title: LangTime: A Language-Guided Unified Model for Time Series Forecasting with Proximal Policy Optimization LangTime: Ein sprachgeführtes einheitliches Modell für die Vorhersage von Zeitreihen mit proximaler Politikoptimierung LangTime:一个语言指导统一模型,用于以最优政策优化方式预测时间序列 2503.08271v2

Authors (6): Wenzhe Niu, Zongxia Xie, Yanru Sun, Wei He, Man Xu, Chao Hao

Recent research has shown an increasing interest in utilizing pre-trained large language models (LLMs) for a variety of time series applications. However, there are three main challenges when using LLMs as foundational models for time series forecasting: (1) Cross-domain generalization. (2) Cross-modality alignment. (3) Error accumulation in autoregressive frameworks. To address these challenges, we proposed LangTime, a language-guided unified model for time series forecasting that incorporates cross-domain pre-training with reinforcement learning-based fine-tuning. Specifically, LangTime constructs Temporal Comprehension Prompts (TCPs), which include dataset-wise and channel-wise instructions, to facilitate domain adaptation and condense time series into a single token, enabling LLMs to understand better and align temporal data. To improve autoregressive forecasting, we introduce TimePPO, a reinforcement learning-based fine-tuning algorithm. TimePPO mitigates error accumulation by leveraging a multidimensional rewards function tailored for time series and a repeat-based value estimation strategy. Extensive experiments demonstrate that LangTime achieves state-of-the-art cross-domain forecasting performance, while TimePPO fine-tuning effectively enhances the stability and accuracy of autoregressive forecasting.

nan


Article 475

Title@2025-07-01 (2): On best approximation by multivariate ridge functions with applications to generalized translation networks

Title: On best approximation by multivariate ridge functions with applications to generalized translation networks Bei der besten Annäherung durch multivariate Ridge-Funktionen mit Anwendungen zu generalisierten Übersetzungsnetzwerken 多变量山脊功能与通用翻译网络应用的最佳近似率 2412.08453v3

Authors (4): Paul Geuchen, Palina Salanevich, Olov Schavemaker, Felix Voigtlaender

In this paper, we prove sharp upper and lower bounds for the approximation of Sobolev functions by sums of multivariate ridge functions, i.e., for approximation by functions of the form $\mathbb{R}^d \ni x \mapsto \sum_{k=1}^n \varrho_k(A_k x) \in \mathbb{R}$ with $\varrho_k : \mathbb{R}^\ell \to \mathbb{R}$ and $A_k \in \mathbb{R}^{\ell \times d}$. We show that the order of approximation asymptotically behaves as $n^{-r/(d-\ell)}$, where $r$ is the regularity (order of differentiability) of the Sobolev functions to be approximated. Our lower bound even holds when approximating $L^\infty$-Sobolev functions of regularity $r$ with error measured in $L^1$, while our upper bound applies to the approximation of $L^p$-Sobolev functions in $L^p$ for any $1 \leq p \leq \infty$. These bounds generalize well-known results regarding the approximation properties of univariate ridge functions to the multivariate case. We use our results to obtain sharp asymptotic bounds for the approximation of Sobolev functions using generalized translation networks and complex-valued neural networks.

nan


Article 476

Title: EvoPress: Accurate Dynamic Model Compression via Evolutionary Search EvoPress: Präzise dynamische Modellkompression durch evolutionäre Suche Evopress: 通过进化搜索准确的动态模型压缩 2410.14649v2

Authors (4): Oliver Sieberling, Denis Kuznedelev, Eldar Kurtic, Dan Alistarh

The high computational costs of large language models (LLMs) have led to a flurry of research on LLM compression, via methods such as quantization, sparsification, or structured pruning. A new frontier in this area is given by dynamic, non-uniform compression methods, which adjust the compression levels (e.g., sparsity) per-block or even per-layer in order to minimize accuracy loss, while guaranteeing a global compression threshold. Yet, current methods rely on estimating the importance of a given layer, implicitly assuming that layers contribute independently to the overall compression error. We begin from the motivating observation that this independence assumption does not generally hold for LLM compression: pruning a model further may even significantly recover performance. To address this, we propose EvoPress, a novel evolutionary framework for dynamic LLM compression. By formulating dynamic compression as a general optimization problem, EvoPress identifies optimal compression profiles in a highly efficient manner, and generalizes across diverse models and compression techniques. Via EvoPress, we achieve state-of-the-art performance for dynamic compression of Llama, Mistral, and Phi models, setting new benchmarks for structural pruning (block/layer dropping), unstructured sparsity, and quantization with dynamic bitwidths. Our code is available at https://github.com/IST-DASLab/EvoPress}.

nan


Article 477

Title@2025-07-01 (2): Provenance Tracking in Large-Scale Machine Learning Systems

Title: Provenance Tracking in Large-Scale Machine Learning Systems Provenienzverfolgung in großformatigen Maschinen-Lernsystemen 大型机器学习系统中的证书追踪系统 2507.01075v1

Authors (3): Gabriele Padovani, Valentine Anantharaj, Sandro Fiore

As the demand for large scale AI models continues to grow, the optimization of their training to balance computational efficiency, execution time, accuracy and energy consumption represents a critical multidimensional challenge. Achieving this balance requires not only innovative algorithmic techniques and hardware architectures but also comprehensive tools for monitoring, analyzing, and understanding the underlying processes involved in model training and deployment. Provenance data information about the origins, context, and transformations of data and processes has become a key component in this pursuit. By leveraging provenance, researchers and engineers can gain insights into resource usage patterns, identify inefficiencies, and ensure reproducibility and accountability in AI development workflows. For this reason, the question of how distributed resources can be optimally utilized to scale large AI models in an energy efficient manner is a fundamental one. To support this effort, we introduce the yProv4ML library, a tool designed to collect provenance data in JSON format, compliant with the W3C PROV and ProvML standards. yProv4ML focuses on flexibility and extensibility, and enables users to integrate additional data collection tools via plugins. The library is fully integrated with the yProv framework, allowing for higher level pairing in tasks run also through workflow management systems.

nan


Article 478

Title@2025-07-01 (2): A Probabilistic Approach to Wildfire Spread Prediction Using a Denoising Diffusion Surrogate Model

Title: A Probabilistic Approach to Wildfire Spread Prediction Using a Denoising Diffusion Surrogate Model Ein probabilistischer Ansatz zur Wildfire-Spread-Vorhersage mit einem Denoising Diffusion Surrogate-Modell 利用低度扩散扩散代用品模型对野火扩散预测采取概率办法 2507.00761v1

Authors (6): Wenbo Yu, Anirbit Ghosh, Tobias Sebastian Finn, Rossella Arcucci, Marc Bocquet, Sibo Cheng

Thanks to recent advances in generative AI, computers can now simulate realistic and complex natural processes. We apply this capability to predict how wildfires spread, a task made difficult by the unpredictable nature of fire and the variety of environmental conditions it depends on. In this study, We present the first denoising diffusion model for predicting wildfire spread, a new kind of AI framework that learns to simulate fires not just as one fixed outcome, but as a range of possible scenarios. By doing so, it accounts for the inherent uncertainty of wildfire dynamics, a feature that traditional models typically fail to represent. Unlike deterministic approaches that generate a single prediction, our model produces ensembles of forecasts that reflect physically meaningful distributions of where fire might go next. This technology could help us develop smarter, faster, and more reliable tools for anticipating wildfire behavior, aiding decision-makers in fire risk assessment and response planning.

nan


Article 479

Title@2025-07-01 (2): Chameleon: A MatMul-Free Temporal Convolutional Network Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Data

Title: Chameleon: A MatMul-Free Temporal Convolutional Network Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Data Chameleon: Ein matmul-freies Temporal Convolutional Network Accelerator für Ende-zu-Ende wenig scharfes und kontinuierliches Lernen aus sequentiellen Daten 变色素:从序列数据中连续到最后小热和连续学习的 Mat-Mul-无时空演变网络加速器 2505.24852v2

Authors (2): Douwe den Blanken, Charlotte Frenkel

On-device learning at the edge enables low-latency, private personalization with improved long-term robustness and reduced maintenance costs. Yet, achieving scalable, low-power end-to-end on-chip learning, especially from real-world sequential data with a limited number of examples, is an open challenge. Indeed, accelerators supporting error backpropagation optimize for learning performance at the expense of inference efficiency, while simplified learning algorithms often fail to reach acceptable accuracy targets. In this work, we present Chameleon, leveraging three key contributions to solve these challenges. (i) A unified learning and inference architecture supports few-shot learning (FSL), continual learning (CL) and inference at only 0.5% area overhead to the inference logic. (ii) Long temporal dependencies are efficiently captured with temporal convolutional networks (TCNs), enabling the first demonstration of end-to-end on-chip FSL and CL on sequential data and inference on 16-kHz raw audio. (iii) A dual-mode, matrix-multiplication-free compute array allows either matching the power consumption of state-of-the-art inference-only keyword spotting (KWS) accelerators or enabling $4.3\times$ higher peak GOPS. Fabricated in 40-nm CMOS, Chameleon sets new accuracy records on Omniglot for end-to-end on-chip FSL (96.8%, 5-way 1-shot, 98.8%, 5-way 5-shot) and CL (82.2% final accuracy for learning 250 classes with 10 shots), while maintaining an inference accuracy of 93.3% on the 12-class Google Speech Commands dataset at an extreme-edge power budget of 3.1 $\mu$W.

nan


Article 480

Title@2025-07-01 (2): SINDy on slow manifolds

Title: SINDy on slow manifolds SINDy auf langsamen Mannigfaltigkeiten 慢速方块的辛迪尼 2507.00747v1

Authors (4): Diemen Delgado-Cano, Erick Kracht, Urban Fasel, Benjamin Herrmann

The sparse identification of nonlinear dynamics (SINDy) has been established as an effective method to learn interpretable models of dynamical systems from data. However, for high-dimensional slow-fast dynamical systems, the regression problem becomes simultaneously computationally intractable and ill-conditioned. Although, in principle, modeling only the dynamics evolving on the underlying slow manifold addresses both of these challenges, the truncated fast variables have to be compensated by including higher-order nonlinearities as candidate terms for the model, leading to an explosive growth in the size of the SINDy library. In this work, we develop a SINDy variant that is able to robustly and efficiently identify slow-fast dynamics in two steps: (i) identify the slow manifold, that is, an algebraic equation for the fast variables as functions of the slow ones, and (ii) learn a model for the dynamics of the slow variables restricted to the manifold. Critically, the equation learned in (i) is leveraged to build a manifold-informed function library for (ii) that contains only essential higher-order nonlinearites as candidate terms. Rather than containing all monomials of up to a certain degree, the resulting custom library is a sparse subset of the latter that is tailored to the specific problem at hand. The approach is demonstrated on numerical examples of a snap-through buckling beam and the flow over a NACA 0012 airfoil. We find that our method significantly reduces both the condition number and the size of the SINDy library, thus enabling accurate identification of the dynamics on slow manifolds.

nan


Article 481

Title@2025-07-01 (2): Evaluating LLMs and Prompting Strategies for Automated Hardware Diagnosis from Textual User-Reports

Title: Evaluating LLMs and Prompting Strategies for Automated Hardware Diagnosis from Textual User-Reports Bewertung von LLMs und Prompting-Strategien für automatisierte Hardware-Diagnose aus Text-User-Reports 从文字用户报告中评价LLMs和自动硬件诊断快速战略 2507.00742v1

Authors (6): Carlos Caminha, Maria de Lourdes M. Silva, Iago C. Chaves, Felipe T. Brito, Victor A. E. Farias, Javam C. Machado

Computer manufacturers offer platforms for users to describe device faults using textual reports such as “My screen is flickering”. Identifying the faulty component from the report is essential for automating tests and improving user experience. However, such reports are often ambiguous and lack detail, making this task challenging. Large Language Models (LLMs) have shown promise in addressing such issues. This study evaluates 27 open-source models (1B-72B parameters) and 2 proprietary LLMs using four prompting strategies: Zero-Shot, Few-Shot, Chain-of-Thought (CoT), and CoT+Few-Shot (CoT+FS). We conducted 98,948 inferences, processing over 51 million input tokens and generating 13 million output tokens. We achieve f1-score up to 0.76. Results show that three models offer the best balance between size and performance: mistral-small-24b-instruct and two smaller models, llama-3.2-1b-instruct and gemma-2-2b-it, that offer competitive performance with lower VRAM usage, enabling efficient inference on end-user devices as modern laptops or smartphones with NPUs.

nan


Article 482

Title@2025-07-01 (2): Conditional Local Independence Testing for Dynamic Causal Discovery

Title: Conditional Local Independence Testing for Dynamic Causal Discovery Conditional Local Independence Testing für dynamische Kausal Entdeckung 动态原因发现有条件的局部独立测试 2506.07844v2

Authors (3): Mingzhou Liu, Xinwei Sun, Yizhou Wang

Inferring causal relationships from dynamical systems is the central interest of many scientific inquiries. Conditional Local Independence (CLI), which describes whether the evolution of one process is influenced by another process given additional processes, is important for causal learning in such systems. However, existing CLI tests were limited to counting processes. In this paper, we propose a nonparametric CLT test for It\^o processes. Specifically, we first introduce a testing statistic based on the Local Covariance Measure (LCM) by constructing a martingale from the conditional expectation of the process of interest. For estimation, we propose an efficient estimator based on the optimal filtering equation, which can achieve root-N consistency. To establish the asymptotic level and power of the test, we relax the restrictive boundedness condition to a moment bound condition, which is practical for It\^o processes. We verify the proposed test in synthetic and real-world experiments.

nan


Article 483

Title@2025-07-01 (2): Machine Learning-Based Analysis of ECG and PCG Signals for Rheumatic Heart Disease Detection: A Scoping Review (2015-2025)

Title: Machine Learning-Based Analysis of ECG and PCG Signals for Rheumatic Heart Disease Detection: A Scoping Review (2015-2025) Machine Learning-based Analysis of EKG and PCG Signals for Rheumatic Heart Disease Detection: A Scoping Review (2015-2025) 以机械学习为基础,对用于风湿心脏病检测的ECG和PCG信号进行机学分析:范围审查(2015-2025年) 2505.18182v2

Authors (9): Damilare Emmanuel Olatunji, Julius Dona Zannu, Carine Pierrette Mukamakuza, Godbright Nixon Uiso, Chol Buol, Mona Mamoun Mubarak Aman, John Bosco Thuo, Nchofon Tagha Ghogomu, Evelyne Umubyeyi

AI-powered stethoscopes offer a promising alternative for screening rheumatic heart disease (RHD), particularly in regions with limited diagnostic infrastructure. Early detection is vital, yet echocardiography, the gold standard tool, remains largely inaccessible in low-resource settings due to cost and workforce constraints. This review systematically examines machine learning (ML) applications from 2015 to 2025 that analyze electrocardiogram (ECG) and phonocardiogram (PCG) data to support accessible, scalable screening of all RHD variants in relation to the World Heart Federation’s “25 by 25” goal to reduce RHD mortality. Using PRISMA-ScR guidelines, 37 peer-reviewed studies were selected from PubMed, IEEE Xplore, Scopus, and Embase. Convolutional neural networks (CNNs) dominate recent efforts, achieving a median accuracy of 97.75%, F1-score of 0.95, and AUROC of 0.89. However, challenges remain: 73% of studies used single-center datasets, 81.1% relied on private data, only 10.8% were externally validated, and none assessed cost-effectiveness. Although 45.9% originated from endemic regions, few addressed demographic diversity or implementation feasibility. These gaps underscore the disconnect between model performance and clinical readiness. Bridging this divide requires standardized benchmark datasets, prospective trials in endemic areas, and broader validation. If these issues are addressed, AI-augmented auscultation could transform cardiovascular diagnostics in underserved populations, thereby aiding early detection. This review also offers practical recommendations for building accessible ML-based RHD screening tools, aiming to close the diagnostic gap in low-resource settings where conventional auscultation may miss up to 90% of cases and echocardiography remains out of reach.

nan


Article 484

Title@2025-07-01 (2): Gradient Descent Algorithm in Hilbert Spaces under Stationary Markov Chains with $φ$- and $β$-Mixing

Title: Gradient Descent Algorithm in Hilbert Spaces under Stationary Markov Chains with $φ$- and $β$-Mixing Gradient Descent Algorithm in Hilbert Spaces unter Stationären Markov Ketten mit $φ$- und $β$-Mixing 希尔伯特空域中固定的马克夫链下的梯度底源对数值,以美元和 美元混合法混合法计算 2502.03551v2

Authors (2): Priyanka Roy, Susanne Saminger-Platz

In this paper, we study a strictly stationary Markov chain gradient descent algorithm operating in general Hilbert spaces. Our analysis focuses on the mixing coefficients of the underlying process, specifically the $\phi$- and $\beta$-mixing coefficients. Under these assumptions, we derive probabilistic upper bounds on the convergence behavior of the algorithm based on the exponential as well as the polynomial decay of the mixing coefficients.

nan


Article 485

Title@2025-07-01 (2): Towards Efficient Parametric State Estimation in Circulating Fuel Reactors with Shallow Recurrent Decoder Networks

Title: Towards Efficient Parametric State Estimation in Circulating Fuel Reactors with Shallow Recurrent Decoder Networks Auf dem Weg zu einer effizienten parametrischen staatlichen Schätzung bei Brennstoffreaktoren mit shallow recurrent Decoder-Netzwerken 利用浅层经常拆解网络对循环燃料反应堆进行高效参数国家估算 2503.08904v2

Authors (4): Stefano Riva, Carolina Introini, J. Nathan Kutz, Antonio Cammi

The recent developments in data-driven methods have paved the way to new methodologies to provide accurate state reconstruction of engineering systems; nuclear reactors represent particularly challenging applications for this task due to the complexity of the strongly coupled physics involved and the extremely harsh and hostile environments, especially for new technologies such as Generation-IV reactors. Data-driven techniques can combine different sources of information, including computational proxy models and local noisy measurements on the system, to robustly estimate the state. This work leverages the novel Shallow Recurrent Decoder architecture to infer the entire state vector (including neutron fluxes, precursors concentrations, temperature, pressure and velocity) of a reactor from three out-of-core time-series neutron flux measurements alone. In particular, this work extends the standard architecture to treat parametric time-series data, ensuring the possibility of investigating different accidental scenarios and showing the capabilities of this approach to provide an accurate state estimation in various operating conditions. This paper considers as a test case the Molten Salt Fast Reactor (MSFR), a Generation-IV reactor concept, characterised by strong coupling between the neutronics and the thermal hydraulics due to the liquid nature of the fuel. The promising results of this work are further strengthened by the possibility of quantifying the uncertainty associated with the state estimation, due to the considerably low training cost. The accurate reconstruction of every characteristic field in real-time makes this approach suitable for monitoring and control purposes in the framework of a reactor digital twin.

nan


Article 486

Title@2025-07-01 (2): Aleatoric and Epistemic Uncertainty Measures for Ordinal Classification through Binary Reduction

Title: Aleatoric and Epistemic Uncertainty Measures for Ordinal Classification through Binary Reduction Aleatorische und epistemische Unsicherheitsmaßnahmen für die Ordnungsklassifizierung durch Binärreduktion 通过减少二进制分级对奥氏分类采取疏离和分导不确定性措施 2507.00733v1

Authors (2): Stefan Haas, Eyke Hüllermeier

Ordinal classification problems, where labels exhibit a natural order, are prevalent in high-stakes fields such as medicine and finance. Accurate uncertainty quantification, including the decomposition into aleatoric (inherent variability) and epistemic (lack of knowledge) components, is crucial for reliable decision-making. However, existing research has primarily focused on nominal classification and regression. In this paper, we introduce a novel class of measures of aleatoric and epistemic uncertainty in ordinal classification, which is based on a suitable reduction to (entropy- and variance-based) measures for the binary case. These measures effectively capture the trade-off in ordinal classification between exact hit-rate and minimial error distances. We demonstrate the effectiveness of our approach on various tabular ordinal benchmark datasets using ensembles of gradient-boosted trees and multi-layer perceptrons for approximate Bayesian inference. Our method significantly outperforms standard and label-wise entropy and variance-based measures in error detection, as indicated by misclassification rates and mean absolute error. Additionally, the ordinal measures show competitive performance in out-of-distribution (OOD) detection. Our findings highlight the importance of considering the ordinal nature of classification problems when assessing uncertainty.

nan


Article 487

Title@2025-07-01 (2): Guided Unconditional and Conditional Generative Models for Super-Resolution and Inference of Quasi-Geostrophic Turbulence

Title: Guided Unconditional and Conditional Generative Models for Super-Resolution and Inference of Quasi-Geostrophic Turbulence Geführte und bedingte Generative Modelle für Super-Resolution und Schlussfolgerung der Quasi-Geostrophen Turbulenz 指导关于超分辨率和推论的无附加条件和有条件生成模型, 2507.00719v1

Authors (3): Anantha Narayanan Suresh Babu, Akhil Sadam, Pierre F. J. Lermusiaux

Typically, numerical simulations of the ocean, weather, and climate are coarse, and observations are sparse and gappy. In this work, we apply four generative diffusion modeling approaches to super-resolution and inference of forced two-dimensional quasi-geostrophic turbulence on the beta-plane from coarse, sparse, and gappy observations. Two guided approaches minimally adapt a pre-trained unconditional model: SDEdit modifies the initial condition, and Diffusion Posterior Sampling (DPS) modifies the reverse diffusion process score. The other two conditional approaches, a vanilla variant and classifier-free guidance, require training with paired high-resolution and observation data. We consider eight test cases spanning: two regimes, eddy and anisotropic-jet turbulence; two Reynolds numbers, 10^3 and 10^4; and two observation types, 4x coarse-resolution fields and coarse, sparse and gappy observations. Our comprehensive skill metrics include norms of the reconstructed vorticity fields, turbulence statistical quantities, and quantification of the super-resolved probabilistic ensembles and their errors. We also study the sensitivity to tuning parameters such as guidance strength. Results show that SDEdit generates unphysical fields, while DPS generates reasonable reconstructions at low computational cost but with smoothed fine-scale features. Both conditional approaches require re-training, but they reconstruct missing fine-scale features, are cycle-consistent with observations, and possess the correct statistics such as energy spectra. Further, their mean model errors are highly correlated with and predictable from their ensemble standard deviations. Results highlight the trade-offs between ease of implementation, fidelity (sharpness), and cycle-consistency of the diffusion models, and offer practical guidance for deployment in geophysical inverse problems.

nan


Article 488

Title@2025-07-01 (2): Leveraging Nested MLMC for Sequential Neural Posterior Estimation with Intractable Likelihoods

Title: Leveraging Nested MLMC for Sequential Neural Posterior Estimation with Intractable Likelihoods Leveraging Nested MLMC für sequentielle neurale hintere Abschätzung mit intractable Likelihoods 利用内嵌的 MLMC 进行序列神经波的序列神经波、与可吸引的隐隐性进行的其他估计 2401.16776v2

Authors (3): Xiliang Yang, Yifei Xiong, Zhijian He

There has been a growing interest in studying sequential neural posterior estimation (SNPE) techniques for their advantages in dealing with simulation-based models with intractable likelihoods. They are devoted to learning the posterior from adaptively proposed simulations using neural network-based conditional density estimators. As a SNPE technique, the automatic posterior transformation (APT) method proposed by Greenberg et al. (2019) performs notably and scales to high dimensional data. However, the APT method bears the computation of an expectation of the logarithm of an intractable normalizing constant, i.e., a nested expectation. Although atomic APT was proposed to solve this by discretizing the normalizing constant, it remains challenging to analyze the convergence of learning. In this paper, we propose a nested APT method to estimate the involved nested expectation instead. This facilitates establishing the convergence analysis. Since the nested estimators for the loss function and its gradient are biased, we make use of unbiased multi-level Monte Carlo (MLMC) estimators for debiasing. To further reduce the excessive variance of the unbiased estimators, this paper also develops some truncated MLMC estimators by taking account of the trade-off between the bias and the average cost. Numerical experiments for approximating complex posteriors with multimodal in moderate dimensions are provided.

nan


Article 489

Title@2025-07-01 (2): Large Reasoning Models are not thinking straight: on the unreliability of thinking trajectories

Title: Large Reasoning Models are not thinking straight: on the unreliability of thinking trajectories Große Vernunftmodelle denken nicht gerade: über die Unzuverlässigkeit von Denkbahnen 大型理性模型思维不直:思考轨迹不可靠 2507.00711v1

Authors (3): Jhouben Cuesta-Ramirez, Samuel Beaussant, Mehdi Mounsif

Large Language Models (LLMs) trained via Reinforcement Learning (RL) have recently achieved impressive results on reasoning benchmarks. Yet, growing evidence shows that these models often generate longer but ineffective chains of thought (CoTs), calling into question whether benchmark gains reflect real reasoning improvements. We present new evidence of overthinking, where models disregard correct solutions even when explicitly provided, instead continuing to generate unnecessary reasoning steps that often lead to incorrect conclusions. Experiments on three state-of-the-art models using the AIME2024 math benchmark reveal critical limitations in these models ability to integrate corrective information, posing new challenges for achieving robust and interpretable reasoning.

nan


Article 490

Title@2025-07-01 (2): Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning

Title: Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning Sliding Puzzles Gym: Ein skalierbarer Benchmark für die staatliche Repräsentation im visuellen Stärkungslernen Sliding 谜题 Gym: 国家参与视觉强化学习的可缩放基准 2410.14038v4

Authors (6): Bryan L. M. de Oliveira, Luana G. B. Martins, Bruno Brandão, Murilo L. da Luz, Telma W. de L. Soares, Luckeciano C. Melo

Effective visual representation learning is crucial for reinforcement learning (RL) agents to extract task-relevant information from raw sensory inputs and generalize across diverse environments. However, existing RL benchmarks lack the ability to systematically evaluate representation learning capabilities in isolation from other learning challenges. To address this gap, we introduce the Sliding Puzzles Gym (SPGym), a novel benchmark that transforms the classic 8-tile puzzle into a visual RL task with images drawn from arbitrarily large datasets. SPGym’s key innovation lies in its ability to precisely control representation learning complexity through adjustable grid sizes and image pools, while maintaining fixed environment dynamics, observation, and action spaces. This design enables researchers to isolate and scale the visual representation challenge independently of other learning components. Through extensive experiments with model-free and model-based RL algorithms, we uncover fundamental limitations in current methods’ ability to handle visual diversity. As we increase the pool of possible images, all algorithms exhibit in- and out-of-distribution performance degradation, with sophisticated representation learning techniques often underperforming simpler approaches like data augmentation. These findings highlight critical gaps in visual representation learning for RL and establish SPGym as a valuable tool for driving progress in robust, generalizable decision-making systems.

nan


Article 491

Title@2025-07-01 (2): SCAWaveNet: A Spatial-Channel Attention-based Network for Global Significant Wave Height Retrieval

Title: SCAWaveNet: A Spatial-Channel Attention-based Network for Global Significant Wave Height Retrieval SCAWaveNet: Ein räumlich-channelbasiertes Netzwerk für die weltweite Erfassung signifikanter Wellenhöhen SCAWaveNet:全球重大波高高度回收空间通道关注网 2507.00701v1

Authors (5): Chong Zhang, Xichao Liu, Yibing Zhan, Dapeng Tao, Jun Ni

Recent advancements in spaceborne GNSS missions have produced extensive global datasets, providing a robust basis for deep learning-based significant wave height (SWH) retrieval. While existing deep learning models predominantly utilize CYGNSS data with four-channel information, they often adopt single-channel inputs or simple channel concatenation without leveraging the benefits of cross-channel information interaction during training. To address this limitation, a novel spatial-channel attention-based network, namely SCAWaveNet, is proposed for SWH retrieval. Specifically, features from each channel of the DDMs are modeled as independent attention heads, enabling the fusion of spatial and channel-wise information. For auxiliary parameters, a lightweight attention mechanism is designed to assign weights along the spatial and channel dimensions. The final feature integrates both spatial and channel-level characteristics. Model performance is evaluated using four-channel CYGNSS data. When ERA5 is used as a reference, SCAWaveNet achieves an average RMSE of 0.438 m. When using buoy data from NDBC, the average RMSE reaches 0.432 m. Compared to state-of-the-art models, SCAWaveNet reduces the average RMSE by at least 3.52% on the ERA5 dataset and by 5.47% on the NDBC buoy observations. The code is available at https://github.com/Clifx9908/SCAWaveNet.

nan


Article 492

Title@2025-07-01 (2): T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Title: T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT T2I-R1: Verstärkung der Bildgenerierung mit kollaborativem Semantik- und Token-Level CoT T2I-R1:与合作语义级和Token 级COT加强图像生成 2505.00703v2

Authors (9): Dongzhi Jiang, Ziyu Guo, Renrui Zhang, Zhuofan Zong, Hao Li, Le Zhuo, Shilin Yan, Pheng-Ann Heng, Hongsheng Li

Recent advancements in large language models have demonstrated how chain-of-thought (CoT) and reinforcement learning (RL) can improve performance. However, applying such reasoning strategies to the visual generation domain remains largely unexplored. In this paper, we present T2I-R1, a novel reasoning-enhanced text-to-image generation model, powered by RL with a bi-level CoT reasoning process. Specifically, we identify two levels of CoT that can be utilized to enhance different stages of generation: (1) the semantic-level CoT for high-level planning of the prompt and (2) the token-level CoT for low-level pixel processing during patch-by-patch generation. To better coordinate these two levels of CoT, we introduce BiCoT-GRPO with an ensemble of generation rewards, which seamlessly optimizes both generation CoTs within the same training step. By applying our reasoning strategies to the baseline model, Janus-Pro, we achieve superior performance with 13% improvement on T2I-CompBench and 19% improvement on the WISE benchmark, even surpassing the state-of-the-art model FLUX.1. Code is available at: https://github.com/CaraJ7/T2I-R1

nan


Article 493

Title@2025-07-01 (2): A Test-Function Approach to Incremental Stability

Title: A Test-Function Approach to Incremental Stability Ein Test-Funktions-Ansatz zur Inkrementellen Stabilität 递增稳定检验功能办法 2507.00695v1

Authors (3): Daniel Pfrommer, Max Simchowitz, Ali Jadbabaie

This paper presents a novel framework for analyzing Incremental-Input-to-State Stability ($\delta$ISS) based on the idea of using rewards as “test functions.” Whereas control theory traditionally deals with Lyapunov functions that satisfy a time-decrease condition, reinforcement learning (RL) value functions are constructed by exponentially decaying a Lipschitz reward function that may be non-smooth and unbounded on both sides. Thus, these RL-style value functions cannot be directly understood as Lyapunov certificates. We develop a new equivalence between a variant of incremental input-to-state stability of a closed-loop system under given a policy, and the regularity of RL-style value functions under adversarial selection of a H"older-continuous reward function. This result highlights that the regularity of value functions, and their connection to incremental stability, can be understood in a way that is distinct from the traditional Lyapunov-based approach to certifying stability in control theory.

nan


Article 494

Title@2025-07-01 (2): Iterative Resolution of Prompt Ambiguities Using a Progressive Cutting-Search Approach

Title: Iterative Resolution of Prompt Ambiguities Using a Progressive Cutting-Search Approach Iterative Auflösung von Prompt-Ambiguitäten mittels eines progressiven Cutting-Search-Ansatzes 采用逐步切割和搜寻办法迅速解决问题 2505.02952v2

Authors (1): Fabrizio Marozzo

Generative AI systems have revolutionized human interaction by enabling natural language-based coding and problem solving. However, the inherent ambiguity of natural language often leads to imprecise instructions, forcing users to iteratively test, correct, and resubmit their prompts. We propose an iterative approach that systematically narrows down these ambiguities through a structured series of clarification questions and alternative solution proposals, illustrated with input/output examples as well. Once every uncertainty is resolved, a final, precise solution is generated. Evaluated on a diverse dataset spanning coding, data analysis, and creative writing, our method demonstrates superior accuracy, competitive resolution times, and higher user satisfaction compared to conventional one-shot solutions, which typically require multiple manual iterations to achieve a correct output.

nan


Article 495

Title@2025-07-01 (2): Diffusion Classifier Guidance for Non-robust Classifiers

Title: Diffusion Classifier Guidance for Non-robust Classifiers Diffusions-Klassifikator-Leitfaden für nichtrobuste Klassifikatoren 非紫外分类的传播分类指南 2507.00687v1

Authors (4): Philipp Vaeth, Dibyanshu Kumar, Benjamin Paassen, Magda Gregorová

Classifier guidance is intended to steer a diffusion process such that a given classifier reliably recognizes the generated data point as a certain class. However, most classifier guidance approaches are restricted to robust classifiers, which were specifically trained on the noise of the diffusion forward process. We extend classifier guidance to work with general, non-robust, classifiers that were trained without noise. We analyze the sensitivity of both non-robust and robust classifiers to noise of the diffusion process on the standard CelebA data set, the specialized SportBalls data set and the high-dimensional real-world CelebA-HQ data set. Our findings reveal that non-robust classifiers exhibit significant accuracy degradation under noisy conditions, leading to unstable guidance gradients. To mitigate these issues, we propose a method that utilizes one-step denoised image predictions and implements stabilization techniques inspired by stochastic optimization methods, such as exponential moving averages. Experimental results demonstrate that our approach improves the stability of classifier guidance while maintaining sample diversity and visual quality. This work contributes to advancing conditional sampling techniques in generative models, enabling a broader range of classifiers to be used as guidance classifiers.

nan


Article 496

Title@2025-07-01 (2): Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer

Title: Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer Testen der Spin-Bad-Ansicht der Selbstachtung: Eine Hamiltonian Analyse von GPT-2 Transformer 测试自觉自觉的自吹泡泡视图:汉密尔顿对GPT-2变形器的分析 2507.00683v1

Authors (2): Satadeep Bhattacharjee, Seung-Cheol Lee

The recently proposed physics-based framework by Huo and Johnson~\cite{huo2024capturing} models the attention mechanism of Large Language Models (LLMs) as an interacting two-body spin system, offering a first-principles explanation for phenomena like repetition and bias. Building on this hypothesis, we extract the complete Query-Key weight matrices from a production-grade GPT-2 model and derive the corresponding effective Hamiltonian for every attention head. From these Hamiltonians we obtain analytic \textit{phase boundaries} logit gap criteria that predict which token should dominate the next-token distribution for a given context. A systematic evaluation on 144 heads across 20 factual-recall prompts reveals a strong negative correlation between the theoretical logit gaps and the model’s empirical token rankings ($r\approx-0.70$, $p<10^{-3}$).Targeted ablations further show that suppressing the heads most aligned with the spin-bath predictions induces the anticipated shifts in output probabilities, confirming a causal link rather than a coincidental association. Taken together, our findings provide the first strong empirical evidence for the spin-bath analogy in a production-grade model. This validation not only furnishes a tractable, physics-inspired lens for interpretability but also provides the groundwork for novel generative models, bridging the gap between theoretical condensed matter physics and AI.

nan


Article 497

Title@2025-07-01 (2): Learning dynamical systems from data: Gradient-based dictionary optimization

Title: Learning dynamical systems from data: Gradient-based dictionary optimization Dynamische Systeme aus Daten lernen: Gradient-basierte Wörterbuchoptimierung 从数据中学习动态系统:基于梯度的字典优化 2411.04775v2

Authors (3): Mohammad Tabish, Neil K. Chada, Stefan Klus

The Koopman operator plays a crucial role in analyzing the global behavior of dynamical systems. Existing data-driven methods for approximating the Koopman operator or discovering the governing equations of the underlying system typically require a fixed set of basis functions, also called dictionary. The optimal choice of basis functions is highly problem-dependent and often requires domain knowledge. We present a novel gradient descent-based optimization framework for learning suitable and interpretable basis functions from data and show how it can be used in combination with EDMD, SINDy, and PDE-FIND. We illustrate the efficacy of the proposed approach with the aid of various benchmark problems such as the Ornstein-Uhlenbeck process, Chua’s circuit, a nonlinear heat equation, as well as protein-folding data.

nan


Article 498

Title@2025-07-01 (2): Harnessing the Power of Reinforcement Learning for Adaptive MCMC

Title: Harnessing the Power of Reinforcement Learning for Adaptive MCMC Die Kraft des Stärkungslernens für adaptive MCMC nutzen 利用加强学习能力促进适应性MCMC 2507.00671v1

Authors (5): Congye Wang, Matthew A. Fisher, Heishiro Kanagawa, Wilson Chen, Chris. J. Oates

Sampling algorithms drive probabilistic machine learning, and recent years have seen an explosion in the diversity of tools for this task. However, the increasing sophistication of sampling algorithms is correlated with an increase in the tuning burden. There is now a greater need than ever to treat the tuning of samplers as a learning task in its own right. In a conceptual breakthrough, Wang et al (2025) formulated Metropolis-Hastings as a Markov decision process, opening up the possibility for adaptive tuning using Reinforcement Learning (RL). Their emphasis was on theoretical foundations; realising the practical benefit of Reinforcement Learning Metropolis-Hastings (RLMH) was left for subsequent work. The purpose of this paper is twofold: First, we observe the surprising result that natural choices of reward, such as the acceptance rate, or the expected squared jump distance, provide insufficient signal for training RLMH. Instead, we propose a novel reward based on the contrastive divergence, whose superior performance in the context of RLMH is demonstrated. Second, we explore the potential of RLMH and present adaptive gradient-based samplers that balance flexibility of the Markov transition kernel with learnability of the associated RL task. A comprehensive simulation study using the posteriordb benchmark supports the practical effectiveness of RLMH.

nan


Article 499

Title@2025-07-01 (2): The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes

Title: The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes Die Zahl der Prüfungen in Unendlichen-Horizon allgemeinen-Utility Markov Entscheidungsprozesse 无限期 – – 霍里松通用通用马尔科夫决定程序中的审理事项数目 2409.15128v2

Authors (3): Pedro P. Santos, Alberto Sardinha, Francisco S. Melo

The general-utility Markov decision processes (GUMDPs) framework generalizes the MDPs framework by considering objective functions that depend on the frequency of visitation of state-action pairs induced by a given policy. In this work, we contribute with the first analysis on the impact of the number of trials, i.e., the number of randomly sampled trajectories, in infinite-horizon GUMDPs. We show that, as opposed to standard MDPs, the number of trials plays a key-role in infinite-horizon GUMDPs and the expected performance of a given policy depends, in general, on the number of trials. We consider both discounted and average GUMDPs, where the objective function depends, respectively, on discounted and average frequencies of visitation of state-action pairs. First, we study policy evaluation under discounted GUMDPs, proving lower and upper bounds on the mismatch between the finite and infinite trials formulations for GUMDPs. Second, we address average GUMDPs, studying how different classes of GUMDPs impact the mismatch between the finite and infinite trials formulations. Third, we provide a set of empirical results to support our claims, highlighting how the number of trajectories and the structure of the underlying GUMDP influence policy evaluation.

nan


Article 500

Title@2025-07-01 (2): Continuous Wavelet Transform and Siamese Network-Based Anomaly Detection in Multi-variate Semiconductor Process Time Series

Title: Continuous Wavelet Transform and Siamese Network-Based Anomaly Detection in Multi-variate Semiconductor Process Time Series Kontinuierliche Wavelet-Transformation und siamesische Netzwerk-basierte Anomalie-Erkennung in Multivariate Semiconductor Prozesszeitreihe 多变半导体工艺时间序列中的连续波盘变换和暹粒网络反异探测 2507.01999v1

Authors (4): Bappaditya Dey, Daniel Sorensen, Minjin Hwang, Sandip Halder

Semiconductor manufacturing is an extremely complex process, characterized by thousands of interdependent parameters collected across diverse tools and process steps. Multi-variate time-series (MTS) analysis has emerged as a critical methodology for enabling real-time monitoring, fault detection, and predictive maintenance in such environments. However, anomaly prediction in semiconductor fabrication presents several critical challenges, including high data dimensionality, severe class imbalance due to the rarity of true faults, noisy and missing measurements, and non-stationary behavior of production systems. Furthermore, the complex interdependencies between variables and the delayed emergence of faults across downstream stages complicate both anomaly detection and root-cause-analysis. This paper presents a novel and generic approach for anomaly detection in MTS data using machine learning. The proposed methodology consists of three main steps: a) converting MTS data into image-based representations using the Continuous Wavelet Transform, b) developing a multi-class image classifier by fine-tuning a pretrained VGG-16 architecture on custom CWT image datasets, and c) constructing a Siamese network composed of two identical sub-networks, each utilizing the fine-tuned VGG-16 as a backbone. The network takes pairs of CWT images as input -one serving as a reference or anchor (representing a known-good signal), and the other as a query (representing an unknown signal). The model then compares the embeddings of both inputs to determine whether they belong to the same class at a given time step. Our approach demonstrates high accuracy in identifying anomalies on a real FAB process time-series dataset, offering a promising solution for offline anomaly detection in process and tool trace data. Moreover, the approach is flexible and can be applied in both supervised and semi-supervised settings.

nan


Article 501

Title@2025-07-01 (2): Audio-3DVG: Unified Audio - Point Cloud Fusion for 3D Visual Grounding

Title: Audio-3DVG: Unified Audio - Point Cloud Fusion for 3D Visual Grounding Audio-3DVG: Unified Audio - Point Cloud Fusion für 3D Visual Grounding 音频-3DVG:3D视觉地面定位的统一音频 - 点云融合 2507.00669v1

Authors (8): Duc Cao-Dinh, Khai Le-Duc, Anh Dao, Bach Phan Tat, Chris Ngo, Duy M. H. Nguyen, Nguyen X. Khanh, Thanh Nguyen-Tang

3D Visual Grounding (3DVG) involves localizing target objects in 3D point clouds based on natural language. While prior work has made strides using textual descriptions, leveraging spoken language-known as Audio-based 3D Visual Grounding-remains underexplored and challenging. Motivated by advances in automatic speech recognition (ASR) and speech representation learning, we propose Audio-3DVG, a simple yet effective framework that integrates audio and spatial information for enhanced grounding. Rather than treating speech as a monolithic input, we decompose the task into two complementary components. First, we introduce Object Mention Detection, a multi-label classification task that explicitly identifies which objects are referred to in the audio, enabling more structured audio-scene reasoning. Second, we propose an Audio-Guided Attention module that captures interactions between candidate objects and relational speech cues, improving target discrimination in cluttered scenes. To support benchmarking, we synthesize audio descriptions for standard 3DVG datasets, including ScanRefer, Sr3D, and Nr3D. Experimental results demonstrate that Audio-3DVG not only achieves new state-of-the-art performance in audio-based grounding, but also competes with text-based methods-highlighting the promise of integrating spoken language into 3D vision tasks.

nan


Article 502

Title@2025-07-01 (2): Identity Preserving 3D Head Stylization with Multiview Score Distillation

Title: Identity Preserving 3D Head Stylization with Multiview Score Distillation Identität 3D-Head-Stylisierung mit Multiview-Score-Destillation erhalten 身份 保留 3D 带有多视图分数蒸馏的顶板化 2411.13536v2

Authors (4): Bahri Batuhan Bilecen, Ahmet Berke Gokmen, Furkan Guzelant, Aysegul Dundar

3D head stylization transforms realistic facial features into artistic representations, enhancing user engagement across gaming and virtual reality applications. While 3D-aware generators have made significant advancements, many 3D stylization methods primarily provide near-frontal views and struggle to preserve the unique identities of original subjects, often resulting in outputs that lack diversity and individuality. This paper addresses these challenges by leveraging the PanoHead model, synthesizing images from a comprehensive 360-degree perspective. We propose a novel framework that employs negative log-likelihood distillation (LD) to enhance identity preservation and improve stylization quality. By integrating multi-view grid score and mirror gradients within the 3D GAN architecture and introducing a score rank weighing technique, our approach achieves substantial qualitative and quantitative improvements. Our findings not only advance the state of 3D head stylization but also provide valuable insights into effective distillation processes between diffusion models and GANs, focusing on the critical issue of identity preservation. Please visit the https://three-bee.github.io/head_stylization for more visuals.

nan


Article 503

Title@2025-07-01 (2): Neural Augmented Kalman Filters for Road Network assisted GNSS positioning

Title: Neural Augmented Kalman Filters for Road Network assisted GNSS positioning Neural Augmented Kalman Filter für Straßennetz unterstützt GNSS-Positionierung 协助导航卫星系统定位的公路网络神经增强卡尔曼过滤器 2507.00654v1

Authors (4): Hans van Gorp, Davide Belli, Amir Jalalirad, Bence Major

The Global Navigation Satellite System (GNSS) provides critical positioning information globally, but its accuracy in dense urban environments is often compromised by multipath and non-line-of-sight errors. Road network data can be used to reduce the impact of these errors and enhance the accuracy of a positioning system. Previous works employing road network data are either limited to offline applications, or rely on Kalman Filter (KF) heuristics with little flexibility and robustness. We instead propose training a Temporal Graph Neural Network (TGNN) to integrate road network information into a KF. The TGNN is designed to predict the correct road segment and its associated uncertainty to be used in the measurement update step of the KF. We validate our approach with real-world GNSS data and open-source road networks, observing a 29% decrease in positioning error for challenging scenarios compared to a GNSS-only KF. To the best of our knowledge, ours is the first deep learning-based approach jointly employing road network data and GNSS measurements to determine the user position on Earth.

nan


Article 504

Title@2025-07-01 (2): Cognitive Load-Aware Inference: A Neuro-Symbolic Framework for Optimizing the Token Economy of Large Language Models

Title: Cognitive Load-Aware Inference: A Neuro-Symbolic Framework for Optimizing the Token Economy of Large Language Models Kognitive Load-Aware-Schlussfolgerung: Ein neuro-symbolisches Framework zur Optimierung der Token-Ökonomie großer Sprachmodelle 认知式负载- 软件提醒推断:一个优化大语言模型调制经济的神经- 交元框架 2507.00653v1

Authors (1): Yilun Zhang

The escalating computational costs of Large Language Model (LLM) inference have become a critical barrier to their widespread and sustainable deployment. While existing optimization strategies are effective, they are predominantly based on statistical heuristics or architectural modifications, lacking a guiding cognitive theory to manage the inference process itself. This paper aims to bridge this gap by introducing a novel paradigm: the Cognitive Load-Aware Inference (CLAI) framework, which operationalizes principles from Cognitive Load Theory (CLT) and neuroscience for LLM inference. We formalize the concepts of Intrinsic Cognitive Load, Extraneous Cognitive Load, and Germane Cognitive Load into quantifiable LLM metrics ($ICL_{LLM}$, $ECL_{LLM}$, and $GCL_{LLM}$), thereby reframing the inference process as a cognitive economics optimization problem: based on the intrinsic complexity of a problem ($ICL_{LLM}$), minimize wasteful computation ($ECL_{LLM}$), and strategically allocate the token budget to productive reasoning ($GCL_{LLM}$). We propose two implementation paths: CLAI-Prompt, a zero-shot method that guides a base LLM through cognitive control steps via a structured meta-prompt, and CLAI-Tune, a fine-tuned model that internalizes these principles for spontaneous cognitive economy. Across a range of benchmarks in complex reasoning, long-context question answering, and code generation, our methods achieve significant reductions in token consumption (up to 45\%) without sacrificing accuracy. Furthermore, CLAI-Tune exhibits an emergent ability to autonomously decompose difficult problems, a key characteristic of human expert cognition. This work demonstrates that by emulating the brain’s resource management strategies, we can build more efficient, robust, and capable artificial intelligence systems.

nan


Article 505

Title@2025-07-01 (2): GANs Secretly Perform Approximate Bayesian Model Selection

Title: GANs Secretly Perform Approximate Bayesian Model Selection GANs führen heimlich ungefähre bayesische Modellauswahl durch GANs GANs 秘密进行近贝耶西亚模式选择 2507.00651v1

Authors (2): Maurizio Filippone, Marius P. Linhard

Generative Adversarial Networks (GANs) are popular and successful generative models. Despite their success, optimization is notoriously challenging and they require regularization against overfitting. In this work, we explain the success and limitations of GANs by interpreting them as probabilistic generative models. This interpretation enables us to view GANs as Bayesian neural networks with partial stochasticity, allowing us to establish conditions of universal approximation. We can then cast the adversarial-style optimization of several variants of GANs as the optimization of a proxy for the marginal likelihood. Taking advantage of the connection between marginal likelihood optimization and Occam’s razor, we can define regularization and optimization strategies to smooth the loss landscape and search for solutions with minimum description length, which are associated with flat minima and good generalization. The results on a wide range of experiments indicate that these strategies lead to performance improvements and pave the way to a deeper understanding of regularization strategies for GANs.

nan


Article 506

Title@2025-07-01 (2): Cooperative Sheaf Neural Networks

Title: Cooperative Sheaf Neural Networks Kooperative Schaf-Neural-Netzwerke 合作社牛羊神经网络 2507.00647v1

Authors (5): André Ribeiro, Ana Luiza Tenório, Juan Belieni, Amauri H. Souza, Diego Mesquita

Sheaf diffusion has recently emerged as a promising design pattern for graph representation learning due to its inherent ability to handle heterophilic data and avoid oversmoothing. Meanwhile, cooperative message passing has also been proposed as a way to enhance the flexibility of information diffusion by allowing nodes to independently choose whether to propagate/gather information from/to neighbors. A natural question ensues: is sheaf diffusion capable of exhibiting this cooperative behavior? Here, we provide a negative answer to this question. In particular, we show that existing sheaf diffusion methods fail to achieve cooperative behavior due to the lack of message directionality. To circumvent this limitation, we introduce the notion of cellular sheaves over directed graphs and characterize their in- and out-degree Laplacians. We leverage our construction to propose Cooperative Sheaf Neural Networks (CSNNs). Theoretically, we characterize the receptive field of CSNN and show it allows nodes to selectively attend (listen) to arbitrarily far nodes while ignoring all others in their path, potentially mitigating oversquashing. Our experiments show that CSNN presents overall better performance compared to prior art on sheaf diffusion as well as cooperative graph neural networks.

nan


Article 507

Title@2025-07-01 (2): Hebbian Physics Networks: A Self-Organizing Computational Architecture Based on Local Physical Laws

Title: Hebbian Physics Networks: A Self-Organizing Computational Architecture Based on Local Physical Laws Hebbian Physics Networks: Eine selbstorganisierende Computational Architecture basierend auf lokalen physikalischen Gesetzen Hebbian物理网络:基于当地物理法的自组织计算架构 2507.00641v1

Authors (3): Gunjan Auti, Hirofumi Daiguji, Gouhei Tanaka

Traditional machine learning approaches in physics rely on global optimization, limiting interpretability and enforcing physical constraints externally. We introduce the Hebbian Physics Network (HPN), a self-organizing computational framework in which learning emerges from local Hebbian updates driven by violations of conservation laws. Grounded in non-equilibrium thermodynamics and inspired by Prigogine/’s theory of dissipative structures, HPNs eliminate the need for global loss functions by encoding physical laws directly into the system/’s local dynamics. Residuals - quantified imbalances in continuity, momentum, or energy - serve as thermodynamic signals that drive weight adaptation through generalized Hebbian plasticity. We demonstrate this approach on incompressible fluid flow and continuum diffusion, where physically consistent structures emerge from random initial conditions without supervision. HPNs reframe computation as a residual-driven thermodynamic process, offering an interpretable, scalable, and physically grounded alternative for modeling complex dynamical systems.

nan


Article 508

Title@2025-07-01 (2): Mirror Online Conformal Prediction with Intermittent Feedback

Title: Mirror Online Conformal Prediction with Intermittent Feedback Spiegel Online Konforme Vorhersage mit intermittierendem Feedback 附带间断反馈的 “ 镜 “ 在线非正式预测 2503.10345v4

Authors (3): Bowen Wang, Matteo Zecchin, Osvaldo Simeone

Online conformal prediction enables the runtime calibration of a pre-trained artificial intelligence model using feedback on its performance. Calibration is achieved through set predictions that are updated via online rules so as to ensure long-term coverage guarantees. While recent research has demonstrated the benefits of incorporating prior knowledge into the calibration process, this has come at the cost of replacing coverage guarantees with less tangible regret guarantees based on the quantile loss. This work introduces intermittent mirror online conformal prediction (IM-OCP), a novel runtime calibration framework that integrates prior knowledge, operates under potentially intermittent feedback, and features minimal memory complexity. IM-OCP guarantees long-term coverage and sub-linear regret, both of which hold deterministically for any given data sequence and in expectation with respect to the intermittent feedback.

nan


Article 509

Title@2025-07-01 (2): Forward Reverse Kernel Regression for the Schrödinger bridge problem

Title: Forward Reverse Kernel Regression for the Schrödinger bridge problem Rückwärts-Kernel-Regression für das Schrödinger-Brückenproblem Schrödinger桥桥问题前向内核倒退 2507.00640v1

Authors (2): Denis Belomestny, John. Schoenmakers

In this paper, we study the Schr"odinger Bridge Problem (SBP), which is central to entropic optimal transport. For general reference processes and begin–endpoint distributions, we propose a forward-reverse iterative Monte Carlo procedure to approximate the Schr"odinger potentials in a nonparametric way. In particular, we use kernel based Monte Carlo regression in the context of Picard iteration of a corresponding fixed point problem. By preserving in the iteration positivity and contractivity in a Hilbert metric sense, we develop a provably convergent algorithm. Furthermore, we provide convergence rates for the potential estimates and prove their optimality. Finally, as an application, we propose a non-nested Monte Carlo procedure for the final dimensional distributions of the Schr"odinger Bridge process, based on the constructed potentials and the forward-reverse simulation method for conditional diffusions.

nan


Article 510

Title@2025-07-01 (2): Training-Conditional Coverage Bounds under Covariate Shift

Title: Training-Conditional Coverage Bounds under Covariate Shift Training-Conditional Coverage Bounds unter Kovariate Shift 共同变换下的培训 – – 有条件的覆盖范围 2405.16594v2

Authors (2): Mehrdad Pournaderi, Yu Xiang

Conformal prediction methodology has recently been extended to the covariate shift setting, where the distribution of covariates differs between training and test data. While existing results ensure that the prediction sets from these methods achieve marginal coverage above a nominal level, their coverage rate conditional on the training dataset (referred to as training-conditional coverage) remains unexplored. In this paper, we address this gap by deriving upper bounds on the tail of the training-conditional coverage distribution, offering probably approximately correct (PAC) guarantees for these methods. Our results quantify the relationship between the quality of the prediction sets and the severity of distributional changes, and can potentially be used to compute more efficient prediction sets.

nan


Article 511

Title@2025-07-01 (2): A novel Trunk Branch-net PINN for flow and heat transfer prediction in porous medium

Title: A novel Trunk Branch-net PINN for flow and heat transfer prediction in porous medium Ein neuartiges Trunk Branch-Net PINN für die Vorhersage von Durchfluss und Wärmeübertragung im porösen Medium 新的Trunk分支网PINN,用于对多孔介质的流量和热传输预测 2501.16362v2

Authors (6): Haoyun Xing, Kaiyan Jin, Guice Yao, Jin Zhao, Dichu Xu, Dongsheng Wen

A novel Trunk-Branch (TB)-net physics-informed neural network (PINN) architecture is developed, which is a PINN-based method incorporating trunk and branch nets to capture both global and local features. The aim is to solve four main classes of problems: forward flow problem, forward heat transfer problem, inverse heat transfer problem, and transfer learning problem within the porous medium, which are notoriously complex that could not be handled by origin PINN. In the proposed TB-net PINN architecture, a Fully-connected Neural Network (FNN) is used as the trunk net, followed by separated FNNs as the branch nets with respect to outputs, and automatic differentiation is performed for partial derivatives of outputs with respect to inputs by considering various physical loss. The effectiveness and flexibility of the novel TB-net PINN architecture is demonstrated through a collection of forward problems, and transfer learning validates the feasibility of resource reuse. Combining with the superiority over traditional numerical methods in solving inverse problems, the proposed TB-net PINN shows its great potential for practical engineering applications.

nan


Article 512

Title@2025-07-01 (2): Towards the Training of Deeper Predictive Coding Neural Networks

Title: Towards the Training of Deeper Predictive Coding Neural Networks Auf dem Weg zur Ausbildung tiefer vorausschauender Coding-Neural-Netzwerke 走向深更可预测编码神经网络培训 2506.23800v2

Authors (4): Chang Qi, Matteo Forasassi, Thomas Lukasiewicz, Tommaso Salvatori

Predictive coding networks trained with equilibrium propagation are neural models that perform inference through an iterative energy minimization process. Previous studies have demonstrated their effectiveness in shallow architectures, but show significant performance degradation when depth exceeds five to seven layers. In this work, we show that the reason behind this degradation is due to exponentially imbalanced errors between layers during weight updates, and predictions from the previous layer not being effective in guiding updates in deeper layers. We address the first issue by introducing two novel methods to optimize the latent variables that use precision-weighting to re-balance the distribution of energy among layers during the `relaxation phase’, and the second issue by proposing a novel weight update mechanism that reduces error accumulation in deeper layers. Empirically, we test our methods on a large number of image classification tasks, resulting in large improvements in test accuracy across networks with more than seven layers, with performances comparable to those of backprop on similar models. These findings suggest that a better understanding of the relaxation phase is important to train models using equilibrium propagation at scale, and open new possibilities for their application in complex tasks.

nan


Article 513

Title@2025-07-01 (2): Positive region preserved random sampling: an efficient feature selection method for massive data

Title: Positive region preserved random sampling: an efficient feature selection method for massive data Positive Region erhaltene Zufallsstichprobe: eine effiziente Feature-Auswahlmethode für massive Daten 积极区域保留随机随机抽样:大规模数据的有效特征选择方法 2507.01998v1

Authors (4): Hexiang Bai, Deyu Li, Jiye Liang, Yanhui Zhai

Selecting relevant features is an important and necessary step for intelligent machines to maximize their chances of success. However, intelligent machines generally have no enough computing resources when faced with huge volume of data. This paper develops a new method based on sampling techniques and rough set theory to address the challenge of feature selection for massive data. To this end, this paper proposes using the ratio of discernible object pairs to all object pairs that should be distinguished to measure the discriminatory ability of a feature set. Based on this measure, a new feature selection method is proposed. This method constructs positive region preserved samples from massive data to find a feature subset with high discriminatory ability. Compared with other methods, the proposed method has two advantages. First, it is able to select a feature subset that can preserve the discriminatory ability of all the features of the target massive data set within an acceptable time on a personal computer. Second, the lower boundary of the probability of the object pairs that can be discerned using the feature subset selected in all object pairs that should be distinguished can be estimated before finding reducts. Furthermore, 11 data sets of different sizes were used to validate the proposed method. The results show that approximate reducts can be found in a very short period of time, and the discriminatory ability of the final reduct is larger than the estimated lower boundary. Experiments on four large-scale data sets also showed that an approximate reduct with high discriminatory ability can be obtained in reasonable time on a personal computer.

nan


Article 514

Title@2025-07-01 (2): Geometric Gaussian Approximations of Probability Distributions

Title: Geometric Gaussian Approximations of Probability Distributions Geometrische Gauß-Annäherungen der Wahrscheinlichkeitsverteilungen 概率分布分布的近似几何几何 2507.00616v1

Authors (3): Nathaël Da Costa, Bálint Mucsányi, Philipp Hennig

Approximating complex probability distributions, such as Bayesian posterior distributions, is of central interest in many applications. We study the expressivity of geometric Gaussian approximations. These consist of approximations by Gaussian pushforwards through diffeomorphisms or Riemannian exponential maps. We first review these two different kinds of geometric Gaussian approximations. Then we explore their relationship to one another. We further provide a constructive proof that such geometric Gaussian approximations are universal, in that they can capture any probability distribution. Finally, we discuss whether, given a family of probability distributions, a common diffeomorphism can be found to obtain uniformly high-quality geometric Gaussian approximations for that family.

nan


Article 515

Title@2025-07-01 (2): Residual Reward Models for Preference-based Reinforcement Learning

Title: Residual Reward Models for Preference-based Reinforcement Learning Residual Reward-Modelle für das Preference-basierte Verstärkungslernen 以优惠为基础的强化学习的剩余奖励模式 2507.00611v1

Authors (5): Chenyang Cao, Miguel Rogel-García, Mohamed Nabail, Xueqian Wang, Nicholas Rhinehart

Preference-based Reinforcement Learning (PbRL) provides a way to learn high-performance policies in environments where the reward signal is hard to specify, avoiding heuristic and time-consuming reward design. However, PbRL can suffer from slow convergence speed since it requires training in a reward model. Prior work has proposed learning a reward model from demonstrations and fine-tuning it using preferences. However, when the model is a neural network, using different loss functions for pre-training and fine-tuning can pose challenges to reliable optimization. In this paper, we propose a method to effectively leverage prior knowledge with a Residual Reward Model (RRM). An RRM assumes that the true reward of the environment can be split into a sum of two parts: a prior reward and a learned reward. The prior reward is a term available before training, for example, a user’s ``best guess’’ reward function, or a reward function learned from inverse reinforcement learning (IRL), and the learned reward is trained with preferences. We introduce state-based and image-based versions of RRM and evaluate them on several tasks in the Meta-World environment suite. Experimental results show that our method substantially improves the performance of a common PbRL method. Our method achieves performance improvements for a variety of different types of prior rewards, including proxy rewards, a reward obtained from IRL, and even a negated version of the proxy reward. We also conduct experiments with a Franka Panda to show that our method leads to superior performance on a real robot. It significantly accelerates policy learning for different tasks, achieving success in fewer steps than the baseline. The videos are presented at https://sunlighted.github.io/RRM-web/.

nan


Article 516

Title@2025-07-01 (2): Integrating Expert Labels into LLM-based Emission Goal Detection: Example Selection vs Automatic Prompt Design

Title: Integrating Expert Labels into LLM-based Emission Goal Detection: Example Selection vs Automatic Prompt Design Integration von Experten-Etiketten in LLM-basierte Emissionszielerkennung: Beispielauswahl vs Automatisches Prompt-Design 将专家标签纳入基于LLM的LLM排放目标探测:选择实例与自动即时设计 2412.06432v2

Authors (5): Marco Wrzalik, Adrian Ulges, Anne Uersfeld, Florian Faust, Viola Campos

We address the detection of emission reduction goals in corporate reports, an important task for monitoring companies’ progress in addressing climate change. Specifically, we focus on the issue of integrating expert feedback in the form of labeled example passages into LLM-based pipelines, and compare the two strategies of (1) a dynamic selection of few-shot examples and (2) the automatic optimization of the prompt by the LLM itself. Our findings on a public dataset of 769 climate-related passages from real-world business reports indicate that automatic prompt optimization is the superior approach, while combining both methods provides only limited benefit. Qualitative results indicate that optimized prompts do indeed capture many intricacies of the targeted emission goal extraction task.

nan


Article 517

Title@2025-07-01 (2): A Practical Guide to Interpretable Role-Based Clustering in Multi-Layer Financial Networks

Title: A Practical Guide to Interpretable Role-Based Clustering in Multi-Layer Financial Networks Ein praktischer Leitfaden für ein rollenbasiertes Clustering in Multi-Layer-Finanznetzwerken 多行业金融网络基于可解释作用的集群化实用指南 2507.00600v1

Authors (3): Christian Franssen, Iman van Lelyveld, Bernd Heidergott

Understanding the functional roles of financial institutions within interconnected markets is critical for effective supervision, systemic risk assessment, and resolution planning. We propose an interpretable role-based clustering approach for multi-layer financial networks, designed to identify the functional positions of institutions across different market segments. Our method follows a general clustering framework defined by proximity measures, cluster evaluation criteria, and algorithm selection. We construct explainable node embeddings based on egonet features that capture both direct and indirect trading relationships within and across market layers. Using transaction-level data from the ECB’s Money Market Statistical Reporting (MMSR), we demonstrate how the approach uncovers heterogeneous institutional roles such as market intermediaries, cross-segment connectors, and peripheral lenders or borrowers. The results highlight the flexibility and practical value of role-based clustering in analyzing financial networks and understanding institutional behavior in complex market structures.

nan


Article 518

Title@2025-07-01 (2): Hierarchical Decentralized Stochastic Control for Cyber-Physical Systems

Title: Hierarchical Decentralized Stochastic Control for Cyber-Physical Systems Hierarchische dezentrale stochastische Steuerung für Cyber-Physische Systeme 网络物理系统等级分层存储控制 2506.22971v2

Authors (3): Kesav Kaza, Ramachandran Anantharaman, Rahul Meshram

This paper presents a two-timescale hierarchical decentralized architecture for control of Cyber-Physical Systems. The architecture consists of $N$ independent sub-processes, a global controller, and $N$ local controllers, each formulated as a Markov Decision Process (MDP). The global controller, operating at a slower timescale optimizes the infinite-horizon discounted cumulative reward under budget constraints. For the local controllers, operating at a faster timescale, we propose two different optimization frameworks, namely the COpt and FOpt. In the COpt framework, the local controller also optimizes an infinite-horizon MDP, while in the FOpt framework, the local controller optimizes a finite-horizon MDP. The FOpt framework mimics a federal structure, where the local controllers have more autonomy in their decision making. First, the existence of stationary deterministic optimal policies for both these frameworks is established. Then, various relationships between the two frameworks are studied, including a bound on the difference between the two optimal value functions. Additionally, sufficiency conditions are provided such that the two frameworks lead to the same optimal values.

nan


Article 519

Title@2025-07-01 (2): Multiresolution Analysis and Statistical Thresholding on Dynamic Networks

Title: Multiresolution Analysis and Statistical Thresholding on Dynamic Networks Multiauflösungsanalyse und statistische Thresholding auf dynamischen Netzwerken 多分辨率分析和统计动态网络 2506.01208v2

Authors (4): Raphaël Romero, Tijl De Bie, Nick Heard, Alexander Modell

Detecting structural change in dynamic network data has wide-ranging applications. Existing approaches typically divide the data into time bins, extract network features within each bin, and then compare these features over time. This introduces an inherent tradeoff between temporal resolution and the statistical stability of the extracted features. Despite this tradeoff, reminiscent of time-frequency tradeoffs in signal processing, most methods rely on a fixed temporal resolution. Choosing an appropriate resolution parameter is typically difficult and can be especially problematic in domains like cybersecurity, where anomalous behavior may emerge at multiple time scales. We address this challenge by proposing ANIE (Adaptive Network Intensity Estimation), a multi-resolution framework designed to automatically identify the time scales at which network structure evolves, enabling the joint detection of both rapid and gradual changes. Modeling interactions as Poisson processes, our method proceeds in two steps: (1) estimating a low-dimensional subspace of node behavior, and (2) deriving a set of novel empirical affinity coefficients that quantify change in interaction intensity between latent factors and support statistical testing for structural change across time scales. We provide theoretical guarantees for subspace estimation and the asymptotic behavior of the affinity coefficients, enabling model-based change detection. Experiments on synthetic networks show that ANIE adapts to the appropriate time resolution and is able to capture sharp structural changes while remaining robust to noise. Furthermore, applications to real-world data showcase the practical benefits of ANIE’s multiresolution approach to detecting structural change over fixed resolution methods.

nan


Article 520

Title@2025-07-01 (2): Downscaling Neural Network for Coastal Simulations

Title: Downscaling Neural Network for Coastal Simulations Downscaling Neural Network für Küstensimulationen 沿海模拟的降尺度神经网络 2408.16553v2

Authors (4): Zhi-Song Liu, Markus Buttner, Vadym Aizinger, Andreas Rupp

Learning the fine-scale details of a coastal ocean simulation from a coarse representation is a challenging task. For real-world applications, high-resolution simulations are necessary to advance understanding of many coastal processes, specifically, to predict flooding resulting from tsunamis and storm surges. We propose a Downscaling Neural Network for Coastal Simulation (DNNCS) for spatiotemporal enhancement to efficiently learn the high-resolution numerical solution. Given images of coastal simulations produced on low-resolution computational meshes using low polynomial order discontinuous Galerkin discretizations and a coarse temporal resolution, the proposed DNNCS learns to produce high-resolution free surface elevation and velocity visualizations in both time and space. To efficiently model the dynamic changes over time and space, we propose grid-aware spatiotemporal attention to project the temporal features to the spatial domain for non-local feature matching. The coordinate information is also utilized via positional encoding. For the final reconstruction, we use the spatiotemporal bilinear operation to interpolate the missing frames and then expand the feature maps to the frequency domain for residual mapping. Besides data-driven losses, the proposed physics-informed loss guarantees gradient consistency and momentum changes. Their combination contributes to the overall 24% improvements in Root Mean Square Error (RMSE). To train the proposed model, we propose a novel coastal simulation dataset and use it for model optimization and evaluation. Our method shows superior downscaling quality and fast computation compared to the state-of-the-art methods.

nan


Article 521

Title@2025-07-01 (2): Binned semiparametric Bayesian networks

Title: Binned semiparametric Bayesian networks Verbundene semiparametrische Bayesische Netze Binned半参数网络 2506.21997v2

Authors (4): Rafael Sojo, Javier Díaz-Rozo, Concha Bielza, Pedro Larrañaga

This paper introduces a new type of probabilistic semiparametric model that takes advantage of data binning to reduce the computational cost of kernel density estimation in nonparametric distributions. Two new conditional probability distributions are developed for the new binned semiparametric Bayesian networks, the sparse binned kernel density estimation and the Fourier kernel density estimation. These two probability distributions address the curse of dimensionality, which typically impacts binned models, by using sparse tensors and restricting the number of parent nodes in conditional probability calculations. To evaluate the proposal, we perform a complexity analysis and conduct several comparative experiments using synthetic data and datasets from the UCI Machine Learning repository. The experiments include different binning rules, parent restrictions, grid sizes, and number of instances to get a holistic view of the model’s behavior. As a result, our binned semiparametric Bayesian networks achieve structural learning and log-likelihood estimations with no statistically significant differences compared to the semiparametric Bayesian networks, but at a much higher speed. Thus, the new binned semiparametric Bayesian networks prove to be a reliable and more efficient alternative to their non-binned counterparts.

nan


Article 522

Title@2025-07-01 (2): Quantum Circuit Structure Optimization for Quantum Reinforcement Learning

Title: Quantum Circuit Structure Optimization for Quantum Reinforcement Learning Quantum Circuit Structure Optimization for Quantum Reinforcement Learning 量子强化学习的量子电路结构优化 2507.00589v1

Authors (2): Seok Bin Son, Joongheon Kim

Reinforcement learning (RL) enables agents to learn optimal policies through environmental interaction. However, RL suffers from reduced learning efficiency due to the curse of dimensionality in high-dimensional spaces. Quantum reinforcement learning (QRL) addresses this issue by leveraging superposition and entanglement in quantum computing, allowing efficient handling of high-dimensional problems with fewer resources. QRL combines quantum neural networks (QNNs) with RL, where the parameterized quantum circuit (PQC) acts as the core computational module. The PQC performs linear and nonlinear transformations through gate operations, similar to hidden layers in classical neural networks. Previous QRL studies, however, have used fixed PQC structures based on empirical intuition without verifying their optimality. This paper proposes a QRL-NAS algorithm that integrates quantum neural architecture search (QNAS) to optimize PQC structures within QRL. Experiments demonstrate that QRL-NAS achieves higher rewards than QRL with fixed circuits, validating its effectiveness and practical utility.

nan


Article 523

Title@2025-07-01 (2): AI-Generated Video Detection via Perceptual Straightening

Title: AI-Generated Video Detection via Perceptual Straightening KI-generierte Videoerkennung durch Wahrnehmungsjustierung AI - 通过感知直线探测视频 2507.00583v1

Authors (6): Christian Internò, Robert Geirhos, Markus Olhofer, Sunny Liu, Barbara Hammer, David Klindt

The rapid advancement of generative AI enables highly realistic synthetic videos, posing significant challenges for content authentication and raising urgent concerns about misuse. Existing detection methods often struggle with generalization and capturing subtle temporal inconsistencies. We propose ReStraV(Representation Straightening Video), a novel approach to distinguish natural from AI-generated videos. Inspired by the “perceptual straightening” hypothesis – which suggests real-world video trajectories become more straight in neural representation domain – we analyze deviations from this expected geometric property. Using a pre-trained self-supervised vision transformer (DINOv2), we quantify the temporal curvature and stepwise distance in the model’s representation domain. We aggregate statistics of these measures for each video and train a classifier. Our analysis shows that AI-generated videos exhibit significantly different curvature and distance patterns compared to real videos. A lightweight classifier achieves state-of-the-art detection performance (e.g., 97.17% accuracy and 98.63% AUROC on the VidProM benchmark), substantially outperforming existing image- and video-based methods. ReStraV is computationally efficient, it is offering a low-cost and effective detection solution. This work provides new insights into using neural representation geometry for AI-generated video detection.

nan


Article 524

Title@2025-07-01 (2): Rotational Sampling: A Plug-and-Play Encoder for Rotation-Invariant 3D Molecular GNNs

Title: Rotational Sampling: A Plug-and-Play Encoder for Rotation-Invariant 3D Molecular GNNs Rotationsproben: Ein Plug-and-Play-Encoder für Rotations-Invariante 3D-Molekular-GNNs 旋转取样:旋转-变量3D分子GNNs的插管和布边编码器 2507.01073v1

Authors (1): Dian Jin

Graph neural networks (GNNs) have achieved remarkable success in molecular property prediction. However, traditional graph representations struggle to effectively encode the inherent 3D spatial structures of molecules, as molecular orientations in 3D space introduce significant variability, severely limiting model generalization and robustness. Existing approaches primarily focus on rotation-invariant and rotation-equivariant methods. Invariant methods often rely heavily on prior knowledge and lack sufficient generalizability, while equivariant methods suffer from high computational costs. To address these limitations, this paper proposes a novel plug-and-play 3D encoding module leveraging rotational sampling. By computing the expectation over the SO(3) rotational group, the method naturally achieves approximate rotational invariance. Furthermore, by introducing a carefully designed post-alignment strategy, strict invariance can be achieved without compromising performance. Experimental evaluations on the QM9 and C10 Datasets demonstrate superior predictive accuracy, robustness, and generalization performance compared to existing methods. Moreover, the proposed approach maintains low computational complexity and enhanced interpretability, providing a promising direction for efficient and effective handling of 3D molecular information in drug discovery and material design.

nan


Article 525

Title@2025-07-01 (2): Integrating Dual Prototypes for Task-Wise Adaption in Pre-Trained Model-Based Class-Incremental Learning

Title: Integrating Dual Prototypes for Task-Wise Adaption in Pre-Trained Model-Based Class-Incremental Learning Integration von Dual Prototypen für die Task-Wise-Anpassung in vorgefertigtes modellbasiertes Klassen-Inkrementelles Lernen 将双重原型结合,用于在培训前的基于模型的高级课堂强化学习中进行任务-任务-任务-调整适应 2411.17766v3

Authors (5): Zhiming Xu, Suorong Yang, Baile Xu, Furao Shen, Jian Zhao

Class-incremental learning (CIL) aims to acquire new classes while conserving historical knowledge incrementally. Despite existing pre-trained model (PTM) based methods performing excellently in CIL, it is better to fine-tune them on downstream incremental tasks with massive patterns unknown to PTMs. However, using task streams for fine-tuning could lead to \textit{catastrophic forgetting} that will erase the knowledge in PTMs. This paper proposes the Dual Prototype network for Task-wise Adaption (DPTA) of PTM-based CIL. For each incremental learning task, an adapter module is built to fine-tune the PTM, where the center-adapt loss forces the representation to be more centrally clustered and class separable. The dual prototype network improves the prediction process by enabling test-time adapter selection, where the raw prototypes deduce several possible task indexes of test samples to select suitable adapter modules for PTM, and the augmented prototypes that could separate highly correlated classes are utilized to determine the final result. Experiments on several benchmark datasets demonstrate the excellent performance of DPTA. Code is available in https://github.com/Yorkxzm/DPTA

nan


Article 526

Title@2025-07-01 (2): Foundation Models for Clinical Records at Health System Scale

Title: Foundation Models for Clinical Records at Health System Scale Gründungsmodelle für klinische Aufzeichnungen auf der Health System Scale 保健系统规模的临床记录模型基础 2507.00574v1

Authors (8): Haresh Rengaraj Rajamohan, Xiang Gao, Weicheng Zhu, Shih-Lun Huang, Long Chen, Kyunghyun Cho, Cem M. Deniz, Narges Razavian

Large-scale pretraining has transformed modeling of language and other data types, but its potential remains underexplored in healthcare with structured electronic health records (EHRs). We present a novel generative pretraining strategy for sequential EHR data using next-visit event prediction. Our model learns to autoregressively generate various tokenized clinical events for the next visit based on patient history and inherently handles the joint prediction of heterogeneous data types. Additionally, we introduce regularization on predicting repeated events and highlight a key pitfall in EHR-based foundation model evaluations: repeated event tokens can inflate performance metrics when new onsets are not distinguished from subsequent occurrences. Our model is evaluated via zero-shot prediction for forecasting dementia and knee osteoarthritis incidence within 2 and 5 years, and the model performance rivals a fully fine-tuned masked pretrained Transformer baseline, demonstrating that our approach captures complex clinical dependencies without requiring costly task-specific fine-tuning.

nan


Article 527

Title@2025-07-01 (2): The Curse of Depth in Large Language Models

Title: The Curse of Depth in Large Language Models Der Fluch der Tiefe in großen Sprachmodellen 大语言模型深度诅咒 2502.05795v2

Authors (6): Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefeng Zheng, Shiwei Liu

In this paper, we introduce the Curse of Depth, a concept that highlights, explains, and addresses the recent observation in modern Large Language Models (LLMs) where nearly half of the layers are less effective than expected. We first confirm the wide existence of this phenomenon across the most popular families of LLMs such as Llama, Mistral, DeepSeek, and Qwen. Our analysis, theoretically and empirically, identifies that the underlying reason for the ineffectiveness of deep layers in LLMs is the widespread usage of Pre-Layer Normalization (Pre-LN). While Pre-LN stabilizes the training of Transformer LLMs, its output variance exponentially grows with the model depth, which undesirably causes the derivative of the deep Transformer blocks to be an identity matrix, and therefore barely contributes to the training. To resolve this training pitfall, we propose LayerNorm Scaling (LNS), which scales the variance of output of the layer normalization inversely by the square root of its depth. This simple modification mitigates the output variance explosion of deeper Transformer layers, improving their contribution. Across a wide range of model sizes (130M to 7B), our experiments show that LNS consistently outperforms previous normalization and scaling techniques in enhancing LLM pre-training performance. Moreover, this improvement seamlessly carries over to supervised fine-tuning. All these gains can be attributed to the fact that LayerNorm Scaling enables deeper layers to contribute more effectively during training. Our code is available at \href{https://github.com/lmsdss/LayerNorm-Scaling}{LayerNorm-Scaling}.

nan


Article 528

Title@2025-07-01 (2): Inverse Design in Nanophotonics via Representation Learning

Title: Inverse Design in Nanophotonics via Representation Learning Inverses Design in der Nanophotonik über Representation Learning 通过代表制学习对纳米体进行反向设计 2507.00546v1

Authors (3): Reza Marzban, Ali Adibi, Raphael Pestourie

Inverse design in nanophotonics, the computational discovery of structures achieving targeted electromagnetic (EM) responses, has become a key tool for recent optical advances. Traditional intuition-driven or iterative optimization methods struggle with the inherently high-dimensional, non-convex design spaces and the substantial computational demands of EM simulations. Recently, machine learning (ML) has emerged to address these bottlenecks effectively. This review frames ML-enhanced inverse design methodologies through the lens of representation learning, classifying them into two categories: output-side and input-side approaches. Output-side methods use ML to learn a representation in the solution space to create a differentiable solver that accelerates optimization. Conversely, input-side techniques employ ML to learn compact, latent-space representations of feasible device geometries, enabling efficient global exploration through generative models. Each strategy presents unique trade-offs in data requirements, generalization capacity, and novel design discovery potentials. Hybrid frameworks that combine physics-based optimization with data-driven representations help escape poor local optima, improve scalability, and facilitate knowledge transfer. We conclude by highlighting open challenges and opportunities, emphasizing complexity management, geometry-independent representations, integration of fabrication constraints, and advancements in multiphysics co-designs.

nan


Article 529

Title@2025-07-01 (2): Not All Attention Heads Are What You Need: Refining CLIP’s Image Representation with Attention Ablation

Title: Not All Attention Heads Are What You Need: Refining CLIP’s Image Representation with Attention Ablation Nicht alle Aufmerksamkeit Köpfe sind, was Sie brauchen: Verfeinerung CLIPs Bilddarstellung mit Aufmerksamkeit Ablation 并非所有的焦点都指你需要什么:完善CLIP的“关注推介”图像代表 2507.00537v1

Authors (6): Feng Lin, Marco Chen, Haokui Zhang, Xiaotian Yu, Guangming Lu, Rong Xiao

This paper studies the role of attention heads in CLIP’s image encoder. While CLIP has exhibited robust performance across diverse applications, we hypothesize that certain attention heads negatively affect final representations and that ablating them can improve performance in downstream tasks. To capitalize on this insight, we propose a simple yet effective method, called Attention Ablation Technique (AAT), to suppress the contribution of specific heads by manipulating attention weights. By integrating two alternative strategies tailored for different application scenarios, AAT systematically identifies and ablates detrimental attention heads to enhance representation quality. Experiments demonstrate that AAT consistently improves downstream task performance across various domains, boosting recall rate by up to 11.1% on CLIP-family models for cross-modal retrieval. The results highlight the potential of AAT to effectively refine large-scale vision-language models with virtually no increase in inference cost.

nan


Article 530

Title@2025-07-01 (2): Exploring Large Action Sets with Hyperspherical Embeddings using von Mises-Fisher Sampling

Title: Exploring Large Action Sets with Hyperspherical Embeddings using von Mises-Fisher Sampling Große Action-Sets mit Hypersphärischen Einbettungen mit von Mises-Fisher Sampling erkunden 利用 von Misses-Fisher 取样法探索具有超球嵌入的大型动作集 2507.00518v1

Authors (6): Walid Bendada, Guillaume Salha-Galvan, Romain Hennequin, Théo Bontempelli, Thomas Bouabça, Tristan Cazenave

This paper introduces von Mises-Fisher exploration (vMF-exp), a scalable method for exploring large action sets in reinforcement learning problems where hyperspherical embedding vectors represent these actions. vMF-exp involves initially sampling a state embedding representation using a von Mises-Fisher distribution, then exploring this representation’s nearest neighbors, which scales to virtually unlimited numbers of candidate actions. We show that, under theoretical assumptions, vMF-exp asymptotically maintains the same probability of exploring each action as Boltzmann Exploration (B-exp), a popular alternative that, nonetheless, suffers from scalability issues as it requires computing softmax values for each action. Consequently, vMF-exp serves as a scalable alternative to B-exp for exploring large action sets with hyperspherical embeddings. Experiments on simulated data, real-world public data, and the successful large-scale deployment of vMF-exp on the recommender system of a global music streaming service empirically validate the key properties of the proposed method.

nan


Article 531

Title@2025-07-01 (2): Simulation-Efficient Cosmological Inference with Multi-Fidelity SBI

Title: Simulation-Efficient Cosmological Inference with Multi-Fidelity SBI Simulationseffiziente kosmologische Schlussfolgerung mit Multi-Fidelity SBI 多功能性模拟有效宇宙学推断 2507.00514v1

Authors (3): Leander Thiele, Adrian E. Bayer, Naoya Takeishi

The simulation cost for cosmological simulation-based inference can be decreased by combining simulation sets of varying fidelity. We propose an approach to such multi-fidelity inference based on feature matching and knowledge distillation. Our method results in improved posterior quality, particularly for small simulation budgets and difficult inference problems.

nan


Article 532

Title@2025-07-01 (2): Medical Image Segmentation Using Advanced Unet: VMSE-Unet and VM-Unet CBAM+

Title: Medical Image Segmentation Using Advanced Unet: VMSE-Unet and VM-Unet CBAM+ Medizinische Bildsegmentierung mit Advanced Unet: VMSE-Unet und VM-Unet CBAM+ 利用高级Unet:VMSE-Unet和VM-Unet CBAM+ 2507.00511v1

Authors (6): Sayandeep Kanrar, Raja Piyush, Qaiser Razi, Debanshi Chakraborty, Vikas Hassija, GSS Chalapathi

In this paper, we present the VMSE U-Net and VM-Unet CBAM+ model, two cutting-edge deep learning architectures designed to enhance medical image segmentation. Our approach integrates Squeeze-and-Excitation (SE) and Convolutional Block Attention Module (CBAM) techniques into the traditional VM U-Net framework, significantly improving segmentation accuracy, feature localization, and computational efficiency. Both models show superior performance compared to the baseline VM-Unet across multiple datasets. Notably, VMSEUnet achieves the highest accuracy, IoU, precision, and recall while maintaining low loss values. It also exhibits exceptional computational efficiency with faster inference times and lower memory usage on both GPU and CPU. Overall, the study suggests that the enhanced architecture VMSE-Unet is a valuable tool for medical image analysis. These findings highlight its potential for real-world clinical applications, emphasizing the importance of further research to optimize accuracy, robustness, and computational efficiency.

nan


Article 533

Title@2025-07-01 (2): Plastic tensor networks for interpretable generative modeling

Title: Plastic tensor networks for interpretable generative modeling Kunststoff-Tensor-Netzwerke für interpretierbare generative Modellierung 用于可解释的基因建模的可塑多压网络 2504.06722v2

Authors (4): Katsuya O. Akamatsu, Kenji Harada, Tsuyoshi Okubo, Naoki Kawashima

A structural optimization scheme for a single-layer nonnegative adaptive tensor tree (NATT) that models a target probability distribution is proposed as an alternative paradigm for generative modeling. The NATT scheme, by construction, automatically searches for a tree structure that best fits a given discrete dataset whose features serve as inputs, and has the advantage that it is interpretable as a probabilistic graphical model. We consider the NATT scheme and a recently proposed Born machine adaptive tensor tree (BMATT) optimization scheme and demonstrate their effectiveness on a variety of generative modeling tasks where the objective is to infer the hidden structure of a provided dataset. Our results show that in terms of minimizing the negative log-likelihood, the single-layer scheme has model performance comparable to the Born machine scheme, though not better. The tasks include deducing the structure of binary bitwise operations, learning the internal structure of random Bayesian networks given only visible sites, and a real-world example related to hierarchical clustering where a cladogram is constructed from mitochondrial DNA sequences. In doing so, we also show the importance of the choice of network topology and the versatility of a least-mutual information criterion in selecting a candidate structure for a tensor tree, as well as discuss aspects of these tensor tree generative models including their information content and interpretability.

nan


Article 534

Title@2025-07-01 (2): MuteSwap: Silent Face-based Voice Conversion

Title: MuteSwap: Silent Face-based Voice Conversion MuteSwap: Silent Face-based Voice Conversion MuteSwap: 沉默的面部语音转换 2507.00498v1

Authors (3): Yifan Liu, Yu Fang, Zhouhan Lin

Conventional voice conversion modifies voice characteristics from a source speaker to a target speaker, relying on audio input from both sides. However, this process becomes infeasible when clean audio is unavailable, such as in silent videos or noisy environments. In this work, we focus on the task of Silent Face-based Voice Conversion (SFVC), which does voice conversion entirely from visual inputs. i.e., given images of a target speaker and a silent video of a source speaker containing lip motion, SFVC generates speech aligning the identity of the target speaker while preserving the speech content in the source silent video. As this task requires generating intelligible speech and converting identity using only visual cues, it is particularly challenging. To address this, we introduce MuteSwap, a novel framework that employs contrastive learning to align cross-modality identities and minimize mutual information to separate shared visual features. Experimental results show that MuteSwap achieves impressive performance in both speech synthesis and identity conversion, especially under noisy conditions where methods dependent on audio input fail to produce intelligible results, demonstrating both the effectiveness of our training approach and the feasibility of SFVC.

nan


Article 535

Title@2025-07-01 (2): PNAct: Crafting Backdoor Attacks in Safe Reinforcement Learning

Title: PNAct: Crafting Backdoor Attacks in Safe Reinforcement Learning PNAct: Backdoor-Angriffe im sicheren Ausbau lernen PN法案:在安全加强学习中进行后门攻击 2507.00485v1

Authors (4): Weiran Guo, Guanjun Liu, Ziyuan Zhou, Ling Wang

Reinforcement Learning (RL) is widely used in tasks where agents interact with an environment to maximize rewards. Building on this foundation, Safe Reinforcement Learning (Safe RL) incorporates a cost metric alongside the reward metric, ensuring that agents adhere to safety constraints during decision-making. In this paper, we identify that Safe RL is vulnerable to backdoor attacks, which can manipulate agents into performing unsafe actions. First, we introduce the relevant concepts and evaluation metrics for backdoor attacks in Safe RL. It is the first attack framework in the Safe RL field that involves both Positive and Negative Action sample (PNAct) is to implant backdoors, where positive action samples provide reference actions and negative action samples indicate actions to be avoided. We theoretically point out the properties of PNAct and design an attack algorithm. Finally, we conduct experiments to evaluate the effectiveness of our proposed backdoor attack framework, evaluating it with the established metrics. This paper highlights the potential risks associated with Safe RL and underscores the feasibility of such attacks. Our code and supplementary material are available at https://github.com/azure-123/PNAct.

nan


Article 536

Title@2025-07-01 (2): UFGraphFR: Graph Federation Recommendation System based on User Text description features

Title: UFGraphFR: Graph Federation Recommendation System based on User Text description features UFGraphFR: Graph Federation Recommendation System basierend auf Funktionen der Benutzertextbeschreibung UFGgraphFR:基于用户文本描述特征的图表联合会建议系统 2501.08044v3

Authors (4): Xudong Wang, Qingbo Hao, Xu Cheng, Yingyuan Xiao

Federated learning has emerged as a key paradigm in privacy-preserving computing due to its “data usable but not visible” property, enabling users to collaboratively train models without sharing raw data. Motivated by this, federated recommendation systems offer a promising architecture that balances user privacy with recommendation accuracy through distributed collaborative learning. However, existing federated recommendation methods often neglect the underlying semantic or behavioral relationships between users during parameter aggregation, which limits their recommendation effectiveness. To overcome this limitation, graph-based federated recommendation systems have been proposed to leverage neighborhood information. Yet, conventional graph construction methods usually require access to raw user data or explicit social links, which contradicts the strict privacy requirements of federated learning. In this work, we propose UFGraphFR (User Text-feature-based Graph Federated Recommendation), a novel personalized federated recommendation framework that constructs a user graph based on clients’ locally embedded text features. Our core assumption is that users with similar textual feature descriptions exhibit similar preferences. Accordingly, UFGraphFR introduces two key components: (1) a privacy-preserving user relationship graph constructed from the joint embedding layer’s weight matrix without leaking raw user attributes; (2) a Transformer-based architecture to model temporal dependencies in user-item interaction sequences. Experimental results on benchmark datasets such as MovieLens and HetRec2011 demonstrate that UFGraphFR achieves recommendation accuracy comparable to both centralized and state-of-the-art federated baselines while preserving user privacy. The code is available at: https://github.com/trueWangSyutung/UFGraphFR.

nan


Article 537

Title@2025-07-01 (2): Physics-Aware Style Transfer for Adaptive Holographic Reconstruction

Title: Physics-Aware Style Transfer for Adaptive Holographic Reconstruction Physics-Aware Style Transfer für adaptive holographische Rekonstruktion 用于适应性全球重建的物理软件样式转让 2507.00482v1

Authors (4): Chanseok Lee, Fakhriyya Mammadova, Jiseong Barg, Mooseok Jang

Inline holographic imaging presents an ill-posed inverse problem of reconstructing objects’ complex amplitude from recorded diffraction patterns. Although recent deep learning approaches have shown promise over classical phase retrieval algorithms, they often require high-quality ground truth datasets of complex amplitude maps to achieve a statistical inverse mapping operation between the two domains. Here, we present a physics-aware style transfer approach that interprets the object-to-sensor distance as an implicit style within diffraction patterns. Using the style domain as the intermediate domain to construct cyclic image translation, we show that the inverse mapping operation can be learned in an adaptive manner only with datasets composed of intensity measurements. We further demonstrate its biomedical applicability by reconstructing the morphology of dynamically flowing red blood cells, highlighting its potential for real-time, label-free imaging. As a framework that leverages physical cues inherently embedded in measurements, the presented method offers a practical learning strategy for imaging applications where ground truth is difficult or impossible to obtain.

nan


Article 538

Title@2025-07-01 (2): Posterior Inference in Latent Space for Scalable Constrained Black-box Optimization

Title: Posterior Inference in Latent Space for Scalable Constrained Black-box Optimization Hintere Inferenz im Latent Space für skalierbare, eingeschränkte Black-Box-Optimierung 缓空空间中可缩缩的黑盒优化 2507.00480v1

Authors (5): Kiyoung Om, Kyuil Sim, Taeyoung Yun, Hyeongyu Kang, Jinkyoo Park

Optimizing high-dimensional black-box functions under black-box constraints is a pervasive task in a wide range of scientific and engineering problems. These problems are typically harder than unconstrained problems due to hard-to-find feasible regions. While Bayesian optimization (BO) methods have been developed to solve such problems, they often struggle with the curse of dimensionality. Recently, generative model-based approaches have emerged as a promising alternative for constrained optimization. However, they suffer from poor scalability and are vulnerable to mode collapse, particularly when the target distribution is highly multi-modal. In this paper, we propose a new framework to overcome these challenges. Our method iterates through two stages. First, we train flow-based models to capture the data distribution and surrogate models that predict both function values and constraint violations with uncertainty quantification. Second, we cast the candidate selection problem as a posterior inference problem to effectively search for promising candidates that have high objective values while not violating the constraints. During posterior inference, we find that the posterior distribution is highly multi-modal and has a large plateau due to constraints, especially when constraint feedback is given as binary indicators of feasibility. To mitigate this issue, we amortize the sampling from the posterior distribution in the latent space of flow-based models, which is much smoother than that in the data space. We empirically demonstrate that our method achieves superior performance on various synthetic and real-world constrained black-box optimization tasks. Our code is publicly available \href{https://github.com/umkiyoung/CiBO}{here}.

nan


Article 539

Title@2025-07-01 (2): Bisecle: Binding and Separation in Continual Learning for Video Language Understanding

Title: Bisecle: Binding and Separation in Continual Learning for Video Language Understanding Bisecle: Bindung und Trennung im kontinuierlichen Lernen für das Verständnis von Videosprachen Bisecle:为了解视频语言而持续学习中的约束和分离 2507.00469v1

Authors (5): Yue Tan, Xiaoqian Hu, Hao Xue, Celso De Melo, Flora D. Salim

Frontier vision-language models (VLMs) have made remarkable improvements in video understanding tasks. However, real-world videos typically exist as continuously evolving data streams (e.g., dynamic scenes captured by wearable glasses), necessitating models to continually adapt to shifting data distributions and novel scenarios. Considering the prohibitive computational costs of fine-tuning models on new tasks, usually, a small subset of parameters is updated while the bulk of the model remains frozen. This poses new challenges to existing continual learning frameworks in the context of large multimodal foundation models, i.e., catastrophic forgetting and update conflict. While the foundation models struggle with parameter-efficient continual learning, the hippocampus in the human brain has evolved highly efficient mechanisms for memory formation and consolidation. Inspired by the rapid Binding and pattern separation mechanisms in the hippocampus, in this work, we propose Bisecle for video-language continual learning, where a multi-directional supervision module is used to capture more cross-modal relationships and a contrastive prompt learning scheme is designed to isolate task-specific knowledge to facilitate efficient memory storage. Binding and separation processes further strengthen the ability of VLMs to retain complex experiences, enabling robust and efficient continual learning in video understanding tasks. We perform a thorough evaluation of the proposed Bisecle, demonstrating its ability to mitigate forgetting and enhance cross-task generalization on several VideoQA benchmarks.

nan


Article 540

Title@2025-07-01 (2): Diversity Conscious Refined Random Forest

Title: Diversity Conscious Refined Random Forest Diversity Bewusst Verfeinerter Zufallswald 改良的随机森林 2507.00467v1

Authors (5): Sijan Bhattarai, Saurav Bhandari, Girija Bhusal, Saroj Shakya, Tapendra Pandey

Random Forest (RF) is a widely used ensemble learning technique known for its robust classification performance across diverse domains. However, it often relies on hundreds of trees and all input features, leading to high inference cost and model redundancy. In this work, our goal is to grow trees dynamically only on informative features and then enforce maximal diversity by clustering and retaining uncorrelated trees. Therefore, we propose a Refined Random Forest Classifier that iteratively refines itself by first removing the least informative features and then analytically determines how many new trees should be grown, followed by correlation-based clustering to remove redundant trees. The classification accuracy of our model was compared against the standard RF on the same number of trees. Experiments on 8 multiple benchmark datasets, including binary and multiclass datasets, demonstrate that the proposed model achieves improved accuracy compared to standard RF.

nan


Article 541

Title@2025-07-01 (2): Recurrent Memory-Augmented Transformers with Chunked Attention for Long-Context Language Modeling

Title: Recurrent Memory-Augmented Transformers with Chunked Attention for Long-Context Language Modeling Recurrent Memory-Augmented Transformers mit Chunked Aufmerksamkeit für Lang-Kontext-Sprachenmodellierung 经常用内存增强型变换器,吸引人们注意长文语言建模 2507.00453v1

Authors (1): Ankit Kashyap

We present a Transformer architecture for long-context language modeling that combines global attention with two biologically inspired components: chunked local attention and a gated FIFO memory mechanism. This unified attention block allows the model to efficiently handle both short-range and long-range dependencies without increasing attention cost quadratically. The memory module persistently stores past token representations using a gated update mechanism inspired by recurrent networks. Rotary positional encoding is applied per attention head to enable directionally disentangled, scale-invariant positional signals. The architecture is implemented entirely from scratch in PyTorch, with no reliance on high-level libraries, enabling transparent and modular experimentation. Our model offers a lightweight and extensible design for tasks such as dialogue modeling, code completion, and document understanding.

nan


Article 542

Title@2025-07-01 (2): Best Agent Identification for General Game Playing

Title: Best Agent Identification for General Game Playing Best Agent Identification für das allgemeine Spiel 普通游戏游戏最佳代理标识 2507.00451v1

Authors (4): Matthew Stephenson, Alex Newcombe, Eric Piette, Dennis Soemers

We present an efficient and generalised procedure to accurately identify the best performing algorithm for each sub-task in a multi-problem domain. Our approach treats this as a set of best arm identification problems for multi-armed bandits, where each bandit corresponds to a specific task and each arm corresponds to a specific algorithm or agent. We propose an optimistic selection process based on the Wilson score interval (Optimistic-WS) that ranks each arm across all bandits in terms of their potential regret reduction. We evaluate the performance of Optimistic-WS on two of the most popular general game domains, the General Video Game AI (GVGAI) framework and the Ludii general game playing system, with the goal of identifying the highest performing agent for each game within a limited number of trials. Compared to previous best arm identification algorithms for multi-armed bandits, our results demonstrate a substantial performance improvement in terms of average simple regret. This novel approach can be used to significantly improve the quality and accuracy of agent evaluation procedures for general game frameworks, as well as other multi-task domains with high algorithm runtimes.

nan


Article 543

Title@2025-07-01 (2): Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention

Title: Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention Überwinden von Langkontext-Grenzen von State-Space-Modellen über Kontext-Abhängige Sparse-Achtung 克服国家空间模型通过环境依赖性分散关注而克服国家空间模型的长文限制 2507.00449v1

Authors (4): Zhihao Zhan, Jianan Zhao, Zhaocheng Zhu, Jian Tang

Efficient long-context modeling remains a critical challenge for natural language processing (NLP), as the time complexity of the predominant Transformer architecture scales quadratically with the sequence length. While state-space models (SSMs) offer alternative sub-quadratic solutions, they struggle to capture long-range dependencies effectively. In this work, we focus on analyzing and improving the long-context modeling capabilities of SSMs. We show that the widely used synthetic task, associative recall, which requires a model to recall a value associated with a single key without context, insufficiently represents the complexities of real-world long-context modeling. To address this limitation, we extend the associative recall to a novel synthetic task, \emph{joint recall}, which requires a model to recall the value associated with a key given in a specified context. Theoretically, we prove that SSMs do not have the expressiveness to solve multi-query joint recall in sub-quadratic time complexity. To resolve this issue, we propose a solution based on integrating SSMs with Context-Dependent Sparse Attention (CDSA), which has the expressiveness to solve multi-query joint recall with sub-quadratic computation. To bridge the gap between theoretical analysis and real-world applications, we propose locality-sensitive Hashing Attention with sparse Key Selection (HAX), which instantiates the theoretical solution and is further tailored to natural language domains. Extensive experiments on both synthetic and real-world long-context benchmarks show that HAX consistently outperforms SSM baselines and SSMs integrated with context-independent sparse attention (CISA).

nan


Article 544

Title@2025-07-01 (2): Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design

Title: Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design iterative Destillation für reward-guided Feintuning von Diffusionsmodellen im biomolekularen Design 生物分子设计中扩散模型的奖励-引导精细推荐热蒸馏 2507.00445v1

Authors (10): Xingyu Su, Xiner Li, Masatoshi Uehara, Sunwoo Kim, Yulai Zhao, Gabriele Scalia, Ehsan Hajiramezanali, Tommaso Biancalani, Degui Zhi, Shuiwang Ji

We address the problem of fine-tuning diffusion models for reward-guided generation in biomolecular design. While diffusion models have proven highly effective in modeling complex, high-dimensional data distributions, real-world applications often demand more than high-fidelity generation, requiring optimization with respect to potentially non-differentiable reward functions such as physics-based simulation or rewards based on scientific knowledge. Although RL methods have been explored to fine-tune diffusion models for such objectives, they often suffer from instability, low sample efficiency, and mode collapse due to their on-policy nature. In this work, we propose an iterative distillation-based fine-tuning framework that enables diffusion models to optimize for arbitrary reward functions. Our method casts the problem as policy distillation: it collects off-policy data during the roll-in phase, simulates reward-based soft-optimal policies during roll-out, and updates the model by minimizing the KL divergence between the simulated soft-optimal policy and the current model policy. Our off-policy formulation, combined with KL divergence minimization, enhances training stability and sample efficiency compared to existing RL-based methods. Empirical results demonstrate the effectiveness and superior reward optimization of our approach across diverse tasks in protein, small molecule, and regulatory DNA design.

nan


Article 545

Title@2025-07-01 (2): A Minimalist Method for Fine-tuning Text-to-Image Diffusion Models

Title: A Minimalist Method for Fine-tuning Text-to-Image Diffusion Models Minimalistische Methode zur Feinabstimmung von Text-zu-Bild-Diffusions-Modellen 微微调文本到图像传播模型的微量微调方法 2506.12036v3

Authors (4): Yanting Miao, William Loh, Pacal Poupart, Suraj Kothawade

Recent work uses reinforcement learning (RL) to fine-tune text-to-image diffusion models, improving text-image alignment and sample quality. However, existing approaches introduce unnecessary complexity: they cache the full sampling trajectory, depend on differentiable reward models or large preference datasets, or require specialized guidance techniques. Motivated by the “golden noise” hypothesis – that certain initial noise samples can consistently yield superior alignment – we introduce Noise PPO, a minimalist RL algorithm that leaves the pre-trained diffusion model entirely frozen and learns a prompt-conditioned initial noise generator. Our approach requires no trajectory storage, reward backpropagation, or complex guidance tricks. Extensive experiments show that optimizing the initial noise distribution consistently improves alignment and sample quality over the original model, with the most significant gains at low inference steps. As the number of inference steps increases, the benefit of noise optimization diminishes but remains present. These findings clarify the scope and limitations of the golden noise hypothesis and reinforce the practical value of minimalist RL fine-tuning for diffusion models.

nan


Article 546

Title@2025-07-01 (2): A Recipe for Causal Graph Regression: Confounding Effects Revisited

Title: A Recipe for Causal Graph Regression: Confounding Effects Revisited Ein Rezept für kausale Graphenregression: Konfounding Effekte wiederaufgenommen Causal 图表回归的食谱: 重审的困扰效应 2507.00440v1

Authors (4): Yujia Yin, Tianyi Qu, Zihao Wang, Yifan Chen

Through recognizing causal subgraphs, causal graph learning (CGL) has risen to be a promising approach for improving the generalizability of graph neural networks under out-of-distribution (OOD) scenarios. However, the empirical successes of CGL techniques are mostly exemplified in classification settings, while regression tasks, a more challenging setting in graph learning, are overlooked. We thus devote this work to tackling causal graph regression (CGR); to this end we reshape the processing of confounding effects in existing CGL studies, which mainly deal with classification. Specifically, we reflect on the predictive power of confounders in graph-level regression, and generalize classification-specific causal intervention techniques to regression through a lens of contrastive learning. Extensive experiments on graph OOD benchmarks validate the efficacy of our proposals for CGR. The model implementation and the code are provided on https://github.com/causal-graph/CGR.

nan


Article 547

Title@2025-07-01 (2): Autonomy by Design: Preserving Human Autonomy in AI Decision-Support

Title: Autonomy by Design: Preserving Human Autonomy in AI Decision-Support Autonomie nach Design: Erhaltung der menschlichen Autonomie in der KI-Entscheidungsunterstützung 设计自主:在AI决定支持中维护人类自主 2506.23952v2

Authors (3): Stefan Buijsman, Sarah Carter, Juan Pablo Bermúdez

AI systems increasingly support human decision-making across domains of professional, skill-based, and personal activity. While previous work has examined how AI might affect human autonomy globally, the effects of AI on domain-specific autonomy – the capacity for self-governed action within defined realms of skill or expertise – remain understudied. We analyze how AI decision-support systems affect two key components of domain-specific autonomy: skilled competence (the ability to make informed judgments within one’s domain) and authentic value-formation (the capacity to form genuine domain-relevant values and preferences). By engaging with prior investigations and analyzing empirical cases across medical, financial, and educational domains, we demonstrate how the absence of reliable failure indicators and the potential for unconscious value shifts can erode domain-specific autonomy both immediately and over time. We then develop a constructive framework for autonomy-preserving AI support systems. We propose specific socio-technical design patterns – including careful role specification, implementation of defeater mechanisms, and support for reflective practice – that can help maintain domain-specific autonomy while leveraging AI capabilities. This framework provides concrete guidance for developing AI systems that enhance rather than diminish human agency within specialized domains of action.

nan


Article 548

Title@2025-07-01 (2): Both Asymptotic and Non-Asymptotic Convergence of Quasi-Hyperbolic Momentum using Increasing Batch Size

Title: Both Asymptotic and Non-Asymptotic Convergence of Quasi-Hyperbolic Momentum using Increasing Batch Size Sowohl asymptotische als auch nicht-asymptotische Konvergenz von Quasi-Hyperbolic Momentum mit zunehmender Batchgröße 使用递增批量大小的 准-双曲调运动的无症状和非症状融合 2506.23544v2

Authors (2): Kento Imaizumi, Hideaki Iiduka

Momentum methods were originally introduced for their superiority to stochastic gradient descent (SGD) in deterministic settings with convex objective functions. However, despite their widespread application to deep neural networks – a representative case of stochastic nonconvex optimization – the theoretical justification for their effectiveness in such settings remains limited. Quasi-hyperbolic momentum (QHM) is an algorithm that generalizes various momentum methods and has been studied to better understand the class of momentum-based algorithms as a whole. In this paper, we provide both asymptotic and non-asymptotic convergence results for mini-batch QHM with an increasing batch size. We show that achieving asymptotic convergence requires either a decaying learning rate or an increasing batch size. Since a decaying learning rate adversely affects non-asymptotic convergence, we demonstrate that using mini-batch QHM with an increasing batch size – without decaying the learning rate – can be a more effective strategy. Our experiments show that even a finite increase in batch size can provide benefits for training neural networks.

nan


Article 549

Title@2025-07-01 (2): RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Radiology with Zero-Shot Multi-Task Capability

Title: RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Radiology with Zero-Shot Multi-Task Capability RadZero: Ähnlichkeitsbasierte Cross-Attention für erklärbare Vision-Sprachenausrichtung in der Radiologie mit Zero-Shot-Multi-Task-Fähigkeit RadZero:在无热多任务能力的放射学中,对可解释的视觉-语言协调进行基于相似的交叉关注 2504.07416v2

Authors (4): Jonggwon Park, Soobum Kim, Byungmu Yoon, Kyoyun Choi

Recent advancements in multi-modal models have significantly improved vision-language (VL) alignment in radiology. However, existing approaches struggle to effectively utilize complex radiology reports for learning and offer limited interpretability through attention probability visualizations. To address these challenges, we introduce RadZero, a novel framework for VL alignment in radiology with zero-shot multi-task capability. A key component of our approach is VL-CABS (Vision-Language Cross-Attention Based on Similarity), which aligns text embeddings with local image features for interpretable, fine-grained VL reasoning. RadZero leverages large language models to extract concise semantic sentences from radiology reports and employs multi-positive contrastive training to effectively capture relationships between images and multiple relevant textual descriptions. It uses a pre-trained vision encoder with additional trainable Transformer layers, allowing efficient high-resolution image processing. By computing similarity between text embeddings and local image patch features, VL-CABS enables zero-shot inference with similarity probability for classification, and pixel-level VL similarity maps for grounding and segmentation. Experimental results on public chest radiograph benchmarks show that RadZero outperforms state-of-the-art methods in zero-shot classification, grounding, and segmentation. Furthermore, VL similarity map analysis highlights the potential of VL-CABS for improving explainability in VL alignment. Additionally, qualitative evaluation demonstrates RadZero’s capability for open-vocabulary semantic segmentation, further validating its effectiveness in medical imaging.

nan


Article 550

Title@2025-07-01 (2): Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows

Title: Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows Flexible Sprachmodellierung im kontinuierlichen Raum mit transformerbasierten autoregressiven Strömungen 具有以变换器为基础的自动递减流动的连续空间灵活语言建模 2507.00425v1

Authors (9): Ruixiang Zhang, Shuangfei Zhai, Jiatao Gu, Yizhe Zhang, Huangjie Zheng, Tianrong Chen, Miguel Angel Bautista, Josh Susskind, Navdeep Jaitly

Autoregressive models have driven remarkable progress in language modeling. Their foundational reliance on discrete tokens, unidirectional context, and single-pass decoding, while central to their success, also inspires the exploration of a design space that could offer new axes of modeling flexibility. In this work, we explore an alternative paradigm, shifting language modeling from a discrete token space to a continuous latent space. We propose a novel framework TarFlowLM, that employs transformer-based autoregressive normalizing flows to model these continuous representations. This approach unlocks substantial flexibility, enabling the construction of models that can capture global bi-directional context through stacked, alternating-direction autoregressive transformations, support block-wise generation with flexible token patch sizes, and facilitate a hierarchical multi-pass generation process. We further propose new mixture-based coupling transformations designed to capture complex dependencies within the latent space shaped by discrete data, and demonstrate theoretical connections to conventional discrete autoregressive models. Extensive experiments on language modeling benchmarks demonstrate strong likelihood performance and highlight the flexible modeling capabilities inherent in our framework.

nan


Article 551

Title@2025-07-01 (2): Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning

Title: Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning Find a Scapegoat: Vergiftung der Mitgliedschaft Inferenzangriff und Verteidigung zu Federated Learning 寻找一条“Scamegoat”:毒瘾成员攻击和防御联邦学习组织 2507.00423v1

Authors (4): Wenjin Mo, Zhiyuan Li, Minghong Fang, Mingwei Fang

Federated learning (FL) allows multiple clients to collaboratively train a global machine learning model with coordination from a central server, without needing to share their raw data. This approach is particularly appealing in the era of privacy regulations like the GDPR, leading many prominent companies to adopt it. However, FL’s distributed nature makes it susceptible to poisoning attacks, where malicious clients, controlled by an attacker, send harmful data to compromise the model. Most existing poisoning attacks in FL aim to degrade the model’s integrity, such as reducing its accuracy, with limited attention to privacy concerns from these attacks. In this study, we introduce FedPoisonMIA, a novel poisoning membership inference attack targeting FL. FedPoisonMIA involves malicious clients crafting local model updates to infer membership information. Additionally, we propose a robust defense mechanism to mitigate the impact of FedPoisonMIA attacks. Extensive experiments across various datasets demonstrate the attack’s effectiveness, while our defense approach reduces its impact to a degree.

nan


Article 552

Title@2025-07-01 (2): Diffusion Disambiguation Models for Partial Label Learning

Title: Diffusion Disambiguation Models for Partial Label Learning Diffusion Disambiguation Modelle für partielles Label-Lernen 部分标签学习的分解差异模型 2507.00411v1

Authors (5): Jinfu Fan, Xiaohui Zhong, Kangrui Ren, Jiangnan Li, Linqing Huang

Learning from ambiguous labels is a long-standing problem in practical machine learning applications. The purpose of \emph{partial label learning} (PLL) is to identify the ground-truth label from a set of candidate labels associated with a given instance. Inspired by the remarkable performance of diffusion models in various generation tasks, this paper explores their potential to denoise ambiguous labels through the reverse denoising process. Therefore, this paper reformulates the label disambiguation problem from the perspective of generative models, where labels are generated by iteratively refining initial random guesses. This perspective enables the diffusion model to learn how label information is generated stochastically. By modeling the generation uncertainty, we can use the maximum likelihood estimate of the label for classification inference. However, such ambiguous labels lead to a mismatch between instance and label, which reduces the quality of generated data. To address this issue, this paper proposes a \emph{diffusion disambiguation model for PLL} (DDMP), which first uses the potential complementary information between instances and labels to construct pseudo-clean labels for initial diffusion training. Furthermore, a transition-aware matrix is introduced to estimate the potential ground-truth labels, which are dynamically updated during the diffusion generation. During training, the ground-truth label is progressively refined, improving the classifier. Experiments show the advantage of the DDMP and its suitability for PLL.

nan


Article 553

Title@2025-07-01 (2): Parameter-Efficient Fine-Tuning via Circular Convolution

Title: Parameter-Efficient Fine-Tuning via Circular Convolution Parameter-Effizient Feintuning über Kreiskonvolution 通过循环革命提高参数效率 2407.19342v4

Authors (7): Aochuan Chen, Jiashun Cheng, Zijing Liu, Ziqi Gao, Fugee Tsung, Yu Li, Jia Li

Low-Rank Adaptation (LoRA) has gained popularity for fine-tuning large foundation models, leveraging low-rank matrices $\mathbf{A}$ and $\mathbf{B}$ to represent weight changes (i.e., $\Delta \mathbf{W} = \mathbf{B} \mathbf{A}$). This method reduces trainable parameters and mitigates heavy memory consumption associated with full delta matrices by sequentially multiplying $\mathbf{A}$ and $\mathbf{B}$ with the activation. Despite its success, the intrinsic low-rank characteristic may limit its performance. Although several variants have been proposed to address this issue, they often overlook the crucial computational and memory efficiency brought by LoRA. In this paper, we propose Circular Convolution Adaptation (C$^3$A), which not only achieves high-rank adaptation with enhanced performance but also excels in both computational power and memory utilization. Extensive experiments demonstrate that C$^3$A consistently outperforms LoRA and its variants across various fine-tuning tasks.

nan


Article 554

Title@2025-07-01 (2): Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives

Title: Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives Lehrende Zeitreihen zum Sehen und Sprechen: Vorhersagen mit gerichteten visuellen und textuellen Perspektiven 阅读和发言教学时间系列:以一致的视觉和文字视角预测 2506.24124v2

Authors (4): Sixun Dong, Wei Fan, Teresa Wu, Yanjie Fu

Time series forecasting traditionally relies on unimodal numerical inputs, which often struggle to capture high-level semantic patterns due to their dense and unstructured nature. While recent approaches have explored representing time series as text using large language models (LLMs), these methods remain limited by the discrete nature of token sequences and lack the perceptual intuition humans typically apply, such as interpreting visual patterns. In this paper, we propose a multimodal contrastive learning framework that transforms raw time series into structured visual and textual perspectives. Rather than using natural language or real-world images, we construct both modalities directly from numerical sequences. We then align these views in a shared semantic space via contrastive learning, enabling the model to capture richer and more complementary representations. Furthermore, we introduce a variate selection module that leverages the aligned representations to identify the most informative variables for multivariate forecasting. Extensive experiments on fifteen short-term and six long-term forecasting benchmarks demonstrate that our approach consistently outperforms strong unimodal and cross-modal baselines, highlighting the effectiveness of multimodal alignment in enhancing time series forecasting. Code is available at: https://github.com/Ironieser/TimesCLIP.

nan


Article 555

Title@2025-07-01 (2): GRAND: Graph Release with Assured Node Differential Privacy

Title: GRAND: Graph Release with Assured Node Differential Privacy GRAND: Graph Release mit gesichertem Node Differential Privacy GRAND: 带有有保证的节点差异隐私的图表发布 2507.00402v1

Authors (3): Suqing Liu, Xuan Bi, Tianxi Li

Differential privacy is a well-established framework for safeguarding sensitive information in data. While extensively applied across various domains, its application to network data – particularly at the node level – remains underexplored. Existing methods for node-level privacy either focus exclusively on query-based approaches, which restrict output to pre-specified network statistics, or fail to preserve key structural properties of the network. In this work, we propose GRAND (Graph Release with Assured Node Differential privacy), which is, to the best of our knowledge, the first network release mechanism that releases entire networks while ensuring node-level differential privacy and preserving structural properties. Under a broad class of latent space models, we show that the released network asymptotically follows the same distribution as the original network. The effectiveness of the approach is evaluated through extensive experiments on both synthetic and real-world datasets.

nan


Article 556

Title@2025-07-01 (2): Few-shot Classification as Multi-instance Verification: Effective Backbone-agnostic Transfer across Domains

Title: Few-shot Classification as Multi-instance Verification: Effective Backbone-agnostic Transfer across Domains Wenige Schuss-Klassifikation als Multi-Instance-Verifizierung: Effektive Backbone-agnostische Übertragung über Domains hinweg ” 多功能核查 “ 分类:跨域的有效后骨-不可知转让 2507.00401v1

Authors (3): Xin Xu, Eibe Frank, Geoffrey Holmes

We investigate cross-domain few-shot learning under the constraint that fine-tuning of backbones (i.e., feature extractors) is impossible or infeasible – a scenario that is increasingly common in practical use cases. Handling the low-quality and static embeddings produced by frozen, “black-box” backbones leads to a problem representation of few-shot classification as a series of multiple instance verification (MIV) tasks. Inspired by this representation, we introduce a novel approach to few-shot domain adaptation, named the “MIV-head”, akin to a classification head that is agnostic to any pretrained backbone and computationally efficient. The core components designed for the MIV-head, when trained on few-shot data from a target domain, collectively yield strong performance on test data from that domain. Importantly, it does so without fine-tuning the backbone, and within the “meta-testing” phase. Experimenting under various settings and on an extension of the Meta-dataset benchmark for cross-domain few-shot image classification, using representative off-the-shelf convolutional neural network and vision transformer backbones pretrained on ImageNet1K, we show that the MIV-head achieves highly competitive accuracy when compared to state-of-the-art “adapter” (or partially fine-tuning) methods applied to the same backbones, while incurring substantially lower adaptation cost. We also find well-known “classification head” approaches lag far behind in terms of accuracy. Ablation study empirically justifies the core components of our approach. We share our code at https://github.com/xxweka/MIV-head.

nan


Article 557

Title@2025-07-01 (2): Two-Stage Regularization-Based Structured Pruning for LLMs

Title: Two-Stage Regularization-Based Structured Pruning for LLMs Zweistufiges Regularisierungs-basierendes strukturiertes Pruning für LLMs LLMM 双级正规化和结构化 2505.18232v2

Authors (9): Mingkuan Feng, Jinyang Wu, Siyuan Liu, Shuai Zhang, Ruihan Jin, Feihu Che, Pengpeng Shao, Zhengqi Wen, Jianhua Tao

The deployment of large language models (LLMs) is largely hindered by their large number of parameters. Structural pruning has emerged as a promising solution. Prior structured pruning methods directly remove unimportant parameters based on certain metrics, which often causes knowledge loss and necessitates extensive retraining. To overcome this, we introduce a novel pruning method TRSP: Two-Stage Regularization-Based Structured Pruning for LLMs. Specifically, we multiply the output of each transformer layer by an initial learnable weight and iteratively learn these weights by adding their $\ell_1$-norm as a regularization term to the loss function, serving as the first-stage regularization. Subsequently, we apply additional regularization to the difference between the output and input of layers with smaller weights, encouraging the shift of knowledge to the preserved layers. This serves as the second-stage regularization. TRSP retains more knowledge and better preserves model performance than direct parameter elimination. Through extensive experimentation we show that TRSP outperforms strong layer-wise structured pruning methods without requiring retraining. As a layer-wise pruning method, it delivers notable end-to-end acceleration, making it a promising solution for efficient LLM deployment.

nan


Article 558

Title@2025-07-01 (2): PEAKS: Selecting Key Training Examples Incrementally via Prediction Error Anchored by Kernel Similarity

Title: PEAKS: Selecting Key Training Examples Incrementally via Prediction Error Anchored by Kernel Similarity PEAKS: Auswahl von Schlüsseltrainingsbeispielen inkrementell über Vorhersagefehler durch Kernel-Ähnlichkeit verankert EAKS:通过预测错误逐步选择关键培训实例 2504.05250v4

Authors (3): Mustafa Burak Gurbuz, Xingyu Zheng, Constantine Dovrolis

As deep learning continues to be driven by ever-larger datasets, understanding which examples are most important for generalization has become a critical question. While progress in data selection continues, emerging applications require studying this problem in dynamic contexts. To bridge this gap, we pose the Incremental Data Selection (IDS) problem, where examples arrive as a continuous stream, and need to be selected without access to the full data source. In this setting, the learner must incrementally build a training dataset of predefined size while simultaneously learning the underlying task. We find that in IDS, the impact of a new sample on the model state depends fundamentally on both its geometric relationship in the feature space and its prediction error. Leveraging this insight, we propose PEAKS (Prediction Error Anchored by Kernel Similarity), an efficient data selection method tailored for IDS. Our comprehensive evaluations demonstrate that PEAKS consistently outperforms existing selection strategies. Furthermore, PEAKS yields increasingly better performance returns than random selection as training data size grows on real-world datasets. The code is available at https://github.com/BurakGurbuz97/PEAKS.

nan


Article 559

Title@2025-07-01 (2): HelixPipe: Efficient Distributed Training of Long Sequence Transformers with Attention Parallel Pipeline Parallelism

Title: HelixPipe: Efficient Distributed Training of Long Sequence Transformers with Attention Parallel Pipeline Parallelism HelixPipe: Effizientes Training von langen Sequenztransformatoren mit Aufmerksamkeit Paralleler Pipeline-Parallelismus HelixPipe:对长序列变异器进行有效分布式培训,注意平行管道平行平行平行 2507.00394v1

Authors (5): Geng Zhang, Shenggan Cheng, Xuanlei Zhao, Ziming Liu, Yang You

As transformer sequence lengths grow, existing pipeline parallelisms incur suboptimal performance due to the quadratic attention computation and the substantial memory overhead. To relieve these challenges, we propose HelixPipe, a novel pipeline parallelism for long sequence transformer training. First, HelixPipe introduces attention parallel partition, which schedules attention computations of different micro batches across different pipeline stages in parallel, reducing pipeline bubbles. Second, it employs a two-fold first-in-last-out micro batch schedule to balance memory usage and overlap communication with computation. Additionally, HelixPipe utilizes recomputation without attention and chunked MLP to mitigate fragmentation and enable longer sequences. Experiments demonstrate that HelixPipe gains increasing advantages with longer sequence lengths, and outperforms existing methods in throughput and scalability across varying pipeline sizes, model sizes, and cluster configurations. Notably, it achieves a 26\% speedup over baseline methods when training a 7B model with 128k sequence length on 64 H20 GPUs. Code is available at https://github.com/code-tunnel/Megatron-LM/tree/dev.

nan


Article 560

Title@2025-07-01 (2): Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds

Title: Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds Auf dem Weg zu groß angelegtem In-Context-Verstärkungslernen durch Meta-Training in Randomized Worlds 通过在随机化世界中进行元培训进行大规模加强学习 2502.02869v2

Authors (9): Fan Wang, Pengtao Shao, Yiming Zhang, Bo Yu, Shaoshan Liu, Ning Ding, Yang Cao, Yu Kang, Haifeng Wang

In-Context Reinforcement Learning (ICRL) enables agents to learn automatically and on-the-fly from their interactive experiences. However, a major challenge in scaling up ICRL is the lack of scalable task collections. To address this, we propose the procedurally generated tabular Markov Decision Processes, named AnyMDP. Through a carefully designed randomization process, AnyMDP is capable of generating high-quality tasks on a large scale while maintaining relatively low structural biases. To facilitate efficient meta-training at scale, we further introduce step-wise supervision and induce prior information in the ICRL framework.Our results demonstrate that, with a sufficiently large scale of AnyMDP tasks, the proposed model can generalize to tasks that were not considered in the training set. The scalable task set provided by AnyMDP also enables a more thorough empirical investigation of the relationship between data distribution and ICRL performance. We further show that the generalization of ICRL potentially comes at the cost of increased task diversity and longer adaptation periods. This finding carries critical implications for scaling robust ICRL capabilities, highlighting the necessity of diverse and extensive task design, and prioritizing asymptotic performance over few-shot adaptation.

nan


Article 561

Title@2025-07-01 (2): MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE

Title: MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE MoNE: Redundante Experten mit leichten Novizen für strukturiertes Beschneiden von MoE ersetzen MONE:将冗余专家替换为教育部结构调节的轻量级新手 2507.00390v1

Authors (6): Geng Zhang, Yuxuan Han, Yuxuan Lou, Wangbo Zhao, Yiqi Zhang, Yang You

Mixture-of-Experts (MoE) enables efficient scaling of large language models by activating only a subset of experts per input token. However, deploying MoE-based models incurs significant memory overhead due to the need to retain all experts in memory. While structured pruning is promising to reduce memory costs, existing methods often show suboptimal performance and unstable degradation in three dimensions: model architectures, calibration data sources, and calibration sample sizes. This paper proposes Mixture-of-Novices-and-Experts (MoNE), a novel expert pruning method that replaces redundant experts with lightweight novices to achieve effective and robust model compression. MoNE evaluates expert redundancy based on two metrics: access frequency and output variance. Experts exhibiting low usage and stable outputs are pruned and replaced with lightweight novices-unbiased estimations of their original outputs-minimizing performance degradation. Extensive experiments demonstrate that MoNE consistently outperforms baseline methods with minimal accuracy degradation across the three dimensions, confirming its effectiveness and robustness. Notably, it improves the average zero shot accuracy across nine downstream tasks by up to 2.71 under 25\% pruning ratio and 3.61 under 50\% pruning. The code is available at https://github.com/zxgx/mode-pd.

nan


Article 562

Title@2025-07-01 (2): BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference

Title: BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference BlockDialekt: Blockweise feinkörnige Mischformat-Quantisierung für energieeffiziente LLM-Inferenz BlockDiaect: 节能LLM 推论的粗件精细混合格式量化 2501.01144v4

Authors (2): Wonsuk Jang, Thierry Tambe

The rapidly increasing size of large language models (LLMs) presents significant challenges in memory usage and computational costs. Quantizing both weights and activations can address these issues, with hardware-supported fine-grained scaling emerging as a promising solution to mitigate outliers. However, existing methods struggle to capture nuanced block data distributions. We propose BlockDialect, a block-wise fine-grained mixed format technique that assigns a per-block optimal number format from a formatbook for better data representation. Additionally, we introduce DialectFP4, a formatbook of FP4 variants (akin to dialects) that adapt to diverse data distributions. To leverage this efficiently, we propose a two-stage approach for online DialectFP4 activation quantization. Importantly, DialectFP4 ensures energy efficiency by selecting representable values as scaled integers compatible with low-precision integer arithmetic. BlockDialect achieves 10.78% (7.48%) accuracy gain on the LLaMA3-8B (LLaMA2-7B) model compared to MXFP4 format with lower bit usage per data, while being only 5.45% (2.69%) below full precision even when quantizing full-path matrix multiplication. Focusing on how to represent over how to scale, our work presents a promising path for energy-efficient LLM inference.

nan


Article 563

Title@2025-07-01 (2): SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Title: SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning SPIRAL: Selbst-Spiel auf Null-Sum-Spiele Anreize zur Vernunft durch Multi-Agent Multi-Turn Verstärkungs-Lernen SPIRAL: 在零桑运动会上自玩 2506.24119v2

Authors (12): Bo Liu, Leon Guertler, Simon Yu, Zichen Liu, Penghui Qi, Daniel Balcells, Mickel Liu, Cheston Tan, Weiyan Shi, Min Lin, Wee Sun Lee, Natasha Jaques

Recent advances in reinforcement learning have shown that language models can develop sophisticated reasoning through training on tasks with verifiable rewards, but these approaches depend on human-curated problem-answer pairs and domain-specific reward engineering. We introduce SPIRAL, a self-play framework where models learn by playing multi-turn, zero-sum games against continuously improving versions of themselves, eliminating the need for human supervision. Through self-play, SPIRAL generates an infinite curriculum of progressively challenging problems as models must constantly adapt to stronger opponents. To enable this self-play training at scale, We implement a fully online, multi-turn, multi-agent reinforcement learning system for LLMs and propose role-conditioned advantage estimation (RAE) to stabilize multi-agent training. Using SPIRAL, self-play on zero-sum games produces reasoning capabilities that transfer broadly. Training Qwen3-4B-Base on Kuhn Poker alone achieves 8.6% improvement on math and 8.4% on general reasoning, outperforming SFT on 25,000 expert game trajectories. Analysis reveals that this transfer occurs through three cognitive patterns: systematic decomposition, expected value calculation, and case-by-case analysis. Multi-game training (TicTacToe, Kuhn Poker, Simple Negotiation) further enhances performance as each game develops distinct reasoning strengths. Applying SPIRAL to a strong reasoning model (DeepSeek-R1-Distill-Qwen-7B) can still lead to 2.0% average improvement. These results demonstrate that zero-sum games naturally develop transferable reasoning capabilities, highlighting a promising direction for autonomous reasoning development.

nan


Article 564

Title@2025-07-01 (2): Analogical Learning for Cross-Scenario Generalization: Framework and Application to Intelligent Localization

Title: Analogical Learning for Cross-Scenario Generalization: Framework and Application to Intelligent Localization Analogisches Lernen für Cross-Szenario-Verallgemeinerung: Rahmen und Anwendung auf intelligente Lokalisierung 跨设想普遍化的模拟学习:智能本地化的框架和应用 2504.08811v2

Authors (9): Zirui Chen, Zhaoyang Zhang, Ziqing Xing, Ridong Li, Zhaohui Yang, Richeng Jin, Chongwen Huang, Yuzhi Yang, Mérouane Debbah

Existing learning models often exhibit poor generalization when deployed across diverse scenarios. It is primarily due to that the underlying reference frame of the data varies with the deployment environment and settings. However, despite that data of each scenario has a distinct reference frame, its generation generally follows common underlying physical rules. Based on this understanding, this article proposes a deep learning framework named analogical learning (AL), which implicitly retrieves the reference frame information associated with a scenario and then to make accurate prediction by relative analogy with other scenarios. Specifically, we design a bipartite neural network called Mateformer. Its first part captures the relativity within multiple latent feature spaces between the input data and a small amount of embedded data from the studied scenario, while its second part uses this relativity to guide the nonlinear analogy. We apply AL to the typical multi-scenario learning problem of intelligent wireless localization in cellular networks. Extensive experiments validate AL’s superiority across three key dimensions. First, it achieves state-of-the-art accuracy in single-scenario benchmarks. Second, it demonstrates stable transferability between different scenarios, avoiding catastrophic forgetting. Finally, and most importantly, it robustly adapts to new, unseen scenarios–including dynamic weather and traffic conditions–without any tuning. All data and code are available at https://github.com/ziruichen-research/ALLoc.

nan


Article 565

Title@2025-07-01 (2): A Good Start Matters: Enhancing Continual Learning with Data-Driven Weight Initialization

Title: A Good Start Matters: Enhancing Continual Learning with Data-Driven Weight Initialization Ein guter Start: Kontinuierliches Lernen mit datengetriebener Gewichtsinitialisierung verbessern 良好开端事项:加强持续学习,以数据驱动的重量初始化 2503.06385v2

Authors (2): Md Yousuf Harun, Christopher Kanan

To adapt to real-world data streams, continual learning (CL) systems must rapidly learn new concepts while preserving and utilizing prior knowledge. When it comes to adding new information to continually-trained deep neural networks (DNNs), classifier weights for newly encountered categories are typically initialized randomly, leading to high initial training loss (spikes) and instability. Consequently, achieving optimal convergence and accuracy requires prolonged training, increasing computational costs. Inspired by Neural Collapse (NC), we propose a weight initialization strategy to improve learning efficiency in CL. In DNNs trained with mean-squared-error, NC gives rise to a Least-Square (LS) classifier in the last layer, whose weights can be analytically derived from learned features. We leverage this LS formulation to initialize classifier weights in a data-driven manner, aligning them with the feature distribution rather than using random initialization. Our method mitigates initial loss spikes and accelerates adaptation to new tasks. We evaluate our approach in large-scale CL settings, demonstrating faster adaptation and improved CL performance.

nan


Article 566

Title@2025-07-01 (2): Neuro-Informed Joint Learning Enhances Cognitive Workload Decoding in Portable BCIs

Title: Neuro-Informed Joint Learning Enhances Cognitive Workload Decoding in Portable BCIs Neuro-informiertes gemeinsames Lernen verbessert Kognitive Workload-Dekodierung in tragbaren BCIs Neuro-内建联合学习促进可移动生物CI的认知工作量解码 2506.23458v2

Authors (3): Xiaoxiao Yang, Chao Feng, Jiancheng Chen

Portable and wearable consumer-grade electroencephalography (EEG) devices, like Muse headbands, offer unprecedented mobility for daily brain-computer interface (BCI) applications, including cognitive load detection. However, the exacerbated non-stationarity in portable EEG signals constrains data fidelity and decoding accuracy, creating a fundamental trade-off between portability and performance. To mitigate such limitation, we propose MuseCogNet (Muse-based Cognitive Network), a unified joint learning framework integrating self-supervised and supervised training paradigms. In particular, we introduce an EEG-grounded self-supervised reconstruction loss based on average pooling to capture robust neurophysiological patterns, while cross-entropy loss refines task-specific cognitive discriminants. This joint learning framework resembles the bottom-up and top-down attention in humans, enabling MuseCogNet to significantly outperform state-of-the-art methods on a publicly available Muse dataset and establish an implementable pathway for neurocognitive monitoring in ecological settings.

nan


Article 567

Title@2025-07-01 (2): Ansatz-free Hamiltonian learning with Heisenberg-limited scaling

Title: Ansatz-free Hamiltonian learning with Heisenberg-limited scaling Ansatzfreies Hamilton-Lernen mit Heisenberg-begrenzter Skalierung 无安萨兹的汉密尔顿学习 2502.11900v2

Authors (7): Hong-Ye Hu, Muzhou Ma, Weiyuan Gong, Qi Ye, Yu Tong, Steven T. Flammia, Susanne F. Yelin

Learning the unknown interactions that govern a quantum system is crucial for quantum information processing, device benchmarking, and quantum sensing. The problem, known as Hamiltonian learning, is well understood under the assumption that interactions are local, but this assumption may not hold for arbitrary Hamiltonians. Previous methods all require high-order inverse polynomial dependency with precision, unable to surpass the standard quantum limit and reach the gold standard Heisenberg-limited scaling. Whether Heisenberg-limited Hamiltonian learning is possible without prior assumptions about the interaction structures, a challenge we term \emph{ansatz-free Hamiltonian learning}, remains an open question. In this work, we present a quantum algorithm to learn arbitrary sparse Hamiltonians without any structure constraints using only black-box queries of the system’s real-time evolution and minimal digital controls to attain Heisenberg-limited scaling in estimation error. Our method is also resilient to state-preparation-and-measurement errors, enhancing its practical feasibility. We numerically demonstrate our ansatz-free protocol for learning physical Hamiltonians and validating analog quantum simulations, benchmarking our performance against the state-of-the-art Heisenberg-limited learning approach. Moreover, we establish a fundamental trade-off between total evolution time and quantum control on learning arbitrary interactions, revealing the intrinsic interplay between controllability and total evolution time complexity for any learning algorithm. These results pave the way for further exploration into Heisenberg-limited Hamiltonian learning in complex quantum systems under minimal assumptions, potentially enabling new benchmarking and verification protocols.

nan


Article 568

Title@2025-07-01 (2): Prediction of Freezing of Gait in Parkinsons Disease using Explainable AI and Federated Deep Learning for Wearable Sensors

Title: Prediction of Freezing of Gait in Parkinsons Disease using Explainable AI and Federated Deep Learning for Wearable Sensors Vorhersage der Einfrierung von Gait bei Parkinson-Krankheit mit erklärbarer KI und Federated Deep Learning für tragbare Sensoren 利用可解释的AI和穿戴感应器的联邦深层学习,预测帕金ons疾病中盖特的冷冻情况 2507.01068v1

Authors (1): Biplov Paneru

This study leverages an Inertial Measurement Unit (IMU) dataset to develop explainable AI methods for the early detection and prediction of Freezing of Gait (FOG), a common symptom in Parkinson’s disease. Machine learning models, including CatBoost, XGBoost, and Extra Trees classifiers, are employed to accurately categorize FOG episodes based on relevant clinical features. A Stacking Ensemble model achieves superior performance, surpassing a hybrid bidirectional GRU model and reaching nearly 99% classification accuracy. SHAP interpretability analysis reveals that time (seconds) is the most influential factor in distinguishing gait patterns. Additionally, the proposed FOG prediction framework incorporates federated learning, where models are trained locally on individual devices and aggregated on a central server using a federated averaging approach, utilizing a hybrid Conv1D + LSTM architecture for enhanced predictive capability.

nan


Article 569

Title@2025-07-01 (2): CoCMT: Communication-Efficient Cross-Modal Transformer for Collaborative Perception

Title: CoCMT: Communication-Efficient Cross-Modal Transformer for Collaborative Perception CoCMT: Kommunikationseffizienter Cross-Modal Transformer für kollaborative Wahrnehmung COCMT: 用于协作认知的通信效率高的跨模式变换器 2503.13504v2

Authors (5): Rujia Wang, Xiangbo Gao, Hao Xiang, Runsheng Xu, Zhengzhong Tu

Multi-agent collaborative perception enhances each agent perceptual capabilities by sharing sensing information to cooperatively perform robot perception tasks. This approach has proven effective in addressing challenges such as sensor deficiencies, occlusions, and long-range perception. However, existing representative collaborative perception systems transmit intermediate feature maps, such as bird-eye view (BEV) representations, which contain a significant amount of non-critical information, leading to high communication bandwidth requirements. To enhance communication efficiency while preserving perception capability, we introduce CoCMT, an object-query-based collaboration framework that optimizes communication bandwidth by selectively extracting and transmitting essential features. Within CoCMT, we introduce the Efficient Query Transformer (EQFormer) to effectively fuse multi-agent object queries and implement a synergistic deep supervision to enhance the positive reinforcement between stages, leading to improved overall performance. Experiments on OPV2V and V2V4Real datasets show CoCMT outperforms state-of-the-art methods while drastically reducing communication needs. On V2V4Real, our model (Top-50 object queries) requires only 0.416 Mb bandwidth, 83 times less than SOTA methods, while improving AP70 by 1.1 percent. This efficiency breakthrough enables practical collaborative perception deployment in bandwidth-constrained environments without sacrificing detection accuracy.

nan


Article 570

Title@2025-07-01 (2): Lifelong Learning of Video Diffusion Models From a Single Video Stream

Title: Lifelong Learning of Video Diffusion Models From a Single Video Stream Lebenslanges Lernen von Video-Diffusions-Modellen aus einem einzigen Video-Stream 终生学习单一视频流的视频传播模型 2406.04814v3

Authors (7): Jason Yoo, Yingchen He, Saeid Naderiparizi, Dylan Green, Gido M. van de Ven, Geoff Pleiss, Frank Wood

This work demonstrates that training autoregressive video diffusion models from a single video stream$\unicode{x2013}$resembling the experience of embodied agents$\unicode{x2013}$is not only possible, but can also be as effective as standard offline training given the same number of gradient steps. Our work further reveals that this main result can be achieved using experience replay methods that only retain a subset of the preceding video stream. To support training and evaluation in this setting, we introduce four new datasets for streaming lifelong generative video modeling: Lifelong Bouncing Balls, Lifelong 3D Maze, Lifelong Drive, and Lifelong PLAICraft, each consisting of one million consecutive frames from environments of increasing complexity.

nan


Article 571

Title@2025-07-01 (2): Neural Networks Generalize on Low Complexity Data

Title: Neural Networks Generalize on Low Complexity Data Neurale Netzwerke verallgemeinern auf Daten mit geringer Komplexität 通用低复杂度数据神经网络 2409.12446v4

Authors (2): Sourav Chatterjee, Timothy Sudijono

We show that feedforward neural networks with ReLU activation generalize on low complexity data, suitably defined. Given i.i.d.~data generated from a simple programming language, the minimum description length (MDL) feedforward neural network which interpolates the data generalizes with high probability. We define this simple programming language, along with a notion of description length of such networks. We provide several examples on basic computational tasks, such as checking primality of a natural number. For primality testing, our theorem shows the following and more. Suppose that we draw an i.i.d.~sample of $n$ numbers uniformly at random from $1$ to $N$. For each number $x_i$, let $y_i = 1$ if $x_i$ is a prime and $0$ if it is not. Then, the interpolating MDL network accurately answers, with error probability $1- O((\ln N)/n)$, whether a newly drawn number between $1$ and $N$ is a prime or not. Note that the network is not designed to detect primes; minimum description learning discovers a network which does so. Extensions to noisy data are also discussed, suggesting that MDL neural network interpolators can demonstrate tempered overfitting.

nan


Article 572

Title@2025-07-01 (2): Data-Driven Exploration for a Class of Continuous-Time Linear–Quadratic Reinforcement Learning Problems

Title: Data-Driven Exploration for a Class of Continuous-Time Linear–Quadratic Reinforcement Learning Problems Daten-getriebene Exploration für eine Klasse von kontinuierlichen-Zeit-Linear–Quadratische Verstärkung Lernprobleme 连续-时线性-重压强化学习问题分类数据探索 2507.00358v1

Authors (2): Yilie Huang, Xun Yu Zhou

We study reinforcement learning (RL) for the same class of continuous-time stochastic linear–quadratic (LQ) control problems as in \cite{huang2024sublinear}, where volatilities depend on both states and controls while states are scalar-valued and running control rewards are absent. We propose a model-free, data-driven exploration mechanism that adaptively adjusts entropy regularization by the critic and policy variance by the actor. Unlike the constant or deterministic exploration schedules employed in \cite{huang2024sublinear}, which require extensive tuning for implementations and ignore learning progresses during iterations, our adaptive exploratory approach boosts learning efficiency with minimal tuning. Despite its flexibility, our method achieves a sublinear regret bound that matches the best-known model-free results for this class of LQ problems, which were previously derived only with fixed exploration schedules. Numerical experiments demonstrate that adaptive explorations accelerate convergence and improve regret performance compared to the non-adaptive model-free and model-based counterparts.

nan


Article 573

Title@2025-07-01 (2): Transformers from Diffusion: A Unified Framework for Neural Message Passing

Title: Transformers from Diffusion: A Unified Framework for Neural Message Passing Transformer von Diffusion: Ein einheitliches Framework für neurale Message Passing 来自传播的变换器:神经信息传递的统一框架 2409.09111v4

Authors (3): Qitian Wu, David Wipf, Junchi Yan

Learning representations for structured data with certain geometries (e.g., observed or unobserved) is a fundamental challenge, wherein message passing neural networks (MPNNs) have become a de facto class of model solutions. In this paper, inspired by physical systems, we propose an energy-constrained diffusion model, which integrates the inductive bias of diffusion on manifolds with layer-wise constraints of energy minimization. We identify that the diffusion operators have a one-to-one correspondence with the energy functions implicitly descended by the diffusion process, and the finite-difference iteration for solving the energy-constrained diffusion system induces the propagation layers of various types of MPNNs operating on observed or latent structures. This leads to a unified mathematical framework for common neural architectures whose computational flows can be cast as message passing (or its special case), including MLPs, GNNs, and Transformers. Building on these insights, we devise a new class of neural message passing models, dubbed diffusion-inspired Transformers (DIFFormer), whose global attention layers are derived from the principled energy-constrained diffusion framework. Across diverse datasets ranging from real-world networks to images, texts, and physical particles, we demonstrate that the new model achieves promising performance in scenarios where the data structures are observed (as a graph), partially observed, or entirely unobserved.

nan


Article 574

Title@2025-07-01 (2): Augmented Physics-Based Li-ion Battery Model via Adaptive Ensemble Sparse Learning and Conformal Prediction

Title: Augmented Physics-Based Li-ion Battery Model via Adaptive Ensemble Sparse Learning and Conformal Prediction Erweitertes physikbasiertes Li-Ionen-Batteriemodell über Adaptives Ensemble Sparse Lernen und konforme Vorhersage 通过适应性共聚稀散学习和非正规预测,增强基于物理的利离电池模型 2507.00353v1

Authors (4): Samuel Filgueira da Silva, Mehmet Fatih Ozkan, Faissal El Idrissi, Marcello Canova

Accurate electrochemical models are essential for the safe and efficient operation of lithium-ion batteries in real-world applications such as electrified vehicles and grid storage. Reduced-order models (ROM) offer a balance between fidelity and computational efficiency but often struggle to capture complex and nonlinear behaviors, such as the dynamics in the cell voltage response under high C-rate conditions. To address these limitations, this study proposes an Adaptive Ensemble Sparse Identification (AESI) framework that enhances the accuracy of reduced-order li-ion battery models by compensating for unpredictable dynamics. The approach integrates an Extended Single Particle Model (ESPM) with an evolutionary ensemble sparse learning strategy to construct a robust hybrid model. In addition, the AESI framework incorporates a conformal prediction method to provide theoretically guaranteed uncertainty quantification for voltage error dynamics, thereby improving the reliability of the model’s predictions. Evaluation across diverse operating conditions shows that the hybrid model (ESPM + AESI) improves the voltage prediction accuracy, achieving mean squared error reductions of up to 46% on unseen data. Prediction reliability is further supported by conformal prediction, yielding statistically valid prediction intervals with coverage ratios of 96.85% and 97.41% for the ensemble models based on bagging and stability selection, respectively.

nan


Article 575

Title@2025-07-01 (2): Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs “Difficult” Downstream Tasks in LLMs

Title: Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs “Difficult” Downstream Tasks in LLMs Junk-DNA-Hypothese: Beschneiden kleiner vortrainierter Gewichte irreversibel und monoton beeinträchtigt “schwierige” Downstream-Aufgaben in LLMs 理顺低流中“困难”的“困难”下流任务(LLMS) 2310.02277v3

Authors (5): Lu Yin, Ajay Jaiswal, Shiwei Liu, Souvik Kundu, Zhangyang Wang

We present Junk DNA Hypothesis by adopting a novel task-centric angle for the pre-trained weights of large language models (LLMs). It has been believed that weights in LLMs contain significant redundancy, leading to the conception that a considerable chunk of the parameters can be removed by pruning without compromising performance. Contrary to this belief, this paper presents a counter-argument: small-magnitude weights of pre-trained model weights encode vital knowledge essential for tackling difficult downstream tasks - manifested as the monotonic relationship between the performance drop of downstream tasks across the difficulty spectrum, as we prune more pre-trained weights by magnitude. Moreover, we reveal that these seemingly inconsequential weights can result in irreparable loss of knowledge and performance degradation in difficult tasks, even when downstream continual training is allowed. Interestingly, our evaluations show that the other popular compression, namely quantization, fails to exhibit similar monotonic effect and does not as convincingly disentangle this task-difficulty information. To study formally, we introduce several quantifiable metrics to gauge the downstream task difficulty: (1) within the same task category, and (2) across different task categories. Our extensive experiments substantiate the Junk DNA Hypothesis across a diverse range of model sizes, tasks, datasets, and even pruning methods. Codes are available at: https://github.com/VITA-Group/Junk_DNA_Hypothesis.git.

nan


Article 576

Title@2025-07-01 (2): Geological and Well prior assisted full waveform inversion using conditional diffusion models

Title: Geological and Well prior assisted full waveform inversion using conditional diffusion models Geologische und gut vorher assistierte Vollwellenformumwandlung mit bedingten Diffusionsmodellen 使用有条件扩散模型,事先协助的地质和良好古学 2412.06959v2

Authors (3): Fu Wang, Xinquan Huang, Tariq Alkhalifah

Full waveform inversion (FWI) often faces challenges due to inadequate seismic observations, resulting in band-limited and geologically inaccurate inversion results. Incorporating prior information from potential velocity distributions, well-log information, and our geological knowledge and expectations can significantly improve FWI convergence to a realistic model. While diffusion-regularized FWI has shown improved performance compared to conventional FWI by incorporating the velocity distribution prior, it can benefit even more by incorporating well-log information and other geological knowledge priors. To leverage this fact, we propose a geological class and well-information prior-assisted FWI using conditional diffusion models. This method seamlessly integrates multi-modal information into FWI, simultaneously achieving data fitting and universal geologic and geophysics prior matching, which is often not achieved with traditional regularization methods. Specifically, we propose to combine conditional diffusion models with FWI, where we integrate well-log data and geological class conditions into these conditional diffusion models using classifier-free guidance for multi-modal prior matching beyond the original velocity distribution prior. Numerical experiments on the OpenFWI datasets and field marine data demonstrate the effectiveness of our method compared to conventional FWI and the unconditional diffusion-regularized FWI.

nan


Article 577

Title@2025-06-30 (1): Evaluation of a Foundational Model and Stochastic Models for Forecasting Sporadic or Spiky Production Outages of High-Performance Machine Learning Services

Title: Evaluation of a Foundational Model and Stochastic Models for Forecasting Sporadic or Spiky Production Outages of High-Performance Machine Learning Services Bewertung eines Basismodells und stochastische Modelle zur Vorhersage sporadischer oder würziger Produktionsausfälle hochleistungsfähiger Machine Learning Services 评价预测高性能机器学习服务零星或斯皮生产流出的基础模型和存储模型 2507.01067v1

Authors (1): Keun Soo Yim

Time series forecasting models have diverse real world applications (e.g., from electricity metrics to software workload). Latest foundational models trained for time series forecasting show strengths (e.g., for long sequences and in zero-shot settings). However, foundational model was not yet used for forecasting rare, spiky events, i.e., a challenging target because those are a corner case of extreme events. In this paper, we optimize a state-of-the-art foundational model to forecast sporadic or spiky production outages of high-performance machine learning services powering billions of client devices. We evaluate the forecasting errors of the foundational model compared with classical stochastic forecasting models (e.g., moving average and autoregressive). The analysis helps us understand how each of the evaluated models performs for the sporadic or spiky events. For example, it identifies the key patterns in the target data that are well tracked by the foundational model vs. each of the stochastic models. We use the models with optimal parameters to estimate a year-long outage statistics of a particular root cause with less than 6% value errors.

nan


Article 578

Title@2025-06-30 (1): ETTA: Elucidating the Design Space of Text-to-Audio Models

Title: ETTA: Elucidating the Design Space of Text-to-Audio Models ETTA: Erklärung des Designraums von Text-zu-Audio-Modellen ETTA: 说明文本到模拟模型的设计空间 2412.19351v2

Authors (6): Sang-gil Lee, Zhifeng Kong, Arushi Goel, Sungwon Kim, Rafael Valle, Bryan Catanzaro

Recent years have seen significant progress in Text-To-Audio (TTA) synthesis, enabling users to enrich their creative workflows with synthetic audio generated from natural language prompts. Despite this progress, the effects of data, model architecture, training objective functions, and sampling strategies on target benchmarks are not well understood. With the purpose of providing a holistic understanding of the design space of TTA models, we set up a large-scale empirical experiment focused on diffusion and flow matching models. Our contributions include: 1) AF-Synthetic, a large dataset of high quality synthetic captions obtained from an audio understanding model; 2) a systematic comparison of different architectural, training, and inference design choices for TTA models; 3) an analysis of sampling methods and their Pareto curves with respect to generation quality and inference speed. We leverage the knowledge obtained from this extensive analysis to propose our best model dubbed Elucidated Text-To-Audio (ETTA). When evaluated on AudioCaps and MusicCaps, ETTA provides improvements over the baselines trained on publicly available data, while being competitive with models trained on proprietary data. Finally, we show ETTA’s improved ability to generate creative audio following complex and imaginative captions – a task that is more challenging than current benchmarks.

nan


Article 579

Title@2025-06-30 (1): Exploring Theory-Laden Observations in the Brain Basis of Emotional Experience

Title: Exploring Theory-Laden Observations in the Brain Basis of Emotional Experience Erforschung von Theorie-Laden-Beobachtungen in der Brain-Basis emotionaler Erfahrung 在情感经历的大脑基础中探索理论拉拉观察 2507.00320v1

Authors (5): Christiana Westlin, Ashutosh Singh, Deniz Erdogmus, Georgios Stratis, Lisa Feldman Barrett

In the science of emotion, it is widely assumed that folk emotion categories form a biological and psychological typology, and studies are routinely designed and analyzed to identify emotion-specific patterns. This approach shapes the observations that studies report, ultimately reinforcing the assumption that guided the investigation. Here, we reanalyzed data from one such typologically-guided study that reported mappings between individual brain patterns and group-averaged ratings of 34 emotion categories. Our reanalysis was guided by an alternative view of emotion categories as populations of variable, situated instances, and which predicts a priori that there will be significant variation in brain patterns within a category across instances. Correspondingly, our analysis made minimal assumptions about the structure of the variance present in the data. As predicted, we did not observe the original mappings and instead observed significant variation across individuals. These findings demonstrate how starting assumptions can ultimately impact scientific conclusions and suggest that a hypothesis must be supported using multiple analytic methods before it is taken seriously.

nan


Article 580

Title@2025-06-30 (1): Stacked conformal prediction

Title: Stacked conformal prediction Gestapelte konforme Vorhersage 堆叠一致预测 2505.12578v2

Authors (1): Paulo C. Marques F

We consider a method for conformalizing a stacked ensemble of predictive models, showing that the potentially simple form of the meta-learner at the top of the stack enables a procedure with manageable computational cost that achieves approximate marginal validity without requiring the use of a separate calibration sample. Empirical results indicate that the method compares favorably to a standard inductive alternative.

nan


Article 581

Title@2025-06-30 (1): Open-ended Scientific Discovery via Bayesian Surprise

Title: Open-ended Scientific Discovery via Bayesian Surprise Offene wissenschaftliche Entdeckung über Bayesian Surprise 通过贝叶斯惊喜的不限名额科学发现 2507.00310v1

Authors (11): Dhruv Agarwal, Bodhisattwa Prasad Majumder, Reece Adamson, Megha Chakravorty, Satvika Reddy Gavireddy, Aditya Parashar, Harshit Surana, Bhavana Dalvi Mishra, Andrew McCallum, Ashish Sabharwal, Peter Clark

The promise of autonomous scientific discovery (ASD) hinges not only on answering questions, but also on knowing which questions to ask. Most recent works in ASD explore the use of large language models (LLMs) in goal-driven settings, relying on human-specified research questions to guide hypothesis generation. However, scientific discovery may be accelerated further by allowing the AI system to drive exploration by its own criteria. The few existing approaches in open-ended ASD select hypotheses based on diversity heuristics or subjective proxies for human interestingness, but the former struggles to meaningfully navigate the typically vast hypothesis space, and the latter suffers from imprecise definitions. This paper presents AutoDS – a method for open-ended ASD that instead drives scientific exploration using Bayesian surprise. Here, we quantify the epistemic shift from the LLM’s prior beliefs about a hypothesis to its posterior beliefs after gathering experimental results. To efficiently explore the space of nested hypotheses, our method employs a Monte Carlo tree search (MCTS) strategy with progressive widening using surprisal as the reward function. We evaluate AutoDS in the setting of data-driven discovery across 21 real-world datasets spanning domains such as biology, economics, finance, and behavioral science. Our results demonstrate that under a fixed budget, AutoDS substantially outperforms competitors by producing 5–29\% more discoveries deemed surprising by the LLM. Our human evaluation further finds that two-thirds of AutoDS discoveries are surprising to the domain experts, suggesting this is an important step forward towards building open-ended ASD systems.

nan


Article 582

Title@2025-06-30 (1): TabNSA: Native Sparse Attention for Efficient Tabular Data Learning

Title: TabNSA: Native Sparse Attention for Efficient Tabular Data Learning TabNSA: Native Sparse Aufmerksamkeit für effizientes Tabular Data Learning TabNSA: 提高图表数据学习效率的本地分散关注 2503.09850v2

Authors (2): Ali Eslamian, Qiang Cheng

Tabular data poses unique challenges for deep learning due to its heterogeneous feature types, lack of spatial structure, and often limited sample sizes. We propose TabNSA, a novel deep learning framework that integrates Native Sparse Attention (NSA) with a TabMixer backbone to efficiently model tabular data. TabNSA tackles computational and representational challenges by dynamically focusing on relevant feature subsets per instance. The NSA module employs a hierarchical sparse attention mechanism, including token compression, selective preservation, and localized sliding windows, to significantly reduce the quadratic complexity of standard attention operations while addressing feature heterogeneity. Complementing this, the TabMixer backbone captures complex, non-linear dependencies through parallel multilayer perceptron (MLP) branches with independent parameters. These modules are synergistically combined via element-wise summation and mean pooling, enabling TabNSA to model both global context and fine-grained interactions. Extensive experiments across supervised and transfer learning settings show that TabNSA consistently outperforms state-of-the-art deep learning models. Furthermore, by augmenting TabNSA with a fine-tuned large language model (LLM), we enable it to effectively address Few-Shot Learning challenges through language-guided generalization on diverse tabular benchmarks.

nan


Article 583

Title@2025-06-30 (1): MamNet: A Novel Hybrid Model for Time-Series Forecasting and Frequency Pattern Analysis in Network Traffic

Title: MamNet: A Novel Hybrid Model for Time-Series Forecasting and Frequency Pattern Analysis in Network Traffic MamNet: Ein neuartiges Hybridmodell für die Zeitreihenprognose und Frequenzmusteranalyse im Netzverkehr MamNet:网络交通时间-系列预测和频率模式分析新混合模型 2507.00304v1

Authors (7): Yujun Zhang, Runlong Li, Xiaoxiang Liang, Xinhao Yang, Tian Su, Bo Liu, Yan Zhou

The abnormal fluctuations in network traffic may indicate potential security threats or system failures. Therefore, efficient network traffic prediction and anomaly detection methods are crucial for network security and traffic management. This paper proposes a novel network traffic prediction and anomaly detection model, MamNet, which integrates time-domain modeling and frequency-domain feature extraction. The model first captures the long-term dependencies of network traffic through the Mamba module (time-domain modeling), and then identifies periodic fluctuations in the traffic using Fourier Transform (frequency-domain feature extraction). In the feature fusion layer, multi-scale information is integrated to enhance the model’s ability to detect network traffic anomalies. Experiments conducted on the UNSW-NB15 and CAIDA datasets demonstrate that MamNet outperforms several recent mainstream models in terms of accuracy, recall, and F1-Score. Specifically, it achieves an improvement of approximately 2% to 4% in detection performance for complex traffic patterns and long-term trend detection. The results indicate that MamNet effectively captures anomalies in network traffic across different time scales and is suitable for anomaly detection tasks in network security and traffic management. Future work could further optimize the model structure by incorporating external network event information, thereby improving the model’s adaptability and stability in complex network environments.

nan


Article 584

Title@2025-06-30 (1): Structure-preserving Lift & Learn: Scientific machine learning for nonlinear conservative partial differential equations

Title: Structure-preserving Lift & Learn: Scientific machine learning for nonlinear conservative partial differential equations Structure-preserving Lift & Learn: Wissenschaftliches maschinelles Lernen für nichtlineare konservative partielle Differentialgleichungen 结构保护提升和学习:非线性保守部分差异方程的科学机器学习 2507.00301v1

Authors (3): Harsh Sharma, Juan Diego Draxl Giannoni, Boris Kramer

This work presents structure-preserving Lift & Learn, a scientific machine learning method that employs lifting variable transformations to learn structure-preserving reduced-order models for nonlinear partial differential equations (PDEs) with conservation laws. We propose a hybrid learning approach based on a recently developed energy-quadratization strategy that uses knowledge of the nonlinearity at the PDE level to derive an equivalent quadratic lifted system with quadratic system energy. The lifted dynamics obtained via energy quadratization are linear in the old variables, making model learning very effective in the lifted setting. Based on the lifted quadratic PDE model form, the proposed method derives quadratic reduced terms analytically and then uses those derived terms to formulate a constrained optimization problem to learn the remaining linear reduced operators in a structure-preserving way. The proposed hybrid learning approach yields computationally efficient quadratic reduced-order models that respect the underlying physics of the high-dimensional problem. We demonstrate the generalizability of quadratic models learned via the proposed structure-preserving Lift & Learn method through three numerical examples: the one-dimensional wave equation with exponential nonlinearity, the two-dimensional sine-Gordon equation, and the two-dimensional Klein-Gordon-Zakharov equations. The numerical results show that the proposed learning approach is competitive with the state-of-the-art structure-preserving data-driven model reduction method in terms of both accuracy and computational efficiency.

nan


Article 585

Title@2025-06-30 (1): Enhancing Interpretability in Generative Modeling: Statistically Disentangled Latent Spaces Guided by Generative Factors in Scientific Datasets

Title: Enhancing Interpretability in Generative Modeling: Statistically Disentangled Latent Spaces Guided by Generative Factors in Scientific Datasets Verbesserung der Verdolmetschbarkeit in der Generativen Modellierung: Statistisch entfremdete Latentenräume, geleitet von generativen Faktoren in wissenschaftlichen Datensätzen 加强产生模型的可解释性:科学数据集中产生因素所引导的统计分解的低端空间 2507.00298v1

Authors (4): Arkaprabha Ganguli, Nesar Ramachandra, Julie Bessac, Emil Constantinescu

This study addresses the challenge of statistically extracting generative factors from complex, high-dimensional datasets in unsupervised or semi-supervised settings. We investigate encoder-decoder-based generative models for nonlinear dimensionality reduction, focusing on disentangling low-dimensional latent variables corresponding to independent physical factors. Introducing Aux-VAE, a novel architecture within the classical Variational Autoencoder framework, we achieve disentanglement with minimal modifications to the standard VAE loss function by leveraging prior statistical knowledge through auxiliary variables. These variables guide the shaping of the latent space by aligning latent factors with learned auxiliary variables. We validate the efficacy of Aux-VAE through comparative assessments on multiple datasets, including astronomical simulations.

nan


Article 586

Title@2025-06-30 (1): Empirical and computer-aided robustness analysis of long-step and accelerated methods in smooth convex optimization

Title: Empirical and computer-aided robustness analysis of long-step and accelerated methods in smooth convex optimization Empirische und computergestützte Robustheitsanalyse von langen und beschleunigten Methoden in reibungsloser konvexer Optimierung 经验分析和计算机辅助的稳健性分析,分析如何以长步骤和加速方法顺利优化顺流优化 2506.09730v3

Authors (2): Pierre Vernimmen, François Glineur

This work assesses both empirically and theoretically, using the performance estimation methodology, how robust different first-order optimization methods are when subject to relative inexactness in their gradient computations. Relative inexactness occurs, for example, when compressing the gradient using fewer bits of information, which happens when dealing with large-scale problems on GPUs. Three major families of methods are analyzed: constant step gradient descent, long-step methods, and accelerated methods. The latter two are first shown to be theoretically not robust to inexactness. Then, a semi-heuristic shortening factor is introduced to improve their theoretical guarantees. All methods are subsequently tested on a concrete inexact problem, with two different types of relative inexactness, and it is observed that both accelerated methods are much more robust than expected, and that the shortening factor significantly helps the long-step methods. In the end, all shortened methods appear to be promising, even in this inexact setting.

nan


Article 587

Title@2025-06-30 (1): Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

Title: Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity Outlier Weigged Layerwise Sparsity (OWL): Eine fehlende Geheimsoße für das Beschneiden von LLMs auf hohe Sparsität 边边边边边边边的层间平分(OWL): 高平面的普鲁宁长程 2310.05175v4

Authors (13): Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, Yiling Jia, Gen Li, Ajay Jaiswal, Mykola Pechenizkiy, Yi Liang, Michael Bendersky, Zhangyang Wang, Shiwei Liu

Large Language Models (LLMs), renowned for their remarkable performance across diverse domains, present a challenge when it comes to practical deployment due to their colossal model size. In response to this challenge, efforts have been directed toward the application of traditional network pruning techniques to LLMs, uncovering a massive number of parameters that can be pruned in one-shot without hurting performance. Prevailing LLM pruning strategies have consistently adhered to the practice of uniformly pruning all layers at equivalent sparsity, resulting in robust performance. However, this observation stands in contrast to the prevailing trends observed in the field of vision models, where non-uniform layerwise sparsity typically yields stronger results. To understand the underlying reasons for this disparity, we conduct a comprehensive study and discover a strong correlation with the emergence of activation outliers in LLMs. Inspired by this finding, we introduce a novel LLM pruning methodology that incorporates a tailored set of non-uniform layerwise sparsity ratios, termed as Outlier Weighed Layerwise sparsity (OWL). The sparsity ratio of OWL is proportional to the outlier ratio observed within each layer, facilitating a more effective alignment between layerwise weight sparsity and outlier ratios. Our empirical evaluation, conducted across the LLaMA-V1 family and OPT, spanning various benchmarks, demonstrates the distinct advantages offered by OWL over previous methods. For instance, OWL exhibits a remarkable performance gain, surpassing the state-of-the-art Wanda and SparseGPT by 61.22 and 6.80 perplexity at a high sparsity level of 70%, respectively, while delivering 2.6x end-to-end inference speed-up in the DeepSparse inference engine. Codes are available at https://github.com/luuyin/OWL.

nan


Article 588

Title@2025-06-30 (1): The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements

Title: The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements Der Automatisierte LLM Speedrunning Benchmark: NanoGPT-Verbesserungen reproduzieren 自动LLM快速运行基准:复制纳米GPT改进 2506.22419v2

Authors (23): Bingchen Zhao, Despoina Magka, Minqi Jiang, Xian Li, Roberta Raileanu, Tatiana Shavrina, Jean-Christophe Gagnon-Audet, Kelvin Niu, Shagun Sodhani, Michael Shvartsman, Andrei Lupu, Alisia Lupidi, Edan Toledo, Karen Hambardzumyan, Martin Josifoski, Thomas Foster, Lucia Cipolina-Kun, Abhishek Charnalia, Derek Dunfield, Alexander H. Miller, Oisin Mac Aodha, Jakob Foerster, Yoram Bachrach

Rapid advancements in large language models (LLMs) have the potential to assist in scientific progress. A critical capability toward this endeavor is the ability to reproduce existing work. To evaluate the ability of AI agents to reproduce results in an active research area, we introduce the Automated LLM Speedrunning Benchmark, leveraging the research community contributions on the NanoGPT speedrun, a competition to train a GPT-2 model in the shortest time. Each of the 19 speedrun tasks provides the agent with the previous records training script, optionally paired with one of three hint formats, ranging from pseudocode to paper-like descriptions of the new records improvements. Records execute quickly by design and speedrun improvements encompass diverse code-level changes, ranging from high-level algorithmic advancements to hardware-aware optimizations. These features make the benchmark both accessible and realistic for the frontier problem of improving LLM training. We find that recent reasoning LLMs combined with SoTA scaffolds struggle to reimplement already-known innovations in our benchmark, even when given detailed hints. Our benchmark thus provides a simple, non-saturated measure of an LLMs ability to automate scientific reproduction, a necessary (but not sufficient) skill for an autonomous research agent.

nan


Article 589

Title@2025-06-30 (1): Double Q-learning for Value-based Deep Reinforcement Learning, Revisited

Title: Double Q-learning for Value-based Deep Reinforcement Learning, Revisited Doppeltes Q-Lernen für wertbasiertes Deep Enforcement Learning, Revisited 以价值为基础的深强化学习的双重学习 2507.00275v1

Authors (3): Prabhat Nagarajan, Martha White, Marlos C. Machado

Overestimation is pervasive in reinforcement learning (RL), including in Q-learning, which forms the algorithmic basis for many value-based deep RL algorithms. Double Q-learning is an algorithm introduced to address Q-learning’s overestimation by training two Q-functions and using both to de-correlate action-selection and action-evaluation in bootstrap targets. Shortly after Q-learning was adapted to deep RL in the form of deep Q-networks (DQN), Double Q-learning was adapted to deep RL in the form of Double DQN. However, Double DQN only loosely adapts Double Q-learning, forgoing the training of two different Q-functions that bootstrap off one another. In this paper, we study algorithms that adapt this core idea of Double Q-learning for value-based deep RL. We term such algorithms Deep Double Q-learning (DDQL). Our aim is to understand whether DDQL exhibits less overestimation than Double DQN and whether performant instantiations of DDQL exist. We answer both questions affirmatively, demonstrating that DDQL reduces overestimation and outperforms Double DQN in aggregate across 57 Atari 2600 games, without requiring additional hyperparameters. We also study several aspects of DDQL, including its network architecture, replay ratio, and minibatch sampling strategy.

nan


Article 590

Title@2025-06-30 (1): Making a Pipeline Production-Ready: Challenges and Lessons Learned in the Healthcare Domain

Title: Making a Pipeline Production-Ready: Challenges and Lessons Learned in the Healthcare Domain Herstellung einer Pipeline-Produktion: Herausforderungen und Lektionen im Bereich Healthcare 《管道生产-准备:保健领域的挑战和经验教训》 2506.06946v2

Authors (6): Daniel Angelo Esteves Lawand, Lucas Quaresma Medina Lam, Roberto Oliveira Bolgheroni, Renato Cordeiro Ferreira, Alfredo Goldman, Marcelo Finger

Deploying a Machine Learning (ML) training pipeline into production requires good software engineering practices. Unfortunately, the typical data science workflow often leads to code that lacks critical software quality attributes. This experience report investigates this problem in SPIRA, a project whose goal is to create an ML-Enabled System (MLES) to pre-diagnose insufficiency respiratory via speech analysis. This paper presents an overview of the architecture of the MLES, then compares three versions of its Continuous Training subsystem: from a proof of concept Big Ball of Mud (v1), to a design pattern-based Modular Monolith (v2), to a test-driven set of Microservices (v3) Each version improved its overall extensibility, maintainability, robustness, and resiliency. The paper shares challenges and lessons learned in this process, offering insights for researchers and practitioners seeking to productionize their pipelines.

nan


Article 591

Title@2025-06-30 (1): Examining Reject Relations in Stimulus Equivalence Simulations

Title: Examining Reject Relations in Stimulus Equivalence Simulations Prüfung der Ablehnung von Beziehungen in Stimulus-Equivalenz-Simulationen 在刺激等效模拟中审查拒绝关系 2507.00265v1

Authors (4): Alexis Carrillo, Asieh Abolpour Mofrad, Anis Yazidi, Moises Betancort

Simulations offer a valuable tool for exploring stimulus equivalence (SE), yet the potential of reject relations to disrupt the assessment of equivalence class formation is contentious. This study investigates the role of reject relations in the acquisition of stimulus equivalence using computational models. We examined feedforward neural networks (FFNs), bidirectional encoder representations from transformers (BERT), and generative pre-trained transformers (GPT) across 18 conditions in matching-to-sample (MTS) simulations. Conditions varied in training structure (linear series, one-to-many, and many-to-one), relation type (select-only, reject-only, and select-reject), and negative comparison selection (standard and biased). A probabilistic agent served as a benchmark, embodying purely associative learning. The primary goal was to determine whether artificial neural networks could demonstrate equivalence class formation or whether their performance reflected associative learning. Results showed that reject relations influenced agent performance. While some agents achieved high accuracy on equivalence tests, particularly with reject relations and biased negative comparisons, this performance was comparable to the probabilistic agent. These findings suggest that artificial neural networks, including transformer models, may rely on associative strategies rather than SE. This underscores the need for careful consideration of reject relations and more stringent criteria in computational models of equivalence.

nan


Article 592

Title@2025-06-30 (1): Room Scene Discovery and Grouping in Unstructured Vacation Rental Image Collections

Title: Room Scene Discovery and Grouping in Unstructured Vacation Rental Image Collections Raumszene Entdeckung und Gruppierung in unstrukturierten Ferienhäusern Bildersammlungen 在无结构的假期中发现和分组房间景点 2507.00263v1

Authors (2): Vignesh Ram Nithin Kappagantula, Shayan Hassantabar

The rapid growth of vacation rental (VR) platforms has led to an increasing volume of property images, often uploaded without structured categorization. This lack of organization poses significant challenges for travelers attempting to understand the spatial layout of a property, particularly when multiple rooms of the same type are present. To address this issue, we introduce an effective approach for solving the room scene discovery and grouping problem, as well as identifying bed types within each bedroom group. This grouping is valuable for travelers to comprehend the spatial organization, layout, and the sleeping configuration of the property. We propose a computationally efficient machine learning pipeline characterized by low latency and the ability to perform effectively with sample-efficient learning, making it well-suited for real-time and data-scarce environments. The pipeline integrates a supervised room-type detection model, a supervised overlap detection model to identify the overlap similarity between two images, and a clustering algorithm to group the images of the same space together using the similarity scores. Additionally, the pipeline maps each bedroom group to the corresponding bed types specified in the property’s metadata, based on the visual content present in the group’s images using a Multi-modal Large Language Model (MLLM) model. We evaluate the aforementioned models individually and also assess the pipeline in its entirety, observing strong performance that significantly outperforms established approaches such as contrastive learning and clustering with pretrained embeddings.

nan


Article 593

Title@2025-06-30 (1): Bregman Centroid Guided Cross-Entropy Method

Title: Bregman Centroid Guided Cross-Entropy Method Bregman Centroid geführte Cross-Entropie-Methode Bregman Central 制导跨计算机方法 2506.02205v2

Authors (4): Yuliang Gu, Hongpeng Cao, Marco Caccamo, Naira Hovakimyan

The Cross-Entropy Method (CEM) is a widely adopted trajectory optimizer in model-based reinforcement learning (MBRL), but its unimodal sampling strategy often leads to premature convergence in multimodal landscapes. In this work, we propose Bregman Centroid Guided CEM ($\mathcal{BC}$-EvoCEM), a lightweight enhancement to ensemble CEM that leverages $\textit{Bregman centroids}$ for principled information aggregation and diversity control. $\textbf{$\mathcal{BC}$-EvoCEM}$ computes a performance-weighted Bregman centroid across CEM workers and updates the least contributing ones by sampling within a trust region around the centroid. Leveraging the duality between Bregman divergences and exponential family distributions, we show that $\textbf{$\mathcal{BC}$-EvoCEM}$ integrates seamlessly into standard CEM pipelines with negligible overhead. Empirical results on synthetic benchmarks, a cluttered navigation task, and full MBRL pipelines demonstrate that $\textbf{$\mathcal{BC}$-EvoCEM}$ enhances both convergence and solution quality, providing a simple yet effective upgrade for CEM.

nan


Article 594

Title@2025-06-30 (1): Disentangled Feature Importance

Title: Disentangled Feature Importance Entwirrte Eigenschaft Bedeutung 分解的特性重要性 2507.00260v1

Authors (3): Jin-Hong Du, Kathryn Roeder, Larry Wasserman

Feature importance quantification faces a fundamental challenge: when predictors are correlated, standard methods systematically underestimate their contributions. We prove that major existing approaches target identical population functionals under squared-error loss, revealing why they share this correlation-induced bias. To address this limitation, we introduce \emph{Disentangled Feature Importance (DFI)}, a nonparametric generalization of the classical $R^2$ decomposition via optimal transport. DFI transforms correlated features into independent latent variables using a transport map, eliminating correlation distortion. Importance is computed in this disentangled space and attributed back through the transport map’s sensitivity. DFI provides a principled decomposition of importance scores that sum to the total predictive variability for latent additive models and to interaction-weighted functional ANOVA variances more generally, under arbitrary feature dependencies. We develop a comprehensive semiparametric theory for DFI. For general transport maps, we establish root-$n$ consistency and asymptotic normality of importance estimators in the latent space, which extends to the original feature space for the Bures-Wasserstein map. Notably, our estimators achieve second-order estimation error, which vanishes if both regression function and transport map estimation errors are $o_{\mathbb{P}}(n^{-1/4})$. By design, DFI avoids the computational burden of repeated submodel refitting and the challenges of conditional covariate distribution estimation, thereby achieving computational efficiency.

nan


Article 595

Title@2025-06-30 (1): Gym4ReaL: A Suite for Benchmarking Real-World Reinforcement Learning

Title: Gym4ReaL: A Suite for Benchmarking Real-World Reinforcement Learning Gym4ReaL: Eine Suite für das Benchmarking von Real-World Verstärkungslernen Gym4ReaL:衡量现实世界加强学习的成套办法 2507.00257v1

Authors (12): Davide Salaorni, Vincenzo De Paola, Samuele Delpero, Giovanni Dispoto, Paolo Bonetti, Alessio Russo, Giuseppe Calcagno, Francesco Trovò, Matteo Papini, Alberto Maria Metelli, Marco Mussi, Marcello Restelli

In recent years, \emph{Reinforcement Learning} (RL) has made remarkable progress, achieving superhuman performance in a wide range of simulated environments. As research moves toward deploying RL in real-world applications, the field faces a new set of challenges inherent to real-world settings, such as large state-action spaces, non-stationarity, and partial observability. Despite their importance, these challenges are often underexplored in current benchmarks, which tend to focus on idealized, fully observable, and stationary environments, often neglecting to incorporate real-world complexities explicitly. In this paper, we introduce \texttt{Gym4ReaL}, a comprehensive suite of realistic environments designed to support the development and evaluation of RL algorithms that can operate in real-world scenarios. The suite includes a diverse set of tasks that expose algorithms to a variety of practical challenges. Our experimental results show that, in these settings, standard RL algorithms confirm their competitiveness against rule-based benchmarks, motivating the development of new methods to fully exploit the potential of RL to tackle the complexities of real-world tasks.

nan


Article 596

Title@2025-06-30 (1): Llama-Nemotron: Efficient Reasoning Models

Title: Llama-Nemotron: Efficient Reasoning Models Llama-Nemotron: Effiziente Denkmodelle Llama-Nepotron: 高效推理模型 2505.00949v4

Authors (136): Akhiad Bercovich, Itay Levy, Izik Golan, Mohammad Dabbah, Ran El-Yaniv, Omri Puny, Ido Galil, Zach Moshe, Tomer Ronen, Najeeb Nabwani, Ido Shahaf, Oren Tropp, Ehud Karpas, Ran Zilberstein, Jiaqi Zeng, Soumye Singhal, Alexander Bukharin, Yian Zhang, Tugrul Konuk, Gerald Shen, Ameya Sunil Mahabaleshwarkar, Bilal Kartal, Yoshi Suhara, Olivier Delalleau, Zijia Chen, Zhilin Wang, David Mosallanezhad, Adi Renduchintala, Haifeng Qian, Dima Rekesh, Fei Jia, Somshubra Majumdar, Vahid Noroozi, Wasi Uddin Ahmad, Sean Narenthiran, Aleksander Ficek, Mehrzad Samadi, Jocelyn Huang, Siddhartha Jain, Igor Gitman, Ivan Moshkov, Wei Du, Shubham Toshniwal, George Armstrong, Branislav Kisacanin, Matvei Novikov, Daria Gitman, Evelina Bakhturina, Prasoon Varshney, Makesh Narsimhan, Jane Polak Scowcroft, John Kamalu, Dan Su, Kezhi Kong, Markus Kliegl, Rabeeh Karimi, Ying Lin, Sanjeev Satheesh, Jupinder Parmar, Pritam Gundecha, Brandon Norick, Joseph Jennings, Shrimai Prabhumoye, Syeda Nahida Akter, Mostofa Patwary, Abhinav Khattar, Deepak Narayanan, Roger Waleffe, Jimmy Zhang, Bor-Yiing Su, Guyue Huang, Terry Kong, Parth Chadha, Sahil Jain, Christine Harvey, Elad Segal, Jining Huang, Sergey Kashirsky, Robert McQueen, Izzy Putterman, George Lam, Arun Venkatesan, Sherry Wu, Vinh Nguyen, Manoj Kilaru, Andrew Wang, Anna Warno, Abhilash Somasamudramath, Sandip Bhaskar, Maka Dong, Nave Assaf, Shahar Mor, Omer Ullman Argov, Scot Junkin, Oleksandr Romanenko, Pedro Larroy, Monika Katariya, Marco Rovinelli, Viji Balas, Nicholas Edelman, Anahita Bhiwandiwalla, Muthu Subramaniam, Smita Ithape, Karthik Ramamoorthy, Yuting Wu, Suguna Varshini Velury, Omri Almog, Joyjit Daw, Denys Fridman, Erick Galinkin, Michael Evans, Shaona Ghosh, Katherine Luna, Leon Derczynski, Nikki Pope, Eileen Long, Seth Schneider, Guillermo Siman, Tomasz Grzegorzek, Pablo Ribalta, Monika Katariya, Chris Alexiuk, Joey Conway, Trisha Saar, Ann Guan, Krzysztof Pawelec, Shyamala Prayaga, Oleksii Kuchaiev, Boris Ginsburg, Oluwatobi Olabiyi, Kari Briski, Jonathan Cohen, Bryan Catanzaro, Jonah Alben, Yonatan Geifman, Eric Chung

We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes – Nano (8B), Super (49B), and Ultra (253B) – and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency. In this report, we discuss the training procedure for these models, which entails using neural architecture search from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage consisting of two main parts: supervised fine-tuning and large scale reinforcement learning. Llama-Nemotron models are the first open-source models to support a dynamic reasoning toggle, allowing users to switch between standard chat and reasoning modes during inference. To further support open research and facilitate model development, we provide the following resources: 1. We release the Llama-Nemotron reasoning models – LN-Nano, LN-Super, and LN-Ultra – under the commercially permissive NVIDIA Open Model License Agreement. 2. We release the complete post-training dataset: Llama-Nemotron-Post-Training-Dataset. 3. We also release our training codebases: NeMo, NeMo-Aligner, and Megatron-LM.

nan


Article 597

Title@2025-06-30 (1): Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings

Title: Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings Generative Modellierung von Voll-Atom-Protein-Konformationen mit Latent Diffusion auf Graph-Embeddings 利用在图形嵌入器上延迟扩散生成全原子蛋白质变形的生成模型 2506.17064v3

Authors (5): Aditya Sengar, Ali Hariri, Daniel Probst, Patrick Barth, Pierre Vandergheynst

Generating diverse, all-atom conformational ensembles of dynamic proteins such as G-protein-coupled receptors (GPCRs) is critical for understanding their function, yet most generative models simplify atomic detail or ignore conformational diversity altogether. We present latent diffusion for full protein generation (LD-FPG), a framework that constructs complete all-atom protein structures, including every side-chain heavy atom, directly from molecular dynamics (MD) trajectories. LD-FPG employs a Chebyshev graph neural network (ChebNet) to obtain low-dimensional latent embeddings of protein conformations, which are processed using three pooling strategies: blind, sequential and residue-based. A diffusion model trained on these latent representations generates new samples that a decoder, optionally regularized by dihedral-angle losses, maps back to Cartesian coordinates. Using D2R-MD, a 2-microsecond MD trajectory (12 000 frames) of the human dopamine D2 receptor in a membrane environment, the sequential and residue-based pooling strategy reproduces the reference ensemble with high structural fidelity (all-atom lDDT of approximately 0.7; C-alpha-lDDT of approximately 0.8) and recovers backbone and side-chain dihedral-angle distributions with a Jensen-Shannon divergence of less than 0.03 compared to the MD data. LD-FPG thereby offers a practical route to system-specific, all-atom ensemble generation for large proteins, providing a promising tool for structure-based therapeutic design on complex, dynamic targets. The D2R-MD dataset and our implementation are freely available to facilitate further research.

nan


Article 598

Title@2025-06-30 (1): Developing Lightweight DNN Models With Limited Data For Real-Time Sign Language Recognition

Title: Developing Lightweight DNN Models With Limited Data For Real-Time Sign Language Recognition Entwicklung leichter DNN-Modelle mit begrenzten Daten für Echtzeit-Sign Language-Erkennung 开发轻型DNN模型,具有实时手语识别的有限数据 2507.00248v1

Authors (2): Nikita Nikitin, Eugene Fomin

We present a novel framework for real-time sign language recognition using lightweight DNNs trained on limited data. Our system addresses key challenges in sign language recognition, including data scarcity, high computational costs, and discrepancies in frame rates between training and inference environments. By encoding sign language specific parameters, such as handshape, palm orientation, movement, and location into vectorized inputs, and leveraging MediaPipe for landmark extraction, we achieve highly separable input data representations. Our DNN architecture, optimized for sub 10MB deployment, enables accurate classification of 343 signs with less than 10ms latency on edge devices. The data annotation platform ‘slait data’ facilitates structured labeling and vector extraction. Our model achieved 92% accuracy in isolated sign recognition and has been integrated into the ‘slait ai’ web application, where it demonstrates stable inference.

nan


Article 599

Title@2025-06-30 (1): Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension

Title: Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension Skalierung der Inferenz-Zeit-Suche mit Vision Value Model für verbesserte visuelle Wahrnehmung 增强视觉理解的视觉价值模型的增强推论-实时搜索 2412.03704v3

Authors (9): Xiyao Wang, Zhengyuan Yang, Linjie Li, Hongjin Lu, Yuancheng Xu, Chung-Ching Lin, Kevin Lin, Furong Huang, Lijuan Wang

Despite significant advancements in vision-language models (VLMs), there lacks effective approaches to enhance response quality by scaling inference-time computation. This capability is known to be a core step towards the self-improving models in recent large language model studies. In this paper, we present Vision Value Model (VisVM) that can guide VLM inference-time search to generate responses with better visual comprehension. Specifically, VisVM not only evaluates the generated sentence quality in the current search step, but also anticipates the quality of subsequent sentences that may result from the current step, thus providing a long-term value. In this way, VisVM steers VLMs away from generating sentences prone to hallucinations or insufficient detail, thereby producing higher quality responses. Experimental results demonstrate that VisVM-guided search significantly enhances VLMs’ ability to generate descriptive captions with richer visual details and fewer hallucinations, compared with greedy decoding and search methods with other visual reward signals. Furthermore, we find that self-training the model with the VisVM-guided captions improve VLM’s performance across a wide range of multimodal benchmarks, indicating the potential for developing self-improving VLMs. Our value model and code are available at https://github.com/si0wang/VisVM.

nan


Article 600

Title@2025-06-30 (1): An Unconditional Representation of the Conditional Score in Infinite-Dimensional Linear Inverse Problems

Title: An Unconditional Representation of the Conditional Score in Infinite-Dimensional Linear Inverse Problems Eine bedingungslose Darstellung des Bedingten Scores in unendlich-dimensionalen linearen Inversen-Problemen 无限、两极线性反问题中有条件评分的无条件代表性 2405.15643v3

Authors (5): Fabian Schneider, Duc-Lam Duong, Matti Lassas, Maarten V. de Hoop, Tapio Helin

Score-based diffusion models (SDMs) have emerged as a powerful tool for sampling from the posterior distribution in Bayesian inverse problems. However, existing methods often require multiple evaluations of the forward mapping to generate a single sample, resulting in significant computational costs for large-scale inverse problems. To address this, we propose an unconditional representation of the conditional score-function (UCoS) tailored to linear inverse problems, which avoids forward model evaluations during sampling by shifting computational effort to an offline training phase. In this phase, a task-dependent score function is learned based on the linear forward operator. Crucially, we show that the conditional score can be derived exactly from a trained (unconditional) score using affine transformations, eliminating the need for conditional score approximations. Our approach is formulated in infinite-dimensional function spaces, making it inherently discretization-invariant. We support this formulation with a rigorous convergence analysis that justifies UCoS beyond any specific discretization. Finally we validate UCoS through high-dimensional computed tomography (CT) and image deblurring experiments, demonstrating both scalability and accuracy.

nan


Article 601

Title@2025-06-30 (1): Causal Machine Learning in IoT-based Engineering Problems: A Tool Comparison in the Case of Household Energy Consumption

Title: Causal Machine Learning in IoT-based Engineering Problems: A Tool Comparison in the Case of Household Energy Consumption Kausales maschinelles Lernen in IoT-basierten Engineering-Problemen: Ein Tool-Vergleich im Fall des Haushaltsenergieverbrauchs 以木工工程问题为基础的因果机械学习:家庭能源消费工具比较 2505.12147v3

Authors (4): Nikolaos-Lysias Kosioris, Sotirios Nikoletseas, Gavrilis Filios, Stefanos Panagiotou

The rapid increase in computing power and the ability to store Big Data in the infrastructure has enabled predictions in a large variety of domains by Machine Learning. However, in many cases, existing Machine Learning tools are considered insufficient or incorrect since they exploit only probabilistic dependencies rather than inference logic. Causal Machine Learning methods seem to close this gap. In this paper, two prevalent tools based on Causal Machine Learning methods are compared, as well as their mathematical underpinning background. The operation of the tools is demonstrated by examining their response to 18 queries, based on the IDEAL Household Energy Dataset, published by the University of Edinburgh. First, it was important to evaluate the causal relations assumption that allowed the use of this approach; this was based on the preexisting scientific knowledge of the domain and was implemented by use of the in-built validation tools. Results were encouraging and may easily be extended to other domains.

nan


Article 602

Title@2025-06-30 (1): Interpretable AI for Time-Series: Multi-Model Heatmap Fusion with Global Attention and NLP-Generated Explanations

Title: Interpretable AI for Time-Series: Multi-Model Heatmap Fusion with Global Attention and NLP-Generated Explanations Interpretierbare KI für die Time-Serie: Multi-Model Heatmap Fusion mit globaler Aufmerksamkeit und NLP-generierten Erklärungen 时间序列可解释的 AI:全球关注的多模型热图融合和NLP - 引人注意的解释 2507.00234v1

Authors (2): Jiztom Kavalakkatt Francis, Matthew J Darr

In this paper, we present a novel framework for enhancing model interpretability by integrating heatmaps produced separately by ResNet and a restructured 2D Transformer with globally weighted input saliency. We address the critical problem of spatial-temporal misalignment in existing interpretability methods, where convolutional networks fail to capture global context and Transformers lack localized precision - a limitation that impedes actionable insights in safety-critical domains like healthcare and industrial monitoring. Our method merges gradient-weighted activation maps (ResNet) and Transformer attention rollout into a unified visualization, achieving full spatial-temporal alignment while preserving real-time performance. Empirical evaluations on clinical (ECG arrhythmia detection) and industrial (energy consumption prediction) datasets demonstrate significant improvements: the hybrid framework achieves 94.1% accuracy (F1 0.93) on the PhysioNet dataset and reduces regression error to RMSE = 0.28 kWh (R2 = 0.95) on the UCI Energy Appliance dataset-outperforming standalone ResNet, Transformer, and InceptionTime baselines by 3.8-12.4%. An NLP module translates fused heatmaps into domain-specific narratives (e.g., “Elevated ST-segment between 2-4 seconds suggests myocardial ischemia”), validated via BLEU-4 (0.586) and ROUGE-L (0.650) scores. By formalizing interpretability as causal fidelity and spatial-temporal alignment, our approach bridges the gap between technical outputs and stakeholder understanding, offering a scalable solution for transparent, time-aware decision-making.

nan


Article 603

Title@2025-06-30 (1): A Graph-Based Classical and Quantum Approach to Deterministic L-System Inference

Title: A Graph-Based Classical and Quantum Approach to Deterministic L-System Inference Ein auf Graphen basierender klassischer und Quantumansatz zur deterministischen L-System-Inferenz 以图表为基础的确定性L-系统系统推断法的古学和量法 2411.19906v3

Authors (3): Ali Lotfi, Ian McQuillan, Steven Rayan

L-systems can be made to model and create simulations of many biological processes, such as plant development. Finding an L-system for a given process is typically solved by hand, by experts, in a massively time-consuming process. It would be significant if this could be done automatically from data, such as from sequences of images. In this paper, we are interested in inferring a particular type of L-system, deterministic context-free L-system (D0L-system) from a sequence of strings. We introduce the characteristic graph of a sequence of strings, which we then utilize to translate our problem (inferring D0L-systems) in polynomial time into the maximum independent set problem (MIS) and the SAT problem. After that, we offer a classical exact algorithm and an approximate quantum algorithm for the problem.

nan


Article 604

Title@2025-06-30 (1): PPFL-RDSN: Privacy-Preserving Federated Learning-based Residual Dense Spatial Networks for Encrypted Lossy Image Reconstruction

Title: PPFL-RDSN: Privacy-Preserving Federated Learning-based Residual Dense Spatial Networks for Encrypted Lossy Image Reconstruction PPFL-RDSN: Privacy-Preserving Federated Learning-based Residual Dense Spatial Networks for Encrypted Lossy Image Reconstruction PPFL-RDSN: 用于加密失传图像重建的基于隐私-保护联邦学习的残余常量空间网络 2507.00230v1

Authors (2): Peilin He, James Joshi

Reconstructing high-quality images from low-resolution inputs using Residual Dense Spatial Networks (RDSNs) is crucial yet challenging, particularly in collaborative scenarios where centralized training poses significant privacy risks, including data leakage and inference attacks, as well as high computational costs. We propose a novel Privacy-Preserving Federated Learning-based RDSN (PPFL-RDSN) framework specifically tailored for lossy image reconstruction. PPFL-RDSN integrates Federated Learning (FL), local differential privacy, and robust model watermarking techniques, ensuring data remains secure on local devices, safeguarding sensitive information, and maintaining model authenticity without revealing underlying data. Empirical evaluations show that PPFL-RDSN achieves comparable performance to the state-of-the-art centralized methods while reducing computational burdens, and effectively mitigates security and privacy vulnerabilities, making it a practical solution for secure and privacy-preserving collaborative computer vision applications.

nan


Article 605

Title@2025-06-30 (1): Soft Dice Confidence: A Near-Optimal Confidence Estimator for Selective Prediction in Semantic Segmentation

Title: Soft Dice Confidence: A Near-Optimal Confidence Estimator for Selective Prediction in Semantic Segmentation Weiche Würfel-Konfidenz: Ein nahezu optimaler Vertrauens-Schätzer für selektive Vorhersagen in semantischer Segmentierung Soft Dice 信任度: 精度分割选择性预测的近最佳信任模拟器 2402.10665v3

Authors (3): Bruno Laboissiere Camargos Borges, Bruno Machado Pacheco, Danilo Silva

Selective prediction augments a model with the option to abstain from providing unreliable predictions. The key ingredient is a confidence score function, which should be directly related to the conditional risk. In the case of binary semantic segmentation, existing score functions either ignore the particularities of the evaluation metric or demand additional held-out data for tuning. We propose the Soft Dice Confidence (SDC), a simple, tuning-free confidence score function that directly aligns with the Dice coefficient metric. We prove that, under conditional independence, the SDC is near optimal: we establish upper and lower bounds on the ratio between the SDC and the ideal (intractable) confidence score function and show that these bounds are very close to 1. Experiments on six public medical-imaging benchmarks and on synthetic data corroborate our theoretical findings. In fact, SDC outperformed all prior confidence estimators from the literature in all of our experiments, including those that rely on additional data. These results position SDC as a reliable and efficient confidence estimator for selective prediction in semantic segmentation.

nan


Article 606

Title@2025-06-30 (1): Avoid Forgetting by Preserving Global Knowledge Gradients in Federated Learning with Non-IID Data

Title: Avoid Forgetting by Preserving Global Knowledge Gradients in Federated Learning with Non-IID Data Vermeiden Sie das Vergessen, indem Sie globale Wissensgradienten im Föderierten Lernen mit nicht-ID-Daten bewahren 避免在使用非二二二维数据进行联邦学习时因保留全球知识进步而被遗忘 2505.20485v3

Authors (5): Abhijit Chunduru, Majid Morafah, Mahdi Morafah, Vishnu Pandi Chellapandi, Ang Li

The inevitable presence of data heterogeneity has made federated learning very challenging. There are numerous methods to deal with this issue, such as local regularization, better model fusion techniques, and data sharing. Though effective, they lack a deep understanding of how data heterogeneity can affect the global decision boundary. In this paper, we bridge this gap by performing an experimental analysis of the learned decision boundary using a toy example. Our observations are surprising: (1) we find that the existing methods suffer from forgetting and clients forget the global decision boundary and only learn the perfect local one, and (2) this happens regardless of the initial weights, and clients forget the global decision boundary even starting from pre-trained optimal weights. In this paper, we present FedProj, a federated learning framework that robustly learns the global decision boundary and avoids its forgetting during local training. To achieve better ensemble knowledge fusion, we design a novel server-side ensemble knowledge transfer loss to further calibrate the learned global decision boundary. To alleviate the issue of learned global decision boundary forgetting, we further propose leveraging an episodic memory of average ensemble logits on a public unlabeled dataset to regulate the gradient updates at each step of local training. Experimental results demonstrate that FedProj outperforms state-of-the-art methods by a large margin.

nan


Article 607

Title@2025-06-30 (1): Holistic Artificial Intelligence in Medicine; improved performance and explainability

Title: Holistic Artificial Intelligence in Medicine; improved performance and explainability Ganzheitliche Künstliche Intelligenz in der Medizin; verbesserte Leistung und Erklärbarkeit 医学领域综合人工医学知识;改进绩效和解释性 2507.00205v1

Authors (4): Periklis Petridis, Georgios Margaritis, Vasiliki Stoumpou, Dimitris Bertsimas

With the increasing interest in deploying Artificial Intelligence in medicine, we previously introduced HAIM (Holistic AI in Medicine), a framework that fuses multimodal data to solve downstream clinical tasks. However, HAIM uses data in a task-agnostic manner and lacks explainability. To address these limitations, we introduce xHAIM (Explainable HAIM), a novel framework leveraging Generative AI to enhance both prediction and explainability through four structured steps: (1) automatically identifying task-relevant patient data across modalities, (2) generating comprehensive patient summaries, (3) using these summaries for improved predictive modeling, and (4) providing clinical explanations by linking predictions to patient-specific medical knowledge. Evaluated on the HAIM-MIMIC-MM dataset, xHAIM improves average AUC from 79.9% to 90.3% across chest pathology and operative tasks. Importantly, xHAIM transforms AI from a black-box predictor into an explainable decision support system, enabling clinicians to interactively trace predictions back to relevant patient data, bridging AI advancements with clinical utility.

nan


Article 608

Title@2025-06-30 (1): Storing overlapping associative memories on latent manifolds in low-rank spiking networks

Title: Storing overlapping associative memories on latent manifolds in low-rank spiking networks Speicherung überlappender assoziativer Erinnerungen an latente Mannigfaltigkeiten in niederrankigen Spiking-Netzwerken 将重叠的关联记忆存储在低级喷水网络中潜伏的柱体上 2411.17485v2

Authors (2): William F. Podlaski, Christian K. Machens

Associative memory architectures such as the Hopfield network have long been important conceptual and theoretical models for neuroscience and artificial intelligence. However, translating these abstract models into spiking neural networks has been surprisingly difficult. Indeed, much previous work has been restricted to storing a small number of primarily non-overlapping memories in large networks, thereby limiting their scalability. Here, we revisit the associative memory problem in light of recent advances in understanding spike-based computation. Using a recently-established geometric framework, we show that the spiking activity for a large class of all-inhibitory networks is situated on a low-dimensional, convex, and piecewise-linear manifold, with dynamics that move along the manifold. We then map the associative memory problem onto these dynamics, and demonstrate how the vertices of a hypercubic manifold can be used to store stable, overlapping activity patterns with a direct correspondence to the original Hopfield model. We propose several learning rules, and demonstrate a linear scaling of the storage capacity with the number of neurons, as well as robust pattern completion abilities. Overall, this work serves as a case study to demonstrate the effectiveness of using a geometrical perspective to design dynamics on neural manifolds, with implications for neuroscience and machine learning.

nan


Article 609

Title@2025-06-30 (1): Embedding-based Retrieval in Multimodal Content Moderation

Title: Embedding-based Retrieval in Multimodal Content Moderation Einbetten-basiertes Retrieval in multimodale Content Moderation 以嵌入为基础的多模式内容调控中回存 2507.01066v1

Authors (9): Hanzhong Liang, Jinghao Shi, Xiang Shen, Zixuan Wang, Vera Wen, Ardalan Mehrani, Zhiqian Chen, Yifan Wu, Zhixin Zhang

Video understanding plays a fundamental role for content moderation on short video platforms, enabling the detection of inappropriate content. While classification remains the dominant approach for content moderation, it often struggles in scenarios requiring rapid and cost-efficient responses, such as trend adaptation and urgent escalations. To address this issue, we introduce an Embedding-Based Retrieval (EBR) method designed to complement traditional classification approaches. We first leverage a Supervised Contrastive Learning (SCL) framework to train a suite of foundation embedding models, including both single-modal and multi-modal architectures. Our models demonstrate superior performance over established contrastive learning methods such as CLIP and MoCo. Building on these embedding models, we design and implement the embedding-based retrieval system that integrates embedding generation and video retrieval to enable efficient and effective trend handling. Comprehensive offline experiments on 25 diverse emerging trends show that EBR improves ROC-AUC from 0.85 to 0.99 and PR-AUC from 0.35 to 0.95. Further online experiments reveal that EBR increases action rates by 10.32% and reduces operational costs by over 80%, while also enhancing interpretability and flexibility compared to classification-based solutions.

nan


Article 610

Title@2025-06-30 (1): Identifying the Truth of Global Model: A Generic Solution to Defend Against Byzantine and Backdoor Attacks in Federated Learning (full version)

Title: Identifying the Truth of Global Model: A Generic Solution to Defend Against Byzantine and Backdoor Attacks in Federated Learning (full version) Die Wahrheit des globalen Modells identifizieren: Eine generische Lösung gegen byzantinische und Hintertürangriffe im Federated Learning (Vollversion) 查明全球模式真相:在联邦学习联盟中防范拜占庭和后门攻击的一般解决办法(全文) 2311.10248v3

Authors (3): Sheldon C. Ebron, Meiying Zhang, Kan Yang

Federated Learning (FL) enables multiple parties to train machine learning models collaboratively without sharing the raw training data. However, the federated nature of FL enables malicious clients to influence a trained model by injecting error model updates via Byzantine or backdoor attacks. To detect malicious model updates, a typical approach is to measure the distance between each model update and a \textit{ground-truth model update}. To find such \textit{ground-truth model updates}, existing defenses either require a benign root dataset on the server (e.g., FLTrust) or simply use trimmed mean or median as the threshold for clipping (e.g., FLAME). However, such benign root datasets are impractical, and the trimmed mean or median may also eliminate contributions from these underrepresented datasets. In this paper, we propose a generic solution, namely FedTruth, to defend against model poisoning attacks in FL, where the \textit{ground-truth model update} (i.e., the global model update) will be estimated among all the model updates with dynamic aggregation weights. Specifically, FedTruth does not have specific assumptions on the benign or malicious data distribution or access to a benign root dataset. Moreover, FedTruth considers the potential contributions from all benign clients. Our empirical results show that FedTruth can reduce the impacts of poisoned model updates against both Byzantine and backdoor attacks, and is also efficient in large-scale FL systems.

nan


Article 611

Title@2025-06-30 (1): Intelligent Routing for Sparse Demand Forecasting: A Comparative Evaluation of Selection Strategies

Title: Intelligent Routing for Sparse Demand Forecasting: A Comparative Evaluation of Selection Strategies Intelligentes Routing für Sparse Demand Forecasting: Eine vergleichende Bewertung von Auswahlstrategien 用于粗略需求预测的智能路标:选择战略的比较评价 2506.14810v2

Authors (1): Qiwen Zhang

Sparse and intermittent demand forecasting in supply chains presents a critical challenge, as frequent zero-demand periods hinder traditional model accuracy and impact inventory management. We propose and evaluate a Model-Router framework that dynamically selects the most suitable forecasting model-spanning classical, ML, and DL methods for each product based on its unique demand pattern. By comparing rule-based, LightGBM, and InceptionTime routers, our approach learns to assign appropriate forecasting strategies, effectively differentiating between smooth, lumpy, or intermittent demand regimes to optimize predictions. Experiments on the large-scale Favorita dataset show our deep learning (Inception Time) router improves forecasting accuracy by up to 11.8% (NWRMSLE) over strong, single-model benchmarks with 4.67x faster inference time. Ultimately, these gains in forecasting precision will drive substantial reductions in both stockouts and wasteful excess inventory, underscoring the critical role of intelligent, adaptive Al in optimizing contemporary supply chain operations.

nan


Article 612

Title@2025-06-30 (1): What Makes Local Updates Effective: The Role of Data Heterogeneity and Smoothness

Title: What Makes Local Updates Effective: The Role of Data Heterogeneity and Smoothness Was lokale Updates effektiv macht: Die Rolle von Daten Heterogenität und Glätte 是什么使本地更新有效:数据多样化和平稳的作用 2507.00195v1

Authors (1): Kumar Kshitij Patel

This thesis contributes to the theoretical understanding of local update algorithms, especially Local SGD, in distributed and federated optimization under realistic models of data heterogeneity. A central focus is on the bounded second-order heterogeneity assumption, which is shown to be both necessary and sufficient for local updates to outperform centralized or mini-batch methods in convex and non-convex settings. The thesis establishes tight upper and lower bounds in several regimes for various local update algorithms and characterizes the min-max complexity of multiple problem classes. At its core is a fine-grained consensus-error-based analysis framework that yields sharper finite-time convergence bounds under third-order smoothness and relaxed heterogeneity assumptions. The thesis also extends to online federated learning, providing fundamental regret bounds under both first-order and bandit feedback. Together, these results clarify when and why local updates offer provable advantages, and the thesis serves as a self-contained guide for analyzing Local SGD in heterogeneous environments.

nan


Article 613

Title@2025-06-30 (1): RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression

Title: RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression RocketKV: Beschleunigung der Langkontext-LLM-Inferenz über zweistufige KV-Cache-Kompression RocketKV: 通过两步KV缓存压缩加速长文本LLM推导 2502.14051v2

Authors (6): Payman Behnam, Yaosheng Fu, Ritchie Zhao, Po-An Tsai, Zhiding Yu, Alexey Tumanov

Transformer-based Large Language Models rely critically on the KV cache to efficiently handle extended contexts during the decode phase. Yet, the size of the KV cache grows proportionally with the input length, burdening both memory bandwidth and capacity as decoding progresses. To address this challenge, we present RocketKV, a training-free KV cache compression strategy containing two consecutive stages. In the first stage, it performs coarse-grain permanent KV cache eviction on the input sequence tokens. In the second stage, it adopts a hybrid sparse attention method to conduct fine-grain top-k sparse attention, approximating the attention scores by leveraging both head and sequence dimensionality reductions. We show that RocketKV provides a compression ratio of up to 400$\times$, end-to-end speedup of up to 3.7$\times$ as well as peak memory reduction of up to 32.6% in the decode phase on an NVIDIA A100 GPU compared to the full KV cache baseline, while achieving negligible accuracy loss on a variety of long-context tasks. We also propose a variant of RocketKV for multi-turn scenarios, which consistently outperforms other existing methods and achieves accuracy nearly on par with an oracle top-k attention scheme.

nan


Article 614

Title@2025-06-30 (1): Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions

Title: Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions Jenseits von Sensordaten: Basismodelle von Verhaltensdaten von Wearables verbessern Gesundheitsprognosen 超越感官数据:从改进健康预测的易穿改进健康预测 2507.00191v1

Authors (9): Eray Erturk, Fahad Kamran, Salar Abbaspourazad, Sean Jewell, Harsh Sharma, Yujie Li, Sinead Williamson, Nicholas J Foti, Joseph Futoma

Wearable devices record physiological and behavioral signals that can improve health predictions. While foundation models are increasingly used for such predictions, they have been primarily applied to low-level sensor data, despite behavioral data often being more informative due to their alignment with physiologically relevant timescales and quantities. We develop foundation models of such behavioral signals using over 2.5B hours of wearable data from 162K individuals, systematically optimizing architectures and tokenization strategies for this unique dataset. Evaluated on 57 health-related tasks, our model shows strong performance across diverse real-world applications including individual-level classification and time-varying health state prediction. The model excels in behavior-driven tasks like sleep prediction, and improves further when combined with representations of raw sensor data. These results underscore the importance of tailoring foundation model design to wearables and demonstrate the potential to enable new health applications.

nan


Article 615

Title@2025-06-30 (1): Text-to-Level Diffusion Models With Various Text Encoders for Super Mario Bros

Title: Text-to-Level Diffusion Models With Various Text Encoders for Super Mario Bros Text-zu-Level-Diffusionsmodelle mit verschiedenen Text-Encodern für Super Mario Bros 超级马里奥兄弟公司各种文本编码器的文本到级传播模型 2507.00184v1

Authors (5): Jacob Schrum, Olivia Kilday, Emilio Salas, Bess Hagan, Reid Williams

Recent research shows how diffusion models can unconditionally generate tile-based game levels, but use of diffusion models for text-to-level generation is underexplored. There are practical considerations for creating a usable model: caption/level pairs are needed, as is a text embedding model, and a way of generating entire playable levels, rather than individual scenes. We present strategies to automatically assign descriptive captions to an existing level dataset, and train diffusion models using both pretrained text encoders and simple transformer models trained from scratch. Captions are automatically assigned to generated levels so that the degree of overlap between input and output captions can be compared. We also assess the diversity and playability of the resulting levels. Results are compared with an unconditional diffusion model and a generative adversarial network, as well as the text-to-level approaches Five-Dollar Model and MarioGPT. Notably, the best diffusion model uses a simple transformer model for text embedding, and takes less time to train than diffusion models employing more complex text encoders, indicating that reliance on larger language models is not necessary. We also present a GUI allowing designers to construct long levels from model-generated scenes.

nan


Article 616

Title@2025-06-30 (1): RLCAD: Reinforcement Learning Training Gym for Revolution Involved CAD Command Sequence Generation

Title: RLCAD: Reinforcement Learning Training Gym for Revolution Involved CAD Command Sequence Generation RLCAD: Verstärkungs-Lerntraining Gym für Revolution beteiligt CAD-Befehlssequenz-Generierung RELAD: 革命参与的CAD指挥序列生成者加强学习培训 2503.18549v2

Authors (8): Xiaolong Yin, Xingyu Lu, Jiahang Shen, Jingzhe Ni, Hailong Li, Ruofeng Tong, Min Tang, Peng Du

A CAD command sequence is a typical parametric design paradigm in 3D CAD systems where a model is constructed by overlaying 2D sketches with operations such as extrusion, revolution, and Boolean operations. Although there is growing academic interest in the automatic generation of command sequences, existing methods and datasets only support operations such as 2D sketching, extrusion,and Boolean operations. This limitation makes it challenging to represent more complex geometries. In this paper, we present a reinforcement learning (RL) training environment (gym) built on a CAD geometric engine. Given an input boundary representation (B-Rep) geometry, the policy network in the RL algorithm generates an action. This action, along with previously generated actions, is processed within the gym to produce the corresponding CAD geometry, which is then fed back into the policy network. The rewards, determined by the difference between the generated and target geometries within the gym, are used to update the RL network. Our method supports operations beyond sketches, Boolean, and extrusion, including revolution operations. With this training gym, we achieve state-of-the-art (SOTA) quality in generating command sequences from B-Rep geometries.

nan


Article 617

Title@2025-06-30 (1): Generative Intervention Models for Causal Perturbation Modeling

Title: Generative Intervention Models for Causal Perturbation Modeling Generative Interventionsmodelle für die kausale Störungsmodellierung 造成助动模型模型的产生干预模型 2411.14003v2

Authors (5): Nora Schneider, Lars Lorch, Niki Kilbertus, Bernhard Schölkopf, Andreas Krause

We consider the problem of predicting perturbation effects via causal models. In many applications, it is a priori unknown which mechanisms of a system are modified by an external perturbation, even though the features of the perturbation are available. For example, in genomics, some properties of a drug may be known, but not their causal effects on the regulatory pathways of cells. We propose a generative intervention model (GIM) that learns to map these perturbation features to distributions over atomic interventions in a jointly-estimated causal model. Contrary to prior approaches, this enables us to predict the distribution shifts of unseen perturbation features while gaining insights about their mechanistic effects in the underlying data-generating process. On synthetic data and scRNA-seq drug perturbation data, GIMs achieve robust out-of-distribution predictions on par with unstructured approaches, while effectively inferring the underlying perturbation mechanisms, often better than other causal inference methods.

nan


Article 618

Title@2025-06-30 (1): BlackBoxToBlueprint: Extracting Interpretable Logic from Legacy Systems using Reinforcement Learning and Counterfactual Analysis

Title: BlackBoxToBlueprint: Extracting Interpretable Logic from Legacy Systems using Reinforcement Learning and Counterfactual Analysis BlackBoxToBlueprint: Interpretierbare Logik aus Legacy-Systemen mit Verstärkungslernen und kontrafaktischer Analyse extrahieren BlackBoxto蓝图:利用强化学习和反事实分析从遗留系统中提取可解释逻辑 2507.00180v1

Authors (1): Vidhi Rathore

Modernizing legacy software systems is a critical but challenging task, often hampered by a lack of documentation and understanding of the original system’s intricate decision logic. Traditional approaches like behavioral cloning merely replicate input-output behavior without capturing the underlying intent. This paper proposes a novel pipeline to automatically extract interpretable decision logic from legacy systems treated as black boxes. The approach uses a Reinforcement Learning (RL) agent to explore the input space and identify critical decision boundaries by rewarding actions that cause meaningful changes in the system’s output. These counterfactual state transitions, where the output changes, are collected and clustered using K-Means. Decision trees are then trained on these clusters to extract human-readable rules that approximate the system’s decision logic near the identified boundaries. I demonstrated the pipeline’s effectiveness on three dummy legacy systems with varying complexity, including threshold-based, combined-conditional, and non-linear range logic. Results show that the RL agent successfully focuses exploration on relevant boundary regions, and the extracted rules accurately reflect the core logic of the underlying dummy systems, providing a promising foundation for generating specifications and test cases during legacy migration.

nan


Article 619

Title@2025-06-30 (1): Robust Representation Consistency Model via Contrastive Denoising

Title: Robust Representation Consistency Model via Contrastive Denoising Robustes Darstellungs-Konsistenz-Modell durch Kontrast-Denoisieren 通过反排斥做法形成强力代表性一致性模式 2501.13094v2

Authors (8): Jiachen Lei, Julius Berner, Jiongxiao Wang, Zhongzhu Chen, Zhongjia Ba, Kui Ren, Jun Zhu, Anima Anandkumar

Robustness is essential for deep neural networks, especially in security-sensitive applications. To this end, randomized smoothing provides theoretical guarantees for certifying robustness against adversarial perturbations. Recently, diffusion models have been successfully employed for randomized smoothing to purify noise-perturbed samples before making predictions with a standard classifier. While these methods excel at small perturbation radii, they struggle with larger perturbations and incur a significant computational overhead during inference compared to classical methods. To address this, we reformulate the generative modeling task along the diffusion trajectories in pixel space as a discriminative task in the latent space. Specifically, we use instance discrimination to achieve consistent representations along the trajectories by aligning temporally adjacent points. After fine-tuning based on the learned representations, our model enables implicit denoising-then-classification via a single prediction, substantially reducing inference costs. We conduct extensive experiments on various datasets and achieve state-of-the-art performance with minimal computation budget during inference. For example, our method outperforms the certified accuracy of diffusion-based methods on ImageNet across all perturbation radii by 5.3% on average, with up to 11.6% at larger radii, while reducing inference costs by 85$\times$ on average. Codes are available at: https://github.com/jiachenlei/rRCM.

nan


Article 620

Title@2025-06-30 (1): Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime

Title: Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime Dateneinheit verbessert Trainingseffizienz und mehr, mit einem Konvergenzrahmen jenseits des NTK-Regimes 数据统一提高培训效率,提高培训效率,提高培训效率,提高培训效率,除国家培训制度之外,采用统一框架 2506.24120v1

Authors (2): Yuqing Wang, Shangding Gu

Data selection plays a crucial role in data-driven decision-making, including in large language models (LLMs), and is typically task-dependent. Properties such as data quality and diversity have been extensively studied and are known to enhance model performance. However, it remains unclear whether there exist other quantitative and general principles of data selection that can consistently improve performance, especially for complex tasks with limited prior knowledge. In this paper, we demonstrate that selecting more uniformly distributed data can improve training efficiency while enhancing performance. Specifically, we establish that more uniform (less biased) distribution leads to a larger minimum pairwise distance between data points, denoted by $h_{\min}$, and prove that a smaller $h_{\min}$ can slow down the training dynamics of gradient descent (GD). Moreover, we theoretically show that the approximation error of neural networks decreases as $h_{\min}$ increases. Our analysis introduces a convergence framework for GD beyond the Neural Tangent Kernel (NTK) regime, applicable to a broad class of architectures, including transformers, without requiring Lipschitz smoothness. This framework further provides theoretical justification for the use of residual connections and function compositions in deep neural architectures. In the end, we conduct comprehensive experiments for supervised fine-tuning across various settings, including different optimization strategies, model sizes, and training datasets. The results consistently demonstrate that selecting data by maximizing pairwise distance significantly accelerates training and achieves comparable or better performance in LLMs across diverse datasets. Code and Datasets are available at the link: https://github.com/SafeRL-Lab/data-uniformity.

nan


Article 621

Title@2025-06-30 (1): Navigating with Annealing Guidance Scale in Diffusion Space

Title: Navigating with Annealing Guidance Scale in Diffusion Space Navigieren mit Annealing Guidance Scale im Diffusionsraum 与在扩散空间使用 AnnaALing 指导比例值导航 2506.24108v1

Authors (4): Shai Yehezkel, Omer Dahary, Andrey Voynov, Daniel Cohen-Or

Denoising diffusion models excel at generating high-quality images conditioned on text prompts, yet their effectiveness heavily relies on careful guidance during the sampling process. Classifier-Free Guidance (CFG) provides a widely used mechanism for steering generation by setting the guidance scale, which balances image quality and prompt alignment. However, the choice of the guidance scale has a critical impact on the convergence toward a visually appealing and prompt-adherent image. In this work, we propose an annealing guidance scheduler which dynamically adjusts the guidance scale over time based on the conditional noisy signal. By learning a scheduling policy, our method addresses the temperamental behavior of CFG. Empirical results demonstrate that our guidance scheduler significantly enhances image quality and alignment with the text prompt, advancing the performance of text-to-image generation. Notably, our novel scheduler requires no additional activations or memory consumption, and can seamlessly replace the common classifier-free guidance, offering an improved trade-off between prompt alignment and quality.

nan


Article 622

Title@2025-06-30 (1): Development of Hybrid Artificial Intelligence Training on Real and Synthetic Data: Benchmark on Two Mixed Training Strategies

Title: Development of Hybrid Artificial Intelligence Training on Real and Synthetic Data: Benchmark on Two Mixed Training Strategies Entwicklung von hybriden Künstlichen Intelligenz-Schulungen zu realen und synthetischen Daten: Benchmark für zwei gemischte Trainingsstrategien 发展关于真实数据与合成数据的混合人文情报培训:两项混合培训战略的基准 2506.24093v1

Authors (3): Paul Wachter, Lukas Niehaus, Julius Schöning

Synthetic data has emerged as a cost-effective alternative to real data for training artificial neural networks (ANN). However, the disparity between synthetic and real data results in a domain gap. That gap leads to poor performance and generalization of the trained ANN when applied to real-world scenarios. Several strategies have been developed to bridge this gap, which combine synthetic and real data, known as mixed training using hybrid datasets. While these strategies have been shown to mitigate the domain gap, a systematic evaluation of their generalizability and robustness across various tasks and architectures remains underexplored. To address this challenge, our study comprehensively analyzes two widely used mixing strategies on three prevalent architectures and three distinct hybrid datasets. From these datasets, we sample subsets with varying proportions of synthetic to real data to investigate the impact of synthetic and real components. The findings of this paper provide valuable insights into optimizing the use of synthetic data in the training process of any ANN, contributing to enhancing robustness and efficacy.

nan


Article 623

Title@2025-06-30 (1): SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?

Title: SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs? SEUF: Reicht es für LLMs für Mixture-of-Experts aus, einen Experten zu lernen? SEUF:不学习一位专家是否足以使混合专家LLM公司受益? 2411.18797v2

Authors (7): Haomin Zhuang, Yihua Zhang, Kehan Guo, Jinghan Jia, Gaowen Liu, Sijia Liu, Xiangliang Zhang

Recent advancements in LLMs unlearning have shown remarkable success in removing unwanted data-model influences while preserving the model’s utility for legitimate knowledge. Despite these strides, sparse Mixture-of-Experts (MoE) LLMs–a key subset of the LLM family–have remained unexplored in the context of unlearning. As MoE LLMs are celebrated for their exceptional performance, we ask:How can unlearning be performed effectively and efficiently on MoE LLMs? Our pilot study shows that the dynamic routing nature of MoE LLMs introduces unique challenges, leading to excessive forgetting, uncontrolled knowledge erasure and substantial utility drops when existing unlearning methods are applied. To address this, we propose a novel Selected-Expert Unlearning Framework (SEUF). Through expert attribution, unlearning is concentrated on the most actively engaged experts for the specified knowledge. Concurrently, an anchor loss is applied to the router to stabilize the active state of this targeted expert, ensuring focused and controlled unlearning. SEUF is compatible with various standard unlearning algorithms. Extensive experiments demonstrate that SEUF enhances both forget quality up to 5% and model utility by 35% on MoE LLMs across various benchmarks and LLM architectures (compared to standard unlearning algorithms), while only unlearning 0.06% of the model parameters.

nan


Article 624

Title@2025-06-30 (1): SQUASH: A SWAP-Based Quantum Attack to Sabotage Hybrid Quantum Neural Networks

Title: SQUASH: A SWAP-Based Quantum Attack to Sabotage Hybrid Quantum Neural Networks SQUASH: Ein SWAP-basierter Quantenangriff zur Sabotage hybrider Quantum-Neural-Netzwerke 以全部门办法为基础的对破坏混合量子神经网络的量子攻击 2506.24081v1

Authors (6): Rahul Kumar, Wenqi Wei, Ying Mao, Junaid Farooq, Ying Wang, Juntao Chen

We propose a circuit-level attack, SQUASH, a SWAP-Based Quantum Attack to sabotage Hybrid Quantum Neural Networks (HQNNs) for classification tasks. SQUASH is executed by inserting SWAP gate(s) into the variational quantum circuit of the victim HQNN. Unlike conventional noise-based or adversarial input attacks, SQUASH directly manipulates the circuit structure, leading to qubit misalignment and disrupting quantum state evolution. This attack is highly stealthy, as it does not require access to training data or introduce detectable perturbations in input states. Our results demonstrate that SQUASH significantly degrades classification performance, with untargeted SWAP attacks reducing accuracy by up to 74.08\% and targeted SWAP attacks reducing target class accuracy by up to 79.78\%. These findings reveal a critical vulnerability in HQNN implementations, underscoring the need for more resilient architectures against circuit-level adversarial interventions.

nan


Article 625

Title@2025-06-30 (1): Optimization, Isoperimetric Inequalities, and Sampling via Lyapunov Potentials

Title: Optimization, Isoperimetric Inequalities, and Sampling via Lyapunov Potentials Optimierung, isoperimetrische Ungleichheiten und Probenahme über Lyapunov-Potenziale 优化、同位素不平等和通过Lyapunov潜力取样 2410.02979v4

Authors (2): August Y. Chen, Karthik Sridharan

In this paper, we prove that optimizability of any function F using Gradient Flow from all initializations implies a Poincar'e Inequality for Gibbs measures mu_{beta} = e^{-beta F}/Z at low temperature. In particular, under mild regularity assumptions on the convergence rate of Gradient Flow, we establish that mu_{beta} satisfies a Poincar'e Inequality with constant O(C’+1/beta) for beta >= Omega(d), where C’ is the Poincar'e constant of mu_{beta} restricted to a neighborhood of the global minimizers of F. Under an additional mild condition on F, we show that mu_{beta} satisfies a Log-Sobolev Inequality with constant O(beta max(S, 1) max(C’, 1)) where S denotes the second moment of mu_{beta}. Here asymptotic notation hides F-dependent parameters. At a high level, this establishes that optimizability via Gradient Flow from every initialization implies a Poincar'e and Log-Sobolev Inequality for the low-temperature Gibbs measure, which in turn imply sampling from all initializations. Analogously, we establish that under the same assumptions, if F can be initialized from everywhere except some set S, then mu_{beta} satisfies a Weak Poincar'e Inequality with parameters (O(C’+1/beta), O(mu_{beta}(S))) for \beta = Omega(d). At a high level, this shows while optimizability from ‘most’ initializations implies a Weak Poincar'e Inequality, which in turn implies sampling from suitable warm starts. Our regularity assumptions are mild and as a consequence, we show we can efficiently sample from several new natural and interesting classes of non-log-concave densities, an important setting with relatively few examples. As another corollary, we obtain efficient discrete-time sampling results for log-concave measures satisfying milder regularity conditions than smoothness, similar to Lehec (2023).

nan


Article 626

Title@2025-06-30 (1): What Makes Treatment Effects Identifiable? Characterizations and Estimators Beyond Unconfoundedness

Title: What Makes Treatment Effects Identifiable? Characterizations and Estimators Beyond Unconfoundedness Was macht Behandlungseffekte identifizierbar? Charakterisierungen und Stimulatoren jenseits von Unbegründetheit 是什么使治疗效果可以识别?特征和估计因素超越无根据 2506.04194v2

Authors (5): Yang Cai, Alkis Kalavasis, Katerina Mamali, Anay Mehrotra, Manolis Zampetakis

Most of the widely used estimators of the average treatment effect (ATE) in causal inference rely on the assumptions of unconfoundedness and overlap. Unconfoundedness requires that the observed covariates account for all correlations between the outcome and treatment. Overlap requires the existence of randomness in treatment decisions for all individuals. Nevertheless, many types of studies frequently violate unconfoundedness or overlap, for instance, observational studies with deterministic treatment decisions - popularly known as Regression Discontinuity designs - violate overlap. In this paper, we initiate the study of general conditions that enable the identification of the average treatment effect, extending beyond unconfoundedness and overlap. In particular, following the paradigm of statistical learning theory, we provide an interpretable condition that is sufficient and necessary for the identification of ATE. Moreover, this condition also characterizes the identification of the average treatment effect on the treated (ATT) and can be used to characterize other treatment effects as well. To illustrate the utility of our condition, we present several well-studied scenarios where our condition is satisfied and, hence, we prove that ATE can be identified in regimes that prior works could not capture. For example, under mild assumptions on the data distributions, this holds for the models proposed by Tan (2006) and Rosenbaum (2002), and the Regression Discontinuity design model introduced by Thistlethwaite and Campbell (1960). For each of these scenarios, we also show that, under natural additional assumptions, ATE can be estimated from finite samples. We believe these findings open new avenues for bridging learning-theoretic insights and causal inference methodologies, particularly in observational studies with complex treatment mechanisms.

nan


Article 627

Title@2025-06-30 (1): Value-Compressed Sparse Column (VCSC): Sparse Matrix Storage for Redundant Data

Title: Value-Compressed Sparse Column (VCSC): Sparse Matrix Storage for Redundant Data Wertkomprimierte Sparse-Spalte (VCSC): Sparse Matrix Storage für redundante Daten 压压式稀释列(VCSC):为冗余数据储存散式矩阵 2309.04355v2

Authors (6): Skyler Ruiter, Seth Wolfgang, Marc Tunnell, Timothy Triche Jr., Erin Carrier, Zachary DeBruine

Compressed Sparse Column (CSC) and Coordinate (COO) are popular compression formats for sparse matrices. However, both CSC and COO are general purpose and cannot take advantage of any of the properties of the data other than sparsity, such as data redundancy. Highly redundant sparse data is common in many machine learning applications, such as genomics, and is often too large for in-core computation using conventional sparse storage formats. In this paper, we present two extensions to CSC: (1) Value-Compressed Sparse Column (VCSC) and (2) Index- and Value-Compressed Sparse Column (IVCSC). VCSC takes advantage of high redundancy within a column to further compress data up to 3-fold over COO and 2.25-fold over CSC, without significant negative impact to performance characteristics. IVCSC extends VCSC by compressing index arrays through delta encoding and byte-packing, achieving a 10-fold decrease in memory usage over COO and 7.5-fold decrease over CSC. Our benchmarks on simulated and real data show that VCSC and IVCSC can be read in compressed form with little added computational cost. These two novel compression formats offer a broadly useful solution to encoding and reading redundant sparse data.

nan


Article 628

Title@2025-06-30 (1): Logit-Gap Steering: Efficient Short-Suffix Jailbreaks for Aligned Large Language Models

Title: Logit-Gap Steering: Efficient Short-Suffix Jailbreaks for Aligned Large Language Models Logit-Gap Steering: Effiziente Short-Suffix Jailbreaks für ausgerichtete große Sprachmodelle Lologit-Gap 指导:通用大语言模型的高效短后休息室 2506.24056v1

Authors (2): Tung-Ling Li, Hongliang Liu

We introduce logit-gap steering, a fast jailbreak framework that casts the refusal-affirmation gap of RLHF-aligned language models as a single pass over the vocabulary. A forward-computable score blends gap reduction with lightweight proxies for KL penalty and reward shift, allowing a “sort-sum-stop” sweep to complete in under a second and return a short suffix–two orders of magnitude fewer model calls than beam or gradient attacks. The same suffix generalises to unseen prompts and scales from 0.5 B to 70 B checkpoints, lifting one-shot attack success from baseline levels to 80-100% while preserving topical coherence. Beyond efficiency, these suffixes expose sentence-boundary reward cliffs and other alignment artefacts, offering a lightweight probe into how safety tuning reshapes internal representations.

nan


Article 629

Title@2025-06-30 (1): Consensus-based optimization for closed-box adversarial attacks and a connection to evolution strategies

Title: Consensus-based optimization for closed-box adversarial attacks and a connection to evolution strategies Konsensbasierte Optimierung für Closed-Box-Adversarial-Angriffe und eine Verbindung zu Evolutionsstrategien 以共识为基础优化对封闭箱对抗性攻击的优化和与演变战略的联系 2506.24048v1

Authors (3): Tim Roith, Leon Bungert, Philipp Wacker

Consensus-based optimization (CBO) has established itself as an efficient gradient-free optimization scheme, with attractive mathematical properties, such as mean-field convergence results for non-convex loss functions. In this work, we study CBO in the context of closed-box adversarial attacks, which are imperceptible input perturbations that aim to fool a classifier, without accessing its gradient. Our contribution is to establish a connection between the so-called consensus hopping as introduced by Riedl et al. and natural evolution strategies (NES) commonly applied in the context of adversarial attacks and to rigorously relate both methods to gradient-based optimization schemes. Beyond that, we provide a comprehensive experimental study that shows that despite the conceptual similarities, CBO can outperform NES and other evolutionary strategies in certain scenarios.

nan


Article 630

Title@2025-06-30 (1): Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC

Title: Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC Agent.xpu: Effiziente Planung von Agentic LLM Workloads auf heterogenen SoC Agent.xpu: 高效地安排对异基因 soC 的Agentic LLM 工作负荷 2506.24045v1

Authors (8): Xinming Wei, Jiahao Zhang, Haoran Li, Jiayu Chen, Rui Qu, Maoliang Li, Xiang Chen, Guojie Luo

The proliferation of agentic Large Language Models (LLMs) on personal devices introduces a new class of workloads characterized by a dichotomy of objectives. Reactive tasks, initiated by users, demand immediate, low-latency responses, while proactive tasks operate invisibly and prioritize throughput. Existing on-device LLM engines, designed for isolated inferences, fail to efficiently manage these concurrent and conflicting requests on consumer-grade heterogeneous SoCs with CPU, integrated GPU, and NPU. This paper introduces Agent.xpu, an efficient serving system for agentic LLM workloads on memory-unified heterogeneous SoCs. With dedicated offline profiling, Agent.xpu first constructs a heterogeneous execution graph, which fuses and chunks model kernels for affinity-guided, elastic accelerator mapping with predictive kernel annotation. At runtime, its online scheduler enables fine-grained, kernel-level preemption to guarantee the responsiveness of reactive tasks. To maximize SoC utilization, it adopts slack-aware kernel backfill to opportunistically append proactive tasks, and mitigates NPU-iGPU contention via bandwidth-aware dispatch. Evaluation on an Intel Core Ultra SoC shows that Agent.xpu achieves 4.6$\times$ lower latency for reactive tasks and sustains 1.6$\times$-6.8$\times$ higher throughput for proactive tasks compared to state-of-the-art inference engines.

nan


Article 631

Title@2025-06-30 (1): Faster Diffusion Models via Higher-Order Approximation

Title: Faster Diffusion Models via Higher-Order Approximation Schnellere Diffusionsmodelle über höhere Rangannäherung 通过高分近端传送快速扩散模型 2506.24042v1

Authors (4): Gen Li, Yuchen Zhou, Yuting Wei, Yuxin Chen

In this paper, we explore provable acceleration of diffusion models without any additional retraining. Focusing on the task of approximating a target data distribution in $\mathbb{R}^d$ to within $\varepsilon$ total-variation distance, we propose a principled, training-free sampling algorithm that requires only the order of \(d^{1+2/K} \varepsilon^{-1/K}\) score function evaluations (up to log factor) in the presence of accurate scores, where $K$ is an arbitrarily large fixed integer. This result applies to a broad class of target data distributions, without the need for assumptions such as smoothness or log-concavity. Our theory is robust vis-a-vis inexact score estimation, degrading gracefully as the score estimation error increases – without demanding higher-order smoothness on the score estimates as assumed in previous work. The proposed algorithm draws insight from high-order ODE solvers, leveraging high-order Lagrange interpolation and successive refinement to approximate the integral derived from the probability flow ODE.

nan


Article 632

Title@2025-06-30 (1): Unsupervised Sparse Coding-based Spiking Neural Network for Real-time Spike Sorting

Title: Unsupervised Sparse Coding-based Spiking Neural Network for Real-time Spike Sorting Unüberwachte Sparse Coding-basierte Spiking Neural Network für Echtzeit Spike Sortierung 用于实时 Spiking Spiking Spiking Spiking Spiking Sypt Scorting 的无监督的 Sparse 编码基础 Spiking 神经网络 2506.24041v1

Authors (5): Alexis Melot, Sean U. N. Wood, Yannick Coffinier, Pierre Yger, Fabien Alibart

Spike sorting is a crucial step in decoding multichannel extracellular neural signals, enabling the identification of individual neuronal activity. A key challenge in brain-machine interfaces (BMIs) is achieving real-time, low-power spike sorting at the edge while keeping high neural decoding performance. This study introduces the Neuromorphic Sparse Sorter (NSS), a compact two-layer spiking neural network optimized for efficient spike sorting. NSS leverages the Locally Competitive Algorithm (LCA) for sparse coding to extract relevant features from noisy events with reduced computational demands. NSS learns to sort detected spike waveforms in an online fashion and operates entirely unsupervised. To exploit multi-bit spike coding capabilities of neuromorphic platforms like Intel’s Loihi 2, a custom neuron model was implemented, enabling flexible power-performance trade-offs via adjustable spike bit-widths. Evaluations on simulated and real-world tetrode signals with biological drift showed NSS outperformed established pipelines such as WaveClus3 and PCA+KMeans. With 2-bit graded spikes, NSS on Loihi 2 outperformed NSS implemented with leaky integrate-and-fire neuron and achieved an F1-score of 77% (+10% improvement) while consuming 8.6mW (+1.65mW) when tested on a drifting recording, with a computational processing time of 0.25ms (+60 us) per inference.

nan


Article 633

Title@2025-06-30 (1): Refined climatologies of future precipitation over High Mountain Asia using probabilistic ensemble learning

Title: Refined climatologies of future precipitation over High Mountain Asia using probabilistic ensemble learning Verfeinerte Klimatologien der zukünftigen Niederschlag über High Mountain Asia mit probabilistic ensemble learning 利用概率共变学习,改进高山区亚洲今后降水的气候学 2501.15690v3

Authors (4): Kenza Tazi, Sun Woo P. Kim, Marc Girona-Mata, Richard E. Turner

High Mountain Asia (HMA) holds the highest concentration of frozen water outside the polar regions, serving as a crucial water source for more than 1.9 billion people. Precipitation represents the largest source of uncertainty for future hydrological modelling in this area. In this study, we propose a probabilistic machine learning framework to combine monthly precipitation from 13 regional climate models developed under the Coordinated Regional Downscaling Experiment (CORDEX) over HMA via a mixture of experts (MoE). This approach accounts for seasonal and spatial biases within the models, enabling the prediction of more faithful precipitation distributions. The MoE is trained and validated against gridded historical precipitation data, yielding 32% improvement over an equally-weighted average and 254% improvement over choosing any single ensemble member. This approach is then used to generate precipitation projections for the near future (2036-2065) and far future (2066-2095) under RCP4.5 and RCP8.5 scenarios. Compared to previous estimates, the MoE projects wetter summers but drier winters over the western Himalayas and Karakoram and wetter winters over the Tibetan Plateau, Hengduan Shan, and South East Tibet.

nan


Article 634

Title@2025-06-30 (1): Experimenting, Fast and Slow: Bayesian Optimization of Long-term Outcomes with Online Experiments

Title: Experimenting, Fast and Slow: Bayesian Optimization of Long-term Outcomes with Online Experiments Experimentieren, schnell und langsam: Bayesische Optimierung langfristiger Ergebnisse mit Online-Experimenten 实验、快速和缓慢:利用在线实验优化长期成果 2506.18744v2

Authors (5): Qing Feng, Samuel Daulton, Benjamin Letham, Maximilian Balandat, Eytan Bakshy

Online experiments in internet systems, also known as A/B tests, are used for a wide range of system tuning problems, such as optimizing recommender system ranking policies and learning adaptive streaming controllers. Decision-makers generally wish to optimize for long-term treatment effects of the system changes, which often requires running experiments for a long time as short-term measurements can be misleading due to non-stationarity in treatment effects over time. The sequential experimentation strategies–which typically involve several iterations–can be prohibitively long in such cases. We describe a novel approach that combines fast experiments (e.g., biased experiments run only for a few hours or days) and/or offline proxies (e.g., off-policy evaluation) with long-running, slow experiments to perform sequential, Bayesian optimization over large action spaces in a short amount of time.

nan


Article 635

Title@2025-06-30 (1): Position: Machine Learning Conferences Should Establish a “Refutations and Critiques” Track

Title: Position: Machine Learning Conferences Should Establish a “Refutations and Critiques” Track Position: Machine Learning Konferenzen sollten einen “Refutations and Critiques” Track erstellen 职位:机器学习会议应建立“反驳和批评”轨道 2506.19882v2

Authors (15): Rylan Schaeffer, Joshua Kazdan, Yegor Denisov-Blanch, Brando Miranda, Matthias Gerstgrasser, Susan Zhang, Andreas Haupt, Isha Gupta, Elyas Obbad, Jesse Dodge, Jessica Zosa Forde, Koustuv Sinha, Francesco Orabona, Sanmi Koyejo, David Donoho

Science progresses by iteratively advancing and correcting humanity’s understanding of the world. In machine learning (ML) research, rapid advancements have led to an explosion of publications, but have also led to misleading, incorrect, flawed or perhaps even fraudulent studies being accepted and sometimes highlighted at ML conferences due to the fallibility of peer review. While such mistakes are understandable, ML conferences do not offer robust processes to help the field systematically correct when such errors are made. This position paper argues that ML conferences should establish a dedicated “Refutations and Critiques” (R&C) Track. This R&C Track would provide a high-profile, reputable platform to support vital research that critically challenges prior research, thereby fostering a dynamic self-correcting research ecosystem. We discuss key considerations including track design, review principles, potential pitfalls, and provide an illustrative example submission concerning a recent ICLR 2025 Oral. We conclude that ML conferences should create official, reputable mechanisms to help ML research self-correct.

nan


Article 636

Title@2025-06-30 (1): Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation

Title: Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation Befreien Sie mich damit! Stealthy Mitgliedschaft Inferenz für Retrieval-Augmented Generation 中我这个! 偷盗会员身份的回溯性 被支持的一代人的推论 2502.00306v2

Authors (6): Ali Naseh, Yuefeng Peng, Anshuman Suri, Harsh Chaudhari, Alina Oprea, Amir Houmansadr

Retrieval-Augmented Generation (RAG) enables Large Language Models (LLMs) to generate grounded responses by leveraging external knowledge databases without altering model parameters. Although the absence of weight tuning prevents leakage via model parameters, it introduces the risk of inference adversaries exploiting retrieved documents in the model’s context. Existing methods for membership inference and data extraction often rely on jailbreaking or carefully crafted unnatural queries, which can be easily detected or thwarted with query rewriting techniques common in RAG systems. In this work, we present Interrogation Attack (IA), a membership inference technique targeting documents in the RAG datastore. By crafting natural-text queries that are answerable only with the target document’s presence, our approach demonstrates successful inference with just 30 queries while remaining stealthy; straightforward detectors identify adversarial prompts from existing methods up to ~76x more frequently than those generated by our attack. We observe a 2x improvement in TPR@1%FPR over prior inference attacks across diverse RAG configurations, all while costing less than $0.02 per document inference.

nan


Article 637

Title@2025-06-30 (1): Post-processing of EEG-based Auditory Attention Decoding Decisions via Hidden Markov Models

Title: Post-processing of EEG-based Auditory Attention Decoding Decisions via Hidden Markov Models Nachbearbeitung von EEG-basierten Auditory-Achtungsdekodierungsentscheidungen über Hidden Markov-Modelle 通过隐藏Markov模式处理基于EEG的审计员注意后解码决定 2506.24024v1

Authors (3): Nicolas Heintz, Tom Francart, Alexander Bertrand

Auditory attention decoding (AAD) algorithms exploit brain signals, such as electroencephalography (EEG), to identify which speaker a listener is focusing on in a multi-speaker environment. While state-of-the-art AAD algorithms can identify the attended speaker on short time windows, their predictions are often too inaccurate for practical use. In this work, we propose augmenting AAD with a hidden Markov model (HMM) that models the temporal structure of attention. More specifically, the HMM relies on the fact that a subject is much less likely to switch attention than to keep attending the same speaker at any moment in time. We show how a HMM can significantly improve existing AAD algorithms in both causal (real-time) and non-causal (offline) settings. We further demonstrate that HMMs outperform existing postprocessing approaches in both accuracy and responsiveness, and explore how various factors such as window length, switching frequency, and AAD accuracy influence overall performance. The proposed method is computationally efficient, intuitive to use and applicable in both real-time and offline settings.

nan


Article 638

Title@2025-06-30 (1): Semiparametric Double Reinforcement Learning with Applications to Long-Term Causal Inference

Title: Semiparametric Double Reinforcement Learning with Applications to Long-Term Causal Inference Semiparametrisches Doppel-Verstärkungs-Lernen mit Anwendungen zur langfristigen Kausalität 半参数双重强化学习,应用到长期因果关系推断 2501.06926v3

Authors (5): Lars van der Laan, David Hubbard, Allen Tran, Nathan Kallus, Aurélien Bibaut

Long-term causal effects often must be estimated from short-term data due to limited follow-up in healthcare, economics, and online platforms. Markov Decision Processes (MDPs) provide a natural framework for capturing such long-term dynamics through sequences of states, actions, and rewards. Double Reinforcement Learning (DRL) enables efficient inference on policy values in MDPs, but nonparametric implementations require strong intertemporal overlap assumptions and often exhibit high variance and instability. We propose a semiparametric extension of DRL for efficient inference on linear functionals of the Q-function–such as policy values–in infinite-horizon, time-homogeneous MDPs. By imposing structural restrictions on the Q-function, our approach relaxes the strong overlap conditions required by nonparametric methods and improves statistical efficiency. Under model misspecification, our estimators target the functional of the best-approximating Q-function, with only second-order bias. We provide conditions for valid inference using sieve methods and data-driven model selection. A central challenge in DRL is the estimation of nuisance functions, such as density ratios, which often involve difficult minimax optimization. To address this, we introduce a novel plug-in estimator based on isotonic Bellman calibration, which combines fitted Q-iteration with an isotonic regression adjustment. The estimator is debiased without requiring estimation of additional nuisance functions and reduces high-dimensional overlap assumptions to a one-dimensional condition. Bellman calibration extends isotonic calibration–widely used in prediction and classification–to the MDP setting and may be of independent interest.

nan


Article 639

Title: Bridging Theory and Practice in Link Representation with Graph Neural Networks Überbrückung von Theorie und Praxis in der Linkdarstellung mit Graph Neural Networks 将代表性与图表神经网络联系起来的架桥理论和实践 2506.24018v1

Authors (6): Veronica Lachi, Francesco Ferrini, Antonio Longa, Bruno Lepri, Andrea Passerini, Manfred Jaeger

Graph Neural Networks (GNNs) are widely used to compute representations of node pairs for downstream tasks such as link prediction. Yet, theoretical understanding of their expressive power has focused almost entirely on graph-level representations. In this work, we shift the focus to links and provide the first comprehensive study of GNN expressiveness in link representation. We introduce a unifying framework, the $k_\phi$-$k_\rho$-$m$ framework, that subsumes existing message-passing link models and enables formal expressiveness comparisons. Using this framework, we derive a hierarchy of state-of-the-art methods and offer theoretical tools to analyze future architectures. To complement our analysis, we propose a synthetic evaluation protocol comprising the first benchmark specifically designed to assess link-level expressiveness. Finally, we ask: does expressiveness matter in practice? We use a graph symmetry metric that quantifies the difficulty of distinguishing links and show that while expressive models may underperform on standard benchmarks, they significantly outperform simpler ones as symmetry increases, highlighting the need for dataset-aware model selection.

nan


Article 640

Title@2025-06-30 (1): Connecting phases of matter to the flatness of the loss landscape in analog variational quantum algorithms

Title: Connecting phases of matter to the flatness of the loss landscape in analog variational quantum algorithms Verbinden von Phasen der Materie mit der Ebenheit der Verlustlandschaft in analogen Variations-Quantenalgorithmen 在模拟变异量算法中将物质阶段与损失平整情况相连接 2506.13865v2

Authors (3): Kasidit Srimahajariyapong, Supanut Thanasilp, Thiparat Chotibut

Variational quantum algorithms (VQAs) promise near-term quantum advantage, yet parametrized quantum states commonly built from the digital gate-based approach often suffer from scalability issues such as barren plateaus, where the loss landscape becomes flat. We study an analog VQA ans"atze composed of $M$ quenches of a disordered Ising chain, whose dynamics is native to several quantum simulation platforms. By tuning the disorder strength we place each quench in either a thermalized phase or a many-body-localized (MBL) phase and analyse (i) the ans"atze’s expressivity and (ii) the scaling of loss variance. Numerics shows that both phases reach maximal expressivity at large $M$, but barren plateaus emerge at far smaller $M$ in the thermalized phase than in the MBL phase. Exploiting this gap, we propose an MBL initialisation strategy: initialise the ans"atze in the MBL regime at intermediate quench $M$, enabling an initial trainability while retaining sufficient expressivity for subsequent optimization. The results link quantum phases of matter and VQA trainability, and provide practical guidelines for scaling analog-hardware VQAs.

nan


Article 641

Title@2025-06-30 (1): Graph Neural Networks in Wind Power Forecasting

Title: Graph Neural Networks in Wind Power Forecasting Schaubild Neurale Netze in der Windkraftprognose 风力预报中的神经网络图 2507.00105v1

Authors (2): Javier Castellano, Ignacio Villanueva

We study the applicability of GNNs to the problem of wind energy forecasting. We find that certain architectures achieve performance comparable to our best CNN-based benchmark. The study is conducted on three wind power facilities using five years of historical data. Numerical Weather Prediction (NWP) variables were used as predictors, and models were evaluated on a 24 to 36 hour ahead test horizon.

nan


Article 642

Title@2025-06-30 (1): Pixel super-resolved virtual staining of label-free tissue using diffusion models

Title: Pixel super-resolved virtual staining of label-free tissue using diffusion models Pixel super gelöste virtuelle Färbung von etikettfreiem Gewebe mit Diffusionsmodellen 使用扩散模型对无标签组织使用无标签组织进行像素超解解超像素的虚拟涂污 2410.20073v2

Authors (6): Yijie Zhang, Luzhe Huang, Nir Pillar, Yuzhu Li, Hanlong Chen, Aydogan Ozcan

Virtual staining of tissue offers a powerful tool for transforming label-free microscopy images of unstained tissue into equivalents of histochemically stained samples. This study presents a diffusion model-based super-resolution virtual staining approach utilizing a Brownian bridge process to enhance both the spatial resolution and fidelity of label-free virtual tissue staining, addressing the limitations of traditional deep learning-based methods. Our approach integrates novel sampling techniques into a diffusion model-based image inference process to significantly reduce the variance in the generated virtually stained images, resulting in more stable and accurate outputs. Blindly applied to lower-resolution auto-fluorescence images of label-free human lung tissue samples, the diffusion-based super-resolution virtual staining model consistently outperformed conventional approaches in resolution, structural similarity and perceptual accuracy, successfully achieving a super-resolution factor of 4-5x, increasing the output space-bandwidth product by 16-25-fold compared to the input label-free microscopy images. Diffusion-based super-resolved virtual tissue staining not only improves resolution and image quality but also enhances the reliability of virtual staining without traditional chemical staining, offering significant potential for clinical diagnostics.

nan


Article 643

Title@2025-06-30 (1): Minimax and Bayes Optimal Best-arm Identification: Adaptive Experimental Design for Treatment Choice

Title: Minimax and Bayes Optimal Best-arm Identification: Adaptive Experimental Design for Treatment Choice Minimax und Buchten Optimale Best-Arm-Identifikation: Adaptives Experimentaldesign für die Behandlungswahl Minimax和Bayes 最佳最佳武器识别:治疗选择的适应性实验设计 2506.24007v1

Authors (1): Masahiro Kato

This study investigates adaptive experimental design for treatment choice, also known as fixed-budget best-arm identification. We consider an adaptive procedure consisting of a treatment-allocation phase followed by a treatment-choice phase, and we design an adaptive experiment for this setup to efficiently identify the best treatment arm, defined as the one with the highest expected outcome. In our designed experiment, the treatment-allocation phase consists of two stages. The first stage is a pilot phase, where we allocate each treatment arm uniformly with equal proportions to eliminate clearly suboptimal arms and estimate outcome variances. In the second stage, we allocate treatment arms in proportion to the variances estimated in the first stage. After the treatment-allocation phase, the procedure enters the treatment-choice phase, where we choose the treatment arm with the highest sample mean as our estimate of the best treatment arm. We prove that this single design is simultaneously asymptotically minimax and Bayes optimal for the simple regret, with upper bounds that match our lower bounds up to exact constants. Therefore, our designed experiment achieves the sharp efficiency limits without requiring separate tuning for minimax and Bayesian objectives.

nan


Article 644

Title@2025-06-30 (1): Provably Efficient and Agile Randomized Q-Learning

Title: Provably Efficient and Agile Randomized Q-Learning Wahrscheinlich effizient und agil Randomized Q-Learning 高效率和简易随机化的学习 Q-学习 2506.24005v1

Authors (3): He Wang, Xingyu Xu, Yuejie Chi

While Bayesian-based exploration often demonstrates superior empirical performance compared to bonus-based methods in model-based reinforcement learning (RL), its theoretical understanding remains limited for model-free settings. Existing provable algorithms either suffer from computational intractability or rely on stage-wise policy updates which reduce responsiveness and slow down the learning process. In this paper, we propose a novel variant of Q-learning algorithm, refereed to as RandomizedQ, which integrates sampling-based exploration with agile, step-wise, policy updates, for episodic tabular RL. We establish an $\widetilde{O}(\sqrt{H^5SAT})$ regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the episode length, and $T$ is the total number of episodes. In addition, we present a logarithmic regret bound under a mild positive sub-optimality condition on the optimal Q-function. Empirically, RandomizedQ exhibits outstanding performance compared to existing Q-learning variants with both bonus-based and Bayesian-based exploration on standard benchmarks.

nan


Article 645

Title@2025-06-30 (1): The Illusion of Progress? A Critical Look at Test-Time Adaptation for Vision-Language Models

Title: The Illusion of Progress? A Critical Look at Test-Time Adaptation for Vision-Language Models Die Illusion des Fortschritts? Ein kritischer Blick auf Test-Zeit-Anpassung für Vision-Sprache-Modelle 《进步的幻影? 对视觉语言模型的试验时间适应的批判性审视》 2506.24000v1

Authors (5): Lijun Sheng, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

Test-time adaptation (TTA) methods have gained significant attention for enhancing the performance of vision-language models (VLMs) such as CLIP during inference, without requiring additional labeled data. However, current TTA researches generally suffer from major limitations such as duplication of baseline results, limited evaluation metrics, inconsistent experimental settings, and insufficient analysis. These problems hinder fair comparisons between TTA methods and obscure their practical strengths and weaknesses. To address these challenges, we introduce TTA-VLM, a comprehensive benchmark for evaluating TTA methods on VLMs. Our benchmark implements 8 episodic TTA and 7 online TTA methods within a unified and reproducible framework, and evaluates them across 15 widely used datasets. Unlike prior studies focused solely on CLIP, we extend the evaluation to SigLIP–a model trained with a Sigmoid loss–and include training-time tuning methods such as CoOp, MaPLe, and TeCoA to assess generality. Beyond classification accuracy, TTA-VLM incorporates various evaluation metrics, including robustness, calibration, out-of-distribution detection, and stability, enabling a more holistic assessment of TTA methods. Through extensive experiments, we find that 1) existing TTA methods produce limited gains compared to the previous pioneering work; 2) current TTA methods exhibit poor collaboration with training-time fine-tuning methods; 3) accuracy gains frequently come at the cost of reduced model trustworthiness. We release TTA-VLM to provide fair comparison and comprehensive evaluation of TTA methods for VLMs, and we hope it encourages the community to develop more reliable and generalizable TTA strategies.

nan


Article 646

Title@2025-06-30 (1): Z-REx: Human-Interpretable GNN Explanations for Real Estate Recommendations

Title: Z-REx: Human-Interpretable GNN Explanations for Real Estate Recommendations Z-REx: Human-Interpretable GNN Erklärungen für Immobilien-Empfehlungen Z-REx:对不动产建议的人-可解释的GNN(GNN)解释 2503.18001v2

Authors (3): Kunal Mukherjee, Zachary Harrison, Saeid Balaneshin

Transparency and interpretability are crucial for enhancing customer confidence and user engagement, especially when dealing with black-box Machine Learning (ML)-based recommendation systems. Modern recommendation systems leverage Graph Neural Network (GNN) due to their ability to produce high-quality recommendations in terms of both relevance and diversity. Therefore, the explainability of GNN is especially important for Link Prediction (LP) tasks since recommending relevant items can be viewed as predicting links between users and items. GNN explainability has been a well-studied field, but existing methods primarily focus on node or graph-level tasks, leaving a gap in LP explanation techniques. This work introduces Z-REx, a GNN explanation framework designed explicitly for heterogeneous link prediction tasks. Z-REx utilizes structural and attribute perturbation to identify critical substructures and important features while reducing the search space by leveraging domain-specific knowledge. In our experimentation, we show the efficacy of Z-REx in generating contextually relevant and human-interpretable explanations for ZiGNN, a GNN-based recommendation engine, using a real-world real-estate dataset from Zillow Group, Inc. We compare against State-of-The-Art (SOTA) GNN explainers to show Z-REx outperforms them by 61% in the Fidelity metric by producing superior human-interpretable explanations.

nan


Article 647

Title@2025-06-30 (1): TTRL: Test-Time Reinforcement Learning

Title: TTRL: Test-Time Reinforcement Learning TTRL: Test-Zeit-Verstärkungs-Lernen TTRL: 试验时间强化学习 2504.16084v3

Authors (16): Yuxin Zuo, Kaiyan Zhang, Li Sheng, Shang Qu, Ganqu Cui, Xuekai Zhu, Haozhan Li, Yuchen Zhang, Xinwei Long, Ermo Hua, Biqing Qi, Youbang Sun, Zhiyuan Ma, Lifan Yuan, Ning Ding, Bowen Zhou

This paper investigates Reinforcement Learning (RL) on data without explicit labels for reasoning tasks in Large Language Models (LLMs). The core challenge of the problem is reward estimation during inference while not having access to ground-truth information. While this setting appears elusive, we find that common practices in Test-Time Scaling (TTS), such as majority voting, yield surprisingly effective rewards suitable for driving RL training. In this work, we introduce Test-Time Reinforcement Learning (TTRL), a novel method for training LLMs using RL on unlabeled data. TTRL enables self-evolution of LLMs by utilizing the priors in the pre-trained models. Our experiments demonstrate that TTRL consistently improves performance across a variety of tasks and models. Notably, TTRL boosts the pass@1 performance of Qwen-2.5-Math-7B by approximately 211% on the AIME 2024 with only unlabeled test data. Furthermore, although TTRL is only supervised by the maj@n metric, TTRL has demonstrated performance to consistently surpass the upper limit of the initial model maj@n, and approach the performance of models trained directly on test data with ground-truth labels. Our experimental findings validate the general effectiveness of TTRL across various tasks and highlight TTRL’s potential for broader tasks and domains. GitHub: https://github.com/PRIME-RL/TTRL

nan


Article 648

Title@2025-06-30 (1): DGSAM: Domain Generalization via Individual Sharpness-Aware Minimization

Title: DGSAM: Domain Generalization via Individual Sharpness-Aware Minimization DGSAM: Domänenverallgemeinerung durch individuelle Schärfen-Bewusst-Minimierung DGSAM:通过个人锐化-最小化 2503.23430v2

Authors (5): Youngjun Song, Youngsik Hwang, Jonghun Lee, Heechang Lee, Dong-Young Lim

Domain generalization (DG) aims to learn models that perform well on unseen target domains by training on multiple source domains. Sharpness-Aware Minimization (SAM), known for finding flat minima that improve generalization, has therefore been widely adopted in DG. However, our analysis reveals that SAM in DG may converge to \textit{fake flat minima}, where the total loss surface appears flat in terms of global sharpness but remains sharp with respect to individual source domains. To understand this phenomenon more precisely, we formalize the average worst-case domain risk as the maximum loss under domain distribution shifts within a bounded divergence, and derive a generalization bound that reveals the limitations of global sharpness-aware minimization. In contrast, we show that individual sharpness provides a valid upper bound on this risk, making it a more suitable proxy for robust domain generalization. Motivated by these insights, we shift the DG paradigm toward minimizing individual sharpness across source domains. We propose \textit{Decreased-overhead Gradual SAM (DGSAM)}, which applies gradual domain-wise perturbations in a computationally efficient manner to consistently reduce individual sharpness. Extensive experiments demonstrate that DGSAM not only improves average accuracy but also reduces performance variance across domains, while incurring less computational overhead than SAM.

nan


Article 649

Title@2025-06-30 (1): The Jacobian and Hessian of the Kullback-Leibler Divergence between Multivariate Gaussian Distributions (Technical Report)

Title: The Jacobian and Hessian of the Kullback-Leibler Divergence between Multivariate Gaussian Distributions (Technical Report) Jacobian und Hessian der Kullback-Leiibler Divergenz zwischen multivariaten Gaussian Verteilungen (Technischer Bericht) 《多种变数高斯发行(技术报告)之间的 库尔贝克-利比尔差异(技术报告)》的Jacobian和Hessian 2506.23996v1

Authors (1): Juan Maroñas

This document shows how to obtain the Jacobian and Hessian matrices of the Kullback-Leibler divergence between two multivariate Gaussian distributions, using the first and second-order differentials. The presented derivations are based on the theory presented by \cite{magnus99}. I’ve also got great inspiration from some of the derivations in \cite{minka}. Since I pretend to be at most didactic, the document is split into a summary of results and detailed derivations on each of the elements involved, with specific references to the tricks used in the derivations, and to many of the underlying concepts.

nan


Article 650

Title@2025-06-30 (1): ROCK: A variational formulation for occupation kernel methods in Reproducing Kernel Hilbert Spaces

Title: ROCK: A variational formulation for occupation kernel methods in Reproducing Kernel Hilbert Spaces ROCK: Eine Variationsformulierung für Okkupationskern-Methoden in der Reproduzierung von Kernel Hilbert Spaces ROCK: 复制凯尔内尔·希尔伯特航天公司职业内核方法的变式配方 2503.13791v2

Authors (6): Victor Rielly, Kamel Lahouel, Chau Nguyen, Anthony Kolshorn, Nicholas Fisher, Bruno Jedynak

We present a Representer Theorem result for a large class of weak formulation problems. We provide examples of applications of our formulation both in traditional machine learning and numerical methods as well as in new and emerging techniques. Finally we apply our formulation to generalize the multivariate occupation kernel (MOCK) method for learning dynamical systems from data proposing the more general Riesz Occupation Kernel (ROCK) method. Our generalized methods are both more computationally efficient and performant on most of the benchmarks we test against.

nan


Article 651

Title@2025-06-30 (1): Machine Understanding of Scientific Language

Title: Machine Understanding of Scientific Language Maschinelles Verständnis der wissenschaftlichen Sprache 科学语言机器理解 2506.23990v1

Authors (1): Dustin Wright

Scientific information expresses human understanding of nature. This knowledge is largely disseminated in different forms of text, including scientific papers, news articles, and discourse among people on social media. While important for accelerating our pursuit of knowledge, not all scientific text is faithful to the underlying science. As the volume of this text has burgeoned online in recent years, it has become a problem of societal importance to be able to identify the faithfulness of a given piece of scientific text automatically. This thesis is concerned with the cultivation of datasets, methods, and tools for machine understanding of scientific language, in order to analyze and understand science communication at scale. To arrive at this, I present several contributions in three areas of natural language processing and machine learning: automatic fact checking, learning with limited data, and scientific text processing. These contributions include new methods and resources for identifying check-worthy claims, adversarial claim generation, multi-source domain adaptation, learning from crowd-sourced labels, cite-worthiness detection, zero-shot scientific fact checking, detecting exaggerated scientific claims, and modeling degrees of information change in science communication. Critically, I demonstrate how the research outputs of this thesis are useful for effectively learning from limited amounts of scientific text in order to identify misinformative scientific statements and generate new insights into the science communication process

nan


Article 652

Title@2025-06-30 (1): Table Foundation Models: on knowledge pre-training for tabular learning

Title: Table Foundation Models: on knowledge pre-training for tabular learning Tabelle Grundlagenmodelle: Wissensvorausbildung für das tabellarische Lernen 表 基础基础表 模式:关于以表格形式学习的知识预培训模式 2505.14415v2

Authors (5): Myung Jun Kim, Félix Lefebvre, Gaëtan Brison, Alexandre Perez-Lebel, Gaël Varoquaux

Table foundation models bring high hopes to data science: pre-trained on tabular data to embark knowledge or priors, they should facilitate downstream tasks on tables. One specific challenge is that of data semantics: numerical entries take their meaning from context, e.g., column name. Pre-trained neural networks that jointly model column names and table entries have recently boosted prediction accuracy. While these models outline the promises of world knowledge to interpret table values, they lack the convenience of popular foundation models in text or vision. Indeed, they must be fine-tuned to bring benefits, come with sizeable computation costs, and cannot easily be reused or combined with other architectures. Here we introduce TARTE, a foundation model that transforms tables to knowledge-enhanced vector representations using the string to capture semantics. Pre-trained on large relational data, TARTE yields representations that facilitate subsequent learning with little additional cost. These representations can be fine-tuned or combined with other learners, giving models that push the state-of-the-art prediction performance and improve the prediction/computation performance trade-off. Specialized to a task or a domain, TARTE gives domain-specific representations that facilitate further learning. Our study demonstrates an effective approach to knowledge pre-training for tabular learning.

nan


Article 653

Title@2025-06-30 (1): LLM Agents Are the Antidote to Walled Gardens

Title: LLM Agents Are the Antidote to Walled Gardens LLM-Agenten sind das Gegenmittel zu ummauerten Gärten LLM 药剂是被围墙隔绝的花园的抗药剂 2506.23978v1

Authors (2): Samuele Marro, Philip Torr

While the Internet’s core infrastructure was designed to be open and universal, today’s application layer is dominated by closed, proprietary platforms. Open and interoperable APIs require significant investment, and market leaders have little incentive to enable data exchange that could erode their user lock-in. We argue that LLM-based agents fundamentally disrupt this status quo. Agents can automatically translate between data formats and interact with interfaces designed for humans: this makes interoperability dramatically cheaper and effectively unavoidable. We name this shift universal interoperability: the ability for any two digital services to exchange data seamlessly using AI-mediated adapters. Universal interoperability undermines monopolistic behaviours and promotes data portability. However, it can also lead to new security risks and technical debt. Our position is that the ML community should embrace this development while building the appropriate frameworks to mitigate the downsides. By acting now, we can harness AI to restore user freedom and competitive markets without sacrificing security.

nan


Article 654

Title@2025-06-30 (1): A Scalable Approach for Safe and Robust Learning via Lipschitz-Constrained Networks

Title: A Scalable Approach for Safe and Robust Learning via Lipschitz-Constrained Networks Ein skalierbarer Ansatz für sicheres und robustes Lernen über Lipschitz-Kontrainierte Netzwerke 通过Lipschitz-Consered 网络推广安全和有力的学习方法 2506.23977v1

Authors (3): Zain ul Abdeen, Vassilis Kekatos, Ming Jin

Certified robustness is a critical property for deploying neural networks (NN) in safety-critical applications. A principle approach to achieving such guarantees is to constrain the global Lipschitz constant of the network. However, accurate methods for Lipschitz-constrained training often suffer from non-convex formulations and poor scalability due to reliance on global semidefinite programs (SDPs). In this letter, we propose a convex training framework that enforces global Lipschitz constraints via semidefinite relaxation. By reparameterizing the NN using loop transformation, we derive a convex admissibility condition that enables tractable and certifiable training. While the resulting formulation guarantees robustness, its scalability is limited by the size of global SDP. To overcome this, we develop a randomized subspace linear matrix inequalities (RS-LMI) approach that decomposes the global constraints into sketched layerwise constraints projected onto low-dimensional subspaces, yielding a smooth and memory-efficient training objective. Empirical results on MNIST, CIFAR-10, and ImageNet demonstrate that the proposed framework achieves competitive accuracy with significantly improved Lipschitz bounds and runtime performance.

nan


Article 655

Title@2025-06-30 (1): SP$^2$OT: Semantic-Regularized Progressive Partial Optimal Transport for Imbalanced Clustering

Title: SP$^2$OT: Semantic-Regularized Progressive Partial Optimal Transport for Imbalanced Clustering SP$^2$OT: Semantisch-regularisierter Progressiver Partieller Optimaler Transport für unausgewogenes Clustering SP$2$OT: 用于平衡集束的语义-分解渐进式最佳部分运输 2404.03446v2

Authors (3): Chuyu Zhang, Hui Ren, Xuming He

Deep clustering, which learns representation and semantic clustering without labels information, poses a great challenge for deep learning-based approaches. Despite significant progress in recent years, most existing methods focus on uniformly distributed datasets, significantly limiting the practical applicability of their methods. In this paper, we propose a more practical problem setting named deep imbalanced clustering, where the underlying classes exhibit an imbalance distribution. To address this challenge, we introduce a novel optimal transport-based pseudo-label learning framework. Our framework formulates pseudo-label generation as a Semantic-regularized Progressive Partial Optimal Transport (SP$^2$OT) problem, which progressively transports each sample to imbalanced clusters under prior and semantic relation constraints, thus generating high-quality and imbalance-aware pseudo-labels. To solve the SP$^2$OT problem, we propose a projected mirror descent algorithm, which alternates between: (1) computing the gradient of the SP$^2$OT objective, and (2) performing gradient descent with projection via an entropy-regularized progressive partial optimal transport formulation. Furthermore, we formulate the second step as an unbalanced optimal transport problem with augmented constraints and develop an efficient solution based on fast matrix scaling algorithms. Experiments on various datasets, including a human-curated long-tailed CIFAR100, challenging ImageNet-R, and large-scale subsets of fine-grained iNaturalist2018 datasets, demonstrate the superiority of our method. Code is available: https://github.com/rhfeiyang/SPPOT

nan


Article 656

Title@2025-06-30 (1): UMA: A Family of Universal Models for Atoms

Title: UMA: A Family of Universal Models for Atoms UMA: Eine Familie universeller Modelle für Atome UMA: “ 原子世界模型大家庭 “ 2506.23971v1

Authors (18): Brandon M. Wood, Misko Dzamba, Xiang Fu, Meng Gao, Muhammed Shuaibi, Luis Barroso-Luque, Kareem Abdelmaqsoud, Vahe Gharakhanyan, John R. Kitchin, Daniel S. Levine, Kyle Michel, Anuroop Sriram, Taco Cohen, Abhishek Das, Ammar Rizvi, Sushree Jagriti Sahoo, Zachary W. Ulissi, C. Lawrence Zitnick

The ability to quickly and accurately compute properties from atomic simulations is critical for advancing a large number of applications in chemistry and materials science including drug discovery, energy storage, and semiconductor manufacturing. To address this need, Meta FAIR presents a family of Universal Models for Atoms (UMA), designed to push the frontier of speed, accuracy, and generalization. UMA models are trained on half a billion unique 3D atomic structures (the largest training runs to date) by compiling data across multiple chemical domains, e.g. molecules, materials, and catalysts. We develop empirical scaling laws to help understand how to increase model capacity alongside dataset size to achieve the best accuracy. The UMA small and medium models utilize a novel architectural design we refer to as mixture of linear experts that enables increasing model capacity without sacrificing speed. For example, UMA-medium has 1.4B parameters but only ~50M active parameters per atomic structure. We evaluate UMA models on a diverse set of applications across multiple domains and find that, remarkably, a single model without any fine-tuning can perform similarly or better than specialized models. We are releasing the UMA code, weights, and associated data to accelerate computational workflows and enable the community to continue to build increasingly capable AI models.

nan


Article 657

Title@2025-06-30 (1): Learning Constraints Directly from Network Data

Title: Learning Constraints Directly from Network Data Lernbeschränkungen direkt aus Netzwerkdaten 直接来自网络数据的学习限制 2506.23964v1

Authors (3): Hongyu Hè, Minhao Jin, Maria Apostolaki

Network data conforms to a wide range of rules that arise from protocols, design principles, and deployment decisions (e.g., a packet’s queuing delay must be less than its end-to-end delay). Formalizing such rules as logic constraints can (i) improve the quality of synthetic data, (ii) reduce the brittleness of machine learning (ML) models, and (iii) improve semantic understanding of network measurements. However, these benefits remain out of reach if rule extraction is manual or solely reliant on ML, as both approaches yield incomplete, unreliable, and/or inaccurate rules. This paper formulates rule extraction as a constraint modeling problem and introduces NetNomos that learns propositional logic constraints directly from raw network measurements. Constraint modeling in this domain is uniquely challenging due to the scale of the data, the inherent learning complexity and passive environment, and the lack of ground truth supervision. NetNomos addresses these challenges via a lattice-based search structured by constraint specificity and succinctness. Our approach reduces learning complexity from superquadratic to logarithmic and enables efficient traversal in combinatorial search space. Our evaluations on diverse network datasets show that NetNomos learns all benchmark rules, including those associated with as little as 0.01% of data points, in under three hours. In contrast, baseline methods discover less than 25% of the rules and require several days to run. Through three case studies, we show that: NetNomos (i) finds rule violations in the outputs of all seven synthetic traffic generators, hence can be used to assess and guide their generation process; (ii) detects semantic differences in traffic, hence can be used for anomaly detection; and (iii) automatically finds rules used for telemetry imputation, hence can support monitoring through inference.

nan


Article 658

Title@2025-06-30 (1): ADReFT: Adaptive Decision Repair for Safe Autonomous Driving via Reinforcement Fine-Tuning

Title: ADReFT: Adaptive Decision Repair for Safe Autonomous Driving via Reinforcement Fine-Tuning ADReFT: Adaptive Entscheidungsreparatur für sicheres autonomes Fahren durch Verstärkung Feintuning ADREFT: 安全自主驾驶的适应性决定修补 2506.23960v1

Authors (5): Mingfei Cheng, Xiaofei Xie, Renzhi Wang, Yuan Zhou, Ming Hu

Autonomous Driving Systems (ADSs) continue to face safety-critical risks due to the inherent limitations in their design and performance capabilities. Online repair plays a crucial role in mitigating such limitations, ensuring the runtime safety and reliability of ADSs. Existing online repair solutions enforce ADS compliance by transforming unacceptable trajectories into acceptable ones based on predefined specifications, such as rule-based constraints or training datasets. However, these approaches often lack generalizability, adaptability and tend to be overly conservative, resulting in ineffective repairs that not only fail to mitigate safety risks sufficiently but also degrade the overall driving experience. To address this issue, we propose Adaptive Decision Repair (ADReFT), a novel and effective repair method that identifies safety-critical states through offline learning from failed tests and generates appropriate mitigation actions to improve ADS safety. Specifically, ADReFT incorporates a transformer-based model with two joint heads, State Monitor and Decision Adapter, designed to capture complex driving environment interactions to evaluate state safety severity and generate adaptive repair actions. Given the absence of oracles for state safety identification, we first pretrain ADReFT using supervised learning with coarse annotations, i.e., labeling states preceding violations as positive samples and others as negative samples. It establishes ADReFT’s foundational capability to mitigate safety-critical violations, though it may result in somewhat conservative mitigation strategies. Therefore, we subsequently finetune ADReFT using reinforcement learning to improve its initial capability and generate more precise and contextually appropriate repair decisions. Our evaluation results illustrate that ADReFT achieves better repair performance.

nan


Article 659

Title@2025-06-30 (1): Intelligent Orchestration of Distributed Large Foundation Model Inference at the Edge

Title: Intelligent Orchestration of Distributed Large Foundation Model Inference at the Edge Intelligente Orchestrierung der verteilten Large Foundation Model Inferenz am Rande 分散在边缘的大基金会模型推断 2504.03668v2

Authors (3): Fernando Koch, Aladin Djuhera, Alecio Binotto

Large Foundation Models (LFMs), including multi-modal and generative models, promise to unlock new capabilities for next-generation Edge AI applications. However, performing inference with LFMs in resource-constrained and heterogeneous edge environments, such as Multi-access Edge Computing (MEC), presents significant challenges for workload orchestration due to time-varying network, compute, and storage conditions. In particular, current split inference strategies, which partition LFM layers across nodes, are not designed to adapt to fluctuating workloads, dynamic bandwidth conditions, or evolving privacy constraints in high-utilization MEC environments. In this work, we propose a novel adaptive split inference orchestration framework that elevates both the placement and partitioning of LFM layers to runtime-tunable variables. Specifically, our framework enables real-time, quality-of-service (QoS)-aware management of inference workloads by extending conventional orchestrators with three key services: (1) Capacity-aware workload distribution, which continuously profiles node resources and selects an optimal subset of MEC nodes; (2) Dynamic partition migration, which transparently relocates pre-cut LFM segments in response to changes in utilization or network conditions; (3) Real-time reconfiguration, which dynamically re-splits LFM layers to balance latency, throughput, and privacy. We formalize the joint placement-partitioning problem, outline a reference architecture and algorithmic workflow, and discuss applicability in representative smart city, V2X, and industrial edge scenarios.

nan


Article 660

Title@2025-06-30 (1): Bridging the Gap with Retrieval-Augmented Generation: Making Prosthetic Device User Manuals Available in Marginalised Languages

Title: Bridging the Gap with Retrieval-Augmented Generation: Making Prosthetic Device User Manuals Available in Marginalised Languages Bridging the Gap with Retrieval-Augmented Generation: Prothetische Geräte-Handbücher in marginalisierten Sprachen verfügbar machen 缩小与回收-启动一代人的差距:在边际语言中提供假肢设备用户手册 2506.23958v1

Authors (6): Ikechukwu Ogbonna, Lesley Davidson, Soumya Banerjee, Abhishek Dasgupta, Laurence Kenney, Vikranth Harthikote Nagaraja

Millions of people in African countries face barriers to accessing healthcare due to language and literacy gaps. This research tackles this challenge by transforming complex medical documents – in this case, prosthetic device user manuals – into accessible formats for underserved populations. This case study in cross-cultural translation is particularly pertinent/relevant for communities that receive donated prosthetic devices but may not receive the accompanying user documentation. Or, if available online, may only be available in formats (e.g., language and readability) that are inaccessible to local populations (e.g., English-language, high resource settings/cultural context). The approach is demonstrated using the widely spoken Pidgin dialect, but our open-source framework has been designed to enable rapid and easy extension to other languages/dialects. This work presents an AI-powered framework designed to process and translate complex medical documents, e.g., user manuals for prosthetic devices, into marginalised languages. The system enables users – such as healthcare workers or patients – to upload English-language medical equipment manuals, pose questions in their native language, and receive accurate, localised answers in real time. Technically, the system integrates a Retrieval-Augmented Generation (RAG) pipeline for processing and semantic understanding of the uploaded manuals. It then employs advanced Natural Language Processing (NLP) models for generative question-answering and multilingual translation. Beyond simple translation, it ensures accessibility to device instructions, treatment protocols, and safety information, empowering patients and clinicians to make informed healthcare decisions.

nan


Article 661

Title@2025-06-30 (1): Scaling and renormalization in high-dimensional regression

Title: Scaling and renormalization in high-dimensional regression Skalierung und Renormalisierung bei hochdimensionaler Regression 高维回归中的缩放和重新整顿 2405.00592v4

Authors (3): Alexander Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan

From benign overfitting in overparameterized models to rich power-law scalings in performance, simple ridge regression displays surprising behaviors sometimes thought to be limited to deep neural networks. This balance of phenomenological richness with analytical tractability makes ridge regression the model system of choice in high-dimensional machine learning. In this paper, we present a unifying perspective on recent results on ridge regression using the basic tools of random matrix theory and free probability, aimed at readers with backgrounds in physics and deep learning. We highlight the fact that statistical fluctuations in empirical covariance matrices can be absorbed into a renormalization of the ridge parameter. This `deterministic equivalence’ allows us to obtain analytic formulas for the training and generalization errors in a few lines of algebra by leveraging the properties of the $S$-transform of free probability. From these precise asymptotics, we can easily identify sources of power-law scaling in model performance. In all models, the $S$-transform corresponds to the train-test generalization gap, and yields an analogue of the generalized-cross-validation estimator. Using these techniques, we derive fine-grained bias-variance decompositions for a very general class of random feature models with structured covariates. This allows us to discover a scaling regime for random feature models where the variance due to the features limits performance in the overparameterized setting. We also demonstrate how anisotropic weight structure in random feature models can limit performance and lead to nontrivial exponents for finite-width corrections in the overparameterized setting. Our results extend and provide a unifying perspective on earlier models of neural scaling laws.

nan


Article 662

Title@2025-06-30 (1): QPART: Adaptive Model Quantization and Dynamic Workload Balancing for Accuracy-aware Edge Inference

Title: QPART: Adaptive Model Quantization and Dynamic Workload Balancing for Accuracy-aware Edge Inference QPART: Adaptive Modell-Quantisierung und dynamische Workload-Balancing für akkurat-bewusste Edge-Inferenz QPART: 适应性模型量化和动态工作量平衡,以利准确度认知边缘推断 2506.23934v1

Authors (6): Xiangchen Li, Saeid Ghafouri, Bo Ji, Hans Vandierendonck, Deepu John, Dimitrios S. Nikolopoulos

As machine learning inferences increasingly move to edge devices, adapting to diverse computational capabilities, hardware, and memory constraints becomes more critical. Instead of relying on a pre-trained model fixed for all future inference queries across diverse edge devices, we argue that planning an inference pattern with a request-specific model tailored to the device’s computational capacity, accuracy requirements, and time constraints is more cost-efficient and robust to diverse scenarios. To this end, we propose an accuracy-aware and workload-balanced inference system that integrates joint model quantization and inference partitioning. In this approach, the server dynamically responds to inference queries by sending a quantized model and adaptively sharing the inference workload with the device. Meanwhile, the device’s computational power, channel capacity, and accuracy requirements are considered when deciding. Furthermore, we introduce a new optimization framework for the inference system, incorporating joint model quantization and partitioning. Our approach optimizes layer-wise quantization bit width and partition points to minimize time consumption and cost while accounting for varying accuracy requirements of tasks through an accuracy degradation metric in our optimization model. To our knowledge, this work represents the first exploration of optimizing quantization layer-wise bit-width in the inference serving system, by introducing theoretical measurement of accuracy degradation. Simulation results demonstrate a substantial reduction in overall time and power consumption, with computation payloads decreasing by over 80% and accuracy degradation kept below 1%.

nan


Article 663

Title@2025-06-30 (1): Industrial brain: a human-like autonomous neuro-symbolic cognitive decision-making system

Title: Industrial brain: a human-like autonomous neuro-symbolic cognitive decision-making system Industrielles Gehirn: ein menschenähnliches autonomes neuro-symbolisches kognitives Entscheidungssystem 工业大脑:像人一样的自主性神经 – – 精神 – – 精神 – – 左侧神经 – – 认知决策系统 2506.23926v1

Authors (4): Junping Wang, Bicheng Wang, Yibo Xuea, Yuan Xie

Resilience non-equilibrium measurement, the ability to maintain fundamental functionality amidst failures and errors, is crucial for scientific management and engineering applications of industrial chain. The problem is particularly challenging when the number or types of multiple co-evolution of resilience (for example, randomly placed) are extremely chaos. Existing end-to-end deep learning ordinarily do not generalize well to unseen full-feld reconstruction of spatiotemporal co-evolution structure, and predict resilience of network topology, especially in multiple chaos data regimes typically seen in real-world applications. To address this challenge, here we propose industrial brain, a human-like autonomous cognitive decision-making and planning framework integrating higher-order activity-driven neuro network and CT-OODA symbolic reasoning to autonomous plan resilience directly from observational data of global variable. The industrial brain not only understands and model structure of node activity dynamics and network co-evolution topology without simplifying assumptions, and reveal the underlying laws hidden behind complex networks, but also enabling accurate resilience prediction, inference, and planning. Experimental results show that industrial brain significantly outperforms resilience prediction and planning methods, with an accurate improvement of up to 10.8\% over GoT and OlaGPT framework and 11.03\% over spectral dimension reduction. It also generalizes to unseen topologies and dynamics and maintains robust performance despite observational disturbances. Our findings suggest that industrial brain addresses an important gap in resilience prediction and planning for industrial chain.

nan


Article 664

Title@2025-06-30 (1): Reinforcement Learning for Synchronised Flow Control in a Dual-Gate Resin Infusion System

Title: Reinforcement Learning for Synchronised Flow Control in a Dual-Gate Resin Infusion System Verstärktes Lernen zur Synchronisierung der Durchflussregelung in einem Dual-Gate-Resin-Infusionssystem 在双气体Resin Incent Incent System 中进行强化学习,以进行同步流动控制 2506.23923v1

Authors (6): Miguel Camacho-Sánchez, Fernando García-Torres, Jesper John Lisegaard, Rocío del Amor, Sankhya Mohanty, Valery Naranjo

Resin infusion (RI) and resin transfer moulding (RTM) are critical processes for the manufacturing of high-performance fibre-reinforced polymer composites, particularly for large-scale applications such as wind turbine blades. Controlling the resin flow dynamics in these processes is critical to ensure the uniform impregnation of the fibre reinforcements, thereby preventing residual porosities and dry spots that impact the consequent structural integrity of the final component. This paper presents a reinforcement learning (RL) based strategy, established using process simulations, for synchronising the different resin flow fronts in an infusion scenario involving two resin inlets and a single outlet. Using Proximal Policy Optimisation (PPO), our approach addresses the challenge of managing the fluid dynamics in a partially observable environment. The results demonstrate the effectiveness of the RL approach in achieving an accurate flow convergence, highlighting its potential towards improving process control and product quality in composites manufacturing.

nan


Article 665

Title@2025-06-30 (1): The Trilemma of Truth in Large Language Models

Title: The Trilemma of Truth in Large Language Models Das Trilemma der Wahrheit in großen Sprachmodellen 大语言模型中的真理三边 2506.23921v1

Authors (2): Germans Savcisens, Tina Eliassi-Rad

We often attribute human characteristics to large language models (LLMs) and claim that they “know” certain things. LLMs have an internal probabilistic knowledge that represents information retained during training. How can we assess the veracity of this knowledge? We examine two common methods for probing the veracity of LLMs and discover several assumptions that are flawed. To address these flawed assumptions, we introduce sAwMIL (short for Sparse Aware Multiple-Instance Learning), a probing method that utilizes the internal activations of LLMs to separate statements into true, false, and neither. sAwMIL is based on multiple-instance learning and conformal prediction. We evaluate sAwMIL on 5 validity criteria across 16 open-source LLMs, including both default and chat-based variants, as well as on 3 new datasets. Among the insights we provide are: (1) the veracity signal is often concentrated in the third quarter of an LLM’s depth; (2) truth and falsehood signals are not always symmetric; (3) linear probes perform better on chat models than on default models; (4) nonlinear probes may be required to capture veracity signals for some LLMs with reinforcement learning from human feedback or knowledge distillation; and (5) LLMs capture a third type of signal that is distinct from true and false and is neither true nor false. These findings provide a reliable method for verifying what LLMs “know” and how certain they are of their probabilistic internal knowledge.

nan


Article 666

Title@2025-06-30 (1): Intrinsic Dimensionality of Fermi-Pasta-Ulam-Tsingou High-Dimensional Trajectories Through Manifold Learning: A Linear Approach

Title: Intrinsic Dimensionality of Fermi-Pasta-Ulam-Tsingou High-Dimensional Trajectories Through Manifold Learning: A Linear Approach Intrinsische Dimensionalität von Fermi-Pasta-Ulam-Tsingou Hochdimensionale Trajektorien durch manifoldes Lernen: Ein linearer Ansatz Fermi-Pasta-Ulam-Tsingou 高多元轨迹的内在维度 2411.02058v2

Authors (1): Gionni Marchetti

A data-driven approach based on unsupervised machine learning is proposed to infer the intrinsic dimension $m^{\ast}$ of the high-dimensional trajectories of the Fermi-Pasta-Ulam-Tsingou (FPUT) model. Principal component analysis (PCA) is applied to trajectory data consisting of $n_s = 4,000,000$ datapoints, of the FPUT $\beta$ model with $N = 32$ coupled oscillators, revealing a critical relationship between $m^{\ast}$ and the model’s nonlinear strength. By estimating the intrinsic dimension $m^{\ast}$ using multiple methods (participation ratio, Kaiser rule, and the Kneedle algorithm), it is found that $m^{\ast}$ increases with the model nonlinearity. Interestingly, in the weakly nonlinear regime, for trajectories initialized by exciting the first mode, the participation ratio estimates $m^{\ast} = 2, 3$, strongly suggesting that quasi-periodic motion on a low-dimensional Riemannian manifold underlies the characteristic energy recurrences observed in the FPUT model.

nan


Article 667

Title@2025-06-30 (1): Learning robust parameter inference and density reconstruction in flyer plate impact experiments

Title: Learning robust parameter inference and density reconstruction in flyer plate impact experiments Erlernen von robusten Parameterinferenzen und Dichterekonstruktionen in Flyer-Platteneinschlagsexperimenten 飞行板块撞击实验中学习强力参数推力和密度重建 2506.23914v1

Authors (5): Evan Bell, Daniel A. Serino, Ben S. Southworth, Trevor Wilcox, Marc L. Klasky

Estimating physical parameters or material properties from experimental observations is a common objective in many areas of physics and material science. In many experiments, especially in shock physics, radiography is the primary means of observing the system of interest. However, radiography does not provide direct access to key state variables, such as density, which prevents the application of traditional parameter estimation approaches. Here we focus on flyer plate impact experiments on porous materials, and resolving the underlying parameterized equation of state (EoS) and crush porosity model parameters given radiographic observation(s). We use machine learning as a tool to demonstrate with high confidence that using only high impact velocity data does not provide sufficient information to accurately infer both EoS and crush model parameters, even with fully resolved density fields or a dynamic sequence of images. We thus propose an observable data set consisting of low and high impact velocity experiments/simulations that capture different regimes of compaction and shock propagation, and proceed to introduce a generative machine learning approach which produces a posterior distribution of physical parameters directly from radiographs. We demonstrate the effectiveness of the approach in estimating parameters from simulated flyer plate impact experiments, and show that the obtained estimates of EoS and crush model parameters can then be used in hydrodynamic simulations to obtain accurate and physically admissible density reconstructions. Finally, we examine the robustness of the approach to model mismatches, and find that the learned approach can provide useful parameter estimates in the presence of out-of-distribution radiographic noise and previously unseen physics, thereby promoting a potential breakthrough in estimating material properties from experimental radiographic images.

nan


Article 668

Title@2025-06-30 (1): RawMal-TF: Raw Malware Dataset Labeled by Type and Family

Title: RawMal-TF: Raw Malware Dataset Labeled by Type and Family RawMal-TF: Raw Malware Datensatz nach Typ und Familie gekennzeichnet RawMal- TF: Raw Maware 数据集按类型和家庭标签 2506.23909v1

Authors (3): David Bálik, Martin Jureček, Mark Stamp

This work addresses the challenge of malware classification using machine learning by developing a novel dataset labeled at both the malware type and family levels. Raw binaries were collected from sources such as VirusShare, VX Underground, and MalwareBazaar, and subsequently labeled with family information parsed from binary names and type-level labels integrated from ClarAVy. The dataset includes 14 malware types and 17 malware families, and was processed using a unified feature extraction pipeline based on static analysis, particularly extracting features from Portable Executable headers, to support advanced classification tasks. The evaluation was focused on three key classification tasks. In the binary classification of malware versus benign samples, Random Forest and XGBoost achieved high accuracy on the full datasets, reaching 98.5% for type-based detection and 98.98% for family-based detection. When using truncated datasets of 1,000 samples to assess performance under limited data conditions, both models still performed strongly, achieving 97.6% for type-based detection and 98.66% for family-based detection. For interclass classification, which distinguishes between malware types or families, the models reached up to 97.5% accuracy on type-level tasks and up to 93.7% on family-level tasks. In the multiclass classification setting, which assigns samples to the correct type or family, SVM achieved 81.1% accuracy on type labels, while Random Forest and XGBoost reached approximately 73.4% on family labels. The results highlight practical trade-offs between accuracy and computational cost, and demonstrate that labeling at both the type and family levels enables more fine-grained and insightful malware classification. The work establishes a robust foundation for future research on advanced malware detection and classification.

nan


Article 669

Title@2025-06-30 (1): Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence

Title: Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence Beyond Statistical Learning: Exaktes Lernen ist für die allgemeine Intelligenz unerlässlich 超越统计学习:精确学习对于一般情报至关重要 2506.23908v1

Authors (4): András György, Tor Lattimore, Nevena Lazić, Csaba Szepesvári

Sound deductive reasoning – the ability to derive new knowledge from existing facts and rules – is an indisputably desirable aspect of general intelligence. Despite the major advances of AI systems in areas such as math and science, especially since the introduction of transformer architectures, it is well-documented that even the most advanced frontier systems regularly and consistently falter on easily-solvable deductive reasoning tasks. Hence, these systems are unfit to fulfill the dream of achieving artificial general intelligence capable of sound deductive reasoning. We argue that their unsound behavior is a consequence of the statistical learning approach powering their development. To overcome this, we contend that to achieve reliable deductive reasoning in learning-based AI systems, researchers must fundamentally shift from optimizing for statistical performance against distributions on reasoning problems and algorithmic tasks to embracing the more ambitious exact learning paradigm, which demands correctness on all inputs. We argue that exact learning is both essential and possible, and that this ambitious objective should guide algorithm design.

nan


Article 670

Title@2025-06-30 (1): Simultaneous Multi-Robot Motion Planning with Projected Diffusion Models

Title: Simultaneous Multi-Robot Motion Planning with Projected Diffusion Models Gleichzeitige Multi-Roboter-Bewegungsplanung mit projizierten Diffusionsmodellen 与预测的传播模型同步的多机器人运动规划 2502.03607v2

Authors (4): Jinhao Liang, Jacob K Christopher, Sven Koenig, Ferdinando Fioretto

Recent advances in diffusion models hold significant potential in robotics, enabling the generation of diverse and smooth trajectories directly from raw representations of the environment. Despite this promise, applying diffusion models to motion planning remains challenging due to their difficulty in enforcing critical constraints, such as collision avoidance and kinematic feasibility. These limitations become even more pronounced in Multi-Robot Motion Planning (MRMP), where multiple robots must coordinate in shared spaces. To address these challenges, this work proposes Simultaneous MRMP Diffusion (SMD), a novel approach integrating constrained optimization into the diffusion sampling process to produce collision-free, kinematically feasible trajectories. Additionally, the paper introduces a comprehensive MRMP benchmark to evaluate trajectory planning algorithms across scenarios with varying robot densities, obstacle complexities, and motion constraints. Experimental results show SMD consistently outperforms classical and other learning-based motion planners, achieving higher success rates and efficiency in complex multi-robot environments.

nan


Article 671

Title@2025-06-30 (1): Green AI in Action: Strategic Model Selection for Ensembles in Production

Title: Green AI in Action: Strategic Model Selection for Ensembles in Production Grüne KI in Aktion: Strategische Modellauswahl für Ensembles in der Produktion 绿色AI “ 行动 “ :生产集合战略示范选择 2405.17451v2

Authors (4): Nienke Nijkamp, June Sallou, Niels van der Heijden, Luís Cruz

Integrating Artificial Intelligence (AI) into software systems has significantly enhanced their capabilities while escalating energy demands. Ensemble learning, combining predictions from multiple models to form a single prediction, intensifies this problem due to cumulative energy consumption. This paper presents a novel approach to model selection that addresses the challenge of balancing the accuracy of AI models with their energy consumption in a live AI ensemble system. We explore how reducing the number of models or improving the efficiency of model usage within an ensemble during inference can reduce energy demands without substantially sacrificing accuracy. This study introduces and evaluates two model selection strategies, Static and Dynamic, for optimizing ensemble learning systems performance while minimizing energy usage. Our results demonstrate that the Static strategy improves the F1 score beyond the baseline, reducing average energy usage from 100% from the full ensemble to 62%. The Dynamic strategy further enhances F1 scores, using on average 76% compared to 100% of the full ensemble. Moreover, we propose an approach that balances accuracy with resource consumption, significantly reducing energy usage without substantially impacting accuracy. This method decreased the average energy usage of the Static strategy from approximately 62% to 14%, and for the Dynamic strategy, from around 76% to 57%. Our field study of Green AI using an operational AI system developed by a large professional services provider shows the practical applicability of adopting energy-conscious model selection strategies in live production environments.

nan


Article 672

Title@2025-06-30 (1): CBAGAN-RRT: Convolutional Block Attention Generative Adversarial Network for Sampling-Based Path Planning

Title: CBAGAN-RRT: Convolutional Block Attention Generative Adversarial Network for Sampling-Based Path Planning CBAGAN-RRT: Convolutional Block Attention Generatives Adversarial Network für die stichprobengestützte Pfadplanung CBAGAN-RRT: 以抽样为基础的路径规划革命性阻力引引引反向网络 2305.10442v2

Authors (2): Abhinav Sagar, Sai Teja Gilukara

Sampling-based path planning algorithms play an important role in autonomous robotics. However, a common problem among the RRT-based algorithms is that the initial path generated is not optimal, and the convergence is too slow for real-world applications. In this paper, we propose a novel image-based learning algorithm using a Convolutional Block Attention Generative Adversarial Network (CBAGAN-RRT) with a combination of spatial and channel attention and a novel loss function to design the heuristics, find a better optimal path, and improve the convergence of the algorithm, both concerning time and speed. The probability distribution of the paths generated from our GAN model is used to guide the sampling process for the RRT algorithm. We demonstrate that our algorithm outperforms the previous state-of-the-art algorithms using both the image quality generation metrics, like IOU Score, Dice Score, FID score, and path planning metrics like time cost and the number of nodes. Ablation studies show the effectiveness of various components in our network architecture. The advantage of our approach is that we can avoid the complicated preprocessing in the state space, our model can be generalized to complex environments like those containing turns and narrow passages without loss of accuracy, and our model can be easily integrated with other sampling-based path planning algorithms.

nan


Article 673

Title@2025-06-30 (1): Towards transparent and data-driven fault detection in manufacturing: A case study on univariate, discrete time series

Title: Towards transparent and data-driven fault detection in manufacturing: A case study on univariate, discrete time series Auf dem Weg zur transparenten und datengesteuerten Fehlererkennung in der Fertigung: Eine Fallstudie zu univariaten, diskreten Zeitreihen 争取在制造业中发现透明和数据驱动的故障:关于单体、离散时间序列的案例研究 2507.00102v1

Authors (4): Bernd Hofmann, Patrick Bruendl, Huong Giang Nguyen, Joerg Franke

Ensuring consistent product quality in modern manufacturing is crucial, particularly in safety-critical applications. Conventional quality control approaches, reliant on manually defined thresholds and features, lack adaptability to the complexity and variability inherent in production data and necessitate extensive domain expertise. Conversely, data-driven methods, such as machine learning, demonstrate high detection performance but typically function as black-box models, thereby limiting their acceptance in industrial environments where interpretability is paramount. This paper introduces a methodology for industrial fault detection, which is both data-driven and transparent. The approach integrates a supervised machine learning model for multi-class fault classification, Shapley Additive Explanations for post-hoc interpretability, and a do-main-specific visualisation technique that maps model explanations to operator-interpretable features. Furthermore, the study proposes an evaluation methodology that assesses model explanations through quantitative perturbation analysis and evaluates visualisations by qualitative expert assessment. The approach was applied to the crimping process, a safety-critical joining technique, using a dataset of univariate, discrete time series. The system achieves a fault detection accuracy of 95.9 %, and both quantitative selectivity analysis and qualitative expert evaluations confirmed the relevance and inter-pretability of the generated explanations. This human-centric approach is designed to enhance trust and interpretability in data-driven fault detection, thereby contributing to applied system design in industrial quality control.

nan


Article 674

Title@2025-06-30 (1): Spurious-Aware Prototype Refinement for Reliable Out-of-Distribution Detection

Title: Spurious-Aware Prototype Refinement for Reliable Out-of-Distribution Detection Spurious-Aware Prototypen-Verfeinerung für zuverlässige Out-of-Distribution-Detection 可靠分销外探测的精密原始类型精炼 2506.23881v1

Authors (6): Reihaneh Zohrabi, Hosein Hasani, Mahdieh Soleymani Baghshah, Anna Rohrbach, Marcus Rohrbach, Mohammad Hossein Rohban

Out-of-distribution (OOD) detection is crucial for ensuring the reliability and safety of machine learning models in real-world applications, where they frequently face data distributions unseen during training. Despite progress, existing methods are often vulnerable to spurious correlations that mislead models and compromise robustness. To address this, we propose SPROD, a novel prototype-based OOD detection approach that explicitly addresses the challenge posed by unknown spurious correlations. Our post-hoc method refines class prototypes to mitigate bias from spurious features without additional data or hyperparameter tuning, and is broadly applicable across diverse backbones and OOD detection settings. We conduct a comprehensive spurious correlation OOD detection benchmarking, comparing our method against existing approaches and demonstrating its superior performance across challenging OOD datasets, such as CelebA, Waterbirds, UrbanCars, Spurious Imagenet, and the newly introduced Animals MetaCoCo. On average, SPROD improves AUROC by 4.7% and FPR@95 by 9.3% over the second best.

nan


Article 675

Title@2025-06-30 (1): DFReg: A Physics-Inspired Framework for Global Weight Distribution Regularization in Neural Networks

Title: DFReg: A Physics-Inspired Framework for Global Weight Distribution Regularization in Neural Networks DFReg: Ein physik-inspirierter Rahmen für die weltweite Regularisierung der Gewichtsverteilung in neuralen Netzwerken DFReg:神经网络全球重量分配规范化的物理启发框架 2507.00101v1

Authors (1): Giovanni Ruggieri

We introduce DFReg, a physics-inspired regularization method for deep neural networks that operates on the global distribution of weights. Drawing from Density Functional Theory (DFT), DFReg applies a functional penalty to encourage smooth, diverse, and well-distributed weight configurations. Unlike traditional techniques such as Dropout or L2 decay, DFReg imposes global structural regularity without architectural changes or stochastic perturbations.

nan


Article 676

Title@2025-06-30 (1): Benchmarking Spiking Neural Network Learning Methods with Varying Locality

Title: Benchmarking Spiking Neural Network Learning Methods with Varying Locality Benchmarking-Spiking Neural Network Lernmethoden mit unterschiedlicher Lokalität 参照不同地点的神经网络学习方法 2402.01782v2

Authors (4): Jiaqi Lin, Sen Lu, Malyaban Bal, Abhronil Sengupta

Spiking Neural Networks (SNNs), providing more realistic neuronal dynamics, have been shown to achieve performance comparable to Artificial Neural Networks (ANNs) in several machine learning tasks. Information is processed as spikes within SNNs in an event-based mechanism that significantly reduces energy consumption. However, training SNNs is challenging due to the non-differentiable nature of the spiking mechanism. Traditional approaches, such as Backpropagation Through Time (BPTT), have shown effectiveness but come with additional computational and memory costs and are biologically implausible. In contrast, recent works propose alternative learning methods with varying degrees of locality, demonstrating success in classification tasks. In this work, we show that these methods share similarities during the training process, while they present a trade-off between biological plausibility and performance. Further, given the implicitly recurrent nature of SNNs, this research investigates the influence of the addition of explicit recurrence to SNNs. We experimentally prove that the addition of explicit recurrent weights enhances the robustness of SNNs. We also investigate the performance of local learning methods under gradient and non-gradient-based adversarial attacks.

nan


Article 677

Title@2025-06-30 (1): Chain of Thought in Order: Discovering Learning-Friendly Orders for Arithmetic

Title: Chain of Thought in Order: Discovering Learning-Friendly Orders for Arithmetic Kette des Denkens in Ordnung: Lernen-Freundliche Aufträge für Arithmetik entdecken 思维链有序:发现对自学的学习友好命令 2506.23875v1

Authors (3): Yuta Sato, Kazuhiko Kawamoto, Hiroshi Kera

The chain of thought is fundamental in Transformers, which is to perform step-by-step reasoning. Besides what intermediate steps work, the order of these steps critically affects the difficulty of the reasoning. This study addresses a novel task of unraveling chain of thought - reordering decoder input tokens to a learning-friendly sequence for Transformers to learn arithmetic tasks. The proposed pipeline first trains a Transformer on a mixture of target sequences arranged in different orders and then identifies benign orders as those with fast loss drops in the early stage. As the search space grows factorially with sequence length, we propose a two-stage hierarchical approach for inter- and intra-block reordering. Experiments on four order-sensitive arithmetic tasks show that our method identifies a learning-friendly order out of a few billion candidates. Notably, on the multiplication task, it recovered the reverse-digit order reported in prior studies.

nan


Article 678

Title@2025-06-30 (1): WeatherEdit: Controllable Weather Editing with 4D Gaussian Field

Title: WeatherEdit: Controllable Weather Editing with 4D Gaussian Field WeatherEdit: Kontrollierbare Wetterbearbeitung mit 4D Gaussian Field 气象编辑: 4D Gaussian 字段的可控天气编辑 2505.20471v2

Authors (4): Chenghao Qian, Wenjing Li, Yuhu Guo, Gustav Markkula

In this work, we present WeatherEdit, a novel weather editing pipeline for generating realistic weather effects with controllable types and severity in 3D scenes. Our approach is structured into two key components: weather background editing and weather particle construction. For weather background editing, we introduce an all-in-one adapter that integrates multiple weather styles into a single pretrained diffusion model, enabling the generation of diverse weather effects in 2D image backgrounds. During inference, we design a Temporal-View (TV-) attention mechanism that follows a specific order to aggregate temporal and spatial information, ensuring consistent editing across multi-frame and multi-view images. To construct the weather particles, we first reconstruct a 3D scene using the edited images and then introduce a dynamic 4D Gaussian field to generate snowflakes, raindrops and fog in the scene. The attributes and dynamics of these particles are precisely controlled through physical-based modelling and simulation, ensuring realistic weather representation and flexible severity adjustments. Finally, we integrate the 4D Gaussian field with the 3D scene to render consistent and highly realistic weather effects. Experiments on multiple driving datasets demonstrate that WeatherEdit can generate diverse weather effects with controllable condition severity, highlighting its potential for autonomous driving simulation in adverse weather. See project page: https://jumponthemoon.github.io/w-edit

nan


Article 679

Title@2025-06-30 (1): Emergent musical properties of a transformer under contrastive self-supervised learning

Title: Emergent musical properties of a transformer under contrastive self-supervised learning Emergente musikalische Eigenschaften eines Transformators unter kontrastivem selbstüberwachtem Lernen 在对比式自我监督的学习下,变压器的新兴音乐特性 2506.23873v1

Authors (5): Yuexuan Kong, Gabriel Meseguer-Brocal, Vincent Lostanlen, Mathieu Lagrange, Romain Hennequin

In music information retrieval (MIR), contrastive self-supervised learning for general-purpose representation models is effective for global tasks such as automatic tagging. However, for local tasks such as chord estimation, it is widely assumed that contrastively trained general-purpose self-supervised models are inadequate and that more sophisticated SSL is necessary; e.g., masked modeling. Our paper challenges this assumption by revealing the potential of contrastive SSL paired with a transformer in local MIR tasks. We consider a lightweight vision transformer with one-dimensional patches in the time–frequency domain (ViT-1D) and train it with simple contrastive SSL through normalized temperature-scaled cross-entropy loss (NT-Xent). Although NT-Xent operates only over the class token, we observe that, potentially thanks to weight sharing, informative musical properties emerge in ViT-1D’s sequence tokens. On global tasks, the temporal average of class and sequence tokens offers a performance increase compared to the class token alone, showing useful properties in the sequence tokens. On local tasks, sequence tokens perform unexpectedly well, despite not being specifically trained for. Furthermore, high-level musical features such as onsets emerge from layer-wise attention maps and self-similarity matrices show different layers capture different musical dimensions. Our paper does not focus on improving performance but advances the musical interpretation of transformers and sheds light on some overlooked abilities of contrastive SSL paired with transformers for sequence modeling in MIR.

nan


Article 680

Title@2025-06-30 (1): When Plants Respond: Electrophysiology and Machine Learning for Green Monitoring Systems

Title: When Plants Respond: Electrophysiology and Machine Learning for Green Monitoring Systems Wenn Pflanzen reagieren: Elektrophysiologie und maschinelles Lernen für grüne Überwachungssysteme 植物响应:电生理和机械学习促进绿色监测系统 2506.23872v1

Authors (3): Eduard Buss, Till Aust, Heiko Hamann

Living plants, while contributing to ecological balance and climate regulation, also function as natural sensors capable of transmitting information about their internal physiological states and surrounding conditions. This rich source of data provides potential for applications in environmental monitoring and precision agriculture. With integration into biohybrid systems, we establish novel channels of physiological signal flow between living plants and artificial devices. We equipped Hedera helix with a plant-wearable device called PhytoNode to continuously record the plant’s electrophysiological activity. We deployed plants in an uncontrolled outdoor environment to map electrophysiological patterns to environmental conditions. Over five months, we collected data that we analyzed using state-of-the-art and automated machine learning (AutoML). Our classification models achieve high performance, reaching macro F1 scores of up to 95 percent in binary tasks. AutoML approaches outperformed manual tuning, and selecting subsets of statistical features further improved accuracy. Our biohybrid living system monitors the electrophysiology of plants in harsh, real-world conditions. This work advances scalable, self-sustaining, and plant-integrated living biohybrid systems for sustainable environmental monitoring.

nan


Article 681

Title@2025-06-30 (1): Scaling Self-Supervised Representation Learning for Symbolic Piano Performance

Title: Scaling Self-Supervised Representation Learning for Symbolic Piano Performance Skalierendes Selbstüberwachtes Repräsentationslernen für symbolische Klavierperformance 象征钢琴表演自我监督代表学习 2506.23869v1

Authors (5): Louis Bradshaw, Honglu Fan, Alexander Spangher, Stella Biderman, Simon Colton

We study the capabilities of generative autoregressive transformer models trained on large amounts of symbolic solo-piano transcriptions. After first pretraining on approximately 60,000 hours of music, we use a comparatively smaller, high-quality subset, to finetune models to produce musical continuations, perform symbolic classification tasks, and produce general-purpose contrastive MIDI embeddings by adapting the SimCLR framework to symbolic music. When evaluating piano continuation coherence, our generative model outperforms leading symbolic generation techniques and remains competitive with proprietary audio generation models. On MIR classification benchmarks, frozen representations from our contrastive model achieve state-of-the-art results in linear probe experiments, while direct finetuning demonstrates the generalizability of pretrained representations, often requiring only a few hundred labeled examples to specialize to downstream tasks.

nan


Article 682

Title@2025-06-30 (1): Differentially Private Synthetic Data Release for Topics API Outputs

Title: Differentially Private Synthetic Data Release for Topics API Outputs Differential private Synthetische Datenfreigabe für Themen API-Ausgaben 不同专题APIP产出的私人不同合成数据发布 2506.23855v1

Authors (8): Travis Dick, Alessandro Epasto, Adel Javanmard, Josh Karlin, Andres Munoz Medina, Vahab Mirrokni, Sergei Vassilvitskii, Peilin Zhong

The analysis of the privacy properties of Privacy-Preserving Ads APIs is an area of research that has received strong interest from academics, industry, and regulators. Despite this interest, the empirical study of these methods is hindered by the lack of publicly available data. Reliable empirical analysis of the privacy properties of an API, in fact, requires access to a dataset consisting of realistic API outputs; however, privacy concerns prevent the general release of such data to the public. In this work, we develop a novel methodology to construct synthetic API outputs that are simultaneously realistic enough to enable accurate study and provide strong privacy protections. We focus on one Privacy-Preserving Ads APIs: the Topics API, part of Google Chrome’s Privacy Sandbox. We developed a methodology to generate a differentially-private dataset that closely matches the re-identification risk properties of the real Topics API data. The use of differential privacy provides strong theoretical bounds on the leakage of private user information from this release. Our methodology is based on first computing a large number of differentially-private statistics describing how output API traces evolve over time. Then, we design a parameterized distribution over sequences of API traces and optimize its parameters so that they closely match the statistics obtained. Finally, we create the synthetic data by drawing from this distribution. Our work is complemented by an open-source release of the anonymized dataset obtained by this methodology. We hope this will enable external researchers to analyze the API in-depth and replicate prior and future work on a realistic large-scale dataset. We believe that this work will contribute to fostering transparency regarding the privacy properties of Privacy-Preserving Ads APIs.

nan


Article 683

Title@2025-06-30 (1): Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts

Title: Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts Verwenden Sie Sparse Autoencoder, um unbekannte Konzepte zu entdecken, nicht um auf bekannte Konzepte zu handeln 使用粗略自动编码器发现未知概念, 而不是对已知概念采取行动 2506.23845v1

Authors (5): Kenny Peng, Rajiv Movva, Jon Kleinberg, Emma Pierson, Nikhil Garg

While sparse autoencoders (SAEs) have generated significant excitement, a series of negative results have added to skepticism about their usefulness. Here, we establish a conceptual distinction that reconciles competing narratives surrounding SAEs. We argue that while SAEs may be less effective for acting on known concepts, SAEs are powerful tools for discovering unknown concepts. This distinction cleanly separates existing negative and positive results, and suggests several classes of SAE applications. Specifically, we outline use cases for SAEs in (i) ML interpretability, explainability, fairness, auditing, and safety, and (ii) social and health sciences.

nan


Article 684

Title@2025-06-30 (1): Pressing Intensity: An Intuitive Measure for Pressing in Soccer

Title: Pressing Intensity: An Intuitive Measure for Pressing in Soccer Pressing Intensity: Eine intuitive Maßnahme für das Drücken im Fußball 压力强度:足球压力的直觉措施 2501.04712v2

Authors (1): Joris Bekkers

Pressing is a fundamental defensive strategy in football, characterized by applying pressure on the ball owning team to regain possession. Despite its significance, existing metrics for measuring pressing often lack precision or comprehensive consideration of positional data, player movement and speed. This research introduces an innovative framework for quantifying pressing intensity, leveraging advancements in positional tracking data and components from Spearman’s Pitch Control model. Our method integrates player velocities, movement directions, and reaction times to compute the time required for a defender to intercept an attacker or the ball. This time-to-intercept measure is then transformed into probabilistic values using a logistic function, enabling dynamic and intuitive analysis of pressing situations at the individual frame level. the model captures how every player’s movement influences pressure on the field, offering actionable insights for coaches, analysts, and decision-makers. By providing a robust and intepretable metric, our approach facilitates the identification of pressing strategies, advanced situational analyses, and the derivation of metrics, advancing the analytical capabilities for modern football.

nan


Article 685

Title@2025-06-30 (1): EFPI: Elastic Formation and Position Identification in Football (Soccer) using Template Matching and Linear Assignment

Title: EFPI: Elastic Formation and Position Identification in Football (Soccer) using Template Matching and Linear Assignment EFPI: Elastische Formation und Positionsidentifikation im Fußball (Soccer) unter Verwendung von Template Matching und Linear Assignment EFPI: 使用模板匹配和线性指派的足球(足球)弹性形成和定位识别 2506.23843v1

Authors (1): Joris Bekkers

Understanding team formations and player positioning is crucial for tactical analysis in football (soccer). This paper presents a flexible method for formation recognition and player position assignment in football using predefined static formation templates and cost minimization from spatiotemporal tracking data, called EFPI. Our approach employs linear sum assignment to optimally match players to positions within a set of template formations by minimizing the total distance between actual player locations and template positions, subsequently selecting the formation with the lowest assignment cost. To improve accuracy, we scale actual player positions to match the dimensions of these formation templates in both width and length. While the method functions effectively on individual frames, it extends naturally to larger game segments such as complete periods, possession sequences or specific intervals (e.g. 10 second intervals, 5 minute intervals etc.). Additionally, we incorporate an optional stability parameter that prevents unnecessary formation changes when assignment costs differ only marginally between time segments. EFPI is available as open-source code through the unravelsports Python package.

nan


Article 686

Title@2025-06-30 (1): Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction

Title: Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction Nachweis der begrenzten Skalierbarkeit der zentralisierten verteilten Optimierung durch eine neue untere Bound-Konstruktion 证明通过新建下下界建筑的集中分配最佳优化的有限可扩展性 2506.23836v1

Authors (1): Alexander Tyurin

We consider centralized distributed optimization in the classical federated learning setup, where $n$ workers jointly find an $\varepsilon$-stationary point of an $L$-smooth, $d$-dimensional nonconvex function $f$, having access only to unbiased stochastic gradients with variance $\sigma^2$. Each worker requires at most $h$ seconds to compute a stochastic gradient, and the communication times from the server to the workers and from the workers to the server are $\tau_{s}$ and $\tau_{w}$ seconds per coordinate, respectively. One of the main motivations for distributed optimization is to achieve scalability with respect to $n$. For instance, it is well known that the distributed version of SGD has a variance-dependent runtime term $\frac{h \sigma^2 L \Delta}{n \varepsilon^2},$ which improves with the number of workers $n,$ where $\Delta = f(x^0) - f^*,$ and $x^0 \in R^d$ is the starting point. Similarly, using unbiased sparsification compressors, it is possible to reduce both the variance-dependent runtime term and the communication runtime term. However, once we account for the communication from the server to the workers $\tau_{s}$, we prove that it becomes infeasible to design a method using unbiased random sparsification compressors that scales both the server-side communication runtime term $\tau_{s} d \frac{L \Delta}{\varepsilon}$ and the variance-dependent runtime term $\frac{h \sigma^2 L \Delta}{\varepsilon^2},$ better than poly-logarithmically in $n$, even in the homogeneous (i.i.d.) case, where all workers access the same distribution. To establish this result, we construct a new “worst-case” function and develop a new lower bound framework that reduces the analysis to the concentration of a random sum, for which we prove a concentration bound. These results reveal fundamental limitations in scaling distributed optimization, even under the homogeneous assumption.

nan


Article 687

Title@2025-06-30 (1): Robustness of Decentralised Learning to Nodes and Data Disruption

Title: Robustness of Decentralised Learning to Nodes and Data Disruption Robustheit des dezentralisierten Lernens zu Knoten und Datenstörung 分散学习至节点和数据干扰的强力 2405.02377v2

Authors (6): Luigi Palmieri, Chiara Boldrini, Lorenzo Valerio, Andrea Passarella, Marco Conti, János Kertész

In the vibrant landscape of AI research, decentralised learning is gaining momentum. Decentralised learning allows individual nodes to keep data locally where they are generated and to share knowledge extracted from local data among themselves through an interactive process of collaborative refinement. This paradigm supports scenarios where data cannot leave local nodes due to privacy or sovereignty reasons or real-time constraints imposing proximity of models to locations where inference has to be carried out. The distributed nature of decentralised learning implies significant new research challenges with respect to centralised learning. Among them, in this paper, we focus on robustness issues. Specifically, we study the effect of nodes’ disruption on the collective learning process. Assuming a given percentage of “central” nodes disappear from the network, we focus on different cases, characterised by (i) different distributions of data across nodes and (ii) different times when disruption occurs with respect to the start of the collaborative learning task. Through these configurations, we are able to show the non-trivial interplay between the properties of the network connecting nodes, the persistence of knowledge acquired collectively before disruption or lack thereof, and the effect of data availability pre- and post-disruption. Our results show that decentralised learning processes are remarkably robust to network disruption. As long as even minimum amounts of data remain available somewhere in the network, the learning process is able to recover from disruptions and achieve significant classification accuracy. This clearly varies depending on the remaining connectivity after disruption, but we show that even nodes that remain completely isolated can retain significant knowledge acquired before the disruption.

nan


Article 688

Title@2025-06-30 (1): Hybrid Quantum Neural Networks with Amplitude Encoding: Advancing Recovery Rate Predictions

Title: Hybrid Quantum Neural Networks with Amplitude Encoding: Advancing Recovery Rate Predictions Hybride Quantum-Neural-Netzwerke mit Amplitudenkodierung: Steigerung der Erholungsrate Vorhersagen 具有大度编码的硬化量子神经网络:提高恢复率预测 2501.15828v5

Authors (5): Ying Chen, Paul Griffin, Paolo Recchia, Lei Zhou, Hongrui Zhang

Recovery rate prediction plays a pivotal role in bond investment strategies by enhancing risk assessment, optimizing portfolio allocation, improving pricing accuracy, and supporting effective credit risk management. However, accurate forecasting remains challenging due to complex nonlinear dependencies, high-dimensional feature spaces, and limited sample sizes-conditions under which classical machine learning models are prone to overfitting. We propose a hybrid Quantum Machine Learning (QML) model with Amplitude Encoding, leveraging the unitarity constraint of Parametrized Quantum Circuits (PQC) and the exponential data compression capability of qubits. We evaluate the model on a global recovery rate dataset comprising 1,725 observations and 256 features from 1996 to 2023. Our hybrid method significantly outperforms both classical neural networks and QML models using Angle Encoding, achieving a lower Root Mean Squared Error (RMSE) of 0.228, compared to 0.246 and 0.242, respectively. It also performs competitively with ensemble tree methods such as XGBoost. While practical implementation challenges remain for Noisy Intermediate-Scale Quantum (NISQ) hardware, our quantum simulation and preliminary results on noisy simulators demonstrate the promise of hybrid quantum-classical architectures in enhancing the accuracy and robustness of recovery rate forecasting. These findings illustrate the potential of quantum machine learning in shaping the future of credit risk prediction.

nan


Article 689

Title@2025-06-30 (1): Supercm: Revisiting Clustering for Semi-Supervised Learning

Title: Supercm: Revisiting Clustering for Semi-Supervised Learning Supercm: Clustering für semi-überwachtes Lernen revisitieren 超强cm:为半监督学习重新研究集群 2506.23824v1

Authors (4): Durgesh Singh, Ahcene Boubekki, Robert Jenssen, Michael C. Kampffmeyer

The development of semi-supervised learning (SSL) has in recent years largely focused on the development of new consistency regularization or entropy minimization approaches, often resulting in models with complex training strategies to obtain the desired results. In this work, we instead propose a novel approach that explicitly incorporates the underlying clustering assumption in SSL through extending a recently proposed differentiable clustering module. Leveraging annotated data to guide the cluster centroids results in a simple end-to-end trainable deep SSL approach. We demonstrate that the proposed model improves the performance over the supervised-only baseline and show that our framework can be used in conjunction with other SSL methods to further boost their performance.

nan


Article 690

Title@2025-06-30 (1): Early-Stage Anomaly Detection: A Study of Model Performance on Complete vs. Partial Flows

Title: Early-Stage Anomaly Detection: A Study of Model Performance on Complete vs. Partial Flows Early-Stage-Anomaly Detection: Eine Studie der Modellleistung auf komplette vs. Teilströmungen 早期短期异常检测:关于完整与部分流动的示范绩效研究 2407.02856v3

Authors (2): Adrian Pekar, Richard Jozsa

This study investigates the efficacy of machine learning models in network security threat detection through the critical lens of partial versus complete flow information, addressing a common gap between research settings and real-time operational needs. We systematically evaluate how a standard benchmark model, Random Forest, performs under varying training and testing conditions (complete/complete, partial/partial, complete/partial), quantifying the performance impact when dealing with the incomplete data typical in real-time environments. Our findings demonstrate a significant performance difference, with precision and recall dropping by up to 30% under certain conditions when models trained on complete flows are tested against partial flows. The study also reveals that, for the evaluated dataset and model, a minimum threshold around 7 packets in the test set appears necessary for maintaining reliable detection rates, providing valuable, quantified insights for developing more realistic real-time detection strategies.

nan


Article 691

Title@2025-06-30 (1): Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective

Title: Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective Weltmodelle lernen mit hierarchischen zeitlichen Abstraktionen: Eine probabilistische Perspektive 具有等级性时地抽象现象的学习世界模式:概率视角 2404.16078v3

Authors (1): Vaisakh Shaj

Machines that can replicate human intelligence with type 2 reasoning capabilities should be able to reason at multiple levels of spatio-temporal abstractions and scales using internal world models. Devising formalisms to develop such internal world models, which accurately reflect the causal hierarchies inherent in the dynamics of the real world, is a critical research challenge in the domains of artificial intelligence and machine learning. This thesis identifies several limitations with the prevalent use of state space models (SSMs) as internal world models and propose two new probabilistic formalisms namely Hidden-Parameter SSMs and Multi-Time Scale SSMs to address these drawbacks. The structure of graphical models in both formalisms facilitates scalable exact probabilistic inference using belief propagation, as well as end-to-end learning via backpropagation through time. This approach permits the development of scalable, adaptive hierarchical world models capable of representing nonstationary dynamics across multiple temporal abstractions and scales. Moreover, these probabilistic formalisms integrate the concept of uncertainty in world states, thus improving the system’s capacity to emulate the stochastic nature of the real world and quantify the confidence in its predictions. The thesis also discuss how these formalisms are in line with related neuroscience literature on Bayesian brain hypothesis and predicitive processing. Our experiments on various real and simulated robots demonstrate that our formalisms can match and in many cases exceed the performance of contemporary transformer variants in making long-range future predictions. We conclude the thesis by reflecting on the limitations of our current models and suggesting directions for future research.

nan


Article 692

Title@2025-06-30 (1): HyperMono: A Monotonicity-aware Approach to Hyper-Relational Knowledge Representation

Title: HyperMono: A Monotonicity-aware Approach to Hyper-Relational Knowledge Representation HyperMono: Ein Monotonik-bewusster Ansatz zur Hyperrelationalen Wissensdarstellung HyperMono:超关系知识代表性的单感意识方法 2404.09848v3

Authors (5): Zhiwei Hu, Víctor Gutiérrez-Basulto, Zhiliang Xiang, Ru Li, Jeff Z. Pan

In a hyper-relational knowledge graph (HKG), each fact is composed of a main triple associated with attribute-value qualifiers, which express additional factual knowledge. The hyper-relational knowledge graph completion (HKGC) task aims at inferring plausible missing links in a HKG. Most existing approaches to HKGC focus on enhancing the communication between qualifier pairs and main triples, while overlooking two important properties that emerge from the monotonicity of the hyper-relational graphs representation regime. Stage Reasoning allows for a two-step reasoning process, facilitating the integration of coarse-grained inference results derived solely from main triples and fine-grained inference results obtained from hyper-relational facts with qualifiers. In the initial stage, coarse-grained results provide an upper bound for correct predictions, which are subsequently refined in the fine-grained step. More generally, Qualifier Monotonicity implies that by attaching more qualifier pairs to a main triple, we may only narrow down the answer set, but never enlarge it. This paper proposes the HyperMono model for hyper-relational knowledge graph completion, which realizes stage reasoning and qualifier monotonicity. To implement qualifier monotonicity HyperMono resorts to cone embeddings. Experiments on three real-world datasets with three different scenario conditions demonstrate the strong performance of HyperMono when compared to the SoTA.

nan


Article 693

Title@2025-06-30 (1): SGD with Adaptive Preconditioning: Unified Analysis and Momentum Acceleration

Title: SGD with Adaptive Preconditioning: Unified Analysis and Momentum Acceleration SGD mit adaptiver Vorkonditionierung: Unified Analysis und Momentum Acceleration SGD 具有适应性预设条件:统一分析和加速动力 2506.23803v1

Authors (1): Dmitry Kovalev

In this paper, we revisit stochastic gradient descent (SGD) with AdaGrad-type preconditioning. Our contributions are twofold. First, we develop a unified convergence analysis of SGD with adaptive preconditioning under anisotropic or matrix smoothness and noise assumptions. This allows us to recover state-of-the-art convergence results for several popular adaptive gradient methods, including AdaGrad-Norm, AdaGrad, and ASGO/One-sided Shampoo. In addition, we establish the fundamental connection between two recently proposed algorithms, Scion and DASGO, and provide the first theoretical guarantees for the latter. Second, we show that the convergence of methods like AdaGrad and DASGO can be provably accelerated beyond the best-known rates using Nesterov momentum. Consequently, we obtain the first theoretical justification that AdaGrad-type algorithms can simultaneously benefit from both diagonal preconditioning and momentum, which may provide an ultimate explanation for the practical efficiency of Adam.

nan


Article 694

Title@2025-06-30 (1): Adaptive Out-of-Control Point Pattern Detection in Sequential Random Finite Set Observations

Title: Adaptive Out-of-Control Point Pattern Detection in Sequential Random Finite Set Observations Adaptive Out-of-Control-Punkt-Mustererkennung in sequentiellen Random-Finite-Set-Beobachtungen 序列随机有限性观察中的适应性外控制点探测模式 2506.23802v1

Authors (3): Konstantinos Bourazas, Savvas Papaioannou, Panayiotis Kolios

In this work we introduce a novel adaptive anomaly detection framework specifically designed for monitoring sequential random finite set (RFS) observations. Our approach effectively distinguishes between In-Control data (normal) and Out-Of-Control data (anomalies) by detecting deviations from the expected statistical behavior of the process. The primary contributions of this study include the development of an innovative RFS-based framework that not only learns the normal behavior of the data-generating process online but also dynamically adapts to behavioral shifts to accurately identify abnormal point patterns. To achieve this, we introduce a new class of RFS-based posterior distributions, named Power Discounting Posteriors (PD), which facilitate adaptation to systematic changes in data while enabling anomaly detection of point pattern data through a novel predictive posterior density function. The effectiveness of the proposed approach is demonstrated by extensive qualitative and quantitative simulation experiments.

nan


Article 695

Title@2025-06-30 (1): Gibbs randomness-compression proposition: An efficient deep learning

Title: Gibbs randomness-compression proposition: An efficient deep learning Gibbs Randomness-Kompressionssatz: Ein effizientes Deep Learning Gibbs 随机性压缩建议:高效的深层学习 2505.23869v2

Authors (1): M. Süzen

A proposition that connects randomness and compression is put forward via Gibbs entropy over set of measurement vectors associated with a compression process. The proposition states that a lossy compression process is equivalent to {\it directed randomness} that preserves information content. The proposition originated from the observed behaviour in newly proposed {\it Dual Tomographic Compression} (DTC) compress-train framework. This is akin to tomographic reconstruction of layer weight matrices via building compressed sensed projections, via so-called {\it weight rays}. This tomographic approach is applied to previous and next layers in a dual fashion, that triggers neuronal-level pruning. This novel model compress-train scheme appears in iterative fashion and acts as a smart neural architecture search, The experiments demonstrated the utility of this dual-tomography producing state-of-the-art performance with efficient compression during training, accelerating and supporting lottery ticket hypothesis. However, random compress-train iterations having similar performance demonstrated the connection between randomness and compression from statistical physics perspective, we formulated the so-called {\it Gibbs randomness-compression proposition}, signifying randomness-compression relationship via Gibbs entropy. Practically, the DTC framework provides a promising approach for massively energy- and resource-efficient deep learning training.

nan


Article 696

Title@2025-06-30 (1): NeuralOM: Neural Ocean Model for Subseasonal-to-Seasonal Simulation

Title: NeuralOM: Neural Ocean Model for Subseasonal-to-Seasonal Simulation NeuralOM: Neurales Ozeanmodell für die Simulation von Subsaisonal-zu-Seasonal 神经力OM:次季节到季节模拟神经海洋模型 2505.21020v2

Authors (9): Yuan Gao, Ruiqi Shu, Hao Wu, Fan Xu, Yanfei Xiang, Ruijian Gou, Qingsong Wen, Xian Wu, Xiaomeng Huang

Accurate Subseasonal-to-Seasonal (S2S) ocean simulation is critically important for marine research, yet remains challenging due to its substantial thermal inertia and extended time delay. Machine learning (ML)-based models have demonstrated significant advancements in simulation accuracy and computational efficiency compared to traditional numerical methods. Nevertheless, a significant limitation of current ML models for S2S ocean simulation is their inadequate incorporation of physical consistency and the slow-changing properties of the ocean system. In this work, we propose a neural ocean model (NeuralOM) for S2S ocean simulation with a multi-scale interactive graph neural network to emulate diverse physical phenomena associated with ocean systems effectively. Specifically, we propose a multi-stage framework tailored to model the ocean’s slowly changing nature. Additionally, we introduce a multi-scale interactive messaging module to capture complex dynamical behaviors, such as gradient changes and multiplicative coupling relationships inherent in ocean dynamics. Extensive experimental evaluations confirm that our proposed NeuralOM outperforms state-of-the-art models in S2S and extreme event simulation. The codes are available at https://github.com/YuanGao-YG/NeuralOM.

nan


Article 697

Title@2025-06-30 (1): Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning

Title: Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning Erlernbare Multi-Agent-Pathfinding-Lösemittel mit aktiver Feinsteuerung 具有积极微调功能的推进可学习多机构探索式解答器 2506.23793v1

Authors (4): Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, Alexey Skrynnik

Multi-agent pathfinding (MAPF) is a common abstraction of multi-robot trajectory planning problems, where multiple homogeneous robots simultaneously move in the shared environment. While solving MAPF optimally has been proven to be NP-hard, scalable, and efficient, solvers are vital for real-world applications like logistics, search-and-rescue, etc. To this end, decentralized suboptimal MAPF solvers that leverage machine learning have come on stage. Building on the success of the recently introduced MAPF-GPT, a pure imitation learning solver, we introduce MAPF-GPT-DDG. This novel approach effectively fine-tunes the pre-trained MAPF model using centralized expert data. Leveraging a novel delta-data generation mechanism, MAPF-GPT-DDG accelerates training while significantly improving performance at test time. Our experiments demonstrate that MAPF-GPT-DDG surpasses all existing learning-based MAPF solvers, including the original MAPF-GPT, regarding solution quality across many testing scenarios. Remarkably, it can work with MAPF instances involving up to 1 million agents in a single environment, setting a new milestone for scalability in MAPF domains.

nan


Article 698

Title@2025-06-30 (1): Quantum computing and artificial intelligence: status and perspectives

Title: Quantum computing and artificial intelligence: status and perspectives Quantenrechentechnik und künstliche Intelligenz: Status und Perspektiven 量子计算和人工智能:现状和前景 2505.23860v3

Authors (39): Giovanni Acampora, Andris Ambainis, Natalia Ares, Leonardo Banchi, Pallavi Bhardwaj, Daniele Binosi, G. Andrew D. Briggs, Tommaso Calarco, Vedran Dunjko, Jens Eisert, Olivier Ezratty, Paul Erker, Federico Fedele, Elies Gil-Fuster, Martin Gärttner, Mats Granath, Markus Heyl, Iordanis Kerenidis, Matthias Klusch, Anton Frisk Kockum, Richard Kueng, Mario Krenn, Jörg Lässig, Antonio Macaluso, Sabrina Maniscalco, Florian Marquardt, Kristel Michielsen, Gorka Muñoz-Gil, Daniel Müssig, Hendrik Poulsen Nautrup, Sophie A. Neubauer, Evert van Nieuwenburg, Roman Orus, Jörg Schmiedmayer, Markus Schmitt, Philipp Slusallek, Filippo Vicentini, Christof Weitenberg, Frank K. Wilhelm

This white paper discusses and explores the various points of intersection between quantum computing and artificial intelligence (AI). It describes how quantum computing could support the development of innovative AI solutions. It also examines use cases of classical AI that can empower research and development in quantum technologies, with a focus on quantum computing and quantum sensing. The purpose of this white paper is to provide a long-term research agenda aimed at addressing foundational questions about how AI and quantum computing interact and benefit one another. It concludes with a set of recommendations and challenges, including how to orchestrate the proposed theoretical work, align quantum AI developments with quantum hardware roadmaps, estimate both classical and quantum resources - especially with the goal of mitigating and optimizing energy consumption - advance this emerging hybrid software engineering discipline, and enhance European industrial competitiveness while considering societal implications.

nan


Article 699

Title@2025-06-30 (1): When GNNs Met a Word Equations Solver: Learning to Rank Equations (Extended Technical Report)

Title: When GNNs Met a Word Equations Solver: Learning to Rank Equations (Extended Technical Report) Wenn GNNs einen Wortgleichungen Solver trafen: Lernen zu Ranggleichungen (erweiterter technischer Bericht) 当 GNNs 遇到一个单等式解答器: 学习到等级( 扩展技术报告) 2506.23784v1

Authors (5): Parosh Aziz Abdulla, Mohamed Faouzi Atig, Julie Cailler, Chencheng Liang, Philipp Rümmer

Nielsen transformation is a standard approach for solving word equations: by repeatedly splitting equations and applying simplification steps, equations are rewritten until a solution is reached. When solving a conjunction of word equations in this way, the performance of the solver will depend considerably on the order in which equations are processed. In this work, the use of Graph Neural Networks (GNNs) for ranking word equations before and during the solving process is explored. For this, a novel graph-based representation for word equations is presented, preserving global information across conjuncts, enabling the GNN to have a holistic view during ranking. To handle the variable number of conjuncts, three approaches to adapt a multi-classification task to the problem of ranking equations are proposed. The training of the GNN is done with the help of minimum unsatisfiable subsets (MUSes) of word equations. The experimental results show that, compared to state-of-the-art string solvers, the new framework solves more problems in benchmarks where each variable appears at most once in each equation.

nan


Article 700

Title@2025-06-30 (1): Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking

Title: Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking Mamba-FETrack V2: Zustands-Weltraummodell für Frame-Event-basiertes Visuelles Objekt-Tracking Mamba-FETRack V2:重新研究基于框架日的视觉物体跟踪国家空间模型 2506.23783v1

Authors (8): Shiao Wang, Ju Huang, Qingchuan Ma, Jinfeng Gao, Chunyi Xu, Xiao Wang, Lan Chen, Bo Jiang

Combining traditional RGB cameras with bio-inspired event cameras for robust object tracking has garnered increasing attention in recent years. However, most existing multimodal tracking algorithms depend heavily on high-complexity Vision Transformer architectures for feature extraction and fusion across modalities. This not only leads to substantial computational overhead but also limits the effectiveness of cross-modal interactions. In this paper, we propose an efficient RGB-Event object tracking framework based on the linear-complexity Vision Mamba network, termed Mamba-FETrack V2. Specifically, we first design a lightweight Prompt Generator that utilizes embedded features from each modality, together with a shared prompt pool, to dynamically generate modality-specific learnable prompt vectors. These prompts, along with the modality-specific embedded features, are then fed into a Vision Mamba-based FEMamba backbone, which facilitates prompt-guided feature extraction, cross-modal interaction, and fusion in a unified manner. Finally, the fused representations are passed to the tracking head for accurate target localization. Extensive experimental evaluations on multiple RGB-Event tracking benchmarks, including short-term COESOT dataset and long-term datasets, i.e., FE108 and FELT V2, demonstrate the superior performance and efficiency of the proposed tracking framework. The source code and pre-trained models will be released on https://github.com/Event-AHU/Mamba_FETrack

nan


Article 701

Title@2025-06-30 (1): Calibrating Graph Neural Networks with Wavelet-Aware Temperature Scaling

Title: Calibrating Graph Neural Networks with Wavelet-Aware Temperature Scaling Kalibrierung von Graphen-Neuralnetzwerken mit Wavelet-Aware-Temperaturskalierung 使用波盘软件温度缩放校准图形神经网络 2506.23782v1

Authors (6): Xiaoyang Li, Linwei Tao, Haohui Lu, Minjing Dong, Junbin Gao, Chang Xu

Graph Neural Networks (GNNs) have demonstrated strong predictive performance on relational data; however, their confidence estimates often misalign with actual predictive correctness, posing significant limitations for deployment in safety-critical settings. While existing graph-aware calibration methods seek to mitigate this limitation, they primarily depend on coarse one-hop statistics, such as neighbor-predicted confidence, or latent node embeddings, thereby neglecting the fine-grained structural heterogeneity inherent in graph topology. In this work, we propose Wavelet-Aware Temperature Scaling (WATS), a post-hoc calibration framework that assigns node-specific temperatures based on tunable heat-kernel graph wavelet features. Specifically, WATS harnesses the scalability and topology sensitivity of graph wavelets to refine confidence estimates, all without necessitating model retraining or access to neighboring logits or predictions. Extensive evaluations across seven benchmark datasets with varying graph structures and two GNN backbones demonstrate that WATS achieves the lowest Expected Calibration Error (ECE) among all compared methods, outperforming both classical and graph-specific baselines by up to 42.3\% in ECE and reducing calibration variance by 17.24\% on average compared with graph-specific methods. Moreover, WATS remains computationally efficient, scaling well across graphs of diverse sizes and densities. Code will be released based on publication.

nan


Article 702

Title@2025-06-30 (1): Towards Automated Self-Supervised Learning for Truly Unsupervised Graph Anomaly Detection

Title: Towards Automated Self-Supervised Learning for Truly Unsupervised Graph Anomaly Detection Auf dem Weg zu einem automatisierten selbstüberwachten Lernen für wirklich unbeaufsichtigte Graphenanomalienerkennung 走向为真正无人监督的异常图像探测进行自动自动自监督学习 2501.14694v2

Authors (3): Zhong Li, Yuhang Wang, Matthijs van Leeuwen

Self-supervised learning (SSL) is an emerging paradigm that exploits supervisory signals generated from the data itself, and many recent studies have leveraged SSL to conduct graph anomaly detection. However, we empirically found that three important factors can substantially impact detection performance across datasets: 1) the specific SSL strategy employed; 2) the tuning of the strategy’s hyperparameters; and 3) the allocation of combination weights when using multiple strategies. Most SSL-based graph anomaly detection methods circumvent these issues by arbitrarily or selectively (i.e., guided by label information) choosing SSL strategies, hyperparameter settings, and combination weights. While an arbitrary choice may lead to subpar performance, using label information in an unsupervised setting is label information leakage and leads to severe overestimation of a method’s performance. Leakage has been criticized as “one of the top ten data mining mistakes”, yet many recent studies on SSL-based graph anomaly detection have been using label information to select hyperparameters. To mitigate this issue, we propose to use an internal evaluation strategy (with theoretical analysis) to select hyperparameters in SSL for unsupervised anomaly detection. We perform extensive experiments using 10 recent SSL-based graph anomaly detection algorithms on various benchmark datasets, demonstrating both the prior issues with hyperparameter selection and the effectiveness of our proposed strategy.

nan


Article 703

Title@2025-06-30 (1): Addressing the Inconsistency in Bayesian Deep Learning via Generalized Laplace Approximation

Title: Addressing the Inconsistency in Bayesian Deep Learning via Generalized Laplace Approximation Die Inkonsistenz in Bayesian Deep Learning über die allgemeine Laplace-Annäherung ansprechen 解决通过普遍拉拉近拉拉位办法解决巴耶斯深层学习不一致性问题 2405.13535v4

Authors (4): Yinsong Chen, Samson S. Yu, Zhong Li, Chee Peng Lim

In recent years, inconsistency in Bayesian deep learning has attracted significant attention. Tempered or generalized posterior distributions are frequently employed as direct and effective solutions. Nonetheless, the underlying mechanisms and the effectiveness of generalized posteriors remain active research topics. In this work, we interpret posterior tempering as a correction for model misspecification via adjustments to the joint probability, and as a recalibration of priors by reducing aleatoric uncertainty. We also identify a unique property of the Laplace approximation: the generalized normalizing constant remains invariant, in contrast to general Bayesian learning, where this constant typically depends on model parameters after generalization. Leveraging this property, we introduce the generalized Laplace approximation, which requires only a simple modification to the Hessian calculation of the regularized loss. This approach provides a flexible and scalable framework for high-quality posterior inference. We evaluate the proposed method on state-of-the-art neural networks and real-world datasets, demonstrating that the generalized Laplace approximation enhances predictive performance.

nan


Article 704

Title@2025-06-30 (1): Model-driven Stochastic Trace Clustering

Title: Model-driven Stochastic Trace Clustering Modellgetriebenes Stochastisches Trace Clustering 由模型驱动的蒸蒸气追踪集束 2506.23776v1

Authors (3): Jari Peeperkorn, Johannes De Smedt, Jochen De Weerdt

Process discovery algorithms automatically extract process models from event logs, but high variability often results in complex and hard-to-understand models. To mitigate this issue, trace clustering techniques group process executions into clusters, each represented by a simpler and more understandable process model. Model-driven trace clustering improves on this by assigning traces to clusters based on their conformity to cluster-specific process models. However, most existing clustering techniques rely on either no process model discovery, or non-stochastic models, neglecting the frequency or probability of activities and transitions, thereby limiting their capability to capture real-world execution dynamics. We propose a novel model-driven trace clustering method that optimizes stochastic process models within each cluster. Our approach uses entropic relevance, a stochastic conformance metric based on directly-follows probabilities, to guide trace assignment. This allows clustering decisions to consider both structural alignment with a cluster’s process model and the likelihood that a trace originates from a given stochastic process model. The method is computationally efficient, scales linearly with input size, and improves model interpretability by producing clusters with clearer control-flow patterns. Extensive experiments on public real-life datasets show that our method outperforms existing alternatives in representing process behavior and reveals how clustering performance rankings can shift when stochasticity is considered.

nan


Article 705

Title@2025-06-30 (1): A Consequentialist Critique of Binary Classification Evaluation Practices

Title: A Consequentialist Critique of Binary Classification Evaluation Practices Eine konsequentistische Kritik an binären Klassifikationsbewertungspraktiken 二元分类评价做法的相继性优异性评分评价做法 2504.04528v2

Authors (5): Gerardo Flores, Abigail Schiff, Alyssa H. Smith, Julia A Fukuyama, Ashia C. Wilson

ML-supported decisions, such as ordering tests or determining preventive custody, often involve binary classification based on probabilistic forecasts. Evaluation frameworks for such forecasts typically consider whether to prioritize independent-decision metrics (e.g., Accuracy) or top-K metrics (e.g., Precision@K), and whether to focus on fixed thresholds or threshold-agnostic measures like AUC-ROC. We highlight that a consequentialist perspective, long advocated by decision theorists, should naturally favor evaluations that support independent decisions using a mixture of thresholds given their prevalence, such as Brier scores and Log loss. However, our empirical analysis reveals a strong preference for top-K metrics or fixed thresholds in evaluations at major conferences like ICML, FAccT, and CHIL. To address this gap, we use this decision-theoretic framework to map evaluation metrics to their optimal use cases, along with a Python package, briertools, to promote the broader adoption of Brier scores. In doing so, we also uncover new theoretical connections, including a reconciliation between the Brier Score and Decision Curve Analysis, which clarifies and responds to a longstanding critique by (Assel, et al. 2017) regarding the clinical utility of proper scoring rules.

nan


Article 706

Title@2025-06-30 (1): Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph

Title: Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph Benchmarking Uncertainty Quantification Methods for Large Language Models mit LM-Polygraph 与LM-Porgraph 参照大语言模型的不确定性量化方法 2406.15627v4

Authors (15): Roman Vashurin, Ekaterina Fadeeva, Artem Vazhentsev, Lyudmila Rvanova, Akim Tsvigun, Daniil Vasilev, Rui Xing, Abdelrahman Boda Sadallah, Kirill Grishchenkov, Sergey Petrakov, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov, Artem Shelmanov

The rapid proliferation of large language models (LLMs) has stimulated researchers to seek effective and efficient approaches to deal with LLM hallucinations and low-quality outputs. Uncertainty quantification (UQ) is a key element of machine learning applications in dealing with such challenges. However, research to date on UQ for LLMs has been fragmented in terms of techniques and evaluation methodologies. In this work, we address this issue by introducing a novel benchmark that implements a collection of state-of-the-art UQ baselines and offers an environment for controllable and consistent evaluation of novel UQ techniques over various text generation tasks. Our benchmark also supports the assessment of confidence normalization methods in terms of their ability to provide interpretable scores. Using our benchmark, we conduct a large-scale empirical investigation of UQ and normalization techniques across eleven tasks, identifying the most effective approaches. Code: https://github.com/IINemo/lm-polygraph Benchmark: https://huggingface.co/LM-Polygraph

nan


Article 707

Title@2025-06-30 (1): Training of Spiking Neural Networks with Expectation-Propagation

Title: Training of Spiking Neural Networks with Expectation-Propagation Ausbildung von neuralen Spiking-Netzwerken mit Erwartungs-Propagation 培训有期望-促进的Spiking神经网络 2506.23757v1

Authors (3): Dan Yao, Steve McLaughlin, Yoann Altmann

In this paper, we propose a unifying message-passing framework for training spiking neural networks (SNNs) using Expectation-Propagation. Our gradient-free method is capable of learning the marginal distributions of network parameters and simultaneously marginalizes nuisance parameters, such as the outputs of hidden layers. This framework allows for the first time, training of discrete and continuous weights, for deterministic and stochastic spiking networks, using batches of training samples. Although its convergence is not ensured, the algorithm converges in practice faster than gradient-based methods, without requiring a large number of passes through the training data. The classification and regression results presented pave the way for new efficient training methods for deep Bayesian networks.

nan


Article 708

Title@2025-06-30 (1): Aligning Evaluation with Clinical Priorities: Calibration, Label Shift, and Error Costs

Title: Aligning Evaluation with Clinical Priorities: Calibration, Label Shift, and Error Costs Bewertung mit klinischen Prioritäten ausrichten: Kalibrierung, Label Shift und Fehlerkosten 使评价与临床优先事项:校准、标签转换和错误成本相一致 2506.14540v3

Authors (4): Gerardo A. Flores, Alyssa H. Smith, Julia A. Fukuyama, Ashia C. Wilson

Machine learning-based decision support systems are increasingly deployed in clinical settings, where probabilistic scoring functions are used to inform and prioritize patient management decisions. However, widely used scoring rules, such as accuracy and AUC-ROC, fail to adequately reflect key clinical priorities, including calibration, robustness to distributional shifts, and sensitivity to asymmetric error costs. In this work, we propose a principled yet practical evaluation framework for selecting calibrated thresholded classifiers that explicitly accounts for the uncertainty in class prevalences and domain-specific cost asymmetries often found in clinical settings. Building on the theory of proper scoring rules, particularly the Schervish representation, we derive an adjusted variant of cross-entropy (log score) that averages cost-weighted performance over clinically relevant ranges of class balance. The resulting evaluation is simple to apply, sensitive to clinical deployment conditions, and designed to prioritize models that are both calibrated and robust to real-world variations.

nan


Article 709

Title@2025-06-30 (1): Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games

Title: Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games Visuelle Encoder für dateneffizientes Imitationslernen in modernen Videospielen 现代视频游戏中数据高效模拟学习视觉编码器 2312.02312v3

Authors (10): Lukas Schäfer, Logan Jones, Anssi Kanervisto, Yuhan Cao, Tabish Rashid, Raluca Georgescu, Dave Bignell, Siddhartha Sen, Andrea Treviño Gavito, Sam Devlin

Video games have served as useful benchmarks for the decision-making community, but going beyond Atari games towards modern games has been prohibitively expensive for the vast majority of the research community. Prior work in modern video games typically relied on game-specific integration to obtain game features and enable online training, or on existing large datasets. An alternative approach is to train agents using imitation learning to play video games purely from images. However, this setting poses a fundamental question: which visual encoders obtain representations that retain information critical for decision making? To answer this question, we conduct a systematic study of imitation learning with publicly available pre-trained visual encoders compared to the typical task-specific end-to-end training approach in Minecraft, Counter-Strike: Global Offensive, and Minecraft Dungeons. Our results show that end-to-end training can be effective with comparably low-resolution images and only minutes of demonstrations, but significant improvements can be gained by utilising pre-trained encoders such as DINOv2 depending on the game. In addition to enabling effective decision making, we show that pre-trained encoders can make decision-making research in video games more accessible by significantly reducing the cost of training.

nan


Article 710

Title@2025-06-30 (1): Value-Free Policy Optimization via Reward Partitioning

Title: Value-Free Policy Optimization via Reward Partitioning Wertfreie Politikoptimierung über Belohnungspartitionierung 无价值政策优化,通过评分分分股 2506.13702v2

Authors (3): Bilal Faye, Hanane Azzag, Mustapha Lebbah

Single-trajectory reinforcement learning (RL) methods aim to optimize policies from datasets consisting of (prompt, response, reward) triplets, where scalar rewards are directly available. This supervision format is highly practical, as it mirrors real-world human feedback, such as thumbs-up/down signals, and avoids the need for structured preference annotations. In contrast, pairwise preference-based methods like Direct Preference Optimization (DPO) rely on datasets with both preferred and dispreferred responses, which are harder to construct and less natural to collect. Among single-trajectory approaches, Direct Reward Optimization (DRO) has shown strong empirical performance due to its simplicity and stability. However, DRO requires approximating a value function, which introduces several limitations: high off-policy variance, coupling between policy and value learning, and a lack of absolute supervision on the policy itself. We introduce Reward Partitioning Optimization (RPO), a new method that resolves these limitations by removing the need to model the value function. Instead, RPO normalizes observed rewards using a partitioning approach estimated directly from data. This leads to a straightforward supervised learning objective on the policy, with no auxiliary models and no joint optimization. RPO provides direct and stable supervision on the policy, making it robust and easy to implement in practice. We validate RPO on scalar-feedback language modeling tasks using Flan-T5 encoder-decoder models. Our results demonstrate that RPO outperforms existing single-trajectory baselines such as DRO and Kahneman-Tversky Optimization (KTO). These findings confirm that RPO is a simple, effective, and theoretically grounded method for single-trajectory policy optimization.

nan


Article 711

Title@2025-06-30 (1): Recovering Imbalanced Clusters via Gradient-Based Projection Pursuit

Title: Recovering Imbalanced Clusters via Gradient-Based Projection Pursuit Wiederherstellung unausgewogener Cluster über gradient-based Projektionsverfolgung 通过基于渐进的预测追求恢复平衡的群集 2502.02668v2

Authors (3): Martin Eppert, Satyaki Mukherjee, Debarghya Ghoshdastidar

Projection Pursuit is a classic exploratory technique for finding interesting projections of a dataset. We propose a method for recovering projections containing either Imbalanced Clusters or a Bernoulli-Rademacher distribution using a gradient-based technique to optimize the projection index. As sample complexity is a major limiting factor in Projection Pursuit, we analyze our algorithm’s sample complexity within a Planted Vector setting where we can observe that Imbalanced Clusters can be recovered more easily than balanced ones. Additionally, we give a generalized result that works for a variety of data distributions and projection indices. We compare these results to computational lower bounds in the Low-Degree-Polynomial Framework. Finally, we experimentally evaluate our method’s applicability to real-world data using FashionMNIST and the Human Activity Recognition Dataset, where our algorithm outperforms others when only a few samples are available.

nan


Article 712

Title@2025-06-30 (1): Online model learning with data-assimilated reservoir computers

Title: Online model learning with data-assimilated reservoir computers Online-Modelllernen mit datengestützten Reservoir-Computern 利用数据扩散储油库计算机进行在线示范学习 2504.16767v2

Authors (2): Andrea Nóvoa, Luca Magri

We propose an online learning framework for forecasting nonlinear spatio-temporal signals (fields). The method integrates (i) dimensionality reduction, here, a simple proper orthogonal decomposition (POD) projection; (ii) a generalized autoregressive model to forecast reduced dynamics, here, a reservoir computer; (iii) online adaptation to update the reservoir computer (the model), here, ensemble sequential data assimilation. We demonstrate the framework on a wake past a cylinder governed by the Navier-Stokes equations, exploring the assimilation of full flow fields (projected onto POD modes) and sparse sensors. Three scenarios are examined: a na"ive physical state estimation; a two-fold estimation of physical and reservoir states; and a three-fold estimation that also adjusts the model parameters. The two-fold strategy significantly improves ensemble convergence and reduces reconstruction error compared to the na"ive approach. The three-fold approach enables robust online training of partially-trained reservoir computers, overcoming limitations of a priori training. By unifying data-driven reduced order modelling with Bayesian data assimilation, this work opens new opportunities for scalable online model learning for nonlinear time series forecasting.

nan


Article 713

Title@2025-06-30 (1): Refine-POI: Reinforcement Fine-Tuned Large Language Models for Next Point-of-Interest Recommendation

Title: Refine-POI: Reinforcement Fine-Tuned Large Language Models for Next Point-of-Interest Recommendation Verfeinerungs-POI: Verstärkte Feinabstimmungen für große Sprachmodelle für die nächste Point-of-Interest-Empfehlung 参考-POI:为下一个利得点建议加强精微推荐的大型语言模式 2506.21599v2

Authors (8): Peibo Li, Shuang Ao, Hao Xue, Yang Song, Maarten de Rijke, Johan Barthélemy, Tomasz Bednarz, Flora D. Salim

Large language models (LLMs) have been adopted for next point-of-interest (POI) recommendation tasks. Typical LLM-based recommenders fall into two categories: prompt-based and supervised fine-tuning (SFT)-based models. Prompt-based models generally offer greater output flexibility but deliver lower accuracy, whereas SFT-based models achieve higher performance yet face a fundamental mismatch: next POI recommendation data does not naturally suit supervised fine-tuning. In SFT, the model is trained to reproduce the exact ground truth, but each training example provides only a single target POI, so there is no ground truth for producing a top-k list. To address this, we propose Refine-POI, a reinforcement fine-tuning framework for next POI recommendation. We introduce recommendation-driven rewards that enable LLMs to learn to generate top-k recommendation lists using only one ground-truth POI per example. Experiments on real-world datasets demonstrate that Refine-POI achieves state-of-the-art top-k recommendation performance.

nan


Article 714

Title@2025-06-30 (1): Radioactive Watermarks in Diffusion and Autoregressive Image Generative Models

Title: Radioactive Watermarks in Diffusion and Autoregressive Image Generative Models Radioaktive Wasserzeichen in Diffusion und autoregressiven Bildgenerativen Modellen 扩散和自动递减图像生成模型中放射性水标记 2506.23731v1

Authors (4): Michel Meintz, Jan Dubiński, Franziska Boenisch, Adam Dziedzic

Image generative models have become increasingly popular, but training them requires large datasets that are costly to collect and curate. To circumvent these costs, some parties may exploit existing models by using the generated images as training data for their own models. In general, watermarking is a valuable tool for detecting unauthorized use of generated images. However, when these images are used to train a new model, watermarking can only enable detection if the watermark persists through training and remains identifiable in the outputs of the newly trained model - a property known as radioactivity. We analyze the radioactivity of watermarks in images generated by diffusion models (DMs) and image autoregressive models (IARs). We find that existing watermarking methods for DMs fail to retain radioactivity, as watermarks are either erased during encoding into the latent space or lost in the noising-denoising process (during the training in the latent space). Meanwhile, despite IARs having recently surpassed DMs in image generation quality and efficiency, no radioactive watermarking methods have been proposed for them. To overcome this limitation, we propose the first watermarking method tailored for IARs and with radioactivity in mind - drawing inspiration from techniques in large language models (LLMs), which share IARs’ autoregressive paradigm. Our extensive experimental evaluation highlights our method’s effectiveness in preserving radioactivity within IARs, enabling robust provenance tracking, and preventing unauthorized use of their generated images.

nan


Article 715

Title@2025-06-30 (1): System-Embedded Diffusion Bridge Models

Title: System-Embedded Diffusion Bridge Models Modelle der System-Eingebetteten Diffusionsbrücke 系统嵌入式扩散桥模型 2506.23726v1

Authors (8): Bartlomiej Sobieski, Matthew Tivnan, Yuang Wang, Siyeop Yoon, Pengfei Jin, Dufan Wu, Quanzheng Li, Przemyslaw Biecek

Solving inverse problems – recovering signals from incomplete or noisy measurements – is fundamental in science and engineering. Score-based generative models (SGMs) have recently emerged as a powerful framework for this task. Two main paradigms have formed: unsupervised approaches that adapt pretrained generative models to inverse problems, and supervised bridge methods that train stochastic processes conditioned on paired clean and corrupted data. While the former typically assume knowledge of the measurement model, the latter have largely overlooked this structural information. We introduce System embedded Diffusion Bridge Models (SDBs), a new class of supervised bridge methods that explicitly embed the known linear measurement system into the coefficients of a matrix-valued SDE. This principled integration yields consistent improvements across diverse linear inverse problems and demonstrates robust generalization under system misspecification between training and deployment, offering a promising solution to real-world applications.

nan


Article 716

Title@2025-06-30 (1): Deep Learning-Based Semantic Segmentation for Real-Time Kidney Imaging and Measurements with Augmented Reality-Assisted Ultrasound

Title: Deep Learning-Based Semantic Segmentation for Real-Time Kidney Imaging and Measurements with Augmented Reality-Assisted Ultrasound Deep Learning-based Semantic Segmentation für Echtzeit-Kidney-Imaging und Messungen mit Augmented Reality-Assisted Ultrasound 实时肾脏成像和计量增强现实辅助超声波 2506.23721v1

Authors (8): Gijs Luijten, Roberto Maria Scardigno, Lisle Faray de Paiva, Peter Hoyer, Jens Kleesiek, Domenico Buongiorno, Vitoantonio Bevilacqua, Jan Egger

Ultrasound (US) is widely accessible and radiation-free but has a steep learning curve due to its dynamic nature and non-standard imaging planes. Additionally, the constant need to shift focus between the US screen and the patient poses a challenge. To address these issues, we integrate deep learning (DL)-based semantic segmentation for real-time (RT) automated kidney volumetric measurements, which are essential for clinical assessment but are traditionally time-consuming and prone to fatigue. This automation allows clinicians to concentrate on image interpretation rather than manual measurements. Complementing DL, augmented reality (AR) enhances the usability of US by projecting the display directly into the clinician’s field of view, improving ergonomics and reducing the cognitive load associated with screen-to-patient transitions. Two AR-DL-assisted US pipelines on HoloLens-2 are proposed: one streams directly via the application programming interface for a wireless setup, while the other supports any US device with video output for broader accessibility. We evaluate RT feasibility and accuracy using the Open Kidney Dataset and open-source segmentation models (nnU-Net, Segmenter, YOLO with MedSAM and LiteMedSAM). Our open-source GitHub pipeline includes model implementations, measurement algorithms, and a Wi-Fi-based streaming solution, enhancing US training and diagnostics, especially in point-of-care settings.

nan


Article 717

Title@2025-06-30 (1): DABstep: Data Agent Benchmark for Multi-step Reasoning

Title: DABstep: Data Agent Benchmark for Multi-step Reasoning DABstep: Data Agent Benchmark für mehrstufige Vernunft DABstep:多步理由数据代理基准 2506.23719v1

Authors (6): Alex Egg, Martin Iglesias Goyanes, Friso Kingma, Andreu Mora, Leandro von Werra, Thomas Wolf

We introduce DABstep, a novel benchmark for evaluating AI agents on realistic multi-step data analysis tasks. DABstep comprises over 450 real-world challenges derived from a financial analytics platform, requiring models to combine code-based data processing with contextual reasoning over heterogeneous documentation. Each task demands an iterative, multi-step problem-solving approach, testing capabilities in data manipulation, cross-referencing multiple sources, and precise result reporting. The benchmark provides a factoid-style answer format with automatic correctness checks for objective scoring at scale. We evaluate leading LLM-based agents, revealing a substantial performance gap: even the best agent achieves only 14.55% accuracy on the hardest tasks. We detail our benchmark’s design, dataset composition, task formulation, evaluation protocol, report baseline results and analyze failure modes. DABstep is released with a public leaderboard and toolkit to accelerate research in autonomous data analysis.

nan


Article 718

Title@2025-06-30 (1): Towards Efficient and Accurate Spiking Neural Networks via Adaptive Bit Allocation

Title: Towards Efficient and Accurate Spiking Neural Networks via Adaptive Bit Allocation Auf dem Weg zu effizienten und präzisen Spiking Neuronal Networks über adaptive Bit Allocation 通过适应性比位分配实现高效和准确的Spiking神经网络 2506.23717v1

Authors (7): Xingting Yao, Qinghao Hu, Fei Zhou, Tielong Liu, Gang Li, Peisong Wang, Jian Cheng

Multi-bit spiking neural networks (SNNs) have recently become a heated research spot, pursuing energy-efficient and high-accurate AI. However, with more bits involved, the associated memory and computation demands escalate to the point where the performance improvements become disproportionate. Based on the insight that different layers demonstrate different importance and extra bits could be wasted and interfering, this paper presents an adaptive bit allocation strategy for direct-trained SNNs, achieving fine-grained layer-wise allocation of memory and computation resources. Thus, SNN’s efficiency and accuracy can be improved. Specifically, we parametrize the temporal lengths and the bit widths of weights and spikes, and make them learnable and controllable through gradients. To address the challenges caused by changeable bit widths and temporal lengths, we propose the refined spiking neuron, which can handle different temporal lengths, enable the derivation of gradients for temporal lengths, and suit spike quantization better. In addition, we theoretically formulate the step-size mismatch problem of learnable bit widths, which may incur severe quantization errors to SNN, and accordingly propose the step-size renewal mechanism to alleviate this issue. Experiments on various datasets, including the static CIFAR and ImageNet and the dynamic CIFAR-DVS and DVS-GESTURE, demonstrate that our methods can reduce the overall memory and computation cost while achieving higher accuracy. Particularly, our SEWResNet-34 can achieve a 2.69\% accuracy gain and 4.16$\times$ lower bit budgets over the advanced baseline work on ImageNet. This work will be fully open-sourced.

nan


Article 719

Title@2025-06-30 (1): Rethinking Aleatoric and Epistemic Uncertainty

Title: Rethinking Aleatoric and Epistemic Uncertainty Rethinking Aleatoric und Epistemic Uncertainty 重新思考疏离和分解不确定性 2412.20892v2

Authors (6): Freddie Bickford Smith, Jannik Kossen, Eleanor Trollope, Mark van der Wilk, Adam Foster, Tom Rainforth

The ideas of aleatoric and epistemic uncertainty are widely used to reason about the probabilistic predictions of machine-learning models. We identify incoherence in existing discussions of these ideas and suggest this stems from the aleatoric-epistemic view being insufficiently expressive to capture all the distinct quantities that researchers are interested in. To address this we present a decision-theoretic perspective that relates rigorous notions of uncertainty, predictive performance and statistical dispersion in data. This serves to support clearer thinking as the field moves forward. Additionally we provide insights into popular information-theoretic quantities, showing they can be poor estimators of what they are often purported to measure, while also explaining how they can still be useful in guiding data acquisition.

nan


Article 720

Title@2025-06-30 (1): SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration

Title: SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration SmileyLlama: Ändern von großen Sprachmodellen für gerichtete chemische Weltraumforschung SmileyLlama:修改用于定向化学空间探索的大型语言模型 2409.02231v3

Authors (7): Joseph M. Cavanagh, Kunyang Sun, Andrew Gritsevskiy, Dorian Bagni, Yingze Wang, Thomas D. Bannister, Teresa Head-Gordon

Here we show that a general-purpose large language model (LLM) chatbot, Llama-3.1-8B-Instruct, can be transformed via supervised fine-tuning of engineered prompts into a chemical language model (CLM), SmileyLlama, for molecule generation. We benchmark SmileyLlama by comparing it to CLMs trained from scratch on large amounts of ChEMBL data for their ability to generate valid and novel drug-like molecules. We also use direct preference optimization to both improve SmileyLlama’s adherence to a prompt and to generate molecules within the iMiner reinforcement learning framework to predict new drug molecules with optimized 3D conformations and high binding affinity to drug targets, illustrated with the SARS-Cov-2 Main Protease. This overall framework allows a LLM to speak directly as a CLM which can generate molecules with user-specified properties, rather than acting only as a chatbot with knowledge of chemistry or as a helpful virtual assistant. While our dataset and analyses are geared toward drug discovery, this general procedure can be extended to other chemical applications such as chemical synthesis.

nan


Article 721

Title@2025-06-30 (1): Methodology for an Analysis of Influencing Factors on 3D Object Detection Performance

Title: Methodology for an Analysis of Influencing Factors on 3D Object Detection Performance Methodik für eine Analyse von Einflussfaktoren auf die 3D-Objekterkennungsleistung 3D物体探测性能影响因素分析方法 2411.08482v3

Authors (3): Anton Kuznietsov, Dirk Schweickard, Steven Peters

In automated driving, object detection is crucial for perceiving the environment. Although deep learning-based detectors offer high performance, their black-box nature complicates safety assurance. We propose a novel methodology to analyze how object- and environment-related factors affect LiDAR- and camera-based 3D object detectors. A statistical univariate analysis relates each factor to pedestrian detection errors. Additionally, a Random Forest (RF) model predicts errors from meta-information, with Shapley Values interpreting feature importance. By capturing feature dependencies, the RF enables a nuanced analysis of detection errors. Understanding these factors reveals detector performance gaps and supports safer object detection system development.

nan


Article 722

Title@2025-06-30 (1): Error Optimization: Overcoming Exponential Signal Decay in Deep Predictive Coding Networks

Title: Error Optimization: Overcoming Exponential Signal Decay in Deep Predictive Coding Networks Fehler-Optimierung: Überwindung exponentieller Signaldekay in tiefen vorausschauenden Codierungsnetzwerken 错误 优化 : 克服深预报编码网络中的指数信号衰减 2505.20137v2

Authors (4): Cédric Goemaere, Gaspard Oliviers, Rafal Bogacz, Thomas Demeester

Predictive Coding (PC) offers a biologically plausible alternative to backpropagation for neural network training, yet struggles with deeper architectures. This paper identifies the root cause: an inherent signal decay problem where gradients attenuate exponentially with depth, becoming computationally negligible due to numerical precision constraints. To address this fundamental limitation, we introduce Error Optimization (EO), a novel reparameterization that preserves PC’s theoretical properties while eliminating signal decay. By optimizing over prediction errors rather than states, EO enables signals to reach all layers simultaneously and without attenuation, converging orders of magnitude faster than standard PC. Experiments across multiple architectures and datasets demonstrate that EO matches backpropagation’s performance even for deeper models where conventional PC struggles. Besides practical improvements, our work provides theoretical insight into PC dynamics and establishes a foundation for scaling biologically-inspired learning to deeper architectures on digital hardware and beyond.

nan


Article 723

Title@2025-06-30 (1): Learning Modular Exponentiation with Transformers

Title: Learning Modular Exponentiation with Transformers Modulare Exponentiation mit Transformern lernen 与变换器一起学习模块化指数 2506.23679v1

Authors (3): David Demitri Africa, Sara M. Kapoor, Theo Simon Sorg

Modular exponentiation is crucial to number theory and cryptography, yet remains largely unexplored from a mechanistic interpretability standpoint. We train a 4-layer encoder-decoder Transformer model to perform this operation and investigate the emergence of numerical reasoning during training. Utilizing principled sampling strategies, PCA-based embedding analysis, and activation patching, we examine how number-theoretic properties are encoded within the model. We find that reciprocal operand training leads to strong performance gains, with sudden generalization across related moduli. These synchronized accuracy surges reflect grokking-like dynamics, suggesting the model internalizes shared arithmetic structure. We also find a subgraph consisting entirely of attention heads in the final layer sufficient to achieve full performance on the task of regular exponentiation. These results suggest that transformer models learn modular arithmetic through specialized computational circuits, paving the way for more interpretable and efficient neural approaches to modular exponentiation.

nan


Article 724

Title@2025-06-30 (1): Thinking About Thinking: SAGE-nano’s Inverse Reasoning for Self-Aware Language Models

Title: Thinking About Thinking: SAGE-nano’s Inverse Reasoning for Self-Aware Language Models Denken über das Denken: SAGE-nano’s Inverse Reasoning for Self-Aware Language Models 思考思考:SAGE-nano 自我意识语言模型的反向理由 2507.00092v1

Authors (6): Basab Jha, Firoj Paudel, Ujjwal Puri, Zhang Yuting, Choi Donghyuk, Wang Junhao

Large Language Models (LLMs) have demonstrated remarkable capabilities at solving complex reasoning tasks with Chain-of-Thought (CoT) prompting, but their decision-making processes remain somewhat blackbox. We introduce textbfinverse reasoning, a novel paradigm enabling LLMs to decompose and explain their own reasoning chains post-hoc. Our approach, used in SAGE-nano, a 4-billion-parameter reasoning model, employs a metacognitive structure that reflects back via attention processes to identify major decision points and generate explanations of reasoning choices. While typical CoT approaches are directed towards forward reasoning generation, inverse reasoning provides insight into why specific reasoning chains were selected over others. Through thorough testing of logical reasoning puzzles, math problems and ethical dilemmas from AQUA-RAT, CommonsenseQA, and customized benchmarks, we demonstrate that SAGE-nano is at the cutting edge both on reasoning accuracy (74.6% on AQUA-RAT) and explanation quality (92.1% human preference score) for its task, and offers performance almost on par with models like Claude-3.5 Sonnet or GPT-4o. Our contributions are: (i) the first rigorous framework for LLM self-reflection via inverse reasoning, (ii) a novel metalearning framework to reverse the attention flow, (iii) comprehensive evaluation frameworks for reasoning transparency, and (iv) evidence that increasing reasoning using inverse reasoning improves interpretability along with reasoning performance. Our work creates new avenues for transparent AI systems and closes significant gaps in AI safety, education, and scientific discovery.

nan


Article 725

Title@2025-06-30 (1): Sparsing Law: Towards Large Language Models with Greater Activation Sparsity

Title: Sparsing Law: Towards Large Language Models with Greater Activation Sparsity Sparsing Law: Auf dem Weg zu großen Sprachmodellen mit größerer Aktivierungssparsität 评分法:走向大语言模式,具有更大的激活率平等性 2411.02335v4

Authors (10): Yuqi Luo, Chenyang Song, Xu Han, Yingfa Chen, Chaojun Xiao, Xiaojun Meng, Liqun Deng, Jiansheng Wei, Zhiyuan Liu, Maosong Sun

Activation sparsity denotes the existence of substantial weakly-contributed elements within activation outputs that can be eliminated, benefiting many important applications concerned with large language models (LLMs). Although promoting greater activation sparsity within LLMs deserves deep studies, existing works lack comprehensive and quantitative research on the correlation between activation sparsity and potentially influential factors. In this paper, we present a comprehensive study on the quantitative scaling properties and influential factors of the activation sparsity within decoder-only Transformer-based LLMs. Specifically, we propose PPL-$p\%$ sparsity, a precise and performance-aware activation sparsity metric that is applicable to any activation function. Through extensive experiments, we find several important phenomena. Firstly, different activation functions exhibit comparable performance but opposite training-time sparsity trends. The activation ratio (i.e., $1-\mathrm{sparsity\ ratio}$) evolves as a convergent increasing power-law and decreasing logspace power-law with the amount of training data for SiLU-activated and ReLU-activated LLMs, respectively. These demonstrate that ReLU is more efficient as the activation function than SiLU and can leverage more training data to improve activation sparsity. Secondly, the activation ratio linearly increases with the width-depth ratio below a certain bottleneck point, indicating the potential advantage of a deeper architecture at a fixed parameter scale. Finally, at similar width-depth ratios, we surprisingly find that the limit value of activation sparsity varies weakly with the parameter scale, i.e., the activation patterns within LLMs are insensitive to the parameter scale. These empirical laws towards LLMs with greater activation sparsity have important implications for making LLMs more efficient and interpretable.

nan


Article 726

Title@2025-06-30 (1): Generating Heterogeneous Multi-dimensional Data : A Comparative Study

Title: Generating Heterogeneous Multi-dimensional Data : A Comparative Study Heterogene mehrdimensionale Daten generieren: Eine vergleichende Studie 生成异质多维数据:比较研究 2507.00090v1

Authors (4): Corbeau Michael, Claeys Emmanuelle, Serrurier Mathieu, Zaraté Pascale

Allocation of personnel and material resources is highly sensible in the case of firefighter interventions. This allocation relies on simulations to experiment with various scenarios. The main objective of this allocation is the global optimization of the firefighters response. Data generation is then mandatory to study various scenarios In this study, we propose to compare different data generation methods. Methods such as Random Sampling, Tabular Variational Autoencoders, standard Generative Adversarial Networks, Conditional Tabular Generative Adversarial Networks and Diffusion Probabilistic Models are examined to ascertain their efficacy in capturing the intricacies of firefighter interventions. Traditional evaluation metrics often fall short in capturing the nuanced requirements of synthetic datasets for real-world scenarios. To address this gap, an evaluation of synthetic data quality is conducted using a combination of domain-specific metrics tailored to the firefighting domain and standard measures such as the Wasserstein distance. Domain-specific metrics include response time distribution, spatial-temporal distribution of interventions, and accidents representation. These metrics are designed to assess data variability, the preservation of fine and complex correlations and anomalies such as event with a very low occurrence, the conformity with the initial statistical distribution and the operational relevance of the synthetic data. The distribution has the particularity of being highly unbalanced, none of the variables following a Gaussian distribution, adding complexity to the data generation process.

nan


Article 727

Title@2025-06-30 (1): On the Domain Robustness of Contrastive Vision-Language Models

Title: On the Domain Robustness of Contrastive Vision-Language Models Auf dem Gebiet Robustheit von kontrastiven Vision-Sprachen-Modellen 关于不同愿景-语言模型域的强力 2506.23663v1

Authors (4): Mario Koddenbrock, Rudolf Hoffmann, David Brodmann, Erik Rodner

In real-world vision-language applications, practitioners increasingly rely on large, pretrained foundation models rather than custom-built solutions, despite limited transparency regarding their training data and processes. While these models achieve impressive performance on general benchmarks, their effectiveness can decline notably under specialized domain shifts, such as unique imaging conditions or environmental variations. In this work, we introduce Deepbench, a framework designed to assess domain-specific robustness of vision-language models (VLMs). Deepbench leverages a large language model (LLM) to generate realistic, context-aware image corruptions tailored to specific deployment domains without requiring labeled data. We evaluate a range of contrastive vision-language architectures and architectural variants across six real-world domains and observe substantial variability in robustness, highlighting the need for targeted, domain-aware evaluation. Deepbench is released as open-source software to support further research into domain-aware robustness assessment.

nan


Article 728

Title@2025-06-30 (1): Modular Distributed Nonconvex Learning with Error Feedback

Title: Modular Distributed Nonconvex Learning with Error Feedback Modulares verteiltes Nonconvex-Lernen mit Fehler-Feedback 含有错误反馈的模块分布式非 Confonvex 学习 2503.14055v2

Authors (2): Guido Carnevale, Nicola Bastianello

In this paper, we design a novel distributed learning algorithm using stochastic compressed communications. In detail, we pursue a modular approach, merging ADMM and a gradient-based approach, benefiting from the robustness of the former and the computational efficiency of the latter. Additionally, we integrate a stochastic integral action (error feedback) enabling almost sure rejection of the compression error. We analyze the resulting method in nonconvex scenarios and guarantee almost sure asymptotic convergence to the set of stationary points of the problem. This result is obtained using system-theoretic tools based on stochastic timescale separation. We corroborate our findings with numerical simulations in nonconvex classification.

nan


Article 729

Title@2025-06-30 (1): Robust LLM Unlearning with MUDMAN: Meta-Unlearning with Disruption Masking And Normalization

Title: Robust LLM Unlearning with MUDMAN: Meta-Unlearning with Disruption Masking And Normalization Robustes LLM-Unlearning mit MUDMAN: Meta-Unlearning mit Disruptionsmasken und Normalisierung 与 MUDMAN 一起重新学习: 以干扰蒙蔽和正常化的方式重新学习 2506.12484v3

Authors (4): Filip Sondej, Yushi Yang, Mikołaj Kniejski, Marcel Windys

Language models can retain dangerous knowledge and skills even after extensive safety fine-tuning, posing both misuse and misalignment risks. Recent studies show that even specialized unlearning methods can be easily reversed. To address this, we systematically evaluate many existing and novel components of unlearning methods and identify ones crucial for irreversible unlearning. We introduce Disruption Masking, a technique in which we only allow updating weights, where the signs of the unlearning gradient and the retaining gradient are the same. This ensures all updates are non-disruptive. Additionally, we identify the need for normalizing the unlearning gradients, and also confirm the usefulness of meta-learning. We combine these insights into MUDMAN (Meta-Unlearning with Disruption Masking and Normalization) and validate its effectiveness at preventing the recovery of dangerous capabilities. MUDMAN outperforms the prior TAR method by 40%, setting a new state-of-the-art for robust unlearning.

nan


Article 730

Title@2025-06-30 (1): A new machine learning framework for occupational accidents forecasting with safety inspections integration

Title: A new machine learning framework for occupational accidents forecasting with safety inspections integration Ein neuer Rahmen für maschinelles Lernen bei Arbeitsunfällen, die mit der Integration von Sicherheitsinspektionen prognostiziert werden 新的工伤事故预测机械学习新框架,安全检查一体化 2507.00089v1

Authors (4): Aho Yapi, Pierre Latouche, Arnaud Guillin, Yan Bailly

We propose a generic framework for short-term occupational accident forecasting that leverages safety inspections and models accident occurrences as binary time series. The approach generates daily predictions, which are then aggregated into weekly safety assessments to better inform decision making. To ensure the reliability and operational applicability of the forecasts, we apply a sliding-window cross-validation procedure specifically designed for time series data, combined with an evaluation based on aggregated period-level metrics. Several machine learning algorithms, including logistic regression, tree-based models, and neural networks, are trained and systematically compared within this framework. Unlike the other approaches, the long short-term memory (LSTM) network outperforms the other approaches and detects the upcoming high-risk periods with a balanced accuracy of 0.86, confirming the robustness of our methodology and demonstrating that a binary time series model can anticipate these critical periods based on safety inspections. The proposed methodology converts routine safety inspection data into clear weekly risk scores, detecting the periods when accidents are most likely. Decision-makers can integrate these scores into their planning tools to classify inspection priorities, schedule targeted interventions, and funnel resources to the sites or shifts classified as highest risk, stepping in before incidents occur and getting the greatest return on safety investments.

nan


Article 731

Title@2025-06-30 (1): GL-LowPopArt: A Nearly Instance-Wise Minimax-Optimal Estimator for Generalized Low-Rank Trace Regression

Title: GL-LowPopArt: A Nearly Instance-Wise Minimax-Optimal Estimator for Generalized Low-Rank Trace Regression GL-LowPopArt: Ein nahezu instance-weiser Minimax-Optimal-Estimator für generalisierte Low-Rank-Trace-Regression GL-LowPopArt: 通用低射线回归近似实例- Wisis Minimax- 最佳最佳模拟器 2506.03074v4

Authors (5): Junghyun Lee, Kyoungseok Jang, Kwang-Sung Jun, Milan Vojnović, Se-Young Yun

We present GL-LowPopArt, a novel Catoni-style estimator for generalized low-rank trace regression. Building on LowPopArt (Jang et al., 2024), it employs a two-stage approach: nuclear norm regularization followed by matrix Catoni estimation. We establish state-of-the-art estimation error bounds, surpassing existing guarantees (Fan et al., 2019; Kang et al., 2022), and reveal a novel experimental design objective, $\mathrm{GL}(\pi)$. The key technical challenge is controlling bias from the nonlinear inverse link function, which we address by our two-stage approach. We prove a local minimax lower bound, showing that our GL-LowPopArt enjoys instance-wise optimality up to the condition number of the ground-truth Hessian. Applications include generalized linear matrix completion, where GL-LowPopArt achieves a state-of-the-art Frobenius error guarantee, and bilinear dueling bandits, a novel setting inspired by general preference learning (Zhang et al., 2024). Our analysis of a GL-LowPopArt-based explore-then-commit algorithm reveals a new, potentially interesting problem-dependent quantity, along with improved Borda regret bound than vectorization (Wu et al., 2024).

nan


Article 732

Title@2025-06-30 (1): Geminet: Learning the Duality-based Iterative Process for Lightweight Traffic Engineering in Changing Topologies

Title: Geminet: Learning the Duality-based Iterative Process for Lightweight Traffic Engineering in Changing Topologies Geminet: Lernen des duality-basierten Iterativprozesses für Leichtbau in wechselnden Topologien Geminet:学习基于质量的循环程序,以在变化的地形学中轻量级交通工程 2506.23640v1

Authors (3): Ximeng Liu, Shizhen Zhao, Xinbing Wang

Recently, researchers have explored ML-based Traffic Engineering (TE), leveraging neural networks to solve TE problems traditionally addressed by optimization. However, existing ML-based TE schemes remain impractical: they either fail to handle topology changes or suffer from poor scalability due to excessive computational and memory overhead. To overcome these limitations, we propose Geminet, a lightweight and scalable ML-based TE framework that can handle changing topologies. Geminet is built upon two key insights: (i) a methodology that decouples neural networks from topology by learning an iterative gradient-descent-based adjustment process, as the update rule of gradient descent is topology-agnostic, relying only on a few gradient-related quantities; (ii) shifting optimization from path-level routing weights to edge-level dual variables, reducing memory consumption by leveraging the fact that edges are far fewer than paths. Evaluations on WAN and data center datasets show that Geminet significantly improves scalability. Its neural network size is only 0.04% to 7% of existing schemes, while handling topology variations as effectively as HARP, a state-of-the-art ML-based TE approach, without performance degradation. When trained on large-scale topologies, Geminet consumes under 10 GiB of memory, more than eight times less than the 80-plus GiB required by HARP, while achieving 5.45 times faster convergence speed, demonstrating its potential for large-scale deployment.

nan


Article 733

Title@2025-06-30 (1): A Nonlinear Low-rank Representation Model with Convolutional Neural Network for Imputing Water Quality Data

Title: A Nonlinear Low-rank Representation Model with Convolutional Neural Network for Imputing Water Quality Data Ein nichtlineares Low-Rank-Darstellungsmodell mit konvolutionärem Neuralnetzwerk zur Imputierung von Wasserqualitätsdaten 非线性低级别代表模式,与计算水质数据革命神经网络 2506.23629v1

Authors (3): Xin Liao, Bing Yang, Cai Yu

The integrity of Water Quality Data (WQD) is critical in environmental monitoring for scientific decision-making and ecological protection. However, water quality monitoring systems are often challenged by large amounts of missing data due to unavoidable problems such as sensor failures and communication delays, which further lead to water quality data becoming High-Dimensional and Sparse (HDS). Traditional data imputation methods are difficult to depict the potential dynamics and fail to capture the deep data features, resulting in unsatisfactory imputation performance. To effectively address the above issues, this paper proposes a Nonlinear Low-rank Representation model (NLR) with Convolutional Neural Networks (CNN) for imputing missing WQD, which utilizes CNNs to implement two ideas: a) fusing temporal features to model the temporal dependence of data between time slots, and b) Extracting nonlinear interactions and local patterns to mine higher-order relationships features and achieve deep fusion of multidimensional information. Experimental studies on three real water quality datasets demonstrate that the proposed model significantly outperforms existing state-of-the-art data imputation models in terms of estimation accuracy. It provides an effective approach for handling water quality monitoring data in complex dynamic environments.

nan


Article 734

Title@2025-06-30 (1): Brain Tumor Detection through Thermal Imaging and MobileNET

Title: Brain Tumor Detection through Thermal Imaging and MobileNET Gehirntumorerkennung durch Thermische Bildgebung und MobileNET 通过热成像和移动网检测脑肿瘤 2506.23627v1

Authors (2): Roham Maiti, Debasmita Bhoumik

Brain plays a crucial role in regulating body functions and cognitive processes, with brain tumors posing significant risks to human health. Precise and prompt detection is a key factor in proper treatment and better patient outcomes. Traditional methods for detecting brain tumors, that include biopsies, MRI, and CT scans often face challenges due to their high costs and the need for specialized medical expertise. Recent developments in machine learning (ML) and deep learning (DL) has exhibited strong capabilities in automating the identification and categorization of brain tumors from medical images, especially MRI scans. However, these classical ML models have limitations, such as high computational demands, the need for large datasets, and long training times, which hinder their accessibility and efficiency. Our research uses MobileNET model for efficient detection of these tumors. The novelty of this project lies in building an accurate tumor detection model which use less computing re-sources and runs in less time followed by efficient decision making through the use of image processing technique for accurate results. The suggested method attained an average accuracy of 98.5%.

nan


Article 735

Title@2025-06-30 (1): Evaluating K-Fold Cross Validation for Transformer Based Symbolic Regression Models

Title: Evaluating K-Fold Cross Validation for Transformer Based Symbolic Regression Models Bewertung der K-Fold Cross-Validierung für Transformer-basierte symbolische Regressionsmodelle 评估基于变换器的符号回归模型的 K- Fold 交叉验证 2410.21896v2

Authors (7): Kaustubh Kislay, Shlok Singh, Soham Joshi, Rohan Dutta, Jay Shim, George Flint, Kevin Zhu

Symbolic Regression remains an NP-Hard problem, with extensive research focusing on AI models for this task. Transformer models have shown promise in Symbolic Regression, but performance suffers with smaller datasets. We propose applying k-fold cross-validation to a transformer-based symbolic regression model trained on a significantly reduced dataset (15,000 data points, down from 500,000). This technique partitions the training data into multiple subsets (folds), iteratively training on some while validating on others. Our aim is to provide an estimate of model generalization and mitigate overfitting issues associated with smaller datasets. Results show that this process improves the model’s output consistency and generalization by a relative improvement in validation loss of 53.31%. Potentially enabling more efficient and accessible symbolic regression in resource-constrained environments.

nan


Article 736

Title@2025-06-30 (1): pUniFind: a unified large pre-trained deep learning model pushing the limit of mass spectra interpretation

Title: pUniFind: a unified large pre-trained deep learning model pushing the limit of mass spectra interpretation pUniFind: ein einheitliches großes vortrainiertes Deep-Learning-Modell, das die Grenze der Massenspektreninterpretation sprengt pUniFind:一个统一的大型预先培训的深层学习模式,推进质量光谱解释的极限 2507.00087v1

Authors (15): Jiale Zhao, Pengzhi Mao, Kaifei Wang, Yiming Li, Yaping Peng, Ranfei Chen, Shuqi Lu, Xiaohong Ji, Jiaxiang Ding, Xin Zhang, Yucheng Liao, Weinan E, Weijie Zhang, Han Wen, Hao Chi

Deep learning has advanced mass spectrometry data interpretation, yet most models remain feature extractors rather than unified scoring frameworks. We present pUniFind, the first large-scale multimodal pre-trained model in proteomics that integrates end-to-end peptide-spectrum scoring with open, zero-shot de novo sequencing. Trained on over 100 million open search-derived spectra, pUniFind aligns spectral and peptide modalities via cross modality prediction and outperforms traditional engines across diverse datasets, particularly achieving a 42.6 percent increase in the number of identified peptides in immunopeptidomics. Supporting over 1,300 modifications, pUniFind identifies 60 percent more PSMs than existing de novo methods despite a 300-fold larger search space. A deep learning based quality control module further recovers 38.5 percent additional peptides including 1,891 mapped to the genome but absent from reference proteomes while preserving full fragment ion coverage. These results establish a unified, scalable deep learning framework for proteomic analysis, offering improved sensitivity, modification coverage, and interpretability.

nan


Article 737

Title@2025-06-30 (1): Overparametrized models with posterior drift

Title: Overparametrized models with posterior drift Überparametrisierte Modelle mit posteriorer Drift 带有后游漂流的过度平衡模型 2506.23619v1

Authors (2): Guillaume Coqueret, Martial Laguerre

This paper investigates the impact of posterior drift on out-of-sample forecasting accuracy in overparametrized machine learning models. We document the loss in performance when the loadings of the data generating process change between the training and testing samples. This matters crucially in settings in which regime changes are likely to occur, for instance, in financial markets. Applied to equity premium forecasting, our results underline the sensitivity of a market timing strategy to sub-periods and to the bandwidth parameters that control the complexity of the model. For the average investor, we find that focusing on holding periods of 15 years can generate very heterogeneous returns, especially for small bandwidths. Large bandwidths yield much more consistent outcomes, but are far less appealing from a risk-adjusted return standpoint. All in all, our findings tend to recommend cautiousness when resorting to large linear models for stock market predictions.

nan


Article 738

Title@2025-06-30 (1): CPT: Competence-progressive Training Strategy for Few-shot Node Classification

Title: CPT: Competence-progressive Training Strategy for Few-shot Node Classification CPT: Kompetenz-fortschrittliche Ausbildungsstrategie für wenige Knotenklassifikation CPT: 微粒节点分类能力渐进培训战略 2402.00450v5

Authors (5): Qilong Yan, Yufeng Zhang, Jinghao Zhang, Jingpu Duan, Jian Yin

Graph Neural Networks (GNNs) have made significant advancements in node classification, but their success relies on sufficient labeled nodes per class in the training data. Real-world graph data often exhibits a long-tail distribution with sparse labels, emphasizing the importance of GNNs’ ability in few-shot node classification, which entails categorizing nodes with limited data. Traditional episodic meta-learning approaches have shown promise in this domain, but they face an inherent limitation: it might lead the model to converge to suboptimal solutions because of random and uniform task assignment, ignoring task difficulty levels. This could lead the meta-learner to face complex tasks too soon, hindering proper learning. Ideally, the meta-learner should start with simple concepts and advance to more complex ones, like human learning. So, we introduce CPT, a novel two-stage curriculum learning method that aligns task difficulty with the meta-learner’s progressive competence, enhancing overall performance. Specifically, in CPT’s initial stage, the focus is on simpler tasks, fostering foundational skills for engaging with complex tasks later. Importantly, the second stage dynamically adjusts task difficulty based on the meta-learner’s growing competence, aiming for optimal knowledge acquisition. Extensive experiments on popular node classification datasets demonstrate significant improvements of our strategy over existing methods.

nan


Article 739

Title@2025-06-30 (1): TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks

Title: TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks TinyAlign: Lightweight Vision-Language Models durch Abmilderung von Modal Alignment Engpässe fördern 微量对称:通过减缓模式调整瓶头,促进轻轻量度视觉-语言模型 2505.12884v2

Authors (11): Yuanze Hu, Zhaoxin Fan, Xinyu Wang, Gen Li, Ye Qiu, Zhichao Yang, Wenjun Wu, Kejian Wu, Yifan Sun, Xiaotie Deng, Jin Dong

Lightweight Vision-Language Models (VLMs) are indispensable for resource-constrained applications. The prevailing approach to aligning vision and language models involves freezing both the vision encoder and the language model while training small connector modules. However, this strategy heavily depends on the intrinsic capabilities of the language model, which can be suboptimal for lightweight models with limited representational capacity. In this work, we investigate this alignment bottleneck through the lens of mutual information, demonstrating that the constrained capacity of the language model inherently limits the Effective Mutual Information (EMI) between multimodal inputs and outputs, thereby compromising alignment quality. To address this challenge, we propose TinyAlign, a novel framework inspired by Retrieval-Augmented Generation, which strategically retrieves relevant context from a memory bank to enrich multimodal inputs and enhance their alignment. Extensive empirical evaluations reveal that TinyAlign significantly reduces training loss, accelerates convergence, and enhances task performance. Remarkably, it allows models to achieve baseline-level performance with only 40\% of the fine-tuning data, highlighting exceptional data efficiency. Our work thus offers a practical pathway for developing more capable lightweight VLMs while introducing a fresh theoretical lens to better understand and address alignment bottlenecks in constrained multimodal systems.

nan


Article 740

Title@2025-06-30 (1): ChemMiner: A Large Language Model Agent System for Chemical Literature Data Mining

Title: ChemMiner: A Large Language Model Agent System for Chemical Literature Data Mining ChemMiner: Ein großes Sprachmodellsystem für chemische Literatur Data Mining Chemminer:化学文献数据采矿大语言示范剂系统 2402.12993v2

Authors (10): Kexin Chen, Yuyang Du, Junyou Li, Hanqun Cao, Menghao Guo, Xilin Dang, Lanqing Li, Jiezhong Qiu, Pheng Ann Heng, Guangyong Chen

The development of AI-assisted chemical synthesis tools requires comprehensive datasets covering diverse reaction types, yet current high-throughput experimental (HTE) approaches are expensive and limited in scope. Chemical literature represents a vast, underexplored data source containing thousands of reactions published annually. However, extracting reaction information from literature faces significant challenges including varied writing styles, complex coreference relationships, and multimodal information presentation. This paper proposes ChemMiner, a novel end-to-end framework leveraging multiple agents powered by large language models (LLMs) to extract high-fidelity chemical data from literature. ChemMiner incorporates three specialized agents: a text analysis agent for coreference mapping, a multimodal agent for non-textual information extraction, and a synthesis analysis agent for data generation. Furthermore, we developed a comprehensive benchmark with expert-annotated chemical literature to evaluate both extraction efficiency and precision. Experimental results demonstrate reaction identification rates comparable to human chemists while significantly reducing processing time, with high accuracy, recall, and F1 scores. Our open-sourced benchmark facilitates future research in chemical literature data mining.

nan


Article 741

Title@2025-06-30 (1): Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com

Title: Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com Anspruchsvolle Gradient Boosted Entscheidung Bäume mit Tabular Transformers für Betrug Detection bei Booking.com 与制表变换器一起挑战逐步推动决策树,以便在订书时发现欺诈。 com 2405.13692v2

Authors (6): Sergei Krutikov, Bulat Khaertdinov, Rodion Kiriukhin, Shubham Agrawal, Mozhdeh Ariannezhad, Kees Jan De Vries

Transformer-based neural networks, empowered by Self-Supervised Learning (SSL), have demonstrated unprecedented performance across various domains. However, related literature suggests that tabular Transformers may struggle to outperform classical Machine Learning algorithms, such as Gradient Boosted Decision Trees (GBDT). In this paper, we aim to challenge GBDTs with tabular Transformers on a typical task faced in e-commerce, namely fraud detection. Our study is additionally motivated by the problem of selection bias, often occurring in real-life fraud detection systems. It is caused by the production system affecting which subset of traffic becomes labeled. This issue is typically addressed by sampling randomly a small part of the whole production data, referred to as a Control Group. This subset follows a target distribution of production data and therefore is usually preferred for training classification models with standard ML algorithms. Our methodology leverages the capabilities of Transformers to learn transferable representations using all available data by means of SSL, giving it an advantage over classical methods. Furthermore, we conduct large-scale experiments, pre-training tabular Transformers on vast amounts of data instances and fine-tuning them on smaller target datasets. The proposed approach outperforms heavily tuned GBDTs by a considerable margin of the Average Precision (AP) score in offline evaluations. Finally, we report the results of an online A/B experiment. Experimental results confirm the superiority of tabular Transformers compared to GBDTs in production, demonstrated by a statistically significant improvement in our business metric.

nan


Article 742

Title@2025-06-30 (1): When Will It Fail?: Anomaly to Prompt for Forecasting Future Anomalies in Time Series

Title: When Will It Fail?: Anomaly to Prompt for Forecasting Future Anomalies in Time Series Wann wird es fehlschlagen?: Anomalie zur Prognose zukünftiger Anomalien in der Zeitreihe 何时会失败? :在《时间序列》中对未来异常现象作出预测时不时地迅速作出预测。 2506.23596v1

Authors (4): Min-Yeong Park, Won-Jeong Lee, Seong Tae Kim, Gyeong-Moon Park

Recently, forecasting future abnormal events has emerged as an important scenario to tackle real-world necessities. However, the solution of predicting specific future time points when anomalies will occur, known as Anomaly Prediction (AP), remains under-explored. Existing methods dealing with time series data fail in AP, focusing only on immediate anomalies or failing to provide precise predictions for future anomalies. To address the AP task, we propose a novel framework called Anomaly to Prompt (A2P), comprised of Anomaly-Aware Forecasting (AAF) and Synthetic Anomaly Prompting (SAP). To enable the forecasting model to forecast abnormal time points, we adopt a strategy to learn the relationships of anomalies. For the robust detection of anomalies, our proposed SAP introduces a learnable Anomaly Prompt Pool (APP) that simulates diverse anomaly patterns using signal adaptive prompt. Comprehensive experiments on multiple real-world datasets demonstrate the superiority of A2P over state-of-the-art methods, showcasing its ability to predict future anomalies. Our implementation code is available at https://github.com/KU-VGI/AP.

nan


Article 743

Title@2025-06-30 (1): Transition Matching: Scalable and Flexible Generative Modeling

Title: Transition Matching: Scalable and Flexible Generative Modeling Transition Matching: Skalierbare und flexible Generative Modellierung 过渡匹配:可缩放和灵活生成模型 2506.23589v1

Authors (4): Neta Shaul, Uriel Singer, Itai Gat, Yaron Lipman

Diffusion and flow matching models have significantly advanced media generation, yet their design space is well-explored, somewhat limiting further improvements. Concurrently, autoregressive (AR) models, particularly those generating continuous tokens, have emerged as a promising direction for unifying text and media generation. This paper introduces Transition Matching (TM), a novel discrete-time, continuous-state generative paradigm that unifies and advances both diffusion/flow models and continuous AR generation. TM decomposes complex generation tasks into simpler Markov transitions, allowing for expressive non-deterministic probability transition kernels and arbitrary non-continuous supervision processes, thereby unlocking new flexible design avenues. We explore these choices through three TM variants: (i) Difference Transition Matching (DTM), which generalizes flow matching to discrete-time by directly learning transition probabilities, yielding state-of-the-art image quality and text adherence as well as improved sampling efficiency. (ii) Autoregressive Transition Matching (ARTM) and (iii) Full History Transition Matching (FHTM) are partially and fully causal models, respectively, that generalize continuous AR methods. They achieve continuous causal AR generation quality comparable to non-causal approaches and potentially enable seamless integration with existing AR text generation techniques. Notably, FHTM is the first fully causal model to match or surpass the performance of flow-based methods on text-to-image task in continuous domains. We demonstrate these contributions through a rigorous large-scale comparison of TM variants and relevant baselines, maintaining a fixed architecture, training data, and hyperparameters.

nan


Article 744

Title@2025-06-30 (1): Detect \& Score: Privacy-Preserving Misbehaviour Detection and Contribution Evaluation in Federated Learning

Title: Detect \& Score: Privacy-Preserving Misbehaviour Detection and Contribution Evaluation in Federated Learning Detect \& Score: Privacy-Preserving Misbehaviour Detection and Contribution Evaluation in Federated Learning * 评分:在联邦学习中保护隐私、错误行为检测和贡献评价 2506.23583v1

Authors (3): Marvin Xhemrishi, Alexandre Graell i Amat, Balázs Pejó

Federated learning with secure aggregation enables private and collaborative learning from decentralised data without leaking sensitive client information. However, secure aggregation also complicates the detection of malicious client behaviour and the evaluation of individual client contributions to the learning. To address these challenges, QI (Pejo et al.) and FedGT (Xhemrishi et al.) were proposed for contribution evaluation (CE) and misbehaviour detection (MD), respectively. QI, however, lacks adequate MD accuracy due to its reliance on the random selection of clients in each training round, while FedGT lacks the CE ability. In this work, we combine the strengths of QI and FedGT to achieve both robust MD and accurate CE. Our experiments demonstrate superior performance compared to using either method independently.

nan


Article 745

Title@2025-06-30 (1): PBCAT: Patch-based composite adversarial training against physically realizable attacks on object detection

Title: PBCAT: Patch-based composite adversarial training against physically realizable attacks on object detection PBCAT: Patch-basiertes komposites gegnerisches Training gegen physikalisch realisierbare Angriffe auf Objekterkennung PBCAT:针对实际可实现的物体探测攻击的补补综合对抗性训练 2506.23581v1

Authors (7): Xiao Li, Yiming Zhu, Yifan Huang, Wei Zhang, Yingzhe He, Jie Shi, Xiaolin Hu

Object detection plays a crucial role in many security-sensitive applications. However, several recent studies have shown that object detectors can be easily fooled by physically realizable attacks, \eg, adversarial patches and recent adversarial textures, which pose realistic and urgent threats. Adversarial Training (AT) has been recognized as the most effective defense against adversarial attacks. While AT has been extensively studied in the $l_\infty$ attack settings on classification models, AT against physically realizable attacks on object detectors has received limited exploration. Early attempts are only performed to defend against adversarial patches, leaving AT against a wider range of physically realizable attacks under-explored. In this work, we consider defending against various physically realizable attacks with a unified AT method. We propose PBCAT, a novel Patch-Based Composite Adversarial Training strategy. PBCAT optimizes the model by incorporating the combination of small-area gradient-guided adversarial patches and imperceptible global adversarial perturbations covering the entire image. With these designs, PBCAT has the potential to defend against not only adversarial patches but also unseen physically realizable attacks such as adversarial textures. Extensive experiments in multiple settings demonstrated that PBCAT significantly improved robustness against various physically realizable attacks over state-of-the-art defense methods. Notably, it improved the detection accuracy by 29.7\% over previous defense methods under one recent adversarial texture attack.

nan


Article 746

Title@2025-06-30 (1): EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework

Title: EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework EFRame: Deeper Reasoning über Exploration-Filter-Replay Stärkungs-Lernrahmen EFRame:通过探索-钻孔-回放-强化学习框架,更深的原因 2506.22200v2

Authors (8): Chen Wang, Lai Wei, Yanzhi Zhang, Chenyang Shao, Zedong Dan, Weiran Huang, Yue Wang, Yuzhi Zhang

Recent advances in reinforcement learning (RL) have significantly enhanced the reasoning capabilities of large language models (LLMs). Group Relative Policy Optimization (GRPO), an efficient variant of PPO that lowers RL’s computational cost, still faces limited exploration, low sample efficiency and instability, constraining its performance on complex reasoning tasks. To address these limitations, we introduce EFRame, an Exploration-Filtering-Replay framework that systematically augments GRPO along three critical dimensions. EFRame performs additional rollouts to explore high-quality trajectories, applies online filtering to eliminate low-quality samples that introduce noise and variance, and leverages experience replay to repeatedly exploit rare but informative samples. EFRame establishes a complete and stable learning cycle, guiding the model through a structured transition from exploration to convergence. Our experiments across a variety of reasoning benchmarks demonstrate that EFRame not only improves the robustness and efficiency of training, but also enables access to deeper reasoning capabilities that remain unattainable under vanilla GRPO. Furthermore, EFRame enables a more fine-grained categorization of training samples, allowing for a deeper analysis of how different types of samples contribute to the learning process in RL. Our code is available at https://github.com/597358816/EFRame.

nan


Article 747

Title@2025-06-30 (1): Online Human Action Detection during Escorting

Title: Online Human Action Detection during Escorting Online-Erkennung von Menschen während der Eskortierung 护送期间的在线人类行动探测 2506.23573v1

Authors (3): Siddhartha Mondal, Avik Mitra, Chayan Sarkar

The deployment of robot assistants in large indoor spaces has seen significant growth, with escorting tasks becoming a key application. However, most current escorting robots primarily rely on navigation-focused strategies, assuming that the person being escorted will follow without issue. In crowded environments, this assumption often falls short, as individuals may struggle to keep pace, become obstructed, get distracted, or need to stop unexpectedly. As a result, conventional robotic systems are often unable to provide effective escorting services due to their limited understanding of human movement dynamics. To address these challenges, an effective escorting robot must continuously detect and interpret human actions during the escorting process and adjust its movement accordingly. However, there is currently no existing dataset designed specifically for human action detection in the context of escorting. Given that escorting often occurs in crowded environments, where other individuals may enter the robot’s camera view, the robot also needs to identify the specific human it is escorting (the subject) before predicting their actions. Since no existing model performs both person re-identification and action prediction in real-time, we propose a novel neural network architecture that can accomplish both tasks. This enables the robot to adjust its speed dynamically based on the escortee’s movements and seamlessly resume escorting after any disruption. In comparative evaluations against strong baselines, our system demonstrates superior efficiency and effectiveness, showcasing its potential to significantly improve robotic escorting services in complex, real-world scenarios.

nan


Article 748

Title@2025-06-30 (1): Metadata, Wavelet, and Time Aware Diffusion Models for Satellite Image Super Resolution

Title: Metadata, Wavelet, and Time Aware Diffusion Models for Satellite Image Super Resolution Metadaten, Wavelet und Time Aware Diffusion Modelle für Satellitenbild Super Auflösung 元数据、波盘和有时间意识的卫星图像超分辨率传播模型 2506.23566v1

Authors (3): Luigi Sigillo, Renato Giamba, Danilo Comminiello

The acquisition of high-resolution satellite imagery is often constrained by the spatial and temporal limitations of satellite sensors, as well as the high costs associated with frequent observations. These challenges hinder applications such as environmental monitoring, disaster response, and agricultural management, which require fine-grained and high-resolution data. In this paper, we propose MWT-Diff, an innovative framework for satellite image super-resolution (SR) that combines latent diffusion models with wavelet transforms to address these challenges. At the core of the framework is a novel metadata-, wavelet-, and time-aware encoder (MWT-Encoder), which generates embeddings that capture metadata attributes, multi-scale frequency information, and temporal relationships. The embedded feature representations steer the hierarchical diffusion dynamics, through which the model progressively reconstructs high-resolution satellite imagery from low-resolution inputs. This process preserves critical spatial characteristics including textural patterns, boundary discontinuities, and high-frequency spectral components essential for detailed remote sensing analysis. The comparative analysis of MWT-Diff across multiple datasets demonstrated favorable performance compared to recent approaches, as measured by standard perceptual quality metrics including FID and LPIPS.

nan


Article 749

Title@2025-06-30 (1): FedMM-X: A Trustworthy and Interpretable Framework for Federated Multi-Modal Learning in Dynamic Environments

Title: FedMM-X: A Trustworthy and Interpretable Framework for Federated Multi-Modal Learning in Dynamic Environments FedMM-X: Ein vertrauensvolles und interpretierbares Rahmenwerk für das Federated Multi-Modal Learning in dynamischen Umgebungen FEMM-X:动态环境中联邦多模式学习的可信赖和可解释框架 2503.19564v2

Authors (1): Sree Bhargavi Balija

As artificial intelligence systems increasingly operate in Real-world environments, the integration of multi-modal data sources such as vision, language, and audio presents both unprecedented opportunities and critical challenges for achieving trustworthy intelligence. In this paper, we propose a novel framework that unifies federated learning with explainable multi-modal reasoning to ensure trustworthiness in decentralized, dynamic settings. Our approach, called FedMM-X (Federated Multi-Modal Explainable Intelligence), leverages cross-modal consistency checks, client-level interpretability mechanisms, and dynamic trust calibration to address challenges posed by data heterogeneity, modality imbalance, and out-of-distribution generalization. Through rigorous evaluation across federated multi-modal benchmarks involving vision-language tasks, we demonstrate improved performance in both accuracy and interpretability while reducing vulnerabilities to adversarial and spurious correlations. Further, we introduce a novel trust score aggregation method to quantify global model reliability under dynamic client participation. Our findings pave the way toward developing robust, interpretable, and socially responsible AI systems in Real-world environments.

nan


Article 750

Title@2025-06-30 (1): A unified framework on the universal approximation of transformer-type architectures

Title: A unified framework on the universal approximation of transformer-type architectures Ein einheitlicher Rahmen für die universelle Annäherung von Architekturen vom Typ Transformator 关于变压器类型结构普遍近似的统一框架 2506.23551v1

Authors (4): Jingpu Cheng, Qianxiao Li, Ting Lin, Zuowei Shen

We investigate the universal approximation property (UAP) of transformer-type architectures, providing a unified theoretical framework that extends prior results on residual networks to models incorporating attention mechanisms. Our work identifies token distinguishability as a fundamental requirement for UAP and introduces a general sufficient condition that applies to a broad class of architectures. Leveraging an analyticity assumption on the attention layer, we can significantly simplify the verification of this condition, providing a non-constructive approach in establishing UAP for such architectures. We demonstrate the applicability of our framework by proving UAP for transformers with various attention mechanisms, including kernel-based and sparse attention mechanisms. The corollaries of our results either generalize prior works or establish UAP for architectures not previously covered. Furthermore, our framework offers a principled foundation for designing novel transformer architectures with inherent UAP guarantees, including those with specific functional symmetries. We propose examples to illustrate these insights.

nan


Article 751

Title@2025-06-30 (1): Seeding neural network quantum states with tensor network states

Title: Seeding neural network quantum states with tensor network states Neurale Netzwerk-Quantenzustände mit Tensor-Netzwerkzuständen absähen 种子神经网络量量度状态与 ARW 网络状态 2506.23550v1

Authors (2): Ryui Kaneko, Shimpei Goto

We find an efficient approach to approximately convert matrix product states (MPSs) into restricted Boltzmann machine wave functions consisting of a multinomial hidden unit through a canonical polyadic (CP) decomposition of the MPSs. This method allows us to generate well-behaved initial neural network quantum states for quantum many-body ground-state calculations in polynomial time of the number of variational parameters and systematically shorten the distance between the initial states and the ground states with increasing the rank of the CP decomposition. We demonstrate the efficiency of our method by taking the transverse-field Ising model as an example and discuss possible applications of our method to more general quantum many-body systems in which the ground-state wave functions possess complex nodal structures.

nan


Article 752

Title@2025-06-30 (1): From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data

Title: From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data Von der Ausrichtung zur Weiterentwicklung: Bootstrapping Audio-Language Alignment mit synthetischen Daten 从对齐到推进: 用合成数据推动音频语言对齐 2505.20166v2

Authors (2): Chun-Yi Kuan, Hung-yi Lee

Audio-aware large language models (ALLMs) have recently made great strides in understanding and processing audio inputs. These models are typically adapted from text-based large language models (LLMs) through additional training on audio-related tasks. However, this adaptation process presents two major limitations. First, ALLMs often suffer from catastrophic forgetting, where crucial textual capabilities like instruction-following are lost after training on audio data. In some cases, models may even hallucinate sounds that are not present in the input audio, raising concerns about reliability. Second, achieving cross-modal alignment between audio and language typically relies on large collections of task-specific question-answer pairs for instruction tuning, making it resource-intensive. To address these issues, previous works have leveraged the backbone LLMs to synthesize general-purpose, caption-style alignment data. In this paper, we propose a data generation framework that produces contrastive-like training data, designed to enhance ALLMs’ ability to differentiate between present and absent sounds. We further extend our approach to multi-audio scenarios, enabling the model to either explain differences between audio inputs or produce unified captions that describe all inputs, thereby enhancing audio-language alignment. We refer to the entire ALLM training framework as bootstrapping audio-language alignment via synthetic data generation from backbone LLMs (BALSa). Experimental results indicate that our method effectively mitigates audio hallucinations while reliably maintaining strong performance on audio understanding and reasoning benchmarks, as well as instruction-following skills. Moreover, incorporating multi-audio training further enhances the model’s comprehension and reasoning capabilities. Overall, BALSa offers an efficient and scalable approach to developing ALLMs.

nan


Article 753

Title@2025-06-30 (1): RegionGCN: Spatial-Heterogeneity-Aware Graph Convolutional Networks

Title: RegionGCN: Spatial-Heterogeneity-Aware Graph Convolutional Networks RegionGCN: Geo-Heterogenität-Bewusst Graph Convolutional Networks 区域GCN:空间-异异异性-认知图图集网络 2501.17599v2

Authors (6): Hao Guo, Han Wang, Di Zhu, Lun Wu, A. Stewart Fotheringham, Yu Liu

Modeling spatial heterogeneity in the data generation process is essential for understanding and predicting geographical phenomena. Despite their prevalence in geospatial tasks, neural network models usually assume spatial stationarity, which could limit their performance in the presence of spatial process heterogeneity. By allowing model parameters to vary over space, several approaches have been proposed to incorporate spatial heterogeneity into neural networks. However, current geographically weighting approaches are ineffective on graph neural networks, yielding no significant improvement in prediction accuracy. We assume the crux lies in the over-fitting risk brought by a large number of local parameters. Accordingly, we propose to model spatial process heterogeneity at the regional level rather than at the individual level, which largely reduces the number of spatially varying parameters. We further develop a heuristic optimization procedure to learn the region partition adaptively in the process of model training. Our proposed spatial-heterogeneity-aware graph convolutional network, named RegionGCN, is applied to the spatial prediction of county-level vote share in the 2016 US presidential election based on socioeconomic attributes. Results show that RegionGCN achieves significant improvement over the basic and geographically weighted GCNs. We also offer an exploratory analysis tool for the spatial variation of non-linear relationships through ensemble learning of regional partitions from RegionGCN. Our work contributes to the practice of Geospatial Artificial Intelligence (GeoAI) in tackling spatial heterogeneity.

nan


Article 754

Title@2025-06-30 (1): CooT: Learning to Coordinate In-Context with Coordination Transformers

Title: CooT: Learning to Coordinate In-Context with Coordination Transformers CooT: In-Kontext-Koordinierung mit Koordinationstransformatoren lernen CoT:学会与协调变换者协调内流 2506.23549v1

Authors (5): Huai-Chih Wang, Hsiang-Chun Chuang, Hsi-Chun Cheng, Dai-Jie Wu, Shao-Hua Sun

Effective coordination among artificial agents in dynamic and uncertain environments remains a significant challenge in multi-agent systems. Existing approaches, such as self-play and population-based methods, either generalize poorly to unseen partners or require extensive training. To overcome these limitations, we propose Coordination Transformers (CooT), a novel in-context coordination framework that uses recent interaction histories to adapt to unseen partners rapidly. Unlike previous approaches that primarily aim to increase the diversity of training partners, CooT explicitly focuses on adapting to new partner behaviors by predicting actions aligned with observed partner interactions. Trained on interaction trajectories collected from diverse pairs of agents with complementary behaviors, CooT quickly learns effective coordination strategies without explicit supervision or fine-tuning. Evaluations on the Overcooked benchmark demonstrate that CooT significantly outperforms baseline methods in coordination tasks involving previously unseen partners. Human evaluations further confirm CooT as the most effective collaborative partner, while extensive ablations highlight its robustness, flexibility, and sensitivity to context in multi-agent scenarios.

nan


Article 755

Title@2025-06-30 (1): PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization

Title: PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization PipeOffload: Verbesserung der Skalierbarkeit von Pipeline Parallelismus mit Speicheroptimierung 管道卸载: 提高管道平行式与内存优化的可缩放性 2503.01328v2

Authors (5): Xinyi Wan, Penghui Qi, Guangxing Huang, Min Lin, Jialin Li

Pipeline parallelism (PP) is widely used for training large language models (LLMs), yet its scalability is often constrained by high activation memory consumption as the number of in-flight microbatches grows with the degree of PP. In this paper, we focus on addressing this challenge by leveraging the under-explored memory offload strategy in PP. With empirical study, we discover that in the majority of standard configurations, at least half, and potentially all, of the activations can be offloaded with negligible overhead. In the cases where full overload is not possible, we introduce a novel selective offload strategy that decreases peak activation memory in a better-than-linear manner. Furthermore, we integrate memory offload with other techniques to jointly consider overall throughput and memory limitation. Our experiments proves that the per-device activation memory effectively reduces with the total number of stages, making PP a stronger alternative than TP, offering up to a 19\% acceleration with even lower memory consumption. The implementation is open-sourced at \href{https://github.com/sail-sg/zero-bubble-pipeline-parallelism}{this url}.

nan


Article 756

Title@2025-06-30 (1): Neural Langevin Machine: a local asymmetric learning rule can be creative

Title: Neural Langevin Machine: a local asymmetric learning rule can be creative Neural Langevin Machine: Eine lokale asymmetrische Lernregel kann kreativ sein 神经机:当地不对称学习规则可以是创造性的 2506.23546v1

Authors (3): Zhendong Yu, Weizhong Huang, Haiping Huang

Fixed points of recurrent neural networks can be leveraged to store and generate information. These fixed points can be captured by the Boltzmann-Gibbs measure, which leads to neural Langevin dynamics that can be used for sampling and learning a real dataset. We call this type of generative model neural Langevin machine, which is interpretable due to its analytic form of distribution and is simple to train. Moreover, the learning process is derived as a local asymmetric plasticity rule, bearing biological relevance. Therefore, one can realize a continuous sampling of creative dynamics in a neural network, mimicking an imagination process in brain circuits. This neural Langevin machine may be another promising generative model, at least in its strength in circuit-based sampling and biologically plausible learning rule.

nan


Article 757

Title@2025-06-30 (1): A Joint Topology-Data Fusion Graph Network for Robust Traffic Speed Prediction with Data Anomalism

Title: A Joint Topology-Data Fusion Graph Network for Robust Traffic Speed Prediction with Data Anomalism Ein gemeinsames Topologie-Daten-Fusions-Graph-Netzwerk für robuste Verkehrsgeschwindigkeitsvorhersage mit Datenanomalismus 具有数据Anomalisis的强力交通速度预测联合地形学-数据融合图图网络 2507.00085v1

Authors (6): Ruiyuan Jiang, Dongyao Jia, Eng Gee Lim, Pengfei Fan, Yuli Zhang, Shangbo Wang

Accurate traffic prediction is essential for Intelligent Transportation Systems (ITS), yet current methods struggle with the inherent complexity and non-linearity of traffic dynamics, making it difficult to integrate spatial and temporal characteristics. Furthermore, existing approaches use static techniques to address non-stationary and anomalous historical data, which limits adaptability and undermines data smoothing. To overcome these challenges, we propose the Graph Fusion Enhanced Network (GFEN), an innovative framework for network-level traffic speed prediction. GFEN introduces a novel topological spatiotemporal graph fusion technique that meticulously extracts and merges spatial and temporal correlations from both data distribution and network topology using trainable methods, enabling the modeling of multi-scale spatiotemporal features. Additionally, GFEN employs a hybrid methodology combining a k-th order difference-based mathematical framework with an attention-based deep learning structure to adaptively smooth historical observations and dynamically mitigate data anomalies and non-stationarity. Extensive experiments demonstrate that GFEN surpasses state-of-the-art methods by approximately 6.3% in prediction accuracy and exhibits convergence rates nearly twice as fast as recent hybrid models, confirming its superior performance and potential to significantly enhance traffic prediction system efficiency.

nan


Article 758

Title@2025-06-30 (1): MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation

Title: MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation MoORE: SVD-basierte Modell-MoE-ization für Konflikt- und Vergessenheits-Resistenz-Multi-Task-Anpassung MoORE: 以SVD为基础的冲突与遗忘-恢复-远程多任务适应示范MoE化模式 2506.14436v3

Authors (5): Shen Yuan, Yin Zheng, Taifeng Wang, Binbin Liu, Hongteng Xu

Adapting large-scale foundation models in multi-task scenarios often suffers from task conflict and oblivion. To mitigate such issues, we propose a novel ‘‘model MoE-ization’’ strategy that leads to a conflict- and oblivion-resistant multi-task adaptation method. Given a weight matrix of a pre-trained model, our method applies SVD to it and introduces a learnable router to adjust its singular values based on tasks and samples. Accordingly, the weight matrix becomes a Mixture of Orthogonal Rank-one Experts (MoORE), in which each expert corresponds to the outer product of a left singular vector and the corresponding right one. We can improve the model capacity by imposing a learnable orthogonal transform on the right singular vectors. Unlike low-rank adaptation (LoRA) and its MoE-driven variants, MoORE guarantees the experts’ orthogonality and maintains the column space of the original weight matrix. These two properties make the adapted model resistant to the conflicts among the new tasks and the oblivion of its original tasks, respectively. Experiments on various datasets demonstrate that MoORE outperforms existing multi-task adaptation methods consistently, showing its superiority in terms of conflict- and oblivion-resistance. The code of the experiments is available at https://github.com/DaShenZi721/MoORE.

nan


Article 759

Title@2025-06-30 (1): Uncertainty-aware Diffusion and Reinforcement Learning for Joint Plane Localization and Anomaly Diagnosis in 3D Ultrasound

Title: Uncertainty-aware Diffusion and Reinforcement Learning for Joint Plane Localization and Anomaly Diagnosis in 3D Ultrasound Unsicheres Diffusions- und Verstärkungslernen für Joint Plane Localization und Anomaliediagnose im 3D-Ultraschall 3D 超声波中用于联合规划地方化和异常诊断的不确定-有意识的传播和强化学习 2506.23538v1

Authors (7): Yuhao Huang, Yueyue Xu, Haoran Dou, Jiaxiao Deng, Xin Yang, Hongyu Zheng, Dong Ni

Congenital uterine anomalies (CUAs) can lead to infertility, miscarriage, preterm birth, and an increased risk of pregnancy complications. Compared to traditional 2D ultrasound (US), 3D US can reconstruct the coronal plane, providing a clear visualization of the uterine morphology for assessing CUAs accurately. In this paper, we propose an intelligent system for simultaneous automated plane localization and CUA diagnosis. Our highlights are: 1) we develop a denoising diffusion model with local (plane) and global (volume/text) guidance, using an adaptive weighting strategy to optimize attention allocation to different conditions; 2) we introduce a reinforcement learning-based framework with unsupervised rewards to extract the key slice summary from redundant sequences, fully integrating information across multiple planes to reduce learning difficulty; 3) we provide text-driven uncertainty modeling for coarse prediction, and leverage it to adjust the classification probability for overall performance improvement. Extensive experiments on a large 3D uterine US dataset show the efficacy of our method, in terms of plane localization and CUA diagnosis. Code is available at https://github.com/yuhoo0302/CUA-US.

nan


Article 760

Title@2025-06-30 (1): A general language model for peptide identification

Title: A general language model for peptide identification Ein allgemeines Sprachmodell für die Peptididentifikation 铅化物识别通用语言模式 2502.15610v3

Authors (8): Jixiu Zhai, Tianchi Lu, Haitian Zhong, Ziyang Xu, Yuhuan Liu, Shengrui Xu, Jingwan Wang, Dan Huang

Accurate identification of bioactive peptides (BPs) and protein post-translational modifications (PTMs) is essential for understanding protein function and advancing therapeutic discovery. However, most computational methods remain limited in their generalizability across diverse peptide functions. Here, we present PDeepPP, a unified deep learning framework that integrates pretrained protein language models with a hybrid transformer-convolutional architecture, enabling robust identification across diverse peptide classes and PTM sites. We curated comprehensive benchmark datasets and implemented strategies to address data imbalance, allowing PDeepPP to systematically extract both global and local sequence features. Through extensive analyses-including dimensionality reduction and comparison studies-PDeepPP demonstrates strong, interpretable peptide representations and achieves state-of-the-art performance in 25 of the 33 biological identification tasks. Notably, PDeepPP attains high accuracy in antimicrobial (0.9726) and phosphorylation site (0.9984) identification, with 99.5% specificity in glycosylation site prediction and substantial reduction in false negatives in antimalarial tasks. By enabling large-scale, accurate peptide analysis, PDeepPP supports biomedical research and the discovery of novel therapeutic targets for disease treatment. All code, datasets, and pretrained models are publicly available via GitHub:https://github.com/fondress/PDeepPP and Hugging Face:https://huggingface.co/fondress/PDeppPP.

nan


Article 761

Title@2025-06-30 (1): GViT: Representing Images as Gaussians for Visual Recognition

Title: GViT: Representing Images as Gaussians for Visual Recognition GViT: Darstellung von Bildern als Gaussianer für die visuelle Erkennung GVVT: 将图像代表为视觉识别高斯人 2506.23532v1

Authors (5): Jefferson Hernandez, Ruozhen He, Guha Balakrishnan, Alexander C. Berg, Vicente Ordonez

We introduce GVIT, a classification framework that abandons conventional pixel or patch grid input representations in favor of a compact set of learnable 2D Gaussians. Each image is encoded as a few hundred Gaussians whose positions, scales, orientations, colors, and opacities are optimized jointly with a ViT classifier trained on top of these representations. We reuse the classifier gradients as constructive guidance, steering the Gaussians toward class-salient regions while a differentiable renderer optimizes an image reconstruction loss. We demonstrate that by 2D Gaussian input representations coupled with our GVIT guidance, using a relatively standard ViT architecture, closely matches the performance of a traditional patch-based ViT, reaching a 76.9% top-1 accuracy on Imagenet-1k using a ViT-B architecture.

nan


Article 762

Title@2025-06-30 (1): When Test-Time Adaptation Meets Self-Supervised Models

Title: When Test-Time Adaptation Meets Self-Supervised Models Wenn Test-Time-Anpassung auf selbstüberwachte Modelle trifft 测试时适应与自控模型相匹配 2506.23529v1

Authors (4): Jisu Han, Jihee Park, Dongyoon Han, Wonjun Hwang

Training on test-time data enables deep learning models to adapt to dynamic environmental changes, enhancing their practical applicability. Online adaptation from source to target domains is promising but it remains highly reliant on the performance of source pretrained model. In this paper, we investigate whether test-time adaptation (TTA) methods can continuously improve models trained via self-supervised learning (SSL) without relying on source pretraining. We introduce a self-supervised TTA protocol after observing that existing TTA approaches struggle when directly applied to self-supervised models with low accuracy on the source domain. Furthermore, we propose a collaborative learning framework that integrates SSL and TTA models, leveraging contrastive learning and knowledge distillation for stepwise representation refinement. We validate our method on diverse self-supervised models, including DINO, MoCo, and iBOT, across TTA benchmarks. Extensive experiments validate the effectiveness of our approach in SSL, showing that it achieves competitive performance even without source pretraining.

nan


Article 763

Title@2025-06-30 (1): RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference

Title: RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference RetroInfer: Ein Vector-Storage-Ansatz für skalierbare Long-Context-LLM-Inferenz 变制器:对可缩放的长负负载LLM 推断法采用矢量-压力法 2505.02922v2

Authors (18): Yaoqi Chen, Jinkai Zhang, Baotong Lu, Qianxi Zhang, Chengruidong Zhang, Jingjia Luo, Di Liu, Huiqiang Jiang, Qi Chen, Jing Liu, Bailu Ding, Xiao Yan, Jiawei Jiang, Chen Chen, Mingxing Zhang, Yuqing Yang, Fan Yang, Mao Yang

The growing context lengths of large language models (LLMs) pose significant challenges for efficient inference, primarily due to GPU memory and bandwidth constraints. We present RetroInfer, a novel system that reconceptualizes the key-value (KV) cache as a vector storage system which exploits the inherent attention sparsity to accelerate long-context LLM inference. At its core is the wave index, an Attention-aWare VEctor index that enables efficient and accurate retrieval of critical tokens through techniques such as tripartite attention approximation, accuracy-bounded attention estimation, and segmented clustering. Complementing this is the wave buffer, which coordinates KV cache placement and overlaps computation and data transfer across GPU and CPU to sustain high throughput. Unlike prior sparsity-based methods that struggle with token selection and hardware coordination, RetroInfer delivers robust performance without compromising model accuracy. Experiments on long-context benchmarks show up to 4.5X speedup over full attention within GPU memory limits and up to 10.5X over sparse attention baselines when KV cache is extended to CPU memory, all while preserving full-attention-level accuracy.

nan


Article 764

Title@2025-06-30 (1): Progressive Binarization with Semi-Structured Pruning for LLMs

Title: Progressive Binarization with Semi-Structured Pruning for LLMs Progressive Binarisierung mit halbstrukturiertem Pruning für LLMs LLMM 与半结构冲压半结构 2502.01705v3

Authors (4): Xianglong Yan, Tianao Zhang, Zhiteng Li, Yulun Zhang

Large language models (LLMs) have achieved remarkable progress in natural language processing, but their high computational and memory costs hinder deployment on resource-constrained devices. Binarization, which reduces model weights to 1 bit, is a promising solution for efficient inference. However, binarized LLMs still exhibit redundancy that can be further compressed. Semi-structured pruning offers a favorable trade-off between model performance and hardware efficiency, but naively combining it with binarization often leads to severe performance degradation. To address this, we propose Progressive Binarization with Semi-Structured Pruning (PBS$^2$P), a novel post-training compression framework. We propose Stepwise semi-structured Pruning with Binarization Optimization (SPBO) to jointly reduce pruning and binarization error. Additionally, we develop a Coarse-to-Fine Search (CFS) strategy to more effectively select pruning elements. Extensive experiments across multiple LLM families show that PBS$^2$P consistently outperforms state-of-the-art binary post-training quantization methods in both perplexity and downstream accuracy. The code and models will be available at: https://github.com/XIANGLONGYAN/PBS2P.

nan


Article 765

Title@2025-06-30 (1): Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding

Title: Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding Gumiho: Eine hybride Architektur, um frühe Token in spekulativer Dekodierung zu priorisieren Gumiho:在投机下限中优先考虑早期物料的混合结构 2503.10135v2

Authors (7): Jinze Li, Yixing Xu, Haiduo Huang, Xuanwu Yin, Dong Li, Edith C. H. Ngai, Emad Barsoum

Speculative decoding (SPD) aims to accelerate the auto-regressive token generation process of a target Large Language Model (LLM). Some approaches employ a draft model with multiple heads to predict a sequence of future tokens, where each head handles a token in the sequence. The target LLM verifies the predicted sequence and accepts aligned tokens, enabling efficient multi-token generation. However, existing methods assume that all tokens within a sequence are equally important, employing identical head structures and relying on a single-generation paradigm, either serial or parallel. To this end, we theoretically demonstrate that initial tokens in the draft sequence are more important than later ones. Building on this insight, we propose Gumiho, a hybrid model combining serial and parallel heads. Specifically, given the critical importance of early tokens, we employ a sophisticated Transformer architecture for the early draft heads in a serial configuration to improve accuracy. For later tokens, we utilize multiple lightweight MLP heads operating in parallel to enhance efficiency. By allocating more advanced model structures and longer running times to the early heads, Gumiho achieves improved overall performance. The experimental results demonstrate that our method outperforms existing approaches, fully validating its effectiveness.

nan


Article 766

Title@2025-06-30 (1): FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization

Title: FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization FedWSQ: Effizientes Federated Learning mit Gewichtsstandardisierung und distributionssicherer, nicht-einheitlicher Quantisierung FFWSQ: 节能的联邦学习,重标准化和发行软件非统一量化 2506.23516v1

Authors (5): Seung-Wook Kim, Seongyeol Kim, Jiah Kim, Seowon Ji, Se-Ho Lee

Federated learning (FL) often suffers from performance degradation due to key challenges such as data heterogeneity and communication constraints. To address these limitations, we present a novel FL framework called FedWSQ, which integrates weight standardization (WS) and the proposed distribution-aware non-uniform quantization (DANUQ). WS enhances FL performance by filtering out biased components in local updates during training, thereby improving the robustness of the model against data heterogeneity and unstable client participation. In addition, DANUQ minimizes quantization errors by leveraging the statistical properties of local model updates. As a result, FedWSQ significantly reduces communication overhead while maintaining superior model accuracy. Extensive experiments on FL benchmark datasets demonstrate that FedWSQ consistently outperforms existing FL methods across various challenging FL settings, including extreme data heterogeneity and ultra-low-bit communication scenarios.

nan


Article 767

Title@2025-06-30 (1): The Hitchhiker’s Guide to Efficient, End-to-End, and Tight DP Auditing

Title: The Hitchhiker’s Guide to Efficient, End-to-End, and Tight DP Auditing Der Hitchhiker-Leitfaden für effizientes, Ende-zu-Ende und enges DP-Auditing Hitchhiker的《高效、最终到最终和严格DP审计指南》 2506.16666v2

Authors (5): Meenatchi Sundaram Muthu Selva Annamalai, Borja Balle, Jamie Hayes, Georgios Kaissis, Emiliano De Cristofaro

This paper systematizes research on auditing Differential Privacy (DP) techniques, aiming to identify key insights into the current state of the art and open challenges. First, we introduce a comprehensive framework for reviewing work in the field and establish three cross-contextual desiderata that DP audits should target–namely, efficiency, end-to-end-ness, and tightness. Then, we systematize the modes of operation of state-of-the-art DP auditing techniques, including threat models, attacks, and evaluation functions. This allows us to highlight key details overlooked by prior work, analyze the limiting factors to achieving the three desiderata, and identify open research problems. Overall, our work provides a reusable and systematic methodology geared to assess progress in the field and identify friction points and future directions for our community to focus on.

nan


Article 768

Title@2025-06-30 (1): Achieving binary weight and activation for LLMs using Post-Training Quantization

Title: Achieving binary weight and activation for LLMs using Post-Training Quantization Erreichen des binären Gewichts und Aktivierung für LLMs mit Post-Training Quantization 利用培训后量化办法使LLMMs实现二进制加权和激活 2504.05352v3

Authors (5): Siqing Song, Chuang Wang, Ruiqi Wang, Yi Yang, Xu-Yao Zhang

Quantizing large language models (LLMs) to 1-bit precision significantly reduces computational costs, but existing quantization techniques suffer from noticeable performance degradation when using weight and activation precisions below 4 bits (W4A4). In this paper, we propose a post-training quantization framework with W(1+1)A(1*4) configuration, where weights are quantized to 1 bit with an additional 1 bit for fine-grain grouping and activations are quantized to 1 bit with a 4-fold increase in the number of channels. For weight quantization, we propose utilizing Hessian-aware fine-grained grouping along with an EM-based quantization scheme. For activation quantization, we decompose INT4-quantized activations into a 4 * INT1 format equivalently and simultaneously smooth the scaling factors based on quantization errors, which further reduces the quantization errors in activations. Our method surpasses state-of-the-art (SOTA) LLM quantization baselines on W2A4 across multiple tasks, pushing the boundaries of existing LLM quantization methods toward fully binarized models. Code is available at https://github.com/JimmyCrave/LLM-PTQ-binarization.

nan


Article 769

Title@2025-06-30 (1): Strategic Counterfactual Modeling of Deep-Target Airstrike Systems via Intervention-Aware Spatio-Causal Graph Networks

Title: Strategic Counterfactual Modeling of Deep-Target Airstrike Systems via Intervention-Aware Spatio-Causal Graph Networks Strategische gegenfaktische Modellierung von Deep-Target Airstrike-Systemen über interventionsfähige Spatio-Causal Graph Networks 通过干预-ware Spatio-Causal图表网络对深目标空袭系统进行战略反事实模拟 2507.00083v1

Authors (1): Wei Meng

This study addresses the lack of structured causal modeling between tactical strike behavior and strategic delay in current strategic-level simulations, particularly the structural bottlenecks in capturing intermediate variables within the “resilience - nodal suppression - negotiation window” chain. We propose the Intervention-Aware Spatio-Temporal Graph Neural Network (IA-STGNN), a novel framework that closes the causal loop from tactical input to strategic delay output. The model integrates graph attention mechanisms, counterfactual simulation units, and spatial intervention node reconstruction to enable dynamic simulations of strike configurations and synchronization strategies. Training data are generated from a multi-physics simulation platform (GEANT4 + COMSOL) under NIST SP 800-160 standards, ensuring structural traceability and policy-level validation. Experimental results demonstrate that IA-STGNN significantly outperforms baseline models (ST-GNN, GCN-LSTM, XGBoost), achieving a 12.8 percent reduction in MAE and 18.4 percent increase in Top-5 percent accuracy, while improving causal path consistency and intervention stability. IA-STGNN enables interpretable prediction of strategic delay and supports applications such as nuclear deterrence simulation, diplomatic window assessment, and multi-strategy optimization, providing a structured and transparent AI decision-support mechanism for high-level policy modeling.

nan


Article 770

Title@2025-06-30 (1): Neural Canonical Polyadic Factorization for Traffic Analysis

Title: Neural Canonical Polyadic Factorization for Traffic Analysis Neurale kanonischen Polyadische Faktorisierung für die Verkehrsanalyse 用于交通分析的神经卡纳尼基多功能因子化 2506.15079v2

Authors (2): Yikai Hou, Peng Tang

Modern intelligent transportation systems rely on accurate spatiotemporal traffic analysis to optimize urban mobility and infrastructure resilience. However, pervasive missing data caused by sensor failures and heterogeneous sensing gaps fundamentally hinders reliable traffic modeling. This paper proposes a Neural Canonical Polyadic Factorization (NCPF) model that synergizes low-rank tensor algebra with deep representation learning for robust traffic data imputation. The model innovatively embeds CP decomposition into neural architecture through learnable embedding projections, where sparse traffic tensors are encoded into dense latent factors across road segments, time intervals, and mobility metrics. A hierarchical feature fusion mechanism employs Hadamard products to explicitly model multilinear interactions, while stacked multilayer perceptron layers nonlinearly refine these representations to capture complex spatiotemporal couplings. Extensive evaluations on six urban traffic datasets demonstrate NCPF’s superiority over six state-of-the-art baselines. By unifying CP decomposition’s interpretable factor analysis with neural network’s nonlinear expressive power, NCPF provides a principled yet flexible approaches for high-dimensional traffic data imputation, offering critical support for next-generation transportation digital twins and adaptive traffic control systems.

nan


Article 771

Title@2025-06-30 (1): Efficient Online Reinforcement Learning for Diffusion Policy

Title: Efficient Online Reinforcement Learning for Diffusion Policy Effizientes Online-Verstärkungslernen für die Diffusionspolitik 高效在线强化学习促进传播政策 2502.00361v4

Authors (5): Haitong Ma, Tianyi Chen, Kai Wang, Na Li, Bo Dai

Diffusion policies have achieved superior performance in imitation learning and offline reinforcement learning (RL) due to their rich expressiveness. However, the conventional diffusion training procedure requires samples from target distribution, which is impossible in online RL since we cannot sample from the optimal policy. Backpropagating policy gradient through the diffusion process incurs huge computational costs and instability, thus being expensive and not scalable. To enable efficient training of diffusion policies in online RL, we generalize the conventional denoising score matching by reweighting the loss function. The resulting Reweighted Score Matching (RSM) preserves the optimal solution and low computational cost of denoising score matching, while eliminating the need to sample from the target distribution and allowing learning to optimize value functions. We introduce two tractable reweighted loss functions to solve two commonly used policy optimization problems, policy mirror descent and max-entropy policy, resulting in two practical algorithms named Diffusion Policy Mirror Descent (DPMD) and Soft Diffusion Actor-Critic (SDAC). We conducted comprehensive comparisons on MuJoCo benchmarks. The empirical results show that the proposed algorithms outperform recent diffusion-policy online RLs on most tasks, and the DPMD improves more than 120% over soft actor-critic on Humanoid and Ant.

nan


Article 772

Title@2025-06-30 (1): Understanding and Reducing the Class-Dependent Effects of Data Augmentation with A Two-Player Game Approach

Title: Understanding and Reducing the Class-Dependent Effects of Data Augmentation with A Two-Player Game Approach Verständnis und Reduzierung der klassenabhängigen Effekte von Datenvergrößerung mit einem Zwei-Spieler-Spiel-Ansatz 理解和减少数据递增的二级依赖影响,采用双层游戏方法 2407.03146v5

Authors (3): Yunpeng Jiang, Yutong Ban, Paul Weng

Data augmentation is widely applied and has shown its benefits in different machine learning tasks. However, as recently observed, it may have an unfair effect in multi-class classification. While data augmentation generally improves the overall performance (and therefore is beneficial for many classes), it can actually be detrimental for other classes, which can be problematic in some application domains. In this paper, to counteract this phenomenon, we propose CLAM, a CLAss-dependent Multiplicative-weights method. To derive it, we first formulate the training of a classifier as a non-linear optimization problem that aims at simultaneously maximizing the individual class performances and balancing them. By rewriting this optimization problem as an adversarial two-player game, we propose a novel multiplicative weight algorithm, for which we prove the convergence. Interestingly, our formulation also reveals that the class-dependent effects of data augmentation is not due to data augmentation only, but is in fact a general phenomenon. Our empirical results over six datasets demonstrate that the performance of learned classifiers is indeed more fairly distributed over classes, with only limited impact on the average accuracy.

nan


Article 773

Title@2025-06-30 (1): Deep Unlearn: Benchmarking Machine Unlearning for Image Classification

Title: Deep Unlearn: Benchmarking Machine Unlearning for Image Classification Deep Unlearn: Benchmarking Machine Unlearning für die Bildklassifizierung 深未读:为图像分类确定基准的机器脱学 2410.01276v2

Authors (5): Xavier F. Cadet, Anastasia Borovykh, Mohammad Malekzadeh, Sara Ahmadi-Abhari, Hamed Haddadi

Machine unlearning (MU) aims to remove the influence of particular data points from the learnable parameters of a trained machine learning model. This is a crucial capability in light of data privacy requirements, trustworthiness, and safety in deployed models. MU is particularly challenging for deep neural networks (DNNs), such as convolutional nets or vision transformers, as such DNNs tend to memorize a notable portion of their training dataset. Nevertheless, the community lacks a rigorous and multifaceted study that looks into the success of MU methods for DNNs. In this paper, we investigate 18 state-of-the-art MU methods across various benchmark datasets and models, with each evaluation conducted over 10 different initializations, a comprehensive evaluation involving MU over 100K models. We show that, with the proper hyperparameters, Masked Small Gradients (MSG) and Convolution Transpose (CT), consistently perform better in terms of model accuracy and run-time efficiency across different models, datasets, and initializations, assessed by population-based membership inference attacks (MIA) and per-sample unlearning likelihood ratio attacks (U-LiRA). Furthermore, our benchmark highlights the fact that comparing a MU method only with commonly used baselines, such as Gradient Ascent (GA) or Successive Random Relabeling (SRL), is inadequate, and we need better baselines like Negative Gradient Plus (NG+) with proper hyperparameter selection.

nan


Article 774

Title@2025-06-30 (1): Sample Margin-Aware Recalibration of Temperature Scaling

Title: Sample Margin-Aware Recalibration of Temperature Scaling Probenmargin-Aware Rekalibrierung der Temperaturskalierung 样本 边距天体调整温度缩放 2506.23492v1

Authors (5): Haolan Guo, Linwei Tao, Haoyang Luo, Minjing Dong, Chang Xu

Recent advances in deep learning have significantly improved predictive accuracy. However, modern neural networks remain systematically overconfident, posing risks for deployment in safety-critical scenarios. Current post-hoc calibration methods face a fundamental dilemma: global approaches like Temperature Scaling apply uniform adjustments across all samples, introducing high bias despite computational efficiency, while more expressive methods that operate on full logit distributions suffer from high variance due to noisy high-dimensional inputs and insufficient validation data. To address these challenges, we propose Sample Margin-Aware Recalibration of Temperature (SMART), a lightweight, data-efficient recalibration method that precisely scales logits based on the margin between the top two logits – termed the logit gap. Specifically, the logit gap serves as a denoised, scalar signal directly tied to decision boundary uncertainty, providing a robust indicator that avoids the noise inherent in high-dimensional logit spaces while preserving model prediction invariance. Meanwhile, SMART employs a novel soft-binned Expected Calibration Error (SoftECE) objective that balances model bias and variance through adaptive binning, enabling stable parameter updates even with extremely limited calibration data. Extensive evaluations across diverse datasets and architectures demonstrate that SMART achieves state-of-the-art calibration performance even with substantially fewer parameters compared to existing parametric methods, offering a principled, robust, and highly efficient solution for practical uncertainty quantification in neural network predictions. The source code is available at: https://anonymous.4open.science/r/SMART-8B11.

nan


Article 775

Title@2025-06-30 (1): FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed

Title: FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed FZOO: Schneller Zeroth-Order-Optimierer für feintunende große Sprachmodelle in Richtung Adam-Scale-Geschwindigkeit FZOO: 向亚当规模速度的微调大语言模型最佳最佳优选器 2506.09034v2

Authors (7): Sizhe Dang, Yangyang Guo, Yanjun Zhao, Haishan Ye, Xiaodong Zheng, Guang Dai, Ivor Tsang

Fine-tuning large language models (LLMs) often faces GPU memory bottlenecks: the backward pass of first-order optimizers like Adam increases memory usage to more than 10 times the inference level (e.g., 633 GB for OPT-30B). Zeroth-order (ZO) optimizers avoid this cost by estimating gradients only from forward passes, yet existing methods like MeZO usually require many more steps to converge. Can this trade-off between speed and memory in ZO be fundamentally improved? Normalized-SGD demonstrates strong empirical performance with greater memory efficiency than Adam. In light of this, we introduce FZOO, a Fast Zeroth-Order Optimizer toward Adam-Scale Speed. FZOO reduces the total forward passes needed for convergence by employing batched one-sided estimates that adapt step sizes based on the standard deviation of batch losses. It also accelerates per-batch computation through the use of Rademacher random vector perturbations coupled with CUDA’s parallel processing. Extensive experiments on diverse models, including RoBERTa-large, OPT (350M-66B), Phi-2, and Llama3, across 11 tasks validate FZOO’s effectiveness. On average, FZOO outperforms MeZO by 3 percent in accuracy while requiring 3 times fewer forward passes. For RoBERTa-large, FZOO achieves average improvements of 5.6 percent in accuracy and an 18 times reduction in forward passes compared to MeZO, achieving convergence speeds comparable to Adam. We also provide theoretical analysis proving FZOO’s formal equivalence to a normalized-SGD update rule and its convergence guarantees. FZOO integrates smoothly into PEFT techniques, enabling even larger memory savings. Overall, our results make single-GPU, high-speed, full-parameter fine-tuning practical and point toward future work on memory-efficient pre-training.

nan


Article 776

Title@2025-06-30 (1): Test of partial effects for Frechet regression on Bures-Wasserstein manifolds

Title: Test of partial effects for Frechet regression on Bures-Wasserstein manifolds Prüfung der partiellen Wirkungen auf Frechet-Regression auf Bures-Wasserstein-Muffen 测试布雷斯-沃塞尔斯坦元体Frechet回归部分效果 2506.23487v1

Authors (2): Haoshu Xu, Hongzhe Li

We propose a novel test for assessing partial effects in Frechet regression on Bures Wasserstein manifolds. Our approach employs a sample splitting strategy: the first subsample is used to fit the Frechet regression model, yielding estimates of the covariance matrices and their associated optimal transport maps, while the second subsample is used to construct the test statistic. We prove that this statistic converges in distribution to a weighted mixture of chi squared components, where the weights correspond to the eigenvalues of an integral operator defined by an appropriate RKHS kernel. We establish that our procedure achieves the nominal asymptotic size and demonstrate that its worst-case power converges uniformly to one. Through extensive simulations and a real data application, we illustrate the test’s finite-sample accuracy and practical utility.

nan


Article 777

Title@2025-06-30 (1): Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission

Title: Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission Federated Learning-Enabled Hybrid Language Models für kommunikationseffiziente Token-Übertragung 通信-高效调式传真传播的联邦学习-进进混合语言模式 2507.00082v1

Authors (5): Faranaksadat Solat, Joohyung Lee, Mohamed Seif, Dusit Niyato, H. Vincent Poor

Hybrid Language Models (HLMs) combine the low-latency efficiency of Small Language Models (SLMs) on edge devices with the high accuracy of Large Language Models (LLMs) on centralized servers. Unlike traditional end-to-end LLM inference, HLMs reduce latency and communication by invoking LLMs only when local SLM predictions are uncertain, i.e., when token-level confidence is low or entropy is high. However, ambiguous or low-confidence predictions still require frequent offloading to the LLM, leading to significant communication overhead in bandwidth-constrained settings. To address this, we propose FedHLM, a communication-efficient HLM framework that integrates uncertainty-aware inference with Federated Learning (FL). FedHLM’s key innovation lies in collaboratively learning token-level uncertainty thresholds that govern when LLM assistance is needed. Rather than using static or manually tuned thresholds, FedHLM employs FL to optimize these thresholds in a privacy-preserving, distributed manner. Additionally, it leverages embedding-based token representations for Peer-to-Peer (P2P) resolution, enabling clients to reuse tokens inferred by semantically similar peers without engaging the LLM. We further introduce hierarchical model aggregation: edge servers refine local routing policies through client updates, while cross-cluster coordination aligns global decision boundaries. This layered design captures recurring uncertainty patterns, reducing redundant LLM queries. Experiments on large-scale news classification tasks show that FedHLM reduces LLM transmissions by over 95 percent with negligible accuracy loss, making it well-suited for scalable and efficient edge-AI applications.

nan


Article 778

Title@2025-06-30 (1): Reconciling Attribute and Structural Anomalies for Improved Graph Anomaly Detection

Title: Reconciling Attribute and Structural Anomalies for Improved Graph Anomaly Detection Verknüpfung von Attributen und strukturellen Anomalien für verbesserte Graphenanomalienerkennung 调节特征和结构异常改进图异常探测 2506.23469v1

Authors (7): Chunjing Xiao, Jiahui Lu, Xovee Xu, Fan Zhou, Tianshu Xie, Wei Lu, Lifeng Xu

Graph anomaly detection is critical in domains such as healthcare and economics, where identifying deviations can prevent substantial losses. Existing unsupervised approaches strive to learn a single model capable of detecting both attribute and structural anomalies. However, they confront the tug-of-war problem between two distinct types of anomalies, resulting in suboptimal performance. This work presents TripleAD, a mutual distillation-based triple-channel graph anomaly detection framework. It includes three estimation modules to identify the attribute, structural, and mixed anomalies while mitigating the interference between different types of anomalies. In the first channel, we design a multiscale attribute estimation module to capture extensive node interactions and ameliorate the over-smoothing issue. To better identify structural anomalies, we introduce a link-enhanced structure estimation module in the second channel that facilitates information flow to topologically isolated nodes. The third channel is powered by an attribute-mixed curvature, a new indicator that encapsulates both attribute and structural information for discriminating mixed anomalies. Moreover, a mutual distillation strategy is introduced to encourage communication and collaboration between the three channels. Extensive experiments demonstrate the effectiveness of the proposed TripleAD model against strong baselines.

nan


Article 779

Title@2025-06-30 (1): AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays

Title: AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays AdFair-CLIP: Adversarial Fair Contractive Language-Image Pre-Training für Röntgenaufnahmen im Brustkorb 阿德法-CLIP:胸前X光检查的公平竞争公平语言图象前训练 2506.23467v1

Authors (8): Chenlang Yi, Zizhan Xiong, Qi Qi, Xiyuan Wei, Girish Bathla, Ching-Long Lin, Bobak Jack Mortazavi, Tianbao Yang

Contrastive Language-Image Pre-training (CLIP) models have demonstrated superior performance across various visual tasks including medical image classification. However, fairness concerns, including demographic biases, have received limited attention for CLIP models. This oversight leads to critical issues, particularly those related to race and gender, resulting in disparities in diagnostic outcomes and reduced reliability for underrepresented groups. To address these challenges, we introduce AdFair-CLIP, a novel framework employing adversarial feature intervention to suppress sensitive attributes, thereby mitigating spurious correlations and improving prediction fairness. We conduct comprehensive experiments on chest X-ray (CXR) datasets, and show that AdFair-CLIP significantly enhances both fairness and diagnostic accuracy, while maintaining robust generalization in zero-shot and few-shot scenarios. These results establish new benchmarks for fairness-aware learning in CLIP-based medical diagnostic models, particularly for CXR analysis.

nan


Article 780

Title@2025-06-30 (1): Time to Rethink AI for Combinatorial Optimization: Classical Algorithms Remain Tough to Match

Title: Time to Rethink AI for Combinatorial Optimization: Classical Algorithms Remain Tough to Match Zeit, KI für die Kombinatorische Optimierung neu zu denken: Klassische Algorithmen bleiben schwer zu passen 重新思考 AI 组合优化的时间: 古老的算法仍然难以匹配 2502.03669v2

Authors (3): Yikai Wu, Haoyu Zhao, Sanjeev Arora

This position paper argues that the machine learning community should fundamentally rethink how AI-inspired methods are developed and evaluated for combinatorial optimization (CO). We present comprehensive empirical benchmarks comparing various recent AI-inspired GPU-based methods with several classical CPU-based solvers on the Maximum Independent Set (MIS) problem. Strikingly, even on in-distribution random graphs, leading AI-inspired methods are consistently outperformed by the state-of-the-art classical solver KaMIS, and some AI-inspired methods frequently fail to surpass even the simplest degree-based greedy heuristic. To better understand the source of these failures, we introduce a novel analysis, serialization, which reveals that non-backtracking AI methods, such as LTFT (based on GFlowNets), end up reasoning similarly to the simplest degree-based greedy heuristic, and thus worse than KaMIS. Our findings reveal three core issues: (1) Limited benchmarks and evaluation - AI-inspired methods are often tested only on small instances with very limited inference time, which covers up issues with scalability and resource usage; (2) Intrinsic hardness and learning limits - even under ideal, in-distribution conditions, learning-based approaches lag behind classical heuristics, highlighting inherent barriers that receive little attention; and (3) Insufficient use and understanding of classical heuristics - current learning frameworks often neglect to incorporate effective classical techniques. Although we use MIS as a testbed, similar gaps and challenges have been reported in other combinatorial optimization problems, suggesting broader relevance for our recommendations. We propose that future research must address these issues by rigorous benchmarking, deepening understanding of learning limitations, and integrating classical heuristics into AI-inspired methods.

nan


Article 781

Title@2025-06-30 (1): TabReason: A Reinforcement Learning-Enhanced Reasoning LLM for Explainable Tabular Data Prediction

Title: TabReason: A Reinforcement Learning-Enhanced Reasoning LLM for Explainable Tabular Data Prediction TabReason: Eine verstärkte Lern-verbesserte Begründung LLM für erklärbare tabellarische Datenvorhersage TabReson: 用于可解释的图表数据预测的强化学习-提高合理理由的强化学习-强化LLMLM 2505.21807v3

Authors (6): Tommy Xu, Zhitian Zhang, Xiangyu Sun, Lauren Kelly Zung, Hossein Hajimirsadeghi, Greg Mori

Predictive modeling on tabular data is the cornerstone of many real-world applications. Although gradient boosting machines and some recent deep models achieve strong performance on tabular data, they often lack interpretability. On the other hand, large language models (LLMs) have demonstrated powerful capabilities to generate human-like reasoning and explanations, but remain under-performed for tabular data prediction. In this paper, we propose a new approach that leverages reasoning-based LLMs, trained using reinforcement learning, to perform more accurate and explainable predictions on tabular data. Our method introduces custom reward functions that guide the model not only toward better prediction accuracy but also toward human-understandable reasons for its predictions. The proposed method is evaluated on financial benchmark datasets and compared against established LLMs.

nan


Article 782

Title@2025-06-30 (1): Can We Predict the Unpredictable? Leveraging DisasterNet-LLM for Multimodal Disaster Classification

Title: Can We Predict the Unpredictable? Leveraging DisasterNet-LLM for Multimodal Disaster Classification Können wir das Unvorhersehbare vorhersagen? Leveraging DisasterNet-LLM für die multimodale Katastrophenklassifikation 我们能够预测难以预测的情况吗?利用灾害网络-LLM来进行多模式灾害分类。 2506.23462v1

Authors (4): Manaswi Kulahara, Gautam Siddharth Kashyap, Nipun Joshi, Arpita Soni

Effective disaster management requires timely and accurate insights, yet traditional methods struggle to integrate multimodal data such as images, weather records, and textual reports. To address this, we propose DisasterNet-LLM, a specialized Large Language Model (LLM) designed for comprehensive disaster analysis. By leveraging advanced pretraining, cross-modal attention mechanisms, and adaptive transformers, DisasterNet-LLM excels in disaster classification. Experimental results demonstrate its superiority over state-of-the-art models, achieving higher accuracy of 89.5%, an F1 score of 88.0%, AUC of 0.92%, and BERTScore of 0.88% in multimodal disaster classification tasks.

nan


Article 783

Title@2025-06-30 (1): Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression

Title: Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression Vorteile des vorzeitigen Stoppens im gradienten Abstieg für überparameterisierte logistische Regression 梯层提前停产对过度量化的后勤倒退的益处 2502.13283v2

Authors (4): Jingfeng Wu, Peter Bartlett, Matus Telgarsky, Bin Yu

In overparameterized logistic regression, gradient descent (GD) iterates diverge in norm while converging in direction to the maximum $\ell_2$-margin solution – a phenomenon known as the implicit bias of GD. This work investigates additional regularization effects induced by early stopping in well-specified high-dimensional logistic regression. We first demonstrate that the excess logistic risk vanishes for early-stopped GD but diverges to infinity for GD iterates at convergence. This suggests that early-stopped GD is well-calibrated, whereas asymptotic GD is statistically inconsistent. Second, we show that to attain a small excess zero-one risk, polynomially many samples are sufficient for early-stopped GD, while exponentially many samples are necessary for any interpolating estimator, including asymptotic GD. This separation underscores the statistical benefits of early stopping in the overparameterized regime. Finally, we establish nonasymptotic bounds on the norm and angular differences between early-stopped GD and $\ell_2$-regularized empirical risk minimizer, thereby connecting the implicit regularization of GD with explicit $\ell_2$-regularization.

nan


Article 784

Title@2025-06-30 (1): Sampling and Identity-Testing Without Approximate Tensorization of Entropy

Title: Sampling and Identity-Testing Without Approximate Tensorization of Entropy Probenahme und Identitätstest ohne annähernde Tensorisierung der Entropie 采样和不近近于孔虫代号进行身份测试 2506.23456v1

Authors (4): William Gay, William He, Nicholas Kocurek, Ryan O’Donnell

Certain tasks in high-dimensional statistics become easier when the underlying distribution satisfies a local-to-global property called approximate tensorization of entropy (ATE). For example, the Glauber dynamics Markov chain of an ATE distribution mixes fast and can produce approximate samples in a small amount of time, since such a distribution satisfies a modified log-Sobolev inequality. Moreover, identity-testing for an ATE distribution requires few samples if the tester is given coordinate conditional access to the unknown distribution, as shown by Blanca, Chen, \v{S}tefankovi\v{c}, and Vigoda (COLT 2023). A natural class of distributions that do not satisfy ATE consists of mixtures of (few) distributions that do satisfy ATE. We study the complexity of identity-testing and sampling for these distributions. Our main results are the following: 1. We show fast mixing of Glauber dynamics from a data-based initialization, with optimal sample complexity, for mixtures of distributions satisfying modified log-Sobolev inequalities. This extends work of Huang, Koehler, Lee, Mohanty, Rajaraman, Vuong, and Wu (STOC 2025, COLT 2025) for mixtures of distributions satisfying Poincar'e inequalities. 2. Answering an open question posed by Blanca et al., we give efficient identity-testers for mixtures of ATE distributions in the coordinate-conditional sampling access model. We also give some simplifications and improvements to the original algorithm of Blanca et al.

nan


Article 785

Title@2025-06-30 (1): Minimax Optimal Two-Stage Algorithm For Moment Estimation Under Covariate Shift

Title: Minimax Optimal Two-Stage Algorithm For Moment Estimation Under Covariate Shift Minimax Optimaler Zwei-Stufen-Algorithmus für Moment-Schätzung unter Kovariate Shift 最小最大最佳双层对调 2506.23453v1

Authors (4): Zhen Zhang, Xin Liu, Shaoli Wang, Jiaye Teng

Covariate shift occurs when the distribution of input features differs between the training and testing phases. In covariate shift, estimating an unknown function’s moment is a classical problem that remains under-explored, despite its common occurrence in real-world scenarios. In this paper, we investigate the minimax lower bound of the problem when the source and target distributions are known. To achieve the minimax optimal bound (up to a logarithmic factor), we propose a two-stage algorithm. Specifically, it first trains an optimal estimator for the function under the source distribution, and then uses a likelihood ratio reweighting procedure to calibrate the moment estimator. In practice, the source and target distributions are typically unknown, and estimating the likelihood ratio may be unstable. To solve this problem, we propose a truncated version of the estimator that ensures double robustness and provide the corresponding upper bound. Extensive numerical studies on synthetic examples confirm our theoretical findings and further illustrate the effectiveness of our proposed method.

nan


Article 786

Title@2025-06-30 (1): Bridge: A Unified Framework to Knowledge Graph Completion via Language Models and Knowledge Representation

Title: Bridge: A Unified Framework to Knowledge Graph Completion via Language Models and Knowledge Representation Brücke: Ein einheitliches Framework zur Wissensgraphenvervollständigung über Sprachmodelle und Wissensdarstellung 桥梁:通过语言模式和知识代表性完成知识图的统一框架 2411.06660v3

Authors (5): Qiao Qiao, Yuepei Li, Qing Wang, Kang Zhou, Qi Li

Knowledge graph completion (KGC) is a task of inferring missing triples based on existing Knowledge Graphs (KGs). Both structural and semantic information are vital for successful KGC. However, existing methods only use either the structural knowledge from the KG embeddings or the semantic information from pre-trained language models (PLMs), leading to suboptimal model performance. Moreover, since PLMs are not trained on KGs, directly using PLMs to encode triples may be inappropriate. To overcome these limitations, we propose a novel framework called Bridge, which jointly encodes structural and semantic information of KGs. Specifically, we strategically encode entities and relations separately by PLMs to better utilize the semantic knowledge of PLMs and enable structured representation learning via a structural learning principle. Furthermore, to bridge the gap between KGs and PLMs, we employ a self-supervised representation learning method called BYOL to fine-tune PLMs with two different views of a triple. Unlike BYOL, which uses augmentation methods to create two semantically similar views of the same image, potentially altering the semantic information. We strategically separate the triple into two parts to create different views, thus avoiding semantic alteration. Experiments demonstrate that Bridge outperforms the SOTA models on three benchmark datasets.

nan


Article 787

Title@2025-06-30 (1): Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data

Title: Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data Erlernen ursächlich vorhersehbarer Ergebnisse aus Psychiatrischen Langzeitdaten 精神病纵向数据产生的可预期的学习结果 2506.16629v2

Authors (1): Eric V. Strobl

Causal inference in longitudinal biomedical data remains a central challenge, especially in psychiatry, where symptom heterogeneity and latent confounding frequently undermine classical estimators. Most existing methods for treatment effect estimation presuppose a fixed outcome variable and address confounding through observed covariate adjustment. However, the assumption of unconfoundedness may not hold for a fixed outcome in practice. To address this foundational limitation, we directly optimize the outcome definition to maximize causal identifiability. Our DEBIAS (Durable Effects with Backdoor-Invariant Aggregated Symptoms) algorithm learns non-negative, clinically interpretable weights for outcome aggregation, maximizing durable treatment effects and empirically minimizing both observed and latent confounding by leveraging the time-limited direct effects of prior treatments in psychiatric longitudinal data. The algorithm also furnishes an empirically verifiable test for outcome unconfoundedness. DEBIAS consistently outperforms state-of-the-art methods in recovering causal effects for clinically interpretable composite outcomes across comprehensive experiments in depression and schizophrenia.

nan


Article 788

Title@2025-06-30 (1): Optimizing Conversational Product Recommendation via Reinforcement Learning

Title: Optimizing Conversational Product Recommendation via Reinforcement Learning Optimierung der Conversational Product Recommendation durch Verstärkungslernen 通过强化学习优化对口产品建议 2507.01060v1

Authors (1): Kang Liu

We propose a reinforcement learning-based approach to optimize conversational strategies for product recommendation across diverse industries. As organizations increasingly adopt intelligent agents to support sales and service operations, the effectiveness of a conversation hinges not only on what is recommended but how and when recommendations are delivered. We explore a methodology where agentic systems learn optimal dialogue policies through feedback-driven reinforcement learning. By mining aggregate behavioral patterns and conversion outcomes, our approach enables agents to refine talk tracks that drive higher engagement and product uptake, while adhering to contextual and regulatory constraints. We outline the conceptual framework, highlight key innovations, and discuss the implications for scalable, personalized recommendation in enterprise environments.

nan


Article 789

Title@2025-06-30 (1): NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics

Title: NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics NatureLM-Audio: ein Audio-Sprachen-Stiftungsmodell für Bioakustik NatureLM-Audio: 听觉语言基金会生物生物学模型 2411.07186v2

Authors (9): David Robinson, Marius Miron, Masato Hagiwara, Benno Weck, Sara Keen, Milad Alizadeh, Gagan Narula, Matthieu Geist, Olivier Pietquin

Large language models (LLMs) prompted with text and audio have achieved state-of-the-art performance across various auditory tasks, including speech, music, and general audio, showing emergent abilities on unseen tasks. However, their potential has yet to be fully demonstrated in bioacoustics tasks, such as detecting animal vocalizations in large recordings, classifying rare and endangered species, and labeling context and behavior – tasks that are crucial for conservation, biodiversity monitoring, and animal behavior studies. In this work, we present NatureLM-audio, the first audio-language foundation model specifically designed for bioacoustics. Our training dataset consists of carefully curated text-audio pairs spanning bioacoustics, speech, and music, designed to address the field’s limited availability of annotated data. We demonstrate successful transfer of learned representations from music and speech to bioacoustics, and our model shows promising generalization to unseen taxa and tasks. We evaluate NatureLM-audio on a novel benchmark (BEANS-Zero) and it sets a new state of the art on several bioacoustics tasks, including zero-shot classification of unseen species. To advance bioacoustics research, we release our model weights, benchmark data, and open-source the code for training and benchmark data generation and model training.

nan


Article 790

Title@2025-06-30 (1): Enhancing Insider Threat Detection Using User-Based Sequencing and Transformer Encoders

Title: Enhancing Insider Threat Detection Using User-Based Sequencing and Transformer Encoders Verbesserung der Erkennung von Insiderbedrohungen mit benutzerbasierten Sequenzierungs- und Transformer-Encodern 利用用户定序和变换器计算器加强内部威胁探测 2506.23446v1

Authors (2): Mohamed Elbasheer, Adewale Akinfaderin

Insider threat detection presents unique challenges due to the authorized status of malicious actors and the subtlety of anomalous behaviors. Existing machine learning methods often treat user activity as isolated events, thereby failing to leverage sequential dependencies in user behavior. In this study, we propose a User-Based Sequencing (UBS) methodology, transforming the CERT insider threat dataset into structured temporal sequences suitable for deep sequential modeling. We deploy a Transformer Encoder architecture to model benign user activity and employ its reconstruction errors as anomaly scores. These scores are subsequently evaluated using three unsupervised outlier detection algorithms: One-Class SVM (OCSVM), Local Outlier Factor (LOF), and Isolation Forest (iForest). Across four rigorously designed test sets, including combinations of multiple CERT dataset releases, our UBS-Transformer pipeline consistently achieves state-of-the-art performance - notably 96.61% accuracy, 99.43% recall, 96.38% F1-score, 95.00% AUROC, and exceptionally low false negative (0.0057) and false positive (0.0571) rates. Comparative analyses demonstrate that our approach substantially outperforms tabular and conventional autoencoder baselines, underscoring the efficacy of sequential user modeling and advanced anomaly detection in the insider threat domain.

nan


Article 791

Title@2025-06-30 (1): Graph Fourier Neural ODEs: Modeling Spatial-temporal Multi-scales in Molecular Dynamics

Title: Graph Fourier Neural ODEs: Modeling Spatial-temporal Multi-scales in Molecular Dynamics Graph Fourier Neural ODEs: Modellierung räumlich-zeitlicher Multiskalen in molekularer Dynamik 图 4级神经数极:分子动态中空间-时空多尺度建模 2411.01600v3

Authors (7): Fang Sun, Zijie Huang, Haixin Wang, Huacong Tang, Xiao Luo, Wei Wang, Yizhou Sun

Accurately predicting long-horizon molecular dynamics (MD) trajectories remains a significant challenge, as existing deep learning methods often struggle to retain fidelity over extended simulations. We hypothesize that one key factor limiting accuracy is the difficulty of capturing interactions that span distinct spatial and temporal scales, ranging from high-frequency local vibrations to low-frequency global conformational changes. To address these limitations, we propose Graph Fourier Neural ODEs (GF-NODE), integrating a graph Fourier transform for spatial frequency decomposition with a Neural ODE framework for continuous-time evolution. Specifically, GF-NODE first decomposes molecular configurations into multiple spatial frequency modes using the graph Laplacian, then evolves the frequency components in time via a learnable Neural ODE module that captures both local and global dynamics, and finally reconstructs the updated molecular geometry through an inverse graph Fourier transform. By explicitly modeling high- and low-frequency phenomena in this unified pipeline, GF-NODE captures long-range correlations and local fluctuations more effectively. We provide theoretical insight through heat equation analysis on a simplified diffusion model, demonstrating how graph Laplacian eigenvalues can determine temporal dynamics scales, and crucially validate this correspondence through comprehensive empirical analysis on real molecular dynamics trajectories showing quantitative spatial-temporal correlations across diverse molecular systems. Experimental results on challenging MD benchmarks demonstrate that GF-NODE achieves state-of-the-art accuracy while preserving essential geometrical features over extended simulations. These findings highlight the promise of bridging spectral decomposition with continuous-time modeling to improve the robustness and predictive power of MD simulations.

nan


Article 792

Title@2025-06-30 (1): CauSkelNet: Causal Representation Learning for Human Behaviour Analysis

Title: CauSkelNet: Causal Representation Learning for Human Behaviour Analysis CauSkelNet: Kausales Repräsentationslernen für die menschliche Verhaltensanalyse CauSkelNet: 人类行为分析的因果关系学习 2409.15564v4

Authors (8): Xingrui Gu, Chuyi Jiang, Erte Wang, Qiang Cui, Leimin Tian, Lianlong Wu, Siyang Song, Chuang Yu

Traditional machine learning methods for movement recognition often struggle with limited model interpretability and a lack of insight into human movement dynamics. This study introduces a novel representation learning framework based on causal inference to address these challenges. Our two-stage approach combines the Peter-Clark (PC) algorithm and Kullback-Leibler (KL) divergence to identify and quantify causal relationships between human joints. By capturing joint interactions, the proposed causal Graph Convolutional Network (GCN) produces interpretable and robust representations. Experimental results on the EmoPain dataset demonstrate that the causal GCN outperforms traditional GCNs in accuracy, F1 score, and recall, particularly in detecting protective behaviors. This work contributes to advancing human motion analysis and lays a foundation for adaptive and intelligent healthcare solutions.

nan


Article 793

Title@2025-06-29 (7): DPOT: A DeepParticle method for Computation of Optimal Transport with convergence guarantee

Title: DPOT: A DeepParticle method for Computation of Optimal Transport with convergence guarantee DPOT: Eine DeepPartikel-Methode zur Berechnung des optimalen Verkehrs mit Konvergenzgarantie DPOT: 具有趋同保证的优化运输计算深颗粒法 2506.23429v1

Authors (3): Yingyuan Li, Aokun Wang, Zhongjian Wang

In this work, we propose a novel machine learning approach to compute the optimal transport map between two continuous distributions from their unpaired samples, based on the DeepParticle methods. The proposed method leads to a min-min optimization during training and does not impose any restriction on the network structure. Theoretically we establish a weak convergence guarantee and a quantitative error bound between the learned map and the optimal transport map. Our numerical experiments validate the theoretical results and the effectiveness of the new approach, particularly on real-world tasks.

nan


Article 794

Title@2025-06-29 (7): Detecting What Matters: A Novel Approach for Out-of-Distribution 3D Object Detection in Autonomous Vehicles

Title: Detecting What Matters: A Novel Approach for Out-of-Distribution 3D Object Detection in Autonomous Vehicles Was zählt: Ein neuartiger Ansatz zur Erkennung von 3D-Objekten außerhalb der Verteilung in autonomen Fahrzeugen 发现什么重要事项:在自主车辆中采用新颖的外分发3D物体探测方法 2506.23426v1

Authors (4): Menna Taha, Aya Ahmed, Mohammed Karmoose, Yasser Gadallah

Autonomous vehicles (AVs) use object detection models to recognize their surroundings and make driving decisions accordingly. Conventional object detection approaches classify objects into known classes, which limits the AV’s ability to detect and appropriately respond to Out-of-Distribution (OOD) objects. This problem is a significant safety concern since the AV may fail to detect objects or misclassify them, which can potentially lead to hazardous situations such as accidents. Consequently, we propose a novel object detection approach that shifts the emphasis from conventional class-based classification to object harmfulness determination. Instead of object detection by their specific class, our method identifies them as either ‘harmful’ or ‘harmless’ based on whether they pose a danger to the AV. This is done based on the object position relative to the AV and its trajectory. With this metric, our model can effectively detect previously unseen objects to enable the AV to make safer real-time decisions. Our results demonstrate that the proposed model effectively detects OOD objects, evaluates their harmfulness, and classifies them accordingly, thus enhancing the AV decision-making effectiveness in dynamic environments.

nan


Article 795

Title@2025-06-29 (7): Accurate Parameter-Efficient Test-Time Adaptation for Time Series Forecasting

Title: Accurate Parameter-Efficient Test-Time Adaptation for Time Series Forecasting Genaue Parameter-Effiziente Test-Zeit-Anpassung für Zeitreihenprognosen 用于时间序列预测的精确参数有效试验时间适应 2506.23424v1

Authors (4): Heitor R. Medeiros, Hossein Sharifi-Noghabi, Gabriel L. Oliveira, Saghar Irandoust

Real-world time series often exhibit a non-stationary nature, degrading the performance of pre-trained forecasting models. Test-Time Adaptation (TTA) addresses this by adjusting models during inference, but existing methods typically update the full model, increasing memory and compute costs. We propose PETSA, a parameter-efficient method that adapts forecasters at test time by only updating small calibration modules on the input and output. PETSA uses low-rank adapters and dynamic gating to adjust representations without retraining. To maintain accuracy despite limited adaptation capacity, we introduce a specialized loss combining three components: (1) a robust term, (2) a frequency-domain term to preserve periodicity, and (3) a patch-wise structural term for structural alignment. PETSA improves the adaptability of various forecasting backbones while requiring fewer parameters than baselines. Experimental results on benchmark datasets show that PETSA achieves competitive or better performance across all horizons. Our code is available at: https://github.com/BorealisAI/PETSA

nan


Article 796

Title@2025-06-29 (7): BenchMake: Turn any scientific data set into a reproducible benchmark

Title: BenchMake: Turn any scientific data set into a reproducible benchmark BenchMake: Verwandeln Sie alle wissenschaftlichen Daten in eine reproduzierbare Benchmark BackMake:将任何科学数据集转换成可复制的基准 2506.23419v1

Authors (1): Amanda S Barnard

Benchmark data sets are a cornerstone of machine learning development and applications, ensuring new methods are robust, reliable and competitive. The relative rarity of benchmark sets in computational science, due to the uniqueness of the problems and the pace of change in the associated domains, makes evaluating new innovations difficult for computational scientists. In this paper a new tool is developed and tested to potentially turn any of the increasing numbers of scientific data sets made openly available into a benchmark accessible to the community. BenchMake uses non-negative matrix factorisation to deterministically identify and isolate challenging edge cases on the convex hull (the smallest convex set that contains all existing data instances) and partitions a required fraction of matched data instances into a testing set that maximises divergence and statistical significance, across tabular, graph, image, signal and textual modalities. BenchMake splits are compared to establish splits and random splits using ten publicly available benchmark sets from different areas of science, with different sizes, shapes, distributions.

nan


Article 797

Title@2025-06-29 (7): LNUCB-TA: Linear-nonlinear Hybrid Bandit Learning with Temporal Attention

Title: LNUCB-TA: Linear-nonlinear Hybrid Bandit Learning with Temporal Attention LNUCB-TA: Linear-nichtlinearer Hybrid Bandit Lernen mit zeitlicher Aufmerksamkeit LNCB-TA:线-非线-非线-非线性混合强盗在时间上注意学习 2503.00387v2

Authors (5): Hamed Khosravi, Mohammad Reza Shafie, Ahmed Shoyeb Raihan, Srinjoy Das, Imtiaz Ahmed

Existing contextual multi-armed bandit (MAB) algorithms fail to effectively capture both long-term trends and local patterns across all arms, leading to suboptimal performance in environments with rapidly changing reward structures. They also rely on static exploration rates, which do not dynamically adjust to changing conditions. To overcome these limitations, we propose LNUCB-TA, a hybrid bandit model integrating a novel nonlinear component (adaptive k-Nearest Neighbors (k-NN)) for reducing time complexity, alongside a global-and-local attention-based exploration mechanism. Our approach uniquely combines linear and nonlinear estimation techniques, with the nonlinear module dynamically adjusting k based on reward variance to enhance spatiotemporal pattern recognition. This reduces the likelihood of selecting suboptimal arms while improving reward estimation accuracy and computational efficiency. The attention-based mechanism ranks arms by past performance and selection frequency, dynamically adjusting exploration and exploitation in real time without requiring manual tuning of exploration rates. By integrating global attention (assessing all arms collectively) and local attention (focusing on individual arms), LNUCB-TA efficiently adapts to temporal and spatial complexities. Empirical results show LNUCB-TA significantly outperforms state-of-the-art linear, nonlinear, and hybrid bandits in cumulative and mean reward, convergence, and robustness across different exploration rates. Theoretical analysis further confirms its reliability with a sub-linear regret bound.

nan


Article 798

Title@2025-06-29 (7): Remove Symmetries to Control Model Expressivity and Improve Optimization

Title: Remove Symmetries to Control Model Expressivity and Improve Optimization Entfernen Sie Symmetrien, um die Expressivität des Modells zu steuern und die Optimierung zu verbessern 删除对称以控制模型表达性并改进优化 2408.15495v4

Authors (3): Liu Ziyin, Yizhou Xu, Isaac Chuang

When symmetry is present in the loss function, the model is likely to be trapped in a low-capacity state that is sometimes known as a “collapse”. Being trapped in these low-capacity states can be a major obstacle to training across many scenarios where deep learning technology is applied. We first prove two concrete mechanisms through which symmetries lead to reduced capacities and ignored features during training and inference. We then propose a simple and theoretically justified algorithm, syre, to remove almost all symmetry-induced low-capacity states in neural networks. When this type of entrapment is especially a concern, removing symmetries with the proposed method is shown to correlate well with improved optimization or performance. A remarkable merit of the proposed method is that it is model-agnostic and does not require any knowledge of the symmetry.

nan


Article 799

Title@2025-06-29 (7): Datasets for Fairness in Language Models: An In-Depth Survey

Title: Datasets for Fairness in Language Models: An In-Depth Survey Datensätze für Fairness in Sprachmodellen: Eine In-Depth-Umfrage 语言模型公平性数据集:内部调查 2506.23411v1

Authors (5): Jiale Zhang, Zichong Wang, Avash Palikhe, Zhipeng Yin, Wenbin Zhang

Fairness benchmarks play a central role in shaping how we evaluate language models, yet surprisingly little attention has been given to examining the datasets that these benchmarks rely on. This survey addresses that gap by presenting a broad and careful review of the most widely used fairness datasets in current language model research, characterizing them along several key dimensions including their origin, scope, content, and intended use to help researchers better appreciate the assumptions and limitations embedded in these resources. To support more meaningful comparisons and analyses, we introduce a unified evaluation framework that reveals consistent patterns of demographic disparities across datasets and scoring methods. Applying this framework to twenty four common benchmarks, we highlight the often overlooked biases that can influence conclusions about model fairness and offer practical guidance for selecting, combining, and interpreting these datasets. We also point to opportunities for creating new fairness benchmarks that reflect more diverse social contexts and encourage more thoughtful use of these tools going forward. All code, data, and detailed results are publicly available at https://github.com/vanbanTruong/Fairness-in-Large-Language-Models/tree/main/datasets to promote transparency and reproducibility across the research community.

nan


Article 800

Title@2025-06-29 (7): No, of course I can! Refusal Mechanisms Can Be Exploited Using Harmless Fine-Tuning Data

Title: No, of course I can! Refusal Mechanisms Can Be Exploited Using Harmless Fine-Tuning Data Nein, natürlich kann ich! Weigerungsmechanismen können mit harmlosen Feintuning-Daten ausgebeutet werden 不,当然可以! 拒绝机制可以被利用无伤害的微调数据被利用 2502.19537v4

Authors (8): Joshua Kazdan, Abhay Puri, Rylan Schaeffer, Lisa Yu, Chris Cundy, Jason Stanley, Sanmi Koyejo, Krishnamurthy Dvijotham

Leading language model (LM) providers like OpenAI and Anthropic allow customers to fine-tune frontier LMs for specific use cases. To prevent abuse, these providers apply filters to block fine-tuning on overtly harmful data. In this setting, we make three contributions: First, while past work has shown that safety alignment is “shallow”, we correspondingly demonstrate that existing fine-tuning attacks are shallow – attacks target only the first several tokens of the model response, and consequently can be blocked by generating the first several response tokens with an aligned model. Second, we conceptually illustrate how to make attacks deeper by introducing a new fine-tuning attack that trains models to first refuse harmful requests before answering them; this “refuse-then-comply” strategy bypasses shallow defenses and produces harmful responses that evade output filters. Third, we demonstrate the potency of our new fine-tuning attack by jailbreaking both open-source models equipped with defenses and production models, achieving attack success rates of 57% and 72% against GPT-4o and Claude Haiku, respectively. Our attack received a $2000 bug bounty from OpenAI and was acknowledged as a vulnerability by Anthropic. Our work undermines the notion that models are safe because they initially refuse harmful requests and broadens awareness of the scope of attacks that face production fine-tuning APIs.

nan


Article 801

Title@2025-06-29 (7): Do LLMs Dream of Discrete Algorithms?

Title: Do LLMs Dream of Discrete Algorithms? Träumen LLMs von diskreten Algorithmen? LLMs 梦到分辨算法吗? 2506.23408v1

Authors (3): Claudionor Coelho Jr, Yanen Li, Philip Tee

Large Language Models (LLMs) have rapidly transformed the landscape of artificial intelligence, enabling natural language interfaces and dynamic orchestration of software components. However, their reliance on probabilistic inference limits their effectiveness in domains requiring strict logical reasoning, discrete decision-making, and robust interpretability. This paper investigates these limitations and proposes a neurosymbolic approach that augments LLMs with logic-based reasoning modules, particularly leveraging Prolog predicates and composable toolsets. By integrating first-order logic and explicit rule systems, our framework enables LLMs to decompose complex queries into verifiable sub-tasks, orchestrate reliable solutions, and mitigate common failure modes such as hallucination and incorrect step decomposition. We demonstrate the practical benefits of this hybrid architecture through experiments on the DABStep benchmark, showing improved precision, coverage, and system documentation in multi-step reasoning tasks. Our results indicate that combining LLMs with modular logic reasoning restores engineering rigor, enhances system reliability, and offers a scalable path toward trustworthy, interpretable AI agents across complex domains.

nan


Article 802

Title@2025-06-29 (7): Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models

Title: Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models Equivariance Everywhere All Once: Ein Rezept für Graph Foundation Modelle 偶而实现所有地方的均匀差异:图表基础模型的食谱 2506.14291v2

Authors (4): Ben Finkelshtein, İsmail İlkan Ceylan, Michael Bronstein, Ron Levie

Graph machine learning architectures are typically tailored to specific tasks on specific datasets, which hinders their broader applicability. This has led to a new quest in graph machine learning: how to build graph foundation models capable of generalizing across arbitrary graphs and features? In this work, we present a recipe for designing graph foundation models for node-level tasks from first principles. The key ingredient underpinning our study is a systematic investigation of the symmetries that a graph foundation model must respect. In a nutshell, we argue that label permutation-equivariance alongside feature permutation-invariance are necessary in addition to the common node permutation-equivariance on each local neighborhood of the graph. To this end, we first characterize the space of linear transformations that are equivariant to permutations of nodes and labels, and invariant to permutations of features. We then prove that the resulting network is a universal approximator on multisets that respect the aforementioned symmetries. Our recipe uses such layers on the multiset of features induced by the local neighborhood of the graph to obtain a class of graph foundation models for node property prediction. We validate our approach through extensive experiments on 29 real-world node classification datasets, demonstrating both strong zero-shot empirical performance and consistent improvement as the number of training graphs increases.

nan


Article 803

Title@2025-06-29 (7): AICO: Feature Significance Tests for Supervised Learning

Title: AICO: Feature Significance Tests for Supervised Learning AICO: Feature Significance Tests für überwachtes Lernen AICO: 监督学习的特异意义测试 2506.23396v1

Authors (3): Kay Giesecke, Enguerrand Horel, Chartsiri Jirachotkulthorn

The opacity of many supervised learning algorithms remains a key challenge, hindering scientific discovery and limiting broader deployment – particularly in high-stakes domains. This paper develops model- and distribution-agnostic significance tests to assess the influence of input features in any regression or classification algorithm. Our method evaluates a feature’s incremental contribution to model performance by masking its values across samples. Under the null hypothesis, the distribution of performance differences across a test set has a non-positive median. We construct a uniformly most powerful, randomized sign test for this median, yielding exact p-values for assessing feature significance and confidence intervals with exact coverage for estimating population-level feature importance. The approach requires minimal assumptions, avoids model retraining or auxiliary models, and remains computationally efficient even for large-scale, high-dimensional settings. Experiments on synthetic tasks validate its statistical and computational advantages, and applications to real-world data illustrate its practical utility.

nan


Article 804

Title@2025-06-29 (7): Satisficing Regret Minimization in Bandits: Constant Rate and Light-Tailed Distribution

Title: Satisficing Regret Minimization in Bandits: Constant Rate and Light-Tailed Distribution Zufriedene Reue Minimierung bei Banditen: Konstante Rate und lichtgebremste Verteilung 山贼中令人满意的遗憾最小化:常数率和轻故障分布 2406.06802v3

Authors (3): Qing Feng, Tianyi Ma, Ruihao Zhu

Motivated by the concept of satisficing in decision-making, we consider the problem of satisficing regret minimization in bandit optimization. In this setting, the learner aims at selecting satisficing arms (arms with mean reward exceeding a certain threshold value) as frequently as possible. The performance is measured by satisficing regret, which is the cumulative deficit of the chosen arm’s mean reward compared to the threshold. We propose SELECT, a general algorithmic template for Satisficing REgret Minimization via SampLing and LowEr Confidence bound Testing, that attains constant expected satisficing regret for a wide variety of bandit optimization problems in the realizable case (i.e., a satisficing arm exists). As a complement, SELECT also enjoys the same (standard) regret guarantee as the oracle in the non-realizable case. To further ensure stability of the algorithm, we introduce SELECT-LITE that achieves a light-tailed satisficing regret distribution plus a constant expected satisficing regret in the realizable case and a sub-linear expected (standard) regret in the non-realizable case. Notably, SELECT-LITE can operate on learning oracles with heavy-tailed (standard) regret distribution. More importantly, our results reveal the surprising compatibility between constant expected satisficing regret and light-tailed satisficing regret distribution, which is in sharp contrast to the case of (standard) regret. Finally, we conduct numerical experiments to validate the performance of SELECT and SELECT-LITE on both synthetic datasets and a real-world dynamic pricing case study.

nan


Article 805

Title@2025-06-29 (7): Enhancing Adversarial Robustness through Multi-Objective Representation Learning

Title: Enhancing Adversarial Robustness through Multi-Objective Representation Learning Verbesserung der adversarialen Robustheit durch multi-objektives Repräsentationslernen 通过多目标代表制学习加强反向强力 2410.01697v4

Authors (2): Sedjro Salomon Hotegni, Sebastian Peitz

Deep neural networks (DNNs) are vulnerable to small adversarial perturbations, which are tiny changes to the input data that appear insignificant but cause the model to produce drastically different outputs. Many defense methods require modifying model architectures during evaluation or performing test-time data purification. This not only introduces additional complexity but is often architecture-dependent. We show, however, that robust feature learning during training can significantly enhance DNN robustness. We propose MOREL, a multi-objective approach that aligns natural and adversarial features using cosine similarity and multi-positive contrastive losses to encourage similar features for same-class inputs. Extensive experiments demonstrate that MOREL significantly improves robustness against both white-box and black-box attacks. Our code is available at https://github.com/salomonhotegni/MOREL

nan


Article 806

Title: Uncertain Boundaries: Multidisciplinary Approaches to Copyright Issues in Generative AI Unbestimmte Grenzen: multidisziplinäre Ansätze zu Urheberrechtsfragen in generativer KI 不确定性边界:创新的对版权问题的多学科方法 2404.08221v2

Authors (5): Archer Amon, Zhipeng Yin, Zichong Wang, Avash Palikhe, Wenbin Zhang

Generative AI is becoming increasingly prevalent in creative fields, sparking urgent debates over how current copyright laws can keep pace with technological innovation. Recent controversies of AI models generating near-replicas of copyrighted material highlight the need to adapt current legal frameworks and develop technical methods to mitigate copyright infringement risks. This task requires understanding the intersection between computational concepts such as large-scale data scraping and probabilistic content generation, legal definitions of originality and fair use, and economic impacts on IP rights holders. However, most existing research on copyright in AI takes a purely computer science or law-based approach, leaving a gap in coordinating these approaches that only multidisciplinary efforts can effectively address. To bridge this gap, our survey adopts a comprehensive approach synthesizing insights from law, policy, economics, and computer science. It begins by discussing the foundational goals and considerations that should be applied to copyright in generative AI, followed by methods for detecting and assessing potential violations in AI system outputs. Next, it explores various regulatory options influenced by legal, policy, and economic frameworks to manage and mitigate copyright concerns associated with generative AI and reconcile the interests of IP rights holders with that of generative AI producers. The discussion then introduces techniques to safeguard individual creative works from unauthorized replication, such as watermarking and cryptographic protections. Finally, it describes advanced training strategies designed to prevent AI models from reproducing protected content. In doing so, we highlight key opportunities for action and offer actionable strategies that creators, developers, and policymakers can use in navigating the evolving copyright landscape.

nan


Article 807

Title@2025-06-29 (7): Two-dimensional Taxonomy for N-ary Knowledge Representation Learning Methods

Title: Two-dimensional Taxonomy for N-ary Knowledge Representation Learning Methods Zweidimensionale Taxonomie für N-are Wissensdarstellung Lernmethoden Nary知识代表制学习方法二维分类学 2506.05626v2

Authors (3): Xiaohua Lu, Liubov Tupikina, Mehwish Alam

Real-world knowledge can take various forms, including structured, semi-structured, and unstructured data. Among these, knowledge graphs are a form of structured human knowledge that integrate heterogeneous data sources into structured representations but typically reduce complex n-ary relations to simple triples, thereby losing higher-order relational details. In contrast, hypergraphs naturally represent n-ary relations with hyperedges, which directly connect multiple entities together. Yet hypergraph representation learning often overlooks entity roles in hyperedges, limiting the finegrained semantic modelling. To address these issues, knowledge hypergraphs and hyper-relational knowledge graphs combine the advantages of knowledge graphs and hypergraphs to better capture the complex structures and role-specific semantics of real world knowledge. This survey provides a comprehensive review of methods handling n-ary relational data, covering both knowledge hypergraphs and hyper-relational knowledge graphs literatures. We propose a two-dimensional taxonomy: the first dimension categorises models based on their methodology, i.e., translation-based models, tensor factorisation-based models, deep neural network-based models, logic rules-based models, and hyperedge expansion-based models. The second dimension classifies models according to their awareness of entity roles and positions in n-ary relations, dividing them into aware-less, position-aware, and role-aware approaches. Finally, we discuss existing datasets, training settings and strategies, and outline open challenges to inspire future research.

nan


Article 808

Title@2025-06-29 (7): SIEDD: Shared-Implicit Encoder with Discrete Decoders

Title: SIEDD: Shared-Implicit Encoder with Discrete Decoders SIEDD: Gemeinsam-impliziter Encoder mit diskreten Decodern SIEDD: 含有分解代碼器的共享- 集成编码器 2506.23382v1

Authors (4): Vikram Rangarajan, Shishira Maiya, Max Ehrlich, Abhinav Shrivastava

Implicit Neural Representations (INRs) offer exceptional fidelity for video compression by learning per-video optimized functions, but their adoption is crippled by impractically slow encoding times. Existing attempts to accelerate INR encoding often sacrifice reconstruction quality or crucial coordinate-level control essential for adaptive streaming and transcoding. We introduce SIEDD (Shared-Implicit Encoder with Discrete Decoders), a novel architecture that fundamentally accelerates INR encoding without these compromises. SIEDD first rapidly trains a shared, coordinate-based encoder on sparse anchor frames to efficiently capture global, low-frequency video features. This encoder is then frozen, enabling massively parallel training of lightweight, discrete decoders for individual frame groups, further expedited by aggressive coordinate-space sampling. This synergistic design delivers a remarkable 20-30X encoding speed-up over state-of-the-art INR codecs on HD and 4K benchmarks, while maintaining competitive reconstruction quality and compression ratios. Critically, SIEDD retains full coordinate-based control, enabling continuous resolution decoding and eliminating costly transcoding. Our approach significantly advances the practicality of high-fidelity neural video compression, demonstrating a scalable and efficient path towards real-world deployment. Our codebase is available at https://github.com/VikramRangarajan/SIEDD .

nan


Article 809

Title@2025-06-29 (7): Data Filtering for Genetic Perturbation Prediction

Title: Data Filtering for Genetic Perturbation Prediction Datenfilterung für genetische Störungsvorhersage 遗传扰动预测数据过滤 2503.14571v3

Authors (5): George Panagopoulos, Johannes F. Lutzeyer, Sofiane Ennadir, Michalis Vazirgiannis, Jun Pang

Genomic studies, including CRISPR-based PerturbSeq analyses, face a vast hypothesis space, while gene perturbations remain costly and time-consuming. Gene expression models based on graph neural networks are trained to predict the outcomes of gene perturbations to facilitate such experiments. Active learning methods are often employed to train these models due to the cost of the genomic experiments required to build the training set. However, poor model initialization in active learning can result in suboptimal early selections, wasting time and valuable resources. While typical active learning mitigates this issue over many iterations, the limited number of experimental cycles in genomic studies exacerbates the risk. To this end, we propose graph-based data filtering as an alternative. Unlike active learning, data filtering selects the gene perturbations before training, meaning it is free of bias due to random initialization and initial random selection. Moreover, reducing the iterations between the wet lab and the model provides several operational advantages resulting in significant acceleration. The proposed methods are motivated by theoretical studies of graph neural network generalization. The criteria are defined over the input graph and are optimized with submodular maximization. We compare them empirically to baselines and active learning methods that are state-of-the-art. The results demonstrate that graph-based data filtering achieves comparable accuracy while alleviating the aforementioned risks.

nan


Article 810

Title@2025-06-29 (7): Adversarial Robustness Unhardening via Backdoor Attacks in Federated Learning

Title: Adversarial Robustness Unhardening via Backdoor Attacks in Federated Learning Widerwärtige Robustheit Unhärtend durch Hintertürangriffe im Federated Learning 在联邦学习中通过后门攻击而不硬化的逆向强力 2310.11594v3

Authors (5): Taejin Kim, Jiarui Li, Shubhranshu Singh, Nikhil Madaan, Carlee Joe-Wong

The delicate equilibrium between user privacy and the ability to unleash the potential of distributed data is an important concern. Federated learning, which enables the training of collaborative models without sharing of data, has emerged as a privacy-centric solution. This approach brings forth security challenges, notably poisoning and backdoor attacks where malicious entities inject corrupted data into the training process, as well as evasion attacks that aim to induce misclassifications at test time. Our research investigates the intersection of adversarial training, a common defense method against evasion attacks, and backdoor attacks within federated learning. We introduce Adversarial Robustness Unhardening (ARU), which is employed by a subset of adversarial clients to intentionally undermine model robustness during federated training, rendering models susceptible to a broader range of evasion attacks. We present extensive experiments evaluating ARU’s impact on adversarial training and existing robust aggregation defenses against poisoning and backdoor attacks. Our results show that ARU can substantially undermine adversarial training’s ability to harden models against test-time evasion attacks, and that adversaries employing ARU can even evade robust aggregation defenses that often neutralize poisoning or backdoor attacks.

nan


Article 811

Title@2025-06-29 (7): When Additive Noise Meets Unobserved Mediators: Bivariate Denoising Diffusion for Causal Discovery

Title: When Additive Noise Meets Unobserved Mediators: Bivariate Denoising Diffusion for Causal Discovery Wenn additive Geräusche unbeobachtete Mediatoren treffen: Bivariate dämpfende Diffusion für die kausale Entdeckung 当“添加噪音”遇到无旁观调解员时:双重拒绝为发现原因而传播 2506.23374v1

Authors (4): Dominik Meier, Sujai Hiremath, Promit Ghosal, Kyra Gan

Distinguishing cause and effect from bivariate observational data is a foundational problem in many disciplines, but challenging without additional assumptions. Additive noise models (ANMs) are widely used to enable sample-efficient bivariate causal discovery. However, conventional ANM-based methods fail when unobserved mediators corrupt the causal relationship between variables. This paper makes three key contributions: first, we rigorously characterize why standard ANM approaches break down in the presence of unmeasured mediators. Second, we demonstrate that prior solutions for hidden mediation are brittle in finite sample settings, limiting their practical utility. To address these gaps, we propose Bivariate Denoising Diffusion (BiDD) for causal discovery, a method designed to handle latent noise introduced by unmeasured mediators. Unlike prior methods that infer directionality through mean squared error loss comparisons, our approach introduces a novel independence test statistic: during the noising and denoising processes for each variable, we condition on the other variable as input and evaluate the independence of the predicted noise relative to this input. We prove asymptotic consistency of BiDD under the ANM, and conjecture that it performs well under hidden mediation. Experiments on synthetic and real-world data demonstrate consistent performance, outperforming existing methods in mediator-corrupted settings while maintaining strong performance in mediator-free settings.

nan


Article 812

Title@2025-06-29 (7): Emotional RAG LLMs: Reading Comprehension for the Open Internet

Title: Emotional RAG LLMs: Reading Comprehension for the Open Internet Emotionale RAG LLMs: Leseverständnis für das offene Internet 情感性RAG LLM: 阅读开放互联网理解 2408.11189v2

Authors (5): Benjamin Reichman, Adar Avsian, Kartik Talamadupula, Toshish Jawale, Larry Heck

Queries to large language models (LLMs) can be divided into two parts: the instruction/question and the accompanying context. The context for retrieval-augmented generation (RAG) systems in most benchmarks comes from Wikipedia-like texts written in a neutral and factual tone. However, real-world RAG applications often retrieve internet-based text with diverse tones and linguistic styles, posing challenges for downstream tasks. This paper introduces (a) a dataset that transforms RAG-retrieved passages into emotionally inflected and sarcastic text, (b) an emotion translation model for adapting text to different tones, and (c) a prompt-based method to improve LLMs’ pragmatic interpretation of retrieved text.

nan


Article 813

Title@2025-06-29 (7): Investigating an Overfitting and Degeneration Phenomenon in Self-Supervised Multi-Pitch Estimation

Title: Investigating an Overfitting and Degeneration Phenomenon in Self-Supervised Multi-Pitch Estimation Untersuchung eines Über- und Degenerationsphänomens in selbstüberwachter Multi-Pitch-Schätzung 调查在自我监督的多方案估算中一个超称和破旧现象 2506.23371v1

Authors (2): Frank Cwitkowitz, Zhiyao Duan

Multi-Pitch Estimation (MPE) continues to be a sought after capability of Music Information Retrieval (MIR) systems, and is critical for many applications and downstream tasks involving pitch, including music transcription. However, existing methods are largely based on supervised learning, and there are significant challenges in collecting annotated data for the task. Recently, self-supervised techniques exploiting intrinsic properties of pitch and harmonic signals have shown promise for both monophonic and polyphonic pitch estimation, but these still remain inferior to supervised methods. In this work, we extend the classic supervised MPE paradigm by incorporating several self-supervised objectives based on pitch-invariant and pitch-equivariant properties. This joint training results in a substantial improvement under closed training conditions, which naturally suggests that applying the same objectives to a broader collection of data will yield further improvements. However, in doing so we uncover a phenomenon whereby our model simultaneously overfits to the supervised data while degenerating on data used for self-supervision only. We demonstrate and investigate this and offer our insights on the underlying problem.

nan


Article 814

Title@2025-06-29 (7): Federated Timeline Synthesis: Scalable and Private Methodology For Model Training and Deployment

Title: Federated Timeline Synthesis: Scalable and Private Methodology For Model Training and Deployment Federated Timeline Synthesis: Skalierbare und private Methodik für Modellausbildung und -einsatz 联邦时间线综合分析:示范培训和部署可缩放和私用方法 2506.23358v1

Authors (6): Pawel Renc, Michal K. Grzeszczyk, Linglong Qian, Nassim Oufattole, Jeff Rasley, Arkadiusz Sitek

We present Federated Timeline Synthesis (FTS), a novel framework for training generative foundation models across distributed timeseries data applied to electronic health records (EHR). At its core, FTS represents patient history as tokenized Patient Health Timelines (PHTs), language-agnostic sequences encoding temporal, categorical, and continuous clinical information. Each institution trains an autoregressive transformer on its local PHTs and transmits only model weights to a central server. The server uses the generators to synthesize a large corpus of trajectories and train a Global Generator (GG), enabling zero-shot inference via Monte Carlo simulation of future PHTs. We evaluate FTS on five clinically meaningful prediction tasks using MIMIC-IV data, showing that models trained on synthetic data generated by GG perform comparably to those trained on real data. FTS offers strong privacy guarantees, scalability across institutions, and extensibility to diverse prediction and simulation tasks especially in healthcare, including counterfactual inference, early warning detection, and synthetic trial design.

nan


Article 815

Title@2025-06-29 (7): Rethinking Algorithmic Fairness for Human-AI Collaboration

Title: Rethinking Algorithmic Fairness for Human-AI Collaboration Algorithmische Fairness für die Zusammenarbeit zwischen Mensch und AI neu denken 重新思考人类-AI合作的算法公平性 2310.03647v3

Authors (3): Haosen Ge, Hamsa Bastani, Osbert Bastani

Existing approaches to algorithmic fairness aim to ensure equitable outcomes if human decision-makers comply perfectly with algorithmic decisions. However, perfect compliance with the algorithm is rarely a reality or even a desirable outcome in human-AI collaboration. Yet, recent studies have shown that selective compliance with fair algorithms can amplify discrimination relative to the prior human policy. As a consequence, ensuring equitable outcomes requires fundamentally different algorithmic design principles that ensure robustness to the decision-maker’s (a priori unknown) compliance pattern. We define the notion of compliance-robustly fair algorithmic recommendations that are guaranteed to (weakly) improve fairness in decisions, regardless of the human’s compliance pattern. We propose a simple optimization strategy to identify the best performance-improving compliance-robustly fair policy. However, we show that it may be infeasible to design algorithmic recommendations that are simultaneously fair in isolation, compliance-robustly fair, and more accurate than the human policy; thus, if our goal is to improve the equity and accuracy of human-AI collaboration, it may not be desirable to enforce traditional algorithmic fairness constraints. We illustrate the value of our approach on criminal sentencing data before and after the introduction of an algorithmic risk assessment tool in Virginia.

nan


Article 816

Title@2025-06-29 (7): A case for data valuation transparency via DValCards

Title: A case for data valuation transparency via DValCards Ein Fall für Datenbewertungstransparenz über DValCards 通过 DValCards 提高数据估价透明度的一个案例 2506.23349v1

Authors (2): Keziah Naggita, Julienne LaChance

Following the rise in popularity of data-centric machine learning (ML), various data valuation methods have been proposed to quantify the contribution of each datapoint to desired ML model performance metrics (e.g., accuracy). Beyond the technical applications of data valuation methods (e.g., data cleaning, data acquisition, etc.), it has been suggested that within the context of data markets, data buyers might utilize such methods to fairly compensate data owners. Here we demonstrate that data valuation metrics are inherently biased and unstable under simple algorithmic design choices, resulting in both technical and ethical implications. By analyzing 9 tabular classification datasets and 6 data valuation methods, we illustrate how (1) common and inexpensive data pre-processing techniques can drastically alter estimated data values; (2) subsampling via data valuation metrics may increase class imbalance; and (3) data valuation metrics may undervalue underrepresented group data. Consequently, we argue in favor of increased transparency associated with data valuation in-the-wild and introduce the novel Data Valuation Cards (DValCards) framework towards this aim. The proliferation of DValCards will reduce misuse of data valuation metrics, including in data pricing, and build trust in responsible ML systems.

nan


Article 817

Title@2025-06-29 (7): Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles

Title: Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles Verbesserung der Robustheit und Zuverlässigkeit in der medizinischen Bildklassifizierung mit latentgeführter Diffusion und verschachtelten Baugruppen 提高医疗图像分类的强力和可靠性,包括前期引导传播和内装成品 2310.15952v5

Authors (4): Xing Shen, Hengguan Huang, Brennan Nichyporuk, Tal Arbel

Once deployed, medical image analysis methods are often faced with unexpected image corruptions and noise perturbations. These unknown covariate shifts present significant challenges to deep learning based methods trained on “clean” images. This often results in unreliable predictions and poorly calibrated confidence, hence hindering clinical applicability. While recent methods have been developed to address specific issues such as confidence calibration or adversarial robustness, no single framework effectively tackles all these challenges simultaneously. To bridge this gap, we propose LaDiNE, a novel ensemble learning method combining the robustness of Vision Transformers with diffusion-based generative models for improved reliability in medical image classification. Specifically, transformer encoder blocks are used as hierarchical feature extractors that learn invariant features from images for each ensemble member, resulting in features that are robust to input perturbations. In addition, diffusion models are used as flexible density estimators to estimate member densities conditioned on the invariant features, leading to improved modeling of complex data distributions while retaining properly calibrated confidence. Extensive experiments on tuberculosis chest X-rays and melanoma skin cancer datasets demonstrate that LaDiNE achieves superior performance compared to a wide range of state-of-the-art methods by simultaneously improving prediction accuracy and confidence calibration under unseen noise, adversarial perturbations, and resolution degradation.

nan


Article 818

Title@2025-06-29 (7): Composing Parts for Expressive Object Generation

Title: Composing Parts for Expressive Object Generation Komponieren von Teilen für die Expressive Objektgenerierung 表达式对象生成的组合部件 2406.10197v2

Authors (5): Harsh Rangwani, Aishwarya Agarwal, Kuldeep Kulkarni, R. Venkatesh Babu, Srikrishna Karanam

Image composition and generation are processes where the artists need control over various parts of the generated images. However, the current state-of-the-art generation models, like Stable Diffusion, cannot handle fine-grained part-level attributes in the text prompts. Specifically, when additional attribute details are added to the base text prompt, these text-to-image models either generate an image vastly different from the image generated from the base prompt or ignore the attribute details. To mitigate these issues, we introduce PartComposer, a training-free method that enables image generation based on fine-grained part-level attributes specified for objects in the base text prompt. This allows more control for artists and enables novel object compositions by combining distinctive object parts. PartComposer first localizes object parts by denoising the object region from a specific diffusion process. This enables each part token to be localized to the right region. After obtaining part masks, we run a localized diffusion process in each part region based on fine-grained part attributes and combine them to produce the final image. All stages of PartComposer are based on repurposing a pre-trained diffusion model, which enables it to generalize across domains. We demonstrate the effectiveness of part-level control provided by PartComposer through qualitative visual examples and quantitative comparisons with contemporary baselines.

nan


Article 819

Title@2025-06-29 (7): Data-Driven Self-Supervised Learning for the Discovery of Solution Singularity for Partial Differential Equations

Title: Data-Driven Self-Supervised Learning for the Discovery of Solution Singularity for Partial Differential Equations Datengestütztes Selbstüberwachtes Lernen für die Entdeckung von Lösungssingularität für partielle Differentialgleichungen 数据驱动自监测学习,以发现部分差异等分解的解析元件 2506.23344v1

Authors (2): Difeng Cai, Paulina Sepúlveda

The appearance of singularities in the function of interest constitutes a fundamental challenge in scientific computing. It can significantly undermine the effectiveness of numerical schemes for function approximation, numerical integration, and the solution of partial differential equations (PDEs), etc. The problem becomes more sophisticated if the location of the singularity is unknown, which is often encountered in solving PDEs. Detecting the singularity is therefore critical for developing efficient adaptive methods to reduce computational costs in various applications. In this paper, we consider singularity detection in a purely data-driven setting. Namely, the input only contains given data, such as the vertex set from a mesh. To overcome the limitation of the raw unlabeled data, we propose a self-supervised learning (SSL) framework for estimating the location of the singularity. A key component is a filtering procedure as the pretext task in SSL, where two filtering methods are presented, based on $k$ nearest neighbors and kernel density estimation, respectively. We provide numerical examples to illustrate the potential pathological or inaccurate results due to the use of raw data without filtering. Various experiments are presented to demonstrate the ability of the proposed approach to deal with input perturbation, label corruption, and different kinds of singularities such interior circle, boundary layer, concentric semicircles, etc.

nan


Article 820

Title@2025-06-29 (7): Semantic-Aware Adaptive Video Streaming Using Latent Diffusion Models for Wireless Networks

Title: Semantic-Aware Adaptive Video Streaming Using Latent Diffusion Models for Wireless Networks Semantic-Aware Adaptive Video-Streaming mit latenten Diffusions-Modellen für drahtlose Netzwerke 使用无线网络的冷传传播模型进行语义- 智能可调控视频流 2502.05695v2

Authors (5): Zijiang Yan, Jianhua Pei, Hongda Wu, Hina Tabassum, Ping Wang

This paper proposes a novel Semantic Communication (SemCom) framework for real-time adaptive-bitrate video streaming by integrating Latent Diffusion Models (LDMs) within the FFmpeg techniques. This solution addresses the challenges of high bandwidth usage, storage inefficiencies, and quality of experience (QoE) degradation associated with traditional Constant Bitrate Streaming (CBS) and Adaptive Bitrate Streaming (ABS). The proposed approach leverages LDMs to compress I-frames into a latent space, offering significant storage and semantic transmission savings without sacrificing high visual quality. While retaining B-frames and P-frames as adjustment metadata to support efficient refinement of video reconstruction at the user side, the proposed framework further incorporates state-of-the-art denoising and Video Frame Interpolation (VFI) techniques. These techniques mitigate semantic ambiguity and restore temporal coherence between frames, even in noisy wireless communication environments. Experimental results demonstrate the proposed method achieves high-quality video streaming with optimized bandwidth usage, outperforming state-of-the-art solutions in terms of QoE and resource efficiency. This work opens new possibilities for scalable real-time video streaming in 5G and future post-5G networks.

nan


Article 821

Title@2025-06-29 (7): Scalable Non-Equivariant 3D Molecule Generation via Rotational Alignment

Title: Scalable Non-Equivariant 3D Molecule Generation via Rotational Alignment Skalierbare nicht-equivariante 3D-Molekül-Erzeugung über Rotational Alignment 通过旋转对齐可缩放的非等式 3D 分子生成 2506.10186v2

Authors (2): Yuhui Ding, Thomas Hofmann

Equivariant diffusion models have achieved impressive performance in 3D molecule generation. These models incorporate Euclidean symmetries of 3D molecules by utilizing an SE(3)-equivariant denoising network. However, specialized equivariant architectures limit the scalability and efficiency of diffusion models. In this paper, we propose an approach that relaxes such equivariance constraints. Specifically, our approach learns a sample-dependent SO(3) transformation for each molecule to construct an aligned latent space. A non-equivariant diffusion model is then trained over the aligned representations. Experimental results demonstrate that our approach performs significantly better than previously reported non-equivariant models. It yields sample quality comparable to state-of-the-art equivariant diffusion models and offers improved training and sampling efficiency. Our code is available at https://github.com/skeletondyh/RADM

nan


Article 822

Title@2025-06-29 (7): VALID-Mol: a Systematic Framework for Validated LLM-Assisted Molecular Design

Title: VALID-Mol: a Systematic Framework for Validated LLM-Assisted Molecular Design VALID-Mol: Systematisches Framework für validiertes LLM-Assisted Molecular Design VALID-Mol:验证LLM辅助分子设计系统框架 2506.23339v1

Authors (2): Malikussaid, Hilal Hudan Nuha

Large Language Models (LLMs) demonstrate remarkable potential for scientific discovery, but their application in domains requiring factual accuracy and domain-specific constraints remains challenging. In molecular design for drug discovery, LLMs can suggest creative molecular modifications but often produce chemically invalid or impractical structures. We present VALID-Mol, a systematic framework for integrating chemical validation with LLM-driven molecular design that increases the rate of generating valid chemical structures from 3% to 83%. Our approach combines methodical prompt engineering, automated chemical validation, and a fine-tuned domain-adapted LLM to ensure reliable generation of synthesizable molecules with improved properties. Beyond the specific implementation, we contribute a generalizable methodology for scientifically-constrained LLM applications, with quantifiable reliability improvements. Computational predictions suggest our framework can generate promising candidates for synthesis with up to 17-fold computationally predicted improvements in target affinity while maintaining synthetic accessibility. We provide a detailed analysis of our prompt engineering process, validation architecture, and fine-tuning approach, offering a reproducible blueprint for applying LLMs to other scientific domains where domain-specific validation is essential.

nan


Article 823

Title@2025-06-29 (7): An Investigation into Seasonal Variations in Energy Forecasting for Student Residences

Title: An Investigation into Seasonal Variations in Energy Forecasting for Student Residences Untersuchung von saisonalen Variationen in der Energieprognose für Studentenwohnheime 对学生住宅能源预测季节性变化的调查 2501.07423v2

Authors (5): Muhammad Umair Danish, Mathumitha Sureshkumar, Tehara Fonseka, Umeshika Uthayakumar, Vinura Galwaduge

This research provides an in-depth evaluation of various machine learning models for energy forecasting, focusing on the unique challenges of seasonal variations in student residential settings. The study assesses the performance of baseline models, such as LSTM and GRU, alongside state-of-the-art forecasting methods, including Autoregressive Feedforward Neural Networks, Transformers, and hybrid approaches. Special attention is given to predicting energy consumption amidst challenges like seasonal patterns, vacations, meteorological changes, and irregular human activities that cause sudden fluctuations in usage. The findings reveal that no single model consistently outperforms others across all seasons, emphasizing the need for season-specific model selection or tailored designs. Notably, the proposed Hyper Network based LSTM and MiniAutoEncXGBoost models exhibit strong adaptability to seasonal variations, effectively capturing abrupt changes in energy consumption during summer months. This study advances the energy forecasting field by emphasizing the critical role of seasonal dynamics and model-specific behavior in achieving accurate predictions.

nan


Article 824

Title@2025-06-29 (7): A Library for Learning Neural Operators

Title: A Library for Learning Neural Operators Eine Bibliothek zum Lernen neuraler Operatoren 学习神经操作员图书馆 2412.10354v4

Authors (11): Jean Kossaifi, Nikola Kovachki, Zongyi Li, David Pitt, Miguel Liu-Schiaffini, Valentin Duruisseaux, Robert Joseph George, Boris Bonev, Kamyar Azizzadenesheli, Julius Berner, Anima Anandkumar

We present NeuralOperator, an open-source Python library for operator learning. Neural operators generalize neural networks to maps between function spaces instead of finite-dimensional Euclidean spaces. They can be trained and inferenced on input and output functions given at various discretizations, satisfying a discretization convergence properties. Built on top of PyTorch, NeuralOperator provides all the tools for training and deploying neural operator models, as well as developing new ones, in a high-quality, tested, open-source package. It combines cutting-edge models and customizability with a gentle learning curve and simple user interface for newcomers.

nan


Article 825

Title@2025-06-29 (7): Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark

Title: Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark Parametereffizientes Feintuning für vortrainierte Visionsmodelle: Eine Umfrage und ein Benchmark 培训前愿景模型:调查和基准 2402.02242v5

Authors (12): Yi Xin, Jianjiang Yang, Siqi Luo, Yuntao Du, Qi Qin, Kangrui Cen, Yangfan He, Bin Fu, Xiaokang Yang, Guangtao Zhai, Ming-Hsuan Yang, Xiaohong Liu

Pre-trained vision models (PVMs) have demonstrated remarkable adaptability across a wide range of downstream vision tasks, showcasing exceptional performance. However, as these models scale to billions or even trillions of parameters, conventional full fine-tuning has become increasingly impractical due to its high computational and storage demands. To address these challenges, parameter-efficient fine-tuning (PEFT) has emerged as a promising alternative, aiming to achieve performance comparable to full fine-tuning while making minimal adjustments to the model parameters. This paper presents a comprehensive survey of the latest advancements in the visual PEFT field, systematically reviewing current methodologies and categorizing them into four primary categories: addition-based, partial-based, unified-based, and multi-task tuning. In addition, this paper offers an in-depth analysis of widely used visual datasets and real-world applications where PEFT methods have been successfully applied. Furthermore, this paper introduces the V-PEFT Bench, a unified benchmark designed to standardize the evaluation of PEFT methods across a diverse set of vision tasks, ensuring consistency and fairness in comparison. Finally, the paper outlines potential directions for future research to propel advances in the PEFT field. A comprehensive collection of resources is available at https://github.com/synbol/Awesome-Parameter-Efficient-Transfer-Learning.

nan


Article 826

Title@2025-06-29 (7): Scaling Laws for Black box Adversarial Attacks

Title: Scaling Laws for Black box Adversarial Attacks Skalierungsgesetze für Black Box-Adversarialangriffe 黑盒对面袭击法律推广 2411.16782v3

Authors (5): Chuan Liu, Huanran Chen, Yichi Zhang, Yinpeng Dong, Jun Zhu

Adversarial examples usually exhibit good cross-model transferability, enabling attacks on black-box models with limited information about their architectures and parameters, which are highly threatening in commercial black-box scenarios. Model ensembling is an effective strategy to improve the transferability of adversarial examples by attacking multiple surrogate models. However, since prior studies usually adopt few models in the ensemble, there remains an open question of whether scaling the number of models can further improve black-box attacks. Inspired by the scaling law of large foundation models, we investigate the scaling laws of black-box adversarial attacks in this work. Through theoretical analysis and empirical evaluations, we conclude with clear scaling laws that using more surrogate models enhances adversarial transferability. Comprehensive experiments verify the claims on standard image classifiers, diverse defended models and multimodal large language models using various adversarial attack methods. Specifically, by scaling law, we achieve 90%+ transfer attack success rate on even proprietary models like GPT-4o. Further visualization indicates that there is also a scaling law on the interpretability and semantics of adversarial perturbations.

nan


Article 827

Title@2025-06-29 (7): Perturbation Analysis of Singular Values in Concatenated Matrices

Title: Perturbation Analysis of Singular Values in Concatenated Matrices Störungsanalyse von Singularwerten in konkatenierten Matrizen 封闭母体中单子值的扰动分析 2505.01427v2

Authors (1): Maksym Shamrai

Concatenating matrices is a common technique for uncovering shared structures in data through singular value decomposition (SVD) and low-rank approximations. The fundamental question arises: How does the singular value spectrum of the concatenated matrix relate to the spectra of its individual components? In the present work, we develop a perturbation technique that extends classical results such as Weyl’s inequality to concatenated matrices. We setup analytical bounds that quantify stability of singular values under small perturbations in submatrices. The results demonstrate that if submatrices are close in a norm, dominant singular values of the concatenated matrix remain stable enabling controlled trade-offs between accuracy and compression. These provide a theoretical basis for improved matrix clustering and compression strategies with applications in the numerical linear algebra, signal processing, and data-driven modeling.

nan


Article 828

Title@2025-06-29 (7): Learning to Rank with Variable Result Presentation Lengths

Title: Learning to Rank with Variable Result Presentation Lengths Ranken lernen mit variablem Ergebnis Präsentationslängen 学习以变量结果列报长度排序 2506.23319v1

Authors (2): Norman Knyazev, Harrie Oosterhuis

Learning to Rank (LTR) methods generally assume that each document in a top-K ranking is presented in an equal format. However, previous work has shown that users’ perceptions of relevance can be changed by varying presentations, i.e., allocating more vertical space to some documents to provide additional textual or image information. Furthermore, presentation length can also redirect attention, as users are more likely to notice longer presentations when scrolling through results. Deciding on the document presentation lengths in a fixed vertical space ranking is an important problem that has not been addressed by existing LTR methods. We address this gap by introducing the variable presentation length ranking task, where simultaneously the ordering of documents and their presentation length is decided. Despite being a generalization of standard ranking, we show that this setting brings significant new challenges: Firstly, the probability ranking principle no longer applies to this setting, and secondly, the problem cannot be divided into separate ordering and length selection tasks. We therefore propose VLPL - a new family of Plackett-Luce list-wise gradient estimation methods for the joint optimization of document ordering and lengths. Our semi-synthetic experiments show that VLPL can effectively balance the expected exposure and attractiveness of all documents, achieving the best performance across different ranking settings. Furthermore, we observe that even simple length-aware methods can achieve significant performance improvements over fixed-length models. Altogether, our theoretical and empirical results highlight the importance and difficulties of combining document presentation with LTR.

nan


Article 829

Title@2025-06-29 (7): Ensemble BERT for Medication Event Classification on Electronic Health Records (EHRs)

Title: Ensemble BERT for Medication Event Classification on Electronic Health Records (EHRs) Ensemble BERT für Medikationsveranstaltungsklassifikation auf elektronischen Gesundheitsakten (EHRs) 电子健康记录(EHRs)药品事件分类集合BERT 2506.23315v1

Authors (3): Shouvon Sarker, Xishuang Dong, Lijun Qian

Identification of key variables such as medications, diseases, relations from health records and clinical notes has a wide range of applications in the clinical domain. n2c2 2022 provided shared tasks on challenges in natural language processing for clinical data analytics on electronic health records (EHR), where it built a comprehensive annotated clinical data Contextualized Medication Event Dataset (CMED). This study focuses on subtask 2 in Track 1 of this challenge that is to detect and classify medication events from clinical notes through building a novel BERT-based ensemble model. It started with pretraining BERT models on different types of big data such as Wikipedia and MIMIC. Afterwards, these pretrained BERT models were fine-tuned on CMED training data. These fine-tuned BERT models were employed to accomplish medication event classification on CMED testing data with multiple predictions. These multiple predictions generated by these fine-tuned BERT models were integrated to build final prediction with voting strategies. Experimental results demonstrated that BERT-based ensemble models can effectively improve strict Micro-F score by about 5% and strict Macro-F score by about 6%, respectively.

nan


Article 830

Title@2025-06-29 (7): Physics informed guided diffusion for accelerated multi-parametric MRI reconstruction

Title: Physics informed guided diffusion for accelerated multi-parametric MRI reconstruction Physik informiert geführte Diffusion für beschleunigte multiparametrische MRT-Rekonstruktion 为加速多参数MRI的重建提供物理信息 2506.23311v1

Authors (5): Perla Mayo, Carolin M. Pirkl, Alin Achim, Bjoern Menze, Mohammad Golbabaee

We introduce MRF-DiPh, a novel physics informed denoising diffusion approach for multiparametric tissue mapping from highly accelerated, transient-state quantitative MRI acquisitions like Magnetic Resonance Fingerprinting (MRF). Our method is derived from a proximal splitting formulation, incorporating a pretrained denoising diffusion model as an effective image prior to regularize the MRF inverse problem. Further, during reconstruction it simultaneously enforces two key physical constraints: (1) k-space measurement consistency and (2) adherence to the Bloch response model. Numerical experiments on in-vivo brain scans data show that MRF-DiPh outperforms deep learning and compressed sensing MRF baselines, providing more accurate parameter maps while better preserving measurement fidelity and physical model consistency-critical for solving reliably inverse problems in medical imaging.

nan


Article 831

Title@2025-06-29 (7): MedLeak: Multimodal Medical Data Leakage in Secure Federated Learning with Crafted Models

Title: MedLeak: Multimodal Medical Data Leakage in Secure Federated Learning with Crafted Models MedLeak: Multimodale medizinische Daten-Leakage im sicheren Federated Learning mit handwerklichen Modellen Medleak:安全联邦学习中多式医疗数据漏漏 2407.09972v2

Authors (8): Shanghao Shi, Md Shahedul Haque, Abhijeet Parida, Chaoyu Zhang, Marius George Linguraru, Y. Thomas Hou, Syed Muhammad Anwar, Wenjing Lou

Federated learning (FL) allows participants to collaboratively train machine learning models while keeping their data local, making it ideal for collaborations among healthcare institutions on sensitive data. However, in this paper, we propose a novel privacy attack called MedLeak, which allows a malicious FL server to recover high-quality site-specific private medical data from the client model updates. MedLeak works by introducing an adversarially crafted model during the FL training process. Honest clients, unaware of the insidious changes in the published models, continue to send back their updates as per the standard FL protocol. Leveraging a novel analytical method, MedLeak can efficiently recover private client data from the aggregated parameter updates, eliminating costly optimization. In addition, the scheme relies solely on the aggregated updates, thus rendering secure aggregation protocols ineffective, as they depend on the randomization of intermediate results for security while leaving the final aggregated results unaltered. We implement MedLeak on medical image datasets (MedMNIST, COVIDx CXR-4, and Kaggle Brain Tumor MRI), as well as a medical text dataset (MedAbstract). The results demonstrate that our attack achieves high recovery rates and strong quantitative scores on both image and text datasets. We also thoroughly evaluate MedLeak across different attack parameters, providing insights into key factors that influence attack performance and potential defenses. Furthermore, we demonstrate that the recovered data can support downstream tasks such as disease classification with minimal performance loss. Our findings validate the need for enhanced privacy measures in FL systems, particularly for safeguarding sensitive medical data against powerful model inversion attacks.

nan


Article 832

Title@2025-06-29 (7): Overcoming Dimensional Factorization Limits in Discrete Diffusion Models through Quantum Joint Distribution Learning

Title: Overcoming Dimensional Factorization Limits in Discrete Diffusion Models through Quantum Joint Distribution Learning Überwindung von Dimensional Factorization Limits in diskreten Diffusionsmodellen durch Quantum Joint Distribution Learning 通过量子联合分发学习克服分辨传播模式中的分量限制 2505.05151v3

Authors (6): Chuangtao Chen, Qinglin Zhao, MengChu Zhou, Dusit Niyato, Zhimin He, Haozhen Situ

Discrete diffusion models represent a significant advance in generative modeling, demonstrating remarkable success in synthesizing complex, high-quality discrete data. However, to avoid exponential computational costs, they typically rely on calculating per-dimension transition probabilities when learning high-dimensional distributions. In this study, we rigorously prove that this approach leads to a worst-case linear scaling of Kullback-Leibler (KL) divergence with data dimension. To address this, we propose a Quantum Discrete Denoising Diffusion Probabilistic Model (QD3PM), which enables joint probability learning through diffusion and denoising in exponentially large Hilbert spaces, offering a theoretical pathway to faithfully capture the true joint distribution. By deriving posterior states through quantum Bayes’ theorem, similar to the crucial role of posterior probabilities in classical diffusion models, and by learning the joint probability, we establish a solid theoretical foundation for quantum-enhanced diffusion models. For denoising, we design a quantum circuit that utilizes temporal information for parameter sharing and incorporates learnable classical-data-controlled rotations for encoding. Exploiting joint distribution learning, our approach enables single-step sampling from pure noise, eliminating iterative requirements of existing models. Simulations demonstrate the proposed model’s superior accuracy in modeling complex distributions compared to factorization methods. Hence, this paper establishes a new theoretical paradigm in generative models by leveraging the quantum advantage in joint distribution learning.

nan


Article 833

Title@2025-06-29 (7): Objective-Free Local Learning and Emergent Language Structure in Thinking Machines

Title: Objective-Free Local Learning and Emergent Language Structure in Thinking Machines Zielfreies lokales Lernen und neue Sprachstrukturen in denkenden Maschinen 考虑机器中无目标的地方学习和新兴语言结构 2506.23293v1

Authors (1): P. Myles Eugenio

We present a neuro-symbolic framework for generative language modeling based on local, event-driven emergent learning. At its core is a hierarchical Hopfield memory chain acting as a compositional short-term memory and dynamic tokenizer (retokenizer). Rather than relying on predefined tokens or supervision, the model builds structure from scratch, learning symbol sequences as multi-scale representations. It constructs projection tensors that bind co-occurring features into hierarchical tokens, introducing redundancy (i.e an emergent gauge structure) and enabling compression of local activations into long-range dependencies. Curiously, we find that the retokenizer can filter natural language patterns from noise, generating synthetic languages with coherent internal morphology – quantifiably the same as human language. Language is learned in a local (Hebbian) fashion, where model constraints dictate allowed emergent structure, and new information is retained in alignment with this structure. The absence of a global objective enables a form of plasticity not found in conventional language models, allowing the system to generalize beyond its initial inference class – even without explicit data. We demonstrate that briefly activating a new neuron during inference binds distributed multi-scale token features into a symbolic embedding. These emergent embedding neurons act as long-term memory and support a key-value mechanism for compositional inference and generalization. This architecture provides a methodological foundation for studying how symbolic structure can emerge from local neural learning. It offers a new pathway for building scalable, interpretable neuro-symbolic systems – where tokens, grammar, and reasoning arise as compressed memory traces within a Hopfield hierarchy. This approach advances the development of neuromorphic architectures for generative language models.

nan


Article 834

Title@2025-06-29 (7): Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models

Title: Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models Beispiel dann Identifizieren: Ein allgemeiner Rahmen für die Risikokontrolle und Bewertung in multimodalen großen Sprachmodellen 确定:多式大语言模式风险管理和评估总框架 2410.08174v3

Authors (6): Qingni Wang, Tiantian Geng, Zhiyuan Wang, Teng Wang, Bo Fu, Feng Zheng

Multimodal Large Language Models (MLLMs) exhibit promising advancements across various tasks, yet they still encounter significant trustworthiness issues. Prior studies apply Split Conformal Prediction (SCP) in language modeling to construct prediction sets with statistical guarantees. However, these methods typically rely on internal model logits or are restricted to multiple-choice settings, which hampers their generalizability and adaptability in dynamic, open-ended environments. In this paper, we introduce TRON, a two-step framework for risk control and assessment, applicable to any MLLM that supports sampling in both open-ended and closed-ended scenarios. TRON comprises two main components: (1) a novel conformal score to sample response sets of minimum size, and (2) a nonconformity score to identify high-quality responses based on self-consistency theory, controlling the error rates by two specific risk levels. Furthermore, we investigate semantic redundancy in prediction sets within open-ended contexts for the first time, leading to a promising evaluation metric for MLLMs based on average set size. Our comprehensive experiments across four Video Question-Answering (VideoQA) datasets utilizing eight MLLMs show that TRON achieves desired error rates bounded by two user-specified risk levels. Additionally, deduplicated prediction sets maintain adaptiveness while being more efficient and stable for risk assessment under different risk levels.

nan


Article 835

Title@2025-06-29 (7): Hierarchical Quantized Diffusion Based Tree Generation Method for Hierarchical Representation and Lineage Analysis

Title: Hierarchical Quantized Diffusion Based Tree Generation Method for Hierarchical Representation and Lineage Analysis Hierarchische Quantisierte Diffusionsbasierte Baumerstellungsmethode für die hierarchische Darstellung und Linienanalyse 用于等级代表性和直线分析的等级定量传播植树生成方法 2506.23287v1

Authors (7): Zelin Zang, WenZhe Li, Fei Chen, Yongjie Xu, Chang Yu, Zhen Lei, Stan Z. Li

In single-cell research, tracing and analyzing high-throughput single-cell differentiation trajectories is crucial for understanding complex biological processes. Key to this is the modeling and generation of hierarchical data that represents the intrinsic structure within datasets. Traditional methods face limitations in terms of computational cost, performance, generative capacity, and stability. Recent VAEs based approaches have made strides in addressing these challenges but still require specialized network modules for each tree branch, limiting their stability and ability to capture deep hierarchical relationships. To overcome these challenges, we introduce diffusion-based approach called HDTree. HDTree captures tree relationships within a hierarchical latent space using a unified hierarchical codebook and quantized diffusion processes to model tree node transitions. This method improves stability by eliminating branch-specific modules and enhancing generative capacity through gradual hierarchical changes simulated by the diffusion process. HDTree’s effectiveness is demonstrated through comparisons on both general-purpose and single-cell datasets, where it outperforms existing methods in terms of accuracy and performance. These contributions provide a new tool for hierarchical lineage analysis, enabling more accurate and efficient modeling of cellular differentiation paths and offering insights for downstream biological tasks. The code of HDTree is available at anonymous link https://anonymous.4open.science/r/code_HDTree_review-A8DB.

nan


Article 836

Title@2025-06-29 (7): Not All Explanations for Deep Learning Phenomena Are Equally Valuable

Title: Not All Explanations for Deep Learning Phenomena Are Equally Valuable Nicht alle Erklärungen für Deep Learning Phänomene sind gleichermaßen wertvoll 并非所有关于深深学习现象的解释都具有同等价值。 2506.23286v1

Authors (2): Alan Jeffares, Mihaela van der Schaar

Developing a better understanding of surprising or counterintuitive phenomena has constituted a significant portion of deep learning research in recent years. These include double descent, grokking, and the lottery ticket hypothesis – among many others. Works in this area often develop ad hoc hypotheses attempting to explain these observed phenomena on an isolated, case-by-case basis. This position paper asserts that, in many prominent cases, there is little evidence to suggest that these phenomena appear in real-world applications and these efforts may be inefficient in driving progress in the broader field. Consequently, we argue against viewing them as isolated puzzles that require bespoke resolutions or explanations. However, despite this, we suggest that deep learning phenomena do still offer research value by providing unique settings in which we can refine our broad explanatory theories of more general deep learning principles. This position is reinforced by analyzing the research outcomes of several prominent examples of these phenomena from the recent literature. We revisit the current norms in the research community in approaching these problems and propose practical recommendations for future research, aiming to ensure that progress on deep learning phenomena is well aligned with the ultimate pragmatic goal of progress in the broader field of deep learning.

nan


Article 837

Title@2025-06-29 (7): BAPE: Learning an Explicit Bayes Classifier for Long-tailed Visual Recognition

Title: BAPE: Learning an Explicit Bayes Classifier for Long-tailed Visual Recognition BAPE: Erlernen eines expliziten Bayes-Klassifikators für die visuelle Langzeiterkennung BAPE: 学习长尾目视辨识明确贝类分类 2506.23280v1

Authors (4): Chaoqun Du, Yulin Wang, Shiji Song, Gao Huang

Bayesian decision theory advocates the Bayes classifier as the optimal approach for minimizing the risk in machine learning problems. Current deep learning algorithms usually solve for the optimal classifier by \emph{implicitly} estimating the posterior probabilities, \emph{e.g.}, by minimizing the Softmax cross-entropy loss. This simple methodology has been proven effective for meticulously balanced academic benchmark datasets. However, it is not applicable to the long-tailed data distributions in the real world, where it leads to the gradient imbalance issue and fails to ensure the Bayes optimal decision rule. To address these challenges, this paper presents a novel approach (BAPE) that provides a more precise theoretical estimation of the data distributions by \emph{explicitly} modeling the parameters of the posterior probabilities and solving them with point estimation. Consequently, our method directly learns the Bayes classifier without gradient descent based on Bayes’ theorem, simultaneously alleviating the gradient imbalance and ensuring the Bayes optimal decision rule. Furthermore, we propose a straightforward yet effective \emph{distribution adjustment} technique. This method enables the Bayes classifier trained from the long-tailed training set to effectively adapt to the test data distribution with an arbitrary imbalance factor, thereby enhancing performance without incurring additional computational costs. In addition, we demonstrate the gains of our method are orthogonal to existing learning approaches for long-tailed scenarios, as they are mostly designed under the principle of \emph{implicitly} estimating the posterior probabilities. Extensive empirical evaluations on CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist demonstrate that our method significantly improves the generalization performance of popular deep networks, despite its simplicity.

nan


Article 838

Title@2025-06-29 (7): FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait

Title: FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait FLOAT: Generative Bewegung Latent Flow passend für Audio-getriebenes Gesprächsporträt FLOAT: 音频驱动的谈话肖像的生成动力前端流动比对 2412.01064v3

Authors (3): Taekyung Ki, Dongchan Min, Gyeongsu Chae

With the rapid advancement of diffusion-based generative models, portrait image animation has achieved remarkable results. However, it still faces challenges in temporally consistent video generation and fast sampling due to its iterative sampling nature. This paper presents FLOAT, an audio-driven talking portrait video generation method based on flow matching generative model. Instead of a pixel-based latent space, we take advantage of a learned orthogonal motion latent space, enabling efficient generation and editing of temporally consistent motion. To achieve this, we introduce a transformer-based vector field predictor with an effective frame-wise conditioning mechanism. Additionally, our method supports speech-driven emotion enhancement, enabling a natural incorporation of expressive motions. Extensive experiments demonstrate that our method outperforms state-of-the-art audio-driven talking portrait methods in terms of visual quality, motion fidelity, and efficiency.

nan


Article 839

Title@2025-06-29 (7): Predicting thinking time in Reasoning models

Title: Predicting thinking time in Reasoning models Denkzeit in vernünftigen Modellen voraussagen 理性模型的预测思考时间 2506.23274v1

Authors (3): Hans Peter Lynsgøe Raaschou-jensen, Constanza Fierro, Anders Søgaard

Reasoning models that produce long, hidden chains of thought have emerged as powerful tools for complex, reasoning-intensive tasks\citep{deepseekai2025deepseekr1incentivizingreasoningcapability, openai2024openaio1card}. However, this paradigm introduces a new user experience challenge: users have little insight into how much time the model will spend reasoning before returning an answer. This unpredictability, can lead to user frustration and is likely to compound as LLMs can produce increasingly long tasks asynchronously \citep{kwa2025measuringaiabilitycomplete}. In this paper, we introduce and evaluate methods for both online and offline prediction of model “thinking time,” aiming to develop a practical “progress bar for reasoning.” We discuss the implications for user interaction and future research directions.

nan


Article 840

Title@2025-06-29 (7): Token Activation Map to Visually Explain Multimodal LLMs

Title: Token Activation Map to Visually Explain Multimodal LLMs Token Aktivierungskarte, um multimodale LLMs visuell zu erklären 直观解释多模式LMS的声控激活映射图 2506.23270v1

Authors (5): Yi Li, Hualiang Wang, Xinpeng Ding, Haonan Wang, Xiaomeng Li

Multimodal large language models (MLLMs) are broadly empowering various fields. Despite their advancements, the explainability of MLLMs remains less explored, hindering deeper understanding, model credibility, and effective visualization. Unlike conventional vision models (e.g., CNNs, ViTs, CLIP) that produce a single output, MLLMs generate sequences of tokens progressively, where each generated token depends on the previous context. Therefore, earlier context tokens can introduce redundant activations that interfere with the explanation of later tokens beyond their original information. Existing studies often overlook this issue, but our observations reveal that these redundant correlations can significantly hurt the reliability of explanations. To address this, we propose an estimated causal inference method to mitigate the interference of context to achieve high-quality MLLM explanation, with a novel rank Gaussian filter to further reduce activation noises. We term this method Token Activation Map (TAM) to highlight the consideration of interactions between tokens. TAM also indicates that it excels at explaining multiple tokens of MLLM, which is different from the Class Activation Map (CAM) for a single prediction. Our TAM method significantly outperforms existing SoTA methods, showcasing high-quality visualization results that can be utilized for various scenarios, such as object localization, failure case analysis, video visualization, MLLMs visual comparison, and model understanding (e.g., color, shape, action, location, visual reasoning, multi-turn conversation, etc). The code is available atgithub.com/xmed-lab/TAM.

nan


Article 841

Title@2025-06-29 (7): Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging

Title: Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging Sub-MoE: Effiziente Mixture-of-Expert LLMs-Kompression über Subspace Expert Merging 子MoE:通过子空间专家合并压缩的高效混合专家LLMs 2506.23266v1

Authors (7): Lujun Li, Zhu Qiyuan, Jiacheng Wang, Wei Li, Hao Gu, Sirui Han, Yike Guo

Mixture of Experts (MoE) LLMs face significant obstacles due to their massive parameter scale, which imposes memory, storage, and deployment challenges. Although recent expert merging methods promise greater efficiency by consolidating multiple experts, they are fundamentally hindered by parameter conflicts arising from expert specialization. In this paper, we present Sub-MoE, a novel MoE compression framework via Subspace Expert Merging. Our key insight is to perform joint Singular Value Decomposition (SVD) on concatenated expert weights, reducing conflicting parameters by extracting shared $U$-matrices while enabling effective merging of the expert-specific $V$ components. Specifically, Sub-MoE consists of two innovative phases: (1) Adaptive Expert Clustering, which groups functionally coherent experts via K-means clustering based on cosine similarity of expert outputs; and (2) Subspace Expert Merging, which first enforces Experts Union Decomposition to derive the shared $U$-matrix across experts in the same group, then pursues frequency-based merging for individual $V$-matrices, and finalizes expert reconstruction using the merged $V$-matrix. In this way, we align and fuse experts in a shared subspace, and can be extended with intra-expert compression for further inference optimization. Extensive experiments on Mixtral, DeepSeek, and Qwen-1.5 3 MoE LLMs demonstrate that our Sub-MoE significantly outperforms existing expert pruning and merging methods. Notably, our Sub-MoE maintains 96\% 86\% of original performance with 25\% 50\% expert reduction on Mixtral-8x7B in zero-shot benchmarks. Code will be released at https://github.com/lliai/MoERazor.

nan


Article 842

Title@2025-06-29 (7): Drivetrain simulation using variational autoencoders

Title: Drivetrain simulation using variational autoencoders Drivetrain-Simulation mit variablen Autoencodern 使用变式自动自动转换器进行驱动模拟模拟 2501.17653v2

Authors (7): Pallavi Sharma, Jorge-Humberto Urrea-Quintero, Bogdan Bogdan, Adrian-Dumitru Ciotec, Laura Vasilie, Henning Wessels, Matteo Skull

This work proposes variational autoencoders (VAEs) to predict a vehicle’s jerk signals from torque demand in the context of limited real-world drivetrain datasets. We implement both unconditional and conditional VAEs, trained on experimental data from two variants of a fully electric SUV with differing torque and drivetrain configurations. The VAEs synthesize jerk signals that capture characteristics from multiple drivetrain scenarios by leveraging the learned latent space. A performance comparison with baseline physics-based and hybrid models confirms the effectiveness of the VAEs, without requiring detailed system parametrization. Unconditional VAEs generate realistic jerk signals without prior system knowledge, while conditional VAEs enable the generation of signals tailored to specific torque inputs. This approach reduces the dependence on costly and time-intensive real-world experiments and extensive manual modeling. The results support the integration of generative models such as VAEs into drivetrain simulation pipelines, both for data augmentation and for efficient exploration of complex operational scenarios, with the potential to streamline validation and accelerate vehicle development.

nan


Article 843

Title@2025-06-29 (7): Online Meal Detection Based on CGM Data Dynamics

Title: Online Meal Detection Based on CGM Data Dynamics Online-Meal Detection basierend auf CGM Data Dynamics 基于CGM数据动态的在线膳食检测 2507.00080v1

Authors (2): Ali Tavasoli, Heman Shakeri

We utilize dynamical modes as features derived from Continuous Glucose Monitoring (CGM) data to detect meal events. By leveraging the inherent properties of underlying dynamics, these modes capture key aspects of glucose variability, enabling the identification of patterns and anomalies associated with meal consumption. This approach not only improves the accuracy of meal detection but also enhances the interpretability of the underlying glucose dynamics. By focusing on dynamical features, our method provides a robust framework for feature extraction, facilitating generalization across diverse datasets and ensuring reliable performance in real-world applications. The proposed technique offers significant advantages over traditional approaches, improving detection accuracy,

nan


Article 844

Title@2025-06-29 (7): Deep Multi-Manifold Transformation Based Multivariate Time Series Fault Detection

Title: Deep Multi-Manifold Transformation Based Multivariate Time Series Fault Detection Tiefe Multi-Manifold-Transformations-basierte Multivariate-Zeitreihen-Fehlererkennung 深多功能变换基于多变量的时间序列错失检测 2405.16258v2

Authors (6): Hong Liu, Xiuxiu Qiu, Yiming Shi, Miao Xu, Zelin Zang, Zhen Lei

Unsupervised fault detection in multivariate time series plays a vital role in ensuring the stable operation of complex systems. Traditional methods often assume that normal data follow a single Gaussian distribution and identify anomalies as deviations from this distribution. {\color{black} However, this simplified assumption fails to capture the diversity and structural complexity of real-world time series, which can lead to misjudgments and reduced detection performance in practical applications. To address this issue, we propose a new method that combines a neighborhood-driven data augmentation strategy with a multi-manifold representation learning framework.} By incorporating information from local neighborhoods, the augmentation module can simulate contextual variations of normal data, enhancing the model’s adaptability to distributional changes. In addition, we design a structure-aware feature learning approach that encourages natural clustering of similar patterns in the feature space while maintaining sufficient distinction between different operational states. Extensive experiments on several public benchmark datasets demonstrate that our method achieves superior performance in terms of both accuracy and robustness, showing strong potential for generalization and real-world deployment.

nan


Article 845

Title@2025-06-29 (7): VoyagerVision: Investigating the Role of Multi-modal Information for Open-ended Learning Systems

Title: VoyagerVision: Investigating the Role of Multi-modal Information for Open-ended Learning Systems VoyagerVision: Untersuchung der Rolle multimodaler Informationen für offene Lernsysteme 旅行展望:调查多模式信息对不限名额学习系统的作用 2507.00079v1

Authors (2): Ethan Smyth, Alessandro Suglia

Open-endedness is an active field of research in the pursuit of capable Artificial General Intelligence (AGI), allowing models to pursue tasks of their own choosing. Simultaneously, recent advancements in Large Language Models (LLMs) such as GPT-4o [9] have allowed such models to be capable of interpreting image inputs. Implementations such as OMNI-EPIC [4] have made use of such features, providing an LLM with pixel data of an agent’s POV to parse the environment and allow it to solve tasks. This paper proposes that providing these visual inputs to a model gives it greater ability to interpret spatial environments, and as such, can increase the number of tasks it can successfully perform, extending its open-ended potential. To this aim, this paper proposes VoyagerVision – a multi-modal model capable of creating structures within Minecraft using screenshots as a form of visual feedback, building on the foundation of Voyager. VoyagerVision was capable of creating an average of 2.75 unique structures within fifty iterations of the system, as Voyager was incapable of this, it is an extension in an entirely new direction. Additionally, in a set of building unit tests VoyagerVision was successful in half of all attempts in flat worlds, with most failures arising in more complex structures. Project website is available at https://esmyth-dev.github.io/VoyagerVision.github.io/

nan


Article 846

Title@2025-06-29 (7): State Entropy Regularization for Robust Reinforcement Learning

Title: State Entropy Regularization for Robust Reinforcement Learning Staatliche Entropie-Regularisierung für robustes Stärkungslernen 强力强化学习的状态元件常规化 2506.07085v2

Authors (6): Yonatan Ashlag, Uri Koren, Mirco Mutti, Esther Derman, Pierre-Luc Bacon, Shie Mannor

State entropy regularization has empirically shown better exploration and sample complexity in reinforcement learning (RL). However, its theoretical guarantees have not been studied. In this paper, we show that state entropy regularization improves robustness to structured and spatially correlated perturbations. These types of variation are common in transfer learning but often overlooked by standard robust RL methods, which typically focus on small, uncorrelated changes. We provide a comprehensive characterization of these robustness properties, including formal guarantees under reward and transition uncertainty, as well as settings where the method performs poorly. Much of our analysis contrasts state entropy with the widely used policy entropy regularization, highlighting their different benefits. Finally, from a practical standpoint, we illustrate that compared with policy entropy, the robustness advantages of state entropy are more sensitive to the number of rollouts used for policy evaluation.

nan


Article 847

Title@2025-06-29 (7): Aggregating Local Saliency Maps for Semi-Global Explainable Image Classification

Title: Aggregating Local Saliency Maps for Semi-Global Explainable Image Classification Aggregation lokaler Salienzkarten für semi-globale erklärbare Bildklassifikation 为半全球可解释图像分类汇总地方悬浮图 2506.23247v1

Authors (2): James Hinns, David Martens

Deep learning dominates image classification tasks, yet understanding how models arrive at predictions remains a challenge. Much research focuses on local explanations of individual predictions, such as saliency maps, which visualise the influence of specific pixels on a model’s prediction. However, reviewing many of these explanations to identify recurring patterns is infeasible, while global methods often oversimplify and miss important local behaviours. To address this, we propose Segment Attribution Tables (SATs), a method for summarising local saliency explanations into (semi-)global insights. SATs take image segments (such as “eyes” in Chihuahuas) and leverage saliency maps to quantify their influence. These segments highlight concepts the model relies on across instances and reveal spurious correlations, such as reliance on backgrounds or watermarks, even when out-of-distribution test performance sees little change. SATs can explain any classifier for which a form of saliency map can be produced, using segmentation maps that provide named segments. SATs bridge the gap between oversimplified global summaries and overly detailed local explanations, offering a practical tool for analysing and debugging image classifiers.

nan


Article 848

Title@2025-06-29 (7): The language of time: a language model perspective on time-series foundation models

Title: The language of time: a language model perspective on time-series foundation models Die Sprache der Zeit: ein Sprachmodell Perspektive auf Zeitreihen Grundmodelle 时间语言:时间序列基础模型的语言模式视角 2507.00078v1

Authors (5): Yi Xie, Yun Xiong, Zejian Shi, Hao Niu, Zhengfu Liu

With the rise of large language models, the paradigm of training foundation models with massive parameter counts on vast datasets has been adopted in multiple domains to achieve remarkable success. Time series foundation models represent a significant extension of this paradigm, demonstrating exceptional expressive power, generalization, and cross-domain transferability. However, this gives rise to a fundamental paradox: time series data reflect distinct dynamical systems, making cross-domain transfer intuitively implausible, yet this is contradicted by the models’ empirical success. To resolve this paradox, this paper investigates, from both theoretical and experimental perspectives, the representation learning mechanisms and generalization capabilities of patch-based time series foundation models. We argue that such models are not merely applying a new architecture but are fundamentally generalizing the representation paradigm of language models by extending deterministic vector-based representations to latent probabilistic distributional forms. Our theoretical analysis supports this framework by demonstrating that continuous time-series patches can be faithfully quantized into a discrete vocabulary whose key statistical properties are highly consistent with those of natural language. This generalization allows time series models to inherit the robust representation and transfer abilities of large language models, thereby explaining their superior performance in temporal tasks. Ultimately, our work provides a rigorous theoretical cornerstone for understanding, evaluating, and improving the safety and reliability of large-scale time series foundation models.

nan


Article 849

Title@2025-06-29 (7): Automating Versatile Time-Series Analysis with Tiny Transformers on Embedded FPGAs

Title: Automating Versatile Time-Series Analysis with Tiny Transformers on Embedded FPGAs Automatisierung der vielseitigen Zeitreihenanalyse mit kleinen Transformern auf eingebetteten FPGAs 在嵌入的 FPGAs 上用小变形器进行自动的V对活性时间序列分析 2505.17662v4

Authors (4): Tianheng Ling, Chao Qian, Lukas Johannes Haßler, Gregor Schiele

Transformer-based models have shown strong performance across diverse time-series tasks, but their deployment on resource-constrained devices remains challenging due to high memory and computational demand. While prior work targeting Microcontroller Units (MCUs) has explored hardware-specific optimizations, such approaches are often task-specific and limited to 8-bit fixed-point precision. Field-Programmable Gate Arrays (FPGAs) offer greater flexibility, enabling fine-grained control over data precision and architecture. However, existing FPGA-based deployments of Transformers for time-series analysis typically focus on high-density platforms with manual configuration. This paper presents a unified and fully automated deployment framework for Tiny Transformers on embedded FPGAs. Our framework supports a compact encoder-only Transformer architecture across three representative time-series tasks (forecasting, classification, and anomaly detection). It combines quantization-aware training (down to 4 bits), hardware-aware hyperparameter search using Optuna, and automatic VHDL generation for seamless deployment. We evaluate our framework on six public datasets across two embedded FPGA platforms. Results show that our framework produces integer-only, task-specific Transformer accelerators achieving as low as 0.033 mJ per inference with millisecond latency on AMD Spartan-7, while also providing insights into deployment feasibility on Lattice iCE40. All source code will be released in the GitHub repository (https://github.com/Edwina1030/TinyTransformer4TS).

nan


Article 850

Title@2025-06-29 (7): MOE-Enhanced Explanable Deep Manifold Transformation for Complex Data Embedding and Visualization

Title: MOE-Enhanced Explanable Deep Manifold Transformation for Complex Data Embedding and Visualization MOE-Enhanced Explanable Deep Manifold Transformation für komplexe Daten-Einbettung und Visualisierung MOE - 强化了复杂数据嵌入和可视化的可推广深磁变换 2410.19504v2

Authors (7): Zelin Zang, Yuhao Wang, Jinlin Wu, Hong Liu, Yue Shen, Zhen Lei, Stan. Z Li

Dimensionality reduction (DR) plays a crucial role in various fields, including data engineering and visualization, by simplifying complex datasets while retaining essential information. However, achieving both high DR accuracy and strong explainability remains a fundamental challenge, especially for users dealing with high-dimensional data. Traditional DR methods often face a trade-off between precision and transparency, where optimizing for performance can lead to reduced explainability, and vice versa. This limitation is especially prominent in real-world applications such as image, tabular, and text data analysis, where both accuracy and explainability are critical. To address these challenges, this work introduces the MOE-based Explainable Deep Manifold Transformation (DMT-ME). The proposed approach combines hyperbolic embeddings, which effectively capture complex hierarchical structures, with Mixture of Experts (MOE) models, which dynamically allocate tasks based on input features. DMT-ME enhances DR accuracy by leveraging hyperbolic embeddings to represent the hierarchical nature of data, while also improving explainability by explicitly linking input data, embedding outcomes, and key features through the MOE structure. Extensive experiments demonstrate that DMT-ME consistently achieves superior performance in both DR accuracy and model explainability, making it a robust solution for complex data analysis. The code is available at https://github.com/zangzelin/code_dmtme

nan


Article 851

Title@2025-06-29 (7): Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery

Title: Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery Globale Konvergenz iterativ umgewichteter Least Squares für robuste Subraum-Recovery 自动再加权最低空间平面对强力亚空间恢复的全球趋同 2506.20533v2

Authors (4): Gilad Lerman, Kang Li, Tyler Maunu, Teng Zhang

Robust subspace estimation is fundamental to many machine learning and data analysis tasks. Iteratively Reweighted Least Squares (IRLS) is an elegant and empirically effective approach to this problem, yet its theoretical properties remain poorly understood. This paper establishes that, under deterministic conditions, a variant of IRLS with dynamic smoothing regularization converges linearly to the underlying subspace from any initialization. We extend these guarantees to affine subspace estimation, a setting that lacks prior recovery theory. Additionally, we illustrate the practical benefits of IRLS through an application to low-dimensional neural network training. Our results provide the first global convergence guarantees for IRLS in robust subspace recovery and, more broadly, for nonconvex IRLS on a Riemannian manifold.

nan


Article 852

Title@2025-06-29 (7): Bayes correlated equilibria, no-regret dynamics in Bayesian games, and the price of anarchy

Title: Bayes correlated equilibria, no-regret dynamics in Bayesian games, and the price of anarchy Bayes korrelierte Äquilibrien, No-Regret-Dynamik in Bayesischen Spielen, und der Preis der Anarchie Bayes相关平衡、Bayesian游戏中的无区域动态以及无政府状态的代价 2304.05005v2

Authors (1): Kaito Fujii

This paper investigates equilibrium computation and the price of anarchy for Bayesian games, which are the fundamental models of games with incomplete information. In normal-form games with complete information, it is known that efficiently computable no-regret dynamics converge to correlated equilibria, and the price of anarchy for correlated equilibria can be bounded for a broad class of games called smooth games. However, in Bayesian games, as surveyed by Forges (1993), several non-equivalent extensions of correlated equilibria exist, and it remains unclear whether they can be efficiently computed or whether their price of anarchy can be bounded. In this paper, we identify a natural extension of correlated equilibria that can be computed efficiently and is guaranteed to have bounds on the price of anarchy in various games. First, we propose a variant of regret called untruthful swap regret. If each player minimizes it in repeated play of Bayesian games, the empirical distribution of these dynamics is guaranteed to converge to communication equilibria, which is one of the extensions of correlated equilibria proposed by Myerson (1982). We present an efficient algorithm for minimizing untruthful swap regret with a sublinear upper bound, which we prove to be tight in terms of the number of types. As a result, by simulating the dynamics with our algorithm, we can approximately compute a communication equilibrium in polynomial time. Furthermore, we extend existing lower bounds on the price of anarchy based on the smoothness arguments from Bayes–Nash equilibria to equilibria obtained by the proposed dynamics.

nan


Article 853

Title@2025-06-29 (7): Masked Gated Linear Unit

Title: Masked Gated Linear Unit Maskierte gezahnte Lineareinheit 面罩线条股 2506.23225v1

Authors (5): Yukito Tajima, Nakamasa Inoue, Yusuke Sekikawa, Ikuro Sato, Rio Yokota

Gated Linear Units (GLUs) have become essential components in the feed-forward networks of state-of-the-art Large Language Models (LLMs). However, they require twice as many memory reads compared to feed-forward layers without gating, due to the use of separate weight matrices for the gate and value streams. To address this bottleneck, we introduce Masked Gated Linear Units (MGLUs), a novel family of GLUs with an efficient kernel implementation. The core contribution of MGLUs include: (1) the Mixture of Element-wise Gating (MoEG) architecture that learns multiple binary masks, each determining gate or value assignments at the element level on a single shared weight matrix resulting in reduced memory transfer, and (2) FlashMGLU, a hardware-friendly kernel that yields up to a 19.7 $\times$ inference-time speed-up over a naive PyTorch MGLU and is 47% more memory-efficient and 34% faster than standard GLUs despite added architectural complexity on an RTX5090 GPU. In LLM experiments, the Swish-activated variant SwiMGLU preserves its memory advantages while matching - or even surpassing - the downstream accuracy of the SwiGLU baseline.

nan


Article 854

Title@2025-06-29 (7): Single Image Inpainting and Super-Resolution with Simultaneous Uncertainty Guarantees by Universal Reproducing Kernels

Title: Single Image Inpainting and Super-Resolution with Simultaneous Uncertainty Guarantees by Universal Reproducing Kernels Single Image Inpainting und Super-Resolution mit gleichzeitigen Unsicherheitsgarantien von Universal Reproducing Kernels 由环球再生产内核提供的单一图像油漆和超分辨率,同时具有不确定性的保证 2506.23221v1

Authors (2): Bálint Horváth, Balázs Csanád Csáji

The paper proposes a statistical learning approach to the problem of estimating missing pixels of images, crucial for image inpainting and super-resolution problems. One of the main novelties of the method is that it also provides uncertainty quantifications together with the estimated values. Our core assumption is that the underlying data-generating function comes from a Reproducing Kernel Hilbert Space (RKHS). A special emphasis is put on band-limited functions, central to signal processing, which form Paley-Wiener type RKHSs. The proposed method, which we call Simultaneously Guaranteed Kernel Interpolation (SGKI), is an extension and refinement of a recently developed kernel method. An advantage of SGKI is that it not only estimates the missing pixels, but also builds non-asymptotic confidence bands for the unobserved values, which are simultaneously guaranteed for all missing pixels. We also show how to compute these bands efficiently using Schur complements, we discuss a generalization to vector-valued functions, and we present a series of numerical experiments on various datasets containing synthetically generated and benchmark images, as well.

nan


Article 855

Title@2025-06-29 (7): Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy

Title: Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy Disrupting Model Merging: Ein Parameter-Level-Verteidigung ohne Opferung Genauigkeit 破坏性合并模式:不牺牲准确度的参数级防御 2503.07661v2

Authors (3): Wei Junhao, Yu Zhe, Sakuma Jun

Model merging is a technique that combines multiple finetuned models into a single model without additional training, allowing a free-rider to cheaply inherit specialized capabilities. This study investigates methodologies to suppress unwanted model merging by free-riders. Existing methods such as model watermarking or fingerprinting can only detect merging in hindsight. In contrast, we propose a first proactive defense against model merging. Specifically, our defense method modifies the model parameters so that the model is disrupted if the model is merged with any other model, while its functionality is kept unchanged if not merged with others. Our approach consists of two modules, rearranging MLP parameters and scaling attention heads, which push the model out of the shared basin in parameter space, causing the merging performance with other models to degrade significantly. We conduct extensive experiments on image classification, image generation, and text classification to demonstrate that our defense severely disrupts merging while retaining the functionality of the post-protect model. Moreover, we analyze potential adaptive attacks and further propose a dropout-based pruning to improve our proposal’s robustness.

nan


Article 856

Title@2025-06-29 (7): FedRef: Communication-Efficient Bayesian Fine Tuning with Reference Model

Title: FedRef: Communication-Efficient Bayesian Fine Tuning with Reference Model FedRef: Kommunikation-Effizient Bayesian Feinabstimmung mit Referenzmodell FedRef: 通信-节能贝ysian精密票,参考模型 2506.23210v1

Authors (2): Taehwan Yoon, Bongjun Choi

Federated learning(FL) is used for distributed scenarios to train artificial intelligence(AI) models while ensuring users’ privacy. In federated learning scenario, the server generally never knows about users’ data. This type of concept makes the AI training process efficient in terms of data privacy. However, regarding model performance, federated AI models may not sufficiently satisfy AI users’ expectations. Furthermore, AI users have a wide range of different needs. It is not easy to satisfy the whole users needs. These types of issues can be addressed through AI model optimization, fine-tuning, or personalization to achieve optimal model performance. To address model optimization challenges, we propose reference model-based federated learning for optimal fine-tuning, which overcomes catastrophic forgetting in each round. This method is derived from Bayesian parameter-efficient transfer learning, which includes an optimal proximal term and enables overcoming the catastrophic forgetting issue in each round by utilizing a reference model that incorporates previous model parameters. As a result, this method achieves both high model performance and low computing cost.

nan


Article 857

Title@2025-06-29 (7): External Data-Enhanced Meta-Representation for Adaptive Probabilistic Load Forecasting

Title: External Data-Enhanced Meta-Representation for Adaptive Probabilistic Load Forecasting Externe datengestützte Meta-Vertretung für adaptive probabilistische Lastprognosen 适应性概率负载预测的外部数据强化元代表 2506.23201v1

Authors (5): Haoran Li, Muhao Guo, Marija Ilic, Yang Weng, Guangchun Ruan

Accurate residential load forecasting is critical for power system reliability with rising renewable integration and demand-side flexibility. However, most statistical and machine learning models treat external factors, such as weather, calendar effects, and pricing, as extra input, ignoring their heterogeneity, and thus limiting the extraction of useful external information. We propose a paradigm shift: external data should serve as meta-knowledge to dynamically adapt the forecasting model itself. Based on this idea, we design a meta-representation framework using hypernetworks that modulate selected parameters of a base Deep Learning (DL) model in response to external conditions. This provides both expressivity and adaptability. We further integrate a Mixture-of-Experts (MoE) mechanism to enhance efficiency through selective expert activation, while improving robustness by filtering redundant external inputs. The resulting model, dubbed as a Meta Mixture of Experts for External data (M2oE2), achieves substantial improvements in accuracy and robustness with limited additional overhead, outperforming existing state-of-the-art methods in diverse load datasets. The dataset and source code are publicly available at https://github.com/haorandd/M2oE2_load_forecast.git.

nan


Article 858

Title@2025-06-29 (7): Pretrained Reversible Generation as Unsupervised Visual Representation Learning

Title: Pretrained Reversible Generation as Unsupervised Visual Representation Learning Pretrained Reversible Generation als unüberwachtes visuelles Repräsentationslernen 作为无人监督的视觉代表学习 2412.01787v4

Authors (7): Rongkun Xue, Jinouwen Zhang, Yazhe Niu, Dazhong Shen, Bingqi Ma, Yu Liu, Jing Yang

Recent generative models based on score matching and flow matching have significantly advanced generation tasks, but their potential in discriminative tasks remains underexplored. Previous approaches, such as generative classifiers, have not fully leveraged the capabilities of these models for discriminative tasks due to their intricate designs. We propose Pretrained Reversible Generation (PRG), which extracts unsupervised representations by reversing the generative process of a pretrained continuous generation model. PRG effectively reuses unsupervised generative models, leveraging their high capacity to serve as robust and generalizable feature extractors for downstream tasks. This framework enables the flexible selection of feature hierarchies tailored to specific downstream tasks. Our method consistently outperforms prior approaches across multiple benchmarks, achieving state-of-the-art performance among generative model based methods, including 78% top-1 accuracy on ImageNet at a resolution of 64*64. Extensive ablation studies, including out-of-distribution evaluations, further validate the effectiveness of our approach.PRG is available at https://github.com/opendilab/PRG.

nan


Article 859

Title@2025-06-29 (7): RiverText: A Python Library for Training and Evaluating Incremental Word Embeddings from Text Data Streams

Title: RiverText: A Python Library for Training and Evaluating Incremental Word Embeddings from Text Data Streams RiverText: Eine Python-Bibliothek für das Training und Evaluieren inkrementaler Word-Einbettungen aus Textdatenströmen RiverText:一个培训和评价来自文本数据流的递增单词嵌入的Python图书馆 2506.23192v1

Authors (2): Gabriel Iturra-Bocaz, Felipe Bravo-Marquez

Word embeddings have become essential components in various information retrieval and natural language processing tasks, such as ranking, document classification, and question answering. However, despite their widespread use, traditional word embedding models present a limitation in their static nature, which hampers their ability to adapt to the constantly evolving language patterns that emerge in sources such as social media and the web (e.g., new hashtags or brand names). To overcome this problem, incremental word embedding algorithms are introduced, capable of dynamically updating word representations in response to new language patterns and processing continuous data streams. This paper presents RiverText, a Python library for training and evaluating incremental word embeddings from text data streams. Our tool is a resource for the information retrieval and natural language processing communities that work with word embeddings in streaming scenarios, such as analyzing social media. The library implements different incremental word embedding techniques, such as Skip-gram, Continuous Bag of Words, and Word Context Matrix, in a standardized framework. In addition, it uses PyTorch as its backend for neural network training. We have implemented a module that adapts existing intrinsic static word embedding evaluation tasks for word similarity and word categorization to a streaming setting. Finally, we compare the implemented methods with different hyperparameter settings and discuss the results. Our open-source library is available at https://github.com/dccuchile/rivertext.

nan


Article 860

Title@2025-06-29 (7): Efficient Algorithms for Learning and Compressing Monophonic Halfspaces in Graphs

Title: Efficient Algorithms for Learning and Compressing Monophonic Halfspaces in Graphs Effiziente Algorithmen zum Lernen und Verdichten monophoner Halbräume in Graphen 用于学习和压缩图形中单声半空的高效算法 2506.23186v1

Authors (4): Marco Bressan, Victor Chepoi, Emmanuel Esposito, Maximilian Thiessen

Abstract notions of convexity over the vertices of a graph, and corresponding notions of halfspaces, have recently gained attention from the machine learning community. In this work we study monophonic halfspaces, a notion of graph halfspaces defined through closure under induced paths. Our main result is a $2$-satisfiability based decomposition theorem, which allows one to represent monophonic halfspaces as a disjoint union of certain vertex subsets. Using this decomposition, we achieve efficient and (nearly) optimal algorithms for various learning problems, such as teaching, active, and online learning. Most notably, we obtain a polynomial-time algorithm for empirical risk minimization. Independently of the decomposition theorem, we obtain an efficient, stable, and proper sample compression scheme. This makes monophonic halfspaces efficiently learnable with proper learners and linear error rate $1/\varepsilon$ in the realizable PAC setting. Our results answer open questions from the literature, and show a stark contrast with geodesic halfspaces, for which most of the said learning problems are NP-hard.

nan


Article 861

Title@2025-06-29 (7): Attribution assignment for deep-generative sequence models enables interpretability analysis using positive-only data

Title: Attribution assignment for deep-generative sequence models enables interpretability analysis using positive-only data Zuordnungszuweisung für tiefgenerative Sequenzmodelle ermöglicht eine Interpretationsanalyse mit Positiv-Only-Daten 深遗传序列模型的归属分配,使得能够使用只使用正数数据的可解释性分析 2506.23182v1

Authors (7): Robert Frank, Michael Widrich, Rahmad Akbar, Günter Klambauer, Geir Kjetil Sandve, Philippe A. Robert, Victor Greiff

Generative machine learning models offer a powerful framework for therapeutic design by efficiently exploring large spaces of biological sequences enriched for desirable properties. Unlike supervised learning methods, which require both positive and negative labeled data, generative models such as LSTMs can be trained solely on positively labeled sequences, for example, high-affinity antibodies. This is particularly advantageous in biological settings where negative data are scarce, unreliable, or biologically ill-defined. However, the lack of attribution methods for generative models has hindered the ability to extract interpretable biological insights from such models. To address this gap, we developed Generative Attribution Metric Analysis (GAMA), an attribution method for autoregressive generative models based on Integrated Gradients. We assessed GAMA using synthetic datasets with known ground truths to characterize its statistical behavior and validate its ability to recover biologically relevant features. We further demonstrated the utility of GAMA by applying it to experimental antibody-antigen binding data. GAMA enables model interpretability and the validation of generative sequence design strategies without the need for negative training data.

nan


Article 862

Title@2025-06-29 (7): Automatic Depression Assessment using Machine Learning: A Comprehensive Survey

Title: Automatic Depression Assessment using Machine Learning: A Comprehensive Survey Automatische Depressionsbewertung mit maschinellem Lernen: Eine umfassende Umfrage 利用机器学习进行自动抑郁症评估:全面调查 2506.18915v2

Authors (7): Siyang Song, Yupeng Huo, Shiqing Tang, Jiaee Cheong, Rui Gao, Michel Valstar, Hatice Gunes

Depression is a common mental illness across current human society. Traditional depression assessment relying on inventories and interviews with psychologists frequently suffer from subjective diagnosis results, slow and expensive diagnosis process as well as lack of human resources. Since there is a solid evidence that depression is reflected by various human internal brain activities and external expressive behaviours, early traditional machine learning (ML) and advanced deep learning (DL) models have been widely explored for human behaviour-based automatic depression assessment (ADA) since 2012. However, recent ADA surveys typically only focus on a limited number of human behaviour modalities. Despite being used as a theoretical basis for developing ADA approaches, existing ADA surveys lack a comprehensive review and summary of multi-modal depression-related human behaviours. To bridge this gap, this paper specifically summarises depression-related human behaviours across a range of modalities (e.g. the human brain, verbal language and non-verbal audio/facial/body behaviours). We focus on conducting an up-to-date and comprehensive survey of ML-based ADA approaches for learning depression cues from these behaviours as well as discussing and comparing their distinctive features and limitations. In addition, we also review existing ADA competitions and datasets, identify and discuss the main challenges and opportunities to provide further research directions for future ADA researchers.

nan


Article 863

Title@2025-06-29 (7): Data Can Speak for Itself: Quality-guided Utilization of Wireless Synthetic Data

Title: Data Can Speak for Itself: Quality-guided Utilization of Wireless Synthetic Data Daten können für sich selbst sprechen: Qualitativ hochwertige Nutzung von drahtlosen synthetischen Daten 《无线合成数据的质量指南利用》。 2506.23174v1

Authors (4): Chen Gong, Bo Liang, Wei Gao, Chenren Xu

Generative models have gained significant attention for their ability to produce realistic synthetic data that supplements the quantity of real-world datasets. While recent studies show performance improvements in wireless sensing tasks by incorporating all synthetic data into training sets, the quality of synthetic data remains unpredictable and the resulting performance gains are not guaranteed. To address this gap, we propose tractable and generalizable metrics to quantify quality attributes of synthetic data - affinity and diversity. Our assessment reveals prevalent affinity limitation in current wireless synthetic data, leading to mislabeled data and degraded task performance. We attribute the quality limitation to generative models’ lack of awareness of untrained conditions and domain-specific processing. To mitigate these issues, we introduce SynCheck, a quality-guided synthetic data utilization scheme that refines synthetic data quality during task model training. Our evaluation demonstrates that SynCheck consistently outperforms quality-oblivious utilization of synthetic data, and achieves 4.3% performance improvement even when the previous utilization degrades performance by 13.4%.

nan


Article 864

Title@2025-06-29 (7): Deep Learning for Optical Misalignment Diagnostics in Multi-Lens Imaging Systems

Title: Deep Learning for Optical Misalignment Diagnostics in Multi-Lens Imaging Systems Deep Learning für optische Fehlausrichtungsdiagnostik in Multi-Lens-Imaging-Systemen 多层成像系统中光相错配诊断深入学习 2506.23173v1

Authors (5): Tomer Slor, Dean Oren, Shira Baneth, Tom Coen, Haim Suchowski

In the rapidly evolving field of optical engineering, precise alignment of multi-lens imaging systems is critical yet challenging, as even minor misalignments can significantly degrade performance. Traditional alignment methods rely on specialized equipment and are time-consuming processes, highlighting the need for automated and scalable solutions. We present two complementary deep learning-based inverse-design methods for diagnosing misalignments in multi-element lens systems using only optical measurements. First, we use ray-traced spot diagrams to predict five-degree-of-freedom (5-DOF) errors in a 6-lens photographic prime, achieving a mean absolute error of 0.031mm in lateral translation and 0.011$^\circ$ in tilt. We also introduce a physics-based simulation pipeline that utilizes grayscale synthetic camera images, enabling a deep learning model to estimate 4-DOF, decenter and tilt errors in both two- and six-lens multi-lens systems. These results show the potential to reshape manufacturing and quality control in precision imaging.

nan


Article 865

Title@2025-06-29 (7): Compositions of Variant Experts for Integrating Short-Term and Long-Term Preferences

Title: Compositions of Variant Experts for Integrating Short-Term and Long-Term Preferences Zusammensetzungen von Variant-Experten für die Integration kurzfristiger und langfristiger Präferenzen 纳入短期和长期优惠的备选专家的组成 2506.23170v1

Authors (3): Jaime Hieu Do, Trung-Hoang Le, Hady W. Lauw

In the online digital realm, recommendation systems are ubiquitous and play a crucial role in enhancing user experience. These systems leverage user preferences to provide personalized recommendations, thereby helping users navigate through the paradox of choice. This work focuses on personalized sequential recommendation, where the system considers not only a user’s immediate, evolving session context, but also their cumulative historical behavior to provide highly relevant and timely recommendations. Through an empirical study conducted on diverse real-world datasets, we have observed and quantified the existence and impact of both short-term (immediate and transient) and long-term (enduring and stable) preferences on users’ historical interactions. Building on these insights, we propose a framework that combines short- and long-term preferences to enhance recommendation performance, namely Compositions of Variant Experts (CoVE). This novel framework dynamically integrates short- and long-term preferences through the use of different specialized recommendation models (i.e., experts). Extensive experiments showcase the effectiveness of the proposed methods and ablation studies further investigate the impact of variant expert types.

nan


Article 866

Title@2025-06-29 (7): Mirror Descent Policy Optimisation for Robust Constrained Markov Decision Processes

Title: Mirror Descent Policy Optimisation for Robust Constrained Markov Decision Processes Spiegelabstiegspolitik Optimierung für robuste eingeschränkte Markov-Entscheidungsprozesse 强力约束马可夫决策过程的 “ 镜光源 “ 政策优化 2506.23165v1

Authors (2): David Bossens, Atsushi Nitanda

Safety is an essential requirement for reinforcement learning systems. The newly emerging framework of robust constrained Markov decision processes allows learning policies that satisfy long-term constraints while providing guarantees under epistemic uncertainty. This paper presents mirror descent policy optimisation for robust constrained Markov decision processes (RCMDPs), making use of policy gradient techniques to optimise both the policy (as a maximiser) and the transition kernel (as an adversarial minimiser) on the Lagrangian representing a constrained MDP. In the oracle-based RCMDP setting, we obtain an $\mathcal{O}\left(\frac{1}{T}\right)$ convergence rate for the squared distance as a Bregman divergence, and an $\mathcal{O}\left(e^{-T}\right)$ convergence rate for entropy-regularised objectives. In the sample-based RCMDP setting, we obtain an $\tilde{\mathcal{O}}\left(\frac{1}{T^{1/3}}\right)$ convergence rate. Experiments confirm the benefits of mirror descent policy optimisation in constrained and unconstrained optimisation, and significant improvements are observed in robustness tests when compared to baseline policy optimisation algorithms.

nan


Article 867

Title@2025-06-29 (7): A Comprehensive Study of Shapley Value in Data Analytics

Title: A Comprehensive Study of Shapley Value in Data Analytics Eine umfassende Untersuchung des Shapley-Wertes in der Datenanalyse 全面研究数据分析中的虚光值 2412.01460v7

Authors (7): Hong Lin, Shixin Wan, Zhongle Xie, Ke Chen, Meihui Zhang, Lidan Shou, Gang Chen

Over the recent years, Shapley value (SV), a solution concept from cooperative game theory, has found numerous applications in data analytics (DA). This paper provides the first comprehensive study of SV used throughout the DA workflow, clarifying the key variables in defining DA-applicable SV and the essential functionalities that SV can provide for data scientists. We condense four primary challenges of using SV in DA, namely computation efficiency, approximation error, privacy preservation, and interpretability, then disentangle the resolution techniques from existing arts in this field, analyze and discuss the techniques w.r.t. each challenge and potential conflicts between challenges. We also implement SVBench, a modular and extensible open-sourced framework for developing SV applications in different DA tasks, and conduct extensive evaluations to validate our analyses and discussions. Based on the qualitative and quantitative results, we identify the limitations of current efforts for applying SV to DA and highlight the directions of future research and engineering.

nan


Article 868

Title@2025-06-29 (7): The Effectiveness of LLMs as Annotators: A Comparative Overview and Empirical Analysis of Direct Representation

Title: The Effectiveness of LLMs as Annotators: A Comparative Overview and Empirical Analysis of Direct Representation Die Wirksamkeit von LLMs als Annotatoren: Eine vergleichende Übersicht und empirische Analyse der direkten Repräsentation LLMs作为说明人的效力:直接代表的比较概览和经验分析 2405.01299v2

Authors (2): Maja Pavlovic, Massimo Poesio

Large Language Models (LLMs) have emerged as powerful support tools across various natural language tasks and a range of application domains. Recent studies focus on exploring their capabilities for data annotation. This paper provides a comparative overview of twelve studies investigating the potential of LLMs in labelling data. While the models demonstrate promising cost and time-saving benefits, there exist considerable limitations, such as representativeness, bias, sensitivity to prompt variations and English language preference. Leveraging insights from these studies, our empirical analysis further examines the alignment between human and GPT-generated opinion distributions across four subjective datasets. In contrast to the studies examining representation, our methodology directly obtains the opinion distribution from GPT. Our analysis thereby supports the minority of studies that are considering diverse perspectives when evaluating data annotation tasks and highlights the need for further research in this direction.

nan


Article 869

Title@2025-06-29 (7): CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring

Title: CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring CoT Red-Handed: Stresstesting Chain-of-Thought-Überwachung COT 红手:压力测试研究链监测 2505.23575v2

Authors (6): Benjamin Arnav, Pablo Bernabeu-Pérez, Nathan Helm-Burger, Tim Kostolansky, Hannes Whittingham, Mary Phuong

As AI models are deployed with increasing autonomy, it is important to ensure they do not take harmful actions unnoticed. As a potential mitigation, we investigate Chain-of-Thought (CoT) monitoring, wherein a weaker trusted monitor model continuously oversees the intermediate reasoning steps of a more powerful but untrusted model. We compare CoT monitoring to action-only monitoring, where only final outputs are reviewed, in a red-teaming setup where the untrusted model is instructed to pursue harmful side tasks while completing a coding problem. We find that CoT monitoring improves detection by up to 27 percentage points in scenarios where action-only monitoring fails to reliably identify sabotage. However, CoT traces can also contain misleading rationalizations that deceive the monitor, reducing performance in more obvious sabotage cases. To address this, we introduce a hybrid protocol that independently scores both reasoning and final outputs and combines them using a weighted average. This hybrid monitor consistently outperforms both CoT and action-only monitors across all tested models and tasks, with detection rates over four times higher than action-only monitoring for subtle deception scenarios.

nan


Article 870

Title@2025-06-29 (7): maneuverRecognition – A Python package for Timeseries Classification in the domain of Vehicle Telematics

Title: maneuverRecognition – A Python package for Timeseries Classification in the domain of Vehicle Telematics ManöverErkennung – Ein Python-Paket für die Zeitreihe Klassifizierung im Bereich der Fahrzeugtelematik 机动车辆遥视技术领域时间序列分类的 Python 软件包 2506.23147v1

Authors (2): Jonathan Schuster, Fabian Transchel

In the domain of vehicle telematics the automated recognition of driving maneuvers is used to classify and evaluate driving behaviour. This not only serves as a component to enhance the personalization of insurance policies, but also to increase road safety, reduce accidents and the associated costs as well as to reduce fuel consumption and support environmentally friendly driving. In this context maneuver recognition technically requires a continuous application of time series classification which poses special challenges to the transfer, preprocessing and storage of telematic sensor data, the training of predictive models, and the prediction itself. Although much research has been done in the field of gathering relevant data or regarding the methods to build predictive models for the task of maneuver recognition, there is a practical need for python packages and functions that allow to quickly transform data into the required structure as well as to build and evaluate such models. The maneuverRecognition package was therefore developed to provide the necessary functions for preprocessing, modelling and evaluation and also includes a ready to use LSTM based network structure that can be modified. The implementation of the package is demonstrated using real driving data of three different persons recorded via smartphone sensors.

nan


Article 871

Title@2025-06-29 (7): Forget-MI: Machine Unlearning for Forgetting Multimodal Information in Healthcare Settings

Title: Forget-MI: Machine Unlearning for Forgetting Multimodal Information in Healthcare Settings Forget-MI: Maschinelles Lernen zum Vergessen multimodaler Informationen in Gesundheitseinstellungen 忘记-MI:在保健机构中放弃多模式信息的机器学习 2506.23145v1

Authors (5): Shahad Hardan, Darya Taratynova, Abdelmajid Essofi, Karthik Nandakumar, Mohammad Yaqub

Privacy preservation in AI is crucial, especially in healthcare, where models rely on sensitive patient data. In the emerging field of machine unlearning, existing methodologies struggle to remove patient data from trained multimodal architectures, which are widely used in healthcare. We propose Forget-MI, a novel machine unlearning method for multimodal medical data, by establishing loss functions and perturbation techniques. Our approach unlearns unimodal and joint representations of the data requested to be forgotten while preserving knowledge from the remaining data and maintaining comparable performance to the original model. We evaluate our results using performance on the forget dataset, performance on the test dataset, and Membership Inference Attack (MIA), which measures the attacker’s ability to distinguish the forget dataset from the training dataset. Our model outperforms the existing approaches that aim to reduce MIA and the performance on the forget dataset while keeping an equivalent performance on the test set. Specifically, our approach reduces MIA by 0.202 and decreases AUC and F1 scores on the forget set by 0.221 and 0.305, respectively. Additionally, our performance on the test set matches that of the retrained model, while allowing forgetting. Code is available at https://github.com/BioMedIA-MBZUAI/Forget-MI.git

nan


Article 872

Title@2025-06-29 (7): Beware of Calibration Data for Pruning Large Language Models

Title: Beware of Calibration Data for Pruning Large Language Models Hüten Sie sich vor Kalibrierdaten für das Pruning von großen Sprachmodellen 注意为粗略大语言模型提供校准数据 2410.17711v2

Authors (8): Yixin Ji, Yang Xiang, Juntao Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang

As large language models (LLMs) are widely applied across various fields, model compression has become increasingly crucial for reducing costs and improving inference efficiency. Post-training pruning is a promising method that does not require resource-intensive iterative training and only needs a small amount of calibration data to assess the importance of parameters. Recent research has enhanced post-training pruning from different aspects but few of them systematically explore the effects of calibration data, and it is unclear if there exist better calibration data construction strategies. We fill this blank and surprisingly observe that calibration data is also crucial to post-training pruning, especially for high sparsity. Through controlled experiments on important influence factors of calibration data, including the pruning settings, the amount of data, and its similarity with pre-training data, we observe that a small size of data is adequate, and more similar data to its pre-training stage can yield better performance. As pre-training data is usually inaccessible for advanced LLMs, we further provide a self-generating calibration data synthesis strategy to construct feasible calibration data. Experimental results on recent strong open-source LLMs (e.g., DCLM, and LLaMA-3) show that the proposed strategy can enhance the performance of strong pruning methods (e.g., Wanda, DSnoT, OWL) by a large margin (up to $2.68\%$). Code is available at https://github.com/Dereck0602/calibration_data.

nan


Article 873

Title@2025-06-29 (7): CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation

Title: CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation CRISP-SAM2: SAM2 mit Cross-Modal Interaction und semantischer Prompting für Multi-Organ Segmentierung CRIISP-SAM2:SAM2 具有跨模式相互作用和跨组织分解的语义提示的SAM2 2506.23121v1

Authors (8): Xinlei Yu, Chanmiao Wang, Hui Jin, Ahmed Elazab, Gangyong Jia, Xiang Wan, Changqing Zou, Ruiquan Ge

Multi-organ medical segmentation is a crucial component of medical image processing, essential for doctors to make accurate diagnoses and develop effective treatment plans. Despite significant progress in this field, current multi-organ segmentation models often suffer from inaccurate details, dependence on geometric prompts and loss of spatial information. Addressing these challenges, we introduce a novel model named CRISP-SAM2 with CRoss-modal Interaction and Semantic Prompting based on SAM2. This model represents a promising approach to multi-organ medical segmentation guided by textual descriptions of organs. Our method begins by converting visual and textual inputs into cross-modal contextualized semantics using a progressive cross-attention interaction mechanism. These semantics are then injected into the image encoder to enhance the detailed understanding of visual information. To eliminate reliance on geometric prompts, we use a semantic prompting strategy, replacing the original prompt encoder to sharpen the perception of challenging targets. In addition, a similarity-sorting self-updating strategy for memory and a mask-refining process is applied to further adapt to medical imaging and enhance localized details. Comparative experiments conducted on seven public datasets indicate that CRISP-SAM2 outperforms existing models. Extensive analysis also demonstrates the effectiveness of our method, thereby confirming its superior performance, especially in addressing the limitations mentioned earlier. Our code is available at: https://github.com/YU-deep/CRISP_SAM2.git.

nan


Article 874

Title@2025-06-29 (7): Enough Coin Flips Can Make LLMs Act Bayesian

Title: Enough Coin Flips Can Make LLMs Act Bayesian Genug Münze Flips kann LLMs Act Bayesian 足够多的硬币翻翻可以制造长效LLM 贝叶斯女士 2503.04722v2

Authors (7): Ritwik Gupta, Rodolfo Corona, Jiaxin Ge, Eric Wang, Dan Klein, Trevor Darrell, David M. Chan

Large language models (LLMs) exhibit the ability to generalize given few-shot examples in their input prompt, an emergent capability known as in-context learning (ICL). We investigate whether LLMs use ICL to perform structured reasoning in ways that are consistent with a Bayesian framework or rely on pattern matching. Using a controlled setting of biased coin flips, we find that: (1) LLMs often possess biased priors, causing initial divergence in zero-shot settings, (2) in-context evidence outweighs explicit bias instructions, (3) LLMs broadly follow Bayesian posterior updates, with deviations primarily due to miscalibrated priors rather than flawed updates, and (4) attention magnitude has negligible effect on Bayesian inference. With sufficient demonstrations of biased coin flips via ICL, LLMs update their priors in a Bayesian manner.

nan


Article 875

Title@2025-06-29 (7): Deep Support Vectors

Title: Deep Support Vectors Unterstützende Vektoren 深支持矢量 2403.17329v3

Authors (4): Junhoo Lee, Hyunho Lee, Kyomin Hwang, Nojun Kwak

Deep learning has achieved tremendous success. However, unlike SVMs, which provide direct decision criteria and can be trained with a small dataset, it still has significant weaknesses due to its requirement for massive datasets during training and the black-box characteristics on decision criteria. This paper addresses these issues by identifying support vectors in deep learning models. To this end, we propose the DeepKKT condition, an adaptation of the traditional Karush-Kuhn-Tucker (KKT) condition for deep learning models, and confirm that generated Deep Support Vectors (DSVs) using this condition exhibit properties similar to traditional support vectors. This allows us to apply our method to few-shot dataset distillation problems and alleviate the black-box characteristics of deep learning models. Additionally, we demonstrate that the DeepKKT condition can transform conventional classification models into generative models with high fidelity, particularly as latent generative models using class labels as latent variables. We validate the effectiveness of DSVs using common datasets (ImageNet, CIFAR10 and CIFAR100) on the general architectures (ResNet and ConvNet), proving their practical applicability.

nan


Article 876

Title@2025-06-29 (7): A Survey of Test-Time Compute: From Intuitive Inference to Deliberate Reasoning

Title: A Survey of Test-Time Compute: From Intuitive Inference to Deliberate Reasoning Eine Übersicht über die Berechnung der Testzeit: Vom intuitiven Rückschluss zur überlegten Vernunft 试验时间计算调查:从直觉推理到故意推理 2501.02497v3

Authors (9): Yixin Ji, Juntao Li, Yang Xiang, Hai Ye, Kaixin Wu, Kai Yao, Jia Xu, Linjian Mo, Min Zhang

The remarkable performance of the o1 model in complex reasoning demonstrates that test-time compute scaling can further unlock the model’s potential, enabling powerful System-2 thinking. However, there is still a lack of comprehensive surveys for test-time compute scaling. We trace the concept of test-time compute back to System-1 models. In System-1 models, test-time compute addresses distribution shifts and improves robustness and generalization through parameter updating, input modification, representation editing, and output calibration. In System-2 models, it enhances the model’s reasoning ability to solve complex problems through repeated sampling, self-correction, and tree search. We organize this survey according to the trend of System-1 to System-2 thinking, highlighting the key role of test-time compute in the transition from System-1 models to weak System-2 models, and then to strong System-2 models. We also point out advanced topics and future directions.

nan


Article 877

Title@2025-06-29 (7): Theoretical Modeling of LLM Self-Improvement Training Dynamics Through Solver-Verifier Gap

Title: Theoretical Modeling of LLM Self-Improvement Training Dynamics Through Solver-Verifier Gap Theoretische Modellierung der LLM-Selbstverbesserungs-Trainingsdynamik durch Solver-Verifier Gap 理学建模LLM 自改进培训动态,通过溶剂-变压装置差距 2507.00075v1

Authors (4): Yifan Sun, Yushan Liang, Zhen Zhang, Jiaye Teng

Self-improvement is among the most prominent techniques within the realm of large language models (LLM), aiming to enhance the LLM performance without relying on external data. Despite its significance, generally how LLM performances evolve during the self-improvement process remains underexplored. In this paper, we theoretically model the training dynamics of self-improvement via the concept of solver-verifier gap. This is inspired by the conjecture that the performance enhancement of self-improvement stems from the gap between LLM’s solver capability and verifier capability. Based on the theoretical framework, we further introduce how to predict the ultimate power of self-improvement using only information from the first few training epochs. We empirically validate the effectiveness of the theoretical model on various LLMs and datasets. Beyond self-improvement, we extend our analysis to investigate how external data influences these dynamics within the framework. Notably, we find that under limited external data regimes, such external data can be utilized at any stage without significantly affecting final performances, which accords with the empirical observations.

nan


Article 878

Title@2025-06-29 (7): Learning Dynamics of LLM Finetuning

Title: Learning Dynamics of LLM Finetuning Dynamisches Lernen der LLM-Feinsteuerung LLM 微调的学习动态 2407.10490v4

Authors (2): Yi Ren, Danica J. Sutherland

Learning dynamics, which describes how the learning of specific training examples influences the model’s predictions on other examples, gives us a powerful tool for understanding the behavior of deep learning systems. We study the learning dynamics of large language models during different types of finetuning, by analyzing the step-wise decomposition of how influence accumulates among different potential responses. Our framework allows a uniform interpretation of many interesting observations about the training of popular algorithms for both instruction tuning and preference tuning. In particular, we propose a hypothetical explanation of why specific types of hallucination are strengthened after finetuning, e.g., the model might use phrases or facts in the response for question B to answer question A, or the model might keep repeating similar simple phrases when generating responses. We also extend our framework and highlight a unique “squeezing effect” to explain a previously observed phenomenon in off-policy direct preference optimization (DPO), where running DPO for too long makes even the desired outputs less likely. This framework also provides insights into where the benefits of on-policy DPO and other variants come from. The analysis not only provides a novel perspective of understanding LLM’s finetuning but also inspires a simple, effective method to improve alignment performance.

nan


Article 879

Title@2025-06-29 (7): Integrating Large Language Models in Financial Investments and Market Analysis: A Survey

Title: Integrating Large Language Models in Financial Investments and Market Analysis: A Survey Integration großer Sprachmodelle in Finanzanlagen und Marktanalyse: Eine Umfrage 将大语言模式纳入金融投资和市场分析:调查 2507.01990v1

Authors (6): Sedigheh Mahdavi, Jiating, Chen, Pradeep Kumar Joshi, Lina Huertas Guativa, Upmanyu Singh

Large Language Models (LLMs) have been employed in financial decision making, enhancing analytical capabilities for investment strategies. Traditional investment strategies often utilize quantitative models, fundamental analysis, and technical indicators. However, LLMs have introduced new capabilities to process and analyze large volumes of structured and unstructured data, extract meaningful insights, and enhance decision-making in real-time. This survey provides a structured overview of recent research on LLMs within the financial domain, categorizing research contributions into four main frameworks: LLM-based Frameworks and Pipelines, Hybrid Integration Methods, Fine-Tuning and Adaptation Approaches, and Agent-Based Architectures. This study provides a structured review of recent LLMs research on applications in stock selection, risk assessment, sentiment analysis, trading, and financial forecasting. By reviewing the existing literature, this study highlights the capabilities, challenges, and potential directions of LLMs in financial markets.

nan


Article 880

Title@2025-06-29 (7): Score-based Generative Diffusion Models to Synthesize Full-dose FDG Brain PET from MRI in Epilepsy Patients

Title: Score-based Generative Diffusion Models to Synthesize Full-dose FDG Brain PET from MRI in Epilepsy Patients Score-basierte Generative Diffusionsmodelle zur Synthesize-Volldosierung FDG Brain PET von MRT bei Epilepsie-Patienten 用于合成全剂量FDG脑脑PET的基于分分数的传播模型,来自麻疯病人的MRI 2506.11297v2

Authors (5): Jiaqi Wu, Jiahong Ouyang, Farshad Moradi, Mohammad Mehdi Khalighi, Greg Zaharchuk

Fluorodeoxyglucose (FDG) PET to evaluate patients with epilepsy is one of the most common applications for simultaneous PET/MRI, given the need to image both brain structure and metabolism, but is suboptimal due to the radiation dose in this young population. Little work has been done synthesizing diagnostic quality PET images from MRI data or MRI data with ultralow-dose PET using advanced generative AI methods, such as diffusion models, with attention to clinical evaluations tailored for the epilepsy population. Here we compared the performance of diffusion- and non-diffusion-based deep learning models for the MRI-to-PET image translation task for epilepsy imaging using simultaneous PET/MRI in 52 subjects (40 train/2 validate/10 hold-out test). We tested three different models: 2 score-based generative diffusion models (SGM-Karras Diffusion [SGM-KD] and SGM-variance preserving [SGM-VP]) and a Transformer-Unet. We report results on standard image processing metrics as well as clinically relevant metrics, including congruency measures (Congruence Index and Congruency Mean Absolute Error) that assess hemispheric metabolic asymmetry, which is a key part of the clinical analysis of these images. The SGM-KD produced the best qualitative and quantitative results when synthesizing PET purely from T1w and T2 FLAIR images with the least mean absolute error in whole-brain specific uptake value ratio (SUVR) and highest intraclass correlation coefficient. When 1% low-dose PET images are included in the inputs, all models improve significantly and are interchangeable for quantitative performance and visual quality. In summary, SGMs hold great potential for pure MRI-to-PET translation, while all 3 model types can synthesize full-dose FDG-PET accurately using MRI and ultralow-dose PET.

nan


Article 881

Title@2025-06-29 (7): Multi-task Offline Reinforcement Learning for Online Advertising in Recommender Systems

Title: Multi-task Offline Reinforcement Learning for Online Advertising in Recommender Systems Multi-Task Offline-Verstärkung Lernen für Online-Werbung in Recommender-Systemen 顾问系统中在线广告网上广告多任务离线强化学习 2506.23090v1

Authors (9): Langming Liu, Wanyu Wang, Chi Zhang, Bo Li, Hongzhi Yin, Xuetao Wei, Wenbo Su, Bo Zheng, Xiangyu Zhao

Online advertising in recommendation platforms has gained significant attention, with a predominant focus on channel recommendation and budget allocation strategies. However, current offline reinforcement learning (RL) methods face substantial challenges when applied to sparse advertising scenarios, primarily due to severe overestimation, distributional shifts, and overlooking budget constraints. To address these issues, we propose MTORL, a novel multi-task offline RL model that targets two key objectives. First, we establish a Markov Decision Process (MDP) framework specific to the nuances of advertising. Then, we develop a causal state encoder to capture dynamic user interests and temporal dependencies, facilitating offline RL through conditional sequence modeling. Causal attention mechanisms are introduced to enhance user sequence representations by identifying correlations among causal states. We employ multi-task learning to decode actions and rewards, simultaneously addressing channel recommendation and budget allocation. Notably, our framework includes an automated system for integrating these tasks into online advertising. Extensive experiments on offline and online environments demonstrate MTORL’s superiority over state-of-the-art methods.

nan


Article 882

Title@2025-06-29 (7): Fractional Policy Gradients: Reinforcement Learning with Long-Term Memory

Title: Fractional Policy Gradients: Reinforcement Learning with Long-Term Memory Gradienten der Fraktionspolitik: Stärkung des Lernens mit Langzeitgedächtnis 小分政策分级:用长期记忆加强学习 2507.00073v1

Authors (2): Urvi Pawar, Kunal Telangi

We propose Fractional Policy Gradients (FPG), a reinforcement learning framework incorporating fractional calculus for long-term temporal modeling in policy optimization. Standard policy gradient approaches face limitations from Markovian assumptions, exhibiting high variance and inefficient sampling. By reformulating gradients using Caputo fractional derivatives, FPG establishes power-law temporal correlations between state transitions. We develop an efficient recursive computation technique for fractional temporal-difference errors with constant time and memory requirements. Theoretical analysis shows FPG achieves asymptotic variance reduction of order O(t^(-alpha)) versus standard policy gradients while preserving convergence. Empirical validation demonstrates 35-68% sample efficiency gains and 24-52% variance reduction versus state-of-the-art baselines. This framework provides a mathematically grounded approach for leveraging long-range dependencies without computational overhead.

nan


Article 883

Title: Multi-Modal Recommendation Unlearning for Legal, Licensing, and Modality Constraints Multi-Modal Empfehlung Unlearning für rechtliche, Lizenzierungs- und Modalitätsbeschränkungen 法律、许可证和方式限制方面的学习 2405.15328v3

Authors (3): Yash Sinha, Murari Mandal, Mohan Kankanhalli

User data spread across multiple modalities has popularized multi-modal recommender systems (MMRS). They recommend diverse content such as products, social media posts, TikTok reels, etc., based on a user-item interaction graph. With rising data privacy demands, recent methods propose unlearning private user data from uni-modal recommender systems (RS). However, methods for unlearning item data related to outdated user preferences, revoked licenses, and legally requested removals are still largely unexplored. Previous RS unlearning methods are unsuitable for MMRS due to the incompatibility of their matrix-based representation with the multi-modal user-item interaction graph. Moreover, their data partitioning step degrades performance on each shard due to poor data heterogeneity and requires costly performance aggregation across shards. This paper introduces MMRecUn, the first approach known to us for unlearning in MMRS and unlearning item data. Given a trained RS model, MMRecUn employs a novel Reverse Bayesian Personalized Ranking (BPR) objective to enable the model to forget marked data. The reverse BPR attenuates the impact of user-item interactions within the forget set, while the forward BPR reinforces the significance of user-item interactions within the retain set. Our experiments demonstrate that MMRecUn outperforms baseline methods across various unlearning requests when evaluated on benchmark MMRS datasets. MMRecUn achieves recall performance improvements of up to 49.85% compared to baseline methods and is up to 1.3x faster than the Gold model, which is trained on retain set from scratch. MMRecUn offers significant advantages, including superiority in removing target interactions, preserving retained interactions, and zero overhead costs compared to previous methods. Code: https://github.com/MachineUnlearn/MMRecUN Extended version: arXiv:2405.15328

nan


Article 884

Title@2025-06-29 (7): TyphoFormer: Language-Augmented Transformer for Accurate Typhoon Track Forecasting

Title: TyphoFormer: Language-Augmented Transformer for Accurate Typhoon Track Forecasting TyphoFormer: Sprachgesteigerter Transformer für präzise Typhoon-Track-Prognose 台风前台风:用于准确预报台风轨道的语文增强变换器 2506.17609v2

Authors (6): Lincan Li, Eren Erman Ozguven, Yue Zhao, Guang Wang, Yiqun Xie, Yushun Dong

Accurate typhoon track forecasting is crucial for early system warning and disaster response. While Transformer-based models have demonstrated strong performance in modeling the temporal dynamics of dense trajectories of humans and vehicles in smart cities, they usually lack access to broader contextual knowledge that enhances the forecasting reliability of sparse meteorological trajectories, such as typhoon tracks. To address this challenge, we propose TyphoFormer, a novel framework that incorporates natural language descriptions as auxiliary prompts to improve typhoon trajectory forecasting. For each time step, we use Large Language Model (LLM) to generate concise textual descriptions based on the numerical attributes recorded in the North Atlantic hurricane database. The language descriptions capture high-level meteorological semantics and are embedded as auxiliary special tokens prepended to the numerical time series input. By integrating both textual and sequential information within a unified Transformer encoder, TyphoFormer enables the model to leverage contextual cues that are otherwise inaccessible through numerical features alone. Extensive experiments are conducted on HURDAT2 benchmark, results show that TyphoFormer consistently outperforms other state-of-the-art baseline methods, particularly under challenging scenarios involving nonlinear path shifts and limited historical observations.

nan


Article 885

Title@2025-06-29 (7): CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding

Title: CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding CSBrain: Ein Cross-Scale Spatiotemporal Brain Foundation Modell für die EEG-Dekodierung CSBrain: 跨规模的超时脑基础模型 2506.23075v1

Authors (10): Yuchen Zhou, Jiamin Wu, Zichen Ren, Zhouheng Yao, Weiheng Lu, Kunyu Peng, Qihao Zheng, Chunfeng Song, Wanli Ouyang, Chao Gou

Understanding and decoding brain activity from electroencephalography (EEG) signals is a fundamental challenge in neuroscience and AI, with applications in cognition, emotion recognition, diagnosis, and brain-computer interfaces. While recent EEG foundation models advance generalized decoding via unified architectures and large-scale pretraining, they adopt a scale-agnostic dense modeling paradigm inherited from NLP and vision. This design neglects a core property of neural activity: cross-scale spatiotemporal structure. EEG task patterns span a wide range of temporal and spatial scales, from short bursts to slow rhythms, and from localized cortical responses to distributed interactions. Ignoring this diversity leads to suboptimal representations and weak generalization. We propose CSBrain, a Cross-scale Spatiotemporal Brain foundation model for generalized EEG decoding. CSBrain introduces: (i) Cross-scale Spatiotemporal Tokenization (CST), which aggregates multi-scale features from localized temporal windows and anatomical brain regions into compact scale-aware tokens; and (ii) Structured Sparse Attention (SSA), which captures cross-window and cross-region dependencies, enhancing scale diversity while removing spurious correlations. CST and SSA are alternately stacked to progressively integrate multi-scale dependencies. Experiments on 11 EEG tasks across 16 datasets show that CSBrain consistently outperforms task-specific and foundation model baselines. These results establish cross-scale modeling as a key inductive bias and position CSBrain as a robust backbone for future brain-AI research.

nan


Article 886

Title@2025-06-29 (7): Learning Counterfactually Decoupled Attention for Open-World Model Attribution

Title: Learning Counterfactually Decoupled Attention for Open-World Model Attribution Counterfactually Entkoppelte Aufmerksamkeit für Open-World-Modellzuweisung lernen 对开放世界模式的注意 2506.23074v1

Authors (9): Yu Zheng, Boyang Gong, Fanye Kong, Yueqi Duan, Bingyao Yu, Wenzhao Zheng, Lei Chen, Jiwen Lu, Jie Zhou

In this paper, we propose a Counterfactually Decoupled Attention Learning (CDAL) method for open-world model attribution. Existing methods rely on handcrafted design of region partitioning or feature space, which could be confounded by the spurious statistical correlations and struggle with novel attacks in open-world scenarios. To address this, CDAL explicitly models the causal relationships between the attentional visual traces and source model attribution, and counterfactually decouples the discriminative model-specific artifacts from confounding source biases for comparison. In this way, the resulting causal effect provides a quantification on the quality of learned attention maps, thus encouraging the network to capture essential generation patterns that generalize to unseen source models by maximizing the effect. Extensive experiments on existing open-world model attribution benchmarks show that with minimal computational overhead, our method consistently improves state-of-the-art models by large margins, particularly for unseen novel attacks. Source code: https://github.com/yzheng97/CDAL.

nan


Article 887

Title@2025-06-29 (7): DReSS: Data-driven Regularized Structured Streamlining for Large Language Models

Title: DReSS: Data-driven Regularized Structured Streamlining for Large Language Models DResS: Datengesteuerte Regularisierte Strukturierte Straffung für große Sprachmodelle DReSS: 数据驱动的大型语文模式正规化结构精简 2501.17905v3

Authors (8): Mingkuan Feng, Jinyang Wu, Shuai Zhang, Pengpeng Shao, Ruihan Jin, Zhengqi Wen, Jianhua Tao, Feihu Che

Large language models (LLMs) have achieved significant progress across various domains, but their increasing scale results in high computational and memory costs. Recent studies have revealed that LLMs exhibit sparsity, providing the potential to reduce model size through pruning techniques. However, existing pruning methods typically follow a prune-then-finetune paradigm. Since the pruned components still contain valuable information, their direct removal often leads to irreversible performance degradation, imposing a substantial computational burden to recover performance during finetuning. In this paper, we propose a novel paradigm that first applies regularization, then prunes, and finally finetunes. Based on this paradigm, we introduce DReSS, a simple and effective Data-driven Regularized Structured Streamlining method for LLMs. By leveraging a small amount of data to regularize the components to be pruned, DReSS explicitly transfers the important information to the remaining parts of the model in advance. Compared to direct pruning, this can reduce the information loss caused by parameter removal, thereby enhancing its language modeling capabilities. Experimental results demonstrate that DReSS significantly outperforms existing pruning methods even under extreme pruning ratios, significantly reducing latency and increasing throughput.

nan


Article 888

Title@2025-06-29 (7): Multimodal Medical Code Tokenizer

Title: Multimodal Medical Code Tokenizer Multimodaler medizinischer Code Tokenizer 多式联运医疗法典化器 2502.04397v3

Authors (8): Xiaorui Su, Shvat Messica, Yepeng Huang, Ruth Johnson, Lukas Fesser, Shanghua Gao, Faryad Sahneh, Marinka Zitnik

Foundation models trained on patient electronic health records (EHRs) require tokenizing medical data into sequences of discrete vocabulary items. Existing tokenizers treat medical codes from EHRs as isolated textual tokens. However, each medical code is defined by its textual description, its position in ontological hierarchies, and its relationships to other codes, such as disease co-occurrences and drug-treatment associations. Medical vocabularies contain more than 600,000 codes with critical information for clinical reasoning. We introduce MedTok, a multimodal medical code tokenizer that uses the text descriptions and relational context of codes. MedTok processes text using a language model encoder and encodes the relational structure with a graph encoder. It then quantizes both modalities into a unified token space, preserving modality-specific and cross-modality information. We integrate MedTok into five EHR models and evaluate it on operational and clinical tasks across in-patient and out-patient datasets, including outcome prediction, diagnosis classification, drug recommendation, and risk stratification. Swapping standard EHR tokenizers with MedTok improves AUPRC across all EHR models, by 4.10% on MIMIC-III, 4.78% on MIMIC-IV, and 11.32% on EHRShot, with the largest gains in drug recommendation. Beyond EHR modeling, we demonstrate using MedTok tokenizer with medical QA systems. Our results demonstrate the potential of MedTok as a unified tokenizer for medical codes, improving tokenization for medical foundation models.

nan


Article 889

Title@2025-06-29 (7): Curious Causality-Seeking Agents Learn Meta Causal World

Title: Curious Causality-Seeking Agents Learn Meta Causal World Neugierige Kausalität suchende Agenten lernen Meta Kausale Welt 好奇原因 – – 寻寻代理人学习大原因世界 2506.23068v1

Authors (7): Zhiyu Zhao, Haoxuan Li, Haifeng Zhang, Jun Wang, Francesco Faccio, Jürgen Schmidhuber, Mengyue Yang

When building a world model, a common assumption is that the environment has a single, unchanging underlying causal rule, like applying Newton’s laws to every situation. In reality, what appears as a drifting causal mechanism is often the manifestation of a fixed underlying mechanism seen through a narrow observational window. This brings about a problem that, when building a world model, even subtle shifts in policy or environment states can alter the very observed causal mechanisms. In this work, we introduce the \textbf{Meta-Causal Graph} as world models, a minimal unified representation that efficiently encodes the transformation rules governing how causal structures shift across different latent world states. A single Meta-Causal Graph is composed of multiple causal subgraphs, each triggered by meta state, which is in the latent state space. Building on this representation, we introduce a \textbf{Causality-Seeking Agent} whose objectives are to (1) identify the meta states that trigger each subgraph, (2) discover the corresponding causal relationships by agent curiosity-driven intervention policy, and (3) iteratively refine the Meta-Causal Graph through ongoing curiosity-driven exploration and agent experiences. Experiments on both synthetic tasks and a challenging robot arm manipulation task demonstrate that our method robustly captures shifts in causal dynamics and generalizes effectively to previously unseen contexts.

nan


Article 890

Title@2025-06-29 (7): Measuring How LLMs Internalize Human Psychological Concepts: A preliminary analysis

Title: Measuring How LLMs Internalize Human Psychological Concepts: A preliminary analysis Messen, wie LLMs menschliche Psychologische Konzepte internalisieren: Eine vorläufige Analyse 衡量LLM女士如何将人类心理心理概念内化:初步分析 2506.23055v1

Authors (4): Hiro Taiyo Hamada, Ippei Fujisawa, Genji Kawakita, Yuki Yamada

Large Language Models (LLMs) such as ChatGPT have shown remarkable abilities in producing human-like text. However, it is unclear how accurately these models internalize concepts that shape human thought and behavior. Here, we developed a quantitative framework to assess concept alignment between LLMs and human psychological dimensions using 43 standardized psychological questionnaires, selected for their established validity in measuring distinct psychological constructs. Our method evaluates how accurately language models reconstruct and classify questionnaire items through pairwise similarity analysis. We compared resulting cluster structures with the original categorical labels using hierarchical clustering. A GPT-4 model achieved superior classification accuracy (66.2\%), significantly outperforming GPT-3.5 (55.9\%) and BERT (48.1\%), all exceeding random baseline performance (31.9\%). We also demonstrated that the estimated semantic similarity from GPT-4 is associated with Pearson’s correlation coefficients of human responses in multiple psychological questionnaires. This framework provides a novel approach to evaluate the alignment of the human-LLM concept and identify potential representational biases. Our findings demonstrate that modern LLMs can approximate human psychological constructs with measurable accuracy, offering insights for developing more interpretable AI systems.

nan


Article 891

Title@2025-06-29 (7): A Framework of Decision-Relevant Observability: Reinforcement Learning Converges Under Relative Ignorability

Title: A Framework of Decision-Relevant Observability: Reinforcement Learning Converges Under Relative Ignorability Ein Rahmen für entscheidungsrelevante Beobachtungsfähigkeit: Verstärktes Lernen unter relativer Ignorierbarkeit 与决策相关的可观察性框架:相对无知下的加强学习联系 2504.07722v5

Authors (1): MaryLena Bleile

From clinical dosing algorithms to autonomous robots, sequential decision-making systems routinely operate with missing or incomplete data. Classical reinforcement learning theory, which is commonly used to solve sequential decision problems, assumes Markovian observability, which may not hold under partial observability. Causal inference paradigms formalise ignorability of missingness. We show these views can be unified and generalized in order to guarantee Q-learning convergence even when the Markov property fails. To do so, we introduce the concept of \emph{relative ignorability}. Relative ignorability is a graphical-causal criterion which refines the requirements for accurate decision-making based on incomplete data. Theoretical results and simulations both reveal that non-markovian stochastic processes whose missingness is relatively ignorable with respect to causal estimands can still be optimized using standard Reinforcement Learning algorithms. These results expand the theoretical foundations of safe, data-efficient AI to real-world environments where complete information is unattainable.

nan


Article 892

Title@2025-06-29 (7): Double-Diffusion: Diffusion Conditioned Diffusion Probabilistic Model For Air Quality Prediction

Title: Double-Diffusion: Diffusion Conditioned Diffusion Probabilistic Model For Air Quality Prediction Doppel-Diffusion: Diffusion Bedingt Diffusion probabilistische Modell für die Luftqualität Vorhersage 双重扩散: 空气质量预测的有条件扩散概率模型 2506.23053v1

Authors (4): Hanlin Dong, Arian Prabowo, Hao Xue, Flora D. Salim

Air quality prediction is a challenging forecasting task due to its spatio-temporal complexity and the inherent dynamics as well as uncertainty. Most of the current models handle these two challenges by applying Graph Neural Networks or known physics principles, and quantifying stochasticity through probabilistic networks like Diffusion models. Nevertheless, finding the right balancing point between the certainties and uncertainties remains an open question. Therefore, we propose Double-Diffusion, a novel diffusion probabilistic model that harnesses the power of known physics to guide air quality forecasting with stochasticity. To the best of our knowledge, while precedents have been made of using conditional diffusion models to predict air pollution, this is the first attempt to use physics as a conditional generative approach for air quality prediction. Along with a sampling strategy adopted from image restoration and a new denoiser architecture, Double-Diffusion ranks first in most evaluation scenarios across two real-life datasets compared with other probabilistic models, it also cuts inference time by 50% to 30% while enjoying an increase between 3-12% in Continuous Ranked Probabilistic Score (CRPS).

nan


Article 893

Title@2025-06-29 (7): DSAC: Distributional Soft Actor-Critic for Risk-Sensitive Reinforcement Learning

Title: DSAC: Distributional Soft Actor-Critic for Risk-Sensitive Reinforcement Learning DSAC: Distributive Soft Actor-Critic für risikosensibles Verstärkungslernen DSAC: 用于风险敏感强化学习的分布软软活性-批评 2004.14547v3

Authors (6): Xiaoteng Ma, Junyao Chen, Li Xia, Jun Yang, Qianchuan Zhao, Zhengyuan Zhou

We present Distributional Soft Actor-Critic (DSAC), a distributional reinforcement learning (RL) algorithm that combines the strengths of distributional information of accumulated rewards and entropy-driven exploration from Soft Actor-Critic (SAC) algorithm. DSAC models the randomness in both action and rewards, surpassing baseline performances on various continuous control tasks. Unlike standard approaches that solely maximize expected rewards, we propose a unified framework for risk-sensitive learning, one that optimizes the risk-related objective while balancing entropy to encourage exploration. Extensive experiments demonstrate DSAC’s effectiveness in enhancing agent performances for both risk-neutral and risk-sensitive control tasks.

nan


Article 894

Title@2025-06-29 (7): ReMem: Mutual Information-Aware Fine-tuning of Pretrained Vision Transformers for Effective Knowledge Distillation

Title: ReMem: Mutual Information-Aware Fine-tuning of Pretrained Vision Transformers for Effective Knowledge Distillation ReMem: Gegenseitige Information-Bewusst Feinabstimmung von vortrainierten Visionstransformatoren für effektive Wissensdestillation ReMEM:为有效知识蒸馏而对预先培训的愿景变异器进行相互信息-软件微调 2506.23041v1

Authors (9): Chengyu Dong, Huan Gui, Noveen Sachdeva, Long Jin, Ke Yin, Jingbo Shang, Lichan Hong, Ed H. Chi, Zhe Zhao

Knowledge distillation from pretrained visual representation models offers an effective approach to improve small, task-specific production models. However, the effectiveness of such knowledge transfer drops significantly when distilling from strong models that are pretrained in a large scale. In this paper, we address this challenge for pretrained Vision Transformers (ViTs) by exploring methods to fine-tune them for more effective knowledge transfer. Motivated by the connection between mutual information and distillation effectiveness, we propose to employ mutual information-aware optimization during finetuning. For small or highly-imbalanced downstream datasets where such optimization becomes less effective, we introduce a simple yet effective heuristic of reweighting MLP blocks. This approach is inspired by our observation that top MLP blocks are primarily responsible for mutual information loss. Our method enables small student models to benefit from those pretrained models among the strongest.

nan


Article 895

Title@2025-06-29 (7): MetaSynth: Meta-Prompting-Driven Agentic Scaffolds for Diverse Synthetic Data Generation

Title: MetaSynth: Meta-Prompting-Driven Agentic Scaffolds for Diverse Synthetic Data Generation MetaSynth: Meta-prompting-Driven Agentic Scaffolds für vielfältige synthetische Datengenerierung MetaSynth: 用于多种合成数据生成的元- 制造- 挥发剂脚架 2504.12563v2

Authors (5): Haris Riaz, Sourav Bhabesh, Vinayak Arannil, Miguel Ballesteros, Graham Horwood

Recent smaller language models such Phi-3.5 and Phi-4 rely on synthetic data generated using larger Language models. Questions remain about leveraging synthetic data for other use cases, such as adapting LLMs to specific domains. A key limitation of synthetic data is low diversity, which negatively impacts its downstream applicability for improving other models. To address this, we propose MetaSynth, a method for generating synthetic data that enhances diversity through meta-prompting, where a language model orchestrates multiple “expert” LLM agents to collaboratively generate data. Using only 25 million tokens of synthetic data generated with MetaSynth, we successfully adapt a well-trained LLM (Mistral-7B-v0.3) to two specialized domains-Finance and Biomedicine-without compromising the capabilities of the resulting model in general tasks. In addition, we evaluate the diversity of our synthetic data using seven automated metrics, and find that it approaches the diversity of LLM pre-training corpora. Continually pre-training Mistral-7B-v0.3 with MetaSynth notably outperforms the base LLM, showing improvements of up to 4.08% in Finance and 13.75% in Biomedicine. The same model shows degraded performance when trained on data generated using a template prompt, even when the template includes prior generations and varying In-Context exemplars of real data. Our findings suggest that a few million tokens of diverse synthetic data without mixing any real data, is sufficient for effective domain adaptation when using MetaSynth.

nan


Article 896

Title@2025-06-29 (7): The Oracle Complexity of Simplex-based Matrix Games: Linear Separability and Nash Equilibria

Title: The Oracle Complexity of Simplex-based Matrix Games: Linear Separability and Nash Equilibria Die Oracle Komplexität von Simplex-basierten Matrix Games: Lineare Separierbarkeit und Nash Equilibria 简单克斯基母体运动会的甲骨文复杂性:线性分离和纳什平衡 2412.06990v2

Authors (2): Guy Kornowski, Ohad Shamir

We study the problem of solving matrix games of the form $\max_{\mathbf{w}\in\mathcal{W}}\min_{\mathbf{p}\in\Delta}\mathbf{p}^{\top}A\mathbf{w}$, where $A$ is some matrix and $\Delta$ is the probability simplex. This problem encapsulates canonical tasks such as finding a linear separator and computing Nash equilibria in zero-sum games. However, perhaps surprisingly, its inherent complexity (as formalized in the standard framework of oracle complexity [Nemirovski and Yudin, 1983]) is not well-understood. In this work, we first identify different oracle models which are implicitly used by prior algorithms, amounting to multiplying the matrix $A$ by a vector from either one or both sides. We then prove complexity lower bounds for algorithms under both access models, which in particular imply a separation between them. Specifically, we start by showing that algorithms for linear separability based on one-sided multiplications must require $\Omega(\gamma_A^{-2})$ iterations, where $\gamma_A$ is the margin, as matched by the Perceptron algorithm. We then prove that accelerated algorithms for this task, which utilize multiplications from both sides, must require $\tilde{\Omega}(\gamma_{A}^{-2/3})$ iterations, establishing the first oracle complexity barrier for such algorithms. Finally, by adapting our lower bound to $\ell_1$ geometry, we prove that computing an $\epsilon$-approximate Nash equilibrium requires $\tilde{\Omega}(\epsilon^{-2/5})$ iterations, which is an exponential improvement over the previously best-known lower bound due to Hadiji et al. [2024].

nan


Article 897

Title@2025-06-28 (6): Fragile, Robust, and Antifragile: A Perspective from Parameter Responses in Reinforcement Learning Under Stress

Title: Fragile, Robust, and Antifragile: A Perspective from Parameter Responses in Reinforcement Learning Under Stress Fragile, robuste und antifragile: Eine Perspektive aus Parameterreaktionen bei der Stärkung des Lernens unter Stress 脆弱、强力和抗脆弱:强化学习压力下的参数对策的视角 2506.23036v1

Authors (2): Zain ul Abdeen, Ming Jin

This paper explores Reinforcement learning (RL) policy robustness by systematically analyzing network parameters under internal and external stresses. Inspired by synaptic plasticity in neuroscience, synaptic filtering introduces internal stress by selectively perturbing parameters, while adversarial attacks apply external stress through modified agent observations. This dual approach enables the classification of parameters as fragile, robust, or antifragile, based on their influence on policy performance in clean and adversarial settings. Parameter scores are defined to quantify these characteristics, and the framework is validated on PPO-trained agents in Mujoco continuous control environments. The results highlight the presence of antifragile parameters that enhance policy performance under stress, demonstrating the potential of targeted filtering techniques to improve RL policy adaptability. These insights provide a foundation for future advancements in the design of robust and antifragile RL systems.

nan


Article 898

Title@2025-06-28 (6): Feature-Wise Mixing for Mitigating Contextual Bias in Predictive Supervised Learning

Title: Feature-Wise Mixing for Mitigating Contextual Bias in Predictive Supervised Learning Feature-Wise-Mixing für die Vermeidung von Kontext-Bias in Predictive Supervised Learning 预测监督的学习中减轻背景偏见的功能- Wise 混音 2506.23033v1

Authors (1): Yash Vardhan Tomar

Bias in predictive machine learning (ML) models is a fundamental challenge due to the skewed or unfair outcomes produced by biased models. Existing mitigation strategies rely on either post-hoc corrections or rigid constraints. However, emerging research claims that these techniques can limit scalability and reduce generalizability. To address this, this paper introduces a feature-wise mixing framework to mitigate contextual bias. This was done by redistributing feature representations across multiple contextual datasets. To assess feature-wise mixing’s effectiveness, four ML classifiers were trained using cross-validation and evaluated with bias-sensitive loss functions, including disparity metrics and mean squared error (MSE), which served as a standard measure of predictive performance. The proposed method achieved an average bias reduction of 43.35% and a statistically significant decrease in MSE across all classifiers trained on mixed datasets. Additionally, benchmarking against established bias mitigation techniques found that feature-wise mixing consistently outperformed SMOTE oversampling and demonstrated competitive effectiveness without requiring explicit bias attribute identification. Feature-wise mixing efficiently avoids the computational overhead typically associated with fairness-aware learning algorithms. Future work could explore applying feature-wise mixing for real-world fields where accurate predictions are necessary.

nan


Article 899

Title@2025-06-28 (6): ATTENTION2D: Communication Efficient Distributed Self-Attention Mechanism

Title: ATTENTION2D: Communication Efficient Distributed Self-Attention Mechanism ACHTUNG2D: Kommunikation Effizient verteilter Selbstaufmerksamkeitsmechanismus 注意2D: 沟通高效分配自发性传播机制 2503.15758v2

Authors (1): Venmugil Elango

Transformer-based models have emerged as a leading architecture for natural language processing, natural language generation, and image generation tasks. A fundamental element of the transformer architecture is self-attention, which allows the model to capture intricate dependencies within the data. However, the self-attention mechanism also incurs significant computational and memory costs, particularly for long sequences. In this paper, we introduce ATTENTION2D, a novel approach that exploits parallelism along two dimensions - query and key/value - of the self-attention operation. This method enables efficient distribution and parallelization of computations across multiple devices. Our approach facilitates asymptotically faster training and inference phases compared to previous methods, without relying on approximations or incurring additional computational or memory overheads. Furthermore, unlike existing techniques that struggle to scale with an increasing number of processing units, our approach effectively scales with additional processing units. Our experimental results confirm the effectiveness of our method in improving communication efficiency and scalability. Compared to Ring Attention, our approach demonstrated up to a 5x performance boost on a GPT-3-like model using 64 NVIDIA A100 GPUs across 16 nodes, and up to a 9.4x performance boost on 64 NVIDIA H100 GPUs across 64 nodes.

nan


Article 900

Title@2025-06-28 (6): VisionScores – A system-segmented image score dataset for deep learning tasks

Title: VisionScores – A system-segmented image score dataset for deep learning tasks VisionScores – Ein systemsegmentierter Bildscore-Datensatz für Deep-Learning-Aufgaben 愿景-图像分集数据集,用于深层学习任务 2506.23030v1

Authors (2): Alejandro Romero Amezcua, Mariano José Juan Rivera Meraz

VisionScores presents a novel proposal being the first system-segmented image score dataset, aiming to offer structure-rich, high information-density images for machine and deep learning tasks. Delimited to two-handed piano pieces, it was built to consider not only certain graphic similarity but also composition patterns, as this creative process is highly instrument-dependent. It provides two scenarios in relation to composer and composition type. The first, formed by 14k samples, considers works from different authors but the same composition type, specifically, Sonatinas. The latter, consisting of 10.8K samples, presents the opposite case, various composition types from the same author, being the one selected Franz Liszt. All of the 24.8k samples are formatted as grayscale jpg images of $128 \times 512$ pixels. VisionScores supplies the users not only the formatted samples but the systems’ order and pieces’ metadata. Moreover, unsegmented full-page scores and the pre-formatted images are included for further analysis.

nan


Article 901

Title@2025-06-28 (6): Multimodal Contrastive Representation Learning in Augmented Biomedical Knowledge Graphs

Title: Multimodal Contrastive Representation Learning in Augmented Biomedical Knowledge Graphs Multimodales Kontrastives Repräsentationslernen in Augmented Biomedical Knowledge Graphs 生物医学知识强化图中多模式差异代表性学习 2501.01644v2

Authors (4): Tien Dang, Viet Thanh Duy Nguyen, Minh Tuan Le, Truong-Son Hy

Biomedical Knowledge Graphs (BKGs) integrate diverse datasets to elucidate complex relationships within the biomedical field. Effective link prediction on these graphs can uncover valuable connections, such as potential novel drug-disease relations. We introduce a novel multimodal approach that unifies embeddings from specialized Language Models (LMs) with Graph Contrastive Learning (GCL) to enhance intra-entity relationships while employing a Knowledge Graph Embedding (KGE) model to capture inter-entity relationships for effective link prediction. To address limitations in existing BKGs, we present PrimeKG++, an enriched knowledge graph incorporating multimodal data, including biological sequences and textual descriptions for each entity type. By combining semantic and relational information in a unified representation, our approach demonstrates strong generalizability, enabling accurate link predictions even for unseen nodes. Experimental results on PrimeKG++ and the DrugBank drug-target interaction dataset demonstrate the effectiveness and robustness of our method across diverse biomedical datasets. Our source code, pre-trained models, and data are publicly available at https://github.com/HySonLab/BioMedKG

nan


Article 902

Title@2025-06-28 (6): Spectra 1.1: Scaling Laws and Efficient Inference for Ternary Language Models

Title: Spectra 1.1: Scaling Laws and Efficient Inference for Ternary Language Models Spectra 1.1: Skalierungsgesetze und effiziente Schlussfolgerungen für Ternäre Sprachmodelle Spectra 1.1:扩大法律规模和短期语言模式的有效推论 2506.23025v1

Authors (8): Tejas Vaidhya, Ayush Kaushal, Vineet Jain, Francis Couture Harpin, Prashant Shishodia, Majid Behbahani, Yuriy Nevmyvaka, Irina Rish

Large language models (LLMs) are increasingly used across research and industry applications, yet their inference efficiency remains a significant challenge. As the computational power of modern GPU architectures continuously improves, their memory bandwidth and capacity have not scaled proportionally, creating a critical bottleneck during inference. To address this, we investigate ternary language models (TriLMs) that employ quantization-aware training to significantly reduce memory requirements. We first analyze the scalability of TriLMs by conducting a scaling law analysis, revealing that TriLMs benefit more from increasing training data than from scaling model parameters. Based on this observation, we introduce Spectra-1.1, an open suite of TriLMs trained on up to 1.2 trillion tokens, demonstrating sustained performance gains at scale. Furthermore, to improve inference efficiency, we propose novel 2-bit and 1.6-bit packing schemes for ternary weights, which demonstrate accelerated inference across various CPU architectures. Also, building on the 2-bit packing, we develop a GPU kernel called TriRun that accelerates end-to-end model inference by up to 5 times compared to floating-point baselines. To encourage further exploration and development of TriLMs, we will release the Spectra-1.1 suite and TriRun inference kernels. Overall, our work lays the foundation for building and deploying efficient LLMs, providing a valuable resource for the research community.

nan


Article 903

Title@2025-06-28 (6): BWLer: Barycentric Weight Layer Elucidates a Precision-Conditioning Tradeoff for PINNs

Title: BWLer: Barycentric Weight Layer Elucidates a Precision-Conditioning Tradeoff for PINNs BWLer: Barycentric Weight Layer erklärt einen Precision-Conditioning Tradeoff für PINNs BWLer: 巴里中心重量层 2506.23024v1

Authors (6): Jerry Liu, Yasa Baig, Denise Hui Jean Lee, Rajat Vadiraj Dwaraknath, Atri Rudra, Chris Ré

Physics-informed neural networks (PINNs) offer a flexible way to solve partial differential equations (PDEs) with machine learning, yet they still fall well short of the machine-precision accuracy many scientific tasks demand. In this work, we investigate whether the precision ceiling comes from the ill-conditioning of the PDEs or from the typical multi-layer perceptron (MLP) architecture. We introduce the Barycentric Weight Layer (BWLer), which models the PDE solution through barycentric polynomial interpolation. A BWLer can be added on top of an existing MLP (a BWLer-hat) or replace it completely (explicit BWLer), cleanly separating how we represent the solution from how we take derivatives for the PDE loss. Using BWLer, we identify fundamental precision limitations within the MLP: on a simple 1-D interpolation task, even MLPs with O(1e5) parameters stall around 1e-8 RMSE – about eight orders above float64 machine precision – before any PDE terms are added. In PDE learning, adding a BWLer lifts this ceiling and exposes a tradeoff between achievable accuracy and the conditioning of the PDE loss. For linear PDEs we fully characterize this tradeoff with an explicit error decomposition and navigate it during training with spectral derivatives and preconditioning. Across five benchmark PDEs, adding a BWLer on top of an MLP improves RMSE by up to 30x for convection, 10x for reaction, and 1800x for wave equations while remaining compatible with first-order optimizers. Replacing the MLP entirely lets an explicit BWLer reach near-machine-precision on convection, reaction, and wave problems (up to 10 billion times better than prior results) and match the performance of standard PINNs on stiff Burgers’ and irregular-geometry Poisson problems. Together, these findings point to a practical path for combining the flexibility of PINNs with the precision of classical spectral solvers.

nan


Article 904

Title@2025-06-28 (6): Scenario-Based Hierarchical Reinforcement Learning for Automated Driving Decision Making

Title: Scenario-Based Hierarchical Reinforcement Learning for Automated Driving Decision Making Szenariobasiertes Hierarchisches Verstärkungslernen für Automatisiertes Fahren bei Entscheidungsfindungen 用于自动驱动决策的基于情景的等级强化学习 2506.23023v1

Authors (3): M. Youssef Abdelhamid, Lennart Vater, Zlatan Ajanovic

Developing decision-making algorithms for highly automated driving systems remains challenging, since these systems have to operate safely in an open and complex environments. Reinforcement Learning (RL) approaches can learn comprehensive decision policies directly from experience and already show promising results in simple driving tasks. However, current approaches fail to achieve generalizability for more complex driving tasks and lack learning efficiency. Therefore, we present Scenario-based Automated Driving Reinforcement Learning (SAD-RL), the first framework that integrates Reinforcement Learning (RL) of hierarchical policy in a scenario-based environment. A high-level policy selects maneuver templates that are evaluated and executed by a low-level control logic. The scenario-based environment allows to control the training experience for the agent and to explicitly introduce challenging, but rate situations into the training process. Our experiments show that an agent trained using the SAD-RL framework can achieve safe behaviour in easy as well as challenging situations efficiently. Our ablation studies confirmed that both HRL and scenario diversity are essential for achieving these results.

nan


Article 905

Title@2025-06-28 (6): The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models

Title: The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models Die begrenzten Auswirkungen medizinischer Anpassung von großen Sprach- und Visions-Sprachenmodellen 大语言和视觉语言模式医学适应的有限影响 2411.08870v3

Authors (5): Daniel P. Jeong, Pranav Mani, Saurabh Garg, Zachary C. Lipton, Michael Oberst

Several recent works seek to adapt general-purpose large language models (LLMs) and vision-language models (VLMs) for medical applications through continued pretraining on publicly available biomedical corpora. These works typically claim that such domain-adaptive pretraining improves performance on various downstream medical tasks, such as answering medical exam questions. In this paper, we compare ten “medical” LLMs and two VLMs against their corresponding base models, arriving at a different conclusion: all medical VLMs and nearly all medical LLMs fail to consistently improve over their base models in the zero-/few-shot prompting and supervised fine-tuning regimes for medical question answering (QA). For instance, on clinical-note-based QA tasks in the 3-shot setting, medical LLMs outperform their base models in only 26.7% of cases, reach a (statistical) tie in 16.7% of cases, and perform significantly worse in the remaining 56.7% of cases. Our conclusions are based on (i) comparing each medical model directly against its base model; (ii) optimizing the prompts for each model separately in zero-/few-shot prompting; and (iii) accounting for statistical uncertainty in comparisons. Our findings suggest that state-of-the-art general-domain models may already exhibit strong medical knowledge and reasoning capabilities, and offer recommendations to strengthen the conclusions of future studies.

nan


Article 906

Title@2025-06-28 (6): On Universality of Non-Separable Approximate Message Passing Algorithms

Title: On Universality of Non-Separable Approximate Message Passing Algorithms Über Universalität nicht-trennbarer ungefährer Botschaften 关于不可分离的近似信息传递算法的普遍性 2506.23010v1

Authors (3): Max Lovig, Tianhao Wang, Zhou Fan

Mean-field characterizations of first-order iterative algorithms – including Approximate Message Passing (AMP), stochastic and proximal gradient descent, and Langevin diffusions – have enabled a precise understanding of learning dynamics in many statistical applications. For algorithms whose non-linearities have a coordinate-separable form, it is known that such characterizations enjoy a degree of universality with respect to the underlying data distribution. However, mean-field characterizations of non-separable algorithm dynamics have largely remained restricted to i.i.d. Gaussian or rotationally-invariant data. In this work, we initiate a study of universality for non-separable AMP algorithms. We identify a general condition for AMP with polynomial non-linearities, in terms of a Bounded Composition Property (BCP) for their representing tensors, to admit a state evolution that holds universally for matrices with non-Gaussian entries. We then formalize a condition of BCP-approximability for Lipschitz AMP algorithms to enjoy a similar universal guarantee. We demonstrate that many common classes of non-separable non-linearities are BCP-approximable, including local denoisers, spectral denoisers for generic signals, and compositions of separable functions with generic linear maps, implying the universality of state evolution for AMP algorithms employing these non-linearities.

nan


Article 907

Title@2025-06-28 (6): A Fundamental Accuracy–Robustness Trade-off in Regression and Classification

Title: A Fundamental Accuracy–Robustness Trade-off in Regression and Classification Eine grundlegende Genauigkeit–Robustheit Trade-off in Regression und Klassifizierung 回归和分类方面的基本准确性-易腐性 2411.05853v2

Authors (1): Sohail Bahmani

We derive a fundamental trade-off between standard and adversarial risk in a rather general situation that formalizes the following simple intuition: “If no (nearly) optimal predictor is smooth, adversarial robustness comes at the cost of accuracy.” As a concrete example, we evaluate the derived trade-off in regression with polynomial ridge functions under mild regularity conditions. Generalizing our analysis of this example, we formulate a necessary condition under which adversarial robustness can be achieved without significant degradation of the accuracy. This necessary condition is expressed in terms of a quantity that resembles the Poincar'{e} constant of the data distribution.

nan


Article 908

Title@2025-06-28 (6): Are Convex Optimization Curves Convex?

Title: Are Convex Optimization Curves Convex? Sind Convex-Optimierungskurven Convex? Convex 优化曲线曲线 Convex? 2503.10138v3

Authors (3): Guy Barzilai, Ohad Shamir, Moslem Zamani

In this paper, we study when we might expect the optimization curve induced by gradient descent to be \emph{convex} – precluding, for example, an initial plateau followed by a sharp decrease, making it difficult to decide when optimization should stop. Although such undesirable behavior can certainly occur when optimizing general functions, might it also occur in the benign and well-studied case of smooth convex functions? As far as we know, this question has not been tackled in previous work. We show, perhaps surprisingly, that the answer crucially depends on the choice of the step size. In particular, for the range of step sizes which are known to result in monotonic convergence to an optimal value, we characterize a regime where the optimization curve will be provably convex, and a regime where the curve can be non-convex. We also extend our results to gradient flow, and to the closely-related but different question of whether the gradient norm decreases monotonically.

nan


Article 909

Title@2025-06-28 (6): A Data Science Approach to Calcutta High Court Judgments: An Efficient LLM and RAG-powered Framework for Summarization and Similar Cases Retrieval

Title: A Data Science Approach to Calcutta High Court Judgments: An Efficient LLM and RAG-powered Framework for Summarization and Similar Cases Retrieval A Data Science Approach to Kalkutta High Court Urteile: Ein effizienter LLM- und RAG-gestützter Rahmen für Zusammenfassungen und ähnliche Fälle 加尔各答高等法院判决的数据科学方法:高效LLM和RAG驱动的总结和类似案件检索框架 2507.01058v1

Authors (5): Puspendu Banerjee, Aritra Mazumdar, Wazib Ansar, Saptarsi Goswami, Amlan Chakrabarti

The judiciary, as one of democracy’s three pillars, is dealing with a rising amount of legal issues, needing careful use of judicial resources. This research presents a complex framework that leverages Data Science methodologies, notably Large Language Models (LLM) and Retrieval-Augmented Generation (RAG) techniques, to improve the efficiency of analyzing Calcutta High Court verdicts. Our framework focuses on two key aspects: first, the creation of a robust summarization mechanism that distills complex legal texts into concise and coherent summaries; and second, the development of an intelligent system for retrieving similar cases, which will assist legal professionals in research and decision making. By fine-tuning the Pegasus model using case head note summaries, we achieve significant improvements in the summarization of legal cases. Our two-step summarizing technique preserves crucial legal contexts, allowing for the production of a comprehensive vector database for RAG. The RAG-powered framework efficiently retrieves similar cases in response to user queries, offering thorough overviews and summaries. This technique not only improves legal research efficiency, but it also helps legal professionals and students easily acquire and grasp key legal information, benefiting the overall legal scenario.

nan


Article 910

Title@2025-06-28 (6): A Reinforcement Learning Approach for Optimal Control in Microgrids

Title: A Reinforcement Learning Approach for Optimal Control in Microgrids Ein Verstärkungs-Lernansatz zur optimalen Steuerung in Microgrids 微电网优化控制强化学习方法 2506.22995v1

Authors (4): Davide Salaorni, Federico Bianchi, Francesco Trovò, Marcello Restelli

The increasing integration of renewable energy sources (RESs) is transforming traditional power grid networks, which require new approaches for managing decentralized energy production and consumption. Microgrids (MGs) provide a promising solution by enabling localized control over energy generation, storage, and distribution. This paper presents a novel reinforcement learning (RL)-based methodology for optimizing microgrid energy management. Specifically, we propose an RL agent that learns optimal energy trading and storage policies by leveraging historical data on energy production, consumption, and market prices. A digital twin (DT) is used to simulate the energy storage system dynamics, incorporating degradation factors to ensure a realistic emulation of the analysed setting. Our approach is validated through an experimental campaign using real-world data from a power grid located in the Italian territory. The results indicate that the proposed RL-based strategy outperforms rule-based methods and existing RL benchmarks, offering a robust solution for intelligent microgrid management.

nan


Article 911

Title@2025-06-28 (6): Foundation Models for Wearable Movement Data in Mental Health Research

Title: Foundation Models for Wearable Movement Data in Mental Health Research Gründungsmodelle für Wearable Movement Data in der Mental Health Research 心理健康研究中穿戴运动数据模型基金会 2411.15240v4

Authors (5): Franklin Y. Ruan, Aiwei Zhang, Jenny Y. Oh, SouYoung Jin, Nicholas C. Jacobson

Pretrained foundation models and transformer architectures have driven the success of large language models (LLMs) and other modern AI breakthroughs. However, similar advancements in health data modeling remain limited due to the need for innovative adaptations. Wearable movement data offers a valuable avenue for exploration, as it’s a core feature in nearly all commercial smartwatches, well established in clinical and mental health research, and the sequential nature of the data shares similarities to language. We introduce the Pretrained Actigraphy Transformer (PAT), the first open source foundation model designed for time-series wearable movement data. Leveraging transformer-based architectures and novel techniques, such as patch embeddings, and pretraining on data from 29,307 participants in a national U.S. sample, PAT achieves state-of-the-art performance in several mental health prediction tasks. PAT is also lightweight and easily interpretable, making it a robust tool for mental health research. GitHub: https://github.com/njacobsonlab/Pretrained-Actigraphy-Transformer/

nan


Article 912

Title@2025-06-28 (6): Kernel Outlier Detection

Title: Kernel Outlier Detection Kernaussenerkennung 核心外星探测 2506.22994v1

Authors (3): Can Hakan Dağıdır, Mia Hubert, Peter J. Rousseeuw

A new anomaly detection method called kernel outlier detection (KOD) is proposed. It is designed to address challenges of outlier detection in high-dimensional settings. The aim is to overcome limitations of existing methods, such as dependence on distributional assumptions or on hyperparameters that are hard to tune. KOD starts with a kernel transformation, followed by a projection pursuit approach. Its novelties include a new ensemble of directions to search over, and a new way to combine results of different direction types. This provides a flexible and lightweight approach for outlier detection. Our empirical evaluations illustrate the effectiveness of KOD on three small datasets with challenging structures, and on four large benchmark datasets.

nan


Article 913

Title@2025-06-28 (6): Recommender Systems for Good (RS4Good): Survey of Use Cases and a Call to Action for Research that Matters

Title: Recommender Systems for Good (RS4Good): Survey of Use Cases and a Call to Action for Research that Matters Recommender Systems for Good (RS4Good): Umfrage von Use Cases und ein Aufruf zur Aktion für Forschung, die zählt 良好建议系统(RS4Good):调查使用案例和呼吁采取行动开展重要研究 2411.16645v2

Authors (4): Dietmar Jannach, Alan Said, Marko Tkalčič, Markus Zanker

In the area of recommender systems, the vast majority of research efforts is spent on developing increasingly sophisticated recommendation models, also using increasingly more computational resources. Unfortunately, most of these research efforts target a very small set of application domains, mostly e-commerce and media recommendation. Furthermore, many of these models are never evaluated with users, let alone put into practice. The scientific, economic and societal value of much of these efforts by scholars therefore remains largely unclear. To achieve a stronger positive impact resulting from these efforts, we posit that we as a research community should more often address use cases where recommender systems contribute to societal good (RS4Good). In this opinion piece, we first discuss a number of examples where the use of recommender systems for problems of societal concern has been successfully explored in the literature. We then proceed by outlining a paradigmatic shift that is needed to conduct successful RS4Good research, where the key ingredients are interdisciplinary collaborations and longitudinal evaluation approaches with humans in the loop.

nan


Article 914

Title@2025-06-28 (6): Early Stopping Tabular In-Context Learning

Title: Early Stopping Tabular In-Context Learning Frühzeitiges Stoppen des tabellarischen In-Context-Lernens 早期停止制表列表内容学习 2506.21387v2

Authors (3): Jaris Küken, Lennart Purucker, Frank Hutter

Tabular foundation models have shown strong performance across various tabular learning tasks via in-context learning, offering robust generalization without any downstream finetuning. However, their inference-time costs remain high, particularly for larger datasets. To address this, we propose early-stopping the in-context learning process. We achieve this by dynamically evaluating whether to stop in-context learning after each Transformer encoder layer. Once stopped, we decode the embedding using a pre-trained layer-wise decoder. Experiments across 34 small classification tasks size show that early stopping in-context learning accelerates inference by up to x1.3 with negligible degradation in predictive performance. To assess scalability, we further evaluate our method on five larger classification tasks, achieving speedups of up to x2.2. Our results demonstrate the potential of early exiting as an effective and practical strategy for improving the efficiency of tabular in-context learning.

nan


Article 915

Title@2025-06-28 (6): Cybersecurity-Focused Anomaly Detection in Connected Autonomous Vehicles Using Machine Learning

Title: Cybersecurity-Focused Anomaly Detection in Connected Autonomous Vehicles Using Machine Learning Cybersecurity-fokussierte Anomalienerkennung in vernetzten autonomen Fahrzeugen mit maschinellem Lernen 使用机器学习的连连连自主车辆网络安全-以计算机安全为主的异常现象探测 2506.22984v1

Authors (6): Prathyush Kumar Reddy Lebaku, Lu Gao, Yunpeng Zhang, Zhixia Li, Yongxin Liu, Tanvir Arafin

Anomaly detection in connected autonomous vehicles (CAVs) is crucial for maintaining safe and reliable transportation networks, as CAVs can be susceptible to sensor malfunctions, cyber-attacks, and unexpected environmental disruptions. This study explores an anomaly detection approach by simulating vehicle behavior, generating a dataset that represents typical and atypical vehicular interactions. The dataset includes time-series data of position, speed, and acceleration for multiple connected autonomous vehicles. We utilized machine learning models to effectively identify abnormal driving patterns. First, we applied a stacked Long Short-Term Memory (LSTM) model to capture temporal dependencies and sequence-based anomalies. The stacked LSTM model processed the sequential data to learn standard driving behaviors. Additionally, we deployed a Random Forest model to support anomaly detection by offering ensemble-based predictions, which enhanced model interpretability and performance. The Random Forest model achieved an R2 of 0.9830, MAE of 5.746, and a 95th percentile anomaly threshold of 14.18, while the stacked LSTM model attained an R2 of 0.9998, MAE of 82.425, and a 95th percentile anomaly threshold of 265.63. These results demonstrate the models’ effectiveness in accurately predicting vehicle trajectories and detecting anomalies in autonomous driving scenarios.

nan


Article 916

Title@2025-06-28 (6): Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization

Title: Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization Feintuning Next-Scale Visual Autoregressive Modelle mit gruppenrelativer Politikoptimierung 采用群体相对政策优化优化的 下尺度视觉自动递减模型 2505.23331v2

Authors (2): Matteo Gallici, Haitz Sáez de Ocáriz Borde

Fine-tuning pre-trained generative models with Reinforcement Learning (RL) has emerged as an effective approach for aligning outputs more closely with nuanced human preferences. In this paper, we investigate the application of Group Relative Policy Optimization (GRPO) to fine-tune next-scale visual autoregressive (VAR) models. Our empirical results demonstrate that this approach enables alignment to intricate reward signals derived from aesthetic predictors and CLIP embeddings, significantly enhancing image quality and enabling precise control over the generation style. Interestingly, by leveraging CLIP, our method can help VAR models generalize beyond their initial ImageNet distribution: through RL-driven exploration, these models can generate images aligned with prompts referencing image styles that were absent during pre-training. In summary, we show that RL-based fine-tuning is both efficient and effective for VAR models, benefiting particularly from their fast inference speeds, which are advantageous for online sampling, an aspect that poses significant challenges for diffusion-based alternatives.

nan


Article 917

Title@2025-06-28 (6): Time-MQA: Time Series Multi-Task Question Answering with Context Enhancement

Title: Time-MQA: Time Series Multi-Task Question Answering with Context Enhancement Time-MQA: Zeitreihe Multi-Task-Fragebeantwortung mit Kontextverbesserung 时间-MQA:时间系列多任务问题,加强背景回答 2503.01875v2

Authors (8): Yaxuan Kong, Yiyuan Yang, Yoontae Hwang, Wenjie Du, Stefan Zohren, Zhangyang Wang, Ming Jin, Qingsong Wen

Time series data are foundational in finance, healthcare, and energy domains. However, most existing methods and datasets remain focused on a narrow spectrum of tasks, such as forecasting or anomaly detection. To bridge this gap, we introduce Time Series Multi-Task Question Answering (Time-MQA), a unified framework that enables natural language queries across multiple time series tasks - numerical analytical tasks and open-ended question answering with reasoning. Central to Time-MQA is the TSQA dataset, a large-scale dataset containing $\sim$200k question-answer pairs derived from diverse time series spanning environment, traffic, etc. This comprehensive resource covers various time series lengths and promotes robust model development. We further demonstrate how continually pre-training large language models (Mistral 7B, Llama-3 8B, and Qwen-2.5 7B) on the TSQA dataset enhanced time series reasoning capabilities, moving beyond mere numeric tasks and enabling more advanced and intuitive interactions with temporal data. The complete TSQA dataset, models, user study questionnaires for evaluation, and other related materials have been open-sourced.

nan


Article 918

Title@2025-06-28 (6): On the Generalizability of “Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals”

Title: On the Generalizability of “Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals” Zur Verallgemeinerbarkeit von “Wettbewerb von Mechanismen: Aufspüren, wie Sprachmodelle mit Fakten und Gegenfakten umgehen” 关于“机制的竞争:追踪语言模式如何处理事实和反事实”的一般性 2506.22977v1

Authors (5): Asen Dotsinski, Udit Thakur, Marko Ivanov, Mohammad Hafeez Khan, Maria Heuss

We present a reproduction study of “Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals” (Ortu et al., 2024), which investigates competition of mechanisms in language models between factual recall and counterfactual in-context repetition. Our study successfully reproduces their primary findings regarding the localization of factual and counterfactual information, the dominance of attention blocks in mechanism competition, and the specialization of attention heads in handling competing information. We reproduce their results on both GPT-2 (Radford et al., 2019) and Pythia 6.9B (Biderman et al., 2023). We extend their work in three significant directions. First, we explore the generalizability of these findings to even larger models by replicating the experiments on Llama 3.1 8B (Grattafiori et al., 2024), discovering greatly reduced attention head specialization. Second, we investigate the impact of prompt structure by introducing variations where we avoid repeating the counterfactual statement verbatim or we change the premise word, observing a marked decrease in the logit for the counterfactual token. Finally, we test the validity of the authors’ claims for prompts of specific domains, discovering that certain categories of prompts skew the results by providing the factual prediction token as part of the subject of the sentence. Overall, we find that the attention head ablation proposed in Ortu et al. (2024) is ineffective for domains that are underrepresented in their dataset, and that the effectiveness varies based on model architecture, prompt structure, domain and task.

nan


Article 919

Title@2025-06-28 (6): Interpretable LLM-based Table Question Answering

Title: Interpretable LLM-based Table Question Answering Interpretierbare LLM-basierte Tabellenfragebeantwortung 基于表问题的回答 2412.12386v3

Authors (6): Giang Nguyen, Ivan Brugere, Shubham Sharma, Sanjay Kariyappa, Anh Totti Nguyen, Freddy Lecue

Interpretability in Table Question Answering (Table QA) is critical, especially in high-stakes domains like finance and healthcare. While recent Table QA approaches based on Large Language Models (LLMs) achieve high accuracy, they often produce ambiguous explanations of how answers are derived. We propose Plan-of-SQLs (POS), a new Table QA method that makes the model’s decision-making process interpretable. POS decomposes a question into a sequence of atomic steps, each directly translated into an executable SQL command on the table, thereby ensuring that every intermediate result is transparent. Through extensive experiments, we show that: First, POS generates the highest-quality explanations among compared methods, which markedly improves the users’ ability to simulate and verify the model’s decisions. Second, when evaluated on standard Table QA benchmarks (TabFact, WikiTQ, and FeTaQA), POS achieves QA accuracy that is competitive to existing methods, while also offering greater efficiency-requiring significantly fewer LLM calls and table database queries (up to 25x fewer)-and more robust performance on large-sized tables. Finally, we observe high agreement (up to 90.59% in forward simulation) between LLMs and human users when making decisions based on the same explanations, suggesting that LLMs could serve as an effective proxy for humans in evaluating Table QA explanations.

nan


Article 920

Title@2025-06-28 (6): ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment

Title: ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment AktAlign: Nullschnelle feinkörnige Videoklassifikation über sprachgeführte Sequenzausrichtung 通过语言引导序列调整的零热精细视频分类 2506.22967v1

Authors (2): Amir Aghdam, Vincent Tao Hu

We address the task of zero-shot fine-grained video classification, where no video examples or temporal annotations are available for unseen action classes. While contrastive vision-language models such as SigLIP demonstrate strong open-set recognition via mean-pooled image-text similarity, they fail to capture the temporal structure critical for distinguishing fine-grained activities. We introduce ActAlign, a zero-shot framework that formulates video classification as sequence alignment. For each class, a large language model generates an ordered sub-action sequence, which is aligned with video frames using Dynamic Time Warping (DTW) in a shared embedding space. Without any video-text supervision or fine-tuning, ActAlign achieves 30.5% accuracy on the extremely challenging ActionAtlas benchmark, where human accuracy is only 61.6%. ActAlign outperforms billion-parameter video-language models while using approximately 8x less parameters. These results demonstrate that structured language priors, combined with classical alignment techniques, offer a scalable and general approach to unlocking the open-set recognition potential of vision-language models for fine-grained video understanding.

nan


Article 921

Title@2025-06-28 (6): CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation

Title: CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation CN-SBM: Kategorische Blockmodellierung für primäre und residuale Kopiernummernvariation CNSBM: 初级和残余复制品分类块模型 2506.22963v1

Authors (7): Kevin Lam, William Daniels, J Maxwell Douglas, Daniel Lai, Samuel Aparicio, Benjamin Bloem-Reddy, Yongjin Park

Cancer is a genetic disorder whose clonal evolution can be monitored by tracking noisy genome-wide copy number variants. We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on discrete copy number states using a bipartite categorical block model. Unlike models relying on Gaussian or Poisson assumptions, CN-SBM respects the discrete nature of CNV calls and captures subpopulation-specific patterns through block-wise structure. Using a two-stage approach, CN-SBM decomposes CNV data into primary and residual components, enabling detection of both large-scale chromosomal alterations and finer aberrations. We derive a scalable variational inference algorithm for application to large cohorts and high-resolution data. Benchmarks on simulated and real datasets show improved model fit over existing methods. Applied to TCGA low-grade glioma data, CN-SBM reveals clinically relevant subtypes and structured residual variation, aiding patient stratification in survival analysis. These results establish CN-SBM as an interpretable, scalable framework for CNV analysis with direct relevance for tumor heterogeneity and prognosis.

nan


Article 922

Title@2025-06-28 (6): Infinite Sampling: Efficient and Stable Grouped RL Training for Large Language Models

Title: Infinite Sampling: Efficient and Stable Grouped RL Training for Large Language Models Infinite Sampling: Effiziente und stabile RL-Schulung für große Sprachmodelle 无限抽样:高语言模式有效和稳定的组合RL培训 2506.22950v1

Authors (6): Liangyu Wang, Huanyi Xie, Xinhai Wang, Tianjin Huang, Mengdi Li, Di Wang

Group-based reinforcement learning algorithms such as Group Reward Policy Optimization (GRPO) have proven effective for fine-tuning large language models (LLMs) with human feedback. However, generating and storing multiple responses per prompt incurs substantial memory overhead, especially as the sample group size increases, limiting scalability under constrained hardware. We propose Infinite Sampling, a framework that enables efficient and stable GRPO training by decoupling group size from GPU memory usage. It consists of: (1) micro sampling groups that decompose large groups into memory-feasible rounds; (2) continuous sampling that interleaves generation across groups to improve utilization; and (3) a length-aware scheduler combining token-conditioned sequence length prediction with a two-stage plan: global grouping via FPTAS and runtime refill via SJF. Experiments show that our Micro Sampling Groups reduce peak memory usage by over 50% compared to full-group decoding (e.g., from 21.55 GB to 10.64 GB on Qwen3-1.7B). Building on this, Infinite Sampling improves throughput by over 25% compared to the naive micro sampling group method, reducing decoding steps while maintaining full-length completions and memory usage. Our hybrid scheduling ensures efficient and stable GRPO training with larger groups under realistic GPU memory constraints.

nan


Article 923

Title@2025-06-28 (6): A Study on Semi-Supervised Detection of DDoS Attacks under Class Imbalance

Title: A Study on Semi-Supervised Detection of DDoS Attacks under Class Imbalance Eine Studie über die halbüberwachte Erkennung von DDoS-Angriffen unter Klassenungleichgewicht 关于在 “ 不平衡 “ 类下对DDoS攻击的半有效探测研究 2506.22949v1

Authors (4): Ehsan Hallaji, Vaishnavi Shanmugam, Roozbeh Razavi-Far, Mehrdad Saif

One of the most difficult challenges in cybersecurity is eliminating Distributed Denial of Service (DDoS) attacks. Automating this task using artificial intelligence is a complex process due to the inherent class imbalance and lack of sufficient labeled samples of real-world datasets. This research investigates the use of Semi-Supervised Learning (SSL) techniques to improve DDoS attack detection when data is imbalanced and partially labeled. In this process, 13 state-of-the-art SSL algorithms are evaluated for detecting DDoS attacks in several scenarios. We evaluate their practical efficacy and shortcomings, including the extent to which they work in extreme environments. The results will offer insight into designing intelligent Intrusion Detection Systems (IDSs) that are robust against class imbalance and handle partially labeled data.

nan


Article 924

Title@2025-06-28 (6): Efficient Shallow Ritz Method For 1D Diffusion-Reaction Problems

Title: Efficient Shallow Ritz Method For 1D Diffusion-Reaction Problems Effiziente Ritz-Methode für 1D-Diffusionsreaktionsprobleme 用于1D 扩散反应问题的高效浅流机法 2407.01496v3

Authors (4): Zhiqiang Cai, Anastassia Doktorova, Robert D. Falgout, César Herrera

This paper studies the shallow Ritz method for solving one-dimensional diffusion-reaction problems. The method is capable of improving the order of approximation for non-smooth problems. By following a similar approach to the one presented in [9], we present a damped block Newton (dBN) method to achieve nearly optimal order of approximation. The dBN method optimizes the Ritz functional by alternating between the linear and non-linear parameters of the shallow ReLU neural network (NN). For diffusion-reaction problems, new difficulties arise: (1) for the linear parameters, the mass matrix is dense and even more ill-conditioned than the stiffness matrix, and (2) for the non-linear parameters, the Hessian matrix is dense and may be singular. This paper addresses these challenges, resulting in a dBN method with computational cost of ${\cal O}(n)$. The ideas presented for diffusion-reaction problems can also be applied to least-squares approximation problems. For both applications, starting with the non-linear parameters as a uniform partition, numerical experiments show that the dBN method moves the mesh points to nearly optimal locations.

nan


Article 925

Title@2025-06-28 (6): Orthogonal Gradient Descent Improves Neural Calibration

Title: Orthogonal Gradient Descent Improves Neural Calibration Orthogonal gradient Descent verbessert die Neuralkalibrierung 骨质梯度梯度下层改善神经校准 2506.04487v2

Authors (1): C. Evans Hedges

We provide evidence that orthogonalizing gradients during training improves model calibration without sacrificing accuracy. On CIFAR-10 with 10\% labeled data, $\perp$Grad matches SGD in accuracy but yields consistently improved calibration metrics such as lower test loss, reduced softmax overconfidence, and higher predictive entropy. These benefits persist under input corruption (CIFAR-10C) and extended training, where $\perp$Grad models degrade more gracefully than SGD-trained counterparts. $\perp$Grad is optimizer-agnostic, incurs minimal overhead, and works well with post-hoc calibration techniques like temperature scaling. Theoretically, we prove convergence of a simplified version of $\perp$Grad under mild assumptions and characterize its stationary points in positive homogeneous networks: $\perp$Grad converges to solutions where further loss reduction requires confidence scaling rather than decision boundary improvement. Code for this paper can be found at: https://github.com/evanshedges2/orthograd_improves_calibration.

nan


Article 926

Title@2025-06-28 (6): Transformer Encoder and Multi-features Time2Vec for Financial Prediction

Title: Transformer Encoder and Multi-features Time2Vec for Financial Prediction Transformer Encoder und Multi-Features Time2Vec für finanzielle Vorhersagen 用于金融预测的时间2Vec 2504.13801v2

Authors (4): Nguyen Kim Hai Bui, Nguyen Duy Chien, Péter Kovács, Gergő Bognár

Financial prediction is a complex and challenging task of time series analysis and signal processing, expected to model both short-term fluctuations and long-term temporal dependencies. Transformers have remarkable success mostly in natural language processing using attention mechanism, which also influenced the time series community. The ability to capture both short and long-range dependencies helps to understand the financial market and to recognize price patterns, leading to successful applications of Transformers in stock prediction. Although, the previous research predominantly focuses on individual features and singular predictions, that limits the model’s ability to understand broader market trends. In reality, within sectors such as finance and technology, companies belonging to the same industry often exhibit correlated stock price movements. In this paper, we develop a novel neural network architecture by integrating Time2Vec with the Encoder of the Transformer model. Based on the study of different markets, we propose a novel correlation feature selection method. Through a comprehensive fine-tuning of multiple hyperparameters, we conduct a comparative analysis of our results against benchmark models. We conclude that our method outperforms other state-of-the-art encoding methods such as positional encoding, and we also conclude that selecting correlation features enhance the accuracy of predicting multiple stock prices.

nan


Article 927

Title@2025-06-28 (6): Utilizing a Novel Deep Learning Method for Scene Categorization in Remote Sensing Data

Title: Utilizing a Novel Deep Learning Method for Scene Categorization in Remote Sensing Data Verwendung einer neuartigen Deep-Learning-Methode für die Kategorisierung von Szenen in Daten der Fernerkundung 利用新深层学习方法对遥感数据进行星 域分类 2506.22939v1

Authors (8): Ghufran A. Omran, Wassan Saad Abduljabbar Hayale, Ahmad AbdulQadir AlRababah, Israa Ibraheem Al-Barazanchi, Ravi Sekhar, Pritesh Shah, Sushma Parihar, Harshavardhan Reddy Penubadi

Scene categorization (SC) in remotely acquired images is an important subject with broad consequences in different fields, including catastrophe control, ecological observation, architecture for cities, and more. Nevertheless, its several apps, reaching a high degree of accuracy in SC from distant observation data has demonstrated to be difficult. This is because traditional conventional deep learning models require large databases with high variety and high levels of noise to capture important visual features. To address these problems, this investigation file introduces an innovative technique referred to as the Cuttlefish Optimized Bidirectional Recurrent Neural Network (CO- BRNN) for type of scenes in remote sensing data. The investigation compares the execution of CO-BRNN with current techniques, including Multilayer Perceptron- Convolutional Neural Network (MLP-CNN), Convolutional Neural Network-Long Short Term Memory (CNN-LSTM), and Long Short Term Memory-Conditional Random Field (LSTM-CRF), Graph-Based (GB), Multilabel Image Retrieval Model (MIRM-CF), Convolutional Neural Networks Data Augmentation (CNN-DA). The results demonstrate that CO-BRNN attained the maximum accuracy of 97%, followed by LSTM-CRF with 90%, MLP-CNN with 85%, and CNN-LSTM with 80%. The study highlights the significance of physical confirmation to ensure the efficiency of satellite data.

nan


Article 928

Title@2025-06-28 (6): Efficient Cybersecurity Assessment Using SVM and Fuzzy Evidential Reasoning for Resilient Infrastructure

Title: Efficient Cybersecurity Assessment Using SVM and Fuzzy Evidential Reasoning for Resilient Infrastructure Effiziente Cybersecurity-Bewertung mittels SVM und Fuzzy Evidential Reasoning für resiliente Infrastruktur 利用SVM和模糊的证明理由进行高效的网络安全评估 2506.22938v1

Authors (6): Zaydon L. Ali, Wassan Saad Abduljabbar Hayale, Israa Ibraheem Al_Barazanchi, Ravi Sekhar, Pritesh Shah, Sushma Parihar

With current advancement in hybermedia knowledges, the privacy of digital information has developed a critical problem. To overawed the susceptibilities of present security protocols, scholars tend to focus mainly on efforts on alternation of current protocols. Over past decade, various proposed encoding models have been shown insecurity, leading to main threats against significant data. Utilizing the suitable encryption model is very vital means of guard against various such, but algorithm is selected based on the dependency of data which need to be secured. Moreover, testing potentiality of the security assessment one by one to identify the best choice can take a vital time for processing. For faster and precisive identification of assessment algorithm, we suggest a security phase exposure model for cipher encryption technique by invoking Support Vector Machine (SVM). In this work, we form a dataset using usual security components like contrast, homogeneity. To overcome the uncertainty in analysing the security and lack of ability of processing data to a risk assessment mechanism. To overcome with such complications, this paper proposes an assessment model for security issues using fuzzy evidential reasoning (ER) approaches. Significantly, the model can be utilised to process and assemble risk assessment data on various aspects in systematic ways. To estimate the performance of our framework, we have various analyses like, recall, F1 score and accuracy.

nan


Article 929

Title@2025-06-28 (6): Differentiable Radar Ambiguity Functions: Mathematical Formulation and Computational Implementation

Title: Differentiable Radar Ambiguity Functions: Mathematical Formulation and Computational Implementation Differentielle Radar-Ambiguitätsfunktionen: Mathematische Formulierung und Computational Implementation 可区别的雷达模糊功能:数学拟订和计算实施 2506.22935v1

Authors (1): Marc Bara Iniesta

The ambiguity function is fundamental to radar waveform design, characterizing range and Doppler resolution capabilities. However, its traditional formulation involves non-differentiable operations, preventing integration with gradient-based optimization methods and modern machine learning frameworks. This paper presents the first complete mathematical framework and computational implementation for differentiable radar ambiguity functions. Our approach addresses the fundamental technical challenges that have prevented the radar community from leveraging automatic differentiation: proper handling of complex-valued gradients using Wirtinger calculus, efficient computation through parallelized FFT operations, numerical stability throughout cascaded operations, and composability with arbitrary differentiable operations. We term this approach GRAF (Gradient-based Radar Ambiguity Functions), which reformulates the ambiguity function computation to maintain mathematical equivalence while enabling gradient flow through the entire pipeline. The resulting implementation provides a general-purpose differentiable ambiguity function compatible with modern automatic differentiation frameworks, enabling new research directions including neural network-based waveform generation with ambiguity constraints, end-to-end optimization of radar systems, and integration of classical radar theory with modern deep learning. We provide complete implementation details and demonstrate computational efficiency suitable for practical applications. This work establishes the mathematical and computational foundation for applying modern machine learning techniques to radar waveform design, bridging classical radar signal processing with automatic differentiation frameworks.

nan


Article 930

Title@2025-06-28 (6): FedDTG:Federated Data-Free Knowledge Distillation via Three-Player Generative Adversarial Networks

Title: FedDTG:Federated Data-Free Knowledge Distillation via Three-Player Generative Adversarial Networks FedDTG:Federated Data-Free Knowledge Destillation über drei Spieler Generative Adversarial Networks FDTG:通过三层生成反向网络进行联邦无数据知识蒸馏 2201.03169v5

Authors (3): Lingzhi Gao, Zhenyuan Zhang, Chao Wu

While existing federated learning approaches primarily focus on aggregating local models to construct a global model, in realistic settings, some clients may be reluctant to share their private models due to the inclusion of privacy-sensitive information. Knowledge distillation, which can extract model knowledge without accessing model parameters, is well-suited for this federated scenario. However, most distillation methods in federated learning (federated distillation) require a proxy dataset, which is difficult to obtain in the real world. Therefore, in this paper, we introduce a distributed three-player Generative Adversarial Network (GAN) to implement data-free mutual distillation and propose an effective method called FedDTG. We confirmed that the fake samples generated by GAN can make federated distillation more efficient and robust. Additionally, the distillation process between clients can deliver good individual client performance while simultaneously acquiring global knowledge and protecting data privacy. Our extensive experiments on benchmark vision datasets demonstrate that our method outperforms other federated distillation algorithms in terms of generalization.

nan


Article 931

Title@2025-06-28 (6): Mathematical Computation on High-dimensional Data via Array Programming and Parallel Acceleration

Title: Mathematical Computation on High-dimensional Data via Array Programming and Parallel Acceleration Mathematische Berechnung hochdimensionaler Daten über Array-Programmierung und parallele Beschleunigung 通过阵列编程和平行加速对高维数据进行数学计算 2506.22929v1

Authors (1): Chen Zhang

While deep learning excels in natural image and language processing, its application to high-dimensional data faces computational challenges due to the dimensionality curse. Current large-scale data tools focus on business-oriented descriptive statistics, lacking mathematical statistics support for advanced analysis. We propose a parallel computation architecture based on space completeness, decomposing high-dimensional data into dimension-independent structures for distributed processing. This framework enables seamless integration of data mining and parallel-optimized machine learning methods, supporting scientific computations across diverse data types like medical and natural images within a unified system.

nan


Article 932

Title@2025-06-28 (6): HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation

Title: HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation HalluSegBench: Counterfactual Visual Reasoning for Segmentation Halluzination Evaluation HalluSegeBench:截肢幻觉评价的反事实视觉理由 2506.21546v2

Authors (6): Xinzhuo Li, Adheesh Juvekar, Xingyou Liu, Muntasir Wahed, Kiet A. Nguyen, Ismini Lourentzou

Recent progress in vision-language segmentation has significantly advanced grounded visual understanding. However, these models often exhibit hallucinations by producing segmentation masks for objects not grounded in the image content or by incorrectly labeling irrelevant regions. Existing evaluation protocols for segmentation hallucination primarily focus on label or textual hallucinations without manipulating the visual context, limiting their capacity to diagnose critical failures. In response, we introduce HalluSegBench, the first benchmark specifically designed to evaluate hallucinations in visual grounding through the lens of counterfactual visual reasoning. Our benchmark consists of a novel dataset of 1340 counterfactual instance pairs spanning 281 unique object classes, and a set of newly introduced metrics that quantify hallucination sensitivity under visually coherent scene edits. Experiments on HalluSegBench with state-of-the-art vision-language segmentation models reveal that vision-driven hallucinations are significantly more prevalent than label-driven ones, with models often persisting in false segmentation, highlighting the need for counterfactual reasoning to diagnose grounding fidelity.

nan


Article 933

Title@2025-06-28 (6): SConU: Selective Conformal Uncertainty in Large Language Models

Title: SConU: Selective Conformal Uncertainty in Large Language Models SConU: Selektive konforme Unsicherheit in großen Sprachmodellen SCONU:大语言模式中选择性的形式不确定性 2504.14154v2

Authors (7): Zhiyuan Wang, Qingni Wang, Yue Zhang, Tianlong Chen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu

As large language models are increasingly utilized in real-world applications, guarantees of task-specific metrics are essential for their reliable deployment. Previous studies have introduced various criteria of conformal uncertainty grounded in split conformal prediction, which offer user-specified correctness coverage. However, existing frameworks often fail to identify uncertainty data outliers that violate the exchangeability assumption, leading to unbounded miscoverage rates and unactionable prediction sets. In this paper, we propose a novel approach termed Selective Conformal Uncertainty (SConU), which, for the first time, implements significance tests, by developing two conformal p-values that are instrumental in determining whether a given sample deviates from the uncertainty distribution of the calibration set at a specific manageable risk level. Our approach not only facilitates rigorous management of miscoverage rates across both single-domain and interdisciplinary contexts, but also enhances the efficiency of predictions. Furthermore, we comprehensively analyze the components of the conformal procedures, aiming to approximate conditional coverage, particularly in high-stakes question-answering tasks.

nan


Article 934

Title@2025-06-28 (6): Towards Time Series Generation Conditioned on Unstructured Natural Language

Title: Towards Time Series Generation Conditioned on Unstructured Natural Language Zur Zeitreihe Erzeugung auf unstrukturierte natürliche Sprache 以无结构的自然语言为条件的一代人 2506.22927v1

Authors (3): Jaeyun Woo, Jiseok Lee, Brian Kenji Iwana

Generative Artificial Intelligence (AI) has rapidly become a powerful tool, capable of generating various types of data, such as images and text. However, despite the significant advancement of generative AI, time series generative AI remains underdeveloped, even though the application of time series is essential in finance, climate, and numerous fields. In this research, we propose a novel method of generating time series conditioned on unstructured natural language descriptions. We use a diffusion model combined with a language model to generate time series from the text. Through the proposed method, we demonstrate that time series generation based on natural language is possible. The proposed method can provide various applications such as custom forecasting, time series manipulation, data augmentation, and transfer learning. Furthermore, we construct and propose a new public dataset for time series generation, consisting of 63,010 time series-description pairs.

nan


Article 935

Title@2025-06-28 (6): Learning Truthful Mechanisms without Discretization

Title: Learning Truthful Mechanisms without Discretization Wahrhaftige Mechanismen ohne Diskretisierung lernen 学习真相机制,无分化 2506.22911v1

Authors (5): Yunxuan Ma, Siqiang Wang, Zhijian Duan, Yukun Cheng, Xiaotie Deng

This paper introduces TEDI (Truthful, Expressive, and Dimension-Insensitive approach), a discretization-free algorithm to learn truthful and utility-maximizing mechanisms. Existing learning-based approaches often rely on discretization of outcome spaces to ensure truthfulness, which leads to inefficiency with increasing problem size. To address this limitation, we formalize the concept of pricing rules, defined as functions that map outcomes to prices. Based on this concept, we propose a novel menu mechanism, which can be equivalent to a truthful direct mechanism under specific conditions. The core idea of TEDI lies in its parameterization of pricing rules using Partial GroupMax Network, a new network architecture designed to universally approximate partial convex functions. To learn optimal pricing rules, we develop novel training techniques, including covariance trick and continuous sampling, to derive unbiased gradient estimators compatible with first-order optimization. Theoretical analysis establishes that TEDI guarantees truthfulness, full expressiveness, and dimension-insensitivity. Experimental evaluation in the studied auction setting demonstrates that TEDI achieves strong performance, competitive with or exceeding state-of-the-art methods. This work presents the first approaches to learn truthful mechanisms without outcome discretization, thereby enhancing algorithmic efficiency. The proposed concepts, network architecture, and learning techniques might offer potential value and provide new insights for automated mechanism design and differentiable economics.

nan


Article 936

Title@2025-06-28 (6): Missing-Modality-Aware Graph Neural Network for Cancer Classification

Title: Missing-Modality-Aware Graph Neural Network for Cancer Classification Missing-Modality-Aware Graph Neuronales Netzwerk für die Krebsklassifikation 失踪 – – 模式 – – Aware 癌症分类神经网络 2506.22901v1

Authors (2): Sina Tabakhi, Haiping Lu

A key challenge in learning from multimodal biological data is missing modalities, where all data from some modalities are missing for some patients. Current fusion methods address this by excluding patients with missing modalities, imputing missing modalities, or making predictions directly with partial modalities. However, they often struggle with diverse missing-modality patterns and the exponential growth of the number of such patterns as the number of modalities increases. To address these limitations, we propose MAGNET (Missing-modality-Aware Graph neural NETwork) for direct prediction with partial modalities, which introduces a patient-modality multi-head attention mechanism to fuse lower-dimensional modality embeddings based on their importance and missingness. MAGNET’s complexity increases linearly with the number of modalities while adapting to missing-pattern variability. To generate predictions, MAGNET further constructs a patient graph with fused multimodal embeddings as node features and the connectivity determined by the modality missingness, followed by a conventional graph neural network. Experiments on three public multiomics datasets for cancer classification, with real-world instead of artificial missingness, show that MAGNET outperforms the state-of-the-art fusion methods. The data and code are available at https://github.com/SinaTabakhi/MAGNET.

nan


Article 937

Title@2025-06-28 (6): Neural Cellular Automata: From Cells to Pixels

Title: Neural Cellular Automata: From Cells to Pixels Neurale Zelluläre Automaten: Von Zellen zu Pixeln 神经细胞自定义数据: 从单元格到像素 2506.22899v1

Authors (6): Ehsan Pajouheshgar, Yitao Xu, Ali Abbasi, Alexander Mordvintsev, Wenzel Jakob, Sabine Süsstrunk

Neural Cellular Automata (NCAs) are bio-inspired systems in which identical cells self-organize to form complex and coherent patterns by repeatedly applying simple local rules. NCAs display striking emergent behaviors including self-regeneration, generalization and robustness to unseen situations, and spontaneous motion. Despite their success in texture synthesis and morphogenesis, NCAs remain largely confined to low-resolution grids. This limitation stems from (1) training time and memory requirements that grow quadratically with grid size, (2) the strictly local propagation of information which impedes long-range cell communication, and (3) the heavy compute demands of real-time inference at high resolution. In this work, we overcome this limitation by pairing NCA with a tiny, shared implicit decoder, inspired by recent advances in implicit neural representations. Following NCA evolution on a coarse grid, a lightweight decoder renders output images at arbitrary resolution. We also propose novel loss functions for both morphogenesis and texture synthesis tasks, specifically tailored for high-resolution output with minimal memory and computation overhead. Combining our proposed architecture and loss functions brings substantial improvement in quality, efficiency, and performance. NCAs equipped with our implicit decoder can generate full-HD outputs in real time while preserving their self-organizing, emergent properties. Moreover, because each MLP processes cell states independently, inference remains highly parallelizable and efficient. We demonstrate the applicability of our approach across multiple NCA variants (on 2D, 3D grids, and 3D meshes) and multiple tasks, including texture generation and morphogenesis (growing patterns from a seed), showing that with our proposed framework, NCAs seamlessly scale to high-resolution outputs with minimal computational overhead.

nan


Article 938

Title@2025-06-28 (6): Interpretable Time Series Autoregression for Periodicity Quantification

Title: Interpretable Time Series Autoregression for Periodicity Quantification Verdolmetschbare Zeitreihen Autoregression für Periodizitätsquantifizierung 周期量化的自动递减 2506.22895v1

Authors (5): Xinyu Chen, Vassilis Digalakis Jr, Lijun Ding, Dingyi Zhuang, Jinhua Zhao

Time series autoregression is a classical statistical model for capturing auto-correlations and identifying temporal patterns such as periodicity and seasonality. In this work, we propose a novel sparse autoregression framework from an interpretable machine learning perspective and the model interpretability for periodicity quantification is reinforced by $\ell_0$-norm induced sparsity constraints. On the time-varying time series data, we reformulate the sparse autoregression and convert the involved optimization problem into a mixed-integer optimization (MIO). To accelerate it, we develop a subspace pursuit based decision variable pruning (DVP) strategy to reduce the search space. On the multidimensional time series that involves complicated spatial and temporal dimensions, we propose a spatially- and time-varying sparse autoregression model and resolve the corresponding MIO problem by developing a two-stage optimization scheme. In particular, the proposed scheme makes the model scalable to large problems even with millions of decision variables. Empirically, we conduct extensive experiments to evaluate the proposed models on real-world time series data. First, we demonstrate that the MIO solver can be drastically accelerated through the DVP strategy, while maintaining the same solution quality as a full MIO solver. Applying the time-varying sparse autoregression model to ridesharing trip data, we uncover both daily and weekly periodicities and reveal long-term changes in regularity of human mobility. Second, we demonstrate the spatial patterns of yearly seasonality in climate variable time series such as temperature and precipitation across the past four decades, and our model allows to discover dynamic climate patterns and identify climate phenomena such as El Nino in sea surface temperature.

nan


Article 939

Title@2025-06-28 (6): CA-Diff: Collaborative Anatomy Diffusion for Brain Tissue Segmentation

Title: CA-Diff: Collaborative Anatomy Diffusion for Brain Tissue Segmentation CA-Diff: Kollaborative Anatomie-Diffusion für die Segmentierung von Hirngewebe CA-Diff:脑组织分割协作解剖分解 2506.22882v1

Authors (8): Qilong Xing, Zikai Song, Yuteng Ye, Yuke Chen, Youjia Zhang, Na Feng, Junqing Yu, Wei Yang

Segmentation of brain structures from MRI is crucial for evaluating brain morphology, yet existing CNN and transformer-based methods struggle to delineate complex structures accurately. While current diffusion models have shown promise in image segmentation, they are inadequate when applied directly to brain MRI due to neglecting anatomical information. To address this, we propose Collaborative Anatomy Diffusion (CA-Diff), a framework integrating spatial anatomical features to enhance segmentation accuracy of the diffusion model. Specifically, we introduce distance field as an auxiliary anatomical condition to provide global spatial context, alongside a collaborative diffusion process to model its joint distribution with anatomical structures, enabling effective utilization of anatomical features for segmentation. Furthermore, we introduce a consistency loss to refine relationships between the distance field and anatomical structures and design a time adapted channel attention module to enhance the U-Net feature fusion procedure. Extensive experiments show that CA-Diff outperforms state-of-the-art (SOTA) methods.

nan


Article 940

Title@2025-06-28 (6): Loop2Net: Data-Driven Generation and Optimization of Airfoil CFD Meshes from Sparse Boundary Coordinates

Title: Loop2Net: Data-Driven Generation and Optimization of Airfoil CFD Meshes from Sparse Boundary Coordinates Loop2Net: Datengetriebene Erzeugung und Optimierung von CFD-Netzen aus Sparse Boundary Coordinates 循环2Net:数据驱动生成和优化来自偏浅边界坐标点的 CFD 空 fil CFD Meshes 2507.01057v1

Authors (4): Lushun Fan, Yuqin Xia, Jun Li, Karl Jenkins

In this study, an innovative intelligent optimization system for mesh quality is proposed, which is based on a deep convolutional neural network architecture, to achieve mesh generation and optimization. The core of the study is the Loop2Net generator and loss function, it predicts the mesh based on the given wing coordinates. And the model’s performance is continuously optimised by two key loss functions during the training. Then discipline by adding penalties, the goal of mesh generation was finally reached.

nan


Article 941

Title@2025-06-28 (6): Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra

Title: Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra Training Flexible Modelle genetischer Variant-Effekte aus funktionellen Anmerkungen mit beschleunigter Linear Algebra 使用加速线性线性代数对功能说明的遗传变异效应灵活模型的培训 2506.19598v2

Authors (3): Alan N. Amin, Andres Potapczynski, Andrew Gordon Wilson

To understand how genetic variants in human genomes manifest in phenotypes – traits like height or diseases like asthma – geneticists have sequenced and measured hundreds of thousands of individuals. Geneticists use this data to build models that predict how a genetic variant impacts phenotype given genomic features of the variant, like DNA accessibility or the presence of nearby DNA-bound proteins. As more data and features become available, one might expect predictive models to improve. Unfortunately, training these models is bottlenecked by the need to solve expensive linear algebra problems because variants in the genome are correlated with nearby variants, requiring inversion of large matrices. Previous methods have therefore been restricted to fitting small models, and fitting simplified summary statistics, rather than the full likelihood of the statistical model. In this paper, we leverage modern fast linear algebra techniques to develop DeepWAS (Deep genome Wide Association Studies), a method to train large and flexible neural network predictive models to optimize likelihood. Notably, we find that larger models only improve performance when using our full likelihood approach; when trained by fitting traditional summary statistics, larger models perform no better than small ones. We find larger models trained on more features make better predictions, potentially improving disease predictions and therapeutic target identification.

nan


Article 942

Title@2025-06-28 (6): P$^2$U: Progressive Precision Update For Efficient Model Distribution

Title: P$^2$U: Progressive Precision Update For Efficient Model Distribution P$^2$U: Progressive Präzisionsaktualisierung für effiziente Modellverteilung P$2美元U:为有效分配模型而逐步精准更新 2506.22871v1

Authors (2): Homayun Afrabandpey, Hamed Rezazadegan Tavakoli

Efficient model distribution is becoming increasingly critical in bandwidth-constrained environments. In this paper, we propose a simple yet effective approach called Progressive Precision Update (P$^2$U) to address this problem. Instead of transmitting the original high-precision model, P$^2$U transmits a lower-bit precision model, coupled with a model update representing the difference between the original high-precision model and the transmitted low precision version. With extensive experiments on various model architectures, ranging from small models ($1 - 6$ million parameters) to a large model (more than $100$ million parameters) and using three different data sets, e.g., chest X-Ray, PASCAL-VOC, and CIFAR-100, we demonstrate that P$^2$U consistently achieves better tradeoff between accuracy, bandwidth usage and latency. Moreover, we show that when bandwidth or startup time is the priority, aggressive quantization (e.g., 4-bit) can be used without severely compromising performance. These results establish P$^2$U as an effective and practical solution for scalable and efficient model distribution in low-resource settings, including federated learning, edge computing, and IoT deployments. Given that P$^2$U complements existing compression techniques and can be implemented alongside any compression method, e.g., sparsification, quantization, pruning, etc., the potential for improvement is even greater.

nan


Article 943

Title@2025-06-28 (6): PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models

Title: PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models PRMBench: Ein feinkörniger und anspruchsvoller Benchmark für Prozess-Level-Reward-Modelle PRMBBench:进程一级奖励模式的精细和质疑基准 2501.03124v5

Authors (5): Mingyang Song, Zhaochen Su, Xiaoye Qu, Jiawei Zhou, Yu Cheng

Process-level Reward Models (PRMs) are crucial for complex reasoning and decision-making tasks, where each intermediate step plays an important role in the reasoning process. Since language models are prone to various types of errors during the reasoning process, PRMs are required to possess nuanced capabilities for detecting various implicit error types in real-world scenarios. However, current benchmarks primarily focus on step correctness, failing to evaluate PRMs’ performance systematically. To address this gap, we introduce PRMBench, a process-level benchmark specifically designed to assess the fine-grained error detection capabilities of PRMs. PRMBench comprises 6,216 carefully designed problems and 83,456 step-level labels, evaluating models across multiple dimensions, including simplicity, soundness, and sensitivity. In our experiments on 15 models, spanning both open-source PRMs and closed-source large language models prompted as critic models, we uncover significant weaknesses in current PRMs. These findings underscore the challenges inherent in process-level evaluation and highlight key directions for future research. We hope PRMBench can be a robust bench for advancing research on PRM evaluation and development.

nan


Article 944

Title@2025-06-28 (6): Machine learning in wastewater treatment: insights from modelling a pilot denitrification reactor

Title: Machine learning in wastewater treatment: insights from modelling a pilot denitrification reactor Maschinelles Lernen in der Abwasserbehandlung: Erkenntnisse aus der Modellierung eines Pilot-Denitrifikationsreaktors 废水处理方面的机器学习:从模拟试验脱硝反应堆中获得的见解 2412.14030v2

Authors (3): Eivind Bøhn, Sølve Eidnes, Kjell Rune Jonassen

Wastewater treatment plants are increasingly recognized as promising candidates for machine learning applications, due to their societal importance and high availability of data. However, their varied designs, operational conditions, and influent characteristics hinder straightforward automation. In this study, we use data from a pilot reactor at the Veas treatment facility in Norway to explore how machine learning can be used to optimize biological nitrate ($\mathrm{NO_3^-}$) reduction to molecular nitrogen ($\mathrm{N_2}$) in the biogeochemical process known as \textit{denitrification}. Rather than focusing solely on predictive accuracy, our approach prioritizes understanding the foundational requirements for effective data-driven modelling of wastewater treatment. Specifically, we aim to identify which process parameters are most critical, the necessary data quantity and quality, how to structure data effectively, and what properties are required by the models. We find that nonlinear models perform best on the training and validation data sets, indicating nonlinear relationships to be learned, but linear models transfer better to the unseen test data, which comes later in time. The variable measuring the water temperature has a particularly detrimental effect on the models, owing to a significant change in distributions between training and test data. We therefore conclude that multiple years of data is necessary to learn robust machine learning models. By addressing foundational elements, particularly in the context of the climatic variability faced by northern regions, this work lays the groundwork for a more structured and tailored approach to machine learning for wastewater treatment. We share publicly both the data and code used to produce the results in the paper.

nan


Article 945

Title@2025-06-28 (6): Deep neural networks can provably solve Bellman equations for Markov decision processes without the curse of dimensionality

Title: Deep neural networks can provably solve Bellman equations for Markov decision processes without the curse of dimensionality Tiefe neuronale Netzwerke können nachweislich Bellman-Gleichungen für Markov-Entscheidungsprozesse ohne den Fluch der Dimensionalität lösen 深神经网络可以解决Bellman方程式 用于Markov决策程序 而不受到维度的诅咒 2506.22851v1

Authors (3): Arnulf Jentzen, Konrad Kleinberg, Thomas Kruse

Discrete time stochastic optimal control problems and Markov decision processes (MDPs) are fundamental models for sequential decision-making under uncertainty and as such provide the mathematical framework underlying reinforcement learning theory. A central tool for solving MDPs is the Bellman equation and its solution, the so-called $Q$-function. In this article, we construct deep neural network (DNN) approximations for $Q$-functions associated to MDPs with infinite time horizon and finite control set $A$. More specifically, we show that if the the payoff function and the random transition dynamics of the MDP can be suitably approximated by DNNs with leaky rectified linear unit (ReLU) activation, then the solutions $Q_d\colon \mathbb R^d\to \mathbb R^{ A }$, $d\in \mathbb{N}$, of the associated Bellman equations can also be approximated in the $L^2$-sense by DNNs with leaky ReLU activation whose numbers of parameters grow at most polynomially in both the dimension $d\in \mathbb{N}$ of the state space and the reciprocal $1/\varepsilon$ of the prescribed error $\varepsilon\in (0,1)$. Our proof relies on the recently introduced full-history recursive multilevel fixed-point (MLFP) approximation scheme.

nan


Article 946

Title@2025-06-28 (6): Scalable Structure Learning of Bayesian Networks by Learning Algorithm Ensembles

Title: Scalable Structure Learning of Bayesian Networks by Learning Algorithm Ensembles Skalierbare Struktur Lernen Bayesischer Netzwerke durch das Lernen von Algorithmenensembles Bayesian网络通过学习算法群群学习的可缩放结构学习 2506.22848v1

Authors (7): Shengcai Liu, Hui Ou-yang, Zhiyuan Wang, Cheng Chen, Qijun Cai, Yew-Soon Ong, Ke Tang

Learning the structure of Bayesian networks (BNs) from data is challenging, especially for datasets involving a large number of variables. The recently proposed divide-and-conquer (D\&D) strategies present a promising approach for learning large BNs. However, they still face a main issue of unstable learning accuracy across subproblems. In this work, we introduce the idea of employing structure learning ensemble (SLE), which combines multiple BN structure learning algorithms, to consistently achieve high learning accuracy. We further propose an automatic approach called Auto-SLE for learning near-optimal SLEs, addressing the challenge of manually designing high-quality SLEs. The learned SLE is then integrated into a D\&D method. Extensive experiments firmly show the superiority of our method over D\&D methods with single BN structure learning algorithm in learning large BNs, achieving accuracy improvement usually by 30\%$\sim$225\% on datasets involving 10,000 variables. Furthermore, our method generalizes well to datasets with many more (e.g., 30000) variables and different network characteristics than those present in the training data for learning the SLE. These results indicate the significant potential of employing (automatic learning of) SLEs for scalable BN structure learning.

nan


Article 947

Title@2025-06-28 (6): Quantum Neural Networks for Wind Energy Forecasting: A Comparative Study of Performance and Scalability with Classical Models

Title: Quantum Neural Networks for Wind Energy Forecasting: A Comparative Study of Performance and Scalability with Classical Models Quantum Neural Networks for Wind Energy Forecasting: Eine vergleichende Studie über Leistung und Skalierbarkeit mit klassischen Modellen 风能预测的量子神经网络:用古典模型比较性能和可缩放性研究 2506.22845v1

Authors (3): Batuhan Hangun, Oguz Altun, Onder Eyecioglu

Quantum Neural Networks (QNNs), a prominent approach in Quantum Machine Learning (QML), are emerging as a powerful alternative to classical machine learning methods. Recent studies have focused on the applicability of QNNs to various tasks, such as time-series forecasting, prediction, and classification, across a wide range of applications, including cybersecurity and medical imaging. With the increased use of smart grids driven by the integration of renewable energy systems, machine learning plays an important role in predicting power demand and detecting system disturbances. This study provides an in-depth investigation of QNNs for predicting the power output of a wind turbine. We assess the predictive performance and simulation time of six QNN configurations that are based on the Z Feature Map for data encoding and varying ansatz structures. Through detailed cross-validation experiments and tests on an unseen hold-out dataset, we experimentally demonstrate that QNNs can achieve predictive performance that is competitive with, and in some cases marginally better than, the benchmarked classical approaches. Our results also reveal the effects of dataset size and circuit complexity on predictive performance and simulation time. We believe our findings will offer valuable insights for researchers in the energy domain who wish to incorporate quantum machine learning into their work.

nan


Article 948

Title@2025-06-28 (6): xLSTMAD: A Powerful xLSTM-based Method for Anomaly Detection

Title: xLSTMAD: A Powerful xLSTM-based Method for Anomaly Detection xLSTMAD: Eine leistungsstarke xLSTM-basierte Methode zur Erkennung von Anomalien XLSTMAD: 一种以XLSTM为基础的强效异常探测方法 2506.22837v1

Authors (4): Kamil Faber, Marcin Pietroń, Dominik Żurek, Roberto Corizzo

The recently proposed xLSTM is a powerful model that leverages expressive multiplicative gating and residual connections, providing the temporal capacity needed for long-horizon forecasting and representation learning. This architecture has demonstrated success in time series forecasting, lossless compression, and even large-scale language modeling tasks, where its linear memory footprint and fast inference make it a viable alternative to Transformers. Despite its growing popularity, no prior work has explored xLSTM for anomaly detection. In this work, we fill this gap by proposing xLSTMAD, the first anomaly detection method that integrates a full encoder-decoder xLSTM architecture, purpose-built for multivariate time series data. Our encoder processes input sequences to capture historical context, while the decoder is devised in two separate variants of the method. In the forecasting approach, the decoder iteratively generates forecasted future values xLSTMAD-F, while the reconstruction approach reconstructs the input time series from its encoded counterpart xLSTMAD-R. We investigate the performance of two loss functions: Mean Squared Error (MSE), and Soft Dynamic Time Warping (SoftDTW) to consider local reconstruction fidelity and global sequence alignment, respectively. We evaluate our method on the comprehensive TSB-AD-M benchmark, which spans 17 real-world datasets, using state-of-the-art challenging metrics such as VUS-PR. In our results, xLSTM showcases state-of-the-art accuracy, outperforming 23 popular anomaly detection baselines. Our paper is the first work revealing the powerful modeling capabilities of xLSTM for anomaly detection, paving the way for exciting new developments on this subject. Our code is available at: https://github.com/Nyderx/xlstmad

nan


Article 949

Title@2025-06-28 (6): Adaptive Rank Allocation for Federated Parameter-Efficient Fine-Tuning of Language Models

Title: Adaptive Rank Allocation for Federated Parameter-Efficient Fine-Tuning of Language Models Adaptive Rangverteilung für Federated Parameter-Efficient Fine-Tuning of Language Models 联邦准拉米有效精密语言模式调适级分配 2501.14406v3

Authors (4): Fei Wu, Jia Hu, Geyong Min, Shiqiang Wang

Pre-trained Language Models (PLMs) have demonstrated their superiority and versatility in modern Natural Language Processing (NLP), effectively adapting to various downstream tasks through further fine-tuning. Federated Parameter-Efficient Fine-Tuning (FedPEFT) has emerged as a promising solution to address privacy and efficiency challenges in distributed training for PLMs on resource-constrained local devices. However, our measurements reveal two key limitations of FedPEFT: heterogeneous data across devices exacerbates performance degradation of low-rank adaptation, and a fixed parameter configuration results in communication inefficiency. To overcome these limitations, we propose FedARA, a novel Adaptive Rank Allocation framework for federated parameter-efficient fine-tuning of language models. Specifically, FedARA employs truncated Singular Value Decomposition (SVD) adaptation to enhance similar feature representation across clients, significantly mitigating the adverse effects of data heterogeneity. Subsequently, it utilizes dynamic rank allocation to progressively identify critical ranks, effectively improving communication efficiency. Lastly, it leverages rank-based module pruning to automatically remove inactive modules, steadily reducing local computational cost and memory usage in each federated learning round. Extensive experiments show that FedARA consistently outperforms baselines by an average of 6.95% to 8.49% across various datasets and models under heterogeneous data while significantly improving communication efficiency by 2.40$ \times$. Moreover, experiments on various edge devices demonstrate substantial decreases in total training time and energy consumption by up to 48.90% and 46.95%, respectively.

nan


Article 950

Title@2025-06-28 (6): Privacy-Preserving Video Anomaly Detection: A Survey

Title: Privacy-Preserving Video Anomaly Detection: A Survey Datenschutz-Bewahrung Video Anomalie Erkennung: Eine Umfrage 隐私保护视频异常探测:调查 2411.14565v2

Authors (10): Yang Liu, Siao Liu, Xiaoguang Zhu, Jielin Li, Hao Yang, Liangyu Teng, Juncen Guo, Yan Wang, Dingkang Yang, Jing Liu

Video Anomaly Detection (VAD) aims to automatically analyze spatiotemporal patterns in surveillance videos collected from open spaces to detect anomalous events that may cause harm, such as fighting, stealing, and car accidents. However, vision-based surveillance systems such as closed-circuit television often capture personally identifiable information. The lack of transparency and interpretability in video transmission and usage raises public concerns about privacy and ethics, limiting the real-world application of VAD. Recently, researchers have focused on privacy concerns in VAD by conducting systematic studies from various perspectives including data, features, and systems, making Privacy-Preserving Video Anomaly Detection (P2VAD) a hotspot in the AI community. However, current research in P2VAD is fragmented, and prior reviews have mostly focused on methods using RGB sequences, overlooking privacy leakage and appearance bias considerations. To address this gap, this article is the first to systematically reviews the progress of P2VAD, defining its scope and providing an intuitive taxonomy. We outline the basic assumptions, learning frameworks, and optimization objectives of various approaches, analyzing their strengths, weaknesses, and potential correlations. Additionally, we provide open access to research resources such as benchmark datasets and available code. Finally, we discuss key challenges and future opportunities from the perspectives of AI development and P2VAD deployment, aiming to guide future work in the field.

nan


Article 951

Title@2025-06-28 (6): Prompting without Panic: Attribute-aware, Zero-shot, Test-Time Calibration

Title: Prompting without Panic: Attribute-aware, Zero-shot, Test-Time Calibration Prompting ohne Panik: Attribute-Bewusstsein, Null-Schuss, Test-Zeit-Kalibrierung 无恐慌的提示: 属性意识、 零弹、 测试时间校准 2506.22819v1

Authors (4): Ramya Hebbalaguppe, Tamoghno Kandar, Abhinav Nagpal, Chetan Arora

Vision-language models (VLM) have demonstrated impressive performance in image recognition by leveraging self-supervised training on large datasets. Their performance can be further improved by adapting to the test sample using test-time prompt tuning (TPT). Unfortunately, the singular focus of TPT approaches on improving the accuracy suffers from tunnel vision, and leads to degradation in confidence calibration. This limits the applicability of TPT in critical applications. We make three contributions in this work. (1) We posit that random or naive initialization of prompts leads to overfitting on a particular test sample, and is the main reason for miscalibration of the VLM after TPT. To mitigate the problem, we propose careful initialization of test time prompt using prior knowledge about the target label attributes from a large language model (LLM); (2) To further maintain the quality of prompts during \tpt, we propose a novel regularization loss to reduce intraclass distance, and increase inter-class distance between the learnt Through extensive experiments on different CLIP architectures and 15 datasets, we show that our approach can effectively improve the calibration after TPT. We report an average expected calibration error (ECE) of 4.11 with our method, TCA, compared to 11.7 for vanilla TPT, 6.12 for C-TPT (ICLR’24), 6.78 for DiffTPT (CVPR’23), and 8.43 for PromptAlign (NeurIPS’23). The code is publicly accessible at: https://github.com/rhebbalaguppe/TCA_PromptWithoutPanic.

nan


Article 952

Title@2025-06-28 (6): BayesLoRA: Task-Specific Uncertainty in Low-Rank Adapters

Title: BayesLoRA: Task-Specific Uncertainty in Low-Rank Adapters BayesLoRA: Aufgabenspezifische Unsicherheit in Low-Rank-Adaptern BayesLOLRA:低兰克适应器中任务具体不确定性 2506.22809v1

Authors (1): Cooper Doyle

We propose BayesLoRA, a task-specific uncertainty quantification framework that integrates MC-Dropout into Low-Rank Adapters (LoRA). Unlike general-purpose transformer uncertainty methods, BayesLoRA provides guardrails tailored to downstream workflows, enabling agents to introspect and modulate behavior under uncertainty. We demonstrate mathematically and empirically that LoRA adapters exhibit amplified variance outside fine-tuning distributions, yielding reliable confidence estimates for agentic decision-making.

nan


Article 953

Title@2025-06-28 (6): Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate

Title: Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate Konzept Pinpoint Radiergummi für Text-zu-Bild-Diffusionsmodelle über Residual Attention Gate 通过遗留关注门进行文字到图像传播模型的Pinpoint Eraser概念概念 2506.22806v1

Authors (5): Byung Hyun Lee, Sungjin Lim, Seunggyu Lee, Dong Un Kang, Se Young Chun

Remarkable progress in text-to-image diffusion models has brought a major concern about potentially generating images on inappropriate or trademarked concepts. Concept erasing has been investigated with the goals of deleting target concepts in diffusion models while preserving other concepts with minimal distortion. To achieve these goals, recent concept erasing methods usually fine-tune the cross-attention layers of diffusion models. In this work, we first show that merely updating the cross-attention layers in diffusion models, which is mathematically equivalent to adding \emph{linear} modules to weights, may not be able to preserve diverse remaining concepts. Then, we propose a novel framework, dubbed Concept Pinpoint Eraser (CPE), by adding \emph{nonlinear} Residual Attention Gates (ResAGs) that selectively erase (or cut) target concepts while safeguarding remaining concepts from broad distributions by employing an attention anchoring loss to prevent the forgetting. Moreover, we adversarially train CPE with ResAG and learnable text embeddings in an iterative manner to maximize erasing performance and enhance robustness against adversarial attacks. Extensive experiments on the erasure of celebrities, artistic styles, and explicit contents demonstrated that the proposed CPE outperforms prior arts by keeping diverse remaining concepts while deleting the target concepts with robustness against attack prompts. Code is available at https://github.com/Hyun1A/CPE

nan


Article 954

Title@2025-06-28 (6): Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding

Title: Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding Intervening in Black Box: Konzept Engpass-Modell für die Verbesserung der menschlichen neuralen Netzwerk gegenseitiges Verständnis 黑盒干预:增强人类神经网络相互了解的概念瓶颈模式 2506.22803v1

Authors (8): Nuoye Xiong, Anqi Dong, Ning Wang, Cong Hua, Guangming Zhu, Mei Lin, Peiyi Shen, Liang Zhang

Recent advances in deep learning have led to increasingly complex models with deeper layers and more parameters, reducing interpretability and making their decisions harder to understand. While many methods explain black-box reasoning, most lack effective interventions or only operate at sample-level without modifying the model itself. To address this, we propose the Concept Bottleneck Model for Enhancing Human-Neural Network Mutual Understanding (CBM-HNMU). CBM-HNMU leverages the Concept Bottleneck Model (CBM) as an interpretable framework to approximate black-box reasoning and communicate conceptual understanding. Detrimental concepts are automatically identified and refined (removed/replaced) based on global gradient contributions. The modified CBM then distills corrected knowledge back into the black-box model, enhancing both interpretability and accuracy. We evaluate CBM-HNMU on various CNN and transformer-based models across Flower-102, CIFAR-10, CIFAR-100, FGVC-Aircraft, and CUB-200, achieving a maximum accuracy improvement of 2.64% and a maximum increase in average accuracy across 1.03%. Source code is available at: https://github.com/XiGuaBo/CBM-HNMU.

nan


Article 955

Title@2025-06-28 (6): Vision-QRWKV: Exploring Quantum-Enhanced RWKV Models for Image Classification

Title: Vision-QRWKV: Exploring Quantum-Enhanced RWKV Models for Image Classification Vision-QRWKV: Quantum-erweiterte RWKV-Modelle für die Bildklassifikation 展望-QRWKKV:探索以量为基础增强的RWKV图像分类模型 2506.06633v2

Authors (1): Chi-Sheng Chen

Recent advancements in quantum machine learning have shown promise in enhancing classical neural network architectures, particularly in domains involving complex, high-dimensional data. Building upon prior work in temporal sequence modeling, this paper introduces Vision-QRWKV, a hybrid quantum-classical extension of the Receptance Weighted Key Value (RWKV) architecture, applied for the first time to image classification tasks. By integrating a variational quantum circuit (VQC) into the channel mixing component of RWKV, our model aims to improve nonlinear feature transformation and enhance the expressive capacity of visual representations. We evaluate both classical and quantum RWKV models on a diverse collection of 14 medical and standard image classification benchmarks, including MedMNIST datasets, MNIST, and FashionMNIST. Our results demonstrate that the quantum-enhanced model outperforms its classical counterpart on a majority of datasets, particularly those with subtle or noisy class distinctions (e.g., ChestMNIST, RetinaMNIST, BloodMNIST). This study represents the first systematic application of quantum-enhanced RWKV in the visual domain, offering insights into the architectural trade-offs and future potential of quantum models for lightweight and efficient vision tasks.

nan


Article 956

Title@2025-06-28 (6): Riemannian-Geometric Fingerprints of Generative Models

Title: Riemannian-Geometric Fingerprints of Generative Models Riemannisch-Geometrische Fingerabdrücke von Generativen Modellen 创造型模型的里伊曼尼-大地测量指纹 2506.22802v1

Authors (2): Hae Jin Song, Laurent Itti

Recent breakthroughs and rapid integration of generative models (GMs) have sparked interest in the problem of model attribution and their fingerprints. For instance, service providers need reliable methods of authenticating their models to protect their IP, while users and law enforcement seek to verify the source of generated content for accountability and trust. In addition, a growing threat of model collapse is arising, as more model-generated data are being fed back into sources (e.g., YouTube) that are often harvested for training (“regurgitative training”), heightening the need to differentiate synthetic from human data. Yet, a gap still exists in understanding generative models’ fingerprints, we believe, stemming from the lack of a formal framework that can define, represent, and analyze the fingerprints in a principled way. To address this gap, we take a geometric approach and propose a new definition of artifact and fingerprint of GMs using Riemannian geometry, which allows us to leverage the rich theory of differential geometry. Our new definition generalizes previous work (Song et al., 2024) to non-Euclidean manifolds by learning Riemannian metrics from data and replacing the Euclidean distances and nearest-neighbor search with geodesic distances and kNN-based Riemannian center of mass. We apply our theory to a new gradient-based algorithm for computing the fingerprints in practice. Results show that it is more effective in distinguishing a large array of GMs, spanning across 4 different datasets in 2 different resolutions (64 by 64, 256 by 256), 27 model architectures, and 2 modalities (Vision, Vision-Language). Using our proposed definition significantly improves the performance on model attribution, as well as a generalization to unseen datasets, model types, and modalities, suggesting its practical efficacy.

nan


Article 957

Title@2025-06-28 (6): VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding

Title: VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding VoteSplat: Harte Abstimmung Gaussian Splatting für 3D-Szene Verständnis VotoStplat: 高斯拉特三维场景理解的 高斯拉特票 2506.22799v1

Authors (7): Minchao Jiang, Shunyu Jia, Jiaming Gu, Xiaoyuan Lu, Guangming Zhu, Anqi Dong, Liang Zhang

3D Gaussian Splatting (3DGS) has become horsepower in high-quality, real-time rendering for novel view synthesis of 3D scenes. However, existing methods focus primarily on geometric and appearance modeling, lacking deeper scene understanding while also incurring high training costs that complicate the originally streamlined differentiable rendering pipeline. To this end, we propose VoteSplat, a novel 3D scene understanding framework that integrates Hough voting with 3DGS. Specifically, Segment Anything Model (SAM) is utilized for instance segmentation, extracting objects, and generating 2D vote maps. We then embed spatial offset vectors into Gaussian primitives. These offsets construct 3D spatial votes by associating them with 2D image votes, while depth distortion constraints refine localization along the depth axis. For open-vocabulary object localization, VoteSplat maps 2D image semantics to 3D point clouds via voting points, reducing training costs associated with high-dimensional CLIP features while preserving semantic unambiguity. Extensive experiments demonstrate effectiveness of VoteSplat in open-vocabulary 3D instance localization, 3D point cloud understanding, click-based 3D object localization, hierarchical segmentation, and ablation studies. Our code is available at https://sy-ja.github.io/votesplat/

nan


Article 958

Title@2025-06-28 (6): Aircraft Trajectory Dataset Augmentation in Latent Space

Title: Aircraft Trajectory Dataset Augmentation in Latent Space Flugbahndatensatz Augmentation im Latent Space 低空空间轨迹数据集增强 2506.07585v2

Authors (2): Seokbin Yoon, Keumjin Lee

Aircraft trajectory modeling plays a crucial role in Air Traffic Management (ATM) and is important for various downstream tasks, including conflict detection and landing time prediction. Dataset augmentation through the addition of synthetically generated trajectory data is necessary to develop a more robust aircraft trajectory model and ensure that the trajectory dataset is sufficient and balanced. In this work, we propose a novel framework called ATRADA for aircraft trajectory dataset augmentation. In the proposed framework, a Transformer encoder learns the underlying patterns in the original trajectory dataset and converts each data point into a context vector in the learned latent space. The converted dataset in the latent space is projected into reduced dimensions using principal component analysis (PCA), and a Gaussian mixture model (GMM) is applied to fit the probability distribution of the data points in the reduced-dimensional space. Finally, new samples are drawn from the fitted GMM, the dimension of the samples is reverted to the original dimension, and they are decoded with a Multi-Layer Perceptron (MLP). Several experiments demonstrate that the framework effectively generates new, high-quality synthetic aircraft trajectory data, which were compared to the results of several baselines.

nan


Article 959

Title@2025-06-28 (6): Learning Time-Aware Causal Representation for Model Generalization in Evolving Domains

Title: Learning Time-Aware Causal Representation for Model Generalization in Evolving Domains Time-Aware-Lernen Kausaldarstellung für Modellverallgemeinerung in sich entwickelnden Domänen 正在演变的域域中模型普遍化模型的学习时间- 软件因果代表 2506.17718v2

Authors (7): Zhuo He, Shuang Li, Wenze Song, Longhui Yuan, Jian Liang, Han Li, Kun Gai

Endowing deep models with the ability to generalize in dynamic scenarios is of vital significance for real-world deployment, given the continuous and complex changes in data distribution. Recently, evolving domain generalization (EDG) has emerged to address distribution shifts over time, aiming to capture evolving patterns for improved model generalization. However, existing EDG methods may suffer from spurious correlations by modeling only the dependence between data and targets across domains, creating a shortcut between task-irrelevant factors and the target, which hinders generalization. To this end, we design a time-aware structural causal model (SCM) that incorporates dynamic causal factors and the causal mechanism drifts, and propose \textbf{S}tatic-D\textbf{YN}amic \textbf{C}ausal Representation Learning (\textbf{SYNC}), an approach that effectively learns time-aware causal representations. Specifically, it integrates specially designed information-theoretic objectives into a sequential VAE framework which captures evolving patterns, and produces the desired representations by preserving intra-class compactness of causal factors both across and within domains. Moreover, we theoretically show that our method can yield the optimal causal predictor for each time domain. Results on both synthetic and real-world datasets exhibit that SYNC can achieve superior temporal generalization performance.

nan


Article 960

Title@2025-06-28 (6): Multimodal Atmospheric Super-Resolution With Deep Generative Models

Title: Multimodal Atmospheric Super-Resolution With Deep Generative Models Multimodale atmosphärische Super-Resolution mit tiefen Generativen Modellen 多式大气多式超分辨率,具有深创模型 2506.22780v1

Authors (6): Dibyajyoti Chakraborty, Haiwen Guan, Jason Stock, Troy Arcomano, Guido Cervone, Romit Maulik

Score-based diffusion modeling is a generative machine learning algorithm that can be used to sample from complex distributions. They achieve this by learning a score function, i.e., the gradient of the log-probability density of the data, and reversing a noising process using the same. Once trained, score-based diffusion models not only generate new samples but also enable zero-shot conditioning of the generated samples on observed data. This promises a novel paradigm for data and model fusion, wherein the implicitly learned distributions of pretrained score-based diffusion models can be updated given the availability of online data in a Bayesian formulation. In this article, we apply such a concept to the super-resolution of a high-dimensional dynamical system, given the real-time availability of low-resolution and experimentally observed sparse sensor measurements from multimodal data. Additional analysis on how score-based sampling can be used for uncertainty estimates is also provided. Our experiments are performed for a super-resolution task that generates the ERA5 atmospheric dataset given sparse observations from a coarse-grained representation of the same and/or from unstructured experimental observations of the IGRA radiosonde dataset. We demonstrate accurate recovery of the high dimensional state given multiple sources of low-fidelity measurements. We also discover that the generative model can balance the influence of multiple dataset modalities during spatiotemporal reconstructions.

nan


Article 961

Title@2025-06-28 (6): PromptDSI: Prompt-based Rehearsal-free Continual Learning for Document Retrieval

Title: PromptDSI: Prompt-based Rehearsal-free Continual Learning for Document Retrieval PromptDSI: Prompt-basiert Probefreies Kontinuales Lernen für Dokument-Retrieval 快速检索:为检索文件而进行基于即时的无排练的持续学习 2406.12593v4

Authors (8): Tuan-Luc Huynh, Thuy-Trang Vu, Weiqing Wang, Yinwei Wei, Trung Le, Dragan Gasevic, Yuan-Fang Li, Thanh-Toan Do

Differentiable Search Index (DSI) utilizes pre-trained language models to perform indexing and document retrieval via end-to-end learning without relying on external indexes. However, DSI requires full re-training to index new documents, causing significant computational inefficiencies. Continual learning (CL) offers a solution by enabling the model to incrementally update without full re-training. Existing CL solutions in document retrieval rely on memory buffers or generative models for rehearsal, which is infeasible when accessing previous training data is restricted due to privacy concerns. To this end, we introduce PromptDSI, a prompt-based, rehearsal-free continual learning approach for document retrieval. PromptDSI follows the Prompt-based Continual Learning (PCL) framework, using learnable prompts to efficiently index new documents without accessing previous documents or queries. To improve retrieval latency, we remove the initial forward pass of PCL, which otherwise greatly increases training and inference time, with a negligible trade-off in performance. Additionally, we introduce a novel topic-aware prompt pool that employs neural topic embeddings as fixed keys, eliminating the instability of prompt key optimization while maintaining competitive performance with existing PCL prompt pools. In a challenging rehearsal-free continual learning setup, we demonstrate that PromptDSI variants outperform rehearsal-based baselines, match the strong cache-based baseline in mitigating forgetting, and significantly improving retrieval performance on new corpora.

nan


Article 962

Title@2025-06-28 (6): FF-INT8: Efficient Forward-Forward DNN Training on Edge Devices with INT8 Precision

Title: FF-INT8: Efficient Forward-Forward DNN Training on Edge Devices with INT8 Precision FF-INT8: Effizientes Vorwärts-DNN-Training auf Edge-Geräten mit INT8-Präzision FF-INT8:利用INT8精密度进行关于边缘装置的有效前向DNN培训 2506.22771v1

Authors (3): Jingxiao Ma, Priyadarshini Panda, Sherief Reda

Backpropagation has been the cornerstone of neural network training for decades, yet its inefficiencies in time and energy consumption limit its suitability for resource-constrained edge devices. While low-precision neural network quantization has been extensively researched to speed up model inference, its application in training has been less explored. Recently, the Forward-Forward (FF) algorithm has emerged as a promising alternative to backpropagation, replacing the backward pass with an additional forward pass. By avoiding the need to store intermediate activations for backpropagation, FF can reduce memory footprint, making it well-suited for embedded devices. This paper presents an INT8 quantized training approach that leverages FF’s layer-by-layer strategy to stabilize gradient quantization. Furthermore, we propose a novel “look-ahead” scheme to address limitations of FF and improve model accuracy. Experiments conducted on NVIDIA Jetson Orin Nano board demonstrate 4.6% faster training, 8.3% energy savings, and 27.0% reduction in memory usage, while maintaining competitive accuracy compared to the state-of-the-art.

nan


Article 963

Title@2025-06-28 (6): Can We Reliably Predict the Fed’s Next Move? A Multi-Modal Approach to U.S. Monetary Policy Forecasting

Title: Can We Reliably Predict the Fed’s Next Move? A Multi-Modal Approach to U.S. Monetary Policy Forecasting Können wir den nächsten Schritt der Fed zuverlässig vorhersagen? Ein multimodaler Ansatz zur US-amerikanischen Geldpolitikprognose 我们能否可靠地预测美联储的下一步行动?美国货币政策预测的多模式方法。 2506.22763v1

Authors (2): Fiona Xiao Jingyi, Lili Liu

Forecasting central bank policy decisions remains a persistent challenge for investors, financial institutions, and policymakers due to the wide-reaching impact of monetary actions. In particular, anticipating shifts in the U.S. federal funds rate is vital for risk management and trading strategies. Traditional methods relying only on structured macroeconomic indicators often fall short in capturing the forward-looking cues embedded in central bank communications. This study examines whether predictive accuracy can be enhanced by integrating structured data with unstructured textual signals from Federal Reserve communications. We adopt a multi-modal framework, comparing traditional machine learning models, transformer-based language models, and deep learning architectures in both unimodal and hybrid settings. Our results show that hybrid models consistently outperform unimodal baselines. The best performance is achieved by combining TF-IDF features of FOMC texts with economic indicators in an XGBoost classifier, reaching a test AUC of 0.83. FinBERT-based sentiment features marginally improve ranking but perform worse in classification, especially under class imbalance. SHAP analysis reveals that sparse, interpretable features align more closely with policy-relevant signals. These findings underscore the importance of integrating textual and structured signals transparently. For monetary policy forecasting, simpler hybrid models can offer both accuracy and interpretability, delivering actionable insights for researchers and decision-makers.

nan


Article 964

Title@2025-06-28 (6): Detecting Sockpuppetry on Wikipedia Using Meta-Learning

Title: Detecting Sockpuppetry on Wikipedia Using Meta-Learning Sockepuppetry auf Wikipedia erkennen Mit Meta-Learning 在维基百科上用元学习探测袜子布料 2506.10314v2

Authors (2): Luc Raszewski, Christine De Kock

Malicious sockpuppet detection on Wikipedia is critical to preserving access to reliable information on the internet and preventing the spread of disinformation. Prior machine learning approaches rely on stylistic and meta-data features, but do not prioritise adaptability to author-specific behaviours. As a result, they struggle to effectively model the behaviour of specific sockpuppet-groups, especially when text data is limited. To address this, we propose the application of meta-learning, a machine learning technique designed to improve performance in data-scarce settings by training models across multiple tasks. Meta-learning optimises a model for rapid adaptation to the writing style of a new sockpuppet-group. Our results show that meta-learning significantly enhances the precision of predictions compared to pre-trained models, marking an advancement in combating sockpuppetry on open editing platforms. We release a new dataset of sockpuppet investigations to foster future research in both sockpuppetry and meta-learning fields.

nan


Article 965

Title@2025-06-28 (6): Graph Contrastive Learning with Low-Rank Regularization and Low-Rank Attention for Noisy Node Classification

Title: Graph Contrastive Learning with Low-Rank Regularization and Low-Rank Attention for Noisy Node Classification Grafik Kontrastives Lernen mit Low-Rank-Regularisierung und Low-Rank-Achtung für Noisy Node-Klassifikation 用于噪音节点分类的与低兰克常规化和低兰克注意的对比学习图 2402.09600v2

Authors (2): Yancheng Wang, Yingzhen Yang

Graph Neural Networks (GNNs) have achieved remarkable success in learning node representations and have shown strong performance in tasks such as node classification. However, recent findings indicate that the presence of noise in real-world graph data can substantially impair the effectiveness of GNNs. To address this challenge, we introduce a robust and innovative node representation learning method named Graph Contrastive Learning with Low-Rank Regularization, or GCL-LRR, which follows a two-stage transductive learning framework for node classification. In the first stage, the GCL-LRR encoder is optimized through prototypical contrastive learning while incorporating a low-rank regularization objective. In the second stage, the representations generated by GCL-LRR are employed by a linear transductive classifier to predict the labels of unlabeled nodes within the graph. Our GCL-LRR is inspired by the Low Frequency Property (LFP) of the graph data and its labels, and it is also theoretically motivated by our sharp generalization bound for transductive learning. To the best of our knowledge, our theoretical result is among the first to theoretically demonstrate the advantage of low-rank regularization in transductive learning, which is also supported by strong empirical results. To further enhance the performance of GCL-LRR, we present an improved model named GCL-LR-Attention, which incorporates a novel LR-Attention layer into GCL-LRR. GCL-LR-Attention reduces the kernel complexity of GCL-LRR and contributes to a tighter generalization bound, leading to improved performance. Extensive evaluations on standard benchmark datasets evidence the effectiveness and robustness of both GCL-LRR and GCL-LR-Attention in learning meaningful node representations. The code is available at https://github.com/Statistical-Deep-Learning/GCL-LR-Attention.

nan


Article 966

Title@2025-06-28 (6): Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization

Title: Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization Sculpting Memory: Multi-Concept Vergessen von Diffusionsmodellen über Dynamic Mask und Concept-Aware Optimierung 雕塑记忆:通过动态面具和概念软件优化在传播模型中遗忘多种概念 2504.09039v2

Authors (7): Gen Li, Yang Xiao, Jie Ji, Kaiyuan Deng, Bo Hui, Linke Guo, Xiaolong Ma

Text-to-image (T2I) diffusion models have achieved remarkable success in generating high-quality images from textual prompts. However, their ability to store vast amounts of knowledge raises concerns in scenarios where selective forgetting is necessary, such as removing copyrighted content, reducing biases, or eliminating harmful concepts. While existing unlearning methods can remove certain concepts, they struggle with multi-concept forgetting due to instability, residual knowledge persistence, and generation quality degradation. To address these challenges, we propose \textbf{Dynamic Mask coupled with Concept-Aware Loss}, a novel unlearning framework designed for multi-concept forgetting in diffusion models. Our \textbf{Dynamic Mask} mechanism adaptively updates gradient masks based on current optimization states, allowing selective weight modifications that prevent interference with unrelated knowledge. Additionally, our \textbf{Concept-Aware Loss} explicitly guides the unlearning process by enforcing semantic consistency through superclass alignment, while a regularization loss based on knowledge distillation ensures that previously unlearned concepts remain forgotten during sequential unlearning. We conduct extensive experiments to evaluate our approach. Results demonstrate that our method outperforms existing unlearning techniques in forgetting effectiveness, output fidelity, and semantic coherence, particularly in multi-concept scenarios. Our work provides a principled and flexible framework for stable and high-fidelity unlearning in generative models. The code will be released publicly.

nan


Article 967

Title@2025-06-28 (6): Decoding Federated Learning: The FedNAM+ Conformal Revolution

Title: Decoding Federated Learning: The FedNAM+ Conformal Revolution Decoding Federated Learning: Die FedNAM+ Konforme Revolution 解说联邦学习:美联联储+非正规革命 2506.17872v2

Authors (3): Sree Bhargavi Balija, Amitash Nanda, Debashis Sahoo

Federated learning has significantly advanced distributed training of machine learning models across decentralized data sources. However, existing frameworks often lack comprehensive solutions that combine uncertainty quantification, interpretability, and robustness. To address this, we propose FedNAM+, a federated learning framework that integrates Neural Additive Models (NAMs) with a novel conformal prediction method to enable interpretable and reliable uncertainty estimation. Our method introduces a dynamic level adjustment technique that utilizes gradient-based sensitivity maps to identify key input features influencing predictions. This facilitates both interpretability and pixel-wise uncertainty estimates. Unlike traditional interpretability methods such as LIME and SHAP, which do not provide confidence intervals, FedNAM+ offers visual insights into prediction reliability. We validate our approach through experiments on CT scan, MNIST, and CIFAR datasets, demonstrating high prediction accuracy with minimal loss (e.g., only 0.1% on MNIST), along with transparent uncertainty measures. Visual analysis highlights variable uncertainty intervals, revealing low-confidence regions where model performance can be improved with additional data. Compared to Monte Carlo Dropout, FedNAM+ delivers efficient and global uncertainty estimates with reduced computational overhead, making it particularly suitable for federated learning scenarios. Overall, FedNAM+ provides a robust, interpretable, and computationally efficient framework that enhances trust and transparency in decentralized predictive modeling.

nan


Article 968

Title@2025-06-28 (6): Evaluating Pavement Deterioration Rates Due to Flooding Events Using Explainable AI

Title: Evaluating Pavement Deterioration Rates Due to Flooding Events Using Explainable AI Bewertung von Verschlechterungsraten aufgrund von Überschwemmungen unter Verwendung erklärbarer KI 利用可解释的AI 评估洪水事件造成的平 道 况 况 率 2507.01056v1

Authors (4): Lidan Peng, Lu Gao, Feng Hong, Jingran Sun

Flooding can damage pavement infrastructure significantly, causing both immediate and long-term structural and functional issues. This research investigates how flooding events affect pavement deterioration, specifically focusing on measuring pavement roughness by the International Roughness Index (IRI). To quantify these effects, we utilized 20 years of pavement condition data from TxDOT’s PMIS database, which is integrated with flood event data, including duration and spatial extent. Statistical analyses were performed to compare IRI values before and after flooding and to calculate the deterioration rates influenced by flood exposure. Moreover, we applied Explainable Artificial Intelligence (XAI) techniques, such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), to assess the impact of flooding on pavement performance. The results demonstrate that flood-affected pavements experience a more rapid increase in roughness compared to non-flooded sections. These findings emphasize the need for proactive flood mitigation strategies, including improved drainage systems, flood-resistant materials, and preventative maintenance, to enhance pavement resilience in vulnerable regions.

nan


Article 969

Title@2025-06-28 (6): Efficiently Serving Large Multimodal Models Using EPD Disaggregation

Title: Efficiently Serving Large Multimodal Models Using EPD Disaggregation Effizientes Servieren großer multimodaler Modelle mit EPD-Disaggregation 利用EPD拆分有效服务大型多模式模式 2501.05460v4

Authors (12): Gursimran Singh, Xinglu Wang, Yifan Hu, Timothy Yu, Linzi Xing, Wei Jiang, Zhefeng Wang, Xiaolong Bai, Yi Li, Ying Xiong, Yong Zhang, Zhenan Fan

Large Multimodal Models (LMMs) extend Large Language Models (LLMs) by handling diverse inputs such as images, audio, and video, but at the cost of adding a multimodal encoding stage that increases both computational and memory overhead. This step negatively affects key Service Level Objectives (SLOs), such as time to first token (TTFT) and time per output token (TPOT). We introduce Encode-Prefill-Decode (EPD) Disaggregation, a novel framework that separates the encoding, prefill, and decode stages onto dedicated resources. Unlike current systems, which bundle encoding and prefill together, our approach decouples these steps, unlocking new opportunities and optimizations. These include a mechanism to cache multimedia tokens for efficient transfer, a novel way to parallelize the encoding load within a request, a module for optimal resource allocation for disaggregated serving, and a novel role-switching method to handle changing workload characteristics. Experimental evaluations with popular LMMs show substantial gains in memory efficiency (up to 15x lower peak memory utilization), batch sizes (up to 22x larger), 10x more images per request, and 2.2x larger KV caches. Furthermore, it leads to significant improvements in SLO attainment (up to 90-100% improvement) and TTFT (up to 71% reduction), compared to systems that do not disaggregate. The code is available at https://github.com/vbdi/epdserve.

nan


Article 970

Title@2025-06-28 (6): Robust Tensor Completion via Gradient Tensor Nulclear L1-L2 Norm for Traffic Data Recovery

Title: Robust Tensor Completion via Gradient Tensor Nulclear L1-L2 Norm for Traffic Data Recovery Robuste Tensor-Vervollständigung über Gradient Tensor Nulclear L1-L2 Norm für Verkehrsdatenrettung 通过梯度 Tensor Nulclear L1-L2 用于交通数据恢复的规范 2506.22732v1

Authors (4): Hao Shu, Jicheng Li, Tianyv Lei, Lijun Sun

In real-world scenarios, spatiotemporal traffic data frequently experiences dual degradation from missing values and noise caused by sensor malfunctions and communication failures. Therefore, effective data recovery methods are essential to ensure the reliability of downstream data-driven applications. while classical tensor completion methods have been widely adopted, they are incapable of modeling noise, making them unsuitable for complex scenarios involving simultaneous data missingness and noise interference. Existing Robust Tensor Completion (RTC) approaches offer potential solutions by separately modeling the actual tensor data and noise. However, their effectiveness is often constrained by the over-relaxation of convex rank surrogates and the suboptimal utilization of local consistency, leading to inadequate model accuracy. To address these limitations, we first introduce the tensor L1-L2 norm, a novel non-convex tensor rank surrogate that functions as an effective low-rank representation tool. Leveraging an advanced feature fusion strategy, we further develop the gradient tensor L1-L2 norm by incorporating the tensor L1-L2 norm in the gradient domain. By integrating the gradient tensor nuclear L1-L2 norm into the RTC framework, we propose the Robust Tensor Completion via Gradient Tensor Nuclear L1-L2 Norm (RTC-GTNLN) model, which not only fully exploits both global low-rankness and local consistency without trade-off parameter, but also effectively handles the dual degradation challenges of missing data and noise in traffic data. Extensive experiments conducted on multiple real-world traffic datasets demonstrate that the RTC-GTNLN model consistently outperforms existing state-of-the-art methods in complex recovery scenarios involving simultaneous missing values and noise.

nan


Article 971

Title@2025-06-28 (6): XTransfer: Cross-Modality Model Transfer for Human Sensing with Few Data at the Edge

Title: XTransfer: Cross-Modality Model Transfer for Human Sensing with Few Data at the Edge XTransfer: Cross-Modality-Modelltransfer für die menschliche Wahrnehmung mit wenigen Daten am Rand Xtransfer:人类遥感跨模式模式转让模型,边缘几乎没有数据 2506.22726v1

Authors (9): Yu Zhang, Xi Zhang, Hualin zhou, Xinyuan Chen, Shang Gao, Hong Jia, Jianfei Yang, Yuankai Qi, Tao Gu

Deep learning for human sensing on edge systems offers significant opportunities for smart applications. However, its training and development are hindered by the limited availability of sensor data and resource constraints of edge systems. Current methods that rely on transferring pre-trained models often encounter issues such as modality shift and high resource demands, resulting in substantial accuracy loss, resource overhead, and poor adaptability across different sensing applications. In this paper, we propose XTransfer, a first-of-its-kind method for resource-efficient, modality-agnostic model transfer. XTransfer freely leverages single or multiple pre-trained models and transfers knowledge across different modalities by (i) model repairing that safely repairs modality shift in pre-trained model layers with only few sensor data, and (ii) layer recombining that efficiently searches and recombines layers of interest from source models in a layer-wise manner to create compact models. We benchmark various baselines across diverse human sensing datasets spanning different modalities. Comprehensive results demonstrate that XTransfer achieves state-of-the-art performance on human sensing tasks while significantly reducing the costs of sensor data collection, model training, and edge deployment.

nan


Article 972

Title@2025-06-28 (6): BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute

Title: BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute BEST-Route: Adaptives LLM Routing mit Test-Time Optimal Compute 最佳选择:用测试时最佳计算法运行的适应性LMLM 2506.22716v1

Authors (10): Dujian Ding, Ankur Mallick, Shaokun Zhang, Chi Wang, Daniel Madrigal, Mirian Del Carmen Hipolito Garcia, Menglin Xia, Laks V. S. Lakshmanan, Qingyun Wu, Victor Rühle

Large language models (LLMs) are powerful tools but are often expensive to deploy at scale. LLM query routing mitigates this by dynamically assigning queries to models of varying cost and quality to obtain a desired trade-off. Prior query routing approaches generate only one response from the selected model and a single response from a small (inexpensive) model was often not good enough to beat a response from a large (expensive) model due to which they end up overusing the large model and missing out on potential cost savings. However, it is well known that for small models, generating multiple responses and selecting the best can enhance quality while remaining cheaper than a single large-model response. We leverage this idea to propose BEST-Route, a novel routing framework that chooses a model and the number of responses to sample from it based on query difficulty and the quality thresholds. Experiments on real-world datasets demonstrate that our method reduces costs by up to 60% with less than 1% performance drop.

nan


Article 973

Title@2025-06-28 (6): Libra: Synergizing CUDA and Tensor Cores for High-Performance Sparse Matrix Multiplication

Title: Libra: Synergizing CUDA and Tensor Cores for High-Performance Sparse Matrix Multiplication Waage: CUDA- und Tensorkerne für hochleistungsfähige Sparse-Matrix-Multiplikation synergisieren 激光仪:将CUDA和Tensor核心同步用于高性能散射矩阵乘法 2506.22714v1

Authors (7): Jinliang Shi, Shigang Li, Youxuan Xu, Xueying Wang, Rongtian Fu, Zhi Ma, Tong Wu

Sparse matrix multiplication operators (i.e., SpMM and SDDMM) are widely used in deep learning and scientific computing. Modern accelerators are commonly equipped with Tensor cores and CUDA cores to accelerate sparse operators. The former brings superior computing power but only for structured matrix multiplication, while the latter has relatively lower performance but with higher programming flexibility. In this work, we discover that utilizing one resource alone leads to inferior performance for sparse matrix multiplication, due to their respective limitations. To this end, we propose Libra, a systematic approach that enables synergistic computation between CUDA and Tensor cores to achieve the best performance for sparse matrix multiplication. Specifically, we propose a 2D-aware workload distribution strategy to find out the sweet point of task mapping for different sparse operators, leveraging both the high performance of Tensor cores and the low computational redundancy on CUDA cores. In addition, Libra incorporates systematic optimizations for heterogeneous computing, including hybrid load-balancing, finely optimized kernel implementations, and GPU-accelerated preprocessing. Extensive experimental results on H100 and RTX 4090 GPUs show that Libra outperforms the state-of-the-art by on average 3.1x (up to 9.23x) over DTC-SpMM and 2.9x (up to 3.9x) for end-to-end GNN applications. Libra opens up a new perspective for sparse operator acceleration by fully exploiting the heterogeneous computing resources on GPUs.

nan


Article 974

Title@2025-06-28 (6): Generalized Linear Mode Connectivity for Transformers

Title: Generalized Linear Mode Connectivity for Transformers Generalisierte Linearmodus-Konnektivität für Transformatoren 变形器通用线性模式连接 2506.22712v1

Authors (6): Alexander Theus, Alessandro Cabodi, Sotiris Anagnostidis, Antonio Orvieto, Sidak Pal Singh, Valentina Boeva

Understanding the geometry of neural network loss landscapes is a central question in deep learning, with implications for generalization and optimization. A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths, despite appearing to lie in separate loss basins. However, this is often obscured by symmetries in parameter space – such as neuron permutations – which make functionally equivalent models appear dissimilar. Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope and fail to capture the richer symmetries exhibited by modern architectures such as Transformers. In this work, we introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, orthogonal transformations, and general invertible maps – broadening the set of valid reparameterizations and subsuming many previous approaches as special cases. Crucially, this generalization enables, for the first time, the discovery of low- and zero-barrier linear interpolation paths between independently trained Vision Transformers and GPT-2 models. These results reveal deeper structure in the loss landscape and underscore the importance of symmetry-aware analysis for understanding model space geometry.

nan


Article 975

Title@2025-06-28 (6): Supporting renewable energy planning and operation with data-driven high-resolution ensemble weather forecast

Title: Supporting renewable energy planning and operation with data-driven high-resolution ensemble weather forecast Unterstützung der Planung und des Betriebs erneuerbarer Energien mit datengetriebener Hochauflösungs-Ensemble-Wettervorhersage 支持可再生能源规划和运作,以数据驱动的高分辨率高分辨率气象组合组合天气预报支持可再生能源规划和运作 2505.04396v3

Authors (13): Jingnan Wang, Jie Chao, Shangshang Yang, Kaijun Ren, Kefeng Deng, Xi Chen, Yaxin Liu, Hanqiuzi Wen, Ziniu Xiao, Lifeng Zhang, Xiaodong Wang, Jiping Guan, Baoxiang Pan

The planning and operation of renewable energy, especially wind power, depend crucially on accurate, timely, and high-resolution weather information. Coarse-grid global numerical weather forecasts are typically downscaled to meet these requirements, introducing challenges of scale inconsistency, process representation error, computation cost, and entanglement of distinct uncertainty sources from chaoticity, model bias, and large-scale forcing. We address these challenges by learning the climatological distribution of a target wind farm using its high-resolution numerical weather simulations. An optimal combination of this learned high-resolution climatological prior with coarse-grid large scale forecasts yields highly accurate, fine-grained, full-variable, large ensemble of weather pattern forecasts. Using observed meteorological records and wind turbine power outputs as references, the proposed methodology verifies advantageously compared to existing numerical/statistical forecasting-downscaling pipelines, regarding either deterministic/probabilistic skills or economic gains. Moreover, a 100-member, 10-day forecast with spatial resolution of 1 km and output frequency of 15 min takes < 1 hour on a moderate-end GPU, as contrast to $\mathcal{O}(10^3)$ CPU hours for conventional numerical simulation. By drastically reducing computational costs while maintaining accuracy, our method paves the way for more efficient and reliable renewable energy planning and operation.

nan


Article 976

Title@2025-06-28 (6): Predicting and Explaining Customer Data Sharing in the Open Banking

Title: Predicting and Explaining Customer Data Sharing in the Open Banking Vorhersage und Erklärung der Weitergabe von Kundendaten im Open Banking 在开放银行业务中预测和解释客户数据分享 2507.01987v1

Authors (9): João B. G. de Brito, Rodrigo Heldt, Cleo S. Silveira, Matthias Bogaert, Guilherme B. Bucco, Fernando B. Luce, João L. Becker, Filipe J. Zabala, Michel J. Anzanello

The emergence of Open Banking represents a significant shift in financial data management, influencing financial institutions’ market dynamics and marketing strategies. This increased competition creates opportunities and challenges, as institutions manage data inflow to improve products and services while mitigating data outflow that could aid competitors. This study introduces a framework to predict customers’ propensity to share data via Open Banking and interprets this behavior through Explanatory Model Analysis (EMA). Using data from a large Brazilian financial institution with approximately 3.2 million customers, a hybrid data balancing strategy incorporating ADASYN and NEARMISS techniques was employed to address the infrequency of data sharing and enhance the training of XGBoost models. These models accurately predicted customer data sharing, achieving 91.39% accuracy for inflow and 91.53% for outflow. The EMA phase combined the Shapley Additive Explanations (SHAP) method with the Classification and Regression Tree (CART) technique, revealing the most influential features on customer decisions. Key features included the number of transactions and purchases in mobile channels, interactions within these channels, and credit-related features, particularly credit card usage across the national banking system. These results highlight the critical role of mobile engagement and credit in driving customer data-sharing behaviors, providing financial institutions with strategic insights to enhance competitiveness and innovation in the Open Banking environment.

nan


Article 977

Title@2025-06-28 (6): FairMarket-RL: LLM-Guided Fairness Shaping for Multi-Agent Reinforcement Learning in Peer-to-Peer Markets

Title: FairMarket-RL: LLM-Guided Fairness Shaping for Multi-Agent Reinforcement Learning in Peer-to-Peer Markets FairMarket-RL: LLM-geführte Fairness-Forming für das Mehr-Agenten-Verstärkungs-Lernen in Peer-to-Peer-Märkten Fairket-RL:在同行与同行市场中为多机构强化学习提供LLM-指导公平模式 2506.22708v1

Authors (6): Shrenik Jadhav, Birva Sevak, Srijita Das, Akhtar Hussain, Wencong Su, Van-Hai Bui

Peer-to-peer (P2P) trading is increasingly recognized as a key mechanism for decentralized market regulation, yet existing approaches often lack robust frameworks to ensure fairness. This paper presents FairMarket-RL, a novel hybrid framework that combines Large Language Models (LLMs) with Reinforcement Learning (RL) to enable fairness-aware trading agents. In a simulated P2P microgrid with multiple sellers and buyers, the LLM acts as a real-time fairness critic, evaluating each trading episode using two metrics: Fairness-To-Buyer (FTB) and Fairness-Between-Sellers (FBS). These fairness scores are integrated into agent rewards through scheduled {\lambda}-coefficients, forming an adaptive LLM-guided reward shaping loop that replaces brittle, rule-based fairness constraints. Agents are trained using Independent Proximal Policy Optimization (IPPO) and achieve equitable outcomes, fulfilling over 90% of buyer demand, maintaining fair seller margins, and consistently reaching FTB and FBS scores above 0.80. The training process demonstrates that fairness feedback improves convergence, reduces buyer shortfalls, and narrows profit disparities between sellers. With its language-based critic, the framework scales naturally, and its extension to a large power distribution system with household prosumers illustrates its practical applicability. FairMarket-RL thus offers a scalable, equity-driven solution for autonomous trading in decentralized energy systems.

nan


Article 978

Title@2025-06-28 (6): Lower bounds for trace estimation via Block Krylov and other methods

Title: Lower bounds for trace estimation via Block Krylov and other methods Untere Grenzen für die Spurenabschätzung über Block Krylov und andere Methoden 通过Clock Krylov 和其他方法进行跟踪估计的下限范围 2506.22701v1

Authors (1): Shi Jie Yu

This paper studies theoretical lower bounds for estimating the trace of a matrix function, $\text{tr}(f(A))$, focusing on methods that use Hutchinson’s method along with Block Krylov techniques. These methods work by approximating matrix-vector products like $f(A)V$ using a Block Krylov subspace. This is closely related to approximating functions with polynomials. We derive theoretical upper bounds on how many Krylov steps are needed for functions such as $A^{-1/2}$ and $A^{-1}$ by analyzing the upper bounds from the polynomial approximation of their scalar equivalent. In addition, we also develop lower limits on the number of queries needed for trace estimation, specifically for $\text{tr}(W^{-p})$ where $W$ is a Wishart matrix. Our study clarifies the connection between the number of steps in Block Krylov methods and the degree of the polynomial used for approximation. This links the total cost of trace estimation to basic limits in polynomial approximation and how much information is needed for the computation.

nan


Article 979

Title@2025-06-28 (6): Suboptimality analysis of receding horizon quadratic control with unknown linear systems and its applications in learning-based control

Title: Suboptimality analysis of receding horizon quadratic control with unknown linear systems and its applications in learning-based control Suboptimalitätsanalyse der receding horizont quadratic control mit unbekannten linearen Systemen und deren Anwendungen in der Learning-based control 以未知线性系统及其在以学习为基础的控制中的应用对正在后退的地平地平线二次控制及其应用的不优化程度分析 2301.07876v4

Authors (3): Shengling Shi, Anastasios Tsiamis, Bart De Schutter

This work analyzes how the trade-off between the modeling error, the terminal value function error, and the prediction horizon affects the performance of a nominal receding-horizon linear quadratic (LQ) controller. By developing a novel perturbation result of the Riccati difference equation, a novel performance upper bound is obtained and suggests that for many cases, the prediction horizon can be either one or infinity to improve the control performance, depending on the relative difference between the modeling error and the terminal value function error. The result also shows that when an infinite horizon is desired, a finite prediction horizon that is larger than the controllability index can be sufficient for achieving a near-optimal performance, revealing a close relation between the prediction horizon and controllability. The obtained suboptimality performance upper bound is applied to provide novel sample complexity and regret guarantees for nominal receding-horizon LQ controllers in a learning-based setting. We show that an adaptive prediction horizon that increases as a logarithmic function of time is beneficial for regret minimization.

nan


Article 980

Title@2025-06-28 (6): Residual Matrix Transformers: Scaling the Size of the Residual Stream

Title: Residual Matrix Transformers: Scaling the Size of the Residual Stream Residual Matrix Transformers: Skalierung der Größe des Residual Stream 残余矩阵变异器:扩大残余流的规模 2506.22696v1

Authors (2): Brian Mak, Jeffrey Flanigan

The residual stream acts as a memory bus where transformer layers both store and access features (Elhage et al., 2021). We consider changing the mechanism for retrieving and storing information in the residual stream, and replace the residual stream of the transformer with an outer product memory matrix (Kohonen, 1972, Anderson, 1972). We call this model the Residual Matrix Transformer (RMT). We find that the RMT enjoys a number of attractive properties: 1) the size of the residual stream can be scaled independently of compute and model size, improving performance, 2) the RMT can achieve the same loss as the transformer with 58% fewer FLOPS, 25% fewer parameters, and 41% fewer training tokens tokens, and 3) the RMT outperforms the transformer on downstream evaluations. We theoretically analyze the transformer and the RMT, and show that the RMT allows for more efficient scaling of the residual stream, as well as improved variance propagation properties. Code for this project can be found at https://github.com/bmac3/residual-matrix-transformer.

nan


Article 981

Title@2025-06-28 (6): Mixing Time of the Proximal Sampler in Relative Fisher Information via Strong Data Processing Inequality

Title: Mixing Time of the Proximal Sampler in Relative Fisher Information via Strong Data Processing Inequality Mischzeit des Proximal-Probenehmers in relativen Fisher-Informationen über starke Datenverarbeitungsungleichheit 利用强有力的数据处理不平等,混合比较渔业信息中通过强有力的数据处理不平等的 Proximal样本采集器的时间 2502.05623v2

Authors (1): Andre Wibisono

We study the mixing time guarantee for sampling in relative Fisher information via the Proximal Sampler algorithm, which is an approximate proximal discretization of the Langevin dynamics. We show that when the target probability distribution is strongly log-concave, the relative Fisher information converges exponentially fast along the Proximal Sampler; this matches the exponential convergence rate of the relative Fisher information along the continuous-time Langevin dynamics for strongly log-concave target. When combined with a standard implementation of the Proximal Sampler via rejection sampling, this exponential convergence rate provides a high-accuracy iteration complexity guarantee for the Proximal Sampler in relative Fisher information when the target distribution is strongly log-concave and log-smooth. Our proof proceeds by establishing a strong data processing inequality for relative Fisher information along the Gaussian channel under strong log-concavity, and a data processing inequality along the reverse Gaussian channel for a special distribution. The forward and reverse Gaussian channels compose to form the Proximal Sampler, and these data processing inequalities imply the exponential convergence rate of the relative Fisher information along the Proximal Sampler.

nan


Article 982

Title@2025-06-28 (6): On the Convergence of Min-Max Langevin Dynamics and Algorithm

Title: On the Convergence of Min-Max Langevin Dynamics and Algorithm Über die Konvergenz von Min-Max Langevin Dynamik und Algorithmen 关于Min-Max Langevin动力公司和Algorithm公司的趋同 2412.20471v3

Authors (4): Yang Cai, Siddharth Mitra, Xiuyuan Wang, Andre Wibisono

We study zero-sum games in the space of probability distributions over the Euclidean space $\mathbb{R}^d$ with entropy regularization, in the setting when the interaction function between the players is smooth and strongly convex-strongly concave. We prove an exponential convergence guarantee for the mean-field min-max Langevin dynamics to compute the equilibrium distribution of the zero-sum game. We also study the finite-particle approximation of the mean-field min-max Langevin dynamics, both in continuous and discrete times. We prove biased convergence guarantees for the continuous-time finite-particle min-max Langevin dynamics to the stationary mean-field equilibrium distribution with an explicit bias term which does not scale with the number of particles. We also prove biased convergence guarantees for the discrete-time finite-particle min-max Langevin algorithm to the stationary mean-field equilibrium distribution with an additional bias term which scales with the step size and the number of particles. This provides an explicit iteration complexity for the average particle along the finite-particle algorithm to approximately compute the equilibrium distribution of the zero-sum game.

nan


Article 983

Title@2025-06-28 (6): Scaling Data-Constrained Language Models

Title: Scaling Data-Constrained Language Models Skalierung von datengebundenen Sprachmodellen 受数据约束的语言模式 2305.16264v5

Authors (9): Niklas Muennighoff, Alexander M. Rush, Boaz Barak, Teven Le Scao, Aleksandra Piktus, Nouamane Tazi, Sampo Pyysalo, Thomas Wolf, Colin Raffel

The current trend of scaling language models involves increasing both parameter count and training dataset size. Extrapolating this trend suggests that training dataset size may soon be limited by the amount of text data available on the internet. Motivated by this limit, we investigate scaling language models in data-constrained regimes. Specifically, we run a large set of experiments varying the extent of data repetition and compute budget, ranging up to 900 billion training tokens and 9 billion parameter models. We find that with constrained data for a fixed compute budget, training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data. However, with more repetition, the value of adding compute eventually decays to zero. We propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters. Finally, we experiment with approaches mitigating data scarcity, including augmenting the training dataset with code data or removing commonly used filters. Models and datasets from our 400 training runs are freely available at https://github.com/huggingface/datablations.

nan


Article 984

Title@2025-06-27 (5): Mitigating Semantic Collapse in Generative Personalization with a Surprisingly Simple Test-Time Embedding Adjustment

Title: Mitigating Semantic Collapse in Generative Personalization with a Surprisingly Simple Test-Time Embedding Adjustment Semantischer Zusammenbruch bei der Generativen Personalisierung mit einer überraschend einfachen Test-Time-Einbettungsanpassung abmildern 采用令人惊讶的简单测试时间嵌入调整法, 减轻在产生个性化过程中的语义消沉 2506.22685v1

Authors (8): Anh Bui, Trang Vu, Trung Le, Junae Kim, Tamas Abraham, Rollin Omari, Amar Kaur, Dinh Phung

In this paper, we investigate the semantic collapsing problem in generative personalization, an under-explored topic where the learned visual concept ($V^$) gradually shifts from its original textual meaning and comes to dominate other concepts in multi-concept input prompts. This issue not only reduces the semantic richness of complex input prompts like “a photo of $V^$ wearing glasses and playing guitar” into simpler, less contextually rich forms such as “a photo of $V^$” but also leads to simplified output images that fail to capture the intended concept. We identify the root cause as unconstrained optimisation, which allows the learned embedding $V^$ to drift arbitrarily in the embedding space, both in direction and magnitude. To address this, we propose a simple yet effective training-free method that adjusts the magnitude and direction of pre-trained embedding at inference time, effectively mitigating the semantic collapsing problem. Our method is broadly applicable across different personalization methods and demonstrates significant improvements in text-image alignment in diverse use cases. Our code is anonymously published at https://anonymous.4open.science/r/Embedding-Adjustment.

nan


Article 985

Title@2025-06-27 (5): PriorDiffusion: Leverage Language Prior in Diffusion Models for Monocular Depth Estimation

Title: PriorDiffusion: Leverage Language Prior in Diffusion Models for Monocular Depth Estimation PriorDiffusion: Leverage Language Prior in Diffusionsmodellen für monookulare Tiefenschätzung 先前传播:在单人深度估算扩散模型中先使用语言 2411.16750v3

Authors (8): Ziyao Zeng, Jingcheng Ni, Daniel Wang, Patrick Rim, Younjoon Chung, Fengyu Yang, Byung-Woo Hong, Alex Wong

Traditional monocular depth estimation suffers from inherent ambiguity and visual nuisance. We argue that language prior can enhance monocular depth estimation by leveraging the inductive bias learned during the text-to-image pre-training of diffusion models. The ability of these models to generate images that align with text indicates that they have learned the spatial relationships, size, and shape of specified objects, which can be applied to improve depth estimation. Thus, we propose PriorDiffusion, using a pre-trained text-to-image diffusion model that takes both images and corresponding text descriptions to infer affine-invariant depth through a denoising process. We also show that language prior enhances the model’s perception of specific regions of images that users care about and describe. Simultaneously, language prior acts as a constraint to accelerate the convergence of both training and the inference diffusion trajectory. By training on HyperSim and Virtual KITTI, we achieve faster training convergence, fewer inference diffusion steps, and state-of-the-art zero-shot performance across NYUv2, KITTI, ETH3D, and ScanNet. Code will be released upon acceptance.

nan


Article 986

Title@2025-06-27 (5): CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following

Title: CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following CMI-Bench: Ein umfassender Benchmark für die Bewertung der Musikanleitung CMI-Bench:评估音乐教学的全面基准 2506.12285v2

Authors (5): Yinghao Ma, Siyou Li, Juntao Yu, Emmanouil Benetos, Akira Maezawa

Recent advances in audio-text large language models (LLMs) have opened new possibilities for music understanding and generation. However, existing benchmarks are limited in scope, often relying on simplified tasks or multi-choice evaluations that fail to reflect the complexity of real-world music analysis. We reinterpret a broad range of traditional MIR annotations as instruction-following formats and introduce CMI-Bench, a comprehensive music instruction following benchmark designed to evaluate audio-text LLMs on a diverse set of music information retrieval (MIR) tasks. These include genre classification, emotion regression, emotion tagging, instrument classification, pitch estimation, key detection, lyrics transcription, melody extraction, vocal technique recognition, instrument performance technique detection, music tagging, music captioning, and (down)beat tracking: reflecting core challenges in MIR research. Unlike previous benchmarks, CMI-Bench adopts standardized evaluation metrics consistent with previous state-of-the-art MIR models, ensuring direct comparability with supervised approaches. We provide an evaluation toolkit supporting all open-source audio-textual LLMs, including LTU, Qwen-audio, SALMONN, MusiLingo, etc. Experiment results reveal significant performance gaps between LLMs and supervised models, along with their culture, chronological and gender bias, highlighting the potential and limitations of current models in addressing MIR tasks. CMI-Bench establishes a unified foundation for evaluating music instruction following, driving progress in music-aware LLMs.

nan


Article 987

Title@2025-06-27 (5): SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving

Title: SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving SLED: Ein spekulatives LLM-Decoding-Framework für effizientes Edge Serving SLED: 有效边缘服务投机性LLM代谢框架 2506.09397v3

Authors (8): Xiangchen Li, Dimitrios Spatharakis, Saeid Ghafouri, Jiakun Fan, Hans Vandierendonck, Deepu John, Bo Ji, Dimitrios Nikolopoulos

The growing gap between the increasing complexity of large language models (LLMs) and the limited computational budgets of edge devices poses a key challenge for efficient on-device inference, despite gradual improvements in hardware capabilities. Existing strategies, such as aggressive quantization, pruning, or remote inference, trade accuracy for efficiency or lead to substantial cost burdens. This position paper introduces a new framework that leverages speculative decoding, previously viewed primarily as a decoding acceleration technique for autoregressive generation of LLMs, as a promising approach specifically adapted for edge computing by orchestrating computation across heterogeneous devices. We propose \acronym, a framework that allows lightweight edge devices to draft multiple candidate tokens locally using diverse draft models, while a single, shared edge server verifies the tokens utilizing a more precise target model. To further increase the efficiency of verification, the edge server batch the diverse verification requests from devices. This approach supports device heterogeneity and reduces server-side memory footprint by sharing the same upstream target model across multiple devices. Our initial experiments with Jetson Orin Nano, Raspberry Pi 4B/5, and an edge server equipped with 4 Nvidia A100 GPUs indicate substantial benefits: 2.2 more system throughput, 2.8 more system capacity, and better cost efficiency, all without sacrificing model accuracy.

nan


Article 988

Title@2025-06-27 (5): DistShap: Scalable GNN Explanations with Distributed Shapley Values

Title: DistShap: Scalable GNN Explanations with Distributed Shapley Values DistShap: Skalierbare GNN-Erklärungen mit verteilten Shapley-Werten 分布式shap:可缩放的 GNN 解释和分布式形状值 2506.22668v1

Authors (3): Selahattin Akkas, Aditya Devarakonda, Ariful Azad

With the growing adoption of graph neural networks (GNNs), explaining their predictions has become increasingly important. However, attributing predictions to specific edges or features remains computationally expensive. For example, classifying a node with 100 neighbors using a 3-layer GNN may involve identifying important edges from millions of candidates contributing to the prediction. To address this challenge, we propose DistShap, a parallel algorithm that distributes Shapley value-based explanations across multiple GPUs. DistShap operates by sampling subgraphs in a distributed setting, executing GNN inference in parallel across GPUs, and solving a distributed least squares problem to compute edge importance scores. DistShap outperforms most existing GNN explanation methods in accuracy and is the first to scale to GNN models with millions of features by using up to 128 GPUs on the NERSC Perlmutter supercomputer.

nan


Article 989

Title@2025-06-27 (5): VERA: Variational Inference Framework for Jailbreaking Large Language Models

Title: VERA: Variational Inference Framework for Jailbreaking Large Language Models VERA: Variationaler Bezugsrahmen für Jailbreaking große Sprachmodelle VERA:破碎大型语言模型变化推断框架 2506.22666v1

Authors (5): Anamika Lochab, Lu Yan, Patrick Pynadath, Xiangyu Zhang, Ruqi Zhang

The rise of API-only access to state-of-the-art LLMs highlights the need for effective black-box jailbreak methods to identify model vulnerabilities in real-world settings. Without a principled objective for gradient-based optimization, most existing approaches rely on genetic algorithms, which are limited by their initialization and dependence on manually curated prompt pools. Furthermore, these methods require individual optimization for each prompt, failing to provide a comprehensive characterization of model vulnerabilities. To address this gap, we introduce VERA: Variational infErence fRamework for jAilbreaking. VERA casts black-box jailbreak prompting as a variational inference problem, training a small attacker LLM to approximate the target LLM’s posterior over adversarial prompts. Once trained, the attacker can generate diverse, fluent jailbreak prompts for a target query without re-optimization. Experimental results show that VERA achieves strong performance across a range of target LLMs, highlighting the value of probabilistic inference for adversarial prompt generation.

nan


Article 990

Title@2025-06-27 (5): Learning Stochastic Multiscale Models

Title: Learning Stochastic Multiscale Models Stochastische Multiskalenmodelle lernen 学习存储多尺度模型 2506.22655v1

Authors (2): Andrew F. Ilersich, Prasanth B. Nair

The physical sciences are replete with dynamical systems that require the resolution of a wide range of length and time scales. This presents significant computational challenges since direct numerical simulation requires discretization at the finest relevant scales, leading to a high-dimensional state space. In this work, we propose an approach to learn stochastic multiscale models in the form of stochastic differential equations directly from observational data. Our method resolves the state on a coarse mesh while introducing an auxiliary state to capture the effects of unresolved scales. We learn the parameters of the multiscale model using a modern forward-solver-free amortized variational inference method. Our approach draws inspiration from physics-based multiscale modeling approaches, such as large-eddy simulation in fluid dynamics, while learning directly from data. We present numerical studies to demonstrate that our learned multiscale models achieve superior predictive accuracy compared to direct numerical simulation and closure-type models at equivalent resolution.

nan


Article 991

Title@2025-06-27 (5): Probing Quantum Spin Systems with Kolmogorov-Arnold Neural Network Quantum States

Title: Probing Quantum Spin Systems with Kolmogorov-Arnold Neural Network Quantum States Probing Quantum Spin Systems mit Kolmogorov-Arnold Neural Network Quantum States 与Kolmogorov-Arold神经网络 2506.01891v4

Authors (5): Mahmud Ashraf Shamim, Eric A F Reinhardt, Talal Ahmed Chowdhury, Sergei Gleyzer, Paulo T Araujo

Neural Quantum States (NQS) are a class of variational wave functions parametrized by neural networks (NNs) to study quantum many-body systems. In this work, we propose \texttt{SineKAN}, a NQS \textit{ansatz} based on Kolmogorov-Arnold Networks (KANs), to represent quantum mechanical wave functions as nested univariate functions. We show that \texttt{SineKAN} wavefunction with learnable sinusoidal activation functions can capture the ground state energies, fidelities and various correlation functions of the one dimensional Transverse-Field Ising model, Anisotropic Heisenberg model, and Antiferromagnetic $J_{1}-J_{2}$ model with different chain lengths. In our study of the $J_1-J_2$ model with $L=100$ sites, we find that the \texttt{SineKAN} model outperforms several previously explored neural quantum state \textit{ans"atze}, including Restricted Boltzmann Machines (RBMs), Long Short-Term Memory models (LSTMs), and Multi-layer Perceptrons (MLP) \textit{a.k.a.} Feed Forward Neural Networks, when compared to the results obtained from the Density Matrix Renormalization Group (DMRG) algorithm. We find that \texttt{SineKAN} models can be trained to high precisions and accuracies with minimal computational costs.

nan


Article 992

Title@2025-06-27 (5): XxaCT-NN: Structure Agnostic Multimodal Learning for Materials Science

Title: XxaCT-NN: Structure Agnostic Multimodal Learning for Materials Science XxACT-NN: Strukturagnostisches multimodales Lernen für die Materialwissenschaft XxXCT-NN:材料科学机构 – – 材料科学多模式学习 2507.01054v1

Authors (5): Jithendaraa Subramanian, Linda Hung, Daniel Schweigert, Santosh Suram, Weike Ye

Recent advances in materials discovery have been driven by structure-based models, particularly those using crystal graphs. While effective for computational datasets, these models are impractical for real-world applications where atomic structures are often unknown or difficult to obtain. We propose a scalable multimodal framework that learns directly from elemental composition and X-ray diffraction (XRD) – two of the more available modalities in experimental workflows without requiring crystal structure input. Our architecture integrates modality-specific encoders with a cross-attention fusion module and is trained on the 5-million-sample Alexandria dataset. We present masked XRD modeling (MXM), and apply MXM and contrastive alignment as self-supervised pretraining strategies. Pretraining yields faster convergence (up to 4.2x speedup) and improves both accuracy and representation quality. We further demonstrate that multimodal performance scales more favorably with dataset size than unimodal baselines, with gains compounding at larger data regimes. Our results establish a path toward structure-free, experimentally grounded foundation models for materials science.

nan


Article 993

Title@2025-06-27 (5): Vulnerable Road User Detection and Safety Enhancement: A Comprehensive Survey

Title: Vulnerable Road User Detection and Safety Enhancement: A Comprehensive Survey Verwundbare Erkennung und Verbesserung der Sicherheit von Verkehrsteilnehmern: Eine umfassende Umfrage 弱势道路使用者检测和加强安全:全面调查 2405.19202v5

Authors (8): Renato M. Silva, Gregorio F. Azevedo, Matheus V. V. Berto, Jean R. Rocha, Eduardo C. Fidelis, Matheus V. Nogueira, Pedro H. Lisboa, Tiago A. Almeida

Traffic incidents involving vulnerable road users (VRUs) constitute a significant proportion of global road accidents. Advances in traffic communication ecosystems, coupled with sophisticated signal processing and machine learning techniques, have facilitated the utilization of data from diverse sensors. Despite these advancements and the availability of extensive datasets, substantial progress is required to mitigate traffic casualties. This paper provides a comprehensive survey of state-of-the-art technologies and methodologies to enhance the safety of VRUs. The study investigates the communication networks between vehicles and VRUs, emphasizing the integration of advanced sensors and the availability of relevant datasets. It explores preprocessing techniques and data fusion methods to enhance sensor data quality. Furthermore, our study assesses critical simulation environments essential for developing and testing VRU safety systems. Our research also highlights recent advances in VRU detection and classification algorithms, addressing challenges such as variable environmental conditions. Additionally, we cover cutting-edge research in predicting VRU intentions and behaviors, which is mandatory for proactive collision avoidance strategies. Through this survey, we aim to provide a comprehensive understanding of the current landscape of VRU safety technologies, identifying areas of progress and areas needing further research and development.

nan


Article 994

Title@2025-06-27 (5): Cost-effective Reduced-Order Modeling via Bayesian Active Learning

Title: Cost-effective Reduced-Order Modeling via Bayesian Active Learning Kostengünstige Reduzierte Auftragsmodellierung über Bayesian Active Learning 通过巴耶斯积极学习进行降低分级模拟 2506.22645v1

Authors (4): Amir Hossein Rahmati, Nathan M. Urban, Byung-Jun Yoon, Xiaoning Qian

Machine Learning surrogates have been developed to accelerate solving systems dynamics of complex processes in different science and engineering applications. To faithfully capture governing systems dynamics, these methods rely on large training datasets, hence restricting their applicability in real-world problems. In this work, we propose BayPOD-AL, an active learning framework based on an uncertainty-aware Bayesian proper orthogonal decomposition (POD) approach, which aims to effectively learn reduced-order models from high-fidelity full-order models representing complex systems. Experimental results on predicting the temperature evolution over a rod demonstrate BayPOD-AL’s effectiveness in suggesting the informative data and reducing computational cost related to constructing a training dataset compared to other uncertainty-guided active learning strategies. Furthermore, we demonstrate BayPOD-AL’s generalizability and efficiency by evaluating its performance on a dataset of higher temporal resolution than the training dataset.

nan


Article 995

Title@2025-06-27 (5): Diversity by Design: Addressing Mode Collapse Improves scRNA-seq Perturbation Modeling on Well-Calibrated Metrics

Title: Diversity by Design: Addressing Mode Collapse Improves scRNA-seq Perturbation Modeling on Well-Calibrated Metrics Diversity by Design: Addressing Mode Collapse Verbessert scRNA-seq Perturbation Modeling auf gut kalibrierten Metrics 通过设计实现多样性:解决模式折叠问题,改进精确度度度模型的ScRNA-等分扰动模型 2506.22641v1

Authors (6): Gabriel M. Mejia, Henry E. Miller, Francis J. A. Leblanc, Bo Wang, Brendan Swain, Lucas Paulo de Lima Camillo

Recent benchmarks reveal that models for single-cell perturbation response are often outperformed by simply predicting the dataset mean. We trace this anomaly to a metric artifact: control-referenced deltas and unweighted error metrics reward mode collapse whenever the control is biased or the biological signal is sparse. Large-scale \textit{in silico} simulations and analysis of two real-world perturbation datasets confirm that shared reference shifts, not genuine biological change, drives high performance in these evaluations. We introduce differentially expressed gene (DEG)-aware metrics, weighted mean-squared error (WMSE) and weighted delta $R^{2}$ ($R^{2}_{w}(\Delta)$) with respect to all perturbations, that measure error in niche signals with high sensitivity. We further introduce negative and positive performance baselines to calibrate these metrics. With these improvements, the mean baseline sinks to null performance while genuine predictors are correctly rewarded. Finally, we show that using WMSE as a loss function reduces mode collapse and improves model performance.

nan


Article 996

Title@2025-06-27 (5): Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis

Title: Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis Augmenting Online RL mit Offline-Daten ist alles, was Sie brauchen: Eine einheitliche Hybrid-RL-Algorithmus Design und Analyse 用离线数据扩大在线 RL 的离线数据是所有你需要的: 统一混合 RL 算法设计和分析 2505.13768v3

Authors (5): Ruiquan Huang, Donghao Li, Chengshuai Shi, Cong Shen, Jing Yang

This paper investigates a hybrid learning framework for reinforcement learning (RL) in which the agent can leverage both an offline dataset and online interactions to learn the optimal policy. We present a unified algorithm and analysis and show that augmenting confidence-based online RL algorithms with the offline dataset outperforms any pure online or offline algorithm alone and achieves state-of-the-art results under two learning metrics, i.e., sub-optimality gap and online learning regret. Specifically, we show that our algorithm achieves a sub-optimality gap $\tilde{O}(\sqrt{1/(N_0/\mathtt{C}(\pi^* \rho)+N_1}) )$, where $\mathtt{C}(\pi^* \rho)$ is a new concentrability coefficient, $N_0$ and $N_1$ are the numbers of offline and online samples, respectively. For regret minimization, we show that it achieves a constant $\tilde{O}( \sqrt{N_1/(N_0/\mathtt{C}(\pi^{-} \rho)+N_1)} )$ speed-up compared to pure online learning, where $\mathtt{C}(\pi^- \rho)$ is the concentrability coefficient over all sub-optimal policies. Our results also reveal an interesting separation on the desired coverage properties of the offline dataset for sub-optimality gap minimization and regret minimization. We further validate our theoretical findings in several experiments in special RL models such as linear contextual bandits and Markov decision processes (MDPs).

nan


Article 997

Title@2025-06-27 (5): Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training

Title: Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training Layer Bedeutung für mathematische Vernunft wird im Vor-Training und Invariant nach dem Nach-Training geschmiedet 培训前和培训后变化无常中提出了数学理由的重要性。 2506.22638v1

Authors (5): Aadim Nepal, Safal Shrestha, Anubhav Shrestha, Minwu Kim, Keith Ross

Large language models can exhibit improved mathematical reasoning capabilities following post-training with instruction tuning, reinforcement learning, or knowledge distillation. However, it remains unclear whether these improvements are driven by major changes in transformer layers or from minor adjustments that leave the relative layer importance structures of the base model largely unchanged. We investigate this question through systematic layer-wise ablation experiments, examining base, instruction-tuned, knowledge-distilled, and reinforcement learning variants on mathematical reasoning benchmarks. Our findings show that mathematical reasoning gives rise to a specific layer importance structure, and this structure persists across all post-training paradigms. Removal of such layers causes accuracy drops of up to 80%. In contrast, non-mathematical tasks like factual recall exhibit no critical layers. This distinction suggests that mathematical reasoning requires specialized layers that emerge during pre-training, while other non-reasoning tasks do not. From an information-theoretic perspective, we also observe that these critical layers are the same layers where major representational transformation occurs.

nan


Article 998

Title@2025-06-27 (5): EXPRTS: Exploring and Probing the Robustness of Time Series Forecasting Models

Title: EXPRTS: Exploring and Probing the Robustness of Time Series Forecasting Models EXPRTS: Erforschung und Erprobung der Robustheit von Zeitreihenprognosemodellen EXPRTS:探索和检验时间序列预测模型的威力 2403.03508v3

Authors (7): Håkon Hanisch Kjærnli, Lluis Mas-Ribas, Hans Jakob Håland, Vegard Sjåvik, Aida Ashrafi, Helge Langseth, Odd Erik Gundersen

When deploying time series forecasting models based on machine learning to real world settings, one often encounter situations where the data distribution drifts. Such drifts expose the forecasting models to out-of-distribution (OOD) data, and machine learning models lack robustness in these settings. Robustness can be improved by using deep generative models or genetic algorithms to augment time series datasets, but these approaches lack interpretability and are computationally expensive. In this work, we develop an interpretable and simple framework for generating time series. Our method combines time-series decompositions with analytic functions, and is able to generate time series with characteristics matching both in- and out-of-distribution data. This approach allows users to generate new time series in an interpretable fashion, which can be used to augment the dataset and improve forecasting robustness. We demonstrate our framework through EXPRTS, a visual analytics tool designed for univariate time series forecasting models and datasets. Different visualizations of the data distribution, forecasting errors and single time series instances enable users to explore time series datasets, apply transformations, and evaluate forecasting model robustness across diverse scenarios. We show how our framework can generate meaningful OOD time series that improve model robustness, and we validate EXPRTS effectiveness and usability through three use-cases and a user study.

nan


Article 999

Title@2025-06-27 (5): A hierarchical Vovk-Azoury-Warmuth forecaster with discounting for online regression in RKHS

Title: A hierarchical Vovk-Azoury-Warmuth forecaster with discounting for online regression in RKHS Ein hierarchischer Vovk-Azoury-Warmuth-Prognoser mit Rabatt für Online-Regression in RKHS Vovk-Azoury-Warmuth等级预报器,对RKHS的在线回归进行折扣 2506.22631v1

Authors (1): Dmitry B. Rokhlin

We study the problem of online regression with the unconstrained quadratic loss against a time-varying sequence of functions from a Reproducing Kernel Hilbert Space (RKHS). Recently, Jacobsen and Cutkosky (2024) introduced a discounted Vovk-Azoury-Warmuth (DVAW) forecaster that achieves optimal dynamic regret in the finite-dimensional case. In this work, we lift their approach to the non-parametric domain by synthesizing the DVAW framework with a random feature approximation. We propose a fully adaptive, hierarchical algorithm, which we call H-VAW-D (Hierarchical Vovk-Azoury-Warmuth with Discounting), that learns both the discount factor and the number of random features. We prove that this algorithm, which has a per-iteration computational complexity of $O(T\ln T)$, achieves an expected dynamic regret of $O(T^{2/3}P_T^{1/3} + \sqrt{T}\ln T)$, where $P_T$ is the functional path length of a comparator sequence.

nan


Article 1000

Title@2025-06-27 (5): Hierarchical Modeling and Architecture Optimization: Review and Unified Framework

Title: Hierarchical Modeling and Architecture Optimization: Review and Unified Framework Hierarchische Modellierung und Architekturoptimierung: Überprüfung und Unified Framework 等级建模和架构优化:审查和统一框架 2506.22621v1

Authors (5): Paul Saves, Edward Hallé-Hannan, Jasper Bussemaker, Youssef Diouane, Nathalie Bartoli

Simulation-based problems involving mixed-variable inputs frequently feature domains that are hierarchical, conditional, heterogeneous, or tree-structured. These characteristics pose challenges for data representation, modeling, and optimization. This paper reviews extensive literature on these structured input spaces and proposes a unified framework that generalizes existing approaches. In this framework, input variables may be continuous, integer, or categorical. A variable is described as meta if its value governs the presence of other decreed variables, enabling the modeling of conditional and hierarchical structures. We further introduce the concept of partially-decreed variables, whose activation depends on contextual conditions. To capture these inter-variable hierarchical relationships, we introduce design space graphs, combining principles from feature modeling and graph theory. This allows the definition of general hierarchical domains suitable for describing complex system architectures. The framework supports the use of surrogate models over such domains and integrates hierarchical kernels and distances for efficient modeling and optimization. The proposed methods are implemented in the open-source Surrogate Modeling Toolbox (SMT 2.0), and their capabilities are demonstrated through applications in Bayesian optimization for complex system design, including a case study in green aircraft architecture.

nan


Article 1001

Title@2025-06-27 (5): Deep Hedging to Manage Tail Risk

Title: Deep Hedging to Manage Tail Risk Deep Hedging zur Verwaltung von Tail Risk 管理尾端风险 2506.22611v1

Authors (1): Yuming Ma

Extending Buehler et al.’s 2019 Deep Hedging paradigm, we innovatively employ deep neural networks to parameterize convex-risk minimization (CVaR/ES) for the portfolio tail-risk hedging problem. Through comprehensive numerical experiments on crisis-era bootstrap market simulators – customizable with transaction costs, risk budgets, liquidity constraints, and market impact – our end-to-end framework not only achieves significant one-day 99% CVaR reduction but also yields practical insights into friction-aware strategy adaptation, demonstrating robustness and operational viability in realistic markets.

nan


Article 1002

Title@2025-06-27 (5): Learning Individual Reproductive Behavior from Aggregate Fertility Rates via Neural Posterior Estimation

Title: Learning Individual Reproductive Behavior from Aggregate Fertility Rates via Neural Posterior Estimation Individuelles reproduktives Verhalten von Aggregat Fertilitätsraten über neurale hintere Schätzung lernen 学习个人生殖行为 学习个人生殖行为 2506.22607v1

Authors (4): Daniel Ciganda, Ignacio Campón, Iñaki Permanyer, Jakob H Macke

While age-specific fertility rates (ASFRs) provide the most extensive record of reproductive change, their aggregate nature masks the underlying behavioral mechanisms that ultimately drive fertility trends. To recover these mechanisms, we develop a likelihood-free Bayesian framework that couples an individual-level model of the reproductive process with Sequential Neural Posterior Estimation (SNPE). This allows us to infer eight behavioral and biological parameters from just two aggregate series: ASFRs and the age-profile of planned versus unplanned births. Applied to U.S. National Survey of Family Growth cohorts and to Demographic and Health Survey cohorts from Colombia, the Dominican Republic, and Peru, the method reproduces observed fertility schedules and, critically, predicts out-of-sample micro-level distributions of age at first sex, inter-birth intervals, and family-size ideals, none of which inform the estimation step. Because the fitted model yields complete synthetic life histories, it enables behaviorally explicit population forecasts and supports the construction of demographic digital twins.

nan


Article 1003

Title@2025-06-27 (5): A User-Centric, Privacy-Preserving, and Verifiable Ecosystem for Personal Data Management and Utilization

Title: A User-Centric, Privacy-Preserving, and Verifiable Ecosystem for Personal Data Management and Utilization Ein benutzerzentriertes, datenschutzschonendes und überprüfbares Ökosystem für das Management und die Nutzung personenbezogener Daten 用于个人数据管理和利用的用户中心、隐私保护和可核实生态系统 2506.22606v1

Authors (5): Osama Zafar, Mina Namazi, Yuqiao Xu, Youngjin Yoo, Erman Ayday

In the current paradigm of digital personalized services, the centralized management of personal data raises significant privacy concerns, security vulnerabilities, and diminished individual autonomy over sensitive information. Despite their efficiency, traditional centralized architectures frequently fail to satisfy rigorous privacy requirements and expose users to data breaches and unauthorized access risks. This pressing challenge calls for a fundamental paradigm shift in methodologies for collecting, storing, and utilizing personal data across diverse sectors, including education, healthcare, and finance. This paper introduces a novel decentralized, privacy-preserving architecture that handles heterogeneous personal information, ranging from educational credentials to health records and financial data. Unlike traditional models, our system grants users complete data ownership and control, allowing them to selectively share information without compromising privacy. The architecture’s foundation comprises advanced privacy-enhancing technologies, including secure enclaves and federated learning, enabling secure computation, verification, and data sharing. The system supports diverse functionalities, including local computation, model training, and privacy-preserving data sharing, while ensuring data credibility and robust user privacy.

nan


Article 1004

Title@2025-06-27 (5): Are Fast Methods Stable in Adversarially Robust Transfer Learning?

Title: Are Fast Methods Stable in Adversarially Robust Transfer Learning? Sind schnelle Methoden stabil im adversarially robust Transfer Learning? 快速方法是否稳定在逆向快速传输学习中? 2506.22602v1

Authors (2): Joshua C. Zhao, Saurabh Bagchi

Transfer learning is often used to decrease the computational cost of model training, as fine-tuning a model allows a downstream task to leverage the features learned from the pre-training dataset and quickly adapt them to a new task. This is particularly useful for achieving adversarial robustness, as adversarially training models from scratch is very computationally expensive. However, high robustness in transfer learning still requires adversarial training during the fine-tuning phase, which requires up to an order of magnitude more time than standard fine-tuning. In this work, we revisit the use of the fast gradient sign method (FGSM) in robust transfer learning to improve the computational cost of adversarial fine-tuning. We surprisingly find that FGSM is much more stable in adversarial fine-tuning than when training from scratch. In particular, FGSM fine-tuning does not suffer from any issues with catastrophic overfitting at standard perturbation budgets of $\varepsilon=4$ or $\varepsilon=8$. This stability is further enhanced with parameter-efficient fine-tuning methods, where FGSM remains stable even up to $\varepsilon=32$ for linear probing. We demonstrate how this stability translates into performance across multiple datasets. Compared to fine-tuning with the more commonly used method of projected gradient descent (PGD), on average, FGSM only loses 0.39% and 1.39% test robustness for $\varepsilon=4$ and $\varepsilon=8$ while using $4\times$ less training time. Surprisingly, FGSM may not only be a significantly more efficient alternative to PGD in adversarially robust transfer learning but also a well-performing one.

nan


Article 1005

Title@2025-06-27 (5): AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking in Large Language Models

Title: AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking in Large Language Models AdaReasoner: Adaptive Reasoning ermöglicht flexibleres Denken in großen Sprachmodellen AdaReasoner: 适应性合理性使大语言模式的思维更加灵活 2505.17312v3

Authors (7): Xiangqi Wang, Yue Huang, Yanbo Wang, Xiaonan Luo, Kehan Guo, Yujun Zhou, Xiangliang Zhang

LLMs often need effective configurations, like temperature and reasoning steps, to handle tasks requiring sophisticated reasoning and problem-solving, ranging from joke generation to mathematical reasoning. Existing prompting approaches usually adopt general-purpose, fixed configurations that work ‘well enough’ across tasks but seldom achieve task-specific optimality. To address this gap, we introduce AdaReasoner, an LLM-agnostic plugin designed for any LLM to automate adaptive reasoning configurations for tasks requiring different types of thinking. AdaReasoner is trained using a reinforcement learning (RL) framework, combining a factorized action space with a targeted exploration strategy, along with a pretrained reward model to optimize the policy model for reasoning configurations with only a few-shot guide. AdaReasoner is backed by theoretical guarantees and experiments of fast convergence and a sublinear policy gap. Across six different LLMs and a variety of reasoning tasks, it consistently outperforms standard baselines, preserves out-of-distribution robustness, and yield gains on knowledge-intensive tasks through tailored prompts.

nan


Article 1006

Title@2025-06-27 (5): What Makes the Preferred Thinking Direction for LLMs in Multiple-choice Questions?

Title: What Makes the Preferred Thinking Direction for LLMs in Multiple-choice Questions? Was macht die bevorzugte Denkrichtung für LLMs in Multiple-Choice-Fragen? ” 多种选择问题 “ 中LLMs的首选思维方向是什么? 2502.18435v3

Authors (8): Yizhe Zhang, Richard Bai, Zijin Gu, Ruixiang Zhang, Jiatao Gu, Emmanuel Abbe, Samy Bengio, Navdeep Jaitly

Language models usually use left-to-right (L2R) autoregressive factorization. However, L2R factorization may not always be the best inductive bias. Therefore, we investigate whether alternative factorizations of the text distribution could be beneficial in some tasks. We investigate right-to-left (R2L) training as a compelling alternative, focusing on multiple-choice questions (MCQs) as a test bed for knowledge extraction and reasoning. Through extensive experiments across various model sizes (2B-8B parameters) and training datasets, we find that R2L models can significantly outperform L2R models on several MCQ benchmarks, including logical reasoning, commonsense understanding, and truthfulness assessment tasks. Our analysis reveals that this performance difference may be fundamentally linked to multiple factors including calibration, computability, and directional conditional entropy. We analyze the impact of these factors through controlled simulation studies using arithmetic tasks, where the impacting factors can be better disentangled. Our work demonstrates that exploring alternative factorizations of the text distribution can lead to improvements in LLM capabilities and provides theoretical insights into optimal factorization towards approximating human language distribution, and when each reasoning order might be more advantageous. Our code and checkpoints are released at https://github.com/apple/ml-reversal-blessing.

nan


Article 1007

Title: The Hidden Link Between RLHF and Contrastive Learning Die versteckte Verbindung zwischen RLHF und Kontrastivem Lernen RLHF 和抵触学习之间的隐藏链接 2506.22578v1

Authors (6): Xufei Lv, Haoyuan Sun, Xuefeng Bai, Min Zhang, Houde Liu, Kehai Chen

Alignment of large language models (LLMs) with human values has recently garnered significant attention, with prominent examples including the canonical yet costly Reinforcement Learning from Human Feedback (RLHF) and the simple Direct Preference Optimization (DPO). In this work, we demonstrate that both RLHF and DPO can be interpreted from the perspective of mutual information (MI) maximization, uncovering a profound connection to contrastive learning. Within this framework, both RLHF and DPO can be viewed as methods that perform contrastive learning based on the positive and negative samples derived from the base model, leveraging the Donsker-Varadhan (DV) lower bound on MI (equivalently, the MINE estimator). This paradigm further explains why RLHF may not intrinsically incentivize reasoning capacities in LLMs beyond what is already present in the base model. Building on this perspective, we replace the DV/MINE bound with the Jensen-Shannon MI estimator and propose Mutual Information Optimization (MIO). Comprehensive theoretical analysis and extensive empirical evaluations demonstrate that MIO mitigates the late-stage decline in chosen-likelihood observed in DPO, achieving competitive or superior performance across various challenging reasoning and mathematical benchmarks. We will release the model and code upon acceptance.

nan


Article 1008

Title@2025-06-27 (5): Enhancing Diffusion Posterior Sampling for Inverse Problems by Integrating Crafted Measurements

Title: Enhancing Diffusion Posterior Sampling for Inverse Problems by Integrating Crafted Measurements Verstärkte Diffusions-Parfümung für inverse Probleme durch Integration von handwerklichen Messungen 通过综合人工计量方法,加强对反向问题的传播池别样样 2411.09850v2

Authors (7): Shijie Zhou, Huaisheng Zhu, Rohan Sharma, Jiayi Chen, Ruiyi Zhang, Kaiyi Ji, Changyou Chen

Diffusion models have emerged as a powerful foundation model for visual generations. With an appropriate sampling process, it can effectively serve as a generative prior for solving general inverse problems. Current posterior sampling-based methods take the measurement (i.e., degraded image sample) into the posterior sampling to infer the distribution of the target data (i.e., clean image sample). However, in this manner, we show that high-frequency information can be prematurely introduced during the early stages, which could induce larger posterior estimate errors during restoration sampling. To address this observation, we first reveal that forming the log-posterior gradient with the noisy measurement ( i.e., noisy measurement from a diffusion forward process) instead of the clean one can benefit the early posterior sampling. Consequently, we propose a novel diffusion posterior sampling method DPS-CM, which incorporates a Crafted Measurement (i.e., noisy measurement crafted by a reverse denoising process, rather than constructed from the diffusion forward process) to form the posterior estimate. This integration aims to mitigate the misalignment with the diffusion prior caused by cumulative posterior estimate errors. Experimental results demonstrate that our approach significantly improves the overall capacity to solve general and noisy inverse problems, such as Gaussian deblurring, super-resolution, inpainting, nonlinear deblurring, and tasks with Poisson noise, relative to existing approaches. Code is available at: https://github.com/sjz5202/DPS-CM.

nan


Article 1009

Title@2025-06-27 (5): Dealing with the Evil Twins: Improving Random Augmentation by Addressing Catastrophic Forgetting of Diverse Augmentations

Title: Dealing with the Evil Twins: Improving Random Augmentation by Addressing Catastrophic Forgetting of Diverse Augmentations Umgang mit den bösen Zwillingen: Zufällige Augmentation verbessern, indem Katastrophenvergessen von verschiedenen Augmentationen adressiert wird 对付邪恶双胞胎:通过解决灾难性地忘记多种增加现象,改善随机增加现象 2506.08240v2

Authors (2): Dongkyu Cho, Rumi Chunara

Data augmentation is a promising tool for enhancing out-of-distribution generalization, where the key is to produce diverse, challenging variations of the source domain via costly targeted augmentations that maximize its generalization effect. Conversely, random augmentation is inexpensive but is deemed suboptimal due to its limited effect. In this paper, we revisit random augmentation and explore methods to address its shortcomings. We show that the stochastic nature of random augmentation can produce a set of colliding augmentations that distorts the learned features, similar to catastrophic forgetting. We propose a simple solution that improves the generalization effect of random augmentation by addressing forgetting, which displays strong generalization performance across various single source domain generalization (sDG) benchmarks.

nan


Article 1010

Title@2025-06-27 (5): Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction

Title: Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction Generalisierte Venn- und Venn-Abers-Kalibrierung mit Anwendungen in konformer Vorhersage 通用文文和文安-用非正式预测对应用进行校准 2502.05676v2

Authors (2): Lars van der Laan, Ahmed Alaa

Ensuring model calibration is critical for reliable prediction, yet popular distribution-free methods such as histogram binning and isotonic regression offer only asymptotic guarantees. We introduce a unified framework for Venn and Venn-Abers calibration that extends Vovk’s approach beyond binary classification to a broad class of prediction problems defined by generic loss functions. Our method transforms any perfectly in-sample calibrated predictor into a set-valued predictor that, in finite samples, outputs at least one marginally calibrated point prediction. These set predictions shrink asymptotically and converge to a conditionally calibrated prediction, capturing epistemic uncertainty. We further propose Venn multicalibration, a new approach for achieving finite-sample calibration across subpopulations. For quantile loss, our framework recovers group-conditional and multicalibrated conformal prediction as special cases and yields novel prediction intervals with quantile-conditional coverage.

nan


Article 1011

Title@2025-06-27 (5): Exploration Behavior of Untrained Policies

Title: Exploration Behavior of Untrained Policies Explorationsverhalten ungeübter Politiken 未经过培训的政策的探索行为 2506.22566v1

Authors (1): Jacob Adamczyk

Exploration remains a fundamental challenge in reinforcement learning (RL), particularly in environments with sparse or adversarial reward structures. In this work, we study how the architecture of deep neural policies implicitly shapes exploration before training. We theoretically and empirically demonstrate strategies for generating ballistic or diffusive trajectories from untrained policies in a toy model. Using the theory of infinite-width networks and a continuous-time limit, we show that untrained policies return correlated actions and result in non-trivial state-visitation distributions. We discuss the distributions of the corresponding trajectories for a standard architecture, revealing insights into inductive biases for tackling exploration. Our results establish a theoretical and experimental framework for using policy initialization as a design tool to understand exploration behavior in early training.

nan


Article 1012

Title@2025-06-27 (5): Adjoint Schrödinger Bridge Sampler

Title: Adjoint Schrödinger Bridge Sampler Adjoint Schrödinger Bridge Sampler Adjoint Schrödinger桥采样器 2506.22565v1

Authors (5): Guan-Horng Liu, Jaemoo Choi, Yongxin Chen, Benjamin Kurt Miller, Ricky T. Q. Chen

Computational methods for learning to sample from the Boltzmann distribution – where the target distribution is known only up to an unnormalized energy function – have advanced significantly recently. Due to the lack of explicit target samples, however, prior diffusion-based methods, known as diffusion samplers, often require importance-weighted estimation or complicated learning processes. Both trade off scalability with extensive evaluations of the energy and model, thereby limiting their practical usage. In this work, we propose Adjoint Schr"odinger Bridge Sampler (ASBS), a new diffusion sampler that employs simple and scalable matching-based objectives yet without the need to estimate target samples during training. ASBS is grounded on a mathematical model – the Schr"odinger Bridge – which enhances sampling efficiency via kinetic-optimal transportation. Through a new lens of stochastic optimal control theory, we demonstrate how SB-based diffusion samplers can be learned at scale via Adjoint Matching and prove convergence to the global solution. Notably, ASBS generalizes the recent Adjoint Sampling (Havens et al., 2025) to arbitrary source distributions by relaxing the so-called memoryless condition that largely restricts the design space. Through extensive experiments, we demonstrate the effectiveness of ASBS on sampling from classical energy functions, amortized conformer generation, and molecular Boltzmann distributions.

nan


Article 1013

Title@2025-06-27 (5): Faster Convergence of Riemannian Stochastic Gradient Descent with Increasing Batch Size

Title: Faster Convergence of Riemannian Stochastic Gradient Descent with Increasing Batch Size Schnellere Konvergenz der Riemannschen Stochastischen Gradienten mit zunehmender Batchgröße Riemannian Stottachatic 渐变源层与增加批量大小的更快趋同 2501.18164v2

Authors (2): Kanata Oowada, Hideaki Iiduka

We have theoretically analyzed the use of Riemannian stochastic gradient descent (RSGD) and found that using an increasing batch size leads to faster RSGD convergence rate than using a constant batch size not only with a constant learning rate but also with a decaying learning rate, such as cosine annealing decay and polynomial decay. The convergence rate of RSGD improves from $O(\sqrt{T^{-1}+\text{const.}})$ with a constant batch size to $O(T^{-\frac{1}{2}})$ with an increasing batch size, where $T$ denotes the number of iterations. Using principal component analysis and low-rank matrix completion tasks, we investigated, both theoretically and numerically, how increasing batch size affects computational time as measured by stochastic first-order oracle (SFO) complexity. Increasing batch size reduces the SFO complexity of RSGD. Furthermore, our numerical results demonstrated that increasing batch size offers the advantages of both small and large constant batch sizes.

nan


Article 1014

Title@2025-06-27 (5): MetaCipher: A General and Extensible Reinforcement Learning Framework for Obfuscation-Based Jailbreak Attacks on Black-Box LLMs

Title: MetaCipher: A General and Extensible Reinforcement Learning Framework for Obfuscation-Based Jailbreak Attacks on Black-Box LLMs MetaCipher: Ein allgemeines und erweitertes Lernrahmenwerk für Verwüstungsbasierte Jailbreak-Angriffe auf Black-Box-LLMs MetaCipher:关于对黑毒LMS进行以腐烂为基础的破碎监狱袭击的一般和可扩展强化学习框架 2506.22557v1

Authors (5): Boyuan Chen, Minghao Shao, Abdul Basit, Siddharth Garg, Muhammad Shafique

The growing capabilities of large language models (LLMs) have exposed them to increasingly sophisticated jailbreak attacks. Among these, obfuscation-based attacks – which encrypt malicious content to evade detection – remain highly effective. By leveraging the reasoning ability of advanced LLMs to interpret encrypted prompts, such attacks circumvent conventional defenses that rely on keyword detection or context filtering. These methods are very difficult to defend against, as existing safety mechanisms are not designed to interpret or decode ciphered content. In this work, we propose \textbf{MetaCipher}, a novel obfuscation-based jailbreak framework, along with a reinforcement learning-based dynamic cipher selection mechanism that adaptively chooses optimal encryption strategies from a cipher pool. This approach enhances jailbreak effectiveness and generalizability across diverse task types, victim LLMs, and safety guardrails. Our framework is modular and extensible by design, supporting arbitrary cipher families and accommodating evolving adversarial strategies. We complement our method with a large-scale empirical analysis of cipher performance across multiple victim LLMs. Within as few as 10 queries, MetaCipher achieves over 92\% attack success rate (ASR) on most recent standard malicious prompt benchmarks against state-of-the-art non-reasoning LLMs, and over 74\% ASR against reasoning-capable LLMs, outperforming all existing obfuscation-based jailbreak methods. These results highlight the long-term robustness and adaptability of our approach, making it more resilient than prior methods in the face of advancing safety measures.

nan


Article 1015

Title@2025-06-27 (5): Spectral Bias in Variational Quantum Machine Learning

Title: Spectral Bias in Variational Quantum Machine Learning Spektrale Bias im variantenreichen Quantum Maschinelles Lernen 变化量量机器学习中的光谱比值 2506.22555v1

Authors (2): Callum Duffy, Marcin Jastrzebski

In this work, we investigate the phenomenon of spectral bias in quantum machine learning, where, in classical settings, models tend to fit low-frequency components of a target function earlier during training than high-frequency ones, demonstrating a frequency-dependent rate of convergence. We study this effect specifically in parameterised quantum circuits (PQCs). Leveraging the established formulation of PQCs as Fourier series, we prove that spectral bias in this setting arises from the ``redundancy’’ of the Fourier coefficients, which denotes the number of terms in the analytical form of the model contributing to the same frequency component. The choice of data encoding scheme dictates the degree of redundancy for a Fourier coefficient. We find that the magnitude of the Fourier coefficients’ gradients during training strongly correlates with the coefficients’ redundancy. We then further demonstrate this empirically with three different encoding schemes. Additionally, we demonstrate that PQCs with greater redundancy exhibit increased robustness to random perturbations in their parameters at the corresponding frequencies. We investigate how design choices affect the ability of PQCs to learn Fourier sums, focusing on parameter initialization scale and entanglement structure, finding large initializations and low-entanglement schemes tend to slow convergence.

nan


Article 1016

Title@2025-06-27 (5): Neural models of multiscale systems: conceptual limitations, stochastic parametrizations, and a climate application

Title: Neural models of multiscale systems: conceptual limitations, stochastic parametrizations, and a climate application Neurale Modelle von multiskaligen Systemen: konzeptionelle Einschränkungen, stochastische Parametrisierungen und eine Klimaanwendung 多尺度系统的神经模型:概念局限性、随机超称和气候应用 2506.22552v1

Authors (1): Fabrizio Falasca

This work explores key conceptual limitations in data-driven modeling of multiscale dynamical systems, focusing on neural emulators and stochastic climate modeling. A skillful climate model should capture both stationary statistics and responses to external perturbations. While current autoregressive neural models often reproduce the former, they typically struggle with the latter. We begin by analyzing a low-dimensional dynamical system to expose, by analogy, fundamental limitations that persist in high-dimensional settings. Specifically, we construct neural stochastic models under two scenarios: one where the full state vector is observed, and another with only partial observations (i.e. a subset of variables). In the first case, the models accurately capture both equilibrium statistics and forced responses in ensemble mean and variance. In the more realistic case of partial observations, two key challenges emerge: (i) identifying the \textit{proper} variables to model, and (ii) parameterizing the influence of unobserved degrees of freedom. These issues are not specific to neural networks but reflect fundamental limitations of data-driven modeling and the need to target the slow dynamics of the system. We argue that physically grounded strategies – such as coarse-graining and stochastic parameterizations – are critical, both conceptually and practically, for the skillful emulation of complex systems like the coupled climate system. Building on these insights, we turn to a more realistic application: a stochastic reduced neural model of the sea surface temperature field and the net radiative flux at the top of the atmosphere, assessing its stationary statistics, response to temperature forcing, and interpretability.

nan


Article 1017

Title@2025-06-27 (5): Beyond ReLU: How Activations Affect Neural Kernels and Random Wide Networks

Title: Beyond ReLU: How Activations Affect Neural Kernels and Random Wide Networks Beyond ReLU: Wie Aktivierungen Neuronale Kernel und Random Wide Networks beeinflussen ReLU之后:激活如何影响神经内核和随机广域网 2506.22429v1

Authors (2): David Holzmüller, Max Schölpple

While the theory of deep learning has made some progress in recent years, much of it is limited to the ReLU activation function. In particular, while the neural tangent kernel (NTK) and neural network Gaussian process kernel (NNGP) have given theoreticians tractable limiting cases of fully connected neural networks, their properties for most activation functions except for powers of the ReLU function are poorly understood. Our main contribution is to provide a more general characterization of the RKHS of these kernels for typical activation functions whose only non-smoothness is at zero, such as SELU, ELU, or LeakyReLU. Our analysis also covers a broad set of special cases such as missing biases, two-layer networks, or polynomial activations. Our results show that a broad class of not infinitely smooth activations generate equivalent RKHSs at different network depths, while polynomial activations generate non-equivalent RKHSs. Finally, we derive results for the smoothness of NNGP sample paths, characterizing the smoothness of infinitely wide neural networks at initialization.

nan


Article 1018

Title@2025-06-27 (5): CLoVE: Personalized Federated Learning through Clustering of Loss Vector Embeddings

Title: CLoVE: Personalized Federated Learning through Clustering of Loss Vector Embeddings CLoVE: Personalisiertes Federated Learning durch Clustering von Verlust-Vektor-Embeddings CLOVE:通过将损失矢量嵌入体集中起来进行个性化联邦学习 2506.22427v1

Authors (5): Randeep Bhatia, Nikos Papadis, Murali Kodialam, TV Lakshman, Sayak Chakrabarty

We propose CLoVE (Clustering of Loss Vector Embeddings), a novel algorithm for Clustered Federated Learning (CFL). In CFL, clients are naturally grouped into clusters based on their data distribution. However, identifying these clusters is challenging, as client assignments are unknown. CLoVE utilizes client embeddings derived from model losses on client data, and leverages the insight that clients in the same cluster share similar loss values, while those in different clusters exhibit distinct loss patterns. Based on these embeddings, CLoVE is able to iteratively identify and separate clients from different clusters and optimize cluster-specific models through federated aggregation. Key advantages of CLoVE over existing CFL algorithms are (1) its simplicity, (2) its applicability to both supervised and unsupervised settings, and (3) the fact that it eliminates the need for near-optimal model initialization, which makes it more robust and better suited for real-world applications. We establish theoretical convergence bounds, showing that CLoVE can recover clusters accurately with high probability in a single round and converges exponentially fast to optimal models in a linear setting. Our comprehensive experiments comparing with a variety of both CFL and generic Personalized Federated Learning (PFL) algorithms on different types of datasets and an extensive array of non-IID settings demonstrate that CLoVE achieves highly accurate cluster recovery in just a few rounds of training, along with state-of-the-art model accuracy, across a variety of both supervised and unsupervised PFL tasks.

nan


Article 1019

Title@2025-06-27 (5): ARMOR: Robust Reinforcement Learning-based Control for UAVs under Physical Attacks

Title: ARMOR: Robust Reinforcement Learning-based Control for UAVs under Physical Attacks ARMOR: Robuste lernbasierte Steuerung zur Verstärkung von UAVs unter physischen Angriffen 对遭受人身攻击的无人驾驶航空器进行强有力的加强学习控制 2506.22423v1

Authors (4): Pritam Dash, Ethan Chan, Nathan P. Lawrence, Karthik Pattabiraman

Unmanned Aerial Vehicles (UAVs) depend on onboard sensors for perception, navigation, and control. However, these sensors are susceptible to physical attacks, such as GPS spoofing, that can corrupt state estimates and lead to unsafe behavior. While reinforcement learning (RL) offers adaptive control capabilities, existing safe RL methods are ineffective against such attacks. We present ARMOR (Adaptive Robust Manipulation-Optimized State Representations), an attack-resilient, model-free RL controller that enables robust UAV operation under adversarial sensor manipulation. Instead of relying on raw sensor observations, ARMOR learns a robust latent representation of the UAV’s physical state via a two-stage training framework. In the first stage, a teacher encoder, trained with privileged attack information, generates attack-aware latent states for RL policy training. In the second stage, a student encoder is trained via supervised learning to approximate the teacher’s latent states using only historical sensor data, enabling real-world deployment without privileged information. Our experiments show that ARMOR outperforms conventional methods, ensuring UAV safety. Additionally, ARMOR improves generalization to unseen attacks and reduces training cost by eliminating the need for iterative adversarial training.

nan


Article 1020

Title@2025-06-27 (5): L2MAC: Large Language Model Automatic Computer for Extensive Code Generation

Title: L2MAC: Large Language Model Automatic Computer for Extensive Code Generation L2MAC: Automatischer Computer mit großem Sprachmodell für umfangreiche Code-Generierung L2MAC:用于广泛代码生成的大型语言模拟自动计算机 2310.02003v6

Authors (3): Samuel Holt, Max Ruiz Luyten, Mihaela van der Schaar

Transformer-based large language models (LLMs) are constrained by the fixed context window of the underlying transformer architecture, hindering their ability to produce long and coherent outputs. Memory-augmented LLMs are a promising solution, but current approaches cannot handle long output generation tasks since they (1) only focus on reading memory and reduce its evolution to the concatenation of new memories or (2) use very specialized memories that cannot adapt to other domains. This paper presents L2MAC, the first practical LLM-based general-purpose stored-program automatic computer (von Neumann architecture) framework, an LLM-based multi-agent system, for long and consistent output generation. Its memory has two components: the instruction registry, which is populated with a prompt program to solve the user-given task, and a file store, which will contain the final and intermediate outputs. Each instruction in turn is executed by a separate LLM agent, whose context is managed by a control unit capable of precise memory reading and writing to ensure effective interaction with the file store. These components enable L2MAC to generate extensive outputs, bypassing the constraints of the finite context window while producing outputs that fulfill a complex user-specified task. We empirically demonstrate that L2MAC achieves state-of-the-art performance in generating large codebases for system design tasks, significantly outperforming other coding methods in implementing the detailed user-specified task; we show that L2MAC works for general-purpose extensive text-based tasks, such as writing an entire book; and we provide valuable insights into L2MAC’s performance improvement over existing methods.

nan


Article 1021

Title@2025-06-27 (5): Maximizing Confidence Alone Improves Reasoning

Title: Maximizing Confidence Alone Improves Reasoning Maximierung des Vertrauens allein verbessert die Vernunft 使信心最大化单独提高合理性 2505.22660v4

Authors (6): Mihir Prabhudesai, Lili Chen, Alex Ippoliti, Katerina Fragkiadaki, Hao Liu, Deepak Pathak

Reinforcement learning (RL) has enabled machine learning models to achieve significant advances in many fields. Most recently, RL has empowered frontier language models to solve challenging math, science, and coding problems. However, central to any RL algorithm is the reward function, and reward engineering is a notoriously difficult problem in any domain. In this paper, we propose RENT: Reinforcement Learning via Entropy Minimization – a fully unsupervised RL method that requires no external reward or ground-truth answers, and instead uses the model’s entropy of its underlying distribution as an intrinsic reward. We find that by reinforcing the chains of thought that yield high model confidence on its generated answers, the model improves its reasoning ability. In our experiments, we showcase these improvements on an extensive suite of commonly-used reasoning benchmarks, including GSM8K, MATH500, AMC, AIME, and GPQA, and models of varying sizes from the Qwen, Mistral, and Llama families. The generality of our unsupervised learning method lends itself to applicability in a wide range of domains where external supervision is unavailable.

nan


Article 1022

Title@2025-06-27 (5): Decoupled SGDA for Games with Intermittent Strategy Communication

Title: Decoupled SGDA for Games with Intermittent Strategy Communication Entkoppelte SGDA für Spiele mit intermittierender Strategiekommunikation SGDA与间歇性战略交流 2501.14652v2

Authors (5): Ali Zindari, Parham Yazdkhasti, Anton Rodomanov, Tatjana Chavdarova, Sebastian U. Stich

We focus on reducing communication overhead in multiplayer games, where frequently exchanging strategies between players is not feasible and players have noisy or outdated strategies of the other players. We introduce Decoupled SGDA, a novel adaptation of Stochastic Gradient Descent Ascent (SGDA). In this approach, players independently update their strategies based on outdated opponent strategies, with periodic synchronization to align strategies. For Strongly-Convex-Strongly-Concave (SCSC) games, we demonstrate that Decoupled SGDA achieves near-optimal communication complexity comparable to the best-known GDA rates. For weakly coupled games where the interaction between players is lower relative to the non-interactive part of the game, Decoupled SGDA significantly reduces communication costs compared to standard SGDA. Our findings extend to multi-player games. To provide insights into the effect of communication frequency and convergence, we extensively study the convergence of Decoupled SGDA for quadratic minimax problems. Lastly, in settings where the noise over the players is imbalanced, Decoupled SGDA significantly outperforms federated minimax methods.

nan


Article 1023

Title@2025-06-27 (5): Exploration from a Primal-Dual Lens: Value-Incentivized Actor-Critic Methods for Sample-Efficient Online RL

Title: Exploration from a Primal-Dual Lens: Value-Incentivized Actor-Critic Methods for Sample-Efficient Online RL Exploration aus einer primär-Dual-Lense: wertanreizte aktor-kritische Methoden für probeneffiziente Online-RL 探索原始-二极镜头:用于抽样有效在线转线的有价值激励的活性-Crict 方法 2506.22401v1

Authors (4): Tong Yang, Bo Dai, Lin Xiao, Yuejie Chi

Online reinforcement learning (RL) with complex function approximations such as transformers and deep neural networks plays a significant role in the modern practice of artificial intelligence. Despite its popularity and importance, balancing the fundamental trade-off between exploration and exploitation remains a long-standing challenge; in particular, we are still in lack of efficient and practical schemes that are backed by theoretical performance guarantees. Motivated by recent developments in exploration via optimistic regularization, this paper provides an interpretation of the principle of optimism through the lens of primal-dual optimization. From this fresh perspective, we set forth a new value-incentivized actor-critic (VAC) method, which optimizes a single easy-to-optimize objective integrating exploration and exploitation – it promotes state-action and policy estimates that are both consistent with collected data transitions and result in higher value functions. Theoretically, the proposed VAC method has near-optimal regret guarantees under linear Markov decision processes (MDPs) in both finite-horizon and infinite-horizon settings, which can be extended to the general function approximation setting under appropriate assumptions.

nan


Article 1024

Title@2025-06-27 (5): Strategic A/B testing via Maximum Probability-driven Two-armed Bandit

Title: Strategic A/B testing via Maximum Probability-driven Two-armed Bandit Strategische A/B-Prüfung über maximale Wahrscheinlichkeit zweiarmiger Bandit 战略A/B测试,通过最大概率驱动的双武装两武装土匪进行战略A/B测试 2506.22536v1

Authors (5): Yu Zhang, Shanshan Zhao, Bokui Wan, Jinjuan Wang, Xiaodong Yan

Detecting a minor average treatment effect is a major challenge in large-scale applications, where even minimal improvements can have a significant economic impact. Traditional methods, reliant on normal distribution-based or expanded statistics, often fail to identify such minor effects because of their inability to handle small discrepancies with sufficient sensitivity. This work leverages a counterfactual outcome framework and proposes a maximum probability-driven two-armed bandit (TAB) process by weighting the mean volatility statistic, which controls Type I error. The implementation of permutation methods further enhances the robustness and efficacy. The established strategic central limit theorem (SCLT) demonstrates that our approach yields a more concentrated distribution under the null hypothesis and a less concentrated one under the alternative hypothesis, greatly improving statistical power. The experimental results indicate a significant improvement in the A/B testing, highlighting the potential to reduce experimental costs while maintaining high statistical power.

nan


Article 1025

Title@2025-06-27 (5): Multi-View Contrastive Learning for Robust Domain Adaptation in Medical Time Series Analysis

Title: Multi-View Contrastive Learning for Robust Domain Adaptation in Medical Time Series Analysis Multi-View Kontrastives Lernen für robuste Domain-Anpassung in der medizinischen Zeitreihenanalyse 医学时间序列分析中强力域适应多查看差异性学习 2506.22393v1

Authors (2): YongKyung Oh, Alex Bui

Adapting machine learning models to medical time series across different domains remains a challenge due to complex temporal dependencies and dynamic distribution shifts. Current approaches often focus on isolated feature representations, limiting their ability to fully capture the intricate temporal dynamics necessary for robust domain adaptation. In this work, we propose a novel framework leveraging multi-view contrastive learning to integrate temporal patterns, derivative-based dynamics, and frequency-domain features. Our method employs independent encoders and a hierarchical fusion mechanism to learn feature-invariant representations that are transferable across domains while preserving temporal coherence. Extensive experiments on diverse medical datasets, including electroencephalogram (EEG), electrocardiogram (ECG), and electromyography (EMG) demonstrate that our approach significantly outperforms state-of-the-art methods in transfer learning tasks. By advancing the robustness and generalizability of machine learning models, our framework offers a practical pathway for deploying reliable AI systems in diverse healthcare settings.

nan


Article 1026

Title@2025-06-27 (5): How to Train Long-Context Language Models (Effectively)

Title: How to Train Long-Context Language Models (Effectively) Wie man Langkontext-Sprachenmodelle ausbildet (effektiv) 如何培训长文本语言模型(有效) 2410.02660v3

Authors (4): Tianyu Gao, Alexander Wettig, Howard Yen, Danqi Chen

We study continued training and supervised fine-tuning (SFT) of a language model (LM) to make effective use of long-context information. We first establish a reliable evaluation protocol to guide model development – instead of perplexity or simple needle-in-a-haystack (NIAH) tests, we use a broad set of long-context downstream tasks, and we evaluate models after SFT as this better reveals long-context abilities. Supported by our robust evaluations, we run thorough experiments to decide the data mix for continued pre-training, the instruction tuning dataset, and many other design choices such as position extrapolation. We find that (1) code repositories and books are excellent sources of long data, but it is crucial to combine them with high-quality short-context data; (2) training with a sequence length beyond the evaluation length boosts long-context performance; (3) for SFT, using only short instruction datasets yields strong performance on long-context tasks. Our final model, ProLong-8B, which is initialized from Llama-3 and trained on 40B tokens, demonstrates state-of-the-art long-context performance among similarly sized models at a length of 128K. ProLong outperforms Llama-3.1-8B-Instruct on the majority of long-context tasks despite using only 5% as many tokens during long-context training. Additionally, ProLong can effectively process up to 512K tokens, one of the longest context windows of publicly available LMs.

nan


Article 1027

Title@2025-06-27 (5): Towards Distributed Neural Architectures

Title: Towards Distributed Neural Architectures Auf dem Weg zu verteilten Neuralarchitekturen 走向分布式神经结构 2506.22389v1

Authors (3): Aditya Cowsik, Tianyu He, Andrey Gromov

We introduce and train distributed neural architectures (DNA) in vision and language domains. DNAs are initialized with a proto-architecture that consists of (transformer, MLP, attention, etc.) modules and routers. Any token (or patch) can traverse any series of modules in any order. DNAs are a natural generalization of the sparse methods such as Mixture-of-Experts, Mixture-of-Depths, parameter sharing, etc. Computation and communication patterns of DNA modules are learnt end-to-end during training and depend on the content and context of each token (or patch). These patterns can be shaped by further requirements added to the optimization objective such as compute/memory efficiency or load balancing. We empirically show that (i) trained DNAs are competitive with the dense baselines in both domains and (ii) compute efficiency/parameter sharing can be learnt from data. Next, we analyze the emergent connectivity and computation patterns in the trained DNAs. We find that the paths that tokens take through the models are themselves distributed according to a power-law. We show that some paths (or, equivalently, groups of modules) show emergent specialization. Finally, we demonstrate that models learn to allocate compute and active parameters in an interpretable way.

nan


Article 1028

Title@2025-06-27 (5): Probabilistic Optimality for Inference-time Scaling

Title: Probabilistic Optimality for Inference-time Scaling Probabilistische Optimalität für Inferenz-Zeitskalierung 推推时间缩放的概率概率优化度 2506.22376v1

Authors (5): Youkang Wang, Jian Wang, Rubing Chen, Xiao-Yong Wei, Qing Li

Inference-time scaling has emerged as a powerful technique for enhancing the reasoning performance of Large Language Models (LLMs). However, existing approaches often rely on heuristic strategies for parallel sampling, lacking a principled foundation. To address this gap, we propose a probabilistic framework that formalizes the optimality of inference-time scaling under the assumption that parallel samples are independently and identically distributed (i.i.d.), and where the Best-of-N selection strategy follows a probability distribution that can be estimated. Within this framework, we derive a theoretical lower bound on the required number of samples to achieve a target performance level, providing the first principled guidance for compute-efficient scaling. Leveraging this insight, we develop \textsc{OptScale}, a practical algorithm that dynamically determines the optimal number of sampled responses. \textsc{OptScale} employs a language model-based predictor to estimate probabilistic prior parameters, enabling the decision of the minimal number of samples needed that satisfy predefined performance thresholds and confidence levels. Extensive experiments on mathematical reasoning benchmarks (including MATH-500, GSM8K, AIME, and AMC) demonstrate that \textsc{OptScale} significantly reduces sampling overhead while remaining better or on par with state-of-the-art reasoning performance. Our work offers both a theoretical foundation and a practical solution for principled inference-time scaling, addressing a critical gap in the efficient deployment of LLMs for complex reasoning.

nan


Article 1029

Title@2025-06-27 (5): Sheaf-Based Decentralized Multimodal Learning for Next-Generation Wireless Communication Systems

Title: Sheaf-Based Decentralized Multimodal Learning for Next-Generation Wireless Communication Systems Schafsbasiertes dezentralisiertes multimodales Lernen für drahtlose Kommunikationssysteme der nächsten Generation 用于下一代无线通信系统的基于 Sheaf的分散式多模式学习 2506.22374v1

Authors (4): Abdulmomen Ghalkha, Zhuojun Tian, Chaouki Ben Issaid, Mehdi Bennis

In large-scale communication systems, increasingly complex scenarios require more intelligent collaboration among edge devices collecting various multimodal sensory data to achieve a more comprehensive understanding of the environment and improve decision-making accuracy. However, conventional federated learning (FL) algorithms typically consider unimodal datasets, require identical model architectures, and fail to leverage the rich information embedded in multimodal data, limiting their applicability to real-world scenarios with diverse modalities and varying client capabilities. To address this issue, we propose Sheaf-DMFL, a novel decentralized multimodal learning framework leveraging sheaf theory to enhance collaboration among devices with diverse modalities. Specifically, each client has a set of local feature encoders for its different modalities, whose outputs are concatenated before passing through a task-specific layer. While encoders for the same modality are trained collaboratively across clients, we capture the intrinsic correlations among clients’ task-specific layers using a sheaf-based structure. To further enhance learning capability, we propose an enhanced algorithm named Sheaf-DMFL-Att, which tailors the attention mechanism within each client to capture correlations among different modalities. A rigorous convergence analysis of Sheaf-DMFL-Att is provided, establishing its theoretical guarantees. Extensive simulations are conducted on real-world link blockage prediction and mmWave beamforming scenarios, demonstrate the superiority of the proposed algorithms in such heterogeneous wireless communication systems.

nan


Article 1030

Title@2025-06-27 (5): Robust Detection of Watermarks for Large Language Models Under Human Edits

Title: Robust Detection of Watermarks for Large Language Models Under Human Edits Robuste Erkennung von Wasserzeichen für große Sprachmodelle unter menschlichen Bearbeitungen 人类版下大型语言模型水印的强力探测 2411.13868v2

Authors (5): Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, Weijie J. Su

Watermarking has offered an effective approach to distinguishing text generated by large language models (LLMs) from human-written text. However, the pervasive presence of human edits on LLM-generated text dilutes watermark signals, thereby significantly degrading detection performance of existing methods. In this paper, by modeling human edits through mixture model detection, we introduce a new method in the form of a truncated goodness-of-fit test for detecting watermarked text under human edits, which we refer to as Tr-GoF. We prove that the Tr-GoF test achieves optimality in robust detection of the Gumbel-max watermark in a certain asymptotic regime of substantial text modifications and vanishing watermark signals. Importantly, Tr-GoF achieves this optimality \textit{adaptively} as it does not require precise knowledge of human edit levels or probabilistic specifications of the LLMs, in contrast to the optimal but impractical (Neyman–Pearson) likelihood ratio test. Moreover, we establish that the Tr-GoF test attains the highest detection efficiency rate in a certain regime of moderate text modifications. In stark contrast, we show that sum-based detection rules, as employed by existing methods, fail to achieve optimal robustness in both regimes because the additive nature of their statistics is less resilient to edit-induced noise. Finally, we demonstrate the competitive and sometimes superior empirical performance of the Tr-GoF test on both synthetic data and open-source LLMs in the OPT and LLaMA families.

nan


Article 1031

Title@2025-06-27 (5): Reinforcement Learning with Physics-Informed Symbolic Program Priors for Zero-Shot Wireless Indoor Navigation

Title: Reinforcement Learning with Physics-Informed Symbolic Program Priors for Zero-Shot Wireless Indoor Navigation Verstärktes Lernen mit physikinformierten Symbolischen Programmvorlagen für die kabellose Navigation im Innenbereich 利用物理-内射无线室内航行前置无人制热无线导航的物理-内建符号方案加强学习 2506.22365v1

Authors (4): Tao Li, Haozhe Lei, Mingsheng Yin, Yaqi Hu

When using reinforcement learning (RL) to tackle physical control tasks, inductive biases that encode physics priors can help improve sample efficiency during training and enhance generalization in testing. However, the current practice of incorporating these helpful physics-informed inductive biases inevitably runs into significant manual labor and domain expertise, making them prohibitive for general users. This work explores a symbolic approach to distill physics-informed inductive biases into RL agents, where the physics priors are expressed in a domain-specific language (DSL) that is human-readable and naturally explainable. Yet, the DSL priors do not translate directly into an implementable policy due to partial and noisy observations and additional physical constraints in navigation tasks. To address this gap, we develop a physics-informed program-guided RL (PiPRL) framework with applications to indoor navigation. PiPRL adopts a hierarchical and modularized neuro-symbolic integration, where a meta symbolic program receives semantically meaningful features from a neural perception module, which form the bases for symbolic programming that encodes physics priors and guides the RL process of a low-level neural controller. Extensive experiments demonstrate that PiPRL consistently outperforms purely symbolic or neural policies and reduces training time by over 26% with the help of the program-based inductive biases.

nan


Article 1032

Title@2025-06-27 (5): DiffSoundStream: Efficient Speech Tokenization via Diffusion Decoding

Title: DiffSoundStream: Efficient Speech Tokenization via Diffusion Decoding DiffSoundStream: Effiziente Sprach-Tokenisierung durch Diffusions-Dekodierung DiffSoundStream:通过传播解码的有效演讲 2506.22362v1

Authors (7): Yang Yang, Yunpeng Li, George Sung, Shao-Fu Shih, Craig Dooley, Alessio Centazzo, Ramanan Rajeswaran

Token-based language modeling is a prominent approach for speech generation, where tokens are obtained by quantizing features from self-supervised learning (SSL) models and extracting codes from neural speech codecs, generally referred to as semantic tokens and acoustic tokens. These tokens are often modeled autoregressively, with the inference speed being constrained by the token rate. In this work, we propose DiffSoundStream, a solution that improves the efficiency of speech tokenization in non-streaming scenarios through two techniques: (1) conditioning the neural codec on semantic tokens to minimize redundancy between semantic and acoustic tokens, and (2) leveraging latent diffusion models to synthesize high-quality waveforms from semantic and coarse-level acoustic tokens. Experiments show that at 50 tokens per second, DiffSoundStream achieves speech quality on par with a standard SoundStream model operating at twice the token rate. Additionally, we achieve step-size distillation using just four diffusion sampling steps with only a minor quality loss.

nan


Article 1033

Title@2025-06-27 (5): From Ground to Air: Noise Robustness in Vision Transformers and CNNs for Event-Based Vehicle Classification with Potential UAV Applications

Title: From Ground to Air: Noise Robustness in Vision Transformers and CNNs for Event-Based Vehicle Classification with Potential UAV Applications Vom Boden zur Luft: Schallfestigkeit in Vision Transformern und CNNs für die ereignisbasierte Fahrzeugklassifizierung mit potentiellen UAV-Anwendungen 从地面到空气:视觉变形器和有线电视新闻网中的噪音强力,用于可能应用无人驾驶飞行器的事件车辆分类 2506.22360v1

Authors (3): Nouf Almesafri, Hector Figueiredo, Miguel Arana-Catania

This study investigates the performance of the two most relevant computer vision deep learning architectures, Convolutional Neural Network and Vision Transformer, for event-based cameras. These cameras capture scene changes, unlike traditional frame-based cameras with capture static images, and are particularly suited for dynamic environments such as UAVs and autonomous vehicles. The deep learning models studied in this work are ResNet34 and ViT B16, fine-tuned on the GEN1 event-based dataset. The research evaluates and compares these models under both standard conditions and in the presence of simulated noise. Initial evaluations on the clean GEN1 dataset reveal that ResNet34 and ViT B16 achieve accuracies of 88% and 86%, respectively, with ResNet34 showing a slight advantage in classification accuracy. However, the ViT B16 model demonstrates notable robustness, particularly given its pre-training on a smaller dataset. Although this study focuses on ground-based vehicle classification, the methodologies and findings hold significant promise for adaptation to UAV contexts, including aerial object classification and event-based vision systems for aviation-related tasks.

nan


Article 1034

Title@2025-06-27 (5): Learning Non-Local Molecular Interactions via Equivariant Local Representations and Charge Equilibration

Title: Learning Non-Local Molecular Interactions via Equivariant Local Representations and Charge Equilibration Lernen nicht-lokaler molekularer Interaktionen über gleichberechtigte lokale Repräsentationen und Ladungsgleichstellung 通过平等地方代表性和收费平衡,学习非本地分子相互作用 2501.19179v2

Authors (3): Paul Fuchs, Michał Sanocki, Julija Zavadlav

Graph Neural Network (GNN) potentials relying on chemical locality offer near-quantum mechanical accuracy at significantly reduced computational costs. Message-passing GNNs model interactions beyond their immediate neighborhood by propagating local information between neighboring particles while remaining effectively local. However, locality precludes modeling long-range effects critical to many real-world systems, such as charge transfer, electrostatic interactions, and dispersion effects. In this work, we propose the Charge Equilibration Layer for Long-range Interactions (CELLI) to address the challenge of efficiently modeling non-local interactions. This novel architecture generalizes the classical charge equilibration (Qeq) method to a model-agnostic building block for modern equivariant GNN potentials. Therefore, CELLI extends the capability of GNNs to model long-range interactions while providing high interpretability through explicitly modeled charges. On benchmark systems, CELLI achieves state-of-the-art results for strictly local models. CELLI generalizes to diverse datasets and large structures while providing high computational efficiency and robust predictions.

nan


Article 1035

Title@2025-06-27 (5): Learning Networks from Wide-Sense Stationary Stochastic Processes

Title: Learning Networks from Wide-Sense Stationary Stochastic Processes Netzwerke aus breit angelegten stationären stochastischen Prozessen lernen 广意识固定式存储处理的学习网络 2412.03768v2

Authors (5): Anirudh Rayas, Jiajun Cheng, Rajasekhar Anguluri, Deepjyoti Deka, Gautam Dasarathy

Complex networked systems driven by latent inputs are common in fields like neuroscience, finance, and engineering. A key inference problem here is to learn edge connectivity from node outputs (potentials). We focus on systems governed by steady-state linear conservation laws: $X_t = {L^{\ast}}Y_{t}$, where $X_t, Y_t \in \mathbb{R}^p$ denote inputs and potentials, respectively, and the sparsity pattern of the $p \times p$ Laplacian $L^{\ast}$ encodes the edge structure. Assuming $X_t$ to be a wide-sense stationary stochastic process with a known spectral density matrix, we learn the support of $L^{\ast}$ from temporally correlated samples of $Y_t$ via an $\ell_1$-regularized Whittle’s maximum likelihood estimator (MLE). The regularization is particularly useful for learning large-scale networks in the high-dimensional setting where the network size $p$ significantly exceeds the number of samples $n$. We show that the MLE problem is strictly convex, admitting a unique solution. Under a novel mutual incoherence condition and certain sufficient conditions on $(n, p, d)$, we show that the ML estimate recovers the sparsity pattern of $L^\ast$ with high probability, where $d$ is the maximum degree of the graph underlying $L^{\ast}$. We provide recovery guarantees for $L^\ast$ in element-wise maximum, Frobenius, and operator norms. Finally, we complement our theoretical results with several simulation studies on synthetic and benchmark datasets, including engineered systems (power and water networks), and real-world datasets from neural systems (such as the human brain).

nan


Article 1036

Title@2025-06-27 (5): Scalable Hypergraph Structure Learning with Diverse Smoothness Priors

Title: Scalable Hypergraph Structure Learning with Diverse Smoothness Priors Skalierbare Hypergraph Struktur Lernen mit unterschiedlicher Glätte Priors 具有多样性平滑性学习的可缩放超时图结构 2504.03583v2

Authors (4): Benjamin T. Brown, Haoxiang Zhang, Daniel L. Lau, Gonzalo R. Arce

In graph signal processing, learning the weighted connections between nodes from a set of sample signals is a fundamental task when the underlying relationships are not known a priori. This task is typically addressed by finding a graph Laplacian on which the observed signals are smooth. With the extension of graphs to hypergraphs - where edges can connect more than two nodes - graph learning methods have similarly been generalized to hypergraphs. However, the absence of a unified framework for calculating total variation has led to divergent definitions of smoothness and, consequently, differing approaches to hyperedge recovery. We confront this challenge through generalization of several previously proposed hypergraph total variations, subsequently allowing ease of substitution into a vector based optimization. To this end, we propose a novel hypergraph learning method that recovers a hypergraph topology from time-series signals based on a smoothness prior. Our approach, designated as Hypergraph Structure Learning with Smoothness (HSLS), addresses key limitations in prior works, such as hyperedge selection and convergence issues, by formulating the problem as a convex optimization solved via a forward-backward-forward algorithm, ensuring guaranteed convergence. Additionally, we introduce a process that simultaneously limits the span of the hyperedge search and maintains a valid hyperedge selection set. In doing so, our method becomes scalable in increasingly complex network structures. The experimental results demonstrate improved performance, in terms of accuracy, over other state-of-the-art hypergraph inference methods; furthermore, we empirically show our method to be robust to total variation terms, biased towards global smoothness, and scalable to larger hypergraphs.

nan


Article 1037

Title@2025-06-27 (5): Long-Sequence Memory with Temporal Kernels and Dense Hopfield Functionals

Title: Long-Sequence Memory with Temporal Kernels and Dense Hopfield Functionals Langzeitgedächtnis mit Temporalen Kerneln und Dense Hopfield Functionals 具有时心和高豪华地功能的长期序列内存 2507.01052v1

Authors (1): Ahmed Farooq

In this study we introduce a novel energy functional for long-sequence memory, building upon the framework of dense Hopfield networks which achieves exponential storage capacity through higher-order interactions. Building upon earlier work on long-sequence Hopfield memory models, we propose a temporal kernal $K(m, k)$ to incorporate temporal dependencies, enabling efficient sequential retrieval of patterns over extended sequences. We demonstrate the successful application of this technique for the storage and sequential retrieval of movies frames which are well suited for this because of the high dimensional vectors that make up each frame creating enough variation between even sequential frames in the high dimensional space. The technique has applications in modern transformer architectures, including efficient long-sequence modeling, memory augmentation, improved attention with temporal bias, and enhanced handling of long-term dependencies in time-series data. Our model offers a promising approach to address the limitations of transformers in long-context tasks, with potential implications for natural language processing, forecasting, and beyond.

nan


Article 1038

Title@2025-06-27 (5): Optimal Estimation of Watermark Proportions in Hybrid AI-Human Texts

Title: Optimal Estimation of Watermark Proportions in Hybrid AI-Human Texts Optimale Schätzung von Wasserzeichenanteilen in Hybrid-KI-Humantexten 对混合的AI-人类文案文中水标记比例的最佳估计 2506.22343v1

Authors (6): Xiang Li, Garrett Wen, Weiqing He, Jiayuan Wu, Qi Long, Weijie J. Su

Text watermarks in large language models (LLMs) are an increasingly important tool for detecting synthetic text and distinguishing human-written content from LLM-generated text. While most existing studies focus on determining whether entire texts are watermarked, many real-world scenarios involve mixed-source texts, which blend human-written and watermarked content. In this paper, we address the problem of optimally estimating the watermark proportion in mixed-source texts. We cast this problem as estimating the proportion parameter in a mixture model based on \emph{pivotal statistics}. First, we show that this parameter is not even identifiable in certain watermarking schemes, let alone consistently estimable. In stark contrast, for watermarking methods that employ continuous pivotal statistics for detection, we demonstrate that the proportion parameter is identifiable under mild conditions. We propose efficient estimators for this class of methods, which include several popular unbiased watermarks as examples, and derive minimax lower bounds for any measurable estimator based on pivotal statistics, showing that our estimators achieve these lower bounds. Through evaluations on both synthetic data and mixed-source text generated by open-source models, we demonstrate that our proposed estimators consistently achieve high estimation accuracy.

nan


Article 1039

Title@2025-06-27 (5): A Framework for Multi-source Privacy Preserving Epidemic Analysis

Title: A Framework for Multi-source Privacy Preserving Epidemic Analysis Ein Rahmen für den Datenschutz mit mehreren Quellen, der epidemische Analysen sicherstellt 多来源隐私保护流行病分析框架 2506.22342v1

Authors (8): Zihan Guan, Zhiyuan Zhao, Fengwei Tian, Dung Nguyen, Payel Bhattacharjee, Ravi Tandon, B. Aditya Prakash, Anil Vullikanti

It is now well understood that diverse datasets provide a lot of value in key epidemiology and public health analyses, such as forecasting and nowcasting, development of epidemic models, evaluation and design of interventions and resource allocation. Some of these datasets are often sensitive, and need adequate privacy protections. There are many models of privacy, but Differential Privacy (DP) has become a de facto standard because of its strong guarantees, without making models about adversaries. In this paper, we develop a framework the integrates deep learning and epidemic models to simultaneously perform epidemic forecasting and learning a mechanistic model of epidemic spread, while incorporating multiple datasets for these analyses, including some with DP guarantees. We demonstrate our framework using a realistic but synthetic financial dataset with DP; such a dataset has not been used in such epidemic analyses. We show that this dataset provides significant value in forecasting and learning an epidemic model, even when used with DP guarantees.

nan


Article 1040

Title@2025-06-27 (5): QuKAN: A Quantum Circuit Born Machine approach to Quantum Kolmogorov Arnold Networks

Title: QuKAN: A Quantum Circuit Born Machine approach to Quantum Kolmogorov Arnold Networks QuKAN: Ein Quantum Circuit Born Machine Ansatz zu Quantum Kolmogorov Arnold Networks QuKAN: 量子巡回巡回赛对Quantum Kolmogorov Arnold 网络的 量子诞生机器方法 2506.22340v1

Authors (7): Yannick Werner, Akash Malemath, Mengxi Liu, Vitor Fortes Rey, Nikolaos Palaiodimopoulos, Paul Lukowicz, Maximilian Kiefer-Emmanouilidis

Kolmogorov Arnold Networks (KANs), built upon the Kolmogorov Arnold representation theorem (KAR), have demonstrated promising capabilities in expressing complex functions with fewer neurons. This is achieved by implementing learnable parameters on the edges instead of on the nodes, unlike traditional networks such as Multi-Layer Perceptrons (MLPs). However, KANs potential in quantum machine learning has not yet been well explored. In this work, we present an implementation of these KAN architectures in both hybrid and fully quantum forms using a Quantum Circuit Born Machine (QCBM). We adapt the KAN transfer using pre-trained residual functions, thereby exploiting the representational power of parametrized quantum circuits. In the hybrid model we combine classical KAN components with quantum subroutines, while the fully quantum version the entire architecture of the residual function is translated to a quantum model. We demonstrate the feasibility, interpretability and performance of the proposed Quantum KAN (QuKAN) architecture.

nan


Article 1041

Title@2025-06-27 (5): Multi-Turn Code Generation Through Single-Step Rewards

Title: Multi-Turn Code Generation Through Single-Step Rewards Multi-Turn-Code-Generierung durch Single-Step-Rewards 通过单级奖励生成多发代码 2502.20380v2

Authors (6): Arnav Kumar Jain, Gonzalo Gonzalez-Pumariega, Wayne Chen, Alexander M Rush, Wenting Zhao, Sanjiban Choudhury

We address the problem of code generation from multi-turn execution feedback. Existing methods either generate code without feedback or use complex, hierarchical reinforcement learning to optimize multi-turn rewards. We propose a simple yet scalable approach, $\mu$Code, that solves multi-turn code generation using only single-step rewards. Our key insight is that code generation is a one-step recoverable MDP, where the correct code can be recovered from any intermediate code state in a single turn. $\mu$Code iteratively trains both a generator to provide code solutions conditioned on multi-turn execution feedback and a verifier to score the newly generated code. Experimental evaluations show that our approach achieves significant improvements over the state-of-the-art baselines. We provide analysis of the design choices of the reward models and policy, and show the efficacy of $\mu$Code at utilizing the execution feedback. Our code is available at https://github.com/portal-cornell/muCode.

nan


Article 1042

Title@2025-06-27 (5): Robust quantum reservoir computers for forecasting chaotic dynamics: generalized synchronization and stability

Title: Robust quantum reservoir computers for forecasting chaotic dynamics: generalized synchronization and stability Robuste Quantenreservoir-Computer für die Vorhersage chaotischer Dynamiken: generalisierte Synchronisation und Stabilität 用于预测混乱动态的强大量子储量计算机:普遍同步和稳定 2506.22335v1

Authors (3): Osama Ahmed, Felix Tennie, Luca Magri

We show that recurrent quantum reservoir computers (QRCs) and their recurrence-free architectures (RF-QRCs) are robust tools for learning and forecasting chaotic dynamics from time-series data. First, we formulate and interpret quantum reservoir computers as coupled dynamical systems, where the reservoir acts as a response system driven by training data; in other words, quantum reservoir computers are generalized-synchronization (GS) systems. Second, we show that quantum reservoir computers can learn chaotic dynamics and their invariant properties, such as Lyapunov spectra, attractor dimensions, and geometric properties such as the covariant Lyapunov vectors. This analysis is enabled by deriving the Jacobian of the quantum reservoir update. Third, by leveraging tools from generalized synchronization, we provide a method for designing robust quantum reservoir computers. We propose the criterion $GS=ESP$: GS implies the echo state property (ESP), and vice versa. We analytically show that RF-QRCs, by design, fulfill $GS=ESP$. Finally, we analyze the effect of simulated noise. We find that dissipation from noise enhances the robustness of quantum reservoir computers. Numerical verifications on systems of different dimensions support our conclusions. This work opens opportunities for designing robust quantum machines for chaotic time series forecasting on near-term quantum hardware.

nan


Article 1043

Title: Less Greedy Equivalence Search Weniger Gier-Äquivalenz-Suche 较少贪婪的等等值搜索 2506.22331v1

Authors (2): Adiba Ejaz, Elias Bareinboim

Greedy Equivalence Search (GES) is a classic score-based algorithm for causal discovery from observational data. In the sample limit, it recovers the Markov equivalence class of graphs that describe the data. Still, it faces two challenges in practice: computational cost and finite-sample accuracy. In this paper, we develop Less Greedy Equivalence Search (LGES), a variant of GES that retains its theoretical guarantees while partially addressing these limitations. LGES modifies the greedy step: rather than always applying the highest-scoring insertion, it avoids edge insertions between variables for which the score implies some conditional independence. This more targeted search yields up to a (10)-fold speed-up and a substantial reduction in structural error relative to GES. Moreover, LGES can guide the search using prior assumptions, while correcting these assumptions when contradicted by the data. Finally, LGES can exploit interventional data to refine the learned observational equivalence class. We prove that LGES recovers the true equivalence class in the sample limit from observational and interventional data, even with misspecified prior assumptions. Experiments demonstrate that LGES outperforms GES and other baselines in speed, accuracy, and robustness to misspecified assumptions. Our code is available at https://github.com/CausalAILab/lges.

nan


Article 1044

Title@2025-06-27 (5): Conceptual Topic Aggregation

Title: Conceptual Topic Aggregation Begriffliche Aggregation 专题汇总概念 2506.22309v1

Authors (4): Klara M. Gutekunst, Dominik Dürrschnabel, Johannes Hirth, Gerd Stumme

The vast growth of data has rendered traditional manual inspection infeasible, necessitating the adoption of computational methods for efficient data exploration. Topic modeling has emerged as a powerful tool for analyzing large-scale textual datasets, enabling the extraction of latent semantic structures. However, existing methods for topic modeling often struggle to provide interpretable representations that facilitate deeper insights into data structure and content. In this paper, we propose FAT-CAT, an approach based on Formal Concept Analysis (FCA) to enhance meaningful topic aggregation and visualization of discovered topics. Our approach can handle diverse topics and file types – grouped by directories – to construct a concept lattice that offers a structured, hierarchical representation of their topic distribution. In a case study on the ETYNTKE dataset, we evaluate the effectiveness of our approach against other representation methods to demonstrate that FCA-based aggregation provides more meaningful and interpretable insights into dataset composition than existing topic modeling techniques.

nan


Article 1045

Title@2025-06-27 (5): Unfolding Generative Flows with Koopman Operators: Fast and Interpretable Sampling

Title: Unfolding Generative Flows with Koopman Operators: Fast and Interpretable Sampling Entfalten Generative Strömungen mit Koopman-Betreibern: Schnelle und interpretierbare Probenahme 与 Koopman 运算符的未负生成流程: 快速和可解释抽样 2506.22304v1

Authors (3): Erkan Turan, Aristotelis Siozopoulos, Maks Ovsjanikov

Conditional Flow Matching (CFM) offers a simulation-free framework for training continuous-time generative models, bridging diffusion and flow-based approaches. However, sampling from CFM still relies on numerically solving non-linear ODEs which can be computationally expensive and difficult to interpret. Recent alternatives address sampling speed via trajectory straightening, mini-batch coupling or distillation. However, these methods typically do not shed light on the underlying \textit{structure} of the generative process. In this work, we propose to accelerate CFM and introduce an interpretable representation of its dynamics by integrating Koopman operator theory, which models non-linear flows as linear evolution in a learned space of observables. We introduce a decoder-free Koopman-CFM architecture that learns an embedding where the generative dynamics become linear, enabling closed-form, one-step sampling via matrix exponentiation. This results in significant speedups over traditional CFM as demonstrated on controlled 2D datasets and real-world benchmarks, MNIST, Fashion-MNIST (F-MNIST), and the Toronto Face Dataset (TFD). Unlike previous methods, our approach leads to a well-structured Koopman generator, whose spectral properties, eigenvalues, and eigenfunctions offer principled tools for analyzing generative behavior such as temporal scaling, mode stability, and decomposition in Koopman latent space. By combining sampling efficiency with analytical structure, Koopman-enhanced flow matching offers a potential step toward fast and interpretable generative modeling.

nan


Article 1046

Title@2025-06-27 (5): Weakly-Supervised Domain Adaptation with Proportion-Constrained Pseudo-Labeling

Title: Weakly-Supervised Domain Adaptation with Proportion-Constrained Pseudo-Labeling Schwach-überwachte Domain-Anpassung mit proportional-beschränktem Pseudo-Labeling 与受比例比例限制的 “ 优多 – – 带 “ 环 “ 的 2506.22301v1

Authors (5): Takumi Okuo, Shinnosuke Matsuo, Shota Harada, Kiyohito Tanaka, Ryoma Bise

Domain shift is a significant challenge in machine learning, particularly in medical applications where data distributions differ across institutions due to variations in data collection practices, equipment, and procedures. This can degrade performance when models trained on source domain data are applied to the target domain. Domain adaptation methods have been widely studied to address this issue, but most struggle when class proportions between the source and target domains differ. In this paper, we propose a weakly-supervised domain adaptation method that leverages class proportion information from the target domain, which is often accessible in medical datasets through prior knowledge or statistical reports. Our method assigns pseudo-labels to the unlabeled target data based on class proportion (called proportion-constrained pseudo-labeling), improving performance without the need for additional annotations. Experiments on two endoscopic datasets demonstrate that our method outperforms semi-supervised domain adaptation techniques, even when 5% of the target domain is labeled. Additionally, the experimental results with noisy proportion labels highlight the robustness of our method, further demonstrating its effectiveness in real-world application scenarios.

nan


Article 1047

Title@2025-06-27 (5): CoATA: Effective Co-Augmentation of Topology and Attribute for Graph Neural Networks

Title: CoATA: Effective Co-Augmentation of Topology and Attribute for Graph Neural Networks CoATA: Effektive Co-Augmentation von Topologie und Attribut für Graph Neuronale Netzwerke CoATA: 数字神经网络的地形和属性的有效联合增强 2506.22299v1

Authors (7): Tao Liu, Longlong Lin, Yunfeng Yu, Xi Ou, Youan Zhang, Zhiqiu Ye, Tao Jia

Graph Neural Networks (GNNs) have garnered substantial attention due to their remarkable capability in learning graph representations. However, real-world graphs often exhibit substantial noise and incompleteness, which severely degrades the performance of GNNs. Existing methods typically address this issue through single-dimensional augmentation, focusing either on refining topology structures or perturbing node attributes, thereby overlooking the deeper interplays between the two. To bridge this gap, this paper presents CoATA, a dual-channel GNN framework specifically designed for the Co-Augmentation of Topology and Attribute. Specifically, CoATA first propagates structural signals to enrich and denoise node attributes. Then, it projects the enhanced attribute space into a node-attribute bipartite graph for further refinement or reconstruction of the underlying structure. Subsequently, CoATA introduces contrastive learning, leveraging prototype alignment and consistency constraints, to facilitate mutual corrections between the augmented and original graphs. Finally, extensive experiments on seven benchmark datasets demonstrate that the proposed CoATA outperforms eleven state-of-the-art baseline methods, showcasing its effectiveness in capturing the synergistic relationship between topology and attributes.

nan


Article 1048

Title@2025-06-27 (5): Score-Based Model for Low-Rank Tensor Recovery

Title: Score-Based Model for Low-Rank Tensor Recovery Score-basiertes Modell für Low-Rank Tensor Erholung 低Rank Tensor回收计分模型 2506.22295v1

Authors (6): Zhengyun Cheng, Changhao Wang, Guanwen Zhang, Yi Xu, Wei Zhou, Xiangyang Ji

Low-rank tensor decompositions (TDs) provide an effective framework for multiway data analysis. Traditional TD methods rely on predefined structural assumptions, such as CP or Tucker decompositions. From a probabilistic perspective, these can be viewed as using Dirac delta distributions to model the relationships between shared factors and the low-rank tensor. However, such prior knowledge is rarely available in practical scenarios, particularly regarding the optimal rank structure and contraction rules. The optimization procedures based on fixed contraction rules are complex, and approximations made during these processes often lead to accuracy loss. To address this issue, we propose a score-based model that eliminates the need for predefined structural or distributional assumptions, enabling the learning of compatibility between tensors and shared factors. Specifically, a neural network is designed to learn the energy function, which is optimized via score matching to capture the gradient of the joint log-probability of tensor entries and shared factors. Our method allows for modeling structures and distributions beyond the Dirac delta assumption. Moreover, integrating the block coordinate descent (BCD) algorithm with the proposed smooth regularization enables the model to perform both tensor completion and denoising. Experimental results demonstrate significant performance improvements across various tensor types, including sparse and continuous-time tensors, as well as visual data.

nan


Article 1049

Title@2025-06-27 (5): High Resolution Isotropic 3D Cine imaging with Automated Segmentation using Concatenated 2D Real-time Imaging and Deep Learning

Title: High Resolution Isotropic 3D Cine imaging with Automated Segmentation using Concatenated 2D Real-time Imaging and Deep Learning Hochauflösende isotrope 3D-Cine-Bildgebung mit automatischer Segmentierung mit konkatenierter 2D-Echtzeit-Bildgebung und Deep Learning 高分辨率 Isotoric 3D Cine 成像, 使用 Conculated 2D 实时成像和深层学习进行自动分割 2506.22532v1

Authors (8): Mark Wrobel, Michele Pascale, Tina Yao, Ruaraidh Campbell, Elena Milano, Michael Quail, Jennifer Steeden, Vivek Muthurangu

Background: Conventional cardiovascular magnetic resonance (CMR) in paediatric and congenital heart disease uses 2D, breath-hold, balanced steady state free precession (bSSFP) cine imaging for assessment of function and cardiac-gated, respiratory-navigated, static 3D bSSFP whole-heart imaging for anatomical assessment. Our aim is to concatenate a stack 2D free-breathing real-time cines and use Deep Learning (DL) to create an isotropic a fully segmented 3D cine dataset from these images. Methods: Four DL models were trained on open-source data that performed: a) Interslice contrast correction; b) Interslice respiratory motion correction; c) Super-resolution (slice direction); and d) Segmentation of right and left atria and ventricles (RA, LA, RV, and LV), thoracic aorta (Ao) and pulmonary arteries (PA). In 10 patients undergoing routine cardiovascular examination, our method was validated on prospectively acquired sagittal stacks of real-time cine images. Quantitative metrics (ventricular volumes and vessel diameters) and image quality of the 3D cines were compared to conventional breath hold cine and whole heart imaging. Results: All real-time data were successfully transformed into 3D cines with a total post-processing time of <1 min in all cases. There were no significant biases in any LV or RV metrics with reasonable limits of agreement and correlation. There is also reasonable agreement for all vessel diameters, although there was a small but significant overestimation of RPA diameter. Conclusion: We have demonstrated the potential of creating a 3D-cine data from concatenated 2D real-time cine images using a series of DL models. Our method has short acquisition and reconstruction times with fully segmented data being available within 2 minutes. The good agreement with conventional imaging suggests that our method could help to significantly speed up CMR in clinical practice.

nan


Article 1050

Title@2025-06-27 (5): Gradual Domain Adaptation for Graph Learning

Title: Gradual Domain Adaptation for Graph Learning Graduale Domain-Anpassung für Graphen-Lernen 图表学习的渐进域域适应 2501.17443v2

Authors (6): Pui Ieng Lei, Ximing Chen, Yijun Sheng, Yanyan Liu, Jingzhi Guo, Zhiguo Gong

Existing literature lacks a graph domain adaptation technique for handling large distribution shifts, primarily due to the difficulty in simulating an evolving path from source to target graph. To make a breakthrough, we present a graph gradual domain adaptation (GGDA) framework with the construction of a compact domain sequence that minimizes information loss in adaptations. Our approach starts with an efficient generation of knowledge-preserving intermediate graphs over the Fused Gromov-Wasserstein (FGW) metric. With the bridging data pool, GGDA domains are then constructed via a novel vertex-based domain progression, which comprises “close” vertex selections and adaptive domain advancement to enhance inter-domain information transferability. Theoretically, our framework concretizes the intractable inter-domain distance $W_p(\mu_t,\mu_{t+1})$ via implementable upper and lower bounds, enabling flexible adjustments of this metric for optimizing domain formation. Extensive experiments under various transfer scenarios validate the superior performance of our GGDA framework.

nan


Article 1051

Title@2025-06-27 (5): Breaking Rank Bottlenecks in Knowledge Graph Completion

Title: Breaking Rank Bottlenecks in Knowledge Graph Completion Breaking Rang Engpässe in der Wissensgraphenvervollständigung 完成知识图中的瓶颈 2506.22271v1

Authors (3): Samy Badreddine, Emile van Krieken, Luciano Serafini

Many Knowledge Graph Completion (KGC) models, despite using powerful encoders, rely on a simple vector-matrix multiplication to score queries against candidate object entities. When the number of entities is larger than the model’s embedding dimension, which in practical scenarios is often by several orders of magnitude, we have a linear output layer with a rank bottleneck. Such bottlenecked layers limit model expressivity. We investigate both theoretically and empirically how rank bottlenecks affect KGC models. We find that, by limiting the set of feasible predictions, rank bottlenecks hurt ranking accuracy and the distribution fidelity of scores. Inspired by the language modelling literature, we propose KGE-MoS, a mixture-based output layer to break rank bottlenecks in many KGC models. Our experiments on four datasets show that KGE-MoS improves performance and probabilistic fit of KGC models for a low parameter cost.

nan


Article 1052

Title@2025-06-27 (5): How do Probabilistic Graphical Models and Graph Neural Networks Look at Network Data?

Title: How do Probabilistic Graphical Models and Graph Neural Networks Look at Network Data? Wie betrachten probabilistische Graphische Modelle und Graphische Neuronale Netzwerke Netzwerkdaten? 图像模型和图形神经网络如何看待网络数据? 2506.11869v2

Authors (2): Michela Lapenna, Caterina De Bacco

Graphs are a powerful data structure for representing relational data and are widely used to describe complex real-world systems. Probabilistic Graphical Models (PGMs) and Graph Neural Networks (GNNs) can both leverage graph-structured data, but their inherent functioning is different. The question is how do they compare in capturing the information contained in networked datasets? We address this objective by solving a link prediction task and we conduct three main experiments, on both synthetic and real networks: one focuses on how PGMs and GNNs handle input features, while the other two investigate their robustness to noisy features and increasing heterophily of the graph. PGMs do not necessarily require features on nodes, while GNNs cannot exploit the network edges alone, and the choice of input features matters. We find that GNNs are outperformed by PGMs when input features are low-dimensional or noisy, mimicking many real scenarios where node attributes might be scalar or noisy. Then, we find that PGMs are more robust than GNNs when the heterophily of the graph is increased. Finally, to assess performance beyond prediction tasks, we also compare the two frameworks in terms of their computational complexity and interpretability.

nan


Article 1053

Title@2025-06-27 (5): KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding

Title: KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding KITAB-Bench: Ein umfassender Multi-Domain-Benchmark für arabisches OCR und Dokumentenverständnis KITAB-Bench:阿拉伯文OCR和文件理解的综合多领域综合基准 2502.14949v2

Authors (10): Ahmed Heakl, Abdullah Sohail, Mukul Ranjan, Rania Hossam, Ghazi Shazan Ahmad, Mohamed El-Geish, Omar Maher, Zhiqiang Shen, Fahad Khan, Salman Khan

With the growing adoption of Retrieval-Augmented Generation (RAG) in document processing, robust text recognition has become increasingly critical for knowledge extraction. While OCR (Optical Character Recognition) for English and other languages benefits from large datasets and well-established benchmarks, Arabic OCR faces unique challenges due to its cursive script, right-to-left text flow, and complex typographic and calligraphic features. We present KITAB-Bench, a comprehensive Arabic OCR benchmark that fills the gaps in current evaluation systems. Our benchmark comprises 8,809 samples across 9 major domains and 36 sub-domains, encompassing diverse document types including handwritten text, structured tables, and specialized coverage of 21 chart types for business intelligence. Our findings show that modern vision-language models (such as GPT-4o, Gemini, and Qwen) outperform traditional OCR approaches (like EasyOCR, PaddleOCR, and Surya) by an average of 60% in Character Error Rate (CER). Furthermore, we highlight significant limitations of current Arabic OCR models, particularly in PDF-to-Markdown conversion, where the best model Gemini-2.0-Flash achieves only 65% accuracy. This underscores the challenges in accurately recognizing Arabic text, including issues with complex fonts, numeral recognition errors, word elongation, and table structure detection. This work establishes a rigorous evaluation framework that can drive improvements in Arabic document analysis methods and bridge the performance gap with English OCR technologies.

nan


Article 1054

Title@2025-06-27 (5): Projected Compression: Trainable Projection for Efficient Transformer Compression

Title: Projected Compression: Trainable Projection for Efficient Transformer Compression Projektierte Kompression: Trainierbare Projektion für effiziente Transformer-Kompression 预计压缩:高效变压器压缩培训预测 2506.22255v1

Authors (9): Maciej Stefaniak, Michał Krutul, Jan Małaśnicki, Maciej Pióro, Jakub Krajewski, Sebastian Jaszczur, Marek Cygan, Kamil Adamczewski, Jan Ludziejewski

Large language models have steadily increased in size to achieve improved performance; however, this growth has also led to greater inference time and computational demands. Consequently, there is rising interest in model size reduction methods. To address this issue, we propose Projected Compression, a novel model compression technique, that reduces model weights by utilizing projection modules. Specifically, we first train additional trainable projections weights and preserve access to all the original model parameters. Subsequently, these projections are merged into a lower-dimensional product matrix, resulting in a reduced-size standard Transformer-based model. Unlike alternative approaches that require additional computational overhead, our method matches the base model’s per-token computation step in FLOPs. Experimental results show that Projected Compression outperforms the comparable hard pruning and retraining approach on higher quality models. Moreover, the performance margin scales well with the number of tokens.

nan


Article 1055

Title@2025-06-27 (5): Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence

Title: Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence Risiko-Averse Best Arm Set Identifikation mit festem Budget und festem Vertrauen 规避风险的最佳机制确定固定预算和固定预算和固定信任 2506.22253v1

Authors (4): Shunta Nonaga, Koji Tabata, Yuta Mizuno, Tamiki Komatsuzaki

Decision making under uncertain environments in the maximization of expected reward while minimizing its risk is one of the ubiquitous problems in many subjects. Here, we introduce a novel problem setting in stochastic bandit optimization that jointly addresses two critical aspects of decision-making: maximizing expected reward and minimizing associated uncertainty, quantified via the mean-variance(MV) criterion. Unlike traditional bandit formulations that focus solely on expected returns, our objective is to efficiently and accurately identify the Pareto-optimal set of arms that strikes the best trade-off between expected performance and risk. We propose a unified meta-algorithmic framework capable of operating under both fixed-confidence and fixed-budget regimes, achieved through adaptive design of confidence intervals tailored to each scenario using the same sample exploration strategy. We provide theoretical guarantees on the correctness of the returned solutions in both settings. To complement this theoretical analysis, we conduct extensive empirical evaluations across synthetic benchmarks, demonstrating that our approach outperforms existing methods in terms of both accuracy and sample efficiency, highlighting its broad applicability to risk-aware decision-making tasks in uncertain environments.

nan


Article 1056

Title@2025-06-27 (5): Spring-block theory of feature learning in deep neural networks

Title: Spring-block theory of feature learning in deep neural networks Spring-Block-Theorie des Feature-Lernens in tiefen neuronalen Netzwerken 深神经网络中特征学习的春季阶段理论 2407.19353v4

Authors (3): Cheng Shi, Liming Pan, Ivan Dokmanić

Feature-learning deep nets progressively collapse data to a regular low-dimensional geometry. How this emerges from the collective action of nonlinearity, noise, learning rate, and other factors, has eluded first-principles theories built from microscopic neuronal dynamics. We exhibit a noise-nonlinearity phase diagram that identifies regimes where shallow or deep layers learn more effectively and propose a macroscopic mechanical theory that reproduces the diagram and links feature learning across layers to generalization.

nan


Article 1057

Title@2025-06-27 (5): Fairness-Optimized Synthetic EHR Generation for Arbitrary Downstream Predictive Tasks

Title: Fairness-Optimized Synthetic EHR Generation for Arbitrary Downstream Predictive Tasks Fairness-Optimierte Synthetische EHR-Generation für willkürliche Downstream-Vorhersageaufgaben 用于任意下游下游预测任务的公平优化合成合成合成人电子HR生成 2406.02510v3

Authors (3): Mirza Farhan Bin Tarek, Raphael Poulain, Rahmatollah Beheshti

Among various aspects of ensuring the responsible design of AI tools for healthcare applications, addressing fairness concerns has been a key focus area. Specifically, given the wide spread of electronic health record (EHR) data and their huge potential to inform a wide range of clinical decision support tasks, improving fairness in this category of health AI tools is of key importance. While such a broad problem (mitigating fairness in EHR-based AI models) has been tackled using various methods, task- and model-agnostic methods are noticeably rare. In this study, we aimed to target this gap by presenting a new pipeline that generates synthetic EHR data, which is not only consistent with (faithful to) the real EHR data but also can reduce the fairness concerns (defined by the end-user) in the downstream tasks, when combined with the real data. We demonstrate the effectiveness of our proposed pipeline across various downstream tasks and two different EHR datasets. Our proposed pipeline can add a widely applicable and complementary tool to the existing toolbox of methods to address fairness in health AI applications, such as those modifying the design of a downstream model. The codebase for our project is available at https://github.com/healthylaife/FairSynth

nan


Article 1058

Title@2025-06-27 (5): Performance of Rank-One Tensor Approximation on Incomplete Data

Title: Performance of Rank-One Tensor Approximation on Incomplete Data Leistung der Rang eins Tensor-Annäherung auf unvollständigen Daten 在不完全数据上接近 “ 一等-一等 “ 的性能 2504.07818v3

Authors (1): Hugo Lebeau

We are interested in the estimation of a rank-one tensor signal when only a portion $\varepsilon$ of its noisy observation is available. We show that the study of this problem can be reduced to that of a random matrix model whose spectral analysis gives access to the reconstruction performance. These results shed light on and specify the loss of performance induced by an artificial reduction of the memory cost of a tensor via the deletion of a random part of its entries.

nan


Article 1059

Title@2025-06-27 (5): Boosting Classification with Quantum-Inspired Augmentations

Title: Boosting Classification with Quantum-Inspired Augmentations Steigerung der Klassifizierung durch Quantum-inspirierte Augmentationen 以量受量影响增量促进分类 2506.22241v1

Authors (6): Matthias Tschöpe, Vitor Fortes Rey, Sogo Pierre Sanon, Paul Lukowicz, Nikolaos Palaiodimopoulos, Maximilian Kiefer-Emmanouilidis

Understanding the impact of small quantum gate perturbations, which are common in quantum digital devices but absent in classical computers, is crucial for identifying potential advantages in quantum machine learning. While these perturbations are typically seen as detrimental to quantum computation, they can actually enhance performance by serving as a natural source of data augmentation. Additionally, they can often be efficiently simulated on classical hardware, enabling quantum-inspired approaches to improve classical machine learning methods. In this paper, we investigate random Bloch sphere rotations, which are fundamental SU(2) transformations, as a simple yet effective quantum-inspired data augmentation technique. Unlike conventional augmentations such as flipping, rotating, or cropping, quantum transformations lack intuitive spatial interpretations, making their application to tasks like image classification less straightforward. While common quantum augmentation methods rely on applying quantum models or trainable quanvolutional layers to classical datasets, we focus on the direct application of small-angle Bloch rotations and their effect on classical data. Using the large-scale ImageNet dataset, we demonstrate that our quantum-inspired augmentation method improves image classification performance, increasing Top-1 accuracy by 3%, Top-5 accuracy by 2.5%, and the F$_1$ score from 8% to 12% compared to standard classical augmentation methods. Finally, we examine the use of stronger unitary augmentations. Although these transformations preserve information in principle, they result in visually unrecognizable images with potential applications for privacy computations. However, we show that our augmentation approach and simple SU(2) transformations do not enhance differential privacy and discuss the implications of this limitation.

nan


Article 1060

Title@2025-06-27 (5): A Plea for History and Philosophy of Statistics and Machine Learning

Title: A Plea for History and Philosophy of Statistics and Machine Learning Ein Plädoyer für Geschichte und Philosophie der Statistik und des maschinellen Lernens 统计和机器学习历史和哲学 2506.22236v1

Authors (1): Hanti Lin

The integration of the history and philosophy of statistics was initiated at least by Hacking (1965) and advanced by Mayo (1996), but it has not received sustained follow-up. Yet such integration is more urgent than ever, as the recent success of artificial intelligence has been driven largely by machine learning – a field historically developed alongside statistics. Today, the boundary between statistics and machine learning is increasingly blurred. What we now need is integration, twice over: of history and philosophy, and of the field they engage – statistics and machine learning. I present a case study of a philosophical idea in machine learning (and in formal epistemology) whose root can be traced back to an often under-appreciated insight in Neyman and Pearson’s 1936 work (a follow-up to their 1933 classic). This leads to the articulation of a foundational assumption – largely implicit in, but shared by, the practices of frequentist statistics and machine learning – which I call achievabilism. Another integration also emerges at the level of methodology, combining two ends of the philosophy of science spectrum: history and philosophy of science on the one hand, and formal epistemology on the other hand.

nan


Article 1061

Title@2025-06-27 (5): No More Sliding Window: Efficient 3D Medical Image Segmentation with Differentiable Top-k Patch Sampling

Title: No More Sliding Window: Efficient 3D Medical Image Segmentation with Differentiable Top-k Patch Sampling Kein Schiebefenster mehr: Effiziente 3D-Messbildsegmentierung mit differenzierbarer Top-K-Patch-Sampling 没有更多滑动窗口: 高效的 3D 医疗图像分割, 配有可区别的顶k 补丁抽样 2501.10814v3

Authors (4): Young Seok Jeon, Hongfei Yang, Huazhu Fu, Mengling Feng

3D models surpass 2D models in CT/MRI segmentation by effectively capturing inter-slice relationships. However, the added depth dimension substantially increases memory consumption. While patch-based training alleviates memory constraints, it significantly slows down the inference speed due to the sliding window (SW) approach. We propose No-More-Sliding-Window (NMSW), a novel end-to-end trainable framework that enhances the efficiency of generic 3D segmentation backbone during an inference step by eliminating the need for SW. NMSW employs a differentiable Top-k module to selectively sample only the most relevant patches, thereby minimizing redundant computations. When patch-level predictions are insufficient, the framework intelligently leverages coarse global predictions to refine results. Evaluated across 3 tasks using 3 segmentation backbones, NMSW achieves competitive accuracy compared to SW inference while significantly reducing computational complexity by 91% (88.0 to 8.00 TMACs). Moreover, it delivers a 9.1x faster inference on the H100 GPU (99.0 to 8.3 sec) and a 11.1x faster inference on the Xeon Gold CPU (2110 to 189 sec). NMSW is model-agnostic, further boosting efficiency when integrated with any existing efficient segmentation backbones. The code is avaialble: https://github.com/Youngseok0001/open_nmsw.

nan


Article 1062

Title@2025-06-27 (5): Uncovering smooth structures in single-cell data with PCS-guided neighbor embeddings

Title: Uncovering smooth structures in single-cell data with PCS-guided neighbor embeddings Entdecken glatter Strukturen in Einzelzellendaten mit PCS-geführten Nachbareinbettungen 以 PCS 制导的邻居嵌入器在单细胞数据中发现平滑结构 2506.22228v1

Authors (4): Rong Ma, Xi Li, Jingyuan Hu, Bin Yu

Single-cell sequencing is revolutionizing biology by enabling detailed investigations of cell-state transitions. Many biological processes unfold along continuous trajectories, yet it remains challenging to extract smooth, low-dimensional representations from inherently noisy, high-dimensional single-cell data. Neighbor embedding (NE) algorithms, such as t-SNE and UMAP, are widely used to embed high-dimensional single-cell data into low dimensions. But they often introduce undesirable distortions, resulting in misleading interpretations. Existing evaluation methods for NE algorithms primarily focus on separating discrete cell types rather than capturing continuous cell-state transitions, while dynamic modeling approaches rely on strong assumptions about cellular processes and specialized data. To address these challenges, we build on the Predictability-Computability-Stability (PCS) framework for reliable and reproducible data-driven discoveries. First, we systematically evaluate popular NE algorithms through empirical analysis, simulation, and theory, and reveal their key shortcomings, such as artifacts and instability. We then introduce NESS, a principled and interpretable machine learning approach to improve NE representations by leveraging algorithmic stability and to enable robust inference of smooth biological structures. NESS offers useful concepts, quantitative stability metrics, and efficient computational workflows to uncover developmental trajectories and cell-state transitions in single-cell data. Finally, we apply NESS to six single-cell datasets, spanning pluripotent stem cell differentiation, organoid development, and multiple tissue-specific lineage trajectories. Across these diverse contexts, NESS consistently yields useful biological insights, such as identification of transitional and stable cell states and quantification of transcriptional dynamics during development.

nan


Article 1063

Title@2025-06-27 (5): Communication-Efficient Heterogeneous Federated Learning with Generalized Heavy-Ball Momentum

Title: Communication-Efficient Heterogeneous Federated Learning with Generalized Heavy-Ball Momentum Kommunikation-Effizient Heterogenes Federated Learning mit Generalized Heavy-Ball Momentum 具有普遍化重气动势的通信效率异质联邦学习 2311.18578v3

Authors (4): Riccardo Zaccone, Sai Praneeth Karimireddy, Carlo Masone, Marco Ciccone

Federated Learning (FL) has emerged as the state-of-the-art approach for learning from decentralized data in privacy-constrained scenarios.However, system and statistical challenges hinder its real-world applicability, requiring efficient learning from edge devices and robustness to data heterogeneity. Despite significant research efforts, existing approaches often degrade severely due to the joint effect of heterogeneity and partial client participation. In particular, while momentum appears as a promising approach for overcoming statistical heterogeneity, in current approaches its update is biased towards the most recently sampled clients. As we show in this work, this is the reason why it fails to outperform FedAvg, preventing its effective use in real-world large-scale scenarios. In this work, we propose a novel Generalized Heavy-Ball Momentum (GHBM) and theoretically prove it enables convergence under unbounded data heterogeneity in cyclic partial participation, thereby advancing the understanding of momentum’s effectiveness in FL. We then introduce adaptive and communication-efficient variants of GHBM that match the communication complexity of FedAvg in settings where clients can be stateful. Extensive experiments on vision and language tasks confirm our theoretical findings, demonstrating that GHBM substantially improves state-of-the-art performance under random uniform client sampling, particularly in large-scale settings with high data heterogeneity and low client participation. Code is available at https://rickzack.github.io/GHBM.

nan


Article 1064

Title@2025-06-27 (5): No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets

Title: No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets No Metric to Rule them All: Toward Principled Evaluations of Graph-Learning Datasets 《规则通则》的衡量标准:对图表学习数据集进行有原则的评估 2502.02379v2

Authors (4): Corinna Coupette, Jeremy Wayland, Emily Simons, Bastian Rieck

Benchmark datasets have proved pivotal to the success of graph learning, and good benchmark datasets are crucial to guide the development of the field. Recent research has highlighted problems with graph-learning datasets and benchmarking practices – revealing, for example, that methods which ignore the graph structure can outperform graph-based approaches. Such findings raise two questions: (1) What makes a good graph-learning dataset, and (2) how can we evaluate dataset quality in graph learning? Our work addresses these questions. As the classic evaluation setup uses datasets to evaluate models, it does not apply to dataset evaluation. Hence, we start from first principles. Observing that graph-learning datasets uniquely combine two modes – graph structure and node features –, we introduce Rings, a flexible and extensible mode-perturbation framework to assess the quality of graph-learning datasets based on dataset ablations – i.e., quantifying differences between the original dataset and its perturbed representations. Within this framework, we propose two measures – performance separability and mode complementarity – as evaluation tools, each assessing the capacity of a graph dataset to benchmark the power and efficacy of graph-learning methods from a distinct angle. We demonstrate the utility of our framework for dataset evaluation via extensive experiments on graph-level tasks and derive actionable recommendations for improving the evaluation of graph-learning methods. Our work opens new research directions in data-centric graph learning, and it constitutes a step toward the systematic evaluation of evaluations.

nan


Article 1065

Title@2025-06-27 (5): Soft Condorcet Optimization for Ranking of General Agents

Title: Soft Condorcet Optimization for Ranking of General Agents Soft Condorcet Optimierung für das Ranking von General Agents 对一般代理人员排名的优化 2411.00119v4

Authors (10): Marc Lanctot, Kate Larson, Michael Kaisers, Quentin Berthet, Ian Gemp, Manfred Diaz, Roberto-Rafael Maura-Rivero, Yoram Bachrach, Anna Koop, Doina Precup

Driving progress of AI models and agents requires comparing their performance on standardized benchmarks; for general agents, individual performances must be aggregated across a potentially wide variety of different tasks. In this paper, we describe a novel ranking scheme inspired by social choice frameworks, called Soft Condorcet Optimization (SCO), to compute the optimal ranking of agents: the one that makes the fewest mistakes in predicting the agent comparisons in the evaluation data. This optimal ranking is the maximum likelihood estimate when evaluation data (which we view as votes) are interpreted as noisy samples from a ground truth ranking, a solution to Condorcet’s original voting system criteria. SCO ratings are maximal for Condorcet winners when they exist, which we show is not necessarily true for the classical rating system Elo. We propose three optimization algorithms to compute SCO ratings and evaluate their empirical performance. When serving as an approximation to the Kemeny-Young voting method, SCO rankings are on average 0 to 0.043 away from the optimal ranking in normalized Kendall-tau distance across 865 preference profiles from the PrefLib open ranking archive. In a simulated noisy tournament setting, SCO achieves accurate approximations to the ground truth ranking and the best among several baselines when 59\% or more of the preference data is missing. Finally, SCO ranking provides the best approximation to the optimal ranking, measured on held-out test sets, in a problem containing 52,958 human players across 31,049 games of the classic seven-player game of Diplomacy.

nan


Article 1066

Title@2025-06-27 (5): Hybrid Generative Modeling for Incomplete Physics: Deep Grey-Box Meets Optimal Transport

Title: Hybrid Generative Modeling for Incomplete Physics: Deep Grey-Box Meets Optimal Transport Hybride Generative Modellierung für unvollständige Physik: Deep Grey-Box trifft auf optimalen Transport 不完全物理的混合生成模型:深灰质-Box 满足最佳迁移 2506.22204v1

Authors (3): Gurjeet Sangra Singh, Maciej Falkiewicz, Alexandros Kalousis

Physics phenomena are often described by ordinary and/or partial differential equations (ODEs/PDEs), and solved analytically or numerically. Unfortunately, many real-world systems are described only approximately with missing or unknown terms in the equations. This makes the distribution of the physics model differ from the true data-generating process (DGP). Using limited and unpaired data between DGP observations and the imperfect model simulations, we investigate this particular setting by completing the known-physics model, combining theory-driven models and data-driven to describe the shifted distribution involved in the DGP. We present a novel hybrid generative model approach combining deep grey-box modelling with Optimal Transport (OT) methods to enhance incomplete physics models. Our method implements OT maps in data space while maintaining minimal source distribution distortion, demonstrating superior performance in resolving the unpaired problem and ensuring correct usage of physics parameters. Unlike black-box alternatives, our approach leverages physics-based inductive biases to accurately learn system dynamics while preserving interpretability through its domain knowledge foundation. Experimental results validate our method’s effectiveness in both generation tasks and model transparency, offering detailed insights into learned physics dynamics.

nan


Article 1067

Title@2025-06-27 (5): Task-Agnostic Contrastive Pretraining for Relational Deep Learning

Title: Task-Agnostic Contrastive Pretraining for Relational Deep Learning Task-Agnostic Kontrastive Vorschulung für relationales Deep Learning 关系深层学习任务不可允许的对抗性预备培训 2506.22530v1

Authors (2): Jakub Peleška, Gustav Šír

Relational Deep Learning (RDL) is an emerging paradigm that leverages Graph Neural Network principles to learn directly from relational databases by representing them as heterogeneous graphs. However, existing RDL models typically rely on task-specific supervised learning, requiring training separate models for each predictive task, which may hamper scalability and reuse. In this work, we propose a novel task-agnostic contrastive pretraining approach for RDL that enables database-wide representation learning. For that aim, we introduce three levels of contrastive objectives$-$row-level, link-level, and context-level$-$designed to capture the structural and semantic heterogeneity inherent to relational data. We implement the respective pretraining approach through a modular RDL architecture and an efficient sampling strategy tailored to the heterogeneous database setting. Our preliminary results on standard RDL benchmarks demonstrate that fine-tuning the pretrained models measurably outperforms training from scratch, validating the promise of the proposed methodology in learning transferable representations for relational data.

nan


Article 1068

Title@2025-06-27 (5): REDELEX: A Framework for Relational Deep Learning Exploration

Title: REDELEX: A Framework for Relational Deep Learning Exploration REDELEX: Ein Rahmen für relationale Deep Learning Exploration REDELEX: 关系深深学习探索框架 2506.22199v1

Authors (2): Jakub Peleška, Gustav Šír

Relational databases (RDBs) are widely regarded as the gold standard for storing structured information. Consequently, predictive tasks leveraging this data format hold significant application promise. Recently, Relational Deep Learning (RDL) has emerged as a novel paradigm wherein RDBs are conceptualized as graph structures, enabling the application of various graph neural architectures to effectively address these tasks. However, given its novelty, there is a lack of analysis into the relationships between the performance of various RDL models and the characteristics of the underlying RDBs. In this study, we present REDELEX$-$a comprehensive exploration framework for evaluating RDL models of varying complexity on the most diverse collection of over 70 RDBs, which we make available to the community. Benchmarked alongside key representatives of classic methods, we confirm the generally superior performance of RDL while providing insights into the main factors shaping performance, including model complexity, database sizes and their structural properties.

nan


Article 1069

Title@2025-06-27 (5): AB-UPT: Scaling Neural CFD Surrogates for High-Fidelity Automotive Aerodynamics Simulations via Anchored-Branched Universal Physics Transformers

Title: AB-UPT: Scaling Neural CFD Surrogates for High-Fidelity Automotive Aerodynamics Simulations via Anchored-Branched Universal Physics Transformers AB-UPT: Skalierende Neuronale CFD-Überlagerungen für hochfeste automobile Aerodynamik-Simulationen über verankerte Universal-Physics-Transformer AB-UPT:通过 “ 支离破碎的宇宙物理变异器 “ 模拟高纤维汽车空气动力学模拟器的缩放神经 CFD 加速器 2502.09692v3

Authors (7): Benedikt Alkin, Maurits Bleeker, Richard Kurle, Tobias Kronlachner, Reinhard Sonnleitner, Matthias Dorfer, Johannes Brandstetter

Recent advances in neural surrogate modeling offer the potential for transformative innovations in applications such as automotive aerodynamics. Yet, industrial-scale problems often involve volumetric meshes with cell counts reaching 100 million, presenting major scalability challenges. Complex geometries further complicate modeling through intricate surface-volume interactions, while quantities such as vorticity are highly nonlinear and must satisfy strict divergence-free constraints. To address these requirements, we introduce Anchored-Branched Universal Physics Transformers (AB-UPT) as a novel modeling scheme for building neural surrogates for computational fluid dynamics (CFD) simulations. AB-UPT is designed to: (i) decouple geometry encoding and prediction tasks via multi-branch operators; (ii) enable scalability to high-resolution outputs via neural simulation in a low-dimensional latent space, coupled with anchored neural field decoders to predict high-fidelity outputs; (iii) enforce physics consistency by a novel divergence-free formulation. We show that AB-UPT yields state-of-the-art predictive accuracy of surface and volume fields on automotive CFD simulations ranging from 33 thousand up to 150 million mesh cells. Furthermore, our anchored neural field architecture enables the enforcement of hard physical constraints on the physics predictions without degradation in performance, exemplified by modeling divergence-free vorticity fields. Notably, the proposed models can be trained on a single GPU in less than a day and predict industry-standard surface and volume fields within seconds. Additionally, we show that the flexible design of our method enables neural simulation from a computer-aided design geometry alone, omitting the need for costly CFD meshing procedures.

nan


Article 1070

Title@2025-06-27 (5): dreaMLearning: Data Compression Assisted Machine Learning

Title: dreaMLearning: Data Compression Assisted Machine Learning dreaMLearning: Datenkompression unterstütztes maschinelles Lernen dreamlearning:数据压缩辅助机器学习 2506.22190v1

Authors (4): Xiaobo Zhao, Aaron Hurst, Panagiotis Karras, Daniel E. Lucani

Despite rapid advancements, machine learning, particularly deep learning, is hindered by the need for large amounts of labeled data to learn meaningful patterns without overfitting and immense demands for computation and storage, which motivate research into architectures that can achieve good performance with fewer resources. This paper introduces dreaMLearning, a novel framework that enables learning from compressed data without decompression, built upon Entropy-based Generalized Deduplication (EntroGeDe), an entropy-driven lossless compression method that consolidates information into a compact set of representative samples. DreaMLearning accommodates a wide range of data types, tasks, and model architectures. Extensive experiments on regression and classification tasks with tabular and image data demonstrate that dreaMLearning accelerates training by up to 8.8x, reduces memory usage by 10x, and cuts storage by 42%, with a minimal impact on model performance. These advancements enhance diverse ML applications, including distributed and federated learning, and tinyML on resource-constrained edge devices, unlocking new possibilities for efficient and scalable learning.

nan


Article 1071

Title@2025-06-27 (5): Exploring Modularity of Agentic Systems for Drug Discovery

Title: Exploring Modularity of Agentic Systems for Drug Discovery Erforschung der Modularität von Wirkstoffsystemen für die Drogenentdeckung 探索药物发现剂系统模式 2506.22189v1

Authors (4): Laura van Weesep, Samuel Genheden, Ola Engkvist, Jens Sjölund

Large-language models (LLMs) and agentic systems present exciting opportunities to accelerate drug discovery and design. In this study, we critically examine the modularity of LLM-based agentic systems for drug discovery, i.e., whether parts of the agentic system such as the LLM are interchangeable, a topic that has received limited attention in drug discovery applications. We compare the performance of different large language models (LLMs) and the effectiveness of tool-calling agents versus code-generating agents in this domain. Our case study, comparing performance in orchestrating tools for chemistry and drug discovery using an LLM-as-a-judge score, shows that Claude-3.5-Sonnet, Claude-3.7-Sonnet and GPT-4o outperform alternative language models such as Llama-3.1-8B, Llama-3.1-70B, GPT-3.5-Turbo, and Nova-Micro. Although we confirm that code-generating agents outperform the tool-calling ones on average, we show that this is highly question and model dependent. Furthermore, the impact of replacing system prompts is dependent on the specific question asked and the model used, underscoring that – even in this particular domain – one cannot just replace language models without considering prompt re-engineering. Our study highlights the necessity of further research into the modularity of agentic systems to enable the development of stable and scalable solutions for real-world problems.

nan


Article 1072

Title@2025-06-27 (5): LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models

Title: LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models LLM als GNN: Graph Vocabulary Learning für text-Attributed Graph Foundation Models 作为GNN的LLMLM:文字图表基础模型图表词汇学习 2503.03313v2

Authors (9): Xi Zhu, Haochen Xue, Ziwei Zhao, Wujiang Xu, Jingyuan Huang, Minghao Guo, Qifan Wang, Kaixiong Zhou, Yongfeng Zhang

Text-Attributed Graphs (TAGs), where each node is associated with text descriptions, are ubiquitous in real-world scenarios. They typically exhibit distinctive structure and domain-specific knowledge, motivating the development of a Graph Foundation Model (GFM) that generalizes across diverse graphs and tasks. Despite large efforts to integrate Large Language Models (LLMs) and Graph Neural Networks (GNNs) for TAGs, existing approaches suffer from decoupled architectures with two-stage alignment, limiting their synergistic potential. Even worse, existing methods assign out-of-vocabulary (OOV) tokens to graph nodes, leading to graph-specific semantics, token explosion, and incompatibility with task-oriented prompt templates, which hinders cross-graph and cross-task transferability. To address these challenges, we propose PromptGFM, a versatile GFM for TAGs grounded in graph vocabulary learning. PromptGFM comprises two key components: (1) Graph Understanding Module, which explicitly prompts LLMs to replicate the finest GNN workflow within the text space, facilitating seamless GNN-LLM integration and elegant graph-text alignment; (2) Graph Inference Module, which establishes a language-based graph vocabulary ensuring expressiveness, transferability, and scalability, enabling readable instructions for LLM fine-tuning. Extensive experiments demonstrate our superiority and transferability across diverse graphs and tasks. The code is available at this: https://github.com/agiresearch/PromptGFM.

nan


Article 1073

Title@2025-06-27 (5): Thompson Sampling-Based Learning and Control for Unknown Dynamic Systems

Title: Thompson Sampling-Based Learning and Control for Unknown Dynamic Systems Thompson Sampling-based Learning and Control für unbekannte dynamische Systeme Thompson 未知动态系统抽样学习和控制 2506.22186v1

Authors (4): Kaikai Zheng, Dawei Shi, Yang Shi, Long Wang

Thompson sampling (TS) is an effective method to explore parametric uncertainties and can therefore be used for active learning-based controller design. However, TS relies on finite parametric representations, which limits its applicability to more general spaces, which are more commonly encountered in control system design. To address this issue, this work pro poses a parameterization method for control law learning using reproducing kernel Hilbert spaces and designs a data-driven active learning control approach. Specifically, the proposed method treats the control law as an element in a function space, allowing the design of control laws without imposing restrictions on the system structure or the form of the controller. A TS framework is proposed in this work to explore potential optimal control laws, and the convergence guarantees are further provided for the learning process. Theoretical analysis shows that the proposed method learns the relationship between control laws and closed-loop performance metrics at an exponential rate, and the upper bound of control regret is also derived. Numerical experiments on controlling unknown nonlinear systems validate the effectiveness of the proposed method.

nan


Article 1074

Title@2025-06-27 (5): ROME: Robust Multi-Modal Density Estimator

Title: ROME: Robust Multi-Modal Density Estimator ROME: Robuster Multi-Modal Density Estimator ROME: 强力多式密度模拟器 2401.10566v3

Authors (5): Anna Mészáros, Julian F. Schumann, Javier Alonso-Mora, Arkady Zgonnikov, Jens Kober

The estimation of probability density functions is a fundamental problem in science and engineering. However, common methods such as kernel density estimation (KDE) have been demonstrated to lack robustness, while more complex methods have not been evaluated in multi-modal estimation problems. In this paper, we present ROME (RObust Multi-modal Estimator), a non-parametric approach for density estimation which addresses the challenge of estimating multi-modal, non-normal, and highly correlated distributions. ROME utilizes clustering to segment a multi-modal set of samples into multiple uni-modal ones and then combines simple KDE estimates obtained for individual clusters in a single multi-modal estimate. We compared our approach to state-of-the-art methods for density estimation as well as ablations of ROME, showing that it not only outperforms established methods but is also more robust to a variety of distributions. Our results demonstrate that ROME can overcome the issues of over-fitting and over-smoothing exhibited by other estimators.

nan


Article 1075

Title@2025-06-27 (5): Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward

Title: Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward Graph-Reward-SQL: Executionsfreies Verstärkungslernen für Text-zu-SQL über Graph Matching und Stepwise Reward 图表-评分-SQL:通过图表匹配和分步评分,为文本到SQL进行无执行加强学习 2505.12380v2

Authors (13): Han Weng, Puzhen Wu, Cui Longjie, Yi Zhan, Boyi Liu, Yuanfeng Song, Dun Zeng, Yingxiang Yang, Qianru Zhang, Dong Huang, Xiaoming Yin, Yang Sun, Xing Chen

Reinforcement learning (RL) has been widely adopted to enhance the performance of large language models (LLMs) on Text-to-SQL tasks. However, existing methods often rely on execution-based or LLM-based Bradley-Terry reward models. The former suffers from high execution latency caused by repeated database calls, whereas the latter imposes substantial GPU memory overhead, both of which significantly hinder the efficiency and scalability of RL pipelines. To this end, we propose a novel Text-to-SQL RL fine-tuning framework named Graph-Reward-SQL, which employs the GMNScore outcome reward model. We leverage SQL graph representations to provide accurate reward signals while significantly reducing inference time and GPU memory usage. Building on this foundation, we further introduce StepRTM, a stepwise reward model that provides intermediate supervision over Common Table Expression (CTE) subqueries. This encourages both functional correctness and structural clarity of SQL. Extensive comparative and ablation experiments on standard benchmarks, including Spider and BIRD, demonstrate that our method consistently outperforms existing reward models.

nan


Article 1076

Title@2025-06-27 (5): ASVSim (AirSim for Surface Vehicles): A High-Fidelity Simulation Framework for Autonomous Surface Vehicle Research

Title: ASVSim (AirSim for Surface Vehicles): A High-Fidelity Simulation Framework for Autonomous Surface Vehicle Research ASVSim (AirSim for Surface Vehicles): Ein Simulationsrahmen mit hoher Genauigkeit für autonome Oberflächenfahrzeugforschung ASVSim(地表车辆的航空Sim):自动地表车辆研究的高纤维模拟框架 2506.22174v1

Authors (7): Bavo Lesy, Siemen Herremans, Robin Kerstens, Jan Steckel, Walter Daems, Siegfried Mercelis, Ali Anwar

The transport industry has recently shown significant interest in unmanned surface vehicles (USVs), specifically for port and inland waterway transport. These systems can improve operational efficiency and safety, which is especially relevant in the European Union, where initiatives such as the Green Deal are driving a shift towards increased use of inland waterways. At the same time, a shortage of qualified personnel is accelerating the adoption of autonomous solutions. However, there is a notable lack of open-source, high-fidelity simulation frameworks and datasets for developing and evaluating such solutions. To address these challenges, we introduce AirSim For Surface Vehicles (ASVSim), an open-source simulation framework specifically designed for autonomous shipping research in inland and port environments. The framework combines simulated vessel dynamics with marine sensor simulation capabilities, including radar and camera systems and supports the generation of synthetic datasets for training computer vision models and reinforcement learning agents. Built upon Cosys-AirSim, ASVSim provides a comprehensive platform for developing autonomous navigation algorithms and generating synthetic datasets. The simulator supports research of both traditional control methods and deep learning-based approaches. Through limited experiments, we demonstrate the potential of the simulator in these research areas. ASVSim is provided as an open-source project under the MIT license, making autonomous navigation research accessible to a larger part of the ocean engineering community.

nan


Article 1077

Title@2025-06-27 (5): Near Field Localization via AI-Aided Subspace Methods

Title: Near Field Localization via AI-Aided Subspace Methods Nahfeldlokalisierung über KI-gestützte Subraum-Methoden 通过AI-援助的次级空间方法使近场定位化 2504.00599v2

Authors (3): Arad Gast, Luc Le Magoarou, Nir Shlezinger

The increasing demands for high-throughput and energy-efficient wireless communications are driving the adoption of extremely large antennas operating at high-frequency bands. In these regimes, multiple users will reside in the radiative near-field, and accurate localization becomes essential. Unlike conventional far-field systems that rely solely on DOA estimation, near-field localization exploits spherical wavefront propagation to recover both DOA and range information. While subspace-based methods, such as MUSIC and its extensions, offer high resolution and interpretability for near-field localization, their performance is significantly impacted by model assumptions, including non-coherent sources, well-calibrated arrays, and a sufficient number of snapshots. To address these limitations, this work proposes AI-aided subspace methods for near-field localization that enhance robustness to real-world challenges. Specifically, we introduce NF-SubspaceNet, a deep learning-augmented 2D MUSIC algorithm that learns a surrogate covariance matrix to improve localization under challenging conditions, and DCD-MUSIC, a cascaded AI-aided approach that decouples angle and range estimation to reduce computational complexity. We further develop a novel model-order-aware training method to accurately estimate the number of sources, that is combined with casting of near field subspace methods as AI models for learning. Extensive simulations demonstrate that the proposed methods outperform classical and existing deep-learning-based localization techniques, providing robust near-field localization even under coherent sources, miscalibrations, and few snapshots.

nan


Article 1078

Title@2025-06-27 (5): Design Patterns for Securing LLM Agents against Prompt Injections

Title: Design Patterns for Securing LLM Agents against Prompt Injections Designmuster zur Sicherung von LLM-Agenten gegen sofortige Injektionen 防止及时注射的LLM代理物的安全设计模式 2506.08837v3

Authors (14): Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, Václav Volhejn

As AI agents powered by Large Language Models (LLMs) become increasingly versatile and capable of addressing a broad spectrum of tasks, ensuring their security has become a critical challenge. Among the most pressing threats are prompt injection attacks, which exploit the agent’s resilience on natural language inputs – an especially dangerous threat when agents are granted tool access or handle sensitive information. In this work, we propose a set of principled design patterns for building AI agents with provable resistance to prompt injection. We systematically analyze these patterns, discuss their trade-offs in terms of utility and security, and illustrate their real-world applicability through a series of case studies.

nan


Article 1079

Title@2025-06-27 (5): Earthquake Damage Grades Prediction using An Ensemble Approach Integrating Advanced Machine and Deep Learning Models

Title: Earthquake Damage Grades Prediction using An Ensemble Approach Integrating Advanced Machine and Deep Learning Models Erdbebenschäden Grade Vorhersage mit einem Ensemble Ansatz Integration von fortgeschrittenen Maschinen und Deep Learning Modelle 采用组合办法预测地震损害等级,结合先进机器和深学习模式 2506.22129v1

Authors (2): Anurag Panda, Gaurav Kumar Yadav

In the aftermath of major earthquakes, evaluating structural and infrastructural damage is vital for coordinating post-disaster response efforts. This includes assessing damage’s extent and spatial distribution to prioritize rescue operations and resource allocation. Accurately estimating damage grades to buildings post-earthquake is paramount for effective response and recovery, given the significant impact on lives and properties, underscoring the urgency of streamlining relief fund allocation processes. Previous studies have shown the effectiveness of multi-class classification, especially XGBoost, along with other machine learning models and ensembling methods, incorporating regularization to address class imbalance. One consequence of class imbalance is that it may give rise to skewed models that undervalue minority classes and give preference to the majority class. This research deals with the problem of class imbalance with the help of the synthetic minority oversampling technique (SMOTE). We delve into multiple multi-class classification machine learning, deep learning models, and ensembling methods to forecast structural damage grades. The study elucidates performance determinants through comprehensive feature manipulation experiments and diverse training approaches. It identifies key factors contributing to seismic vulnerability while evaluating model performance using techniques like the confusion matrix further to enhance understanding of the effectiveness of earthquake damage prediction.

nan


Article 1080

Title@2025-06-27 (5): Generative AI for O-RAN Slicing: A Semi-Supervised Approach with VAE and Contrastive Learning

Title: Generative AI for O-RAN Slicing: A Semi-Supervised Approach with VAE and Contrastive Learning Generative KI für O-RAN Slicing: Ein semi-überwachter Ansatz mit VAE und Kontrastivem Lernen O-RAN切片:与VAE和反竞争学习的半监督方法 2401.08861v3

Authors (4): Salar Nouri, Mojdeh Karbalaee Motalleb, Vahid Shah-Mansouri, Seyed Pooya Shariatpanahi

This paper introduces a novel generative AI (GAI)-driven, unified semi-supervised learning architecture for optimizing resource allocation and network slicing in O-RAN. Termed Generative Semi-Supervised VAE-Contrastive Learning, our approach maximizes the weighted user equipment (UE) throughput and allocates physical resource blocks (PRBs) to enhance the quality of service for eMBB and URLLC services. The GAI framework utilizes a dedicated xApp for intelligent power control and PRB allocation. This integrated GAI model synergistically combines the generative power of a VAE with contrastive learning to achieve robustness in an end-to-end trainable system. It is a semi-supervised training approach that concurrently optimizes supervised regression of resource allocation decisions (i.e., power, UE association, PRB) and unsupervised contrastive objectives. This intrinsic fusion improves the precision of resource management and model generalization in dynamic mobile networks. We evaluated our GAI methodology against exhaustive search and deep Q-Network algorithms using key performance metrics. Results show our integrated GAI approach offers superior efficiency and effectiveness in various scenarios, presenting a compelling GAI-based solution for critical network slicing and resource management challenges in next-generation O-RAN systems.

nan


Article 1081

Title@2025-06-27 (5): Identifying a Circuit for Verb Conjugation in GPT-2

Title: Identifying a Circuit for Verb Conjugation in GPT-2 Identifizierung eines Kreises für Verbkonjugation in GPT-2 在 GPT-2 中确定 Verb 混和的电路 2506.22105v1

Authors (1): David Demitri Africa

I implement a procedure to isolate and interpret the sub-network (or “circuit”) responsible for subject-verb agreement in GPT-2 Small. In this study, the model is given prompts where the subject is either singular (e.g. “Alice”) or plural (e.g. “Alice and Bob”), and the task is to correctly predict the appropriate verb form (“walks” for singular subjects, “walk” for plural subjects). Using a series of techniques-including performance verification automatic circuit discovery via direct path patching, and direct logit attribution- I isolate a candidate circuit that contributes significantly to the model’s correct verb conjugation. The results suggest that only a small fraction of the network’s component-token pairs is needed to achieve near-model performance on the base task but substantially more for more complex settings.

nan


Article 1082

Title@2025-06-27 (5): Tied Prototype Model for Few-Shot Medical Image Segmentation

Title: Tied Prototype Model for Few-Shot Medical Image Segmentation Gebundenes Prototypenmodell für wenige heiße medizinische Bildsegmentierung 几乎没有热的医学图像分割的捆绑式原型模型 2506.22101v1

Authors (3): Hyeongji Kim, Stine Hansen, Michael Kampffmeyer

Common prototype-based medical image few-shot segmentation (FSS) methods model foreground and background classes using class-specific prototypes. However, given the high variability of the background, a more promising direction is to focus solely on foreground modeling, treating the background as an anomaly – an approach introduced by ADNet. Yet, ADNet faces three key limitations: dependence on a single prototype per class, a focus on binary classification, and fixed thresholds that fail to adapt to patient and organ variability. To address these shortcomings, we propose the Tied Prototype Model (TPM), a principled reformulation of ADNet with tied prototype locations for foreground and background distributions. Building on its probabilistic foundation, TPM naturally extends to multiple prototypes and multi-class segmentation while effectively separating non-typical background features. Notably, both extensions lead to improved segmentation accuracy. Finally, we leverage naturally occurring class priors to define an ideal target for adaptive thresholds, boosting segmentation performance. Taken together, TPM provides a fresh perspective on prototype-based FSS for medical image segmentation. The code can be found at https://github.com/hjk92g/TPM-FSS.

nan


Article 1083

Title@2025-06-27 (5): Transfer Learning for Assessing Heavy Metal Pollution in Seaports Sediments

Title: Transfer Learning for Assessing Heavy Metal Pollution in Seaports Sediments Transfer-Lernen für die Bewertung der Schwermetallverschmutzung in Seehäfen Sedimente 评估海港沉积物中重金属污染的转让学习 2506.22096v1

Authors (4): Tin Lai, Farnaz Farid, Yueyang Kuan, Xintian Zhang

Detecting heavy metal pollution in soils and seaports is vital for regional environmental monitoring. The Pollution Load Index (PLI), an international standard, is commonly used to assess heavy metal containment. However, the conventional PLI assessment involves laborious procedures and data analysis of sediment samples. To address this challenge, we propose a deep-learning-based model that simplifies the heavy metal assessment process. Our model tackles the issue of data scarcity in the water-sediment domain, which is traditionally plagued by challenges in data collection and varying standards across nations. By leveraging transfer learning, we develop an accurate quantitative assessment method for predicting PLI. Our approach allows the transfer of learned features across domains with different sets of features. We evaluate our model using data from six major ports in New South Wales, Australia: Port Yamba, Port Newcastle, Port Jackson, Port Botany, Port Kembla, and Port Eden. The results demonstrate significantly lower Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) of approximately 0.5 and 0.03, respectively, compared to other models. Our model performance is up to 2 orders of magnitude than other baseline models. Our proposed model offers an innovative, accessible, and cost-effective approach to predicting water quality, benefiting marine life conservation, aquaculture, and industrial pollution monitoring.

nan


Article 1084

Title@2025-06-27 (5): Learning to Solve Multi-Objective Routing Problems on Multigraphs

Title: Learning to Solve Multi-Objective Routing Problems on Multigraphs Mehrzielige Routing-Probleme auf Multigraphen lösen lernen 学习解答多目标多路由问题 2506.22095v1

Authors (5): Filip Rydin, Attila Lischka, Jiaming Wu, Morteza Haghir Chehreghani, Balázs Kulcsár

Learning-based methods for routing have gained significant attention in recent years, both in single-objective and multi-objective contexts. However, the multigraph setting, where multiple paths with distinct attributes can exist between destinations, has largely been overlooked, despite its high practical relevancy. In this paper, we introduce two neural approaches to address multi-objective routing on multigraphs. Our first approach works directly on the multigraph, by autoregressively selecting edges until a tour is completed. On the other hand, our second model first prunes the multigraph into a simple graph and then builds routes. We validate both models experimentally and find that they demonstrate strong performance across a variety of problems, including the Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP).

nan


Article 1085

Title@2025-06-27 (5): SONG: Self-Organizing Neural Graphs

Title: SONG: Self-Organizing Neural Graphs SONG: Selbstorganisierende Neuralgraphen SONG: 自组织神经图 2107.13214v2

Authors (5): Łukasz Struski, Tomasz Danel, Marek Śmieja, Jacek Tabor, Bartosz Zieliński

Recent years have seen a surge in research on deep interpretable neural networks with decision trees as one of the most commonly incorporated tools. There are at least three advantages of using decision trees over logistic regression classification models: they are easy to interpret since they are based on binary decisions, they can make decisions faster, and they provide a hierarchy of classes. However, one of the well-known drawbacks of decision trees, as compared to decision graphs, is that decision trees cannot reuse the decision nodes. Nevertheless, decision graphs were not commonly used in deep learning due to the lack of efficient gradient-based training techniques. In this paper, we fill this gap and provide a general paradigm based on Markov processes, which allows for efficient training of the special type of decision graphs, which we call Self-Organizing Neural Graphs (SONG). We provide an extensive theoretical study of SONG, complemented by experiments conducted on Letter, Connect4, MNIST, CIFAR, and TinyImageNet datasets, showing that our method performs on par or better than existing decision models.

nan


Article 1086

Title@2025-06-27 (5): Transformers are Graph Neural Networks

Title: Transformers are Graph Neural Networks Transformatoren sind Graphen-Neural-Netzwerke 变形器是图形神经网络 2506.22084v1

Authors (1): Chaitanya K. Joshi

We establish connections between the Transformer architecture, originally introduced for natural language processing, and Graph Neural Networks (GNNs) for representation learning on graphs. We show how Transformers can be viewed as message passing GNNs operating on fully connected graphs of tokens, where the self-attention mechanism capture the relative importance of all tokens w.r.t. each-other, and positional encodings provide hints about sequential ordering or structure. Thus, Transformers are expressive set processing networks that learn relationships among input elements without being constrained by apriori graphs. Despite this mathematical connection to GNNs, Transformers are implemented via dense matrix operations that are significantly more efficient on modern hardware than sparse message passing. This leads to the perspective that Transformers are GNNs currently winning the hardware lottery.

nan


Article 1087

Title@2025-06-27 (5): crypto price prediction using lstm+xgboost

Title: crypto price prediction using lstm+xgboost Krypto-Preisvorhersage mit lstm+xgboost 使用Istm+xgboust的加密价格预测 2506.22055v1

Authors (1): Mehul Gautam

The volatility and complex dynamics of cryptocurrency markets present unique challenges for accurate price forecasting. This research proposes a hybrid deep learning and machine learning model that integrates Long Short-Term Memory (LSTM) networks and Extreme Gradient Boosting (XGBoost) for cryptocurrency price prediction. The LSTM component captures temporal dependencies in historical price data, while XGBoost enhances prediction by modeling nonlinear relationships with auxiliary features such as sentiment scores and macroeconomic indicators. The model is evaluated on historical datasets of Bitcoin, Ethereum, Dogecoin, and Litecoin, incorporating both global and localized exchange data. Comparative analysis using Mean Absolute Percentage Error (MAPE) and Min-Max Normalized Root Mean Square Error (MinMax RMSE) demonstrates that the LSTM+XGBoost hybrid consistently outperforms standalone models and traditional forecasting methods. This study underscores the potential of hybrid architectures in financial forecasting and provides insights into model adaptability across different cryptocurrencies and market contexts.

nan


Article 1088

Title@2025-06-27 (5): Forecasting the future development in quality and value of professional football players

Title: Forecasting the future development in quality and value of professional football players Prognose der zukünftigen Entwicklung in Qualität und Wert von Profi-Fußballspielern 预测专业足球运动员的质量和价值的未来发展 2502.07528v2

Authors (3): Koen W. van Arem, Floris Goes-Smit, Jakob Söhl

Transfers in professional football (soccer) are risky investments because of the large transfer fees and high risks involved. Although data-driven models can be used to improve transfer decisions, existing models focus on describing players’ historical progress, leaving their future performance unknown. Moreover, recent developments have called for the use of explainable models combined with uncertainty quantification of predictions. This paper assesses explainable machine learning models based on predictive accuracy and uncertainty quantification methods for the prediction of the future development in quality and transfer value of professional football players. The predictive accuracy is studied by training the models to predict the quality and value of players one year ahead. This is carried out by training them on two data sets containing data-driven indicators describing the player quality and player value in historical settings. In general, the random forest model is found to be the most suitable model because it provides accurate predictions as well as an uncertainty quantification method that naturally arises from the bagging procedure of the random forest model. Additionally, this research shows that the development of player performance contains nonlinear patterns and interactions between variables, and that time series information can provide useful information for the modeling of player performance metrics. The resulting models can help football clubs make more informed, data-driven transfer decisions by forecasting player quality and transfer value.

nan


Article 1089

Title@2025-06-27 (5): UniCA: Adapting Time Series Foundation Model to General Covariate-Aware Forecasting

Title: UniCA: Adapting Time Series Foundation Model to General Covariate-Aware Forecasting UniCA: Anpassung des Zeitreihen-Basismodells an die allgemeine Kovariate-Vorhersage UICCA:调整时间序列基础模型以适应通用共变软件预测 2506.22039v1

Authors (11): Lu Han, Yu Liu, Qiwen Deng, Jian Jiang, Yinbo Sun, Zhe Yu, Binfeng Wang, Xingyu Lu, Lintao Ma, Han-Jia Ye, De-Chuan Zhan

Time Series Foundation Models (TSFMs) have achieved remarkable success through large-scale pretraining. However, their design primarily targets real-valued series, limiting their ability to handle general forecasting tasks involving diverse and often heterogeneous covariates–such as categorical variables and multimodal data (e.g., images, text)–which are typically task-specific and difficult to leverage during pretraining. To address this gap, we propose Unified Covariate Adaptation (UniCA), a framework to bridge TSFMs with general covariate-aware forecasting. UniCA first performs covariate homogenization to transform heterogeneous covariates into high-level homogeneous series representations and then fuses them via a unified attention-based fusion mechanism. UniCA is compatible and universal for adaptation with both homogeneous and heterogeneous covariates, incorporating extra covariate information while preserving the generalization ability of TSFMs.Extensive experiments on multiple unimodal and multimodal covariate-aware forecasting benchmarks demonstrate the superiority of UniCA, highlighting the promise of covariate-aware TSFM adaptation in real-world forecasting scenarios. Codes are released on https://github.com/hanlu-nju/UniCA.

nan


Article 1090

Title@2025-06-27 (5): Hyper-modal Imputation Diffusion Embedding with Dual-Distillation for Federated Multimodal Knowledge Graph Completion

Title: Hyper-modal Imputation Diffusion Embedding with Dual-Distillation for Federated Multimodal Knowledge Graph Completion Hypermodale Imputation Diffusion Einbettung mit Dual-Destillation für Federated Multimodal Knowledge Graph Completion 为完成联邦多模式知识图而与双蒸馏相结合的超时热量光化扩散 2506.22036v1

Authors (8): Ying Zhang, Yu Zhao, Xuhui Sui, Baohang Zhou, Xiangrui Cai, Li Shen, Xiaojie Yuan, Dacheng Tao

With the increasing multimodal knowledge privatization requirements, multimodal knowledge graphs in different institutes are usually decentralized, lacking of effective collaboration system with both stronger reasoning ability and transmission safety guarantees. In this paper, we propose the Federated Multimodal Knowledge Graph Completion (FedMKGC) task, aiming at training over federated MKGs for better predicting the missing links in clients without sharing sensitive knowledge. We propose a framework named MMFeD3-HidE for addressing multimodal uncertain unavailability and multimodal client heterogeneity challenges of FedMKGC. (1) Inside the clients, our proposed Hyper-modal Imputation Diffusion Embedding model (HidE) recovers the complete multimodal distributions from incomplete entity embeddings constrained by available modalities. (2) Among clients, our proposed Multimodal FeDerated Dual Distillation (MMFeD3) transfers knowledge mutually between clients and the server with logit and feature distillation to improve both global convergence and semantic consistency. We propose a FedMKGC benchmark for a comprehensive evaluation, consisting of a general FedMKGC backbone named MMFedE, datasets with heterogeneous multimodal information, and three groups of constructed baselines. Experiments conducted on our benchmark validate the effectiveness, semantic consistency, and convergence robustness of MMFeD3-HidE.

nan


Article 1091

Title@2025-06-27 (5): CAPM: Fast and Robust Verification on Maxpool-based CNN via Dual Network

Title: CAPM: Fast and Robust Verification on Maxpool-based CNN via Dual Network CAPM: Schnelle und robuste Überprüfung auf Maxpool-basiertem CNN über Dual Network CAPM:通过双重网络对Maxpool的有线电视新闻网进行快速和有力的核查 2407.09550v3

Authors (5): Jia-Hau Bai, Chi-Ting Liu, Yu Wang, Fu-Chieh Chang, Pei-Yuan Wu

This study uses CAPM (Convex Adversarial Polytope for Maxpool-based CNN) to improve the verified bound for general purpose maxpool-based convolutional neural networks (CNNs) under bounded norm adversarial perturbations. The maxpool function is decomposed as a series of ReLU functions to extend the convex relaxation technique to maxpool functions, by which the verified bound can be efficiently computed through a dual network. The experimental results demonstrate that this technique allows the state-of-the-art verification precision for maxpool-based CNNs and involves a much lower computational cost than current verification methods, such as DeepZ, DeepPoly and PRIMA. This method is also applicable to large-scale CNNs, which previous studies show to be often computationally prohibitively expensive. Under certain circumstances, CAPM is 40-times, 20-times or twice as fast and give a significantly higher verification bound (CAPM 98% vs. PRIMA 76%/DeepPoly 73%/DeepZ 8%) as compared to PRIMA/DeepPoly/DeepZ. Furthermore, we additionally present the time complexity of our algorithm as $O(W^2NK)$, where $W$ is the maximum width of the neural network, $N$ is the number of neurons, and $K$ is the size of the maxpool layer’s kernel.

nan


Article 1092

Title@2025-06-27 (5): Learning Data-Driven Uncertainty Set Partitions for Robust and Adaptive Energy Forecasting with Missing Data

Title: Learning Data-Driven Uncertainty Set Partitions for Robust and Adaptive Energy Forecasting with Missing Data Learning Data-Driven Uncertainty Set Partitionen für robuste und adaptive Energieprognosen mit fehlenden Daten 缺少数据的强力和适应性能源预测的学习数据不确定性定出分区 2503.20410v2

Authors (2): Akylas Stratigakos, Panagiotis Andrianesis

Short-term forecasting models typically assume the availability of input data (features) when they are deployed and in use. However, equipment failures, disruptions, cyberattacks, may lead to missing features when such models are used operationally, which could negatively affect forecast accuracy, and result in suboptimal operational decisions. In this paper, we use adaptive robust optimization and adversarial machine learning to develop forecasting models that seamlessly handle missing data operationally. We propose linear- and neural network-based forecasting models with parameters that adapt to available features, combining linear adaptation with a novel algorithm for learning data-driven uncertainty set partitions. The proposed adaptive models do not rely on identifying historical missing data patterns and are suitable for real-time operations under stringent time constraints. Extensive numerical experiments on short-term wind power forecasting considering horizons from 15 minutes to 4 hours ahead illustrate that our proposed adaptive models are on par with imputation when data are missing for very short periods (e.g., when only the latest measurement is missing) whereas they significantly outperform imputation when data are missing for longer periods. We further provide insights by showcasing how linear adaptation and data-driven partitions (even with a few subsets) approach the performance of the optimal, yet impractical, method of retraining for every possible realization of missing data.

nan


Article 1093

Title@2025-06-27 (5): C-Learner: Constrained Learning for Causal Inference

Title: C-Learner: Constrained Learning for Causal Inference C-Learner: Eingeschränktes Lernen für kausale Schlussfolgerung C-Learner: 控制学习以诱因推断 2405.09493v5

Authors (4): Tiffany Tianhui Cai, Yuri Fonseca, Kaiwen Hou, Hongseok Namkoong

Popular debiased estimation methods for causal inference – such as augmented inverse propensity weighting and targeted maximum likelihood estimation – enjoy desirable asymptotic properties like statistical efficiency and double robustness but they can produce unstable estimates when there is limited overlap between treatment and control, requiring additional assumptions or ad hoc adjustments in practice (e.g., truncating propensity scores). In contrast, simple plug-in estimators are stable but lack desirable asymptotic properties. We propose a novel debiasing approach that achieves the best of both worlds, producing stable plug-in estimates with desirable asymptotic properties. Our constrained learning framework solves for the best plug-in estimator under the constraint that the first-order error with respect to the plugged-in quantity is zero, and can leverage flexible model classes including neural networks and tree ensembles. In several experimental settings, including ones in which we handle text-based covariates by fine-tuning language models, our constrained learning-based estimator outperforms basic versions of one-step estimation and targeting in challenging settings with limited overlap between treatment and control, and performs similarly otherwise.

nan


Article 1094

Title@2025-06-27 (5): Distilling the Unknown to Unveil Certainty

Title: Distilling the Unknown to Unveil Certainty Destillieren des Unbekannten zur Enthüllung der Gewissheit 蒸馏未知至 Unveil 确定性 2311.07975v3

Authors (5): Zhilin Zhao, Longbing Cao, Yixuan Zhang, Kun-Yu Lin, Wei-Shi Zheng

Out-of-distribution (OOD) detection is critical for identifying test samples that deviate from in-distribution (ID) data, ensuring network robustness and reliability. This paper presents a flexible framework for OOD knowledge distillation that extracts OOD-sensitive information from a network to develop a binary classifier capable of distinguishing between ID and OOD samples in both scenarios, with and without access to training ID data. To accomplish this, we introduce Confidence Amendment (CA), an innovative methodology that transforms an OOD sample into an ID one while progressively amending prediction confidence derived from the network to enhance OOD sensitivity. This approach enables the simultaneous synthesis of both ID and OOD samples, each accompanied by an adjusted prediction confidence, thereby facilitating the training of a binary classifier sensitive to OOD. Theoretical analysis provides bounds on the generalization error of the binary classifier, demonstrating the pivotal role of confidence amendment in enhancing OOD sensitivity. Extensive experiments spanning various datasets and network architectures confirm the efficacy of the proposed method in detecting OOD samples.

nan


Article 1095

Title@2025-06-27 (5): TROFI: Trajectory-Ranked Offline Inverse Reinforcement Learning

Title: TROFI: Trajectory-Ranked Offline Inverse Reinforcement Learning TROFI: Flugbahngestütztes Offline-Inversum-Verstärkung-Lernen TROFI: 轨迹 - 脱线反向强化学习 2506.22008v1

Authors (5): Alessandro Sestini, Joakim Bergdahl, Konrad Tollmar, Andrew D. Bagdanov, Linus Gisslén

In offline reinforcement learning, agents are trained using only a fixed set of stored transitions derived from a source policy. However, this requires that the dataset be labeled by a reward function. In applied settings such as video game development, the availability of the reward function is not always guaranteed. This paper proposes Trajectory-Ranked OFfline Inverse reinforcement learning (TROFI), a novel approach to effectively learn a policy offline without a pre-defined reward function. TROFI first learns a reward function from human preferences, which it then uses to label the original dataset making it usable for training the policy. In contrast to other approaches, our method does not require optimal trajectories. Through experiments on the D4RL benchmark we demonstrate that TROFI consistently outperforms baselines and performs comparably to using the ground truth reward to learn policies. Additionally, we validate the efficacy of our method in a 3D game environment. Our studies of the reward model highlight the importance of the reward function in this setting: we show that to ensure the alignment of a value function to the actual future discounted reward, it is fundamental to have a well-engineered and easy-to-learn reward function.

nan


Article 1096

Title@2025-06-27 (5): GKNet: Graph Kalman Filtering and Model Inference via Model-based Deep Learning

Title: GKNet: Graph Kalman Filtering and Model Inference via Model-based Deep Learning GKNet: Graph Kalman Filterung und Modellableitung über modellbasiertes Deep Learning GKNet:通过基于模型的深层学习进行图表卡路曼过滤和模型推断 2506.22004v1

Authors (3): Mohammad Sabbaqi, Riccardo Taormina, Elvin Isufi

Inference tasks with time series over graphs are of importance in applications such as urban water networks, economics, and networked neuroscience. Addressing these tasks typically relies on identifying a computationally affordable model that jointly captures the graph-temporal patterns of the data. In this work, we propose a graph-aware state space model for graph time series, where both the latent state and the observation equation are parametric graph-induced models with a limited number of parameters that need to be learned. More specifically, we consider the state equation to follow a stochastic partial differential equation driven by noise over the graphs edges accounting not only for potential edge uncertainties but also for increasing the degrees of freedom in the latter in a tractable manner. The graph structure conditioning of the noise dispersion allows the state variable to deviate from the stochastic process in certain neighborhoods. The observation model is a sampled and graph-filtered version of the state capturing multi-hop neighboring influence. The goal is to learn the parameters in both state and observation models from the partially observed data for downstream tasks such as prediction and imputation. The model is inferred first through a maximum likelihood approach that provides theoretical tractability but is limited in expressivity and scalability. To improve on the latter, we use the state-space formulation to build a principled deep learning architecture that jointly learns the parameters and tracks the state in an end-to-end manner in the spirit of Kalman neural networks.

nan


Article 1097

Title@2025-06-27 (5): Generative adversarial neural networks for simulating neutrino interactions

Title: Generative adversarial neural networks for simulating neutrino interactions Generative adversariale neuronale Netzwerke zur Simulation von Neutrino-Interaktionen 产生模拟中微微子相互作用的对抗性神经网络 2502.20244v2

Authors (7): Jose L. Bonilla, Krzysztof M. Graczyk, Artur M. Ankowski, Rwik Dharmapal Banerjee, Beata E. Kowal, Hemant Prasad, Jan T. Sobczyk

We propose a new approach to simulate neutrino scattering events as an alternative to the standard Monte Carlo generator approach. Generative adversarial neural network (GAN) models are developed to simulate charged current neutrino-carbon collisions in the few-GeV energy range. We consider a simplified framework to generate muon kinematic variables, specifically its energy and scattering angle. GAN models are trained on simulation data from \nuwro{} Monte Carlo event generator. Two GAN models have been obtained: one simulating quasielastic neutrino-nucleus scatterings and another simulating all interactions at given neutrino energy. The models work for neutrino energy ranging from 300 MeV to 10 GeV. The performance of both models has been assessed using two statistical metrics. It is shown that both GAN models successfully reproduce the distribution of muon kinematics.

nan


Article 1098

Title@2025-06-27 (5): Time series classification with random convolution kernels: pooling operators and input representations matter

Title: Time series classification with random convolution kernels: pooling operators and input representations matter Zeitreihenklassifizierung mit zufälligen Konvolutionskernen: Bündelung von Operatoren und Eingabedarstellungen 随机卷变内核的时间序列分类:集合操作员和输入代表事项 2409.01115v4

Authors (5): Mouhamadou Mansour Lo, Gildas Morvan, Mathieu Rossi, Fabrice Morganti, David Mercier

This article presents a new approach based on MiniRocket, called SelF-Rocket, for fast time series classification (TSC). Unlike existing approaches based on random convolution kernels, it dynamically selects the best couple of input representations and pooling operator during the training process. SelF-Rocket achieves state-of-the-art accuracy on the University of California Riverside (UCR) TSC benchmark datasets.

nan


Article 1099

Title@2025-06-27 (5): Federated Data-Efficient Instruction Tuning for Large Language Models

Title: Federated Data-Efficient Instruction Tuning for Large Language Models Federated Data-Efficient Instruction Tuning für große Sprachmodelle 大语言模式联邦数据效率指示图示 2410.10926v2

Authors (4): Zhen Qin, Zhaomin Wu, Bingsheng He, Shuiguang Deng

Instruction tuning is a crucial step in improving the responsiveness of pretrained large language models (LLMs) to human instructions. Federated learning (FL) helps to exploit the use of vast private instruction data from clients, becoming popular for LLM tuning by improving data diversity. Existing federated tuning simply consumes all local data, causing excessive computational overhead and overfitting to local data, while centralized data-efficient solutions are not suitable for FL due to privacy concerns. This work presents FedHDS, a federated data-efficient instruction tuning approach, which tunes LLMs with a representative subset of edge-side data. It reduces the data redundancy at both intra- and inter-client levels without sharing raw data. Experiments with various LLMs, datasets and partitions show that FedHDS improves Rouge-L on unseen tasks by an average of 10.72% over the SOTA full-data federated instruction tuning methods, while using less than 1.5% of the data samples, improving training efficiency by up to tens of times.

nan


Article 1100

Title@2025-06-27 (5): Analyzing and Fine-Tuning Whisper Models for Multilingual Pilot Speech Transcription in the Cockpit

Title: Analyzing and Fine-Tuning Whisper Models for Multilingual Pilot Speech Transcription in the Cockpit Analysieren und Feintuning-Flüsternmodelle für mehrsprachige Pilot-Sprachtranskription im Cockpit 分析并精精精细调校车舱多语种试验性语音翻译多语种试听模式 2506.21990v1

Authors (3): Kartheek Kumar Reddy Nareddy, Sarah Ternus, Julia Niebling

The developments in transformer encoder-decoder architectures have led to significant breakthroughs in machine translation, Automatic Speech Recognition (ASR), and instruction-based chat machines, among other applications. The pre-trained models were trained on vast amounts of generic data over a few epochs (fewer than five in most cases), resulting in their strong generalization capabilities. Nevertheless, the performance of these models does suffer when applied to niche domains like transcribing pilot speech in the cockpit, which involves a lot of specific vocabulary and multilingual conversations. This paper investigates and improves the transcription accuracy of cockpit conversations with Whisper models. We have collected around 85 minutes of cockpit simulator recordings and 130 minutes of interview recordings with pilots and manually labeled them. The speakers are middle aged men speaking both German and English. To improve the accuracy of transcriptions, we propose multiple normalization schemes to refine the transcripts and improve Word Error Rate (WER). We then employ fine-tuning to enhance ASR performance, utilizing performance-efficient fine-tuning with Low-Rank Adaptation (LoRA). Hereby, WER decreased from 68.49 \% (pretrained whisper Large model without normalization baseline) to 26.26\% (finetuned whisper Large model with the proposed normalization scheme).

nan


Article 1101

Title@2025-06-27 (5): The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Title: The Mamba in the Llama: Distilling and Accelerating Hybrid Models Der Mamba im Llama: Destillieren und Beschleunigen von Hybridmodellen Llama的Mamba:蒸馏和加速混合模型 2408.15237v4

Authors (5): Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao

Linear RNN architectures, like Mamba, can be competitive with Transformer models in language modeling while having advantageous deployment characteristics. Given the focus on training large-scale Transformer models, we consider the challenge of converting these pretrained models for deployment. We demonstrate that it is feasible to distill large Transformers into linear RNNs by reusing the linear projection weights from attention layers with academic GPU resources. The resulting hybrid model, which incorporates a quarter of the attention layers, achieves performance comparable to the original Transformer in chat benchmarks and outperforms open-source hybrid Mamba models trained from scratch with trillions of tokens in both chat benchmarks and general benchmarks. Moreover, we introduce a hardware-aware speculative decoding algorithm that accelerates the inference speed of Mamba and hybrid models. Overall we show how, with limited computation resources, we can remove many of the original attention layers and generate from the resulting model more efficiently. Our top-performing model, distilled from Llama3-8B-Instruct, achieves a 29.61 length-controlled win rate on AlpacaEval 2 against GPT-4 and 7.35 on MT-Bench, surpassing the best 8B scale instruction-tuned linear RNN model. We also find that the distilled model has natural length extrapolation, showing almost perfect accuracy in the needle-in-a-haystack test at 20x the distillation length. Code and pre-trained checkpoints are open-sourced at https://github.com/jxiw/MambaInLlama and https://github.com/itsdaniele/speculative_mamba.

nan


Article 1102

Title@2025-06-27 (5): BeamLLM: Vision-Empowered mmWave Beam Prediction with Large Language Models

Title: BeamLLM: Vision-Empowered mmWave Beam Prediction with Large Language Models BeamLLM: Vision-Empowered mmWave Beam Prediction mit großen Sprachmodellen BeamLLM: 具有大语言模型的视觉-电子动力毫米 2503.10432v2

Authors (5): Can Zheng, Jiguang He, Guofa Cai, Zitong Yu, Chung G. Kang

In this paper, we propose BeamLLM, a vision-aided millimeter-wave (mmWave) beam prediction framework leveraging large language models (LLMs) to address the challenges of high training overhead and latency in mmWave communication systems. By combining computer vision (CV) with LLMs’ cross-modal reasoning capabilities, the framework extracts user equipment (UE) positional features from RGB images and aligns visual-temporal features with LLMs’ semantic space through reprogramming techniques. Evaluated on a realistic vehicle-to-infrastructure (V2I) scenario, the proposed method achieves 61.01% top-1 accuracy and 97.39% top-3 accuracy in standard prediction tasks, significantly outperforming traditional deep learning models. In few-shot prediction scenarios, the performance degradation is limited to 12.56% (top-1) and 5.55% (top-3) from time sample 1 to 10, demonstrating superior prediction capability.

nan


Article 1103

Title@2025-06-27 (5): Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein

Title: Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein Verteilungsreduktion: Vereinheitlichende Dimensionalitätsreduktion und Clustering mit Gromov-Wasserstein 减少分配:与格罗莫夫-瓦塞尔斯坦合并统一减少多维度 2402.02239v3

Authors (6): Hugues Van Assel, Cédric Vincent-Cuaz, Nicolas Courty, Rémi Flamary, Pascal Frossard, Titouan Vayer

Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. Traditionally, this involves using dimensionality reduction (DR) methods to project data onto lower-dimensional spaces or organizing points into meaningful clusters (clustering). In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem. This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem. We empirically demonstrate its relevance to the identification of low-dimensional prototypes representing data at different scales, across multiple image and genomic datasets.

nan


Article 1104

Title@2025-06-27 (5): Green LIME: Improving AI Explainability through Design of Experiments

Title: Green LIME: Improving AI Explainability through Design of Experiments Green LIME: Verbesserung der KI-Erklärbarkeit durch Design von Experimenten 绿色LIME:通过设计实验改进AI可解释性 2502.12753v2

Authors (3): Alexandra Stadler, Werner G. Müller, Radoslav Harman

In artificial intelligence (AI), the complexity of many models and processes surpasses human understanding, making it challenging to determine why a specific prediction is made. This lack of transparency is particularly problematic in critical fields like healthcare, where trust in a model’s predictions is paramount. As a result, the explainability of machine learning (ML) and other complex models has become a key area of focus. Efforts to improve model explainability often involve experimenting with AI systems and approximating their behavior through interpretable surrogate mechanisms. However, these procedures can be resource-intensive. Optimal design of experiments, which seeks to maximize the information obtained from a limited number of observations, offers promising methods for improving the efficiency of these explainability techniques. To demonstrate this potential, we explore Local Interpretable Model-agnostic Explanations (LIME), a widely used method introduced by Ribeiro et al. (2016). LIME provides explanations by generating new data points near the instance of interest and passing them through the model. While effective, this process can be computationally expensive, especially when predictions are costly or require many samples. LIME is highly versatile and can be applied to a wide range of models and datasets. In this work, we focus on models involving tabular data, regression tasks, and linear models as interpretable local approximations. By utilizing optimal design of experiments’ techniques, we reduce the number of function evaluations of the complex model, thereby reducing the computational effort of LIME by a significant amount. We consider this modified version of LIME to be energy-efficient or “green”.

nan


Article 1105

Title@2025-06-27 (5): A Survey on Federated Fine-tuning of Large Language Models

Title: A Survey on Federated Fine-tuning of Large Language Models Eine Umfrage über Federated Fine-Tuning von großen Sprachmodellen 大语言模式联邦微调调查 2503.12016v2

Authors (10): Yebo Wu, Chunlin Tian, Jingguang Li, He Sun, Kahou Tam, Zhanting Zhou, Haicheng Liao, Zhijiang Guo, Li Li, Chengzhong Xu

Large Language Models (LLMs) have demonstrated impressive success across various tasks. Integrating LLMs with Federated Learning (FL), a paradigm known as FedLLM, offers a promising avenue for collaborative model adaptation while preserving data privacy. This survey provides a systematic and comprehensive review of FedLLM. We begin by tracing the historical development of both LLMs and FL, summarizing relevant prior research to set the context. Subsequently, we delve into an in-depth analysis of the fundamental challenges inherent in deploying FedLLM. Addressing these challenges often requires efficient adaptation strategies; therefore, we conduct an extensive examination of existing Parameter-Efficient Fine-tuning (PEFT) methods and explore their applicability within the FL framework. To rigorously evaluate the performance of FedLLM, we undertake a thorough review of existing fine-tuning datasets and evaluation benchmarks. Furthermore, we discuss FedLLM’s diverse real-world applications across multiple domains. Finally, we identify critical open challenges and outline promising research directions to foster future advancements in FedLLM. This survey aims to serve as a foundational resource for researchers and practitioners, offering valuable insights into the rapidly evolving landscape of federated fine-tuning for LLMs. It also establishes a roadmap for future innovations in privacy-preserving AI. We actively maintain a GitHub repo \href{https://github.com/Clin0212/Awesome-Federated-LLM-Learning}{https://github.com/Clin0212/Awesome-Federated-LLM-Learning} to track cutting-edge advancements in this field.

nan


Article 1106

Title@2025-06-27 (5): Spectraformer: A Unified Random Feature Framework for Transformer

Title: Spectraformer: A Unified Random Feature Framework for Transformer Spektraformer: Ein einheitliches Random Feature Framework für Transformer 插件: 变形器统一随机地物框架 2405.15310v4

Authors (4): Duke Nguyen, Du Yin, Aditya Joshi, Flora Salim

Linearization of attention using various kernel approximation and kernel learning techniques has shown promise. Past methods used a subset of combinations of component functions and weight matrices within the random feature paradigm. We identify the need for a systematic comparison of different combinations of weight matrices and component functions for attention learning in Transformer. Hence, we introduce Spectraformer, a unified framework for approximating and learning the kernel function in the attention mechanism of the Transformer. Our empirical results demonstrate, for the first time, that a random feature-based approach can achieve performance comparable to top-performing sparse and low-rank methods on the challenging Long Range Arena benchmark. Thus, we establish a new state-of-the-art for random feature-based efficient Transformers. The framework also produces many variants that offer different advantages in accuracy, training time, and memory consumption. Our code is available at: https://github.com/cruiseresearchgroup/spectraformer .

nan


Article 1107

Title@2025-06-27 (5): SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model

Title: SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model SceneDiffuser++: City-Scale Verkehrssimulation über ein Generatives Weltmodell 景点Diffuser++:通过创世模式的城市规模交通量模拟 2506.21976v1

Authors (9): Shuhan Tan, John Lambert, Hong Jeon, Sakshum Kulshrestha, Yijing Bai, Jing Luo, Dragomir Anguelov, Mingxing Tan, Chiyu Max Jiang

The goal of traffic simulation is to augment a potentially limited amount of manually-driven miles that is available for testing and validation, with a much larger amount of simulated synthetic miles. The culmination of this vision would be a generative simulated city, where given a map of the city and an autonomous vehicle (AV) software stack, the simulator can seamlessly simulate the trip from point A to point B by populating the city around the AV and controlling all aspects of the scene, from animating the dynamic agents (e.g., vehicles, pedestrians) to controlling the traffic light states. We refer to this vision as CitySim, which requires an agglomeration of simulation technologies: scene generation to populate the initial scene, agent behavior modeling to animate the scene, occlusion reasoning, dynamic scene generation to seamlessly spawn and remove agents, and environment simulation for factors such as traffic lights. While some key technologies have been separately studied in various works, others such as dynamic scene generation and environment simulation have received less attention in the research community. We propose SceneDiffuser++, the first end-to-end generative world model trained on a single loss function capable of point A-to-B simulation on a city scale integrating all the requirements above. We demonstrate the city-scale traffic simulation capability of SceneDiffuser++ and study its superior realism under long simulation conditions. We evaluate the simulation quality on an augmented version of the Waymo Open Motion Dataset (WOMD) with larger map regions to support trip-level simulation.

nan


Article 1108

Title@2025-06-27 (5): Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses

Title: Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses Verbesserung der Strategien des Jailbreaks: Ein hybrider Ansatz, um LLM-Verletzungen auszunutzen und moderne Verteidigungen zu umgehen 推进破牢战略:利用LLM脆弱性和绕过现代防御的混合办法 2506.21972v1

Authors (6): Mohamed Ahmed, Mohamed Abdelmouty, Mingyu Kim, Gunvanth Kandula, Alex Park, James C. Davis

The advancement of Pre-Trained Language Models (PTLMs) and Large Language Models (LLMs) has led to their widespread adoption across diverse applications. Despite their success, these models remain vulnerable to attacks that exploit their inherent weaknesses to bypass safety measures. Two primary inference-phase threats are token-level and prompt-level jailbreaks. Token-level attacks embed adversarial sequences that transfer well to black-box models like GPT but leave detectable patterns and rely on gradient-based token optimization, whereas prompt-level attacks use semantically structured inputs to elicit harmful responses yet depend on iterative feedback that can be unreliable. To address the complementary limitations of these methods, we propose two hybrid approaches that integrate token- and prompt-level techniques to enhance jailbreak effectiveness across diverse PTLMs. GCG + PAIR and the newly explored GCG + WordGame hybrids were evaluated across multiple Vicuna and Llama models. GCG + PAIR consistently raised attack-success rates over its constituent techniques on undefended models; for instance, on Llama-3, its Attack Success Rate (ASR) reached 91.6%, a substantial increase from PAIR’s 58.4% baseline. Meanwhile, GCG + WordGame matched the raw performance of WordGame maintaining a high ASR of over 80% even under stricter evaluators like Mistral-Sorry-Bench. Crucially, both hybrids retained transferability and reliably pierced advanced defenses such as Gradient Cuff and JBShield, which fully blocked single-mode attacks. These findings expose previously unreported vulnerabilities in current safety stacks, highlight trade-offs between raw success and defensive robustness, and underscore the need for holistic safeguards against adaptive adversaries.

nan


Article 1109

Title@2025-06-27 (5): Mitigating Metropolitan Carbon Emissions with Dynamic Eco-driving at Scale

Title: Mitigating Metropolitan Carbon Emissions with Dynamic Eco-driving at Scale Mit dem dynamischen Öko-Fahren im Maßstab die Emissionen von Metropolitankohlenstoff mindern 减缓城市碳排放,在规模上进行动态生态驾驶 2408.05609v2

Authors (9): Vindula Jayawardana, Baptiste Freydt, Ao Qu, Cameron Hickert, Edgar Sanchez, Catherine Tang, Mark Taylor, Blaine Leonard, Cathy Wu

The sheer scale and diversity of transportation make it a formidable sector to decarbonize. Here, we consider an emerging opportunity to reduce carbon emissions: the growing adoption of semi-autonomous vehicles, which can be programmed to mitigate stop-and-go traffic through intelligent speed commands and, thus, reduce emissions. But would such dynamic eco-driving move the needle on climate change? A comprehensive impact analysis has been out of reach due to the vast array of traffic scenarios and the complexity of vehicle emissions. We address this challenge with large-scale scenario modeling efforts and by using multi-task deep reinforcement learning with a carefully designed network decomposition strategy. We perform an in-depth prospective impact assessment of dynamic eco-driving at 6,011 signalized intersections across three major US metropolitan cities, simulating a million traffic scenarios. Overall, we find that vehicle trajectories optimized for emissions can cut city-wide intersection carbon emissions by 11-22%, without harming throughput or safety, and with reasonable assumptions, equivalent to the national emissions of Israel and Nigeria, respectively. We find that 10% eco-driving adoption yields 25%-50% of the total reduction, and nearly 70% of the benefits come from 20% of intersections, suggesting near-term implementation pathways. However, the composition of this high-impact subset of intersections varies considerably across different adoption levels, with minimal overlap, calling for careful strategic planning for eco-driving deployments. Moreover, the impact of eco-driving, when considered jointly with projections of vehicle electrification and hybrid vehicle adoption remains significant. More broadly, this work paves the way for large-scale analysis of traffic externalities, such as time, safety, and air quality, and the potential impact of solution strategies.

nan


Article 1110

Title@2025-06-27 (5): More Vulnerable than You Think: On the Stability of Tool-Integrated LLM Agents

Title: More Vulnerable than You Think: On the Stability of Tool-Integrated LLM Agents Schwacher als Sie denken: Zur Stabilität von werkzeugintegrierten LLM-Agenten 比你想象的更加脆弱:关于工具集成LLM剂稳定问题 2506.21967v1

Authors (7): Weimin Xiong, Ke Wang, Yifan Song, Hanchao Liu, Sai Zhou, Wei Peng, Sujian Li

Current evaluations of tool-integrated LLM agents typically focus on end-to-end tool-usage evaluation while neglecting their stability. This limits their real-world applicability, as various internal or external factors can cause agents to crash or behave abnormally. Our research addresses this by investigating whether agents are vulnerable to errors throughout the entire tool invocation process, including reading tool documentation, selecting tools and generating parameters, and processing the tool’s response. Through extensive experiments, we observe that agents are highly susceptible to errors at each stage and agents based on open-source models are more vulnerable than those based on proprietary models. We also find that increasing the model size does not significantly improve tool invocation reasoning and may make agents more vulnerable to attacks resembling normal user instructions. This highlights the importance of evaluating agent stability and offers valuable insights for future LLM development and evaluation.

nan


Article 1111

Title@2025-06-27 (5): On the Lipschitz Continuity of Set Aggregation Functions and Neural Networks for Sets

Title: On the Lipschitz Continuity of Set Aggregation Functions and Neural Networks for Sets Über die Lipschitz Kontinuität von Set-Aggregationsfunktionen und neuralen Netzwerken für Sets 设置集合功能和神经网络的利普西茨连续性 2505.24403v2

Authors (2): Giannis Nikolentzos, Konstantinos Skianis

The Lipschitz constant of a neural network is connected to several important properties of the network such as its robustness and generalization. It is thus useful in many settings to estimate the Lipschitz constant of a model. Prior work has focused mainly on estimating the Lipschitz constant of multi-layer perceptrons and convolutional neural networks. Here we focus on data modeled as sets or multisets of vectors and on neural networks that can handle such data. These models typically apply some permutation invariant aggregation function, such as the sum, mean or max operator, to the input multisets to produce a single vector for each input sample. In this paper, we investigate whether these aggregation functions are Lipschitz continuous with respect to three distance functions for unordered multisets, and we compute their Lipschitz constants. In the general case, we find that each aggregation function is Lipschitz continuous with respect to only one of the three distance functions. Then, we build on these results to derive upper bounds on the Lipschitz constant of neural networks that can process multisets of vectors, while we also study their stability to perturbations and generalization under distribution shifts. To empirically verify our theoretical analysis, we conduct a series of experiments on datasets from different domains.

nan


Article 1112

Title@2025-06-27 (5): Optimal Return-to-Go Guided Decision Transformer for Auto-Bidding in Advertisement

Title: Optimal Return-to-Go Guided Decision Transformer for Auto-Bidding in Advertisement Optimaler Return-to-Go-Geführter Entscheidungstransformator für Auto-Bidding in der Werbung 广告中自动提交申请的优化回向向向决策变换器 2506.21956v1

Authors (8): Hao Jiang, Yongxiang Tang, Yanxiang Zeng, Pengjia Yuan, Yanhua Cheng, Teng Sha, Xialong Liu, Peng Jiang

In the realm of online advertising, advertisers partake in ad auctions to obtain advertising slots, frequently taking advantage of auto-bidding tools provided by demand-side platforms. To improve the automation of these bidding systems, we adopt generative models, namely the Decision Transformer (DT), to tackle the difficulties inherent in automated bidding. Applying the Decision Transformer to the auto-bidding task enables a unified approach to sequential modeling, which efficiently overcomes short-sightedness by capturing long-term dependencies between past bidding actions and user behavior. Nevertheless, conventional DT has certain drawbacks: (1) DT necessitates a preset return-to-go (RTG) value before generating actions, which is not inherently produced; (2) The policy learned by DT is restricted by its training data, which is consists of mixed-quality trajectories. To address these challenges, we introduce the R* Decision Transformer (R* DT), developed in a three-step process: (1) R DT: Similar to traditional DT, R DT stores actions based on state and RTG value, as well as memorizing the RTG for a given state using the training set; (2) R^ DT: We forecast the highest value (within the training set) of RTG for a given state, deriving a suboptimal policy based on the current state and the forecasted supreme RTG value; (3) R* DT: Based on R^ DT, we generate trajectories and select those with high rewards (using a simulator) to augment our training dataset. This data enhancement has been shown to improve the RTG of trajectories in the training data and gradually leads the suboptimal policy towards optimality. Comprehensive tests on a publicly available bidding dataset validate the R* DT’s efficacy and highlight its superiority when dealing with mixed-quality trajectories.

nan


Article 1113

Title@2025-06-27 (5): deCIFer: Crystal Structure Prediction from Powder Diffraction Data using Autoregressive Language Models

Title: deCIFer: Crystal Structure Prediction from Powder Diffraction Data using Autoregressive Language Models deCIFer: Crystal Structure Vorhersage von Powder Diffraction Daten mit autoregressiven Sprachmodellen deCIFer:使用自动递减语言模型从粉末碎片数据中预测水晶结构 2502.02189v3

Authors (6): Frederik Lizak Johansen, Ulrik Friis-Jensen, Erik Bjørnager Dam, Kirsten Marie Ørnsbjerg Jensen, Rocío Mercado, Raghavendra Selvan

Novel materials drive progress across applications from energy storage to electronics. Automated characterization of material structures with machine learning methods offers a promising strategy for accelerating this key step in material design. In this work, we introduce an autoregressive language model that performs crystal structure prediction (CSP) from powder diffraction data. The presented model, deCIFer, generates crystal structures in the widely used Crystallographic Information File (CIF) format and can be conditioned on powder X-ray diffraction (PXRD) data. Unlike earlier works that primarily rely on high-level descriptors like composition, deCIFer is also able to use diffraction data to perform CSP. We train deCIFer on nearly 2.3M crystal structures and validate on diverse sets of PXRD patterns for characterizing challenging inorganic crystal systems. Qualitative checks and quantitative assessments using the residual weighted profile show that deCIFer produces structures that more accurately match the target diffraction data. Notably, deCIFer can achieve a 94% match rate on test data. deCIFer bridges experimental diffraction data with computational CSP, lending itself as a powerful tool for crystal structure characterization.

nan


Article 1114

Title@2025-06-27 (5): Smooth-Distill: A Self-distillation Framework for Multitask Learning with Wearable Sensor Data

Title: Smooth-Distill: A Self-distillation Framework for Multitask Learning with Wearable Sensor Data Smooth-Destill: Ein Selbstdestillations-Framework für Multitask-Lernen mit tragbaren Sensordaten 平流蒸馏:用可穿感应器数据进行多任务学习的自我蒸馏框架 2507.00061v1

Authors (6): Hoang-Dieu Vu, Duc-Nghia Tran, Quang-Tu Pham, Hieu H. Pham, Nicolas Vuillerme, Duc-Tan Tran

This paper introduces Smooth-Distill, a novel self-distillation framework designed to simultaneously perform human activity recognition (HAR) and sensor placement detection using wearable sensor data. The proposed approach utilizes a unified CNN-based architecture, MTL-net, which processes accelerometer data and branches into two outputs for each respective task. Unlike conventional distillation methods that require separate teacher and student models, the proposed framework utilizes a smoothed, historical version of the model itself as the teacher, significantly reducing training computational overhead while maintaining performance benefits. To support this research, we developed a comprehensive accelerometer-based dataset capturing 12 distinct sleep postures across three different wearing positions, complementing two existing public datasets (MHealth and WISDM). Experimental results show that Smooth-Distill consistently outperforms alternative approaches across different evaluation scenarios, achieving notable improvements in both human activity recognition and device placement detection tasks. This method demonstrates enhanced stability in convergence patterns during training and exhibits reduced overfitting compared to traditional multitask learning baselines. This framework contributes to the practical implementation of knowledge distillation in human activity recognition systems, offering an effective solution for multitask learning with accelerometer data that balances accuracy and training efficiency. More broadly, it reduces the computational cost of model training, which is critical for scenarios requiring frequent model updates or training on resource-constrained platforms. The code and model are available at https://github.com/Kuan2vn/smooth_distill.

nan


Article 1115

Title@2025-06-27 (5): Physics-informed network paradigm with data generation and background noise removal for diverse distributed acoustic sensing applications

Title: Physics-informed network paradigm with data generation and background noise removal for diverse distributed acoustic sensing applications Physik-informiertes Netzwerk-Paradigma mit Datengenerierung und Hintergrundrauschenentfernung für vielfältige verteilte akustische Sensoranwendungen 物理知情网络范式,为分散的多种声学遥感应用收集数据和背景噪音清除 2506.21952v1

Authors (6): Yangyang Wan, Haotian Wang, Xuhui Yu, Jiageng Chen, Xinyu Fan, Zuyuan He

Distributed acoustic sensing (DAS) has attracted considerable attention across various fields and artificial intelligence (AI) technology plays an important role in DAS applications to realize event recognition and denoising. Existing AI models require real-world data (RWD), whether labeled or not, for training, which is contradictory to the fact of limited available event data in real-world scenarios. Here, a physics-informed DAS neural network paradigm is proposed, which does not need real-world events data for training. By physically modeling target events and the constraints of real world and DAS system, physical functions are derived to train a generative network for generation of DAS events data. DAS debackground net is trained by using the generated DAS events data to eliminate background noise in DAS data. The effectiveness of the proposed paradigm is verified in event identification application based on a public dataset of DAS spatiotemporal data and in belt conveyor fault monitoring application based on DAS time-frequency data, and achieved comparable or better performance than data-driven networks trained with RWD. Owing to the introduction of physical information and capability of background noise removal, the paradigm demonstrates generalization in same application on different sites. A fault diagnosis accuracy of 91.8% is achieved in belt conveyor field with networks which transferred from simulation test site without any fault events data of test site and field for training. The proposed paradigm is a prospective solution to address significant obstacles of data acquisition and intense noise in practical DAS applications and explore more potential fields for DAS.

nan


Article 1116

Title@2025-06-27 (5): Hitchhiking Rides Dataset: Two decades of crowd-sourced records on stochastic traveling

Title: Hitchhiking Rides Dataset: Two decades of crowd-sourced records on stochastic traveling Hitchhiking Rides Datensatz: Zwei Jahrzehnte von crowd-sourced Rekorde auf stochastische Reisen Hitchhiking rides数据集:20年关于随机旅行的多方来源记录 2506.21946v1

Authors (1): Till Wenke

Hitchhiking, a spontaneous and decentralized mode of travel, has long eluded systematic study due to its informal nature. This paper presents and analyzes the largest known structured dataset of hitchhiking rides, comprising over 63,000 entries collected over nearly two decades through platforms associated with hitchwiki.org and lately on hitchmap.com. By leveraging crowd-sourced contributions, the dataset captures key spatiotemporal and strategic aspects of hitchhiking. This work documents the dataset’s origins, evolution, and community-driven maintenance, highlighting its Europe-centric distribution, seasonal patterns, and reliance on a small number of highly active contributors. Through exploratory analyses, I examine waiting times, user behavior, and comment metadata, shedding light on the lived realities of hitchhikers. While the dataset has inherent biases and limitations - such as demographic skew and unverifiable entries it offers a rare and valuable window into an alternative form of mobility. I conclude by outlining future directions for enriching the dataset and advancing research on hitchhiking as both a transportation practice and cultural phenomenon.

nan


Article 1117

Title@2025-06-27 (5): Embedding-based Approaches to Hyperpartisan News Detection

Title: Embedding-based Approaches to Hyperpartisan News Detection Einbetten-basierte Ansätze zu Hyperparteien-Nachrichten-Erkennung 以嵌入式办法探测超党派新闻 2501.01370v2

Authors (2): Karthik Mohan, Pengyu Chen

In this paper, we describe our systems in which the objective is to determine whether a given news article could be considered as hyperpartisan. Hyperpartisan news is news that takes an extremely polarized political standpoint with an intention of creating political divide among the public. We attempted several approaches, including n-grams, sentiment analysis, as well as sentence and document representation using pre-tained ELMo. Our best system using pre-trained ELMo with Bidirectional LSTM achieved an accuracy of 83% through 10-fold cross-validation without much hyperparameter tuning.

nan


Article 1118

Title@2025-06-27 (5): Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance

Title: Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance Schritt für Schritt Video-zu-Audio-Synthese über Negative Audio-Anleitung 通过消极音频指导,逐步进行视频到视听合成 2506.20995v2

Authors (4): Akio Hayakawa, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

We propose a novel step-by-step video-to-audio generation method that sequentially produces individual audio tracks, each corresponding to a specific sound event in the video. Our approach mirrors traditional Foley workflows, aiming to capture all sound events induced by a given video comprehensively. Each generation step is formulated as a guided video-to-audio synthesis task, conditioned on a target text prompt and previously generated audio tracks. This design is inspired by the idea of concept negation from prior compositional generation frameworks. To enable this guided generation, we introduce a training framework that leverages pre-trained video-to-audio models and eliminates the need for specialized paired datasets, allowing training on more accessible data. Experimental results demonstrate that our method generates multiple semantically distinct audio tracks for a single input video, leading to higher-quality composite audio synthesis than existing baselines.

nan


Article 1119

Title@2025-06-27 (5): GuiderNet: A Meta-Learning Framework for Optimizing Quantum Circuit Geometry and Mitigating Barren Plateaus

Title: GuiderNet: A Meta-Learning Framework for Optimizing Quantum Circuit Geometry and Mitigating Barren Plateaus GuiderNet: Ein Meta-Learning-Framework zur Optimierung der Quantum Circuit Geometrie und zur Abmilderung von Barren Plateaus 指南Net:优化量子电路几何和减缓巴伦高原的元学习框架 2506.21940v1

Authors (2): Marwan Ait Haddou, Mohamed Bennai

Variational Quantum Algorithms (VQAs) offer potential for near-term quantum advantage but face challenges from barren plateaus, where gradients vanish, and poorly conditioned optimization landscapes. We introduce GuiderNet, a meta-learning framework that conditions Parameterized Quantum Circuits (PQCs) using data-dependent parameter shifts aimed at minimizing the log condition number of the Fubini-Study metric tensor. Implemented as a classical neural network, GuiderNet is meta-trained to guide PQC parameters into geometrically favorable regions and is embedded within hybrid quantum-classical pipelines to steer both initialization and adaptive modulation during training. Applied to the Kaggle Diabetes classification task, GuiderNet reduces cumulative training loss by over 5x, improves test accuracy from 75.3% to 98.6%, and increases the minority-class F1 score from 0.67 to 0.95. It also suppresses gradient explosion and stabilizes parameter updates, enabling smoother and more robust optimization. These results demonstrate that geometric meta-conditioning can mitigate barren plateaus and ill-conditioning, providing a scalable approach to enhance trainability and generalization in quantum machine learning.

nan


Article 1120

Title@2025-06-27 (5): HQCM-EBTC: A Hybrid Quantum-Classical Model for Explainable Brain Tumor Classification

Title: HQCM-EBTC: A Hybrid Quantum-Classical Model for Explainable Brain Tumor Classification HQCM-EBTC: Hybrides Quantum-Klassisches Modell für erklärbare Hirntumor-Klassifikation HQCM-EBTC:可解释脑肿瘤分类混合量度-分类模型 2506.21937v1

Authors (2): Marwan Ait Haddou, Mohamed Bennai

We propose HQCM-EBTC, a hybrid quantum-classical model for automated brain tumor classification using MRI images. Trained on a dataset of 7,576 scans covering normal, meningioma, glioma, and pituitary classes, HQCM-EBTC integrates a 5-qubit, depth-2 quantum layer with 5 parallel circuits, optimized via AdamW and a composite loss blending cross-entropy and attention consistency. HQCM-EBTC achieves 96.48% accuracy, substantially outperforming the classical baseline (86.72%). It delivers higher precision and F1-scores, especially for glioma detection. t-SNE projections reveal enhanced feature separability in quantum space, and confusion matrices show lower misclassification. Attention map analysis (Jaccard Index) confirms more accurate and focused tumor localization at high-confidence thresholds. These results highlight the promise of quantum-enhanced models in medical imaging, advancing both diagnostic accuracy and interpretability for clinical brain tumor assessment.

nan


Article 1121

Title@2025-06-27 (5): Joint Task Offloading and Resource Allocation in Low-Altitude MEC via Graph Attention Diffusion

Title: Joint Task Offloading and Resource Allocation in Low-Altitude MEC via Graph Attention Diffusion Gemeinsame Aufgabe Offloading und Ressourcenzuteilung in niedrig-Altitude MEC über Graphen-Achtung Diffusion 通过 “ 图表关注 “ 传播在低海拔MEC中联合任务卸载和资源分配 2506.21933v1

Authors (7): Yifan Xue, Ruihuai Liang, Bo Yang, Xuelin Cao, Zhiwen Yu, Mérouane Debbah, Chau Yuen

With the rapid development of the low-altitude economy, air-ground integrated multi-access edge computing (MEC) systems are facing increasing demands for real-time and intelligent task scheduling. In such systems, task offloading and resource allocation encounter multiple challenges, including node heterogeneity, unstable communication links, and dynamic task variations. To address these issues, this paper constructs a three-layer heterogeneous MEC system architecture for low-altitude economic networks, encompassing aerial and ground users as well as edge servers. The system is systematically modeled from the perspectives of communication channels, computational costs, and constraint conditions, and the joint optimization problem of offloading decisions and resource allocation is uniformly abstracted into a graph-structured modeling task. On this basis, we propose a graph attention diffusion-based solution generator (GADSG). This method integrates the contextual awareness of graph attention networks with the solution distribution learning capability of diffusion models, enabling joint modeling and optimization of discrete offloading variables and continuous resource allocation variables within a high-dimensional latent space. We construct multiple simulation datasets with varying scales and topologies. Extensive experiments demonstrate that the proposed GADSG model significantly outperforms existing baseline methods in terms of optimization performance, robustness, and generalization across task structures, showing strong potential for efficient task scheduling in dynamic and complex low-altitude economic network environments.

nan


Article 1122

Title@2025-06-27 (5): Causal Inference Isn’t Special: Why It’s Just Another Prediction Problem

Title: Causal Inference Isn’t Special: Why It’s Just Another Prediction Problem Kausale Schlussfolgerung ist nicht besonders: Warum es nur ein weiteres Vorhersageproblem ist 原因推论不是特别的:为什么这只是另一个预测问题? 2504.04320v2

Authors (1): Carlos Fernández-Loría

Causal inference is often portrayed as fundamentally distinct from predictive modeling, with its own terminology, goals, and intellectual challenges. But at its core, causal inference is simply a structured instance of prediction under distribution shift. In both cases, we begin with labeled data from a source domain and seek to generalize to a target domain where outcomes are not observed. The key difference is that in causal inference, the labels – potential outcomes – are selectively observed based on treatment assignment, introducing bias that must be addressed through assumptions. This perspective reframes causal estimation as a familiar generalization problem and highlights how techniques from predictive modeling, such as reweighting and domain adaptation, apply directly to causal tasks. It also clarifies that causal assumptions are not uniquely strong – they are simply more explicit. By viewing causal inference through the lens of prediction, we demystify its logic, connect it to familiar tools, and make it more accessible to practitioners and educators alike.

nan


Article 1123

Title@2025-06-27 (5): Mic-hackathon 2024: Hackathon on Machine Learning for Electron and Scanning Probe Microscopy

Title: Mic-hackathon 2024: Hackathon on Machine Learning for Electron and Scanning Probe Microscopy Mic-hackathon 2024: Hackathon zum maschinellen Lernen für Elektronen- und Rastersondenmikroskopie Mic-hackathon 2024:关于电子和扫描探测器机器学习的Hackathon 2506.08423v2

Authors (73): Utkarsh Pratiush, Austin Houston, Kamyar Barakati, Aditya Raghavan, Dasol Yoon, Harikrishnan KP, Zhaslan Baraissov, Desheng Ma, Samuel S. Welborn, Mikolaj Jakowski, Shawn-Patrick Barhorst, Alexander J. Pattison, Panayotis Manganaris, Sita Sirisha Madugula, Sai Venkata Gayathri Ayyagari, Vishal Kennedy, Ralph Bulanadi, Michelle Wang, Kieran J. Pang, Ian Addison-Smith, Willy Menacho, Horacio V. Guzman, Alexander Kiefer, Nicholas Furth, Nikola L. Kolev, Mikhail Petrov, Viktoriia Liu, Sergey Ilyev, Srikar Rairao, Tommaso Rodani, Ivan Pinto-Huguet, Xuli Chen, Josep Cruañes, Marta Torrens, Jovan Pomar, Fanzhi Su, Pawan Vedanti, Zhiheng Lyu, Xingzhi Wang, Lehan Yao, Amir Taqieddin, Forrest Laskowski, Xiangyu Yin, Yu-Tsun Shao, Benjamin Fein-Ashley, Yi Jiang, Vineet Kumar, Himanshu Mishra, Yogesh Paul, Adib Bazgir, Rama chandra Praneeth Madugula, Yuwen Zhang, Pravan Omprakash, Jian Huang, Eric Montufar-Morales, Vivek Chawla, Harshit Sethi, Jie Huang, Lauri Kurki, Grace Guinan, Addison Salvador, Arman Ter-Petrosyan, Madeline Van Winkle, Steven R. Spurgeon, Ganesh Narasimha, Zijie Wu, Richard Liu, Yongtao Liu, Boris Slautin, Andrew R Lupini, Rama Vasudevan, Gerd Duscher, Sergei V. Kalinin

Microscopy is a primary source of information on materials structure and functionality at nanometer and atomic scales. The data generated is often well-structured, enriched with metadata and sample histories, though not always consistent in detail or format. The adoption of Data Management Plans (DMPs) by major funding agencies promotes preservation and access. However, deriving insights remains difficult due to the lack of standardized code ecosystems, benchmarks, and integration strategies. As a result, data usage is inefficient and analysis time is extensive. In addition to post-acquisition analysis, new APIs from major microscope manufacturers enable real-time, ML-based analytics for automated decision-making and ML-agent-controlled microscope operation. Yet, a gap remains between the ML and microscopy communities, limiting the impact of these methods on physics, materials discovery, and optimization. Hackathons help bridge this divide by fostering collaboration between ML researchers and microscopy experts. They encourage the development of novel solutions that apply ML to microscopy, while preparing a future workforce for instrumentation, materials science, and applied ML. This hackathon produced benchmark datasets and digital twins of microscopes to support community growth and standardized workflows. All related code is available at GitHub: https://github.com/KalininGroup/Mic-hackathon-2024-codes-publication/tree/1.0.0.1

nan


Article 1124

Title@2025-06-27 (5): Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection

Title: Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection Grundlagenmodell-Insights und ein Multi-Model-Ansatz für eine erstklassige Auswahl an feinkörnigen Ein-Schuss-Subsets 基础型透视模型和高精粒单弹分子集选多模式办法 2506.14473v2

Authors (5): Zhijing Wan, Zhixiang Wang, Zheng Wang, Xin Xu, Shin’ichi Satoh

One-shot subset selection serves as an effective tool to reduce deep learning training costs by identifying an informative data subset based on the information extracted by an information extractor (IE). Traditional IEs, typically pre-trained on the target dataset, are inherently dataset-dependent. Foundation models (FMs) offer a promising alternative, potentially mitigating this limitation. This work investigates two key questions: (1) Can FM-based subset selection outperform traditional IE-based methods across diverse datasets? (2) Do all FMs perform equally well as IEs for subset selection? Extensive experiments uncovered surprising insights: FMs consistently outperform traditional IEs on fine-grained datasets, whereas their advantage diminishes on coarse-grained datasets with noisy labels. Motivated by these finding, we propose RAM-APL (RAnking Mean-Accuracy of Pseudo-class Labels), a method tailored for fine-grained image datasets. RAM-APL leverages multiple FMs to enhance subset selection by exploiting their complementary strengths. Our approach achieves state-of-the-art performance on fine-grained datasets, including Oxford-IIIT Pet, Food-101, and Caltech-UCSD Birds-200-2011.

nan


Article 1125

Title@2025-06-27 (5): TOAST: Task-Oriented Adaptive Semantic Transmission over Dynamic Wireless Environments

Title: TOAST: Task-Oriented Adaptive Semantic Transmission over Dynamic Wireless Environments TOAST: Taskorientierte adaptive Semantische Übertragung über dynamische drahtlose Umgebungen TOAST: 动态无线环境中以任务为导向的适应性语义传输 2506.21900v1

Authors (3): Sheng Yun, Jianhua Pei, Ping Wang

The evolution toward 6G networks demands a fundamental shift from bit-centric transmission to semantic-aware communication that emphasizes task-relevant information. This work introduces TOAST (Task-Oriented Adaptive Semantic Transmission), a unified framework designed to address the core challenge of multi-task optimization in dynamic wireless environments through three complementary components. First, we formulate adaptive task balancing as a Markov decision process, employing deep reinforcement learning to dynamically adjust the trade-off between image reconstruction fidelity and semantic classification accuracy based on real-time channel conditions. Second, we integrate module-specific Low-Rank Adaptation (LoRA) mechanisms throughout our Swin Transformer-based joint source-channel coding architecture, enabling parameter-efficient fine-tuning that dramatically reduces adaptation overhead while maintaining full performance across diverse channel impairments including Additive White Gaussian Noise (AWGN), fading, phase noise, and impulse interference. Third, we incorporate an Elucidating diffusion model that operates in the latent space to restore features corrupted by channel noises, providing substantial quality improvements compared to baseline approaches. Extensive experiments across multiple datasets demonstrate that TOAST achieves superior performance compared to baseline approaches, with significant improvements in both classification accuracy and reconstruction quality at low Signal-to-Noise Ratio (SNR) conditions while maintaining robust performance across all tested scenarios.

nan


Article 1126

Title@2025-06-27 (5): Advancements and Challenges in Continual Reinforcement Learning: A Comprehensive Review

Title: Advancements and Challenges in Continual Reinforcement Learning: A Comprehensive Review Fortschritte und Herausforderungen im kontinuierlichen Ausbau des Lernens: Ein umfassender Überblick 不断加强学习的进展和挑战:全面审查 2506.21899v1

Authors (3): Amara Zuffer, Michael Burke, Mehrtash Harandi

The diversity of tasks and dynamic nature of reinforcement learning (RL) require RL agents to be able to learn sequentially and continuously, a learning paradigm known as continuous reinforcement learning. This survey reviews how continual learning transforms RL agents into dynamic continual learners. This enables RL agents to acquire and retain useful and reusable knowledge seamlessly. The paper delves into fundamental aspects of continual reinforcement learning, exploring key concepts, significant challenges, and novel methodologies. Special emphasis is placed on recent advancements in continual reinforcement learning within robotics, along with a succinct overview of evaluation environments utilized in prominent research, facilitating accessibility for newcomers to the field. The review concludes with a discussion on limitations and promising future directions, providing valuable insights for researchers and practitioners alike.

nan


Article 1127

Title@2025-06-27 (5): Enhancing Cloud Security through Topic Modelling

Title: Enhancing Cloud Security through Topic Modelling Verbesserung der Cloud-Sicherheit durch Themenmodellierung 通过专题建模模式加强云层安全 2505.01463v2

Authors (3): Sabbir M. Saleh, Nazim Madhavji, John Steinbacher

Protecting cloud applications is critical in an era where security threats are increasingly sophisticated and persistent. Continuous Integration and Continuous Deployment (CI/CD) pipelines are particularly vulnerable, making innovative security approaches essential. This research explores the application of Natural Language Processing (NLP) techniques, specifically Topic Modelling, to analyse security-related text data and anticipate potential threats. We focus on Latent Dirichlet Allocation (LDA) and Probabilistic Latent Semantic Analysis (PLSA) to extract meaningful patterns from data sources, including logs, reports, and deployment traces. Using the Gensim framework in Python, these methods categorise log entries into security-relevant topics (e.g., phishing, encryption failures). The identified topics are leveraged to highlight patterns indicative of security issues across CI/CD’s continuous stages (build, test, deploy). This approach introduces a semantic layer that supports early vulnerability recognition and contextual understanding of runtime behaviours.

nan


Article 1128

Title@2025-06-27 (5): Stability of Primal-Dual Gradient Flow Dynamics for Multi-Block Convex Optimization Problems

Title: Stability of Primal-Dual Gradient Flow Dynamics for Multi-Block Convex Optimization Problems Stabilität von primär-Dual Gradient Flow Dynamics für Multi-Block Convex Optimierungsprobleme 多锁组合最佳化问题微型-多级梯度流动动态的稳定 2408.15969v2

Authors (3): Ibrahim K. Ozaslan, Panagiotis Patrinos, Mihailo R. Jovanović

We examine stability properties of primal-dual gradient flow dynamics for composite convex optimization problems with multiple, possibly nonsmooth, terms in the objective function under the generalized consensus constraint. The proposed dynamics are based on the proximal augmented Lagrangian and they provide a viable alternative to ADMM which faces significant challenges from both analysis and implementation viewpoints in large-scale multi-block scenarios. In contrast to customized algorithms with individualized convergence guarantees, we develop a systematic approach for solving a broad class of challenging composite optimization problems. We leverage various structural properties to establish global (exponential) convergence guarantees for the proposed dynamics. Our assumptions are much weaker than those required to prove (exponential) stability of primal-dual dynamics as well as (linear) convergence of discrete-time methods such as standard two-block and multi-block ADMM and EXTRA algorithms. Finally, we show necessity of some of our structural assumptions for exponential stability and provide computational experiments to demonstrate the convenience of the proposed approach for parallel and distributed computing applications.

nan


Article 1129

Title@2025-06-27 (5): Thompson Sampling in Function Spaces via Neural Operators

Title: Thompson Sampling in Function Spaces via Neural Operators Thompson Probenahme in Funktionsräumen über neurale Operatoren Thompson 通过神经操作员在功能空间取样 2506.21894v1

Authors (4): Rafael Oliveira, Xuesong Wang, Kian Ming A. Chai, Edwin V. Bonilla

We propose an extension of Thompson sampling to optimization problems over function spaces where the objective is a known functional of an unknown operator’s output. We assume that functional evaluations are inexpensive, while queries to the operator (such as running a high-fidelity simulator) are costly. Our algorithm employs a sample-then-optimize approach using neural operator surrogates. This strategy avoids explicit uncertainty quantification by treating trained neural operators as approximate samples from a Gaussian process. We provide novel theoretical convergence guarantees, based on Gaussian processes in the infinite-dimensional setting, under minimal assumptions. We benchmark our method against existing baselines on functional optimization tasks involving partial differential equations and other nonlinear operator-driven phenomena, demonstrating improved sample efficiency and competitive performance.

nan


Article 1130

Title@2025-06-27 (5): Interactive Multi-Objective Probabilistic Preference Learning with Soft and Hard Bounds

Title: Interactive Multi-Objective Probabilistic Preference Learning with Soft and Hard Bounds Interaktives multi-objektives probabilistisches Preference-Lernen mit weichen und harten Grenzen 以软和硬两环进行交互式多目标概率预测性偏好学习 2506.21887v1

Authors (5): Edward Chen, Sang T. Truong, Natalie Dullerud, Sanmi Koyejo, Carlos Guestrin

High-stakes decision-making involves navigating multiple competing objectives with expensive evaluations. For instance, in brachytherapy, clinicians must balance maximizing tumor coverage (e.g., an aspirational target or soft bound of >95% coverage) against strict organ dose limits (e.g., a non-negotiable hard bound of <601 cGy to the bladder), with each plan evaluation being resource-intensive. Selecting Pareto-optimal solutions that match implicit preferences is challenging, as exhaustive Pareto frontier exploration is computationally and cognitively prohibitive, necessitating interactive frameworks to guide users. While decision-makers (DMs) often possess domain knowledge to narrow the search via such soft-hard bounds, current methods often lack systematic approaches to iteratively refine these multi-faceted preference structures. Critically, DMs must trust their final decision, confident they haven’t missed superior alternatives; this trust is paramount in high-consequence scenarios. We present Active-MoSH, an interactive local-global framework designed for this process. Its local component integrates soft-hard bounds with probabilistic preference learning, maintaining distributions over DM preferences and bounds for adaptive Pareto subset refinement. This is guided by an active sampling strategy optimizing exploration-exploitation while minimizing cognitive burden. To build DM trust, Active-MoSH’s global component, T-MoSH, leverages multi-objective sensitivity analysis to identify potentially overlooked, high-value points beyond immediate feedback. We demonstrate Active-MoSH’s performance benefits through diverse synthetic and real-world applications. A user study on AI-generated image selection further validates our hypotheses regarding the framework’s ability to improve convergence, enhance DM trust, and provide expressive preference articulation, enabling more effective DMs.

nan


Article 1131

Title@2025-06-27 (5): UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields

Title: UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields UnMix-NeRF: Spektrales Unmixen trifft auf neurale Strahlungsfelder Un Mix- NERF: 光谱融合结合满足神经辐射场 2506.21884v1

Authors (5): Fabian Perez, Sara Rojas, Carlos Hinojosa, Hoover Rueda-Chacón, Bernard Ghanem

Neural Radiance Field (NeRF)-based segmentation methods focus on object semantics and rely solely on RGB data, lacking intrinsic material properties. This limitation restricts accurate material perception, which is crucial for robotics, augmented reality, simulation, and other applications. We introduce UnMix-NeRF, a framework that integrates spectral unmixing into NeRF, enabling joint hyperspectral novel view synthesis and unsupervised material segmentation. Our method models spectral reflectance via diffuse and specular components, where a learned dictionary of global endmembers represents pure material signatures, and per-point abundances capture their distribution. For material segmentation, we use spectral signature predictions along learned endmembers, allowing unsupervised material clustering. Additionally, UnMix-NeRF enables scene editing by modifying learned endmember dictionaries for flexible material-based appearance manipulation. Extensive experiments validate our approach, demonstrating superior spectral reconstruction and material segmentation to existing methods. Project page: https://www.factral.co/UnMix-NeRF.

nan


Article 1132

Title@2025-06-27 (5): Advances in Temporal Point Processes: Bayesian, Neural, and LLM Approaches

Title: Advances in Temporal Point Processes: Bayesian, Neural, and LLM Approaches Fortschritte in Temporal Point Prozessen: Bayesian, Neural, und LLM Ansätze Bayesian、Neural和LLM方法等时点工艺的进展 2501.14291v2

Authors (6): Feng Zhou, Quyu Kong, Jie Qiao, Cheng Wan, Yixuan Zhang, Ruichu Cai

Temporal point processes (TPPs) are stochastic process models used to characterize event sequences occurring in continuous time. Traditional statistical TPPs have a long-standing history, with numerous models proposed and successfully applied across diverse domains. In recent years, advances in deep learning have spurred the development of neural TPPs, enabling greater flexibility and expressiveness in capturing complex temporal dynamics. The emergence of large language models (LLMs) has further sparked excitement, offering new possibilities for modeling and analyzing event sequences by leveraging their rich contextual understanding. This survey presents a comprehensive review of recent research on TPPs from three perspectives: Bayesian, deep learning, and LLM approaches. We begin with a review of the fundamental concepts of TPPs, followed by an in-depth discussion of model design and parameter estimation techniques in these three frameworks. We also revisit classic application areas of TPPs to highlight their practical relevance. Finally, we outline challenges and promising directions for future research.

nan


Article 1133

Title@2025-06-27 (5): A Survey of Continual Reinforcement Learning

Title: A Survey of Continual Reinforcement Learning Ein Überblick über kontinuierliches Weiterbildungslernen 持续加强学习调查 2506.21872v1

Authors (7): Chaofan Pan, Xin Yang, Yanhua Li, Wei Wei, Tianrui Li, Bo An, Jiye Liang

Reinforcement Learning (RL) is an important machine learning paradigm for solving sequential decision-making problems. Recent years have witnessed remarkable progress in this field due to the rapid development of deep neural networks. However, the success of RL currently relies on extensive training data and computational resources. In addition, RL’s limited ability to generalize across tasks restricts its applicability in dynamic and real-world environments. With the arisen of Continual Learning (CL), Continual Reinforcement Learning (CRL) has emerged as a promising research direction to address these limitations by enabling agents to learn continuously, adapt to new tasks, and retain previously acquired knowledge. In this survey, we provide a comprehensive examination of CRL, focusing on its core concepts, challenges, and methodologies. Firstly, we conduct a detailed review of existing works, organizing and analyzing their metrics, tasks, benchmarks, and scenario settings. Secondly, we propose a new taxonomy of CRL methods, categorizing them into four types from the perspective of knowledge storage and/or transfer. Finally, our analysis highlights the unique challenges of CRL and provides practical insights into future directions.

nan


Article 1134

Title@2025-06-27 (5): SPADE: Spatial Transcriptomics and Pathology Alignment Using a Mixture of Data Experts for an Expressive Latent Space

Title: SPADE: Spatial Transcriptomics and Pathology Alignment Using a Mixture of Data Experts for an Expressive Latent Space SPADE: Raum-Transkriptomik und Pathologie-Ausrichtung mit einer Mischung von Datenexperten für einen Expressiven Latent Space SPADE: 利用数据专家混合模型进行空间转基因学和病理学调整 2506.21857v1

Authors (7): Ekaterina Redekop, Mara Pleasure, Zichen Wang, Kimberly Flores, Anthony Sisk, William Speier, Corey W. Arnold

The rapid growth of digital pathology and advances in self-supervised deep learning have enabled the development of foundational models for various pathology tasks across diverse diseases. While multimodal approaches integrating diverse data sources have emerged, a critical gap remains in the comprehensive integration of whole-slide images (WSIs) with spatial transcriptomics (ST), which is crucial for capturing critical molecular heterogeneity beyond standard hematoxylin & eosin (H&E) staining. We introduce SPADE, a foundation model that integrates histopathology with ST data to guide image representation learning within a unified framework, in effect creating an ST-informed latent space. SPADE leverages a mixture-of-data experts technique, where experts, created via two-stage feature-space clustering, use contrastive learning to learn representations of co-registered WSI patches and gene expression profiles. Pre-trained on the comprehensive HEST-1k dataset, SPADE is evaluated on 14 downstream tasks, demonstrating significantly superior few-shot performance compared to baseline models, highlighting the benefits of integrating morphological and molecular information into one latent space.

nan


Article 1135

Title@2025-06-27 (5): Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation

Title: Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation Leveraging Online-Olympiade-Level-Mathematik Probleme für LLMs Training und Kontaminierung-Resistent Evaluation 利用在线奥林匹克层面的数学问题促进LLM女士的培训和污染 – – 评估 2501.14275v2

Authors (6): Sadegh Mahdavi, Muchen Li, Kaiwen Liu, Christos Thrampoulidis, Leonid Sigal, Renjie Liao

Advances in Large Language Models (LLMs) have sparked interest in their ability to solve Olympiad-level math problems. However, the training and evaluation of these models are constrained by the limited size and quality of available datasets, as creating large-scale data for such advanced problems requires extensive effort from human experts. In addition, current benchmarks are prone to contamination, leading to unreliable evaluations. In this paper, we present an automated pipeline that leverages the rich resources of the Art of Problem Solving (AoPS) forum, which predominantly features Olympiad-level problems and community-driven solutions. Using open-source LLMs, we develop a method to extract question-answer pairs from the forum, resulting in AoPS-Instruct, a dataset of more than 600,000 high-quality QA pairs. Our experiments demonstrate that fine-tuning LLMs on AoPS-Instruct improves their reasoning abilities across various benchmarks. Moreover, we build an automatic pipeline that introduces LiveAoPSBench, an evolving evaluation set with timestamps, derived from the latest forum data, providing a contamination-resistant benchmark for assessing LLM performance. Notably, we observe a significant decline in LLM performance over time, suggesting their success on older examples may stem from pre-training exposure rather than true reasoning ability. Our work presents a scalable approach to creating and maintaining large-scale, high-quality datasets for advanced math reasoning, offering valuable insights into the capabilities and limitations of LLMs in this domain. Our benchmark and code is available at https://github.com/DSL-Lab/aops

nan


Article 1136

Title@2025-06-27 (5): Unveiling the Power of Noise Priors: Enhancing Diffusion Models for Mobile Traffic Prediction

Title: Unveiling the Power of Noise Priors: Enhancing Diffusion Models for Mobile Traffic Prediction Die Kraft der Lärmvorhersage enthüllen: Diffusionsmodelle für mobile Verkehrsvorhersage verbessern 打破噪音前置物的威力:加强移动交通预测的传播模式 2501.13794v3

Authors (4): Zhi Sheng, Daisy Yuan, Jingtao Ding, Yong Li

Accurate prediction of mobile traffic, i.e., network traffic from cellular base stations, is crucial for optimizing network performance and supporting urban development. However, the non-stationary nature of mobile traffic, driven by human activity and environmental changes, leads to both regular patterns and abrupt variations. Diffusion models excel in capturing such complex temporal dynamics due to their ability to capture the inherent uncertainties. Most existing approaches prioritize designing novel denoising networks but often neglect the critical role of noise itself, potentially leading to sub-optimal performance. In this paper, we introduce a novel perspective by emphasizing the role of noise in the denoising process. Our analysis reveals that noise fundamentally shapes mobile traffic predictions, exhibiting distinct and consistent patterns. We propose NPDiff, a framework that decomposes noise into prior and residual components, with the prior} derived from data dynamics, enhancing the model’s ability to capture both regular and abrupt variations. NPDiff can seamlessly integrate with various diffusion-based prediction models, delivering predictions that are effective, efficient, and robust. Extensive experiments demonstrate that it achieves superior performance with an improvement over 30\%, offering a new perspective on leveraging diffusion models in this domain. We provide code and data at https://github.com/tsinghua-fib-lab/NPDiff.

nan


Article 1137

Title@2025-06-27 (5): The Consistency Hypothesis in Uncertainty Quantification for Large Language Models

Title: The Consistency Hypothesis in Uncertainty Quantification for Large Language Models Die Kohärenzhypothese in der Unsicherheitsquantifizierung für große Sprachmodelle 《大语言模型不确定性量化不确定性的一致假设》 2506.21849v1

Authors (8): Quan Xiao, Debarun Bhattacharjya, Balaji Ganesan, Radu Marinescu, Katsiaryna Mirylenka, Nhan H Pham, Michael Glass, Junkyu Lee

Estimating the confidence of large language model (LLM) outputs is essential for real-world applications requiring high user trust. Black-box uncertainty quantification (UQ) methods, relying solely on model API access, have gained popularity due to their practical benefits. In this paper, we examine the implicit assumption behind several UQ methods, which use generation consistency as a proxy for confidence, an idea we formalize as the consistency hypothesis. We introduce three mathematical statements with corresponding statistical tests to capture variations of this hypothesis and metrics to evaluate LLM output conformity across tasks. Our empirical investigation, spanning 8 benchmark datasets and 3 tasks (question answering, text summarization, and text-to-SQL), highlights the prevalence of the hypothesis under different settings. Among the statements, we highlight the `Sim-Any’ hypothesis as the most actionable, and demonstrate how it can be leveraged by proposing data-free black-box UQ methods that aggregate similarities between generations for confidence estimation. These approaches can outperform the closest baselines, showcasing the practical value of the empirically observed consistency hypothesis.

nan


Article 1138

Title@2025-06-27 (5): QT-DoG: Quantization-aware Training for Domain Generalization

Title: QT-DoG: Quantization-aware Training for Domain Generalization QT-DOG: Quantization-aware Training für Domain Generalization QT-DoG: 区域通用化的量化了解培训 2410.06020v2

Authors (3): Saqib Javed, Hieu Le, Mathieu Salzmann

A key challenge in Domain Generalization (DG) is preventing overfitting to source domains, which can be mitigated by finding flatter minima in the loss landscape. In this work, we propose Quantization-aware Training for Domain Generalization (QT-DoG) and demonstrate that weight quantization effectively leads to flatter minima in the loss landscape, thereby enhancing domain generalization. Unlike traditional quantization methods focused on model compression, QT-DoG exploits quantization as an implicit regularizer by inducing noise in model weights, guiding the optimization process toward flatter minima that are less sensitive to perturbations and overfitting. We provide both an analytical perspective and empirical evidence demonstrating that quantization inherently encourages flatter minima, leading to better generalization across domains. Moreover, with the benefit of reducing the model size through quantization, we demonstrate that an ensemble of multiple quantized models further yields superior accuracy than the state-of-the-art DG approaches with no computational or memory overheads. Code is released at: https://saqibjaved1.github.io/QT_DoG/.

nan


Article 1139

Title@2025-06-27 (5): Koopman operator-based discussion on partial observation in stochastic systems

Title: Koopman operator-based discussion on partial observation in stochastic systems Koopman operator-based discussion on partial observation in stochastic systems Koopman操作员关于抽查系统中部分观察的讨论 2506.21844v1

Authors (1): Jun Ohkubo

It is sometimes difficult to achieve a complete observation for a full set of observables, and partial observations are necessary. For deterministic systems, the Mori-Zwanzig formalism provides a theoretical framework for handling partial observations. Recently, data-driven algorithms based on the Koopman operator theory have made significant progress, and there is a discussion to connect the Mori-Zwanzig formalism with the Koopman operator theory. In this work, we discuss the effects of partial observation in stochastic systems using the Koopman operator theory. The discussion clarifies the importance of distinguishing the state space and the function space in stochastic systems. Even in stochastic systems, the delay embedding technique is beneficial for partial observation, and several numerical experiments showed a power-law behavior of the accuracy for the amplitude of the additive noise. We also discuss the relation between the exponent of the power-law behavior and the effects of partial observation.

nan


Article 1140

Title@2025-06-27 (5): Adversarial Threats in Quantum Machine Learning: A Survey of Attacks and Defenses

Title: Adversarial Threats in Quantum Machine Learning: A Survey of Attacks and Defenses Widersprüchliche Bedrohungen im Quantum Machine Learning: Eine Übersicht über Angriffe und Verteidigungen 量子机器学习中的反对威胁:攻击和防御调查 2506.21842v1

Authors (3): Archisman Ghosh, Satwik Kundu, Swaroop Ghosh

Quantum Machine Learning (QML) integrates quantum computing with classical machine learning, primarily to solve classification, regression and generative tasks. However, its rapid development raises critical security challenges in the Noisy Intermediate-Scale Quantum (NISQ) era. This chapter examines adversarial threats unique to QML systems, focusing on vulnerabilities in cloud-based deployments, hybrid architectures, and quantum generative models. Key attack vectors include model stealing via transpilation or output extraction, data poisoning through quantum-specific perturbations, reverse engineering of proprietary variational quantum circuits, and backdoor attacks. Adversaries exploit noise-prone quantum hardware and insufficiently secured QML-as-a-Service (QMLaaS) workflows to compromise model integrity, ownership, and functionality. Defense mechanisms leverage quantum properties to counter these threats. Noise signatures from training hardware act as non-invasive watermarks, while hardware-aware obfuscation techniques and ensemble strategies disrupt cloning attempts. Emerging solutions also adapt classical adversarial training and differential privacy to quantum settings, addressing vulnerabilities in quantum neural networks and generative architectures. However, securing QML requires addressing open challenges such as balancing noise levels for reliability and security, mitigating cross-platform attacks, and developing quantum-classical trust frameworks. This chapter summarizes recent advances in attacks and defenses, offering a roadmap for researchers and practitioners to build robust, trustworthy QML systems resilient to evolving adversarial landscapes.

nan


Article 1141

Title@2025-06-27 (5): The Cost of Avoiding Backpropagation

Title: The Cost of Avoiding Backpropagation Die Kosten der Vermeidung von Rückvermehrung 避免反言的成本 2506.21833v1

Authors (4): Kunjal Panchal, Sunav Choudhary, Yuriy Brun, Hui Guan

Forward-mode automatic differentiation (FmAD) and zero-order (ZO) optimization have been proposed as memory-efficient alternatives to backpropagation (BP) for gradient computation, especially in low-resource settings. However, their practical benefits remain unclear due to two key gaps: a lack of comparison against memory-efficient BP variants, such as activation checkpointing, and a lack of a unified theoretical analysis. This work presents a comprehensive theoretical and empirical comparison of BP, FmAD, and ZO methods. Our theoretical analysis shows that while FmAD, and ZO can reduce memory usage, they incur significant costs in accuracy, convergence speed, and computation compared to BP with checkpointing. These drawbacks worsen with larger models or constrained perturbation budgets. Empirical experiments on large language and vision-language models show that BP with checkpointing outperforms FmAD and ZO variants, including those enhanced with variance reduction, achieving up to 31.1% higher accuracy, 34.8% faster convergence, and 3.8x fewer computations at comparable memory usage. Our results highlight fundamental limitations of FmAD and ZO, and reaffirm BP with checkpointing as the most effective strategy for model training under memory-constrained settings. Our code is available at https://github.com/Astuary/The_Cost_of_Avoiding_Backpropagation.

nan


Article 1142

Title@2025-06-27 (5): Computational Efficient and Minimax Optimal Nonignorable Matrix Completion

Title: Computational Efficient and Minimax Optimal Nonignorable Matrix Completion Computational Efficient und Minimax Optimale nicht entzündbare Matrix-Vervollständigung 计算效率高和最小最大最佳不皮不皮的矩阵补全 2504.04016v2

Authors (4): Yuanhong A, Guoyu Zhang, Yongcheng Zeng, Bo Zhang

While the matrix completion problem has attracted considerable attention over the decades, few works address the nonignorable missing issue and all have their limitations. In this article, we propose a nuclear norm regularized row- and column-wise matrix U-statistic loss function for the generalized nonignorable missing mechanism, a flexible and generally applicable missing mechanism which contains both ignorable and nonignorable missing mechanism assumptions. The proposed method achieves computational efficiency comparable to the existing missing-at-random approaches, while providing the near minimax optimal statistical convergence rate guarantees for the more general nonignorable missing case. We propose an accelerated proximal gradient algorithm to solve the associated optimization problem, and characterize the interaction between algorithmic and statistical convergence. Simulations and real data analyzes further support the practical utility of the proposed method.

nan


Article 1143

Title@2025-06-27 (5): RLSF: Fine-tuning LLMs via Symbolic Feedback

Title: RLSF: Fine-tuning LLMs via Symbolic Feedback RLSF: Feinjustierende LLMs über symbolisches Feedback RLSF:通过符号反馈对LLMs进行微调 2405.16661v3

Authors (5): Piyush Jha, Prithwish Jana, Pranavkrishna Suresh, Arnav Arora, Vijay Ganesh

Large Language Models (LLMs) have transformed AI but often struggle with tasks that require domain-specific reasoning and logical alignment. Traditional fine-tuning methods do not leverage the vast amount of symbolic domain-knowledge available to us via symbolic reasoning tools (e.g., provers), and are further limited by sparse rewards and unreliable reward models. We introduce Reinforcement Learning via Symbolic Feedback (RLSF), a novel fine-tuning paradigm where symbolic reasoning tools (e.g., solvers, provers, and algebra systems) provide fine-grained feedback to LLMs. RLSF uses poly-sized certificates (e.g., proofs) generated by symbolic tools to identify and correct errors in model outputs, offering token-level guidance without requiring differentiable reasoning systems. This paradigm bridges the gap between symbolic reasoning and LLM fine-tuning, enabling precise alignment with domain-specific constraints while addressing key limitations of traditional reward signals. Via extensive evaluations, we show that our RLSF-based fine-tuning of LLMs outperforms traditional approaches on five different applications (that have some associated logical or domain constraints), namely, program synthesis from natural language pseudo-code to programming language, three chemistry tasks, and solving the Game of 24. A key takeaway is that fine-tuning via RLSF enables relatively smaller LLMs to significantly outperform closed-source models that are orders of magnitude larger.

nan


Article 1144

Title@2025-06-27 (5): Fetal Sleep: A Cross-Species Review of Physiology, Measurement, and Classification

Title: Fetal Sleep: A Cross-Species Review of Physiology, Measurement, and Classification Fetaler Schlaf: Eine Cross-Spezies-Überprüfung der Physiologie, Messung und Klassifizierung 胎儿睡眠:对生理、测量和分类的跨物种审查 2506.21828v1

Authors (6): Weitao Tang, Johann Vargas-Calixto, Nasim Katebi, Robert Galinsky, Gari D. Clifford, Faezeh Marzbanrad

Fetal sleep is a relatively underexplored yet vital aspect of prenatal neurodevelopment. Understanding fetal sleep patterns could provide insights into early brain maturation and help clinicians detect signs of neurological compromise that arise due to fetal hypoxia or fetal growth restriction. This review synthesizes over eight decades of research on the physiological characteristics, ontogeny, and regulation of fetal sleep. We compare sleep-state patterns in humans and large animal models, highlighting species-specific differences and the presence of sleep-state analogs. We review both invasive techniques in animals and non-invasive modalities in humans. Computational methods for sleep-state classification are also examined, including rule-based approaches (with and without clustering-based preprocessing) and state-of-the-art deep learning techniques. Finally, we discuss how intrauterine conditions such as hypoxia and fetal growth restriction can disrupt fetal sleep. This review provides a comprehensive foundation for the development of objective, multimodal, and non-invasive fetal sleep monitoring technologies to support early diagnosis and intervention in prenatal care.

nan


Article 1145

Title@2025-06-27 (5): Few-Shot Segmentation of Historical Maps via Linear Probing of Vision Foundation Models

Title: Few-Shot Segmentation of Historical Maps via Linear Probing of Vision Foundation Models Wenig scharfe Segmentierung historischer Karten über Linear Probing of Vision Foundation Models 通过视觉基金会模型的线性考察对历史地图进行几小片段分割 2506.21826v1

Authors (3): Rafael Sterzinger, Marco Peer, Robert Sablatnig

As rich sources of history, maps provide crucial insights into historical changes, yet their diverse visual representations and limited annotated data pose significant challenges for automated processing. We propose a simple yet effective approach for few-shot segmentation of historical maps, leveraging the rich semantic embeddings of large vision foundation models combined with parameter-efficient fine-tuning. Our method outperforms the state-of-the-art on the Siegfried benchmark dataset in vineyard and railway segmentation, achieving +5% and +13% relative improvements in mIoU in 10-shot scenarios and around +20% in the more challenging 5-shot setting. Additionally, it demonstrates strong performance on the ICDAR 2021 competition dataset, attaining a mean PQ of 67.3% for building block segmentation, despite not being optimized for this shape-sensitive metric, underscoring its generalizability. Notably, our approach maintains high performance even in extremely low-data regimes (10- & 5-shot), while requiring only 689k trainable parameters - just 0.21% of the total model size. Our approach enables precise segmentation of diverse historical maps while drastically reducing the need for manual annotations, advancing automated processing and analysis in the field. Our implementation is publicly available at: https://github.com/RafaelSterzinger/few-shot-map-segmentation.

nan


Article 1146

Title@2025-06-26 (4): Mathematical Modeling of Protein Structures: A Cohomology-Based Approach to the Flagellar Motor

Title: Mathematical Modeling of Protein Structures: A Cohomology-Based Approach to the Flagellar Motor Mathematische Modellierung von Proteinstrukturen: Ein auf Kohomologie basierender Ansatz zum Flagellar Motor 蛋白质结构的数学建模:对旗舰汽车采取基于共同地质学的方法 2504.16941v2

Authors (3): Zakaria Lamine, Abdelatif Hafid, Mohamed Rahouti

This study presents a novel mathematical model derived from cohomology, leveraging the KEEL-proven theorem that establishes cohomology as tautological, generated by boundary classes of curves with fixed dual graphs. Simplicial complexes are constructed using skew-commutative graded algebra, and the structure theorem is applied to connect distinct homologies, enabling precise interpretations of the resulting geometric forms. The proposed model is utilized for protein structure analysis and prediction, with a specific application to the Flagellar Motor structure. This approach offers new insights into the geometric and algebraic foundations of biological macromolecular modeling, highlighting its potential for advancement in structural biology.

nan


Article 1147

Title@2025-06-26 (4): CAT-SG: A Large Dynamic Scene Graph Dataset for Fine-Grained Understanding of Cataract Surgery

Title: CAT-SG: A Large Dynamic Scene Graph Dataset for Fine-Grained Understanding of Cataract Surgery CAT-SG: Ein großer dynamischer Szenendiagramm-Datensatz für ein feinkörniges Verständnis der Kataraktchirurgie CAT-SG:用于精确了解白内障外科手术的大型动态场景图数据集 2506.21813v1

Authors (4): Felix Holm, Gözde Ünver, Ghazal Ghazaei, Nassir Navab

Understanding the intricate workflows of cataract surgery requires modeling complex interactions between surgical tools, anatomical structures, and procedural techniques. Existing datasets primarily address isolated aspects of surgical analysis, such as tool detection or phase segmentation, but lack comprehensive representations that capture the semantic relationships between entities over time. This paper introduces the Cataract Surgery Scene Graph (CAT-SG) dataset, the first to provide structured annotations of tool-tissue interactions, procedural variations, and temporal dependencies. By incorporating detailed semantic relations, CAT-SG offers a holistic view of surgical workflows, enabling more accurate recognition of surgical phases and techniques. Additionally, we present a novel scene graph generation model, CatSGG, which outperforms current methods in generating structured surgical representations. The CAT-SG dataset is designed to enhance AI-driven surgical training, real-time decision support, and workflow analysis, paving the way for more intelligent, context-aware systems in clinical practice.

nan


Article 1148

Title@2025-06-26 (4): Classification with Reject Option: Distribution-free Error Guarantees via Conformal Prediction

Title: Classification with Reject Option: Distribution-free Error Guarantees via Conformal Prediction Klassifizierung mit Ablehnungsoption: Verteilungsfreie Fehlergarantien durch konforme Vorhersage 采用拒绝分类办法:通过非正规预测保证无分发错误 2506.21802v1

Authors (6): Johan Hallberg Szabadváry, Tuwe Löfström, Ulf Johansson, Cecilia Sönströd, Ernst Ahlberg, Lars Carlsson

Machine learning (ML) models always make a prediction, even when they are likely to be wrong. This causes problems in practical applications, as we do not know if we should trust a prediction. ML with reject option addresses this issue by abstaining from making a prediction if it is likely to be incorrect. In this work, we formalise the approach to ML with reject option in binary classification, deriving theoretical guarantees on the resulting error rate. This is achieved through conformal prediction (CP), which produce prediction sets with distribution-free validity guarantees. In binary classification, CP can output prediction sets containing exactly one, two or no labels. By accepting only the singleton predictions, we turn CP into a binary classifier with reject option. Here, CP is formally put in the framework of predicting with reject option. We state and prove the resulting error rate, and give finite sample estimates. Numerical examples provide illustrations of derived error rate through several different conformal prediction settings, ranging from full conformal prediction to offline batch inductive conformal prediction. The former has a direct link to sharp validity guarantees, whereas the latter is more fuzzy in terms of validity guarantees but can be used in practice. Error-reject curves illustrate the trade-off between error rate and reject rate, and can serve to aid a user to set an acceptable error rate or reject rate in practice.

nan


Article 1149

Title@2025-06-26 (4): Offensive Language Detection on Social Media Using XLNet

Title: Offensive Language Detection on Social Media Using XLNet Offensive Spracherkennung auf Social Media mit XLNet 使用XLNet在社交媒体上发现攻击性语言 2506.21795v1

Authors (3): Reem Alothman, Hafida Benhidour, Said Kerrache

The widespread use of text-based communication on social media-through chats, comments, and microblogs-has improved user interaction but has also led to an increase in offensive content, including hate speech, racism, and other forms of abuse. Due to the enormous volume of user-generated content, manual moderation is impractical, which creates a need for automated systems that can detect offensive language. Deep learning models, particularly those using transfer learning, have demonstrated significant success in understanding natural language through large-scale pretraining. In this study, we propose an automatic offensive language detection model based on XLNet, a generalized autoregressive pretraining method, and compare its performance with BERT (Bidirectional Encoder Representations from Transformers), which is a widely used baseline in natural language processing (NLP). Both models are evaluated using the Offensive Language Identification Dataset (OLID), a benchmark Twitter dataset that includes hierarchical annotations. Our experimental results show that XLNet outperforms BERT in detecting offensive content and in categorizing the types of offenses, while BERT performs slightly better in identifying the targets of the offenses. Additionally, we find that oversampling and undersampling strategies are effective in addressing class imbalance and improving classification performance. These findings highlight the potential of transfer learning and XLNet-based architectures to create robust systems for detecting offensive language on social media platforms.

nan


Article 1150

Title@2025-06-26 (4): Multi-task parallelism for robust pre-training of graph foundation models on multi-source, multi-fidelity atomistic modeling data

Title: Multi-task parallelism for robust pre-training of graph foundation models on multi-source, multi-fidelity atomistic modeling data Multi-Task-Parallelismus für robustes Pre-Training von Graphen-Grundmodellen auf Multi-Source-, Multi-Fidelity atomistic Modellierungsdaten 为多源、多信仰、多宗教、原子模型化数据图像基础模型模型的可靠培训前培训提供多任务平行的强有力任务 2506.21788v1

Authors (9): Massimiliano Lupo Pasini, Jong Youl Choi, Pei Zhang, Kshitij Mehta, Rylie Weaver, Ashwin M. Aji, Karl W. Schulz, Jorda Polo, Prasanna Balaprakash

Graph foundation models using graph neural networks promise sustainable, efficient atomistic modeling. To tackle challenges of processing multi-source, multi-fidelity data during pre-training, recent studies employ multi-task learning, in which shared message passing layers initially process input atomistic structures regardless of source, then route them to multiple decoding heads that predict data-specific outputs. This approach stabilizes pre-training and enhances a model’s transferability to unexplored chemical regions. Preliminary results on approximately four million structures are encouraging, yet questions remain about generalizability to larger, more diverse datasets and scalability on supercomputers. We propose a multi-task parallelism method that distributes each head across computing resources with GPU acceleration. Implemented in the open-source HydraGNN architecture, our method was trained on over 24 million structures from five datasets and tested on the Perlmutter, Aurora, and Frontier supercomputers, demonstrating efficient scaling on all three highly heterogeneous super-computing architectures.

nan


Article 1151

Title@2025-06-26 (4): Estimating Correctness Without Oracles in LLM-Based Code Generation

Title: Estimating Correctness Without Oracles in LLM-Based Code Generation Schätzung der Korrektheit ohne Oracles in der LLM-basierten Code-Generierung 在基于LLM的代码生成中估算无甲骨文的正确性 2507.00057v1

Authors (4): Thomas Valentin, Ardi Madadi, Gaetano Sapia, Marcel Böhme

Generating code from natural language specifications is one of the most successful applications of Large Language Models (LLMs). Yet, they hallucinate: LLMs produce outputs that may be grammatically correct but are factually incorrect. Without an existing, correct implementation (i.e., an oracle), can we quantify how likely the generated program is correct? In this paper, we propose a measure of incorrectness, called incoherence, that can be estimated efficiently in the absence of an oracle and provides a lower bound on the error, i.e., the probability that the LLM-generated program for that specification is incorrect. Our experiments demonstrate an extraordinary effectiveness. For the average code generation task, our incoherence-based methodology can automatically identify about two-thirds of incorrect programs without reports of false positives. In fact, an oracle-based evaluation of LLMs can be reliably replaced by an incoherence-based evaluation. In particular, we find a very strong agreement between the ranking of LLMs by the number of programs deemed correct via an oracle (pass@1) and the ranking of LLMs by the number of programs deemed correct via our incoherence.

nan


Article 1152

Title@2025-06-26 (4): Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity

Title: Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity Sparse-Reg: Verbesserung der Probenkomplexität im Offline-Verstärkungs-Lernen mit Sparsity 利用公平性改进离线强化学习的抽样复杂性 2506.17155v2

Authors (3): Samin Yeasar Arnob, Scott Fujimoto, Doina Precup

In this paper, we investigate the use of small datasets in the context of offline reinforcement learning (RL). While many common offline RL benchmarks employ datasets with over a million data points, many offline RL applications rely on considerably smaller datasets. We show that offline RL algorithms can overfit on small datasets, resulting in poor performance. To address this challenge, we introduce “Sparse-Reg”: a regularization technique based on sparsity to mitigate overfitting in offline reinforcement learning, enabling effective learning in limited data settings and outperforming state-of-the-art baselines in continuous control.

nan


Article 1153

Title@2025-06-26 (4): Graph ODEs and Beyond: A Comprehensive Survey on Integrating Differential Equations with Graph Neural Networks

Title: Graph ODEs and Beyond: A Comprehensive Survey on Integrating Differential Equations with Graph Neural Networks Graph ODEs and Beyond: Eine umfassende Umfrage zur Integration von Differentialgleichungen mit Graph Neural Networks 图ODEs 及其他:关于将差异等同与图形神经网络相结合的综合调查 2503.23167v3

Authors (6): Zewen Liu, Xiaoda Wang, Bohan Wang, Zijie Huang, Carl Yang, Wei Jin

Graph Neural Networks (GNNs) and differential equations (DEs) are two rapidly advancing areas of research that have shown remarkable synergy in recent years. GNNs have emerged as powerful tools for learning on graph-structured data, while differential equations provide a principled framework for modeling continuous dynamics across time and space. The intersection of these fields has led to innovative approaches that leverage the strengths of both, enabling applications in physics-informed learning, spatiotemporal modeling, and scientific computing. This survey aims to provide a comprehensive overview of the burgeoning research at the intersection of GNNs and DEs. We will categorize existing methods, discuss their underlying principles, and highlight their applications across domains such as molecular modeling, traffic prediction, and epidemic spreading. Furthermore, we identify open challenges and outline future research directions to advance this interdisciplinary field. A comprehensive paper list is provided at https://github.com/Emory-Melody/Awesome-Graph-NDEs. This survey serves as a resource for researchers and practitioners seeking to understand and contribute to the fusion of GNNs and DEs

nan


Article 1154

Title@2025-06-26 (4): M3PO: Massively Multi-Task Model-Based Policy Optimization

Title: M3PO: Massively Multi-Task Model-Based Policy Optimization M3PO: Massive Multi-Task-Modellbasierte Politikoptimierung M3PO: 大规模多任务多任务模式政策优化 2506.21782v1

Authors (3): Aditya Narendra, Dmitry Makarov, Aleksandr Panov

We introduce Massively Multi-Task Model-Based Policy Optimization (M3PO), a scalable model-based reinforcement learning (MBRL) framework designed to address sample inefficiency in single-task settings and poor generalization in multi-task domains. Existing model-based approaches like DreamerV3 rely on pixel-level generative models that neglect control-centric representations, while model-free methods such as PPO suffer from high sample complexity and weak exploration. M3PO integrates an implicit world model, trained to predict task outcomes without observation reconstruction, with a hybrid exploration strategy that combines model-based planning and model-free uncertainty-driven bonuses. This eliminates the bias-variance trade-off in prior methods by using discrepancies between model-based and model-free value estimates to guide exploration, while maintaining stable policy updates through a trust-region optimizer. M3PO provides an efficient and robust alternative to existing model-based policy optimization approaches and achieves state-of-the-art performance across multiple benchmarks.

nan


Article 1155

Title@2025-06-26 (4): Multi-thresholding Good Arm Identification with Bandit Feedback

Title: Multi-thresholding Good Arm Identification with Bandit Feedback Multi-Thresholding Good Arm Identification mit Bandit Feedback 多保持好武器及强盗反馈 2503.10386v3

Authors (4): Xuanke Jiang, Sherief Hashima, Kohei Hatano, Eiji Takimoto

We consider a good arm identification problem in a stochastic bandit setting with multi-objectives, where each arm $i \in [K]$ is associated with a distribution $D_i$ defined over $R^M$. For each round $t$, the player pulls an arm $i_t$ and receives an $M$-dimensional reward vector sampled according to $D_{i_t}$. The goal is to find, with high probability, an $\epsilon$-good arm whose expected reward vector is larger than $\bm{\xi} - \epsilon \mathbf{1}$, where $\bm{\xi}$ is a predefined threshold vector, and the vector comparison is component-wise. We propose the Multi-Thresholding UCB~(MultiTUCB) algorithm with a sample complexity bound. Our bound matches the existing one in the special case where $M=1$ and $\epsilon=0$. The proposed algorithm demonstrates superior performance compared to baseline approaches across synthetic and real datasets.

nan


Article 1156

Title@2025-06-26 (4): Generative Data Mining with Longtail-Guided Diffusion

Title: Generative Data Mining with Longtail-Guided Diffusion Generatives Data Mining mit Langschwanz-geführter Diffusion 利用长尾指导传播生成数据采矿 2502.01980v2

Authors (9): David S. Hayden, Mao Ye, Timur Garipov, Gregory P. Meyer, Carl Vondrick, Zhao Chen, Yuning Chai, Eric Wolff, Siddhartha S. Srinivasa

It is difficult to anticipate the myriad challenges that a predictive model will encounter once deployed. Common practice entails a reactive, cyclical approach: model deployment, data mining, and retraining. We instead develop a proactive longtail discovery process by imagining additional data during training. In particular, we develop general model-based longtail signals, including a differentiable, single forward pass formulation of epistemic uncertainty that does not impact model parameters or predictive performance but can flag rare or hard inputs. We leverage these signals as guidance to generate additional training data from a latent diffusion model in a process we call Longtail Guidance (LTG). Crucially, we can perform LTG without retraining the diffusion model or the predictive model, and we do not need to expose the predictive model to intermediate diffusion states. Data generated by LTG exhibit semantically meaningful variation, yield significant generalization improvements on numerous image classification benchmarks, and can be analyzed by a VLM to proactively discover, textually explain, and address conceptual gaps in a deployed predictive model.

nan


Article 1157

Title@2025-06-26 (4): Gradient-Based Neuroplastic Adaptation for Concurrent Optimization of Neuro-Fuzzy Networks

Title: Gradient-Based Neuroplastic Adaptation for Concurrent Optimization of Neuro-Fuzzy Networks Gradient-Based Neuroplastic Adaptation für die gleichzeitige Optimierung von Neuro-Fuzzy-Netzwerken 为同时优化Neuro-Fuzzy网络而渐进式神经塑料适应 2506.21771v1

Authors (2): John Wesley Hostetter, Min Chi

Neuro-fuzzy networks (NFNs) are transparent, symbolic, and universal function approximations that perform as well as conventional neural architectures, but their knowledge is expressed as linguistic IF-THEN rules. Despite these advantages, their systematic design process remains a challenge. Existing work will often sequentially build NFNs by inefficiently isolating parametric and structural identification, leading to a premature commitment to brittle and subpar architecture. We propose a novel application-independent approach called gradient-based neuroplastic adaptation for the concurrent optimization of NFNs’ parameters and structure. By recognizing that NFNs’ parameters and structure should be optimized simultaneously as they are deeply conjoined, settings previously unapproachable for NFNs are now accessible, such as the online reinforcement learning of NFNs for vision-based tasks. The effectiveness of concurrently optimizing NFNs is empirically shown as it is trained by online reinforcement learning to proficiently play challenging scenarios from a vision-based video game called DOOM.

nan


Article 1158

Title@2025-06-26 (4): Early Glaucoma Detection using Deep Learning with Multiple Datasets of Fundus Images

Title: Early Glaucoma Detection using Deep Learning with Multiple Datasets of Fundus Images Frühe Glaukomerkennung mit Deep Learning mit mehreren Datensätzen von Fundus-Bildern 利用包含Fundus图像多数据集的深学习探测早期青光眼 2506.21770v1

Authors (2): Rishiraj Paul Chowdhury, Nirmit Shekar Karkera

Glaucoma is a leading cause of irreversible blindness, but early detection can significantly improve treatment outcomes. Traditional diagnostic methods are often invasive and require specialized equipment. In this work, we present a deep learning pipeline using the EfficientNet-B0 architecture for glaucoma detection from retinal fundus images. Unlike prior studies that rely on single datasets, we sequentially train and fine-tune our model across ACRIMA, ORIGA, and RIM-ONE datasets to enhance generalization. Our experiments show that minimal preprocessing yields higher AUC-ROC compared to more complex enhancements, and our model demonstrates strong discriminative performance on unseen datasets. The proposed pipeline offers a reproducible and scalable approach to early glaucoma detection, supporting its potential clinical utility.

nan


Article 1159

Title@2025-06-26 (4): Testing Causal Models with Hidden Variables in Polynomial Delay via Conditional Independencies

Title: Testing Causal Models with Hidden Variables in Polynomial Delay via Conditional Independencies Testen von Kausalmodellen mit versteckten Variablen in Polynom-Delay über bedingte Abhängigkeiten 测试因有条件独立导致多配偶拖延中隐藏变量的因果模型 2409.14593v2

Authors (4): Hyunchai Jeong, Adiba Ejaz, Jin Tian, Elias Bareinboim

Testing a hypothesized causal model against observational data is a key prerequisite for many causal inference tasks. A natural approach is to test whether the conditional independence relations (CIs) assumed in the model hold in the data. While a model can assume exponentially many CIs (with respect to the number of variables), testing all of them is both impractical and unnecessary. Causal graphs, which encode these CIs in polynomial space, give rise to local Markov properties that enable model testing with a significantly smaller subset of CIs. Model testing based on local properties requires an algorithm to list the relevant CIs. However, existing algorithms for realistic settings with hidden variables and non-parametric distributions can take exponential time to produce even a single CI constraint. In this paper, we introduce the c-component local Markov property (C-LMP) for causal graphs with hidden variables. Since C-LMP can still invoke an exponential number of CIs, we develop a polynomial delay algorithm to list these CIs in poly-time intervals. To our knowledge, this is the first algorithm that enables poly-delay testing of CIs in causal graphs with hidden variables against arbitrary data distributions. Experiments on real-world and synthetic data demonstrate the practicality of our algorithm.

nan


Article 1160

Title@2025-06-26 (4): Nested Stochastic Algorithm for Generalized Sinkhorn distance-Regularized Distributionally Robust Optimization

Title: Nested Stochastic Algorithm for Generalized Sinkhorn distance-Regularized Distributionally Robust Optimization Verschachtelter stochastischer Algorithmus für generalisierte Sinkhorndistanz-regularisierte distributiv robuste Optimierung 用于通用 Sinkhorn 远距离分解分布的内嵌斯托克算法优化 2503.22923v2

Authors (3): Yufeng Yang, Yi Zhou, Zhaosong Lu

Distributionally robust optimization (DRO) is a powerful technique to train robust models against data distribution shift. This paper aims to solve regularized nonconvex DRO problems, where the uncertainty set is modeled by a so-called generalized Sinkhorn distance and the loss function is nonconvex and possibly unbounded. Such a distance allows to model uncertainty of distributions with different probability supports and divergence functions. For this class of regularized DRO problems, we derive a novel dual formulation taking the form of nested stochastic optimization, where the dual variable depends on the data sample. To solve the dual problem, we provide theoretical evidence to design a nested stochastic gradient descent (SGD) algorithm, which leverages stochastic approximation to estimate the nested stochastic gradients. We study the convergence rate of nested SGD and establish polynomial iteration and sample complexities that are independent of the data size and parameter dimension, indicating its potential for solving large-scale DRO problems. We conduct numerical experiments to demonstrate the efficiency and robustness of the proposed algorithm.

nan


Article 1161

Title@2025-06-26 (4): VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data

Title: VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data VersaPRM: Multi-Domain-Prozess-Reward-Modell über synthetisch begründete Daten VersaPRM:通过合成理由数据进行多部进程回报模型 2502.06737v2

Authors (16): Thomas Zeng, Shuibai Zhang, Shutong Wu, Christian Classen, Daewon Chae, Ethan Ewer, Minjae Lee, Heeju Kim, Wonjun Kang, Jackson Kunde, Ying Fan, Jungtaek Kim, Hyung Il Koo, Kannan Ramchandran, Dimitris Papailiopoulos, Kangwook Lee

Process Reward Models (PRMs) have proven effective at enhancing mathematical reasoning for Large Language Models (LLMs) by leveraging increased inference-time computation. However, they are predominantly trained on mathematical data and their generalizability to non-mathematical domains has not been rigorously studied. In response, this work first shows that current PRMs have poor performance in other domains. To address this limitation, we introduce VersaPRM, a multi-domain PRM trained on synthetic reasoning data generated using our novel data generation and annotation method. VersaPRM achieves consistent performance gains across diverse domains. For instance, in the MMLU-Pro category of Law, VersaPRM via weighted majority voting, achieves a 7.9% performance gain over the majority voting baseline – surpassing Qwen2.5-Math-PRM’s gain of 1.3%. We further contribute to the community by open-sourcing all data, code and models for VersaPRM.

nan


Article 1162

Title@2025-06-26 (4): TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics

Title: TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics TADA: Verbessertes Diffusions-Sampling mit trainingsfreier Augmented Dynamics TAD: 改进无培训增强动力的传播抽样 2506.21757v1

Authors (6): Tianrong Chen, Huangjie Zheng, David Berthelot, Jiatao Gu, Josh Susskind, Shuangfei Zhai

Diffusion models have demonstrated exceptional capabilities in generating high-fidelity images but typically suffer from inefficient sampling. Many solver designs and noise scheduling strategies have been proposed to dramatically improve sampling speeds. In this paper, we introduce a new sampling method that is up to $186\%$ faster than the current state of the art solver for comparative FID on ImageNet512. This new sampling method is training-free and uses an ordinary differential equation (ODE) solver. The key to our method resides in using higher-dimensional initial noise, allowing to produce more detailed samples with less function evaluations from existing pretrained diffusion models. In addition, by design our solver allows to control the level of detail through a simple hyper-parameter at no extra computational cost. We present how our approach leverages momentum dynamics by establishing a fundamental equivalence between momentum diffusion models and conventional diffusion models with respect to their training paradigms. Moreover, we observe the use of higher-dimensional noise naturally exhibits characteristics similar to stochastic differential equations (SDEs). Finally, we demonstrate strong performances on a set of representative pretrained diffusion models, including EDM, EDM2, and Stable-Diffusion 3, which cover models in both pixel and latent spaces, as well as class and text conditional settings. The code is available at https://github.com/apple/ml-tada.

nan


Article 1163

Title@2025-06-26 (4): Beyond Conformal Predictors: Adaptive Conformal Inference with Confidence Predictors

Title: Beyond Conformal Predictors: Adaptive Conformal Inference with Confidence Predictors Beyond Conformal Predictors: Adaptive Conformal Inferenz mit Confidence Predictors 超越非正式预测者:与信心预测者发生适应性非正式推断 2409.15548v4

Authors (2): Johan Hallberg Szabadváry, Tuwe Löfström

Adaptive Conformal Inference (ACI) provides finite-sample coverage guarantees, enhancing the prediction reliability under non-exchangeability. This study demonstrates that these desirable properties of ACI do not require the use of Conformal Predictors (CP). We show that the guarantees hold for the broader class of confidence predictors, defined by the requirement of producing nested prediction sets, a property we argue is essential for meaningful confidence statements. We empirically investigate the performance of Non-Conformal Confidence Predictors (NCCP) against CP when used with ACI on non-exchangeable data. In online settings, the NCCP offers significant computational advantages while maintaining a comparable predictive efficiency. In batch settings, inductive NCCP (INCCP) can outperform inductive CP (ICP) by utilising the full training dataset without requiring a separate calibration set, leading to improved efficiency, particularly when the data are limited. Although these initial results highlight NCCP as a theoretically sound and practically effective alternative to CP for uncertainty quantification with ACI in non-exchangeable scenarios, further empirical studies are warranted across diverse datasets and predictors.

nan


Article 1164

Title@2025-06-26 (4): Inverse Design of Diffractive Metasurfaces Using Diffusion Models

Title: Inverse Design of Diffractive Metasurfaces Using Diffusion Models Inverses Design von diffraktiven Metaoberflächen unter Verwendung von Diffusionsmodellen 使用扩散模型反向设计 diffractive 元表层 2506.21748v1

Authors (5): Liav Hen, Erez Yosef, Dan Raviv, Raja Giryes, Jacob Scheuer

Metasurfaces are ultra-thin optical elements composed of engineered sub-wavelength structures that enable precise control of light. Their inverse design - determining a geometry that yields a desired optical response - is challenging due to the complex, nonlinear relationship between structure and optical properties. This often requires expert tuning, is prone to local minima, and involves significant computational overhead. In this work, we address these challenges by integrating the generative capabilities of diffusion models into computational design workflows. Using an RCWA simulator, we generate training data consisting of metasurface geometries and their corresponding far-field scattering patterns. We then train a conditional diffusion model to predict meta-atom geometry and height from a target spatial power distribution at a specified wavelength, sampled from a continuous supported band. Once trained, the model can generate metasurfaces with low error, either directly using RCWA-guided posterior sampling or by serving as an initializer for traditional optimization methods. We demonstrate our approach on the design of a spatially uniform intensity splitter and a polarization beam splitter, both produced with low error in under 30 minutes. To support further research in data-driven metasurface design, we publicly release our code and datasets.

nan


Article 1165

Title@2025-06-26 (4): Analysis of static and dynamic batching algorithms for graph neural networks

Title: Analysis of static and dynamic batching algorithms for graph neural networks Analyse von statischen und dynamischen Batching-Algorithmen für Graphen-Neuralnetzwerke 图表神经网络静态和动态批量算法分析 2502.00944v2

Authors (5): Daniel T. Speckhard, Tim Bechtel, Sebastian Kehl, Jonathan Godwin, Claudia Draxl

Graph neural networks (GNN) have shown promising results for several domains such as materials science, chemistry, and the social sciences. GNN models often contain millions of parameters, and like other neural network (NN) models, are often fed only a fraction of the graphs that make up the training dataset in batches to update model parameters. The effect of batching algorithms on training time and model performance has been thoroughly explored for NNs but not yet for GNNs. We analyze two different batching algorithms for graph based models, namely static and dynamic batching for two datasets, the QM9 dataset of small molecules and the AFLOW materials database. Our experiments show that changing the batching algorithm can provide up to a 2.7x speedup, but the fastest algorithm depends on the data, model, batch size, hardware, and number of training steps run. Experiments show that for a select number of combinations of batch size, dataset, and model, significant differences in model learning metrics are observed between static and dynamic batching algorithms.

nan


Article 1166

Title@2025-06-26 (4): Asymmetric Graph Error Control with Low Complexity in Causal Bandits

Title: Asymmetric Graph Error Control with Low Complexity in Causal Bandits Asymmetrische Graphenfehlerkontrolle mit geringer Komplexität bei Kausalbändern Causal 土匪中低复杂度的对称图形错误控制 2408.11240v2

Authors (3): Chen Peng, Di Zhang, Urbashi Mitra

In this paper, the causal bandit problem is investigated, with the objective of maximizing the long-term reward by selecting an optimal sequence of interventions on nodes in an unknown causal graph. It is assumed that both the causal topology and the distribution of interventions are unknown. First, based on the difference between the two types of graph identification errors (false positives and negatives), a causal graph learning method is proposed. Numerical results suggest that this method has a much lower sample complexity relative to the prior art by learning sub-graphs. However, we note that a sample complexity analysis for the new algorithm has not been undertaken, as of yet. Under the assumption of minimum-mean squared error weight estimation, a new uncertainty bound tailored to the causal bandit problem is derived. This uncertainty bound drives an upper confidence bound-based intervention selection to optimize the reward. Further, we consider a particular instance of non-stationary bandits wherein both the causal topology and interventional distributions can change. Our solution is the design of a sub-graph change detection mechanism that requires a modest number of samples. Numerical results compare the new methodology to existing schemes and show a substantial performance improvement in stationary and non-stationary settings. Averaged over 100 randomly generated causal bandits, the proposed scheme takes significantly fewer samples to learn the causal structure and achieves a reward gain of 85% compared to existing approaches.

nan


Article 1167

Title@2025-06-26 (4): Zebra: In-Context Generative Pretraining for Solving Parametric PDEs

Title: Zebra: In-Context Generative Pretraining for Solving Parametric PDEs Zebra: Im Kontext Generatives Vortraining zur Lösung parametrischer PDEs Zebra: 解决参数PDE的内源生成预设培训 2410.03437v3

Authors (5): Louis Serrano, Armand Kassaï Koupaï, Thomas X Wang, Pierre Erbacher, Patrick Gallinari

Solving time-dependent parametric partial differential equations (PDEs) is challenging for data-driven methods, as these models must adapt to variations in parameters such as coefficients, forcing terms, and initial conditions. State-of-the-art neural surrogates perform adaptation through gradient-based optimization and meta-learning to implicitly encode the variety of dynamics from observations. This often comes with increased inference complexity. Inspired by the in-context learning capabilities of large language models (LLMs), we introduce Zebra, a novel generative auto-regressive transformer designed to solve parametric PDEs without requiring gradient adaptation at inference. By leveraging in-context information during both pre-training and inference, Zebra dynamically adapts to new tasks by conditioning on input sequences that incorporate context example trajectories. As a generative model, Zebra can be used to generate new trajectories and allows quantifying the uncertainty of the predictions. We evaluate Zebra across a variety of challenging PDE scenarios, demonstrating its adaptability, robustness, and superior performance compared to existing approaches.

nan


Article 1168

Title@2025-06-26 (4): Federated Item Response Theory Models

Title: Federated Item Response Theory Models Föderated Item Response Theory Modelle 联邦项目反应理论模型 2506.21744v1

Authors (3): Biying Zhou, Nanyu Luo, Feng Ji

Item Response Theory (IRT) models have been widely used to estimate respondents’ latent abilities and calibrate items’ difficulty. Traditional IRT estimation requires all individual raw response data to be centralized in one place, thus potentially causing privacy issues. Federated learning is an emerging field in computer science and machine learning with added features of privacy protection and distributed computing. To integrate the advances from federated learning with modern psychometrics, we propose a novel framework, Federated Item Response Theory (IRT), to enable estimating traditional IRT models with additional privacy, allowing estimation in a distributed manner without losing estimation accuracy. Our numerical experiments confirm that FedIRT achieves statistical accuracy similar to standard IRT estimation using popular R packages, while offering critical advantages: privacy protection and reduced communication costs. We also validate FedIRT’s utility through a real-world exam dataset, demonstrating its effectiveness in realistic educational contexts. This new framework extends IRT’s applicability to distributed settings, such as multi-school assessments, without sacrificing accuracy or security. To support practical adoption, we provide an open-ource R package, FedIRT, implementing the framework for the two-parameter logistic (2PL) and partial credit models (PCM).

nan


Article 1169

Title@2025-06-26 (4): Storm Surge in Color: RGB-Encoded Physics-Aware Deep Learning for Storm Surge Forecasting

Title: Storm Surge in Color: RGB-Encoded Physics-Aware Deep Learning for Storm Surge Forecasting Storm Surge in Farbe: RGB-kodierte Physik-Bewusstes Lernen für Storm Surge Forecasting 彩色暴风雨: RGB-Encodd 物理软件深入学习以预测风暴暴雨。 2506.21743v1

Authors (5): Jinpai Zhao, Albert Cerrone, Eirik Valseth, Leendert Westerink, Clint Dawson

Storm surge forecasting plays a crucial role in coastal disaster preparedness, yet existing machine learning approaches often suffer from limited spatial resolution, reliance on coastal station data, and poor generalization. Moreover, many prior models operate directly on unstructured spatial data, making them incompatible with modern deep learning architectures. In this work, we introduce a novel approach that projects unstructured water elevation fields onto structured Red Green Blue (RGB)-encoded image representations, enabling the application of Convolutional Long Short Term Memory (ConvLSTM) networks for end-to-end spatiotemporal surge forecasting. Our model further integrates ground-truth wind fields as dynamic conditioning signals and topo-bathymetry as a static input, capturing physically meaningful drivers of surge evolution. Evaluated on a large-scale dataset of synthetic storms in the Gulf of Mexico, our method demonstrates robust 48-hour forecasting performance across multiple regions along the Texas coast and exhibits strong spatial extensibility to other coastal areas. By combining structured representation, physically grounded forcings, and scalable deep learning, this study advances the frontier of storm surge forecasting in usability, adaptability, and interpretability.

nan


Article 1170

Title@2025-06-26 (4): Critically-Damped Higher-Order Langevin Dynamics

Title: Critically-Damped Higher-Order Langevin Dynamics Critically-Damped Higher-Order Langevin Dynamics 高分朗埃文动力公司 2506.21741v1

Authors (3): Benjamin Sterling, Chad Gueli, Mónica F. Bugallo

Denoising Diffusion Probabilistic Models represent an entirely new class of generative AI methods that have yet to be fully explored. Critical damping has been successfully introduced in Critically-Damped Langevin Dynamics (CLD) and Critically-Damped Third-Order Langevin Dynamics (TOLD++), but has not yet been applied to dynamics of arbitrary order. The proposed line of work generalizes Higher-Order Langevin Dynamics (HOLD), a recent state-of-the-art diffusion method, by introducing the concept of critical damping from systems analysis.

nan


Article 1171

Title@2025-06-26 (4): Modification of a Numerical Method Using FIR Filters in a Time-dependent SIR Model for COVID-19

Title: Modification of a Numerical Method Using FIR Filters in a Time-dependent SIR Model for COVID-19 Änderung einer numerischen Methode mit FIR-Filtern in einem zeitabhängigen SIR-Modell für COVID-19 修改CCOVID-19基于时间的SIR模型中使用FIR过滤器的数字方法 2506.21739v1

Authors (2): Felipe Rogério Pimentel, Rafael Gustavo Alves

Authors Yi-Cheng Chen, Ping-En Lu, Cheng-Shang Chang, and Tzu-Hsuan Liu use the Finite Impulse Response (FIR) linear system filtering method to track and predict the number of people infected and recovered from COVID-19, in a pandemic context in which there was still no vaccine and the only way to avoid contagion was isolation. To estimate the coefficients of these FIR filters, Chen et al. used machine learning methods through a classical optimization problem with regularization (ridge regression). These estimated coefficients are called ridge coefficients. The epidemic mathematical model adopted by these researchers to formulate the FIR filters is the time-dependent discrete SIR. In this paper, we propose a small modification to the algorithm of Chen et al. to obtain the ridge coefficients. We then used this modified algorithm to track and predict the number of people infected and recovered from COVID-19 in the state of Minas Gerais/Brazil, within a prediction window, during the initial period of the pandemic. We also compare the predicted data with the respective real data to check how good the approximation is. In the modified algorithm, we set values for the FIR filter orders and for the regularization parameters, both different from the respective values defined by Chen et al. in their algorithm. In this context, the numerical results obtained by the modified algorithm in some simulations present better approximation errors compared to the respective approximation errors presented by the algorithm of Chen et al.

nan


Article 1172

Title@2025-06-26 (4): Hierarchical Reasoning Model

Title: Hierarchical Reasoning Model Hierarchisches Modell der Vernunft 等级推理模型 2506.21734v1

Authors (9): Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, Yasin Abbasi Yadkori

Reasoning, the process of devising and executing complex goal-oriented action sequences, remains a critical challenge in AI. Current large language models (LLMs) primarily employ Chain-of-Thought (CoT) techniques, which suffer from brittle task decomposition, extensive data requirements, and high latency. Inspired by the hierarchical and multi-timescale processing in the human brain, we propose the Hierarchical Reasoning Model (HRM), a novel recurrent architecture that attains significant computational depth while maintaining both training stability and efficiency. HRM executes sequential reasoning tasks in a single forward pass without explicit supervision of the intermediate process, through two interdependent recurrent modules: a high-level module responsible for slow, abstract planning, and a low-level module handling rapid, detailed computations. With only 27 million parameters, HRM achieves exceptional performance on complex reasoning tasks using only 1000 training samples. The model operates without pre-training or CoT data, yet achieves nearly perfect performance on challenging tasks including complex Sudoku puzzles and optimal path finding in large mazes. Furthermore, HRM outperforms much larger models with significantly longer context windows on the Abstraction and Reasoning Corpus (ARC), a key benchmark for measuring artificial general intelligence capabilities. These results underscore HRM’s potential as a transformative advancement toward universal computation and general-purpose reasoning systems.

nan


Article 1173

Title@2025-06-26 (4): Experimental investigation of pose informed reinforcement learning for skid-steered visual navigation

Title: Experimental investigation of pose informed reinforcement learning for skid-steered visual navigation Experimentelle Untersuchung des Pose-informierten Verstärkungslernens für die geschliffene visuelle Navigation 进行实验性调查,以进行有事实根据的强化学习,以进行滑滑滑式先行目视导航的强化学习 2506.21732v1

Authors (2): Ameya Salvi, Venkat Krovi

Vision-based lane keeping is a topic of significant interest in the robotics and autonomous ground vehicles communities in various on-road and off-road applications. The skid-steered vehicle architecture has served as a useful vehicle platform for human controlled operations. However, systematic modeling, especially of the skid-slip wheel terrain interactions (primarily in off-road settings) has created bottlenecks for automation deployment. End-to-end learning based methods such as imitation learning and deep reinforcement learning, have gained prominence as a viable deployment option to counter the lack of accurate analytical models. However, the systematic formulation and subsequent verification/validation in dynamic operation regimes (particularly for skid-steered vehicles) remains a work in progress. To this end, a novel approach for structured formulation for learning visual navigation is proposed and investigated in this work. Extensive software simulations, hardware evaluations and ablation studies now highlight the significantly improved performance of the proposed approach against contemporary literature.

nan


Article 1174

Title@2025-06-26 (4): Learning treatment effects while treating those in need

Title: Learning treatment effects while treating those in need Lernbehandlungseffekte bei der Behandlung von Bedürftigen 治疗有需要者的同时的学习治疗效果 2407.07596v2

Authors (2): Bryan Wilder, Pim Welle

Many social programs attempt to allocate scarce resources to people with the greatest need. Indeed, public services increasingly use algorithmic risk assessments motivated by this goal. However, targeting the highest-need recipients often conflicts with attempting to evaluate the causal effect of the program as a whole, as the best evaluations would be obtained by randomizing the allocation. We propose a framework to design randomized allocation rules which optimally balance targeting high-need individuals with learning treatment effects, presenting policymakers with a Pareto frontier between the two goals. We give sample complexity guarantees for the policy learning problem and provide a computationally efficient strategy to implement it. We then collaborate with the human services department of Allegheny County, Pennsylvania to evaluate our methods on data from real service delivery settings. Optimized policies can substantially mitigate the tradeoff between learning and targeting. For example, it is often possible to obtain 90% of the optimal utility in targeting high-need individuals while ensuring that the average treatment effect can be estimated with less than 2 times the samples that a randomized controlled trial would require. Mechanisms for targeting public services often focus on measuring need as accurately as possible. However, our results suggest that algorithmic systems in public services can be most impactful if they incorporate program evaluation as an explicit goal alongside targeting.

nan


Article 1175

Title@2025-06-26 (4): CaloHadronic: a diffusion model for the generation of hadronic showers

Title: CaloHadronic: a diffusion model for the generation of hadronic showers CaloHadronic: ein Diffusionsmodell für die Erzeugung von Hadronikduschen Calo Hadronic: 生成超时速淋浴的传播模型 2506.21720v1

Authors (7): Thorsten Buss, Frank Gaede, Gregor Kasieczka, Anatolii Korol, Katja Krüger, Peter McKeown, Martina Mozzanica

Simulating showers of particles in highly-granular calorimeters is a key frontier in the application of machine learning to particle physics. Achieving high accuracy and speed with generative machine learning models can enable them to augment traditional simulations and alleviate a major computing constraint. Recent developments have shown how diffusion based generative shower simulation approaches that do not rely on a fixed structure, but instead generate geometry-independent point clouds, are very efficient. We present a transformer-based extension to previous architectures which were developed for simulating electromagnetic showers in the highly granular electromagnetic calorimeter of the International Large Detector, ILD. The attention mechanism now allows us to generate complex hadronic showers with more pronounced substructure across both the electromagnetic and hadronic calorimeters. This is the first time that machine learning methods are used to holistically generate showers across the electromagnetic and hadronic calorimeter in highly granular imaging calorimeter systems.

nan


Article 1176

Title@2025-06-26 (4): Performance Prediction for Large Systems via Text-to-Text Regression

Title: Performance Prediction for Large Systems via Text-to-Text Regression Leistungsvorhersage für große Systeme über Text-zu-Text-Regression 通过文字到文字倒退对大型系统的性能预测 2506.21718v1

Authors (10): Yash Akhauri, Bryan Lewandowski, Cheng-Hsi Lin, Adrian N. Reyes, Grant C. Forbes, Arissa Wongpanich, Bangding Yang, Mohamed S. Abdelfattah, Sagi Perel, Xingyou Song

In many industries, predicting metric outcomes of large systems is a fundamental problem, driven largely by traditional tabular regression. However, such methods struggle on complex systems data in the wild such as configuration files or system logs, where feature engineering is often infeasible. We propose text-to-text regression as a general, scalable alternative. For predicting resource efficiency on Borg, Google’s massive compute cluster scheduling system, a 60M parameter encoder-decoder, trained from random initialization, achieves up to a near perfect 0.99 (0.9 average) rank correlation across the entire fleet, and 100x lower MSE than tabular approaches. The model also easily adapts to new tasks in only 500 few-shot examples and captures the densities of complex outcome distributions. Ablation studies highlight the importance of using encoders, increasing sequence length, and the model’s inherent uncertainty quantification. These findings pave the way for universal simulators of real-world outcomes.

nan


Article 1177

Title@2025-06-26 (4): Explicit neural network classifiers for non-separable data

Title: Explicit neural network classifiers for non-separable data Explizite neuronale Netzwerkklassifikatoren für nicht voneinander getrennte Daten 对不可分离数据进行明确的神经网络分类 2504.18710v2

Authors (1): Patrícia Muñoz Ewald

We fully characterize a large class of feedforward neural networks in terms of truncation maps. As an application, we show how a ReLU neural network can implement a feature map which separates concentric data.

nan


Article 1178

Title@2025-06-26 (4): CRISP-NAM: Competing Risks Interpretable Survival Prediction with Neural Additive Models

Title: CRISP-NAM: Competing Risks Interpretable Survival Prediction with Neural Additive Models CRISP-NAM: Konkurrenzfähige Risiken interpretierbare Überlebensvorhersage mit neuralen Additivenmodellen CRIISP-NAM: 与神经添加模型相竞争的风险解释性生存预测 2505.21360v3

Authors (2): Dhanesh Ramachandram, Ananya Raval

Competing risks are crucial considerations in survival modelling, particularly in healthcare domains where patients may experience multiple distinct event types. We propose CRISP-NAM (Competing Risks Interpretable Survival Prediction with Neural Additive Models), an interpretable neural additive model for competing risks survival analysis which extends the neural additive architecture to model cause-specific hazards while preserving feature-level interpretability. Each feature contributes independently to risk estimation through dedicated neural networks, allowing for visualization of complex non-linear relationships between covariates and each competing risk. We demonstrate competitive performance on multiple datasets compared to existing approaches.

nan


Article 1179

Title@2025-06-26 (4): Statistical Inference of the Value Function for Reinforcement Learning in Infinite Horizon Settings

Title: Statistical Inference of the Value Function for Reinforcement Learning in Infinite Horizon Settings Statistische Schlussfolgerung der Wertfunktion für die Stärkung des Lernens in unendlichen Horizont-Einstellungen 在无限地平地环境中加强学习的价值观职能的统计推论 2001.04515v3

Authors (4): C. Shi, S. Zhang, W. Lu, R. Song

Reinforcement learning is a general technique that allows an agent to learn an optimal policy and interact with an environment in sequential decision making problems. The goodness of a policy is measured by its value function starting from some initial state. The focus of this paper is to construct confidence intervals (CIs) for a policy’s value in infinite horizon settings where the number of decision points diverges to infinity. We propose to model the action-value state function (Q-function) associated with a policy based on series/sieve method to derive its confidence interval. When the target policy depends on the observed data as well, we propose a SequentiAl Value Evaluation (SAVE) method to recursively update the estimated policy and its value estimator. As long as either the number of trajectories or the number of decision points diverges to infinity, we show that the proposed CI achieves nominal coverage even in cases where the optimal policy is not unique. Simulation studies are conducted to back up our theoretical findings. We apply the proposed method to a dataset from mobile health studies and find that reinforcement learning algorithms could help improve patient’s health status. A Python implementation of the proposed procedure is available at https://github.com/shengzhang37/SAVE.

nan


Article 1180

Title@2025-06-26 (4): Unimodal Strategies in Density-Based Clustering

Title: Unimodal Strategies in Density-Based Clustering Unimodale Strategien in Dichte-basiertem Clustering 以密度为基础的群集统一模式战略 2506.21695v1

Authors (3): Oron Nir, Jay Tenenbaum, Ariel Shamir

Density-based clustering methods often surpass centroid-based counterparts, when addressing data with noise or arbitrary data distributions common in real-world problems. In this study, we reveal a key property intrinsic to density-based clustering methods regarding the relation between the number of clusters and the neighborhood radius of core points - we empirically show that it is nearly unimodal, and support this claim theoretically in a specific setting. We leverage this property to devise new strategies for finding appropriate values for the radius more efficiently based on the Ternary Search algorithm. This is especially important for large scale data that is high-dimensional, where parameter tuning is computationally intensive. We validate our methodology through extensive applications across a range of high-dimensional, large-scale NLP, Audio, and Computer Vision tasks, demonstrating its practical effectiveness and robustness. This work not only offers a significant advancement in parameter control for density-based clustering but also broadens the understanding regarding the relations between their guiding parameters. Our code is available at https://github.com/oronnir/UnimodalStrategies.

nan


Article 1181

Title@2025-06-26 (4): Multimodal Misinformation Detection Using Early Fusion of Linguistic, Visual, and Social Features

Title: Multimodal Misinformation Detection Using Early Fusion of Linguistic, Visual, and Social Features Multimodale Fehlinformationserkennung mittels frühzeitiger Fusion sprachlicher, visueller und sozialer Merkmale 利用语言、视觉和社会特征的早期融合来进行多模式错误信息探测 2507.01984v1

Authors (1): Gautam Kishore Shahi

Amid a tidal wave of misinformation flooding social media during elections and crises, extensive research has been conducted on misinformation detection, primarily focusing on text-based or image-based approaches. However, only a few studies have explored multimodal feature combinations, such as integrating text and images for building a classification model to detect misinformation. This study investigates the effectiveness of different multimodal feature combinations, incorporating text, images, and social features using an early fusion approach for the classification model. This study analyzed 1,529 tweets containing both text and images during the COVID-19 pandemic and election periods collected from Twitter (now X). A data enrichment process was applied to extract additional social features, as well as visual features, through techniques such as object detection and optical character recognition (OCR). The results show that combining unsupervised and supervised machine learning models improves classification performance by 15% compared to unimodal models and by 5% compared to bimodal models. Additionally, the study analyzes the propagation patterns of misinformation based on the characteristics of misinformation tweets and the users who disseminate them.

nan


Article 1182

Title: Link Prediction with Physics-Inspired Graph Neural Networks Verknüpfung von Vorhersagen mit physikinspirierten Graphen-Neuralen Netzwerken 将预测与物理启发的图形神经网络联系起来 2402.14802v3

Authors (4): Andrea Giuseppe Di Francesco, Francesco Caso, Maria Sofia Bucarelli, Fabrizio Silvestri

The message-passing mechanism underlying Graph Neural Networks (GNNs) is not naturally suited for heterophilic datasets, where adjacent nodes often have different labels. Most solutions to this problem remain confined to the task of node classification. In this article, we focus on the valuable task of link prediction under heterophily, an interesting problem for recommendation systems, social network analysis, and other applications. GNNs like GRAFF have improved node classification under heterophily by incorporating physics biases in the architecture. Similarly, we propose GRAFF-LP, an extension of GRAFF for link prediction. We show that GRAFF-LP effectively discriminates existing from non-existing edges by learning implicitly to separate the edge gradients. Based on this information, we propose a new readout function inspired by physics. Remarkably, this new function not only enhances the performance of GRAFF-LP but also improves that of other baseline models, leading us to reconsider how every link prediction experiment has been conducted so far. Finally, we provide evidence that even simple GNNs did not experience greater difficulty in predicting heterophilic links compared to homophilic ones. This leads us to believe in the necessity for heterophily measures specifically tailored for link prediction, distinct from those used in node classification. The code and appendix are available at https://github.com/difra100/Link_Prediction_with_PIGNN_IJCNN.

nan


Article 1183

Title@2025-06-26 (4): ANUBHUTI: A Comprehensive Corpus For Sentiment Analysis In Bangla Regional Languages

Title: ANUBHUTI: A Comprehensive Corpus For Sentiment Analysis In Bangla Regional Languages ANUBHUTI: Ein umfassender Corpus für die Sentimentanalyse in Bangla Regionalsprachen ANUBUHUTI:孟加拉语地区语言中感应分析综合整体体 2506.21686v1

Authors (4): Swastika Kundu, Autoshi Ibrahim, Mithila Rahman, Tanvir Ahmed

Sentiment analysis for regional dialects of Bangla remains an underexplored area due to linguistic diversity and limited annotated data. This paper introduces ANUBHUTI, a comprehensive dataset consisting of 2000 sentences manually translated from standard Bangla into four major regional dialects Mymensingh, Noakhali, Sylhet, and Chittagong. The dataset predominantly features political and religious content, reflecting the contemporary socio political landscape of Bangladesh, alongside neutral texts to maintain balance. Each sentence is annotated using a dual annotation scheme: multiclass thematic labeling categorizes sentences as Political, Religious, or Neutral, and multilabel emotion annotation assigns one or more emotions from Anger, Contempt, Disgust, Enjoyment, Fear, Sadness, and Surprise. Expert native translators conducted the translation and annotation, with quality assurance performed via Cohens Kappa inter annotator agreement, achieving strong consistency across dialects. The dataset was further refined through systematic checks for missing data, anomalies, and inconsistencies. ANUBHUTI fills a critical gap in resources for sentiment analysis in low resource Bangla dialects, enabling more accurate and context aware natural language processing.

nan


Article 1184

Title@2025-06-26 (4): Risk-Averse Total-Reward Reinforcement Learning

Title: Risk-Averse Total-Reward Reinforcement Learning Risiko-Averse Total-Reward-Verstärkungs-Lernen 反风险 反风险 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 反 2506.21683v1

Authors (5): Xihong Su, Jia Lin Hau, Gersi Doko, Kishan Panaganti, Marek Petrik

Risk-averse total-reward Markov Decision Processes (MDPs) offer a promising framework for modeling and solving undiscounted infinite-horizon objectives. Existing model-based algorithms for risk measures like the entropic risk measure (ERM) and entropic value-at-risk (EVaR) are effective in small problems, but require full access to transition probabilities. We propose a Q-learning algorithm to compute the optimal stationary policy for total-reward ERM and EVaR objectives with strong convergence and performance guarantees. The algorithm and its optimality are made possible by ERM’s dynamic consistency and elicitability. Our numerical results on tabular domains demonstrate quick and reliable convergence of the proposed Q-learning algorithm to the optimal risk-averse value function.

nan


Article 1185

Title@2025-06-26 (4): TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation

Title: TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation TanDiT: Tangent-Plane Diffusion Transformer für hochwertige 360° Panorama-Generation TANDIT: 高质360°全景生成的相切电流扩散变异器 2506.21681v1

Authors (6): Hakan Çapuk, Andrew Bond, Muhammed Burak Kızıl, Emir Göçen, Erkut Erdem, Aykut Erdem

Recent advances in image generation have led to remarkable improvements in synthesizing perspective images. However, these models still struggle with panoramic image generation due to unique challenges, including varying levels of geometric distortion and the requirement for seamless loop-consistency. To address these issues while leveraging the strengths of the existing models, we introduce TanDiT, a method that synthesizes panoramic scenes by generating grids of tangent-plane images covering the entire 360$^\circ$ view. Unlike previous methods relying on multiple diffusion branches, TanDiT utilizes a unified diffusion model trained to produce these tangent-plane images simultaneously within a single denoising iteration. Furthermore, we propose a model-agnostic post-processing step specifically designed to enhance global coherence across the generated panoramas. To accurately assess panoramic image quality, we also present two specialized metrics, TangentIS and TangentFID, and provide a comprehensive benchmark comprising captioned panoramic datasets and standardized evaluation scripts. Extensive experiments demonstrate that our method generalizes effectively beyond its training data, robustly interprets detailed and complex text prompts, and seamlessly integrates with various generative models to yield high-quality, diverse panoramic images.

nan


Article 1186

Title@2025-06-26 (4): Whole-Body Conditioned Egocentric Video Prediction

Title: Whole-Body Conditioned Egocentric Video Prediction Ganzkörperbedingte egozentrische Videovorhersage 整盘有条件的Egocentcent视频预报 2506.21552v1

Authors (6): Yutong Bai, Danny Tran, Amir Bar, Yann LeCun, Trevor Darrell, Jitendra Malik

We train models to Predict Ego-centric Video from human Actions (PEVA), given the past video and an action represented by the relative 3D body pose. By conditioning on kinematic pose trajectories, structured by the joint hierarchy of the body, our model learns to simulate how physical human actions shape the environment from a first-person point of view. We train an auto-regressive conditional diffusion transformer on Nymeria, a large-scale dataset of real-world egocentric video and body pose capture. We further design a hierarchical evaluation protocol with increasingly challenging tasks, enabling a comprehensive analysis of the model’s embodied prediction and control abilities. Our work represents an initial attempt to tackle the challenges of modeling complex real-world environments and embodied agent behaviors with video prediction from the perspective of a human.

nan


Article 1187

Title@2025-06-26 (4): mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale

Title: mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale mTSBench: Benchmarking Multivariate Zeitreihen Anomalieerkennung und Modellauswahl auf Scale mTSBench:制定多变时间序列基准 2506.21550v1

Authors (3): Xiaona Zhou, Constantin Brif, Ismini Lourentzou

Multivariate time series anomaly detection (MTS-AD) is critical in domains like healthcare, cybersecurity, and industrial monitoring, yet remains challenging due to complex inter-variable dependencies, temporal dynamics, and sparse anomaly labels. We introduce mTSBench, the largest benchmark to date for MTS-AD and unsupervised model selection, spanning 344 labeled time series across 19 datasets and 12 diverse application domains. mTSBench evaluates 24 anomaly detection methods, including large language model (LLM)-based detectors for multivariate time series, and systematically benchmarks unsupervised model selection techniques under standardized conditions. Consistent with prior findings, our results confirm that no single detector excels across datasets, underscoring the importance of model selection. However, even state-of-the-art selection methods remain far from optimal, revealing critical gaps. mTSBench provides a unified evaluation suite to enable rigorous, reproducible comparisons and catalyze future advances in adaptive anomaly detection and robust model selection.

nan


Article 1188

Title@2025-06-26 (4): Data Efficacy for Language Model Training

Title: Data Efficacy for Language Model Training Dateneffizienz für Sprachmodellschulungen 语文示范培训的数据效率 2506.21545v1

Authors (9): Yalun Dai, Yangyu Huang, Xin Zhang, Wenshan Wu, Chong Li, Wenhui Lu, Shijie Cao, Li Dong, Scarlett Li

Data is fundamental to the training of language models (LM). Recent research has been dedicated to data efficiency, which aims to maximize performance by selecting a minimal or optimal subset of training data. Techniques such as data filtering, sampling, and selection play a crucial role in this area. To complement it, we define Data Efficacy, which focuses on maximizing performance by optimizing the organization of training data and remains relatively underexplored. This work introduces a general paradigm, DELT, for considering data efficacy in LM training, which highlights the significance of training data organization. DELT comprises three components: Data Scoring, Data Selection, and Data Ordering. Among these components, we design Learnability-Quality Scoring (LQS), as a new instance of Data Scoring, which considers both the learnability and quality of each data sample from the gradient consistency perspective. We also devise Folding Ordering (FO), as a novel instance of Data Ordering, which addresses issues such as model forgetting and data distribution bias. Comprehensive experiments validate the data efficacy in LM training, which demonstrates the following: Firstly, various instances of the proposed DELT enhance LM performance to varying degrees without increasing the data scale and model size. Secondly, among these instances, the combination of our proposed LQS for data scoring and Folding for data ordering achieves the most significant improvement. Lastly, data efficacy can be achieved together with data efficiency by applying data selection. Therefore, we believe that data efficacy is a promising foundational area in LM training.

nan


Article 1189

Title@2025-06-26 (4): APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization

Title: APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization APO: Verbesserung der Fähigkeit von MLLMs durch asymmetrische politische Optimierung APO:通过非对称政策优化政策提高低液成比例模型的理性能力 2506.21655v1

Authors (7): Minjie Hong, Zirun Guo, Yan Xia, Zehan Wang, Ziang Zhang, Tao Jin, Zhou Zhao

Multimodal Large Language Models (MLLMs) are powerful at integrating diverse data, but they often struggle with complex reasoning. While Reinforcement learning (RL) can boost reasoning in LLMs, applying it to MLLMs is tricky. Common issues include a drop in performance on general tasks and the generation of overly detailed or “overthinking” reasoning. Our work investigates how the KL penalty and overthinking affect RL training in MLLMs. We propose Asymmetric Policy Optimization (APO) to address these issues, which divides the sampled responses into positive and negative groups. For positive samples, Difficulty-Adaptive Divergence Shaping (DADS) is introduced to dynamically adjust the KL divergence weight based on their difficulty. This method prevents policy entropy from dropping sharply, improves training stability, utilizes samples better, and preserves the model’s existing knowledge. For negative samples, Suboptimal Trajectory Complexity Regularization (STCR) is proposed to penalize overly long responses. This helps mitigate overthinking and encourages more concise reasoning while preserving the model’s explorative capacity. We apply our method to Qwen2.5-VL-3B, creating View-R1-3B. View-R1-3B significantly enhances reasoning capabilities, showing an average 7\% gain over the base model and outperforming larger MLLMs (7-11B) on various reasoning benchmarks. Importantly, unlike other reasoning-tuned MLLMs that often degrade on general tasks, View-R1-3B maintains consistent improvement, demonstrating superior generalization. These results highlight the effectiveness and broad applicability of our DADS and STCR techniques for advancing complex multimodal reasoning in MLLMs. The code will be made available at https://github.com/Indolent-Kawhi/View-R1.

nan


Article 1190

Title@2025-06-26 (4): Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval

Title: Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval Maximal aufeinander abgestimmte Materien: Vermeidung von Darstellungskollaps für robustes Cross-Modal Retrieval 最大匹配事项: 防止在强力跨模式检索中出现代表比例折叠 2506.21538v1

Authors (4): Hani Alomari, Anushka Sivakumar, Andrew Zhang, Chris Thomas

Cross-modal image-text retrieval is challenging because of the diverse possible associations between content from different modalities. Traditional methods learn a single-vector embedding to represent semantics of each sample, but struggle to capture nuanced and diverse relationships that can exist across modalities. Set-based approaches, which represent each sample with multiple embeddings, offer a promising alternative, as they can capture richer and more diverse relationships. In this paper, we show that, despite their promise, these set-based representations continue to face issues including sparse supervision and set collapse, which limits their effectiveness. To address these challenges, we propose Maximal Pair Assignment Similarity to optimize one-to-one matching between embedding sets which preserve semantic diversity within the set. We also introduce two loss functions to further enhance the representations: Global Discriminative Loss to enhance distinction among embeddings, and Intra-Set Divergence Loss to prevent collapse within each set. Our method achieves state-of-the-art performance on MS-COCO and Flickr30k without relying on external data.

nan


Article 1191

Title@2025-06-26 (4): Exploring the Design Space of 3D MLLMs for CT Report Generation

Title: Exploring the Design Space of 3D MLLMs for CT Report Generation Erforschung des Design-Raums von 3D-MLLMs für die CT-Berichtserstellung 为编写CT报告探索3D MLLMs的设计空间 2506.21535v1

Authors (5): Mohammed Baharoon, Jun Ma, Congyu Fang, Augustin Toma, Bo Wang

Multimodal Large Language Models (MLLMs) have emerged as a promising way to automate Radiology Report Generation (RRG). In this work, we systematically investigate the design space of 3D MLLMs, including visual input representation, projectors, Large Language Models (LLMs), and fine-tuning techniques for 3D CT report generation. We also introduce two knowledge-based report augmentation methods that improve performance on the GREEN score by up to 10\%, achieving the 2nd place on the MICCAI 2024 AMOS-MM challenge. Our results on the 1,687 cases from the AMOS-MM dataset show that RRG is largely independent of the size of LLM under the same training protocol. We also show that larger volume size does not always improve performance if the original ViT was pre-trained on a smaller volume size. Lastly, we show that using a segmentation mask along with the CT volume improves performance. The code is publicly available at https://github.com/bowang-lab/AMOS-MM-Solution

nan


Article 1192

Title@2025-06-26 (4): Chain-of-Sketch: Enabling Global Visual Reasoning

Title: Chain-of-Sketch: Enabling Global Visual Reasoning Chain-of-Sketch: Globale visuelle Vernunft aktivieren 标准链链:扶持全球视觉理性 2410.08165v2

Authors (5): Aryo Lotfi, Enrico Fini, Samy Bengio, Moin Nabi, Emmanuel Abbe

Modern vision models have achieved remarkable success in benchmarks where local features provide critical information about the target. There is now a growing interest in tackling tasks requiring more global reasoning, where local features do not provide significant information. Minsky and Papert put forward such tasks in 1969 with their connectivity study, exposing the limitations of the perceptron model. In this paper, we introduce an expanded set of global visual datasets involving graphs, strings, mazes, and image grids. We show that large vision models still struggle to learn these tasks efficiently. Similarly, state-of-the-art multi-modal LLMs perform poorly on these datasets. We explain this learning inefficiency by means of the ‘globality degree’ measure. To mitigate this, we propose a method called chain-of-sketch (CoS). Similar to the chain-of-thought and scratchpad techniques used in language models, CoS breaks the original task into intermediate visual steps to help learn a complex task. In addition, we show that not all CoS strategies perform equally well. Our key insight is to impose a Markovian structure on the CoS frames. This leads to the introduction of ‘inductive CoS’ which achieves better out-of-distribution generalization and performs well even with smaller models compared to non-inductive variants.

nan


Article 1193

Title@2025-06-26 (4): Mesh-Informed Neural Operator : A Transformer Generative Approach

Title: Mesh-Informed Neural Operator : A Transformer Generative Approach Mesh-informed Neural Operator : Ein transformer Generativer Ansatz 气象化神经操作器:变异创造方法 2506.16656v2

Authors (4): Yaozhong Shi, Zachary E. Ross, Domniki Asimaki, Kamyar Azizzadenesheli

Generative models in function spaces, situated at the intersection of generative modeling and operator learning, are attracting increasing attention due to their immense potential in diverse scientific and engineering applications. While functional generative models are theoretically domain- and discretization-agnostic, current implementations heavily rely on the Fourier Neural Operator (FNO), limiting their applicability to regular grids and rectangular domains. To overcome these critical limitations, we introduce the Mesh-Informed Neural Operator (MINO). By leveraging graph neural operators and cross-attention mechanisms, MINO offers a principled, domain- and discretization-agnostic backbone for generative modeling in function spaces. This advancement significantly expands the scope of such models to more diverse applications in generative, inverse, and regression tasks. Furthermore, MINO provides a unified perspective on integrating neural operators with general advanced deep learning architectures. Finally, we introduce a suite of standardized evaluation metrics that enable objective comparison of functional generative models, addressing another critical gap in the field.

nan


Article 1194

Title@2025-06-26 (4): Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity

Title: Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity Effiziente Flucht aus Sattelpunkten unter generalisierter Glätte durch selbsterklärende Regelmäßigkeit 通过自我调整常态,在普遍平滑状态下有效绕开散装货架点 2503.04712v2

Authors (4): Daniel Yiming Cao, August Y. Chen, Karthik Sridharan, Benjamin Tang

We study the optimization of non-convex functions that are not necessarily smooth (gradient and/or Hessian are Lipschitz) using first order methods. Smoothness is a restrictive assumption in machine learning in both theory and practice, motivating significant recent work on finding first order stationary points of functions satisfying generalizations of smoothness with first order methods. We develop a novel framework that lets us systematically study the convergence of a large class of first-order optimization algorithms (which we call decrease procedures) under generalizations of smoothness. We instantiate our framework to analyze the convergence of first order optimization algorithms to first and \textit{second} order stationary points under generalizations of smoothness. As a consequence, we establish the first convergence guarantees for first order methods to second order stationary points under generalizations of smoothness. We demonstrate that several canonical examples fall under our framework, and highlight practical implications.

nan


Article 1195

Title@2025-06-26 (4): Gaussian Invariant Markov Chain Monte Carlo

Title: Gaussian Invariant Markov Chain Monte Carlo Gaussian Invariant Markov Kette Monte Carlo Gausian Invarianant Markov 链 蒙特卡洛 2506.21511v1

Authors (4): Michalis K. Titsias, Angelos Alexopoulos, Siran Liu, Petros Dellaportas

We develop sampling methods, which consist of Gaussian invariant versions of random walk Metropolis (RWM), Metropolis adjusted Langevin algorithm (MALA) and second order Hessian or Manifold MALA. Unlike standard RWM and MALA we show that Gaussian invariant sampling can lead to ergodic estimators with improved statistical efficiency. This is due to a remarkable property of Gaussian invariance that allows us to obtain exact analytical solutions to the Poisson equation for Gaussian targets. These solutions can be used to construct efficient and easy to use control variates for variance reduction of estimators under any intractable target. We demonstrate the new samplers and estimators in several examples, including high dimensional targets in latent Gaussian models where we compare against several advanced methods and obtain state-of-the-art results. We also provide theoretical results regarding geometric ergodicity, and an optimal scaling analysis that shows the dependence of the optimal acceptance rate on the Gaussianity of the target.

nan


Article 1196

Title@2025-06-26 (4): skLEP: A Slovak General Language Understanding Benchmark

Title: skLEP: A Slovak General Language Understanding Benchmark sklep: Ein slowakisches allgemeines Sprachverständnis Benchmark SkLEP:斯洛伐克一般语言理解基准 2506.21508v1

Authors (8): Marek Šuppa, Andrej Ridzik, Daniel Hládek, Tomáš Javůrek, Viktória Ondrejová, Kristína Sásiková, Martin Tamajka, Marián Šimko

In this work, we introduce skLEP, the first comprehensive benchmark specifically designed for evaluating Slovak natural language understanding (NLU) models. We have compiled skLEP to encompass nine diverse tasks that span token-level, sentence-pair, and document-level challenges, thereby offering a thorough assessment of model capabilities. To create this benchmark, we curated new, original datasets tailored for Slovak and meticulously translated established English NLU resources. Within this paper, we also present the first systematic and extensive evaluation of a wide array of Slovak-specific, multilingual, and English pre-trained language models using the skLEP tasks. Finally, we also release the complete benchmark data, an open-source toolkit facilitating both fine-tuning and evaluation of models, and a public leaderboard at https://github.com/slovak-nlp/sklep in the hopes of fostering reproducibility and drive future research in Slovak NLU.

nan


Article 1197

Title@2025-06-26 (4): NY Real Estate Racial Equity Analysis via Applied Machine Learning

Title: NY Real Estate Racial Equity Analysis via Applied Machine Learning NY Real Estate Racial Equity Analyse über angewandtes maschinelles Lernen 通过应用机器学习进行房地产种族公平分析 2505.16946v3

Authors (3): Sanjana Chalavadi, Andrei Pastor, Terry Leitch

This study analyzes tract-level real estate ownership patterns in New York State (NYS) and New York City (NYC) to uncover racial disparities. We use an advanced race/ethnicity imputation model (LSTM+Geo with XGBoost filtering, validated at 89.2% accuracy) to compare the predicted racial composition of property owners to the resident population from census data. We examine both a Full Model (statewide) and a Name-Only LSTM Model (NYC) to assess how incorporating geospatial context affects our predictions and disparity estimates. The results reveal significant inequities: White individuals hold a disproportionate share of properties and property value relative to their population, while Black, Hispanic, and Asian communities are underrepresented as property owners. These disparities are most pronounced in minority-majority neighborhoods, where ownership is predominantly White despite a predominantly non-White population. Corporate ownership (LLCs, trusts, etc.) exacerbates these gaps by reducing owner-occupied opportunities in urban minority communities. We provide a breakdown of ownership vs. population by race for majority-White, -Black, -Hispanic, and -Asian tracts, identify those with extreme ownership disparities, and compare patterns in urban, suburban, and rural contexts. The findings underscore persistent racial inequity in property ownership, reflecting broader historical and socio-economic forces, and highlight the importance of data-driven approaches to address these issues.

nan


Article 1198

Title@2025-06-26 (4): Process mining-driven modeling and simulation to enhance fault diagnosis in cyber-physical systems

Title: Process mining-driven modeling and simulation to enhance fault diagnosis in cyber-physical systems Prozess-Mining-gesteuerte Modellierung und Simulation zur Verbesserung der Fehlerdiagnose in cyber-physischen Systemen 由采矿流程驱动的模型和模拟模型和模拟,以加强网络物理系统中的过失诊断 2506.21502v1

Authors (6): Francesco Vitale, Nicola Dall’Ora, Sebastiano Gaiardelli, Enrico Fraccaroli, Nicola Mazzocca, Franco Fummi

Fault diagnosis in Cyber-Physical Systems (CPSs) is essential for ensuring system dependability and operational efficiency by accurately detecting anomalies and identifying their root causes. However, the manual modeling of faulty behaviors often demands extensive domain expertise and produces models that are complex, error-prone, and difficult to interpret. To address this challenge, we present a novel unsupervised fault diagnosis methodology that integrates collective anomaly detection in multivariate time series, process mining, and stochastic simulation. Initially, collective anomalies are detected from low-level sensor data using multivariate time-series analysis. These anomalies are then transformed into structured event logs, enabling the discovery of interpretable process models through process mining. By incorporating timing distributions into the extracted Petri nets, the approach supports stochastic simulation of faulty behaviors, thereby enhancing root cause analysis and behavioral understanding. The methodology is validated using the Robotic Arm Dataset (RoAD), a widely recognized benchmark in smart manufacturing. Experimental results demonstrate its effectiveness in modeling, simulating, and classifying faulty behaviors in CPSs. This enables the creation of comprehensive fault dictionaries that support predictive maintenance and the development of digital twins for industrial environments.

nan


Article 1199

Title@2025-06-26 (4): Devising a solution to the problems of Cancer awareness in Telangana

Title: Devising a solution to the problems of Cancer awareness in Telangana Lösung der Probleme des Krebsbewusstseins in Telangana 制定特拉甘纳癌症意识问题解决方案 2506.21500v1

Authors (4): Priyanka Avhad, Vedanti Kshirsagar, Urvi Ranjan, Mahek Nakhua

According to the data, the percent of women who underwent screening for cervical cancer, breast and oral cancer in Telangana in the year 2020 was 3.3 percent, 0.3 percent and 2.3 percent respectively. Although early detection is the only way to reduce morbidity and mortality, people have very low awareness about cervical and breast cancer signs and symptoms and screening practices. We developed an ML classification model to predict if a person is susceptible to breast or cervical cancer based on demographic factors. We devised a system to provide suggestions for the nearest hospital or Cancer treatment centres based on the users location or address. In addition to this, we can integrate the health card to maintain medical records of all individuals and conduct awareness drives and campaigns. For ML classification models, we used decision tree classification and support vector classification algorithms for cervical cancer susceptibility and breast cancer susceptibility respectively. Thus, by devising this solution we come one step closer to our goal which is spreading cancer awareness, thereby, decreasing the cancer mortality and increasing cancer literacy among the people of Telangana.

nan


Article 1200

Title@2025-06-26 (4): Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment

Title: Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment Multi-Preference Lambda-bewertet Listwise DPO für Dynamic Preference Alignment 多首选项 Lambda 加权列表 DPO 动态首选项一致 2506.19780v2

Authors (4): Yuhui Sun, Xiyao Wang, Zixi Li, Jinman Zhao

While large-scale unsupervised language models (LMs) capture broad world knowledge and reasoning capabilities, steering their behavior toward desired objectives remains challenging due to the lack of explicit supervision. Existing alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on training a reward model and performing reinforcement learning to align with human preferences. However, RLHF is often computationally intensive, unstable, and sensitive to hyperparameters. To address these limitations, Direct Preference Optimization (DPO) was introduced as a lightweight and stable alternative, enabling direct alignment of language models with pairwise preference data via classification loss. However, DPO and its extensions generally assume a single static preference distribution, limiting flexibility in multi-objective or dynamic alignment settings. In this paper, we propose a novel framework: Multi-Preference Lambda-weighted Listwise DPO, which extends DPO to incorporate multiple human preference dimensions (e.g., helpfulness, harmlessness, informativeness) and enables dynamic interpolation through a controllable simplex-weighted formulation. Our method supports both listwise preference feedback and flexible alignment across varying user intents without re-training. Empirical and theoretical analysis demonstrates that our method is as effective as traditional DPO on static objectives while offering greater generality and adaptability for real-world deployment.

nan


Article 1201

Title@2025-06-26 (4): One Model to Forecast Them All and in Entity Distributions Bind Them

Title: One Model to Forecast Them All and in Entity Distributions Bind Them Ein Modell, um sie zu prognostizieren Alles und in Entity-Distributionen Bind Them 预测所有实体和实体分配的模型之一 2501.15499v2

Authors (2): Kutay Bölat, Simon Tindemans

Probabilistic forecasting in power systems often involves multi-entity datasets like households, feeders, and wind turbines, where generating reliable entity-specific forecasts presents significant challenges. Traditional approaches require training individual models for each entity, making them inefficient and hard to scale. This study addresses this problem using GUIDE-VAE, a conditional variational autoencoder that allows entity-specific probabilistic forecasting using a single model. GUIDE-VAE provides flexible outputs, ranging from interpretable point estimates to full probability distributions, thanks to its advanced covariance composition structure. These distributions capture uncertainty and temporal dependencies, offering richer insights than traditional methods. To evaluate our GUIDE-VAE-based forecaster, we use household electricity consumption data as a case study due to its multi-entity and highly stochastic nature. Experimental results demonstrate that GUIDE-VAE outperforms conventional quantile regression techniques across key metrics while ensuring scalability and versatility. These features make GUIDE-VAE a powerful and generalizable tool for probabilistic forecasting tasks, with potential applications beyond household electricity consumption.

nan


Article 1202

Title@2025-06-26 (4): Prompting with Phonemes: Enhancing LLMs’ Multilinguality for Non-Latin Script Languages

Title: Prompting with Phonemes: Enhancing LLMs’ Multilinguality for Non-Latin Script Languages Mit Phonemes: Mehrsprachigkeit von LLMs für nicht-lateinische Script-Sprachen verbessern 以电话提示:提高LLMS的非拉丁文拼写语言多重语言质量 2411.02398v3

Authors (7): Hoang H Nguyen, Khyati Mahajan, Vikas Yadav, Julian Salazar, Philip S. Yu, Masoud Hashemi, Rishabh Maheshwary

Although multilingual LLMs have achieved remarkable performance across benchmarks, we find they continue to underperform on non-Latin script languages across contemporary LLM families. This discrepancy arises from the fact that LLMs are pretrained with orthographic scripts, which are dominated by Latin characters that obscure their shared phonology with non-Latin scripts. We propose leveraging phonemic transcriptions as complementary signals to induce script-invariant representations. Our study demonstrates that integrating phonemic signals improves performance across both non-Latin and Latin script languages, with a particularly significant impact on closing the performance gap between the two. Through detailed experiments, we show that phonemic and orthographic scripts retrieve distinct examples for in-context learning (ICL). This motivates our proposed Mixed-ICL retrieval strategy, where further aggregation from both leads to our significant performance improvements for both Latin script languages (up to 12.6%) and non-Latin script languages (up to 15.1%) compared to randomized ICL retrieval.

nan


Article 1203

Title@2025-06-26 (4): Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection

Title: Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection Zuverlässige Erkennung von leerem Raum: Bedingte markierte Punktprozesse für Objekterkennung 争取可靠地探测空空空间:物体探测的有条件定点过程 2506.21486v1

Authors (3): Tobias J. Riedlinger, Kira Maag, Hanno Gottschalk

Deep neural networks have set the state-of-the-art in computer vision tasks such as bounding box detection and semantic segmentation. Object detectors and segmentation models assign confidence scores to predictions, reflecting the model’s uncertainty in object detection or pixel-wise classification. However, these confidence estimates are often miscalibrated, as their architectures and loss functions are tailored to task performance rather than probabilistic foundation. Even with well calibrated predictions, object detectors fail to quantify uncertainty outside detected bounding boxes, i.e., the model does not make a probability assessment of whether an area without detected objects is truly free of obstacles. This poses a safety risk in applications such as automated driving, where uncertainty in empty areas remains unexplored. In this work, we propose an object detection model grounded in spatial statistics. Bounding box data matches realizations of a marked point process, commonly used to describe the probabilistic occurrence of spatial point events identified as bounding box centers, where marks are used to describe the spatial extension of bounding boxes and classes. Our statistical framework enables a likelihood-based training and provides well-defined confidence estimates for whether a region is drivable, i.e., free of objects. We demonstrate the effectiveness of our method through calibration assessments and evaluation of performance.

nan


Article 1204

Title@2025-06-26 (4): Evaluation of Traffic Signals for Daily Traffic Pattern

Title: Evaluation of Traffic Signals for Daily Traffic Pattern Bewertung von Verkehrssignalen für das tägliche Verkehrsmuster 对每日交通模式交通信号的评价 2506.21469v1

Authors (2): Mohammad Shokrolah Shirazi, Hung-Fu Chang

The turning movement count data is crucial for traffic signal design, intersection geometry planning, traffic flow, and congestion analysis. This work proposes three methods called dynamic, static, and hybrid configuration for TMC-based traffic signals. A vision-based tracking system is developed to estimate the TMC of six intersections in Las Vegas using traffic cameras. The intersection design, route (e.g. vehicle movement directions), and signal configuration files with compatible formats are synthesized and imported into Simulation of Urban MObility for signal evaluation with realistic data. The initial experimental results based on estimated waiting times indicate that the cycle time of 90 and 120 seconds works best for all intersections. In addition, four intersections show better performance for dynamic signal timing configuration, and the other two with lower performance have a lower ratio of total vehicle count to total lanes of the intersection leg. Since daily traffic flow often exhibits a bimodal pattern, we propose a hybrid signal method that switches between dynamic and static methods, adapting to peak and off-peak traffic conditions for improved flow management. So, a built-in traffic generator module creates vehicle routes for 4 hours, including peak hours, and a signal design module produces signal schedule cycles according to static, dynamic, and hybrid methods. Vehicle count distributions are weighted differently for each zone (i.e., West, North, East, South) to generate diverse traffic patterns. The extended experimental results for 6 intersections with 4 hours of simulation time imply that zone-based traffic pattern distributions affect signal design selection. Although the static method works great for evenly zone-based traffic distribution, the hybrid method works well for highly weighted traffic at intersection pairs of the West-East and North-South zones.

nan


Article 1205

Title@2025-06-26 (4): In-Context Learning Strategies Emerge Rationally

Title: In-Context Learning Strategies Emerge Rationally In-Context Learning Strategies Emerge Rational 新兴动力 2506.17859v2

Authors (6): Daniel Wurgaft, Ekdeep Singh Lubana, Core Francisco Park, Hidenori Tanaka, Gautam Reddy, Noah D. Goodman

Recent work analyzing in-context learning (ICL) has identified a broad set of strategies that describe model behavior in different experimental conditions. We aim to unify these findings by asking why a model learns these disparate strategies in the first place. Specifically, we start with the observation that when trained to learn a mixture of tasks, as is popular in the literature, the strategies learned by a model for performing ICL can be captured by a family of Bayesian predictors: a memorizing predictor, which assumes a discrete prior on the set of seen tasks, and a generalizing predictor, where the prior matches the underlying task distribution. Adopting the normative lens of rational analysis, where a learner’s behavior is explained as an optimal adaptation to data given computational constraints, we develop a hierarchical Bayesian framework that almost perfectly predicts Transformer next-token predictions throughout training – without assuming access to its weights. Under this framework, pretraining is viewed as a process of updating the posterior probability of different strategies, and inference-time behavior as a posterior-weighted average over these strategies’ predictions. Our framework draws on common assumptions about neural network learning dynamics, which make explicit a tradeoff between loss and complexity among candidate strategies: beyond how well it explains the data, a model’s preference towards implementing a strategy is dictated by its complexity. This helps explain well-known ICL phenomena, while offering novel predictions: e.g., we show a superlinear trend in the timescale for transitioning from generalization to memorization as task diversity increases. Overall, our work advances an explanatory and predictive account of ICL grounded in tradeoffs between strategy loss and complexity.

nan


Article 1206

Title@2025-06-26 (4): Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage

Title: Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage Optimierung der Runge-Kutta-Methoden der 4. Ordnung: Dynamischer heuristischer Ansatz für Effizienz und geringen Speicher 优化第四阶极龙格-库塔方法:高效和低储存的动态超光速方法 2506.21465v1

Authors (3): Gavin Lee Goodship, Luis Miralles-Pechuan, Stephen O’Sullivan

Extended Stability Runge-Kutta (ESRK) methods are crucial for solving large-scale computational problems in science and engineering, including weather forecasting, aerodynamic analysis, and complex biological modelling. However, balancing accuracy, stability, and computational efficiency remains challenging, particularly for high-order, low-storage schemes. This study introduces a hybrid Genetic Algorithm (GA) and Reinforcement Learning (RL) approach for automated heuristic discovery, optimising low-storage ESRK methods. Unlike traditional approaches that rely on manually designed heuristics or exhaustive numerical searches, our method leverages GA-driven mutations for search-space exploration and an RL-inspired state transition mechanism to refine heuristic selection dynamically. This enables systematic parameter reduction, preserving fourth-order accuracy while significantly improving computational efficiency.The proposed GA-RL heuristic optimisation framework is validated through rigorous testing on benchmark problems, including the 1D and 2D Brusselator systems and the steady-state Navier-Stokes equations. The best-performing heuristic achieves a 25\% reduction in IPOPT runtime compared to traditional ESRK optimisation processes while maintaining numerical stability and accuracy. These findings demonstrate the potential of adaptive heuristic discovery to improve resource efficiency in high-fidelity simulations and broaden the applicability of low-storage Runge-Kutta methods in real-world computational fluid dynamics, physics simulations, and other demanding fields. This work establishes a new paradigm in heuristic optimisation for numerical methods, opening pathways for further exploration using Deep RL and AutoML-based heuristic search

nan


Article 1207

Title@2025-06-26 (4): Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs

Title: Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs Capacity-Constrained Online-Lernen mit Verzögerungen: Scheduling Frameworks und Trade-offs bedauern 受能力制约的有延误的在线学习:时间安排框架和悔恨取舍 2503.19856v2

Authors (3): Alexander Ryabchenko, Idan Attias, Daniel M. Roy

We study online learning with oblivious losses and delays under a novel capacity constraint'' that limits how many past rounds can be tracked simultaneously for delayed feedback. Underclairvoyance’’ (i.e., delay durations are revealed upfront each round) and/or preemptibility'' (i.e., we can stop tracking previously chosen round feedback), we establish matching upper and lower bounds (up to logarithmic terms) on achievable regret, characterizing theoptimal capacity’’ needed to match the minimax rates of classical delayed online learning, which implicitly assume unlimited capacity. Our algorithms achieve minimax-optimal regret across all capacity levels, with performance gracefully degrading under suboptimal capacity. For $K$ actions and total delay $D$ over $T$ rounds, under clairvoyance and assuming capacity $C = \Omega(\log(T))$, we achieve regret $\widetilde{\Theta}(\sqrt{TK + DK/C + D\log(K)})$ for bandits and $\widetilde{\Theta}(\sqrt{(D+T)\log(K)})$ for full-information feedback. When replacing clairvoyance with preemptibility, we require a known maximum delay bound $d_{\max}$, adding ${\widetilde{O}(d_{\max})}$ to the regret. For fixed delays $d$ (i.e., $D=Td$), the minimax regret is $\Theta(\sqrt{TK(1+d/C)+Td\log(K)})$ and the optimal capacity is $\Theta(\min{K/\log(K),d})$ in the bandit setting, while in the full-information feedback setting, the minimax regret is $\Theta(\sqrt{T(d+1)\log(K)})$ and the optimal capacity is $\Theta(1)$. For round-dependent and fixed delays, our upper bounds are achieved using novel preemptive and non-preemptive scheduling policies, based on Pareto-distributed proxy delays, and batching techniques, respectively. Crucially, our work unifies delayed bandits, label-efficient learning, and online scheduling frameworks, demonstrating that robust online learning under delayed feedback is possible with surprisingly modest tracking capacity.

nan


Article 1208

Title@2025-06-26 (4): Aligning Spoken Dialogue Models from User Interactions

Title: Aligning Spoken Dialogue Models from User Interactions Ausrichten von gesprochenen Dialogmodellen aus Benutzerinteraktionen 校对用户互动中的口语对话框模型 2506.21463v1

Authors (4): Anne Wu, Laurent Mazaré, Neil Zeghidour, Alexandre Défossez

We propose a novel preference alignment framework for improving spoken dialogue models on real-time conversations from user interactions. Current preference learning methods primarily focus on text-based language models, and are not directly suited to the complexities of real-time speech interactions, with richer dynamics (e.g. interruption, interjection) and no explicit segmentation between speaker turns.We create a large-scale dataset of more than 150,000 preference pairs from raw multi-turn speech conversations, annotated with AI feedback, to cover preferences over both linguistic content and temporal context variations. We leverage offline alignment methods to finetune a full-duplex autoregressive speech-to-speech model. Extensive experiments demonstrate that feedback on generic conversations can be consistently effective in improving spoken dialogue models to produce more factual, safer and more contextually aligned interactions. We deploy the finetuned model and conduct holistic human evaluations to assess the impact beyond single-turn conversations. Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.

nan


Article 1209

Title@2025-06-26 (4): A Keyword-Based Technique to Evaluate Broad Question Answer Script

Title: A Keyword-Based Technique to Evaluate Broad Question Answer Script Eine Keyword-basierte Technik zur Bewertung von Broad Question Answer Script 用于评价广泛问答脚本的关键字技术 2506.21461v1

Authors (5): Tamim Al Mahmud, Md Gulzar Hussain, Sumaiya Kabir, Hasnain Ahmad, Mahmudus Sobhan

Evaluation is the method of assessing and determining the educational system through various techniques such as verbal or viva-voice test, subjective or objective written test. This paper presents an efficient solution to evaluate the subjective answer script electronically. In this paper, we proposed and implemented an integrated system that examines and evaluates the written answer script. This article focuses on finding the keywords from the answer script and then compares them with the keywords that have been parsed from both open and closed domain. The system also checks the grammatical and spelling errors in the answer script. Our proposed system tested with answer scripts of 100 students and gives precision score 0.91.

nan


Article 1210

Title@2025-06-26 (4): Wild refitting for black box prediction

Title: Wild refitting for black box prediction Wilde Nachrüstung für Black Box Vorhersage 黑盒预测的野生改造 2506.21460v1

Authors (1): Martin J. Wainwright

We describe and analyze a computionally efficient refitting procedure for computing high-probability upper bounds on the instance-wise mean-squared prediction error of penalized nonparametric estimates based on least-squares minimization. Requiring only a single dataset and black box access to the prediction method, it consists of three steps: computing suitable residuals, symmetrizing and scaling them with a pre-factor $\rho$, and using them to define and solve a modified prediction problem recentered at the current estimate. We refer to it as wild refitting, since it uses Rademacher residual symmetrization as in a wild bootstrap variant. Under relatively mild conditions allowing for noise heterogeneity, we establish a high probability guarantee on its performance, showing that the wild refit with a suitably chosen wild noise scale $\rho$ gives an upper bound on prediction error. This theoretical analysis provides guidance into the design of such procedures, including how the residuals should be formed, the amount of noise rescaling in the wild sub-problem needed for upper bounds, and the local stability properties of the block-box procedure. We illustrate the applicability of this procedure to various problems, including non-rigid structure-from-motion recovery with structured matrix penalties; plug-and-play image restoration with deep neural network priors; and randomized sketching with kernel methods.

nan


Article 1211

Title@2025-06-26 (4): Fake it till You Make it: Reward Modeling as Discriminative Prediction

Title: Fake it till You Make it: Reward Modeling as Discriminative Prediction Verfälschen Sie es, bis Sie es: Belohnung Modellieren als diskriminative Vorhersage 假称直到你做出它: 奖赏模型作为有偏见的预测 2506.13846v2

Authors (6): Runtao Liu, Jiahao Zhan, Yingqing He, Chen Wei, Alan Yuille, Qifeng Chen

An effective reward model plays a pivotal role in reinforcement learning for post-training enhancement of visual generative models. However, current approaches of reward modeling suffer from implementation complexity due to their reliance on extensive human-annotated preference data or meticulously engineered quality dimensions that are often incomplete and engineering-intensive. Inspired by adversarial training in generative adversarial networks (GANs), this paper proposes GAN-RM, an efficient reward modeling framework that eliminates manual preference annotation and explicit quality dimension engineering. Our method trains the reward model through discrimination between a small set of representative, unpaired target samples(denoted as Preference Proxy Data) and model-generated ordinary outputs, requiring only a few hundred target samples. Comprehensive experiments demonstrate our GAN-RM’s effectiveness across multiple key applications including test-time scaling implemented as Best-of-N sample filtering, post-training approaches like Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). Code and data will be released at https://github.com/Visualignment/GAN-RM.

nan


Article 1212

Title@2025-06-26 (4): Measurement to Meaning: A Validity-Centered Framework for AI Evaluation

Title: Measurement to Meaning: A Validity-Centered Framework for AI Evaluation Messung zur Bedeutung: Ein gültigkeitszentrierter Rahmen für die AI-Bewertung 衡量到意义:AI评价的有效性-中心框架 2505.10573v4

Authors (9): Olawale Salaudeen, Anka Reuel, Ahmed Ahmed, Suhana Bedi, Zachary Robertson, Sudharsan Sundar, Ben Domingue, Angelina Wang, Sanmi Koyejo

While the capabilities and utility of AI systems have advanced, rigorous norms for evaluating these systems have lagged. Grand claims, such as models achieving general reasoning capabilities, are supported with model performance on narrow benchmarks, like performance on graduate-level exam questions, which provide a limited and potentially misleading assessment. We provide a structured approach for reasoning about the types of evaluative claims that can be made given the available evidence. For instance, our framework helps determine whether performance on a mathematical benchmark is an indication of the ability to solve problems on math tests or instead indicates a broader ability to reason. Our framework is well-suited for the contemporary paradigm in machine learning, where various stakeholders provide measurements and evaluations that downstream users use to validate their claims and decisions. At the same time, our framework also informs the construction of evaluations designed to speak to the validity of the relevant claims. By leveraging psychometrics’ breakdown of validity, evaluations can prioritize the most critical facets for a given claim, improving empirical utility and decision-making efficacy. We illustrate our framework through detailed case studies of vision and language model evaluations, highlighting how explicitly considering validity strengthens the connection between evaluation evidence and the claims being made.

nan


Article 1213

Title@2025-06-26 (4): PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries

Title: PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries PARALLELPROMPT: Parallelität aus großen Sprachmodellfragen extrahieren PARALELPROPT:从大语言模式查询中提取平行论 2506.18728v2

Authors (4): Steven Kolawole, Keshav Santhanam, Virginia Smith, Pratiksha Thaker

LLM serving systems typically treat user prompts as monolithic inputs, optimizing inference through decoding tricks or inter-query batching. However, many real-world prompts contain latent semantic parallelism–decomposable structures where subtasks can be executed independently to reduce latency while preserving meaning. We introduce PARALLELPROMPT, the first benchmark for measuring intra-query parallelism in natural user prompts. Our dataset comprises over 37,000 real-world prompts from public LLM chat logs, each annotated with a structured schema capturing task templates, shared context, and iteration inputs. These schemas are extracted using LLM-assisted prompting with rule-based multilingual validation. To evaluate the benefits of decomposition, we provide an execution suite that benchmarks serial vs. parallel strategies, measuring latency, structural adherence, and semantic fidelity. Our results show that intra-query parallelism can be successfully parsed in over 75% of curated datasets, unlocking up to 5x speedups on tasks like translation, comprehension, and comparative analysis, with minimal quality degradation. By releasing this benchmark, curation pipeline, and evaluation suite, we provide the first standardized testbed for studying structure-aware execution in LLM serving pipelines.

nan


Article 1214

Title@2025-06-26 (4): Towards an Optimal Control Perspective of ResNet Training

Title: Towards an Optimal Control Perspective of ResNet Training Auf dem Weg zu einer optimalen Steuerungsperspektive der ResNet-Schulung 建立ResNet培训最佳控制视角 2506.21453v1

Authors (4): Jens Püttschneider, Simon Heilig, Asja Fischer, Timm Faulwasser

We propose a training formulation for ResNets reflecting an optimal control problem that is applicable for standard architectures and general loss functions. We suggest bridging both worlds via penalizing intermediate outputs of hidden states corresponding to stage cost terms in optimal control. For standard ResNets, we obtain intermediate outputs by propagating the state through the subsequent skip connections and the output layer. We demonstrate that our training dynamic biases the weights of the unnecessary deeper residual layers to vanish. This indicates the potential for a theory-grounded layer pruning strategy.

nan


Article 1215

Title@2025-06-26 (4): A Comprehensive Dataset for Underground Miner Detection in Diverse Scenario

Title: A Comprehensive Dataset for Underground Miner Detection in Diverse Scenario Ein umfassender Datensatz für die Untertage-Miner-Erkennung in unterschiedlichen Szenarien 不同情景下地下矿工探测综合数据集 2506.21451v1

Authors (4): Cyrus Addy, Ajay Kumar Gurumadaiah, Yixiang Gao, Kwame Awuah-Offei

Underground mining operations face significant safety challenges that make emergency response capabilities crucial. While robots have shown promise in assisting with search and rescue operations, their effectiveness depends on reliable miner detection capabilities. Deep learning algorithms offer potential solutions for automated miner detection, but require comprehensive training datasets, which are currently lacking for underground mining environments. This paper presents a novel thermal imaging dataset specifically designed to enable the development and validation of miner detection systems for potential emergency applications. We systematically captured thermal imagery of various mining activities and scenarios to create a robust foundation for detection algorithms. To establish baseline performance metrics, we evaluated several state-of-the-art object detection algorithms including YOLOv8, YOLOv10, YOLO11, and RT-DETR on our dataset. While not exhaustive of all possible emergency situations, this dataset serves as a crucial first step toward developing reliable thermal-based miner detection systems that could eventually be deployed in real emergency scenarios. This work demonstrates the feasibility of using thermal imaging for miner detection and establishes a foundation for future research in this critical safety application.

nan


Article 1216

Title@2025-06-26 (4): Learnable Adaptive Time-Frequency Representation via Differentiable Short-Time Fourier Transform

Title: Learnable Adaptive Time-Frequency Representation via Differentiable Short-Time Fourier Transform Lernbare adaptive Zeit-Frequenz-Darstellung über differenzierbare Kurzzeit Fourier-Transformation 通过有区别的短时四轮式变换,通过有区别的短时四轮式变换, 2506.21440v1

Authors (5): Maxime Leiber, Yosra Marnissi, Axel Barrau, Sylvain Meignen, Laurent Massoulié

The short-time Fourier transform (STFT) is widely used for analyzing non-stationary signals. However, its performance is highly sensitive to its parameters, and manual or heuristic tuning often yields suboptimal results. To overcome this limitation, we propose a unified differentiable formulation of the STFT that enables gradient-based optimization of its parameters. This approach addresses the limitations of traditional STFT parameter tuning methods, which often rely on computationally intensive discrete searches. It enables fine-tuning of the time-frequency representation (TFR) based on any desired criterion. Moreover, our approach integrates seamlessly with neural networks, allowing joint optimization of the STFT parameters and network weights. The efficacy of the proposed differentiable STFT in enhancing TFRs and improving performance in downstream tasks is demonstrated through experiments on both simulated and real-world data.

nan


Article 1217

Title@2025-06-26 (4): New Bounds for Sparse Variational Gaussian Processes

Title: New Bounds for Sparse Variational Gaussian Processes Neue Grenzen für Sparse Variational Gaussian Prozesse 偏偏多高斯进程的新界口 2502.08730v2

Authors (1): Michalis K. Titsias

Sparse variational Gaussian processes (GPs) construct tractable posterior approximations to GP models. At the core of these methods is the assumption that the true posterior distribution over training function values ${\bf f}$ and inducing variables ${\bf u}$ is approximated by a variational distribution that incorporates the conditional GP prior $p({\bf f} {\bf u})$ in its factorization. While this assumption is considered as fundamental, we show that for model training we can relax it through the use of a more general variational distribution $q({\bf f} {\bf u})$ that depends on $N$ extra parameters, where $N$ is the number of training examples. In GP regression, we can analytically optimize the evidence lower bound over the extra parameters and express a tractable collapsed bound that is tighter than the previous bound. The new bound is also amenable to stochastic optimization and its implementation requires minor modifications to existing sparse GP code. Further, we also describe extensions to non-Gaussian likelihoods. On several datasets we demonstrate that our method can reduce bias when learning the hyperparameters and can lead to better predictive performance.

nan


Article 1218

Title@2025-06-26 (4): Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations

Title: Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations Erklärbarkeit großer Sprachmodelle mit SMILE: Statistische Modell-agnostische Interpretierbarkeit mit lokalen Erklärungen 使用SMILE解释大语言模型的可解释性:统计模型 – – 与当地解释的可解释性 2505.21657v3

Authors (4): Zeinab Dehghani, Mohammed Naveed Akram, Koorosh Aslansefat, Adil Khan

Large language models like GPT, LLAMA, and Claude have become incredibly powerful at generating text, but they are still black boxes, so it is hard to understand how they decide what to say. That lack of transparency can be problematic, especially in fields where trust and accountability matter. To help with this, we introduce SMILE, a new method that explains how these models respond to different parts of a prompt. SMILE is model-agnostic and works by slightly changing the input, measuring how the output changes, and then highlighting which words had the most impact. Create simple visual heat maps showing which parts of a prompt matter the most. We tested SMILE on several leading LLMs and used metrics such as accuracy, consistency, stability, and fidelity to show that it gives clear and reliable explanations. By making these models easier to understand, SMILE brings us one step closer to making AI more transparent and trustworthy.

nan


Article 1219

Title@2025-06-26 (4): Graph Neural Network for Neutrino Physics Event Reconstruction

Title: Graph Neural Network for Neutrino Physics Event Reconstruction Graph Neural Netzwerk für Neutrino Physik Ereignis Rekonstruktion 中子物理事件重建神经网络 2403.11872v2

Authors (9): V Hewes, Adam Aurisano, Giuseppe Cerati, Jim Kowalkowski, Claire Lee, Wei-keng Liao, Daniel Grzenda, Kaushal Gumpula, Xiaohe Zhang

Liquid Argon Time Projection Chamber (LArTPC) detector technology offers a wealth of high-resolution information on particle interactions, and leveraging that information to its full potential requires sophisticated automated reconstruction techniques. This article describes NuGraph2, a Graph Neural Network (GNN) for low-level reconstruction of simulated neutrino interactions in a LArTPC detector. Simulated neutrino interactions in the MicroBooNE detector geometry are described as heterogeneous graphs, with energy depositions on each detector plane forming nodes on planar subgraphs. The network utilizes a multi-head attention message-passing mechanism to perform background filtering and semantic labelling on these graph nodes, identifying those associated with the primary physics interaction with 98.0\% efficiency and labelling them according to particle type with 94.9\% efficiency. The network operates directly on detector observables across multiple 2D representations, but utilizes a 3D-context-aware mechanism to encourage consistency between these representations. Model inference takes 0.12~s/event on a CPU, and 0.005s/event batched on a GPU. This architecture is designed to be a general-purpose solution for particle reconstruction in neutrino physics, with the potential for deployment across a broad range of detector technologies, and offers a core convolution engine that can be leveraged for a variety of tasks beyond the two described in this article.

nan


Article 1220

Title@2025-06-26 (4): The Sample Complexity of Learning Lipschitz Operators with respect to Gaussian Measures

Title: The Sample Complexity of Learning Lipschitz Operators with respect to Gaussian Measures Die Probenkomplexität von Lipschitz-Betreibern in Bezug auf Gaussische Maßnahmen Gaussian措施方面学习利普施茨经营者的抽样复杂性 2410.23440v3

Authors (3): Ben Adcock, Michael Griebel, Gregor Maier

Operator learning, the approximation of mappings between infinite-dimensional function spaces using machine learning, has gained increasing research attention in recent years. Approximate operators, learned from data, can serve as efficient surrogate models for problems in computational science and engineering, complementing traditional methods. However, despite their empirical success, our understanding of the underlying mathematical theory is in large part still incomplete. In this paper, we study the approximation of Lipschitz operators with respect to Gaussian measures. We prove higher Gaussian Sobolev regularity of Lipschitz operators and establish lower and upper bounds on the Hermite polynomial approximation error. We then study general reconstruction strategies of Lipschitz operators from $m$ arbitrary (potentially adaptive) linear samples. As a key finding, we tightly characterize the corresponding sample complexity, that is, the smallest achievable worst-case error among all possible choices of (adaptive) sampling and reconstruction strategies in terms of $m$. As a consequence, we identify an inherent curse of sample complexity: No method to approximate Lipschitz operators based on $m$ linear samples can achieve algebraic convergence rates in $m$. On the positive side, we prove that a sufficiently fast spectral decay of the covariance operator of the underlying Gaussian measure guarantees convergence rates which are arbitrarily close to any algebraic rate. Overall, by tightly characterizing the sample complexity, our work confirms the intrinsic difficulty of learning Lipschitz operators, regardless of the data or learning technique.

nan


Article 1221

Title@2025-06-26 (4): Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort

Title: Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort Deception Detection in dyadischen Austauschen mit multimodalem maschinellem Lernen: Eine Studie über eine schwedische Kohorte 利用多式机器学习的多式机器交流中的欺骗感检测:瑞典教区研究 2506.21429v1

Authors (4): Franco Rugolon, Thomas Jack Samuels, Stephan Hau, Lennart Högman

This study investigates the efficacy of using multimodal machine learning techniques to detect deception in dyadic interactions, focusing on the integration of data from both the deceiver and the deceived. We compare early and late fusion approaches, utilizing audio and video data - specifically, Action Units and gaze information - across all possible combinations of modalities and participants. Our dataset, newly collected from Swedish native speakers engaged in truth or lie scenarios on emotionally relevant topics, serves as the basis for our analysis. The results demonstrate that incorporating both speech and facial information yields superior performance compared to single-modality approaches. Moreover, including data from both participants significantly enhances deception detection accuracy, with the best performance (71%) achieved using a late fusion strategy applied to both modalities and participants. These findings align with psychological theories suggesting differential control of facial and vocal expressions during initial interactions. As the first study of its kind on a Scandinavian cohort, this research lays the groundwork for future investigations into dyadic interactions, particularly within psychotherapy settings.

nan


Article 1222

Title@2025-06-26 (4): Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning

Title: Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning Flow-based Single-Step-Abschluss für effizientes und expressives politisches Lernen 以流动为基础的单一步骤完成高效和明确政策学习 2506.21427v1

Authors (2): Prajwal Koirala, Cody Fleming

Generative models such as diffusion and flow-matching offer expressive policies for offline reinforcement learning (RL) by capturing rich, multimodal action distributions, but their iterative sampling introduces high inference costs and training instability due to gradient propagation across sampling steps. We propose the \textit{Single-Step Completion Policy} (SSCP), a generative policy trained with an augmented flow-matching objective to predict direct completion vectors from intermediate flow samples, enabling accurate, one-shot action generation. In an off-policy actor-critic framework, SSCP combines the expressiveness of generative models with the training and inference efficiency of unimodal policies, without requiring long backpropagation chains. Our method scales effectively to offline, offline-to-online, and online RL settings, offering substantial gains in speed and adaptability over diffusion-based baselines. We further extend SSCP to goal-conditioned RL, enabling flat policies to exploit subgoal structures without explicit hierarchical inference. SSCP achieves strong results across standard offline RL and behavior cloning benchmarks, positioning it as a versatile, expressive, and efficient framework for deep RL and sequential decision-making.

nan


Article 1223

Title@2025-06-26 (4): TracLLM: A Generic Framework for Attributing Long Context LLMs

Title: TracLLM: A Generic Framework for Attributing Long Context LLMs TracLLM: Ein generisches Rahmenwerk für die Zuweisung von LLMs mit langem Kontext TracLLM: 长期LMLM授标通用框架 2506.04202v3

Authors (4): Yanting Wang, Wei Zou, Runpeng Geng, Jinyuan Jia

Long context large language models (LLMs) are deployed in many real-world applications such as RAG, agent, and broad LLM-integrated applications. Given an instruction and a long context (e.g., documents, PDF files, webpages), a long context LLM can generate an output grounded in the provided context, aiming to provide more accurate, up-to-date, and verifiable outputs while reducing hallucinations and unsupported claims. This raises a research question: how to pinpoint the texts (e.g., sentences, passages, or paragraphs) in the context that contribute most to or are responsible for the generated output by an LLM? This process, which we call context traceback, has various real-world applications, such as 1) debugging LLM-based systems, 2) conducting post-attack forensic analysis for attacks (e.g., prompt injection attack, knowledge corruption attacks) to an LLM, and 3) highlighting knowledge sources to enhance the trust of users towards outputs generated by LLMs. When applied to context traceback for long context LLMs, existing feature attribution methods such as Shapley have sub-optimal performance and/or incur a large computational cost. In this work, we develop TracLLM, the first generic context traceback framework tailored to long context LLMs. Our framework can improve the effectiveness and efficiency of existing feature attribution methods. To improve the efficiency, we develop an informed search based algorithm in TracLLM. We also develop contribution score ensemble/denoising techniques to improve the accuracy of TracLLM. Our evaluation results show TracLLM can effectively identify texts in a long context that lead to the output of an LLM. Our code and data are at: https://github.com/Wang-Yanting/TracLLM.

nan


Article 1224

Title@2025-06-26 (4): Continual Learning as Computationally Constrained Reinforcement Learning

Title: Continual Learning as Computationally Constrained Reinforcement Learning Kontinuierliches Lernen als Computationally Constrained Reinforcement Learning 持续学习作为计算限制的训练强化学习 2307.04345v3

Authors (7): Saurabh Kumar, Henrik Marklund, Ashish Rao, Yifan Zhu, Hong Jun Jeon, Yueyang Liu, Benjamin Van Roy

An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills over a long lifetime could advance the frontier of artificial intelligence capabilities. The design of such agents, which remains a long-standing challenge of artificial intelligence, is addressed by the subject of continual learning. This monograph clarifies and formalizes concepts of continual learning, introducing a framework and set of tools to stimulate further research.

nan


Article 1225

Title@2025-06-26 (4): Improving Stochastic Cubic Newton with Momentum

Title: Improving Stochastic Cubic Newton with Momentum Verbesserung der stochastischen Kubik Newton mit Momentum 快速改善斯托卡立方立方牛顿 2410.19644v2

Authors (3): El Mahdi Chayti, Nikita Doikov, Martin Jaggi

We study stochastic second-order methods for solving general non-convex optimization problems. We propose using a special version of momentum to stabilize the stochastic gradient and Hessian estimates in Newton’s method. We show that momentum provably improves the variance of stochastic estimates and allows the method to converge for any noise level. Using the cubic regularization technique, we prove a global convergence rate for our method on general non-convex problems to a second-order stationary point, even when using only a single stochastic data sample per iteration. This starkly contrasts with all existing stochastic second-order methods for non-convex problems, which typically require large batches. Therefore, we are the first to demonstrate global convergence for batches of arbitrary size in the non-convex case for the Stochastic Cubic Newton. Additionally, we show improved speed on convex stochastic problems for our regularized Newton methods with momentum.

nan


Article 1226

Title@2025-06-26 (4): Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager-Machlup Functional

Title: Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager-Machlup Functional Aktionsminimierung trifft auf generative Modellierung: Effizientes Transition Path Sampling mit der Onsager-Machlup Funktion 行动最优化符合产生模型的生成模型:与Onsager-Machlup 职能进行高效率过渡道路抽样 2504.18506v3

Authors (6): Sanjeev Raja, Martin Šípka, Michael Psenka, Tobias Kreiman, Michal Pavelka, Aditi S. Krishnapriyan

Transition path sampling (TPS), which involves finding probable paths connecting two points on an energy landscape, remains a challenge due to the complexity of real-world atomistic systems. Current machine learning approaches use expensive, task-specific, and data-free training procedures, limiting their ability to benefit from high-quality datasets and large-scale pre-trained models. In this work, we address TPS by interpreting candidate paths as trajectories sampled from stochastic dynamics induced by the learned score function of pre-trained generative models, specifically denoising diffusion and flow matching. Under these dynamics, finding high-likelihood transition paths becomes equivalent to minimizing the Onsager-Machlup (OM) action functional. This enables us to repurpose pre-trained generative models for TPS in a zero-shot manner, in contrast with bespoke, task-specific approaches in previous work. We demonstrate our approach on varied molecular systems, obtaining diverse, physically realistic transition pathways and generalizing beyond the pre-trained model’s original training dataset. Our method can be easily incorporated into new generative models, making it practically relevant as models continue to scale and improve with increased data availability. Code is available at github.com/ASK-Berkeley/OM-TPS.

nan


Article 1227

Title@2025-06-26 (4): Distributed Cross-Channel Hierarchical Aggregation for Foundation Models

Title: Distributed Cross-Channel Hierarchical Aggregation for Foundation Models Verteilte Cross-Channel Hierarchische Aggregation für Stiftungsmodelle 基金会模型分布式跨河道分道分道分道分道分道分道分道分道分道分道分道分道 2506.21411v1

Authors (9): Aristeidis Tsaris, Isaac Lyngaas, John Lagregren, Mohamed Wahib, Larry York, Prasanna Balaprakash, Dan Lu, Feiyi Wang, Xiao Wang

Vision-based scientific foundation models hold significant promise for advancing scientific discovery and innovation. This potential stems from their ability to aggregate images from diverse sources such as varying physical groundings or data acquisition systems and to learn spatio-temporal correlations using transformer architectures. However, tokenizing and aggregating images can be compute-intensive, a challenge not fully addressed by current distributed methods. In this work, we introduce the Distributed Cross-Channel Hierarchical Aggregation (D-CHAG) approach designed for datasets with a large number of channels across image modalities. Our method is compatible with any model-parallel strategy and any type of vision transformer architecture, significantly improving computational efficiency. We evaluated D-CHAG on hyperspectral imaging and weather forecasting tasks. When integrated with tensor parallelism and model sharding, our approach achieved up to a 75% reduction in memory usage and more than doubled sustained throughput on up to 1,024 AMD GPUs on the Frontier Supercomputer.

nan


Article 1228

Title@2025-06-26 (4): Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference

Title: Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference Skalierbare Bayesische Low-Rank-Anpassung von großen Sprachmodellen über stochastische Variations-Subraum-Inferenz 通过Stochastic变异性子空间推断,对大语言模型进行可缩放的Bayesian低Rank 2506.21408v1

Authors (5): Colin Samplawski, Adam D. Cobb, Manoj Acharya, Ramneet Kaur, Susmit Jha

Despite their widespread use, large language models (LLMs) are known to hallucinate incorrect information and be poorly calibrated. This makes the uncertainty quantification of these models of critical importance, especially in high-stakes domains, such as autonomy and healthcare. Prior work has made Bayesian deep learning-based approaches to this problem more tractable by performing inference over the low-rank adaptation (LoRA) parameters of a fine-tuned model. While effective, these approaches struggle to scale to larger LLMs due to requiring further additional parameters compared to LoRA. In this work we present $\textbf{Scala}$ble $\textbf{B}$ayesian $\textbf{L}$ow-Rank Adaptation via Stochastic Variational Subspace Inference (ScalaBL). We perform Bayesian inference in an $r$-dimensional subspace, for LoRA rank $r$. By repurposing the LoRA parameters as projection matrices, we are able to map samples from this subspace into the full weight space of the LLM. This allows us to learn all the parameters of our approach using stochastic variational inference. Despite the low dimensionality of our subspace, we are able to achieve competitive performance with state-of-the-art approaches while only requiring ${\sim}1000$ additional parameters. Furthermore, it allows us to scale up to the largest Bayesian LLM to date, with four times as a many base parameters as prior work.

nan


Article 1229

Title@2025-06-26 (4): Representation Learning of Lab Values via Masked AutoEncoders

Title: Representation Learning of Lab Values via Masked AutoEncoders Darstellung Lernen von Laborwerten über Maskierte AutoEncoder 通过蒙面自动编码器学习实验室价值 2501.02648v3

Authors (8): David Restrepo, Chenwei Wu, Yueran Jia, Jaden K. Sun, Jack Gallifant, Catherine G. Bielick, Yugang Jia, Leo A. Celi

Accurate imputation of missing laboratory values in electronic health records (EHRs) is critical to enable robust clinical predictions and reduce biases in AI systems in healthcare. Existing methods, such as XGBoost, softimpute, GAIN, Expectation Maximization (EM), and MICE, struggle to model the complex temporal and contextual dependencies in EHR data, particularly in underrepresented groups. In this work, we propose Lab-MAE, a novel transformer-based masked autoencoder framework that leverages self-supervised learning for the imputation of continuous sequential lab values. Lab-MAE introduces a structured encoding scheme that jointly models laboratory test values and their corresponding timestamps, enabling explicit capturing temporal dependencies. Empirical evaluation on the MIMIC-IV dataset demonstrates that Lab-MAE significantly outperforms state-of-the-art baselines such as XGBoost, softimpute, GAIN, EM, and MICE across multiple metrics, including root mean square error (RMSE), R-squared (R2), and Wasserstein distance (WD). Notably, Lab-MAE achieves equitable performance across demographic groups of patients, advancing fairness in clinical predictions. We further investigate the role of follow-up laboratory values as potential shortcut features, revealing Lab-MAE’s robustness in scenarios where such data is unavailable. The findings suggest that our transformer-based architecture, adapted to the characteristics of EHR data, offers a foundation model for more accurate and fair clinical imputation. In addition, we measure and compare the carbon footprint of Lab-MAE with the a XGBoost model, highlighting its environmental requirements.

nan


Article 1230

Title@2025-06-26 (4): Temporal-Aware Graph Attention Network for Cryptocurrency Transaction Fraud Detection

Title: Temporal-Aware Graph Attention Network for Cryptocurrency Transaction Fraud Detection Temporal-Aware Graph Aufmerksamkeit Netzwerk für Kryptowährung Transaktion Betrugserkennung 加密货币交易欺诈侦查实时警示图关注网络 2506.21382v1

Authors (3): Zhi Zheng, Bochuan Zhou, Yuping Song

Cryptocurrency transaction fraud detection faces the dual challenges of increasingly complex transaction patterns and severe class imbalance. Traditional methods rely on manual feature engineering and struggle to capture temporal and structural dependencies in transaction networks. This paper proposes an Augmented Temporal-aware Graph Attention Network (ATGAT) that enhances detection performance through three modules: (1) designing an advanced temporal embedding module that fuses multi-scale time difference features with periodic position encoding; (2) constructing a temporal-aware triple attention mechanism that jointly optimizes structural, temporal, and global context attention; (3) employing weighted BCE loss to address class imbalance. Experiments on the Elliptic++ cryptocurrency dataset demonstrate that ATGAT achieves an AUC of 0.9130, representing a 9.2% improvement over the best traditional method XGBoost, 12.0% over GCN, and 10.0% over standard GAT. This method not only validates the enhancement effect of temporal awareness and triple attention mechanisms on graph neural networks, but also provides financial institutions with more reliable fraud detection tools, with its design principles generalizable to other temporal graph anomaly detection tasks.

nan


Article 1231

Title@2025-06-26 (4): HARPT: A Corpus for Analyzing Consumers’ Trust and Privacy Concerns in Mobile Health Apps

Title: HARPT: A Corpus for Analyzing Consumers’ Trust and Privacy Concerns in Mobile Health Apps HARPT: Ein Corpus für die Analyse des Vertrauens und der Datenschutzbelange der Verbraucher in mobilen Gesundheits-Apps HARPT: 分析移动保健应用程序中消费者信任和隐私问题的一个公司 2506.19268v2

Authors (6): Timoteo Kelly, Abdulkadir Korkmaz, Samuel Mallet, Connor Souders, Sadra Aliakbarpour, Praveen Rao

We present HARPT, a large-scale annotated corpus of mobile health app store reviews aimed at advancing research in user privacy and trust. The dataset comprises over 480,000 user reviews labeled into seven categories that capture critical aspects of trust in applications, trust in providers and privacy concerns. Creating HARPT required addressing multiple complexities, such as defining a nuanced label schema, isolating relevant content from large volumes of noisy data, and designing an annotation strategy that balanced scalability with accuracy. This strategy integrated rule-based filtering, iterative manual labeling with review, targeted data augmentation, and weak supervision using transformer-based classifiers to accelerate coverage. In parallel, a carefully curated subset of 7,000 reviews was manually annotated to support model development and evaluation. We benchmark a broad range of classification models, demonstrating that strong performance is achievable and providing a baseline for future research. HARPT is released as a public resource to support work in health informatics, cybersecurity, and natural language processing.

nan


Article 1232

Title@2025-06-26 (4): Pay Attention to Small Weights

Title: Pay Attention to Small Weights Achten Sie auf kleine Gewichte 关注小体重 2506.21374v1

Authors (4): Chao Zhou, Tom Jacobs, Advait Gadhikar, Rebekka Burkholz

Finetuning large pretrained neural networks is known to be resource-intensive, both in terms of memory and computational cost. To mitigate this, a common approach is to restrict training to a subset of the model parameters. By analyzing the relationship between gradients and weights during finetuning, we observe a notable pattern: large gradients are often associated with small-magnitude weights. This correlation is more pronounced in finetuning settings than in training from scratch. Motivated by this observation, we propose NANOADAM, which dynamically updates only the small-magnitude weights during finetuning and offers several practical advantages: first, this criterion is gradient-free – the parameter subset can be determined without gradient computation; second, it preserves large-magnitude weights, which are likely to encode critical features learned during pretraining, thereby reducing the risk of catastrophic forgetting; thirdly, it permits the use of larger learning rates and consistently leads to better generalization performance in experiments. We demonstrate this for both NLP and vision tasks.

nan


Article 1233

Title@2025-06-26 (4): Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application

Title: Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application Latent Diffusion Modellbasierter Denoisierungsempfänger für 6G Semantische Kommunikation: Von der stochastischen Differentialtheorie zur Anwendung 用于 6G 语义通讯: 从斯托卡差异理论到应用的 6G 语义通讯的 以 DEM 为基础的前传播模型模型 2506.05710v2

Authors (4): Xiucheng Wang, Honggang Jia, Nan Cheng, Dusit Niyato

In this paper, a novel semantic communication framework empowered by generative artificial intelligence (GAI) is proposed, to enhance the robustness against both channel noise and transmission data distribution shifts. A theoretical foundation is established using stochastic differential equations (SDEs), from which a closed-form mapping between any signal-to-noise ratio (SNR) and the optimal denoising timestep is derived. Moreover, to address distribution mismatch, a mathematical scaling method is introduced to align received semantic features with the training distribution of the GAI. Built on this theoretical foundation, a latent diffusion model (LDM)-based semantic communication framework is proposed that combines a variational autoencoder for semantic features extraction, where a pretrained diffusion model is used for denoising. The proposed system is a training-free framework that supports zero-shot generalization, and achieves superior performance under low-SNR and out-of-distribution conditions, offering a scalable and robust solution for future 6G semantic communication systems. Experimental results demonstrate that the proposed semantic communication framework achieves state-of-the-art performance in both pixel-level accuracy and semantic perceptual quality, consistently outperforming baselines across a wide range of SNRs and data distributions without any fine-tuning or post-training.

nan


Article 1234

Title@2025-06-26 (4): MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators

Title: MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators MAx-DNN: Mehrstufige Arithmetik-Annäherung für energieeffiziente DNN-Hardwarebeschleuniger MAX-DNN: 能源高效 DNN 硬件加速器的多级自动测量近似法 2506.21371v1

Authors (5): Vasileios Leon, Georgios Makris, Sotirios Xydis, Kiamal Pekmestzi, Dimitrios Soudris

Nowadays, the rapid growth of Deep Neural Network (DNN) architectures has established them as the defacto approach for providing advanced Machine Learning tasks with excellent accuracy. Targeting low-power DNN computing, this paper examines the interplay of fine-grained error resilience of DNN workloads in collaboration with hardware approximation techniques, to achieve higher levels of energy efficiency. Utilizing the state-of-the-art ROUP approximate multipliers, we systematically explore their fine-grained distribution across the network according to our layer-, filter-, and kernel-level approaches, and examine their impact on accuracy and energy. We use the ResNet-8 model on the CIFAR-10 dataset to evaluate our approximations. The proposed solution delivers up to 54% energy gains in exchange for up to 4% accuracy loss, compared to the baseline quantized model, while it provides 2x energy gains with better accuracy versus the state-of-the-art DNN approximations.

nan


Article 1235

Title@2025-06-26 (4): rQdia: Regularizing Q-Value Distributions With Image Augmentation

Title: rQdia: Regularizing Q-Value Distributions With Image Augmentation rQdia: Regularisieren der Q-Value-Distributionen mit Bildvergrößerung rQdia: 以图像放大方式规范 Q- 价值发行 2506.21367v1

Authors (2): Sam Lerman, Jing Bi

rQdia regularizes Q-value distributions with augmented images in pixel-based deep reinforcement learning. With a simple auxiliary loss, that equalizes these distributions via MSE, rQdia boosts DrQ and SAC on 9/12 and 10/12 tasks respectively in the MuJoCo Continuous Control Suite from pixels, and Data-Efficient Rainbow on 18/26 Atari Arcade environments. Gains are measured in both sample efficiency and longer-term training. Moreover, the addition of rQdia finally propels model-free continuous control from pixels over the state encoding baseline.

nan


Article 1236

Title@2025-06-26 (4): SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning

Title: SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning SMMILE: Ein sachverständiger Benchmark für multimodales medizinisches In-Context-Lernen SMMILE:多模式医学内书学习专家开发基准 2506.21355v1

Authors (12): Melanie Rieff, Maya Varma, Ossian Rabow, Subathra Adithan, Julie Kim, Ken Chang, Hannah Lee, Nidhi Rohatgi, Christian Bluethgen, Mohamed S. Muneer, Jean-Benoit Delbrouck, Michael Moor

Multimodal in-context learning (ICL) remains underexplored despite significant potential for domains such as medicine. Clinicians routinely encounter diverse, specialized tasks requiring adaptation from limited examples, such as drawing insights from a few relevant prior cases or considering a constrained set of differential diagnoses. While multimodal large language models (MLLMs) have shown advances in medical visual question answering (VQA), their ability to learn multimodal tasks from context is largely unknown. We introduce SMMILE, the first expert-driven multimodal ICL benchmark for medical tasks. Eleven medical experts curated problems, each including a multimodal query and multimodal in-context examples as task demonstrations. SMMILE encompasses 111 problems (517 question-image-answer triplets) covering 6 medical specialties and 13 imaging modalities. We further introduce SMMILE++, an augmented variant with 1038 permuted problems. A comprehensive evaluation of 15 MLLMs demonstrates that most models exhibit moderate to poor multimodal ICL ability in medical tasks. In open-ended evaluations, ICL contributes only 8% average improvement over zero-shot on SMMILE and 9.4% on SMMILE++. We observe a susceptibility for irrelevant in-context examples: even a single noisy or irrelevant example can degrade performance by up to 9.5%. Moreover, example ordering exhibits a recency bias, i.e., placing the most relevant example last can lead to substantial performance improvements by up to 71%. Our findings highlight critical limitations and biases in current MLLMs when learning multimodal medical tasks from context.

nan


Article 1237

Title@2025-06-26 (4): Lipschitz Bounds for Persistent Laplacian Eigenvalues under One-Simplex Insertions

Title: Lipschitz Bounds for Persistent Laplacian Eigenvalues under One-Simplex Insertions Lipschitz Bounds für persistente Laplacian Eigenwerte unter One-Simplex-Insertionen 在单质插入下用于持久性拉板电极值的 Lipschitz Bounds 2506.21352v1

Authors (3): Le Vu Anh, Mehmet Dik, Nguyen Viet Anh

Persistent Laplacians are matrix operators that track how the shape and structure of data transform across scales and are popularly adopted in biology, physics, and machine learning. Their eigenvalues are concise descriptors of geometric and topological features in a filtration. Although earlier work established global algebraic stability for these operators, the precise change in a single eigenvalue when one simplex, such as a vertex, edge, or triangle, is added has remained unknown. This is important because downstream tools, including heat-kernel signatures and spectral neural networks, depend directly on these eigenvalues. We close this gap by proving a uniform Lipschitz bound: after inserting one simplex, every up-persistent Laplacian eigenvalue can vary by at most twice the Euclidean norm of that simplex’s boundary, independent of filtration scale and complex size. This result delivers the first eigenvalue-level robustness guarantee for spectral topological data analysis. It guarantees that spectral features remain stable under local updates and enables reliable error control in dynamic data settings.

nan


Article 1238

Title@2025-06-26 (4): On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory

Title: On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory Über die Fähigkeit tiefer Netzwerke, Symmetrien aus Daten zu lernen: Eine neurale Kerneltheorie 深网络从数据中学习对称的深网络能力:神经核心理论 2412.11521v2

Authors (2): Andrea Perin, Stephane Deny

Symmetries (transformations by group actions) are present in many datasets, and leveraging them holds considerable promise for improving predictions in machine learning. In this work, we aim to understand when and how deep networks – with standard architectures trained in a standard, supervised way – learn symmetries from data. Inspired by real-world scenarios, we study a classification paradigm where data symmetries are only partially observed during training: some classes include all transformations of a cyclic group, while others – only a subset. In the infinite-width limit, where kernel analogies apply, we derive a neural kernel theory of symmetry learning. The group-cyclic nature of the dataset allows us to analyze the Gram matrix of neural kernels in the Fourier domain; here we find a simple characterization of the generalization error as a function of class separation (signal) and class-orbit density (noise). This characterization reveals that generalization can only be successful when the local structure of the data prevails over its non-local, symmetry-induced structure, in the kernel space defined by the architecture. We extend our theoretical treatment to any finite group, including non-abelian groups. Our framework also applies to equivariant architectures (e.g., CNNs), and recovers their success in the special case where the architecture matches the inherent symmetry of the data. Empirically, our theory reproduces the generalization failure of finite-width networks (MLP, CNN, ViT) trained on partially observed versions of rotated-MNIST. We conclude that conventional deep networks lack a mechanism to learn symmetries that have not been explicitly embedded in their architecture a priori. Our framework could be extended to guide the design of architectures and training procedures able to learn symmetries from data.

nan


Article 1239

Title@2025-06-26 (4): Learning Value of Information towards Joint Communication and Control in 6G V2X

Title: Learning Value of Information towards Joint Communication and Control in 6G V2X Lernwert von Informationen zur gemeinsamen Kommunikation und Kontrolle in 6G V2X 6G V2X 6G 6G V2X 6G 6G 6G V2X 6G 6G 6G 2505.06978v2

Authors (4): Lei Lei, Kan Zheng, Xuemin, Shen

As Cellular Vehicle-to-Everything (C-V2X) evolves towards future sixth-generation (6G) networks, Connected Autonomous Vehicles (CAVs) are emerging to become a key application. Leveraging data-driven Machine Learning (ML), especially Deep Reinforcement Learning (DRL), is expected to significantly enhance CAV decision-making in both vehicle control and V2X communication under uncertainty. These two decision-making processes are closely intertwined, with the value of information (VoI) acting as a crucial bridge between them. In this paper, we introduce Sequential Stochastic Decision Process (SSDP) models to define and assess VoI, demonstrating their application in optimizing communication systems for CAVs. Specifically, we formally define the SSDP model and demonstrate that the MDP model is a special case of it. The SSDP model offers a key advantage by explicitly representing the set of information that can enhance decision-making when available. Furthermore, as current research on VoI remains fragmented, we propose a systematic VoI modeling framework grounded in the MDP, Reinforcement Learning (RL) and Optimal Control theories. We define different categories of VoI and discuss their corresponding estimation methods. Finally, we present a structured approach to leverage the various VoI metrics for optimizing the When",What”, and ``How” to communicate problems. For this purpose, SSDP models are formulated with VoI-associated reward functions derived from VoI-based optimization objectives. While we use a simple vehicle-following control problem to illustrate the proposed methodology, it holds significant potential to facilitate the joint optimization of stochastic, sequential control and communication decisions in a wide range of networked control systems.

nan


Article 1240

Title@2025-06-26 (4): PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based Attacks

Title: PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based Attacks PuriDefense: Randomized Local Implizite Adversarial Purification for Defending Black-Box Query-based Attacks 防御:保护黑箱质疑式袭击的随机本地秘密对抗性净化 2401.10586v2

Authors (6): Ping Guo, Xiang Li, Zhiyuan Yang, Xi Lin, Qingchuan Zhao, Qingfu Zhang

Black-box query-based attacks constitute significant threats to Machine Learning as a Service (MLaaS) systems since they can generate adversarial examples without accessing the target model’s architecture and parameters. Traditional defense mechanisms, such as adversarial training, gradient masking, and input transformations, either impose substantial computational costs or compromise the test accuracy of non-adversarial inputs. To address these challenges, we propose an efficient defense mechanism, PuriDefense, that employs random patch-wise purifications with an ensemble of lightweight purification models at a low level of inference cost. These models leverage the local implicit function and rebuild the natural image manifold. Our theoretical analysis suggests that this approach slows down the convergence of query-based attacks by incorporating randomness into purifications. Extensive experiments on CIFAR-10 and ImageNet validate the effectiveness of our proposed purifier-based defense mechanism, demonstrating significant improvements in robustness against query-based attacks.

nan


Article 1241

Title@2025-06-26 (4): Regret Bounds for Robust Online Decision Making

Title: Regret Bounds for Robust Online Decision Making Bedauern Sie Grenzen für robuste Online-Entscheidungsfindung 对强有力的在线决策感到遗憾 2504.06820v2

Authors (2): Alexander Appel, Vanessa Kosoy

We propose a framework which generalizes “decision making with structured observations” by allowing robust (i.e. multivalued) models. In this framework, each model associates each decision with a convex set of probability distributions over outcomes. Nature can choose distributions out of this set in an arbitrary (adversarial) manner, that can be nonoblivious and depend on past history. The resulting framework offers much greater generality than classical bandits and reinforcement learning, since the realizability assumption becomes much weaker and more realistic. We then derive a theory of regret bounds for this framework. Although our lower and upper bounds are not tight, they are sufficient to fully characterize power-law learnability. We demonstrate this theory in two special cases: robust linear bandits and tabular robust online reinforcement learning. In both cases, we derive regret bounds that improve state-of-the-art (except that we do not address computational efficiency).

nan


Article 1242

Title@2025-06-26 (4): DynamicBench: Evaluating Real-Time Report Generation in Large Language Models

Title: DynamicBench: Evaluating Real-Time Report Generation in Large Language Models DynamicBench: Bewertung der Echtzeit-Berichtserstellung in großen Sprachmodellen 动态 bench:评价以大语言模式编制实时报告的情况 2506.21343v1

Authors (8): Jingyao Li, Hao Sun, Zile Qiao, Yong Jiang, Pengjun Xie, Fei Huang, Hong Xu, Jiaya Jia

Traditional benchmarks for large language models (LLMs) typically rely on static evaluations through storytelling or opinion expression, which fail to capture the dynamic requirements of real-time information processing in contemporary applications. To address this limitation, we present DynamicBench, a benchmark designed to evaluate the proficiency of LLMs in storing and processing up-to-the-minute data. DynamicBench utilizes a dual-path retrieval pipeline, integrating web searches with local report databases. It necessitates domain-specific knowledge, ensuring accurate responses report generation within specialized fields. By evaluating models in scenarios that either provide or withhold external documents, DynamicBench effectively measures their capability to independently process recent information or leverage contextual enhancements. Additionally, we introduce an advanced report generation system adept at managing dynamic information synthesis. Our experimental results confirm the efficacy of our approach, with our method achieving state-of-the-art performance, surpassing GPT4o in document-free and document-assisted scenarios by 7.0% and 5.8%, respectively. The code and data will be made publicly available.

nan


Article 1243

Title@2025-06-26 (4): AGTCNet: A Graph-Temporal Approach for Principled Motor Imagery EEG Classification

Title: AGTCNet: A Graph-Temporal Approach for Principled Motor Imagery EEG Classification AGTCNet: Ein graphisch-zeitlicher Ansatz für die Klassifikation der Primärmotorik EEG AGTCNet: 固定机动图像电子EEG分类的图表-临时方法 2506.21338v1

Authors (6): Galvin Brice S. Lim, Brian Godwin S. Lim, Argel A. Bandala, John Anthony C. Jose, Timothy Scott C. Chu, Edwin Sybingco

Brain-computer interface (BCI) technology utilizing electroencephalography (EEG) marks a transformative innovation, empowering motor-impaired individuals to engage with their environment on equal footing. Despite its promising potential, developing subject-invariant and session-invariant BCI systems remains a significant challenge due to the inherent complexity and variability of neural activity across individuals and over time, compounded by EEG hardware constraints. While prior studies have sought to develop robust BCI systems, existing approaches remain ineffective in capturing the intricate spatiotemporal dependencies within multichannel EEG signals. This study addresses this gap by introducing the attentive graph-temporal convolutional network (AGTCNet), a novel graph-temporal model for motor imagery EEG (MI-EEG) classification. Specifically, AGTCNet leverages the topographic configuration of EEG electrodes as an inductive bias and integrates graph convolutional attention network (GCAT) to jointly learn expressive spatiotemporal EEG representations. The proposed model significantly outperformed existing MI-EEG classifiers, achieving state-of-the-art performance while utilizing a compact architecture, underscoring its effectiveness and practicality for BCI deployment. With a 49.87% reduction in model size, 64.65% faster inference time, and shorter input EEG signal, AGTCNet achieved a moving average accuracy of 66.82% for subject-independent classification on the BCI Competition IV Dataset 2a, which further improved to 82.88% when fine-tuned for subject-specific classification. On the EEG Motor Movement/Imagery Dataset, AGTCNet achieved moving average accuracies of 64.14% and 85.22% for 4-class and 2-class subject-independent classifications, respectively, with further improvements to 72.13% and 90.54% for subject-specific classifications.

nan


Article 1244

Title@2025-06-26 (4): A Scalable Quantum Neural Network for Approximate SRBB-Based Unitary Synthesis

Title: A Scalable Quantum Neural Network for Approximate SRBB-Based Unitary Synthesis Ein skalierbares Quantum-Neural-Netzwerk für annähernde SRBB-basierte Einheitssynthese 近似基于SRBB的单一合成的可缩放量量子神经网络 2412.03083v2

Authors (3): Giacomo Belli, Marco Mordacci, Michele Amoretti

In this work, a scalable quantum neural network is introduced as a means to approximate any unitary evolution through the Standard Recursive Block Basis (SRBB) and, subsequently, redesigned with a number of CNOTs asymptotically reduced by an exponential contribution. This algebraic approach to the problem of unitary synthesis exploits Lie algebras and their topological features to obtain scalable parameterizations of unitary operators. First, the original SRBB-based scalability scheme, already known in the literature only from a theoretical point of view, is reformulated for efficient algorithm implementation and complexity management. Remarkably, 2-qubit operators emerge as a special case outside the original scaling scheme. Furthermore, an algorithm is proposed to reduce the number of CNOTs, thus deriving a new implementable scaling scheme that requires only one layer of approximation. The scalable CNOT-reduced quantum neural network is implemented and its performance is assessed with a variety of different unitary matrices, both sparse and dense, up to 6 qubits via the PennyLane library. The effectiveness of the approximation is measured with different metrics in relation to two optimizers: a gradient-based method and the Nelder-Mead method. The approximate CNOT-reduced SRBB-based synthesis algorithm is also tested on real hardware and compared with other valid approximation and decomposition methods available in the literature.

nan


Article 1245

Title@2025-06-26 (4): ScaleGNN: Towards Scalable Graph Neural Networks via Adaptive High-order Neighboring Feature Fusion

Title: ScaleGNN: Towards Scalable Graph Neural Networks via Adaptive High-order Neighboring Feature Fusion ScaleGNN: Auf dem Weg zu skalierbaren Graphen-Neuralnetzwerken über adaptive High-Order Neighboring Feature Fusion SASGNN:通过适应性高顺序相邻相邻地貌融合,走向可缩放的图形神经网络 2504.15920v4

Authors (8): Xiang Li, Jianpeng Qi, Haobing Liu, Yuan Cao, Guoqing Chao, Zhongying Zhao, Junyu Dong, Yanwei Yu

Graph Neural Networks (GNNs) have demonstrated impressive performance across diverse graph-based tasks by leveraging message passing to capture complex node relationships. However, when applied to large-scale real-world graphs, GNNs face two major challenges: First, it becomes increasingly difficult to ensure both scalability and efficiency, as the repeated aggregation of large neighborhoods leads to significant computational overhead; Second, the over-smoothing problem arises, where excessive or deep propagation makes node representations indistinguishable, severely hindering model expressiveness. To tackle these issues, we propose ScaleGNN, a novel framework that adaptively fuses multi-hop node features for both scalable and effective graph learning. First, we construct per-hop pure neighbor matrices that capture only the exclusive structural information at each hop, avoiding the redundancy of conventional aggregation. Then, an enhanced feature fusion strategy significantly balances low-order and high-order information, preserving both local detail and global correlations without incurring excessive complexity. To further reduce redundancy and over-smoothing, we introduce a Local Contribution Score (LCS)-based masking mechanism to filter out less relevant high-order neighbors, ensuring that only the most meaningful information is aggregated. In addition, learnable sparse constraints selectively integrate multi-hop valuable features, emphasizing the most informative high-order neighbors. Extensive experiments on real-world datasets demonstrate that ScaleGNN consistently outperforms state-of-the-art GNNs in both predictive accuracy and computational efficiency, highlighting its practical value for large-scale graph learning.

nan


Article 1246

Title@2025-06-26 (4): Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts

Title: Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts Latent Prototype Routing: Erzielen einer nahezu perfekten Lastabgleichung in Mixture-of-Experts 原型原型路由:在混合专家中实现近效果负载平衡 2506.21328v1

Authors (1): Jiajie Yang

Mixture-of-Experts (MoE) architectures have emerged as a key strategy for scaling large language models (LLMs) efficiently. However, current MoE systems suffer from severe load imbalance, where only a small subset of experts is consistently activated during training and inference, leading to significant underutilization of model capacity and computational resources. In this work, we revisit expert routing through a clustering perspective and propose Latent Prototype Routing (LPR), a novel routing framework that generalizes existing approaches while promoting balanced expert utilization without compromising downstream performance. Extensive experiments across multiple open-source MoE models – including DeepSeek-V3, Qwen3-MoE, and Mixtral – demonstrate that LPR reduces the Gini coefficient of expert load from 0.70 to 0.035 on average, improves the min-max expert load ratio from 1e-6 to 0.70, achieving near-perfect load balancing.

nan


Article 1247

Title@2025-06-26 (4): Stochastic Quantum Spiking Neural Networks with Quantum Memory and Local Learning

Title: Stochastic Quantum Spiking Neural Networks with Quantum Memory and Local Learning Stochastische Quantum-Spiking-Neuralnetzwerke mit Quantengedächtnis und lokalem Lernen 具有量子内存和本地学习的实测量量谱剖析神经网络 2506.21324v1

Authors (3): Jiechen Chen, Bipin Rajendran, Osvaldo Simeone

Neuromorphic and quantum computing have recently emerged as promising paradigms for advancing artificial intelligence, each offering complementary strengths. Neuromorphic systems built on spiking neurons excel at processing time-series data efficiently through sparse, event-driven computation, consuming energy only upon input events. Quantum computing, on the other hand, leverages superposition and entanglement to explore feature spaces that are exponentially large in the number of qubits. Hybrid approaches combining these paradigms have begun to show potential, but existing quantum spiking models have important limitations. Notably, prior quantum spiking neuron implementations rely on classical memory mechanisms on single qubits, requiring repeated measurements to estimate firing probabilities, and they use conventional backpropagation on classical simulators for training. Here we propose a stochastic quantum spiking (SQS) neuron model that addresses these challenges. The SQS neuron uses multi-qubit quantum circuits to realize a spiking unit with internal quantum memory, enabling event-driven probabilistic spike generation in a single shot. Furthermore, we outline how networks of SQS neurons – dubbed SQS neural networks (SQSNNs) – can be trained via a hardware-friendly local learning rule, eliminating the need for global classical backpropagation. The proposed SQSNN model fuses the time-series efficiency of neuromorphic computing with the exponentially large inner state space of quantum computing, paving the way for quantum spiking neural networks that are modular, scalable, and trainable on quantum hardware.

nan


Article 1248

Title@2025-06-26 (4): On Uniform Weighted Deep Polynomial approximation

Title: On Uniform Weighted Deep Polynomial approximation Auf einheitliche Gewichte tiefe Polynom-Annäherung 统一加权深多元近似值 2506.21306v1

Authors (2): Kingsley Yeon, Steven B. Damelin

It is a classical result in rational approximation theory that certain non-smooth or singular functions, such as $ x $ and $x^{1/p}$, can be efficiently approximated using rational functions with root-exponential convergence in terms of degrees of freedom \cite{Sta, GN}. In contrast, polynomial approximations admit only algebraic convergence by Jackson’s theorem \cite{Lub2}. Recent work shows that composite polynomial architectures can recover exponential approximation rates even without smoothness \cite{KY}. In this work, we introduce and analyze a class of weighted deep polynomial approximants tailored for functions with asymmetric behavior-growing unbounded on one side and decaying on the other. By multiplying a learnable deep polynomial with a one-sided weight, we capture both local non-smoothness and global growth. We show numerically that this framework outperforms Taylor, Chebyshev, and standard deep polynomial approximants, even when all use the same number of parameters. To optimize these approximants in practice, we propose a stable graph-based parameterization strategy building on \cite{Jar}.

nan


Article 1249

Title@2025-06-26 (4): Context-Aware Doubly-Robust Semi-Supervised Learning

Title: Context-Aware Doubly-Robust Semi-Supervised Learning Kontext-Bewusst Doppel-Robust Semi-überwachtes Lernen Doubly-Robust半监督学习 2502.15577v2

Authors (4): Clement Ruah, Houssem Sifaou, Osvaldo Simeone, Bashir Al-Hashimi

The widespread adoption of artificial intelligence (AI) in next-generation communication systems is challenged by the heterogeneity of traffic and network conditions, which call for the use of highly contextual, site-specific, data. A promising solution is to rely not only on real-world data, but also on synthetic pseudo-data generated by a network digital twin (NDT). However, the effectiveness of this approach hinges on the accuracy of the NDT, which can vary widely across different contexts. To address this problem, this paper introduces context-aware doubly-robust (CDR) learning, a novel semi-supervised scheme that adapts its reliance on the pseudo-data to the different levels of fidelity of the NDT across contexts. CDR is evaluated on the task of downlink beamforming where it outperforms previous state-of-the-art approaches, providing a 24% loss decrease when compared to doubly-robust (DR) semi-supervised learning in regimes with low labeled data availability.

nan


Article 1250

Title@2025-06-26 (4): Semantic Scene Graph for Ultrasound Image Explanation and Scanning Guidance

Title: Semantic Scene Graph for Ultrasound Image Explanation and Scanning Guidance Semantische Szenegrafik für Ultrasound-Bilderklärung und Scan-Anleitung 超声超声图像解释和扫描指导的语义谱图 2506.19683v2

Authors (5): Xuesong Li, Dianye Huang, Yameng Zhang, Nassir Navab, Zhongliang Jiang

Understanding medical ultrasound imaging remains a long-standing challenge due to significant visual variability caused by differences in imaging and acquisition parameters. Recent advancements in large language models (LLMs) have been used to automatically generate terminology-rich summaries orientated to clinicians with sufficient physiological knowledge. Nevertheless, the increasing demand for improved ultrasound interpretability and basic scanning guidance among non-expert users, e.g., in point-of-care settings, has not yet been explored. In this study, we first introduce the scene graph (SG) for ultrasound images to explain image content to ordinary and provide guidance for ultrasound scanning. The ultrasound SG is first computed using a transformer-based one-stage method, eliminating the need for explicit object detection. To generate a graspable image explanation for ordinary, the user query is then used to further refine the abstract SG representation through LLMs. Additionally, the predicted SG is explored for its potential in guiding ultrasound scanning toward missing anatomies within the current imaging view, assisting ordinary users in achieving more standardized and complete anatomical exploration. The effectiveness of this SG-based image explanation and scanning guidance has been validated on images from the left and right neck regions, including the carotid and thyroid, across five volunteers. The results demonstrate the potential of the method to maximally democratize ultrasound by enhancing its interpretability and usability for ordinaries.

nan


Article 1251

Title@2025-06-26 (4): Exploring Adapter Design Tradeoffs for Low Resource Music Generation

Title: Exploring Adapter Design Tradeoffs for Low Resource Music Generation Erforschung von Adapter-Design-Tradeoffs für Low Resource Music Generation 探索用于低资源音乐制作的适应设计取舍 2506.21298v1

Authors (3): Atharva Mehta, Shivam Chauhan, Monojit Choudhury

Fine-tuning large-scale music generation models, such as MusicGen and Mustango, is a computationally expensive process, often requiring updates to billions of parameters and, therefore, significant hardware resources. Parameter-Efficient Fine-Tuning (PEFT) techniques, particularly adapter-based methods, have emerged as a promising alternative, enabling adaptation with minimal trainable parameters while preserving model performance. However, the design choices for adapters, including their architecture, placement, and size, are numerous, and it is unclear which of these combinations would produce optimal adapters and why, for a given case of low-resource music genre. In this paper, we attempt to answer this question by studying various adapter configurations for two AI music models, MusicGen and Mustango, on two genres: Hindustani Classical and Turkish Makam music. Our findings reveal distinct trade-offs: convolution-based adapters excel in capturing fine-grained local musical details such as ornamentations and short melodic phrases, while transformer-based adapters better preserve long-range dependencies crucial for structured improvisation. Additionally, we analyze computational resource requirements across different adapter scales, demonstrating how mid-sized adapters (40M parameters) achieve an optimal balance between expressivity and quality. Furthermore, we find that Mustango, a diffusion-based model, generates more diverse outputs with better adherence to the description in the input prompt while lacking in providing stability in notes, rhythm alignment, and aesthetics. Also, it is computationally intensive and requires significantly more time to train. In contrast, autoregressive models like MusicGen offer faster training and are more efficient, and can produce better quality output in comparison, but have slightly higher redundancy in their generations.

nan


Article 1252

Title@2025-06-26 (4): Devil’s Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols

Title: Devil’s Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols Teufelshand: Daten vergiften Angriffe auf lokal private Graphen-Lernprotokolle 魔鬼之手:对本地私人图案学习程序的数据毒害攻击 2506.09803v2

Authors (6): Longzhu He, Chaozhuo Li, Peng Tang, Li Sun, Sen Su, Philip S. Yu

Graph neural networks (GNNs) have achieved significant success in graph representation learning and have been applied to various domains. However, many real-world graphs contain sensitive personal information, such as user profiles in social networks, raising serious privacy concerns when graph learning is performed using GNNs. To address this issue, locally private graph learning protocols have gained considerable attention. These protocols leverage the privacy advantages of local differential privacy (LDP) and the effectiveness of GNN’s message-passing in calibrating noisy data, offering strict privacy guarantees for users’ local data while maintaining high utility (e.g., node classification accuracy) for graph learning. Despite these advantages, such protocols may be vulnerable to data poisoning attacks, a threat that has not been considered in previous research. Identifying and addressing these threats is crucial for ensuring the robustness and security of privacy-preserving graph learning frameworks. This work introduces the first data poisoning attack targeting locally private graph learning protocols. The attacker injects fake users into the protocol, manipulates these fake users to establish links with genuine users, and sends carefully crafted data to the server, ultimately compromising the utility of private graph learning. The effectiveness of the attack is demonstrated both theoretically and empirically. In addition, several defense strategies have also been explored, but their limited effectiveness highlights the need for more robust defenses.

nan


Article 1253

Title@2025-06-26 (4): Improved seeding strategies for k-means and k-GMM

Title: Improved seeding strategies for k-means and k-GMM Verbesserte Saatstrategien für k-Mittel und k-GMM 改进k-手段和k-GMM和k-GMM的播种战略 2506.21291v1

Authors (2): Guillaume Carrière, Frédéric Cazals

We revisit the randomized seeding techniques for k-means clustering and k-GMM (Gaussian Mixture model fitting with Expectation-Maximization), formalizing their three key ingredients: the metric used for seed sampling, the number of candidate seeds, and the metric used for seed selection. This analysis yields novel families of initialization methods exploiting a lookahead principle–conditioning the seed selection to an enhanced coherence with the final metric used to assess the algorithm, and a multipass strategy to tame down the effect of randomization. Experiments show a consistent constant factor improvement over classical contenders in terms of the final metric (SSE for k-means, log-likelihood for k-GMM), at a modest overhead. In particular, for k-means, our methods improve on the recently designed multi-swap strategy, which was the first one to outperform the greedy k-means++ seeding. Our experimental analysis also shed light on subtle properties of k-means often overlooked, including the (lack of) correlations between the SSE upon seeding and the final SSE, the variance reduction phenomena observed in iterative seeding methods, and the sensitivity of the final SSE to the pool size for greedy methods. Practically, our most effective seeding methods are strong candidates to become one of the–if not the–standard techniques. From a theoretical perspective, our formalization of seeding opens the door to a new line of analytical approaches.

nan


Article 1254

Title@2025-06-26 (4): Small Encoders Can Rival Large Decoders in Detecting Groundedness

Title: Small Encoders Can Rival Large Decoders in Detecting Groundedness Kleine Encoder können große Decoder bei der Erkennung von Erdlichkeit rivalisieren 在地面探测中能够使大型分离器在探测地面时发生迭接 2506.21288v1

Authors (7): Istabrak Abbes, Gabriele Prato, Quentin Fournier, Fernando Rodriguez, Alaa Boukhary, Adam Elwood, Sarath Chandar

Augmenting large language models (LLMs) with external context significantly improves their performance in natural language processing (NLP) tasks. However, LLMs struggle to answer queries reliably when the provided context lacks information, often resorting to ungrounded speculation or internal knowledge. Groundedness - generating responses strictly supported by the context - is essential for ensuring factual consistency and trustworthiness. This study focuses on detecting whether a given query is grounded in a document provided in context before the costly answer generation by LLMs. Such a detection mechanism can significantly reduce both inference time and resource consumption. We show that lightweight, task specific encoder models such as RoBERTa and NomicBERT, fine-tuned on curated datasets, can achieve accuracy comparable to state-of-the-art LLMs, such as Llama3 8B and GPT4o, in groundedness detection while reducing inference latency by orders of magnitude. The code is available at : https://github.com/chandarlab/Hallucinate-less

nan


Article 1255

Title@2025-06-26 (4): Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling

Title: Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling Energy Matching: Zusammenführen von Flow Matching- und Energy-Based-Modellen für die Generative Modellierung 能源匹配:统一流动匹配和以能源为基础的生成模型模型 2504.10612v4

Authors (8): Michal Balcerak, Tamaz Amiranashvili, Antonio Terpin, Suprosanna Shit, Lea Bogensperger, Sebastian Kaltenbach, Petros Koumoutsakos, Bjoern Menze

The most widely used generative models map noise and data distributions by matching flows or scores. However, they struggle to incorporate partial observations and additional priors–something energy-based models (EBMs) handle elegantly by simply adding corresponding scalar energy terms. We address this issue by proposing Energy Matching, a framework that endows flow-based approaches with the flexibility of EBMs. Far from the data manifold, samples move along curl-free, optimal transport paths from noise to data. As they approach the data manifold, an entropic energy term guides the system into a Boltzmann equilibrium distribution, explicitly capturing the underlying likelihood structure of the data. We parameterize this dynamic with a single time-independent scalar field, which serves as both a powerful generator and a flexible prior for effective regularization of inverse problems. Our method substantially outperforms existing EBMs on CIFAR-10 and ImageNet generation in terms of fidelity, while retaining simulation-free training of transport-based approaches away from the data manifold. Furthermore, we leverage the method’s flexibility to introduce an interaction energy that supports diverse mode exploration, which we demonstrate in a controlled protein-generation setting. Our approach focuses on learning a scalar potential energy–without time-conditioning, auxiliary generators, or additional networks–which marks a significant departure from recent EBM methods. We believe that this simplified framework significantly advances EBMs capabilities and paves the way for their wider adoption in generative modeling across diverse domains.

nan


Article 1256

Title@2025-06-26 (4): Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution

Title: Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution Hypersphärische Variations-Autoencoder mit effizienter sphärischer Cauchy-Distribution 使用高效球道球道配送的超球变异自动编码器 2506.21278v1

Authors (2): Lukas Sablica, Kurt Hornik

We propose a novel variational autoencoder (VAE) architecture that employs a spherical Cauchy (spCauchy) latent distribution. Unlike traditional Gaussian latent spaces or the widely used von Mises-Fisher (vMF) distribution, spCauchy provides a more natural hyperspherical representation of latent variables, better capturing directional data while maintaining flexibility. Its heavy-tailed nature prevents over-regularization, ensuring efficient latent space utilization while offering a more expressive representation. Additionally, spCauchy circumvents the numerical instabilities inherent to vMF, which arise from computing normalization constants involving Bessel functions. Instead, it enables a fully differentiable and efficient reparameterization trick via M"obius transformations, allowing for stable and scalable training. The KL divergence can be computed through a rapidly converging power series, eliminating concerns of underflow or overflow associated with evaluation of ratios of hypergeometric functions. These properties make spCauchy a compelling alternative for VAEs, offering both theoretical advantages and practical efficiency in high-dimensional generative modeling.

nan


Article 1257

Title@2025-06-26 (4): Lagrangian Index Policy for Restless Bandits with Average Reward

Title: Lagrangian Index Policy for Restless Bandits with Average Reward Lagrangian Index Policy for Restless Bandits with Average Reward 以平均回报率衡量的无休无休止强盗拉格朗加指数政策 2412.12641v2

Authors (3): Konstantin Avrachenkov, Vivek S. Borkar, Pratik Shah

We study the Lagrange Index Policy (LIP) for restless multi-armed bandits with long-run average reward. In particular, we compare the performance of LIP with the performance of the Whittle Index Policy (WIP), both heuristic policies known to be asymptotically optimal under certain natural conditions. Even though in most cases their performances are very similar, in the cases when WIP shows bad performance, LIP continues to perform very well. We then propose reinforcement learning algorithms, both tabular and NN-based, to obtain online learning schemes for LIP in the model-free setting. The proposed reinforcement learning schemes for LIP require significantly less memory than the analogous schemes for WIP. We calculate analytically the Lagrange index for the restart model, which applies to the optimal web crawling and the minimization of the weighted age of information. We also give a new proof of asymptotic optimality in case of homogeneous arms as the number of arms goes to infinity, based on exchangeability and de Finetti’s theorem.

nan


Article 1258

Title@2025-06-26 (4): A GREAT Architecture for Edge-Based Graph Problems Like TSP

Title: A GREAT Architecture for Edge-Based Graph Problems Like TSP Eine großartige Architektur für Edge-Based Graph Probleme wie TSP 象TSP那样的边缘图表问题大建筑 2408.16717v2

Authors (5): Attila Lischka, Filip Rydin, Jiaming Wu, Morteza Haghir Chehreghani, Balázs Kulcsár

In the last years, many learning-based approaches have been proposed to tackle combinatorial optimization problems such as routing problems. Many of these approaches are based on graph neural networks (GNNs) or related transformers, operating on the Euclidean coordinates representing the routing problems. However, models operating on Euclidean coordinates are ill-suited for non-Euclidean, asymmetric problem instances that are often found in real-world settings. To overcome this limitation, we propose a novel GNN-based and edge-focused neural model called Graph Edge Attention Network (GREAT). Using GREAT as an encoder to capture the properties of a routing problem instance, we build a reinforcement learning framework which we apply to Euclidean and non-Euclidean variants of vehicle routing problems such as Traveling Salesman Problem, Capacitated Vehicle Routing Problem and Orienteering Problem. Our framework is among the first to tackle non-Euclidean variants of these problems and achieves competitive results among learning-based solvers.

nan


Article 1259

Title@2025-06-26 (4): These Are Not All the Features You Are Looking For: A Fundamental Bottleneck in Supervised Pretraining

Title: These Are Not All the Features You Are Looking For: A Fundamental Bottleneck in Supervised Pretraining Diese sind nicht alle Funktionen, die Sie suchen: Ein grundlegender Engpass in überwachten Pretraining 这些不是所有你正在寻找的特征: 受监督预科班的基本瓶颈。 2506.18221v2

Authors (3): Xingyu Alice Yang, Jianyu Zhang, Léon Bottou

Transfer learning is a cornerstone of modern machine learning, promising a way to adapt models pretrained on a broad mix of data to new tasks with minimal new data. However, a significant challenge remains in ensuring that transferred features are sufficient to handle unseen datasets, amplified by the difficulty of quantifying whether two tasks are “related”. To address these challenges, we evaluate model transfer from a pretraining mixture to each of its component tasks, assessing whether pretrained features can match the performance of task-specific direct training. We identify a fundamental limitation in deep learning models – an “information saturation bottleneck” – where networks fail to learn new features once they encode similar competing features during training. When restricted to learning only a subset of key features during pretraining, models will permanently lose critical features for transfer and perform inconsistently on data distributions, even components of the training mixture. Empirical evidence from published studies suggests that this phenomenon is pervasive in deep learning architectures – factors such as data distribution or ordering affect the features that current representation learning methods can learn over time. This study suggests that relying solely on large-scale networks may not be as effective as focusing on task-specific training, when available. We propose richer feature representations as a potential solution to better generalize across new datasets and, specifically, present existing methods alongside a novel approach, the initial steps towards addressing this challenge.

nan


Article 1260

Title@2025-06-26 (4): DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster

Title: DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster DiLoCoX: Ein kommunikationsarmer groß angelegter Ausbildungsrahmen für dezentralisierte Cluster DILOCOX:权力下放小组的低通信大范围培训框架 2506.21263v1

Authors (9): Ji Qi, WenPeng Zhu, Li Li, Ming Wu, YingJun Wu, Wu He, Xun Gao, Jason Zeng, Michael Heinrich

The distributed training of foundation models, particularly large language models (LLMs), demands a high level of communication. Consequently, it is highly dependent on a centralized cluster with fast and reliable interconnects. Can we conduct training on slow networks and thereby unleash the power of decentralized clusters when dealing with models exceeding 100 billion parameters? In this paper, we propose DiLoCoX, a low-communication large-scale decentralized cluster training framework. It combines Pipeline Parallelism with Dual Optimizer Policy, One-Step-Delay Overlap of Communication and Local Training, and an Adaptive Gradient Compression Scheme. This combination significantly improves the scale of parameters and the speed of model pre-training. We justify the benefits of one-step-delay overlap of communication and local training, as well as the adaptive gradient compression scheme, through a theoretical analysis of convergence. Empirically, we demonstrate that DiLoCoX is capable of pre-training a 107B foundation model over a 1Gbps network. Compared to vanilla AllReduce, DiLoCoX can achieve a 357x speedup in distributed training while maintaining negligible degradation in model convergence. To the best of our knowledge, this is the first decentralized training framework successfully applied to models with over 100 billion parameters.

nan


Article 1261

Title@2025-06-26 (4): Simulating Hard Attention Using Soft Attention

Title: Simulating Hard Attention Using Soft Attention Simulation der harten Aufmerksamkeit mit weicher Aufmerksamkeit 使用软关注模拟硬关注 2412.09925v2

Authors (4): Andy Yang, Lena Strobl, David Chiang, Dana Angluin

We study conditions under which transformers using soft attention can simulate hard attention, that is, effectively focus all attention on a subset of positions. First, we examine several subclasses of languages recognized by hard-attention transformers, which can be defined in variants of linear temporal logic. We demonstrate how soft-attention transformers can compute formulas of these logics using unbounded positional embeddings or temperature scaling. Second, we demonstrate how temperature scaling allows softmax transformers to simulate general hard-attention transformers, using a temperature that depends on the minimum gap between the maximum attention scores and other attention scores.

nan


Article 1262

Title@2025-06-26 (4): Wavelet Diffusion Neural Operator

Title: Wavelet Diffusion Neural Operator Wavelet Diffusions-Neuraloperator Wavelet 扩散神经操作员 2412.04833v3

Authors (10): Peiyan Hu, Rui Wang, Xiang Zheng, Tao Zhang, Haodong Feng, Ruiqi Feng, Long Wei, Yue Wang, Zhi-Ming Ma, Tailin Wu

Simulating and controlling physical systems described by partial differential equations (PDEs) are crucial tasks across science and engineering. Recently, diffusion generative models have emerged as a competitive class of methods for these tasks due to their ability to capture long-term dependencies and model high-dimensional states. However, diffusion models typically struggle with handling system states with abrupt changes and generalizing to higher resolutions. In this work, we propose Wavelet Diffusion Neural Operator (WDNO), a novel PDE simulation and control framework that enhances the handling of these complexities. WDNO comprises two key innovations. Firstly, WDNO performs diffusion-based generative modeling in the wavelet domain for the entire trajectory to handle abrupt changes and long-term dependencies effectively. Secondly, to address the issue of poor generalization across different resolutions, which is one of the fundamental tasks in modeling physical systems, we introduce multi-resolution training. We validate WDNO on five physical systems, including 1D advection equation, three challenging physical systems with abrupt changes (1D Burgers’ equation, 1D compressible Navier-Stokes equation and 2D incompressible fluid), and a real-world dataset ERA5, which demonstrates superior performance on both simulation and control tasks over state-of-the-art methods, with significant improvements in long-term and detail prediction accuracy. Remarkably, in the challenging context of the 2D high-dimensional and indirect control task aimed at reducing smoke leakage, WDNO reduces the leakage by 78% compared to the second-best baseline. The code can be found at https://github.com/AI4Science-WestlakeU/wdno.git.

nan


Article 1263

Title@2025-06-26 (4): Radio Map Estimation via Latent Domain Plug-and-Play Denoising

Title: Radio Map Estimation via Latent Domain Plug-and-Play Denoising Radiokarte Schätzung über Latent Domain Plug-and-Play Denoising 通过Latent Domain Plug 和 Play Disoising 无线电地图估计 2501.13472v2

Authors (5): Le Xu, Lei Cheng, Junting Chen, Wenqiang Pu, Xiao Fu

Radio map estimation (RME), also known as spectrum cartography, aims to reconstruct the strength of radio interference across different domains (e.g., space and frequency) from sparsely sampled measurements. To tackle this typical inverse problem, state-of-the-art RME methods rely on handcrafted or data-driven structural information of radio maps. However, the former often struggles to model complex radio frequency (RF) environments and the latter requires excessive training – making it hard to quickly adapt to in situ sensing tasks. This work presents a spatio-spectral RME approach based on plug-and-play (PnP) denoising, a technique from computational imaging. The idea is to leverage the observation that the denoising operations of signals like natural images and radio maps are similar – despite the nontrivial differences of the signals themselves. Hence, sophisticated denoisers designed for or learned from natural images can be directly employed to assist RME, avoiding using radio map data for training. Unlike conventional PnP methods that operate directly in the data domain, the proposed method exploits the underlying physical structure of radio maps and proposes an ADMM algorithm that denoises in a latent domain. This design significantly improves computational efficiency and enhances noise robustness. Theoretical aspects, e.g., recoverability of the complete radio map and convergence of the ADMM algorithm are analyzed. Synthetic and real data experiments are conducted to demonstrate the effectiveness of our approach.

nan


Article 1264

Title@2025-06-26 (4): From On-chain to Macro: Assessing the Importance of Data Source Diversity in Cryptocurrency Market Forecasting

Title: From On-chain to Macro: Assessing the Importance of Data Source Diversity in Cryptocurrency Market Forecasting Von der On-Chain zum Makro: Bewertung der Bedeutung der Datenquellenvielfalt in der Kryptowährungsmarktprognose 从连网到宏观:评估数据来源多样性在加密货币市场预测中的重要性 2506.21246v1

Authors (3): Giorgos Demosthenous, Chryssis Georgiou, Eliada Polydorou

This study investigates the impact of data source diversity on the performance of cryptocurrency forecasting models by integrating various data categories, including technical indicators, on-chain metrics, sentiment and interest metrics, traditional market indices, and macroeconomic indicators. We introduce the Crypto100 index, representing the top 100 cryptocurrencies by market capitalization, and propose a novel feature reduction algorithm to identify the most impactful and resilient features from diverse data sources. Our comprehensive experiments demonstrate that data source diversity significantly enhances the predictive performance of forecasting models across different time horizons. Key findings include the paramount importance of on-chain metrics for both short-term and long-term predictions, the growing relevance of traditional market indices and macroeconomic indicators for longer-term forecasts, and substantial improvements in model accuracy when diverse data sources are utilized. These insights help demystify the short-term and long-term driving factors of the cryptocurrency market and lay the groundwork for developing more accurate and resilient forecasting models.

nan


Article 1265

Title@2025-06-26 (4): Zero-Shot Learning for Obsolescence Risk Forecasting

Title: Zero-Shot Learning for Obsolescence Risk Forecasting Zero-Shot-Lernen für Obsoleszenz-Risikoprognosen 用于悬浮风险预测的零热学习 2506.21240v1

Authors (7): Elie Saad, Aya Mrabah, Mariem Besbes, Marc Zolghadri, Victor Czmil, Claude Baron, Vincent Bourgeois

Component obsolescence poses significant challenges in industries reliant on electronic components, causing increased costs and disruptions in the security and availability of systems. Accurate obsolescence risk prediction is essential but hindered by a lack of reliable data. This paper proposes a novel approach to forecasting obsolescence risk using zero-shot learning (ZSL) with large language models (LLMs) to address data limitations by leveraging domain-specific knowledge from tabular datasets. Applied to two real-world datasets, the method demonstrates effective risk prediction. A comparative evaluation of four LLMs underscores the importance of selecting the right model for specific forecasting tasks.

nan


Article 1266

Title@2025-06-26 (4): Capturing Style in Author and Document Representation

Title: Capturing Style in Author and Document Representation Stil in der Autor- und Dokumentdarstellung erfassen 在作者和文件代表中获取样式 2407.13358v2

Authors (3): Enzo Terreau, Antoine Gourru, Julien Velcin

A wide range of Deep Natural Language Processing (NLP) models integrates continuous and low dimensional representations of words and documents. Surprisingly, very few models study representation learning for authors. These representations can be used for many NLP tasks, such as author identification and classification, or in recommendation systems. A strong limitation of existing works is that they do not explicitly capture writing style, making them hardly applicable to literary data. We therefore propose a new architecture based on Variational Information Bottleneck (VIB) that learns embeddings for both authors and documents with a stylistic constraint. Our model fine-tunes a pre-trained document encoder. We stimulate the detection of writing style by adding predefined stylistic features making the representation axis interpretable with respect to writing style indicators. We evaluate our method on three datasets: a literary corpus extracted from the Gutenberg Project, the Blog Authorship Corpus and IMDb62, for which we show that it matches or outperforms strong/recent baselines in authorship attribution while capturing much more accurately the authors stylistic aspects.

nan


Article 1267

Title@2025-06-26 (4): Rapid Gyroscope Calibration: A Deep Learning Approach

Title: Rapid Gyroscope Calibration: A Deep Learning Approach Schnelle Gyroskop-Kalibrierung: Ein tiefer Lernansatz 快速热波校准:深学习方法 2409.00488v3

Authors (2): Yair Stolero, Itzik Klein

Low-cost gyroscope calibration is essential for ensuring the accuracy and reliability of gyroscope measurements. Stationary calibration estimates the deterministic parts of measurement errors. To this end, a common practice is to average the gyroscope readings during a predefined period and estimate the gyroscope bias. Calibration duration plays a crucial role in performance, therefore, longer periods are preferred. However, some applications require quick startup times and calibration is therefore allowed only for a short time. In this work, we focus on reducing low-cost gyroscope calibration time using deep learning methods. We propose an end-to-end convolutional neural network for the application of gyroscope calibration. We explore the possibilities of using multiple real and virtual gyroscopes to improve the calibration performance of single gyroscopes. To train and validate our approach, we recorded a dataset consisting of 186.6 hours of gyroscope readings, using 36 gyroscopes of four different brands. We also created a virtual dataset consisting of simulated gyroscope readings. The six datasets were used to evaluate our proposed approach. One of our key achievements in this work is reducing gyroscope calibration time by up to 89% using three low-cost gyroscopes. Our dataset is publicly available to allow reproducibility of our work and to increase research in the field.

nan


Article 1268

Title@2025-06-26 (4): Complexity-aware fine-tuning

Title: Complexity-aware fine-tuning Komplexitätsbewusste Feinabstimmung 复杂度认知微调 2506.21220v1

Authors (5): Andrey Goncharov, Daniil Vyazhev, Petr Sychev, Edvard Khalafyan, Alexey Zaytsev

General-purpose Large Language Models (LLMs) are frequently fine-tuned through supervised fine-tuning (SFT) to enhance performance in specific domains. Better results can be achieved by distilling the chain-of-thought of a larger model at the cost of numerous expensive calls and a much greater amount of data. We propose a novel blueprint for efficient fine-tuning that uses reasoning only for complex data identified by entropy. Specifically, across two small open models ($\approx 3B$) we split the training data into complexity categories by a single token answer entropy (ROC AUC $0.73$), fine-tune large language models (LLMs) via SFT and distillation, and show that our pipeline significantly outperforms the standard SFT approach ($0.55$ vs $0.43$ average accuracy) and provides comparable with distillation performance while using $62\%$ less data ($0.55$ average accuracy for both). We publish our code and data to facilitate further research in this direction.

nan


Article 1269

Title@2025-06-26 (4): Balancing Privacy, Robustness, and Efficiency in Machine Learning

Title: Balancing Privacy, Robustness, and Efficiency in Machine Learning Ausbalancierende Privatsphäre, Robustheit und Effizienz im maschinellen Lernen 平衡隐私、强健和机器学习效率 2312.14712v3

Authors (3): Youssef Allouah, Rachid Guerraoui, John Stephan

This position paper argues that achieving robustness, privacy, and efficiency simultaneously in machine learning systems is infeasible under prevailing threat models. The tension between these goals arises not from algorithmic shortcomings but from structural limitations imposed by worst-case adversarial assumptions. We advocate for a systematic research agenda aimed at formalizing the robustness-privacy-efficiency trilemma, exploring how principled relaxations of threat models can unlock better trade-offs, and designing benchmarks that expose rather than obscure the compromises made. By shifting focus from aspirational universal guarantees to context-aware system design, the machine learning community can build models that are truly appropriate for real-world deployment.

nan


Article 1270

Title@2025-06-26 (4): Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?

Title: Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? Kausale Vernunft in großen Sprachmodellen enthüllen: Realität oder Mirage? 大语言模型中未解的因果理由:现实还是幻影? 2506.21215v1

Authors (8): Haoang Chi, He Li, Wenjing Yang, Feng Liu, Long Lan, Xiaoguang Ren, Tongliang Liu, Bo Han

Causal reasoning capability is critical in advancing large language models (LLMs) toward strong artificial intelligence. While versatile LLMs appear to have demonstrated capabilities in understanding contextual causality and providing responses that obey the laws of causality, it remains unclear whether they perform genuine causal reasoning akin to humans. However, current evidence indicates the contrary. Specifically, LLMs are only capable of performing shallow (level-1) causal reasoning, primarily attributed to the causal knowledge embedded in their parameters, but they lack the capacity for genuine human-like (level-2) causal reasoning. To support this hypothesis, methodologically, we delve into the autoregression mechanism of transformer-based LLMs, revealing that it is not inherently causal. Empirically, we introduce a new causal Q&A benchmark called CausalProbe-2024, whose corpora are fresh and nearly unseen for the studied LLMs. The LLMs exhibit a significant performance drop on CausalProbe-2024 compared to earlier benchmarks, indicating the fact that they primarily engage in level-1 causal reasoning. To bridge the gap towards level-2 causal reasoning, we draw inspiration from the fact that human reasoning is usually facilitated by general knowledge and intended goals. We propose G^2-Reasoner, a method that incorporates general knowledge and goal-oriented prompts into LLMs’ causal reasoning processes. Experiments demonstrate that G^2-Reasoner significantly enhances LLMs’ causal reasoning capability, particularly in fresh and counterfactual contexts. This work sheds light on a new path for LLMs to advance towards genuine causal reasoning, going beyond level-1 and making strides towards level-2.

nan


Article 1271

Title@2025-06-26 (4): Unsupervised Learning for Optimal Transport plan prediction between unbalanced graphs

Title: Unsupervised Learning for Optimal Transport plan prediction between unbalanced graphs Unüberwachtes Lernen für optimale Verkehrsplanungsvorhersage zwischen unausgewogenen Graphen 未受监督的优化交通学习计划预测 2506.12025v2

Authors (3): Sonia Mazelet, Rémi Flamary, Bertrand Thirion

Optimal transport between graphs, based on Gromov-Wasserstein and other extensions, is a powerful tool for comparing and aligning graph structures. However, solving the associated non-convex optimization problems is computationally expensive, which limits the scalability of these methods to large graphs. In this work, we present Unbalanced Learning of Optimal Transport (ULOT), a deep learning method that predicts optimal transport plans between two graphs. Our method is trained by minimizing the fused unbalanced Gromov-Wasserstein (FUGW) loss. We propose a novel neural architecture with cross-attention that is conditioned on the FUGW tradeoff hyperparameters. We evaluate ULOT on synthetic stochastic block model (SBM) graphs and on real cortical surface data obtained from fMRI. ULOT predicts transport plans with competitive loss up to two orders of magnitude faster than classical solvers. Furthermore, the predicted plan can be used as a warm start for classical solvers to accelerate their convergence. Finally, the predicted transport plan is fully differentiable with respect to the graph inputs and FUGW hyperparameters, enabling the optimization of functionals of the ULOT plan.

nan


Article 1272

Title@2025-06-26 (4): LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey

Title: LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey LLM-basierte human-agente Kooperations- und Interaktionssysteme: Eine Umfrage 以LLM为基础的人类-机构协作和互动系统:调查 2505.00753v4

Authors (15): Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Yankai Chen, Chunyu Miao, Hoang Nguyen, Yue Zhou, Weizhi Zhang, Liancheng Fang, Langzhou He, Yangning Li, Dongyuan Li, Renhe Jiang, Xue Liu, Philip S. Yu

Recent advances in large language models (LLMs) have sparked growing interest in building fully autonomous agents. However, fully autonomous LLM-based agents still face significant challenges, including limited reliability due to hallucinations, difficulty in handling complex tasks, and substantial safety and ethical risks, all of which limit their feasibility and trustworthiness in real-world applications. To overcome these limitations, LLM-based human-agent systems (LLM-HAS) incorporate human-provided information, feedback, or control into the agent system to enhance system performance, reliability and safety. These human-agent collaboration systems enable humans and LLM-based agents to collaborate effectively by leveraging their complementary strengths. This paper provides the first comprehensive and structured survey of LLM-HAS. It clarifies fundamental concepts, systematically presents core components shaping these systems, including environment & profiling, human feedback, interaction types, orchestration and communication, explores emerging applications, and discusses unique challenges and opportunities arising from human-AI collaboration. By consolidating current knowledge and offering a structured overview, we aim to foster further research and innovation in this rapidly evolving interdisciplinary field. Paper lists and resources are available at https://github.com/HenryPengZou/Awesome-Human-Agent-Collaboration-Interaction-Systems.

nan


Article 1273

Title@2025-06-26 (4): Seal Your Backdoor with Variational Defense

Title: Seal Your Backdoor with Variational Defense Versiegeln Sie Ihre Hintertür mit abwechslungsreicher Verteidigung 以不同防御方式密封你的后门 2503.08829v2

Authors (3): Ivan Sabolić, Matej Grcić, Siniša Šegvić

We propose VIBE, a model-agnostic framework that trains classifiers resilient to backdoor attacks. The key concept behind our approach is to treat malicious inputs and corrupted labels from the training dataset as observed random variables, while the actual clean labels are latent. VIBE then recovers the corresponding latent clean label posterior through variational inference. The resulting training procedure follows the expectation-maximization (EM) algorithm. The E-step infers the clean pseudolabels by solving an entropy-regularized optimal transport problem, while the M-step updates the classifier parameters via gradient descent. Being modular, VIBE can seamlessly integrate with recent advancements in self-supervised representation learning, which enhance its ability to resist backdoor attacks. We experimentally validate the method effectiveness against contemporary backdoor attacks on standard datasets, a large-scale setup with 1$k$ classes, and a dataset poisoned with multiple attacks. VIBE consistently outperforms previous defenses across all tested scenarios.

nan


Article 1274

Title@2025-06-26 (4): Artificial Delegates Resolve Fairness Issues in Perpetual Voting with Partial Turnout

Title: Artificial Delegates Resolve Fairness Issues in Perpetual Voting with Partial Turnout Künstliche Delegierte lösen Fairness-Probleme bei der ständigen Abstimmung mit teilweiser Wahlbeteiligung 持部分投票票的永久表决中的人造代表解决公平问题 2506.21186v1

Authors (4): Apurva Shah, Axel Abels, Ann Nowé, Tom Lenaerts

Perpetual voting addresses fairness in sequential collective decision-making by evaluating representational equity over time. However, existing perpetual voting rules rely on full participation and complete approval information, assumptions that rarely hold in practice, where partial turnout is the norm. In this work, we study the integration of Artificial Delegates, preference-learning agents trained to represent absent voters, into perpetual voting systems. We examine how absenteeism affects fairness and representativeness under various voting methods and evaluate the extent to which Artificial Delegates can compensate for missing participation. Our findings indicate that while absenteeism significantly affects fairness, Artificial Delegates reliably mitigate these effects and enhance robustness across diverse scenarios.

nan


Article 1275

Title@2025-06-26 (4): PCF-Grasp: Converting Point Completion to Geometry Feature to Enhance 6-DoF Grasp

Title: PCF-Grasp: Converting Point Completion to Geometry Feature to Enhance 6-DoF Grasp PCF-Grasp: Umwandlung der Punktvervollständigung in Geometrie-Feature zur Verbesserung der 6-DoF-Grasp PCF-格拉斯普:将完成点转换成几何特征,以加强6-DoF格拉斯普 2504.16320v2

Authors (7): Yaofeng Cheng, Fusheng Zha, Wei Guo, Pengfei Wang, Chao Zeng, Lining Sun, Chenguang Yang

The 6-Degree of Freedom (DoF) grasp method based on point clouds has shown significant potential in enabling robots to grasp target objects. However, most existing methods are based on the point clouds (2.5D points) generated from single-view depth images. These point clouds only have one surface side of the object providing incomplete geometry information, which mislead the grasping algorithm to judge the shape of the target object, resulting in low grasping accuracy. Humans can accurately grasp objects from a single view by leveraging their geometry experience to estimate object shapes. Inspired by humans, we propose a novel 6-DoF grasping framework that converts the point completion results as object shape features to train the 6-DoF grasp network. Here, point completion can generate approximate complete points from the 2.5D points similar to the human geometry experience, and converting it as shape features is the way to utilize it to improve grasp efficiency. Furthermore, due to the gap between the network generation and actual execution, we integrate a score filter into our framework to select more executable grasp proposals for the real robot. This enables our method to maintain a high grasp quality in any camera viewpoint. Extensive experiments demonstrate that utilizing complete point features enables the generation of significantly more accurate grasp proposals and the inclusion of a score filter greatly enhances the credibility of real-world robot grasping. Our method achieves a 17.8\% success rate higher than the state-of-the-art method in real-world experiments.

nan


Article 1276

Title@2025-06-26 (4): Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4

Title: Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4 Performance-Verbesserung der räumlichen semantischen Segmentierung mit angereicherten Audio-Features und agentenbasierter Fehlerkorrektur für DCASE 2025 Challenge Task 4 DASAS 2025年挑战任务4,具有浓缩音频特征和以代理物为基础的错误更正的 空间语义分离的性能改进 2506.21174v1

Authors (6): Jongyeon Park, Joonhee Lee, Do-Hyeon Lim, Hong Kook Kim, Hyeongcheol Geum, Jeong Eun Lim

This technical report presents submission systems for Task 4 of the DCASE 2025 Challenge. This model incorporates additional audio features (spectral roll-off and chroma features) into the embedding feature extracted from the mel-spectral feature to im-prove the classification capabilities of an audio-tagging model in the spatial semantic segmentation of sound scenes (S5) system. This approach is motivated by the fact that mixed audio often contains subtle cues that are difficult to capture with mel-spectrograms alone. Thus, these additional features offer alterna-tive perspectives for the model. Second, an agent-based label correction system is applied to the outputs processed by the S5 system. This system reduces false positives, improving the final class-aware signal-to-distortion ratio improvement (CA-SDRi) metric. Finally, we refine the training dataset to enhance the classi-fication accuracy of low-performing classes by removing irrele-vant samples and incorporating external data. That is, audio mix-tures are generated from a limited number of data points; thus, even a small number of out-of-class data points could degrade model performance. The experiments demonstrate that the submit-ted systems employing these approaches relatively improve CA-SDRi by up to 14.7% compared to the baseline of DCASE 2025 Challenge Task 4.

nan


Article 1277

Title@2025-06-26 (4): Variational Supervised Contrastive Learning

Title: Variational Supervised Contrastive Learning Variationelles Überwachtes Kontrastuelles Lernen 差异监督反舞弊学习 2506.07413v2

Authors (5): Ziwen Wang, Jiajun Fan, Thao Nguyen, Heng Ji, Ge Liu

Contrastive learning has proven to be highly efficient and adaptable in shaping representation spaces across diverse modalities by pulling similar samples together and pushing dissimilar ones apart. However, two key limitations persist: (1) Without explicit regulation of the embedding distribution, semantically related instances can inadvertently be pushed apart unless complementary signals guide pair selection, and (2) excessive reliance on large in-batch negatives and tailored augmentations hinders generalization. To address these limitations, we propose Variational Supervised Contrastive Learning (VarCon), which reformulates supervised contrastive learning as variational inference over latent class variables and maximizes a posterior-weighted evidence lower bound (ELBO) that replaces exhaustive pair-wise comparisons for efficient class-aware matching and grants fine-grained control over intra-class dispersion in the embedding space. Trained exclusively on image data, our experiments on CIFAR-10, CIFAR-100, ImageNet-100, and ImageNet-1K show that VarCon (1) achieves state-of-the-art performance for contrastive learning frameworks, reaching 79.36% Top-1 accuracy on ImageNet-1K and 78.29% on CIFAR-100 with a ResNet-50 encoder while converging in just 200 epochs; (2) yields substantially clearer decision boundaries and semantic organization in the embedding space, as evidenced by KNN classification, hierarchical clustering results, and transfer-learning assessments; and (3) demonstrates superior performance in few-shot learning than supervised baseline and superior robustness across various augmentation strategies.

nan


Article 1278

Title@2025-06-26 (4): Moderating the Generalization of Score-based Generative Model

Title: Moderating the Generalization of Score-based Generative Model Moderierung der Generalisierung des Score-basierten Generativen Modells 简化基于记分制的通用创制模式 2412.07229v2

Authors (7): Wan Jiang, He Wang, Xin Zhang, Dan Guo, Zhaoxin Fan, Yunfeng Diao, Richang Hong

Score-based Generative Models (SGMs) have demonstrated remarkable generalization abilities, e.g. generating unseen, but natural data. However, the greater the generalization power, the more likely the unintended generalization, and the more dangerous the abuse. Research on moderated generalization in SGMs remains limited. To fill this gap, we first examine the current ‘gold standard’ in Machine Unlearning (MU), i.e., re-training the model after removing the undesirable training data, and find it does not work in SGMs. Further analysis of score functions reveals that the MU ‘gold standard’ does not alter the original score function, which explains its ineffectiveness. Based on this insight, we propose the first Moderated Score-based Generative Model (MSGM), which introduces a novel score adjustment strategy that redirects the score function away from undesirable data during the continuous-time stochastic differential equation process. Extensive experimental results demonstrate that MSGM significantly reduces the likelihood of generating undesirable content while preserving high visual quality for normal image generation. Albeit designed for SGMs, MSGM is a general and flexible MU framework that is compatible with diverse diffusion architectures (SGM and DDPM) and training strategies (re-training and fine-tuning), and enables zero-shot transfer of the pre-trained models to downstream tasks, e.g. image inpainting and reconstruction. The code will be shared upon acceptance.

nan


Article 1279

Title@2025-06-26 (4): Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning

Title: Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning Metis-RISE: RL fördert und verbessert multimodales Reasoning Model Learning Metis-RISE: RL 激励和SFT加强多模式理由示范学习 2506.13056v2

Authors (7): Haibo Qiu, Xiaohan Lan, Fanfan Liu, Xiaohu Sun, Delian Ruan, Peng Shi, Lin Ma

Recent advancements in large language models (LLMs) have witnessed a surge in the development of advanced reasoning paradigms, which are now being integrated into multimodal large language models (MLLMs). However, existing approaches often fall short: methods solely employing reinforcement learning (RL) can struggle with sample inefficiency and activating entirely absent reasoning capabilities, while conventional pipelines that initiate with a cold-start supervised fine-tuning (SFT) phase before RL may restrict the model’s exploratory capacity and face suboptimal convergence. In this work, we introduce \textbf{Metis-RISE} (\textbf{R}L \textbf{I}ncentivizes and \textbf{S}FT \textbf{E}nhances) for multimodal reasoning model learning. Unlike conventional approaches, Metis-RISE distinctively omits an initial SFT stage, beginning instead with an RL phase (e.g., using a Group Relative Policy Optimization variant) to incentivize and activate the model’s latent reasoning capacity. Subsequently, the targeted SFT stage addresses two key challenges identified during RL: (1) \textit{inefficient trajectory sampling} for tasks where the model possesses but inconsistently applies correct reasoning, which we tackle using self-distilled reasoning trajectories from the RL model itself; and (2) \textit{fundamental capability absence}, which we address by injecting expert-augmented knowledge for prompts where the model entirely fails. This strategic application of RL for incentivization followed by SFT for enhancement forms the core of Metis-RISE, leading to two versions of our MLLMs (7B and 72B parameters). Evaluations on the OpenCompass Multimodal Reasoning Leaderboard demonstrate that both models achieve state-of-the-art performance among similar-sized models, with the 72B version ranking fourth overall. Please refer to our project page for open-source information.

nan


Article 1280

Title@2025-06-26 (4): Self-Regulated Neurogenesis for Online Data-Incremental Learning

Title: Self-Regulated Neurogenesis for Online Data-Incremental Learning Selbstregulierte Neurogenese für Online-Daten-Inkrementelles Lernen 在线数据强化学习自调节神经源 2403.14684v2

Authors (4): Murat Onur Yildirim, Elif Ceren Gok Yildirim, Decebal Constantin Mocanu, Joaquin Vanschoren

Neural networks often struggle with catastrophic forgetting when learning sequences of tasks or data streams, unlike humans who can continuously learn and consolidate new concepts even in the absence of explicit cues. Online data-incremental learning seeks to emulate this capability by processing each sample only once, without having access to task or stream cues at any point in time since this is more realistic compared to offline setups, where all data from novel class(es) is assumed to be readily available. However, existing methods typically rely on storing the subsets of data in memory or expanding the initial model architecture, resulting in significant computational overhead. Drawing inspiration from ‘self-regulated neurogenesis’-brain’s mechanism for creating specialized regions or circuits for distinct functions-we propose a novel approach SERENA which encodes each concept in a specialized network path called ‘concept cell’, integrated into a single over-parameterized network. Once a concept is learned, its corresponding concept cell is frozen, effectively preventing the forgetting of previously acquired information. Furthermore, we introduce two new continual learning scenarios that more closely reflect real-world conditions, characterized by gradually changing sample sizes. Experimental results show that our method not only establishes new state-of-the-art results across ten benchmarks but also remarkably surpasses offline supervised batch learning performance. The code is available at https://github.com/muratonuryildirim/serena.

nan


Article 1281

Title@2025-06-26 (4): Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design

Title: Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design Vielfältige Mini-Batch-Auswahl in Verstärkungs-Lernen für effiziente chemische Exploration in de novo Drug Design 为在新药设计中进行高效化学勘探而加强学习的多样化小型批次选择 2506.21158v1

Authors (5): Hampus Gummesson Svensson, Ola Engkvist, Jon Paul Janet, Christian Tyrchan, Morteza Haghir Chehreghani

In many real-world applications, evaluating the goodness of instances is often costly and time-consuming, e.g., human feedback and physics simulations, in contrast to proposing new instances. In particular, this is even more critical in reinforcement learning, as new interactions with the environment (i.e., new instances) need to be evaluated to provide a reward signal to learn from. As sufficient exploration is crucial, learning from a diverse mini-batch can have a large impact and help mitigate mode collapse. In this paper, we introduce diverse mini-batch selection for reinforcement learning and propose to use determinantal point processes for this task. We study this framework in the context of a real-world problem, namely drug discovery. We experimentally study how our proposed framework can improve the effectiveness of chemical exploration in de novo drug design, where finding diverse and high-quality solutions is essential. We conduct a comprehensive evaluation with three well-established molecular generation oracles over numerous generative steps. Our experiments conclude that our diverse mini-batch selection framework can substantially improve the diversity of the solutions, while still obtaining solutions of high quality. In drug discovery, such outcome can potentially lead to fulfilling unmet medication needs faster.

nan


Article 1282

Title@2025-06-26 (4): Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation

Title: Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation Schätzung von transformerbasierten räumlich-zeitlichen kontrafaktischen Ergebnissen 以变换器为基础的空间-时-时-时-反事实结果估计 2506.21154v1

Authors (6): He Li, Haoang Chi, Mingyu Liu, Wanrong Huang, Liyang Xu, Wenjing Yang

The real world naturally has dimensions of time and space. Therefore, estimating the counterfactual outcomes with spatial-temporal attributes is a crucial problem. However, previous methods are based on classical statistical models, which still have limitations in performance and generalization. This paper proposes a novel framework for estimating counterfactual outcomes with spatial-temporal attributes using the Transformer, exhibiting stronger estimation ability. Under mild assumptions, the proposed estimator within this framework is consistent and asymptotically normal. To validate the effectiveness of our approach, we conduct simulation experiments and real data experiments. Simulation experiments show that our estimator has a stronger estimation capability than baseline methods. Real data experiments provide a valuable conclusion to the causal effect of conflicts on forest loss in Colombia. The source code is available at https://github.com/lihe-maxsize/DeppSTCI_Release_Version-master.

nan


Article 1283

Title@2025-06-26 (4): A Novel Federated Learning-Based IDS for Enhancing UAVs Privacy and Security

Title: A Novel Federated Learning-Based IDS for Enhancing UAVs Privacy and Security Ein neuartiges, lernbasiertes IDS zur Verbesserung der Privatsphäre und Sicherheit von UAVs 旨在加强无人驾驶航空器隐私和安全的新联邦学习型新学习型ISDS 2312.04135v3

Authors (4): Ozlem Ceviz, Pinar Sadioglu, Sevil Sen, Vassilios G. Vassilakis

Unmanned aerial vehicles (UAVs) operating within Flying Ad-hoc Networks (FANETs) encounter security challenges due to the dynamic and distributed nature of these networks. Previous studies focused predominantly on centralized intrusion detection, assuming a central entity responsible for storing and analyzing data from all devices. However, these approaches face challenges including computation and storage costs, along with a single point of failure risk, threatening data privacy and availability. The widespread dispersion of data across interconnected devices underscores the need for decentralized approaches. This paper introduces the Federated Learning-based Intrusion Detection System (FL-IDS), addressing challenges encountered by centralized systems in FANETs. FL-IDS reduces computation and storage costs for both clients and the central server, which is crucial for resource-constrained UAVs. Operating in a decentralized manner, FL-IDS enables UAVs to collaboratively train a global intrusion detection model without sharing raw data, thus avoiding delay in decisions based on collected data, as is often the case with traditional methods. Experimental results demonstrate FL-IDS’s competitive performance with Central IDS (C-IDS) while mitigating privacy concerns, with the Bias Towards Specific Clients (BTSC) method further enhancing FL-IDS performance even at lower attacker ratios. Comparative analysis with traditional intrusion detection methods, including Local IDS (L-IDS), sheds light on the strengths of FL-IDS. This study significantly contributes to UAV security by introducing a privacy-aware, decentralized intrusion detection approach tailored to UAV networks. Moreover, by introducing a realistic dataset for FANETs and federated learning, our approach differs from others lacking high dynamism and 3D node movements or accurate federated data federations.

nan


Article 1284

Title@2025-06-26 (4): Linearity-based neural network compression

Title: Linearity-based neural network compression Linearitätsbasierte neuronale Netzwerkkompression 线性神经网络压缩 2506.21146v1

Authors (2): Silas Dobler, Florian Lemmerich

In neural network compression, most current methods reduce unnecessary parameters by measuring importance and redundancy. To augment already highly optimized existing solutions, we propose linearity-based compression as a novel way to reduce weights in a neural network. It is based on the intuition that with ReLU-like activation functions, neurons that are almost always activated behave linearly, allowing for merging of subsequent layers. We introduce the theory underlying this compression and evaluate our approach experimentally. Our novel method achieves a lossless compression down to 1/4 of the original model size in over the majority of tested models. Applying our method on already importance-based pruned models shows very little interference between different types of compression, demonstrating the option of successful combination of techniques. Overall, our work lays the foundation for a new type of compression method that enables smaller and ultimately more efficient neural network models.

nan


Article 1285

Title@2025-06-26 (4): Personalized Federated Learning via Dual-Prompt Optimization and Cross Fusion

Title: Personalized Federated Learning via Dual-Prompt Optimization and Cross Fusion Personalisiertes Federated Learning durch Dual-Prompt-Optimierung und Cross Fusion 通过双速优化和交叉融合进行个性化联邦学习 2506.21144v1

Authors (5): Yuguang Zhang, Kuangpu Guo, Zhihe Lu, Yunbo Wang, Jian Liang

Federated learning (FL) enables collaborative model training across decentralized clients without sharing local data, but is challenged by heterogeneity in data, computation, and communication. Pretrained vision-language models (VLMs), with their strong generalization and lightweight tuning via prompts, offer a promising solution. However, existing federated prompt-learning methods rely only on text prompts and overlook joint label-domain distribution shifts. In this paper, we propose a personalized FL framework based on dual-prompt learning and cross fusion, termed pFedDC. Specifically, each client maintains both global and local prompts across vision and language modalities: global prompts capture common knowledge shared across the federation, while local prompts encode client-specific semantics and domain characteristics. Meanwhile, a cross-fusion module is designed to adaptively integrate prompts from different levels, enabling the model to generate personalized representations aligned with each client’s unique data distribution. Extensive experiments across nine datasets with various types of heterogeneity show that pFedDC consistently outperforms state-of-the-art methods.

nan


Article 1286

Title@2025-06-26 (4): Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks

Title: Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks Generative Adversarial Evasion und Out-of-Distribution-Detection für UAV-Cyber-Attacks 无人驾驶航空飞行器网络设备生成反向疏散和分配外探测 2506.21142v1

Authors (2): Deepak Kumar Panda, Weisi Guo

The growing integration of UAVs into civilian airspace underscores the need for resilient and intelligent intrusion detection systems (IDS), as traditional anomaly detection methods often fail to identify novel threats. A common approach treats unfamiliar attacks as out-of-distribution (OOD) samples; however, this leaves systems vulnerable when mitigation is inadequate. Moreover, conventional OOD detectors struggle to distinguish stealthy adversarial attacks from genuine OOD events. This paper introduces a conditional generative adversarial network (cGAN)-based framework for crafting stealthy adversarial attacks that evade IDS mechanisms. We first design a robust multi-class IDS classifier trained on benign UAV telemetry and known cyber-attacks, including Denial of Service (DoS), false data injection (FDI), man-in-the-middle (MiTM), and replay attacks. Using this classifier, our cGAN perturbs known attacks to generate adversarial samples that misclassify as benign while retaining statistical resemblance to OOD distributions. These adversarial samples are iteratively refined to achieve high stealth and success rates. To detect such perturbations, we implement a conditional variational autoencoder (CVAE), leveraging negative log-likelihood to separate adversarial inputs from authentic OOD samples. Comparative evaluation shows that CVAE-based regret scores significantly outperform traditional Mahalanobis distance-based detectors in identifying stealthy adversarial threats. Our findings emphasize the importance of advanced probabilistic modeling to strengthen IDS capabilities against adaptive, generative-model-based cyber intrusions.

nan


Article 1287

Title@2025-06-26 (4): Multi-convex Programming for Discrete Latent Factor Models Prototyping

Title: Multi-convex Programming for Discrete Latent Factor Models Prototyping Multi-convex-Programmierung für diskrete Latent Factor Models Prototyping Discrete 后端因数模型的多contex 编程程序 2504.01431v2

Authors (4): Hao Zhu, Shengchao Yan, Jasper Hoffmann, Joschka Boedecker

Discrete latent factor models (DLFMs) are widely used in various domains such as machine learning, economics, neuroscience, psychology, etc. Currently, fitting a DLFM to some dataset relies on a customized solver for individual models, which requires lots of effort to implement and is limited to the targeted specific instance of DLFMs. In this paper, we propose a generic framework based on CVXPY, which allows users to specify and solve the fitting problem of a wide range of DLFMs, including both regression and classification models, within a very short script. Our framework is flexible and inherently supports the integration of regularization terms and constraints on the DLFM parameters and latent factors, such that the users can easily prototype the DLFM structure according to their dataset and application scenario. We introduce our open-source Python implementation and illustrate the framework in several examples.

nan


Article 1288

Title@2025-06-26 (4): DBConformer: Dual-Branch Convolutional Transformer for EEG Decoding

Title: DBConformer: Dual-Branch Convolutional Transformer for EEG Decoding DBConformer: Doppel-Branch-Konvolutionstransformator für EEG-Dekodierung DBCon前导体: EEG 解码的双相相相电变异变异器 2506.21140v1

Authors (6): Ziwei Wang, Hongbin Wang, Tianwang Jia, Xingyi He, Siyang Li, Dongrui Wu

Electroencephalography (EEG)-based brain-computer interfaces (BCIs) transform spontaneous/evoked neural activity into control commands for external communication. While convolutional neural networks (CNNs) remain the mainstream backbone for EEG decoding, their inherently short receptive field makes it difficult to capture long-range temporal dependencies and global inter-channel relationships. Recent CNN-Transformer (Conformers) hybrids partially address this issue, but most adopt a serial design, resulting in suboptimal integration of local and global features, and often overlook explicit channel-wise modeling. To address these limitations, we propose DBConformer, a dual-branch convolutional Transformer network tailored for EEG decoding. It integrates a temporal Conformer to model long-range temporal dependencies and a spatial Conformer to extract inter-channel interactions, capturing both temporal dynamics and spatial patterns in EEG signals. A lightweight channel attention module further refines spatial representations by assigning data-driven importance to EEG channels. Extensive experiments on five motor imagery (MI) datasets and two seizure detection datasets under three evaluation settings demonstrate that DBConformer consistently outperforms 10 competitive baseline models, with over eight times fewer parameters than the high-capacity EEG Conformer baseline. Further, the visualization results confirm that the features extracted by DBConformer are physiologically interpretable and aligned with sensorimotor priors in MI. The superior performance and interpretability of DBConformer make it reliable for robust and explainable EEG decoding. Code is publicized at https://github.com/wzwvv/DBConformer.

nan


Article 1289

Title@2025-06-26 (4): Solving Inverse Problem for Multi-armed Bandits via Convex Optimization

Title: Solving Inverse Problem for Multi-armed Bandits via Convex Optimization Inverses Problem für mehrarmige Banditen durch Convex-Optimierung lösen 通过 Convex 优化解决多武装强盗的反向问题 2501.18945v3

Authors (2): Hao Zhu, Joschka Boedecker

We consider the inverse problem of multi-armed bandits (IMAB) that are widely used in neuroscience and psychology research for behavior modelling. We first show that the IMAB problem is not convex in general, but can be relaxed to a convex problem via variable transformation. Based on this result, we propose a two-step sequential heuristic for (approximately) solving the IMAB problem. We discuss a condition where our method provides global solution to the IMAB problem with certificate, as well as approximations to further save computing time. Numerical experiments indicate that our heuristic method is more robust than directly solving the IMAB problem via repeated local optimization, and can achieve the performance of Monte Carlo methods within a significantly decreased running time. We provide the implementation of our method based on CVXPY, which allows straightforward application by users not well versed in convex optimization.

nan


Article 1290

Title@2025-06-26 (4): NaLaFormer: Norm-Aware Linear Attention for Transformer Models

Title: NaLaFormer: Norm-Aware Linear Attention for Transformer Models NaLaFormer: Norm-Aware Lineare Aufmerksamkeit für Transformer-Modelle NaLaFormer: 变形模型的诺姆- Aware 线性注意 2506.21137v1

Authors (6): Weikang Meng, Yadan Luo, Liangyu Huo, Yaowei Wang, Xin Li, Zheng Zhang

Linear attention has emerged as a viable alternative to softmax attention by reducing complexity from quadratic to linear in sequence length. To preserve two fundamental properties of softmax, non-negativity and entropy reduction, current works employ various linearly separatable kernel functions with $L1$ normalization instead of softmax operator. However, query norms are neglected by the normalization operation in linear attention, such degradation heavily leads to an entropy gap. Meanwhile, existing works inhibit negative values of query and key vectors resulting in a missing inner-product interactions after being mapped. To address these dual challenges, we propose a novel Norm-Aware Linear Attention mechanism serving to restore norm-guided dynamic spikiness and recover kernel-perturbed norm distributions. Specifically, we first decouple query and key matrices into two components: norm and direction, to achieve norm-aware spikiness control and norm consistency, respectively. We mathematically reveal that the extent of entropy reduction varies with the query norm in softmax normalization, motivating a query-norm aware kernel function for dynamic control over entropy reduction. Furthermore, to ensure norm consistency and enforce non-negativity constraints, we employ a norm-preserving mapping to project all elements of the angular matrix into positive values, leveraging cosine similarity to inhibit dimensions with opposite directions. We conduct extensive experiments demonstrating that the NaLaFormer improves performance on vision and language tasks, enhancing both expressiveness and efficiency by up to 4.2\%.

nan


Article 1291

Title@2025-06-26 (4): Inverse Reinforcement Learning via Convex Optimization

Title: Inverse Reinforcement Learning via Convex Optimization Inverse Verstärkungs-Lernen über Convex-Optimierung 通过Convex优化化进行反强化学习 2501.15957v2

Authors (3): Hao Zhu, Yuan Zhang, Joschka Boedecker

We consider the inverse reinforcement learning (IRL) problem, where an unknown reward function of some Markov decision process is estimated based on observed expert demonstrations. In most existing approaches, IRL is formulated and solved as a nonconvex optimization problem, posing challenges in scenarios where robustness and reproducibility are critical. We discuss a convex formulation of the IRL problem (CIRL) initially proposed by Ng and Russel, and reformulate the problem such that the domain-specific language CVXPY can be applied directly to specify and solve the convex problem. We also extend the CIRL problem to scenarios where the expert policy is not given analytically but by trajectory as state-action pairs, which can be strongly inconsistent with optimality, by augmenting some of the constraints. Theoretical analysis and practical implementation for hyperparameter auto-selection are introduced. This note helps the users to easily apply CIRL for their problems, without background knowledge on convex optimization.

nan


Article 1292

Title@2025-06-26 (4): Curriculum-Guided Antifragile Reinforcement Learning for Secure UAV Deconfliction under Observation-Space Attacks

Title: Curriculum-Guided Antifragile Reinforcement Learning for Secure UAV Deconfliction under Observation-Space Attacks Curriculum-geführtes Antifragiles Verstärkungslernen für sichere UAV-Dekonfliktion unter Beobachtungs-Raumangriffen 在观测-空间攻击下安全无人驾驶飞行器消除冲突课程-指导反脆弱强化学习 2506.21129v1

Authors (3): Deepak Kumar Panda, Adolfo Perrusquia, Weisi Guo

Reinforcement learning (RL) policies deployed in safety-critical systems, such as unmanned aerial vehicle (UAV) navigation in dynamic airspace, are vulnerable to out-ofdistribution (OOD) adversarial attacks in the observation space. These attacks induce distributional shifts that significantly degrade value estimation, leading to unsafe or suboptimal decision making rendering the existing policy fragile. To address this vulnerability, we propose an antifragile RL framework designed to adapt against curriculum of incremental adversarial perturbations. The framework introduces a simulated attacker which incrementally increases the strength of observation-space perturbations which enables the RL agent to adapt and generalize across a wider range of OOD observations and anticipate previously unseen attacks. We begin with a theoretical characterization of fragility, formally defining catastrophic forgetting as a monotonic divergence in value function distributions with increasing perturbation strength. Building on this, we define antifragility as the boundedness of such value shifts and derive adaptation conditions under which forgetting is stabilized. Our method enforces these bounds through iterative expert-guided critic alignment using Wasserstein distance minimization across incrementally perturbed observations. We empirically evaluate the approach in a UAV deconfliction scenario involving dynamic 3D obstacles. Results show that the antifragile policy consistently outperforms standard and robust RL baselines when subjected to both projected gradient descent (PGD) and GPS spoofing attacks, achieving up to 15% higher cumulative reward and over 30% fewer conflict events. These findings demonstrate the practical and theoretical viability of antifragile reinforcement learning for secure and resilient decision-making in environments with evolving threat scenarios.

nan


Article 1293

Title@2025-06-26 (4): Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments

Title: Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments Robuster Policy-Switch für Antifragiles Verstärkungslernen für UAV-Deconfliction in Adversarial Environments 在逆向环境中为无人驾驶航空器消除冲突而进行抗脆弱强化学习的强有力政策转换 2506.21127v1

Authors (2): Deepak Kumar Panda, Weisi Guo

The increasing automation of navigation for unmanned aerial vehicles (UAVs) has exposed them to adversarial attacks that exploit vulnerabilities in reinforcement learning (RL) through sensor manipulation. Although existing robust RL methods aim to mitigate such threats, their effectiveness has limited generalization to out-of-distribution shifts from the optimal value distribution, as they are primarily designed to handle fixed perturbation. To address this limitation, this paper introduces an antifragile RL framework that enhances adaptability to broader distributional shifts by incorporating a switching mechanism based on discounted Thompson sampling (DTS). This mechanism dynamically selects among multiple robust policies to minimize adversarially induced state-action-value distribution shifts. The proposed approach first derives a diverse ensemble of action robust policies by accounting for a range of perturbations in the policy space. These policies are then modeled as a multiarmed bandit (MAB) problem, where DTS optimally selects policies in response to nonstationary Bernoulli rewards, effectively adapting to evolving adversarial strategies. Theoretical framework has also been provided where by optimizing the DTS to minimize the overall regrets due to distributional shift, results in effective adaptation against unseen adversarial attacks thus inducing antifragility. Extensive numerical simulations validate the effectiveness of the proposed framework in complex navigation environments with multiple dynamic three-dimensional obstacles and with stronger projected gradient descent (PGD) and spoofing attacks. Compared to conventional robust, non-adaptive RL methods, the antifragile approach achieves superior performance, demonstrating shorter navigation path lengths and a higher rate of conflict-free navigation trajectories compared to existing robust RL techniques

nan


Article 1294

Title@2025-06-26 (4): Pushing Trade-Off Boundaries: Compact yet Effective Remote Sensing Change Detection

Title: Pushing Trade-Off Boundaries: Compact yet Effective Remote Sensing Change Detection Trade-Off-Grenzen drücken: Kompakte und dennoch effektive Fernerkundungs-Änderungserkennung 推进贸易-开放边界:结合但有效的遥感变化探测 2506.21109v1

Authors (3): Luosheng Xu, Dalin Zhang, Zhaohui Song

Remote sensing change detection is essential for monitoring urban expansion, disaster assessment, and resource management, offering timely, accurate, and large-scale insights into dynamic landscape transformations. While deep learning has revolutionized change detection, the increasing complexity and computational demands of modern models have not necessarily translated into significant accuracy gains. Instead of following this trend, this study explores a more efficient approach, focusing on lightweight models that maintain high accuracy while minimizing resource consumption, which is an essential requirement for on-satellite processing. To this end, we propose FlickCD, which means quick flick then get great results, pushing the boundaries of the performance-resource trade-off. FlickCD introduces an Enhanced Difference Module (EDM) to amplify critical feature differences between temporal phases while suppressing irrelevant variations such as lighting and weather changes, thereby reducing computational costs in the subsequent change decoder. Additionally, the FlickCD decoder incorporates Local-Global Fusion Blocks, leveraging Shifted Window Self-Attention (SWSA) and Enhanced Global Self-Attention (EGSA) to efficiently capture semantic information at multiple scales, preserving both coarse- and fine-grained changes. Extensive experiments on four benchmark datasets demonstrate that FlickCD reduces computational and storage overheads by more than an order of magnitude while achieving state-of-the-art (SOTA) performance or incurring only a minor (<1\% F1) accuracy trade-off. The implementation code is publicly available at https://github.com/xulsh8/FlickCD.

nan


Article 1295

Title@2025-06-26 (4): Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges

Title: Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges Unhaltbar: Unpaarte Single-Cell-Multi-Perturbation-Schätzung durch Dual Conditional Diffusion Implizite Brücken 不持久: 由双条件分解隐形桥进行无压单细胞多扰动估计 2506.21107v1

Authors (8): Changxi Chi, Jun Xia, Yufei Huang, Jingbo Zhou, Siyuan Li, Yunfan Liu, Chang Yu, Stan Z. Li

Estimating single-cell responses across various perturbations facilitates the identification of key genes and enhances drug screening, significantly boosting experimental efficiency. However, single-cell sequencing is a destructive process, making it impossible to capture the same cell’s phenotype before and after perturbation. Consequently, data collected under perturbed and unperturbed conditions are inherently unpaired. Existing methods either attempt to forcibly pair unpaired data using random sampling, or neglect the inherent relationship between unperturbed and perturbed cells during the modeling. In this work, we propose a framework based on Dual Diffusion Implicit Bridges (DDIB) to learn the mapping between different data distributions, effectively addressing the challenge of unpaired data. We further interpret this framework as a form of data augmentation. We integrate gene regulatory network (GRN) information to propagate perturbation signals in a biologically meaningful way, and further incorporate a masking mechanism to predict silent genes, improving the quality of generated profiles. Moreover, gene expression under the same perturbation often varies significantly across cells, frequently exhibiting a bimodal distribution that reflects intrinsic heterogeneity. To capture this, we introduce a more suitable evaluation metric. We propose Unlasting, dual conditional diffusion models that overcome the problem of unpaired single-cell perturbation data and strengthen the model’s insight into perturbations under the guidance of the GRN, with a dedicated mask model designed to improve generation quality by predicting silent genes. In addition, we introduce a biologically grounded evaluation metric that better reflects the inherent heterogeneity in single-cell responses.

nan


Article 1296

Title@2025-06-26 (4): Learning to Skip the Middle Layers of Transformers

Title: Learning to Skip the Middle Layers of Transformers Lernen, die mittleren Schichten der Transformer zu überspringen 学习跳过变换器的中层 2506.21103v1

Authors (2): Tim Lawson, Laurence Aitchison

Conditional computation is a popular strategy to make Transformers more efficient. Existing methods often target individual modules (e.g., mixture-of-experts layers) or skip layers independently of one another. However, interpretability research has demonstrated that the middle layers of Transformers exhibit greater redundancy, and that early layers aggregate information into token positions. Guided by these insights, we propose a novel architecture that dynamically skips a variable number of layers from the middle outward. In particular, a learned gating mechanism determines whether to bypass a symmetric span of central blocks based on the input, and a gated attention mechanism prevents subsequent tokens from attending to skipped token positions. Residual norms are controlled with a ‘sandwich’ or ‘perilayernorm’ scheme and gate sparsity with an adaptive regularization loss. We had aimed to reduce compute requirements for ‘simpler’ tokens and potentially foster an emergent multi-level representational hierarchy but, at the scales investigated, our approach does not achieve improvements in the trade-off between validation cross-entropy and estimated FLOPs compared to dense baselines with fewer layers. We release our code at https://github.com/tim-lawson/skip-middle.

nan


Article 1297

Title@2025-06-26 (4): Interpretable Hierarchical Concept Reasoning through Attention-Guided Graph Learning

Title: Interpretable Hierarchical Concept Reasoning through Attention-Guided Graph Learning Interpretierbares Hierarchisches Konzept durch aufmerksamkeitsorientiertes Graphenlernen 通过引人指导图表学习推理的可解释的等级概念 2506.21102v1

Authors (4): David Debot, Pietro Barbiero, Gabriele Dominici, Giuseppe Marra

Concept-Based Models (CBMs) are a class of deep learning models that provide interpretability by explaining predictions through high-level concepts. These models first predict concepts and then use them to perform a downstream task. However, current CBMs offer interpretability only for the final task prediction, while the concept predictions themselves are typically made via black-box neural networks. To address this limitation, we propose Hierarchical Concept Memory Reasoner (H-CMR), a new CBM that provides interpretability for both concept and task predictions. H-CMR models relationships between concepts using a learned directed acyclic graph, where edges represent logic rules that define concepts in terms of other concepts. During inference, H-CMR employs a neural attention mechanism to select a subset of these rules, which are then applied hierarchically to predict all concepts and the final task. Experimental results demonstrate that H-CMR matches state-of-the-art performance while enabling strong human interaction through concept and model interventions. The former can significantly improve accuracy at inference time, while the latter can enhance data efficiency during training when background knowledge is available.

nan


Article 1298

Title@2025-06-26 (4): FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation

Title: FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation FeDa4Fair: Client-Level-Federated Datasets für die Fairness-Bewertung FeDa4fair:公平评价客户-联邦数据集 2506.21095v1

Authors (4): Xenia Heilmann, Luca Corbucci, Mattia Cerrato, Anna Monreale

Federated Learning (FL) enables collaborative model training across multiple clients without sharing clients’ private data. However, fairness remains a key concern, as biases in local clients’ datasets can impact the entire federated system. Heterogeneous data distributions across clients may lead to models that are fairer for some clients than others. Although several fairness-enhancing solutions are present in the literature, most focus on mitigating bias for a single sensitive attribute, typically binary, overlooking the diverse and sometimes conflicting fairness needs of different clients. This limited perspective can limit the effectiveness of fairness interventions for the different clients. To support more robust and reproducible fairness research in FL, we aim to enable a consistent benchmarking of fairness-aware FL methods at both the global and client levels. In this paper, we contribute in three ways: (1) We introduce FeDa4Fair, a library to generate tabular datasets tailored to evaluating fair FL methods under heterogeneous client bias; (2) we release four bias-heterogeneous datasets and corresponding benchmarks to compare fairness mitigation methods in a controlled environment; (3) we provide ready-to-use functions for evaluating fairness outcomes for these datasets.

nan


Article 1299

Title@2025-06-26 (4): Chain-of-Thought Enhanced Shallow Transformers for Wireless Symbol Detection

Title: Chain-of-Thought Enhanced Shallow Transformers for Wireless Symbol Detection Chain-of-Thought verbesserte Shallow Transformer für drahtlose Symbolerkennung 用于无线探测无线符号探测的 研究链强化浅ow变压器 2506.21093v1

Authors (4): Li Fan, Peng Wang, Jing Yang, Cong Shen

Transformers have shown potential in solving wireless communication problems, particularly via in-context learning (ICL), where models adapt to new tasks through prompts without requiring model updates. However, prior ICL-based Transformer models rely on deep architectures with many layers to achieve satisfactory performance, resulting in substantial storage and computational costs. In this work, we propose CHain Of thOught Symbol dEtection (CHOOSE), a CoT-enhanced shallow Transformer framework for wireless symbol detection. By introducing autoregressive latent reasoning steps within the hidden space, CHOOSE significantly improves the reasoning capacity of shallow models (1-2 layers) without increasing model depth. This design enables lightweight Transformers to achieve detection performance comparable to much deeper models, making them well-suited for deployment on resource-constrained mobile devices. Experimental results demonstrate that our approach outperforms conventional shallow Transformers and achieves performance comparable to that of deep Transformers, while maintaining storage and computational efficiency. This represents a promising direction for implementing Transformer-based algorithms in wireless receivers with limited computational resources.

nan


Article 1300

Title@2025-06-26 (4): CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions

Title: CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions CovDocker: Benchmarking Covalent Drug Design mit Aufgaben, Datensätzen und Lösungen CovDocker:用任务、数据集和解决办法确定共价药物设计基准 2506.21085v1

Authors (7): Yangzhe Peng, Kaiyuan Gao, Liang He, Yuheng Cong, Haiguang Liu, Kun He, Lijun Wu

Molecular docking plays a crucial role in predicting the binding mode of ligands to target proteins, and covalent interactions, which involve the formation of a covalent bond between the ligand and the target, are particularly valuable due to their strong, enduring binding nature. However, most existing docking methods and deep learning approaches hardly account for the formation of covalent bonds and the associated structural changes. To address this gap, we introduce a comprehensive benchmark for covalent docking, CovDocker, which is designed to better capture the complexities of covalent binding. We decompose the covalent docking process into three main tasks: reactive location prediction, covalent reaction prediction, and covalent docking. By adapting state-of-the-art models, such as Uni-Mol and Chemformer, we establish baseline performances and demonstrate the effectiveness of the benchmark in accurately predicting interaction sites and modeling the molecular transformations involved in covalent binding. These results confirm the role of the benchmark as a rigorous framework for advancing research in covalent drug design. It underscores the potential of data-driven approaches to accelerate the discovery of selective covalent inhibitors and addresses critical challenges in therapeutic development.

nan


Article 1301

Title@2025-06-26 (4): EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception

Title: EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception EgoAdapt: Adaptive multisensorische Destillation und politisches Lernen für eine effiziente egozentrische Wahrnehmung EgoAdapt: 适应性多感性蒸馏和政策学习,促进高效率的以地球为中心感知 2506.21080v1

Authors (10): Sanjoy Chowdhury, Subrata Biswas, Sayan Nag, Tushar Nagarajan, Calvin Murdock, Ishwarya Ananthabhotla, Yijun Qian, Vamsi Krishna Ithapu, Dinesh Manocha, Ruohan Gao

Modern perception models, particularly those designed for multisensory egocentric tasks, have achieved remarkable performance but often come with substantial computational costs. These high demands pose challenges for real-world deployment, especially in resource-constrained environments. In this paper, we introduce EgoAdapt, a framework that adaptively performs cross-modal distillation and policy learning to enable efficient inference across different egocentric perception tasks, including egocentric action recognition, active speaker localization, and behavior anticipation. Our proposed policy module is adaptable to task-specific action spaces, making it broadly applicable. Experimental results on three challenging egocentric datasets EPIC-Kitchens, EasyCom, and Aria Everyday Activities demonstrate that our method significantly enhances efficiency, reducing GMACs by up to 89.09%, parameters up to 82.02%, and energy up to 9.6x, while still on-par and in many cases outperforming, the performance of corresponding state-of-the-art models.

nan


Article 1302

Title@2025-06-26 (4): Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games

Title: Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games Homogenisierung von Multi-Agent-Learning-Dynamik in Finite-State Markov Spiele 在Finite- State-Markov运动会中多剂学习动态的同质化 2506.21079v1

Authors (1): Yann Kerzreho

This paper introduces a new approach for approximating the learning dynamics of multiple reinforcement learning (RL) agents interacting in a finite-state Markov game. The idea is to rescale the learning process by simultaneously reducing the learning rate and increasing the update frequency, effectively treating the agent’s parameters as a slow-evolving variable influenced by the fast-mixing game state. Under mild assumptions-ergodicity of the state process and continuity of the updates-we prove the convergence of this rescaled process to an ordinary differential equation (ODE). This ODE provides a tractable, deterministic approximation of the agent’s learning dynamics. An implementation of the framework is available at\,: https://github.com/yannKerzreho/MarkovGameApproximation

nan


Article 1303

Title@2025-06-26 (4): Enhancing LLM Tool Use with High-quality Instruction Data from Knowledge Graph

Title: Enhancing LLM Tool Use with High-quality Instruction Data from Knowledge Graph Verbesserung der LLM-Tool-Nutzung mit hochwertigen Instruktionsdaten aus Wissensgrafik 利用来自知识图的高质量教学数据加强LLM工具的使用 2506.21071v1

Authors (10): Jingwei Wang, Zai Zhang, Hao Qian, Chunjing Gan, Binbin Hu, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, Bin Shi, Bo Dong

Teaching large language models (LLMs) to use tools is crucial for improving their problem-solving abilities and expanding their applications. However, effectively using tools is challenging because it requires a deep understanding of tool functionalities and user intentions. Previous methods relied mainly on LLMs to generate instruction data, but the quality of these data was often insufficient. In this paper, we propose a new method that uses knowledge graphs to generate high-quality instruction data for LLMs. Knowledge graphs are manually curated datasets rich in semantic information. We begin by extracting various query pathways from a given knowledge graph, which are transformed into a broad spectrum of user queries. We then translate the relationships between entities into actionable tools and parse the pathways of each query into detailed solution steps, thereby creating high-quality instruction data. Our experiments show that fine-tuning on just a small sample of this synthetic data can significantly improve the tool utilization and overall capabilities of LLMs.

nan


Article 1304

Title@2025-06-26 (4): SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations

Title: SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations SDE Matching: Skalierbares und simulationsfreies Training latenter stochastischer Differentialgleichungen SDE 匹配:可缩放和模拟无模拟的静态碎裂差异等量模拟培训 2502.02472v3

Authors (3): Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth

The Latent Stochastic Differential Equation (SDE) is a powerful tool for time series and sequence modeling. However, training Latent SDEs typically relies on adjoint sensitivity methods, which depend on simulation and backpropagation through approximate SDE solutions, which limit scalability. In this work, we propose SDE Matching, a new simulation-free method for training Latent SDEs. Inspired by modern Score- and Flow Matching algorithms for learning generative dynamics, we extend these ideas to the domain of stochastic dynamics for time series and sequence modeling, eliminating the need for costly numerical simulations. Our results demonstrate that SDE Matching achieves performance comparable to adjoint sensitivity methods while drastically reducing computational complexity.

nan


Article 1305

Title@2025-06-26 (4): FedDAA: Dynamic Client Clustering for Concept Drift Adaptation in Federated Learning

Title: FedDAA: Dynamic Client Clustering for Concept Drift Adaptation in Federated Learning FedDAA: Dynamisches Client-Clustering für Konzept Drift-Anpassung im Federated Learning FedDAA: 联邦学习中适应概念的动态客户集群组合 2506.21054v1

Authors (2): Fu Peng, Ming Tang

In federated learning (FL), the data distribution of each client may change over time, introducing both temporal and spatial data heterogeneity, known as concept drift. Data heterogeneity arises from three drift sources: real drift (a shift in the conditional distribution P(y x)), virtual drift (a shift in the input distribution P(x)), and label drift (a shift in the label distribution P(y)). However, most existing FL methods addressing concept drift primarily focus on real drift. When clients experience virtual or label drift, these methods often fail to selectively retain useful historical knowledge, leading to catastrophic forgetting. A key challenge lies in distinguishing different sources of drift, as they require distinct adaptation strategies: real drift calls for discarding outdated data, while virtual or label drift benefits from retaining historical data. Without explicitly identifying the drift sources, a general adaptation strategy is suboptimal and may harm generalization. To address this challenge, we propose FedDAA, a dynamic clustered FL framework designed to adapt to multi-source concept drift while preserving valuable historical knowledge. Specifically, FedDAA integrates three modules: a cluster number determination module to find the optimal number of clusters; a real drift detection module to distinguish real drift from virtual/label drift; and a concept drift adaptation module to adapt to new data while retaining useful historical information. We provide theoretical convergence guarantees, and experiments show that FedDAA achieves 7.84% to 8.52% accuracy improvements over state-of-the-art methods on Fashion-MNIST, CIFAR-10, and CIFAR-100.

nan


Article 1306

Title@2025-06-26 (4): Sharp concentration of uniform generalization errors in binary linear classification

Title: Sharp concentration of uniform generalization errors in binary linear classification Scharfe Konzentration von einheitlichen Verallgemeinerungsfehlern in der binären linearen Klassifikation 二进线线性分类中统一一般化误差的集中程度 2505.16713v2

Authors (1): Shogo Nakakita

We examine the concentration of uniform generalization errors around their expectation in binary linear classification problems via an isoperimetric argument. In particular, we establish Poincar'{e} and log-Sobolev inequalities for the joint distribution of the output labels and the label-weighted input vectors, which we apply to derive concentration bounds. The derived concentration bounds are sharp up to moderate multiplicative constants by those under well-balanced labels. In asymptotic analysis, we also show that almost sure convergence of uniform generalization errors to their expectation occurs in very broad settings, such as proportionally high-dimensional regimes. Using this convergence, we establish uniform laws of large numbers under dimension-free conditions.

nan


Article 1307

Title@2025-06-26 (4): Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling

Title: Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling Verbesserung der Diffusions-basierten Bildbearbeitung Treue durch Anleitung und Planung 通过指导和日程安排改进基于传播的图像编辑信仰 2506.21045v1

Authors (2): Hansam Cho, Seoung Bum Kim

Text-guided diffusion models have become essential for high-quality image synthesis, enabling dynamic image editing. In image editing, two crucial aspects are editability, which determines the extent of modification, and faithfulness, which reflects how well unaltered elements are preserved. However, achieving optimal results is challenging because of the inherent trade-off between editability and faithfulness. To address this, we propose Faithfulness Guidance and Scheduling (FGS), which enhances faithfulness with minimal impact on editability. FGS incorporates faithfulness guidance to strengthen the preservation of input image information and introduces a scheduling strategy to resolve misalignment between editability and faithfulness. Experimental results demonstrate that FGS achieves superior faithfulness while maintaining editability. Moreover, its compatibility with various editing methods enables precise, high-quality image edits across diverse tasks.

nan


Article 1308

Title@2025-06-26 (4): Efficient Skill Discovery via Regret-Aware Optimization

Title: Efficient Skill Discovery via Regret-Aware Optimization Effiziente Skill Discovery durch regret-aware Optimierung 通过Regret-Aware 优化发现高效技能 2506.21044v1

Authors (5): He Zhang, Ming Zhou, Shaopeng Zhai, Ying Sun, Hui Xiong

Unsupervised skill discovery aims to learn diverse and distinguishable behaviors in open-ended reinforcement learning. For existing methods, they focus on improving diversity through pure exploration, mutual information optimization, and learning temporal representation. Despite that they perform well on exploration, they remain limited in terms of efficiency, especially for the high-dimensional situations. In this work, we frame skill discovery as a min-max game of skill generation and policy learning, proposing a regret-aware method on top of temporal representation learning that expands the discovered skill space along the direction of upgradable policy strength. The key insight behind the proposed method is that the skill discovery is adversarial to the policy learning, i.e., skills with weak strength should be further explored while less exploration for the skills with converged strength. As an implementation, we score the degree of strength convergence with regret, and guide the skill discovery with a learnable skill generator. To avoid degeneration, skill generation comes from an up-gradable population of skill generators. We conduct experiments on environments with varying complexities and dimension sizes. Empirical results show that our method outperforms baselines in both efficiency and diversity. Moreover, our method achieves a 15% zero shot improvement in high-dimensional environments, compared to existing methods.

nan


Article 1309

Title@2025-06-26 (4): Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning

Title: Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning Strenge Subgoal Execution: Zuverlässige Langzeitplanung im Hierarchischen Stärkungslernen 严格次级目标执行:在等级强化学习中可靠的长期规划 2506.21039v1

Authors (4): Jaebak Hwang, Sanghyeon Lee, Jeongmo Kim, Seungyul Han

Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. While hierarchical and graph-based methods offer partial solutions, they often suffer from subgoal infeasibility and inefficient planning. We introduce Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that enforces single-step subgoal reachability by structurally constraining high-level decision-making. To enhance exploration, SSE employs a decoupled exploration policy that systematically traverses underexplored regions of the goal space. Furthermore, a failure-aware path refinement, which refines graph-based planning by dynamically adjusting edge costs according to observed low-level success rates, thereby improving subgoal reliability. Experimental results across diverse long-horizon benchmarks demonstrate that SSE consistently outperforms existing goal-conditioned RL and hierarchical RL approaches in both efficiency and success rate.

nan


Article 1310

Title@2025-06-26 (4): DKGCM: A Spatio-Temporal Prediction Model for Traffic Flow by Fusing Spatial Node Clustering Method and Fourier Bidirectional Mamba Mechanism

Title: DKGCM: A Spatio-Temporal Prediction Model for Traffic Flow by Fusing Spatial Node Clustering Method and Fourier Bidirectional Mamba Mechanism DKGCM: Ein räumlich-zeitliches Vorhersagemodell für den Verkehrsfluss durch das Verschmelzen von Raumknoten-Clustering-Methode und Fourier Bidirektional Mamba-Mechanismus DKGCM:通过使用空间节点集束法和Fourier双向双向Mamba机制进行交通流动的SPA-临时预测模型 2507.01982v1

Authors (4): Siqing Long, Xiangzhi Huang, Jiemin Xie, Ming Cai

Accurate traffic demand forecasting enables transportation management departments to allocate resources more effectively, thereby improving their utilization efficiency. However, complex spatiotemporal relationships in traffic systems continue to limit the performance of demand forecasting models. To improve the accuracy of spatiotemporal traffic demand prediction, we propose a new graph convolutional network structure called DKGCM. Specifically, we first consider the spatial flow distribution of different traffic nodes and propose a novel temporal similarity-based clustering graph convolution method, DK-GCN. This method utilizes Dynamic Time Warping (DTW) and K-means clustering to group traffic nodes and more effectively capture spatial dependencies. On the temporal scale, we integrate the Fast Fourier Transform (FFT) within the bidirectional Mamba deep learning framework to capture temporal dependencies in traffic demand. To further optimize model training, we incorporate the GRPO reinforcement learning strategy to enhance the loss function feedback mechanism. Extensive experiments demonstrate that our model outperforms several advanced methods and achieves strong results on three public datasets.

nan


Article 1311

Title@2025-06-26 (4): RL-Selector: Reinforcement Learning-Guided Data Selection via Redundancy Assessment

Title: RL-Selector: Reinforcement Learning-Guided Data Selection via Redundancy Assessment RL-Selector: Verstärkte lernorientierte Datenauswahl über Redundanzbewertung RL-选择者:通过裁员评估甄选强化学习指导数据 2506.21037v1

Authors (4): Suorong Yang, Peijia Li, Furao Shen, Jian Zhao

Modern deep architectures often rely on large-scale datasets, but training on these datasets incurs high computational and storage overhead. Real-world datasets often contain substantial redundancies, prompting the need for more data-efficient training paradigms. Data selection has shown promise to mitigate redundancy by identifying the most representative samples, thereby reducing training costs without compromising performance. Existing methods typically rely on static scoring metrics or pretrained models, overlooking the combined effect of selected samples and their evolving dynamics during training. We introduce the concept of epsilon-sample cover, which quantifies sample redundancy based on inter-sample relationships, capturing the intrinsic structure of the dataset. Based on this, we reformulate data selection as a reinforcement learning (RL) process and propose RL-Selector, where a lightweight RL agent optimizes the selection policy by leveraging epsilon-sample cover derived from evolving dataset distribution as a reward signal. Extensive experiments across benchmark datasets and diverse architectures demonstrate that our method consistently outperforms existing state-of-the-art baselines. Models trained with our selected datasets show enhanced generalization performance with improved training efficiency.

nan


Article 1312

Title@2025-06-26 (4): An Information-Theoretic Analysis for Federated Learning under Concept Drift

Title: An Information-Theoretic Analysis for Federated Learning under Concept Drift Eine informationstheoretische Analyse für das Federated Learning unter Konzept Drift 根据 “ 漂流概念 “ 进行的联邦学习信息理论分析 2506.21036v1

Authors (3): Fu Peng, Meng Zhang, Ming Tang

Recent studies in federated learning (FL) commonly train models on static datasets. However, real-world data often arrives as streams with shifting distributions, causing performance degradation known as concept drift. This paper analyzes FL performance under concept drift using information theory and proposes an algorithm to mitigate the performance degradation. We model concept drift as a Markov chain and introduce the \emph{Stationary Generalization Error} to assess a model’s capability to capture characteristics of future unseen data. Its upper bound is derived using KL divergence and mutual information. We study three drift patterns (periodic, gradual, and random) and their impact on FL performance. Inspired by this, we propose an algorithm that regularizes the empirical risk minimization approach with KL divergence and mutual information, thereby enhancing long-term performance. We also explore the performance-cost tradeoff by identifying a Pareto front. To validate our approach, we build an FL testbed using Raspberry Pi4 devices. Experimental results corroborate with theoretical findings, confirming that drift patterns significantly affect performance. Our method consistently outperforms existing approaches for these three patterns, demonstrating its effectiveness in adapting concept drift in FL.

nan


Article 1313

Title@2025-06-26 (4): SceneGenAgent: Precise Industrial Scene Generation with Coding Agent

Title: SceneGenAgent: Precise Industrial Scene Generation with Coding Agent SceneGenAgent: Präzise industrielle Szenegenerierung mit Coding Agent SceneGenerAgenti: 精密工业场景与编码剂生成 2410.21909v3

Authors (8): Xiao Xia, Dan Zhang, Zibo Liao, Zhenyu Hou, Tianrui Sun, Jing Li, Ling Fu, Yuxiao Dong

The modeling of industrial scenes is essential for simulations in industrial manufacturing. While large language models (LLMs) have shown significant progress in generating general 3D scenes from textual descriptions, generating industrial scenes with LLMs poses a unique challenge due to their demand for precise measurements and positioning, requiring complex planning over spatial arrangement. To address this challenge, we introduce SceneGenAgent, an LLM-based agent for generating industrial scenes through C# code. SceneGenAgent ensures precise layout planning through a structured and calculable format, layout verification, and iterative refinement to meet the quantitative requirements of industrial scenarios. Experiment results demonstrate that LLMs powered by SceneGenAgent exceed their original performance, reaching up to 81.0% success rate in real-world industrial scene generation tasks and effectively meeting most scene generation requirements. To further enhance accessibility, we construct SceneInstruct, a dataset designed for fine-tuning open-source LLMs to integrate into SceneGenAgent. Experiments show that fine-tuning open-source LLMs on SceneInstruct yields significant performance improvements, with Llama3.1-70B approaching the capabilities of GPT-4o. Our code and data are available at https://github.com/THUDM/SceneGenAgent .

nan


Article 1314

Title@2025-06-26 (4): Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning

Title: Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning Little By Little: Kontinuierliches Lernen über selbsttätiges Sparse Mixture-of-Rank Adaptives Lernen 小小小小:通过自发的微小混血体适应性学习不断学习 2506.21035v1

Authors (6): Haodong Lu, Chongyang Zhao, Jason Xue, Lina Yao, Kristen Moore, Dong Gong

Continual learning (CL) with large pre-trained models is challenged by catastrophic forgetting and task interference. Existing LoRA-based Mixture-of-Experts (MoE) approaches mitigate forgetting by assigning and freezing task-specific adapters, but suffer from interference, redundancy, and ambiguous routing due to coarse adapter-level selection. However, this design introduces three key challenges: 1) Interference: Activating full LoRA experts per input leads to subspace interference and prevents selective reuse of useful components across tasks. 2) Redundancy: Newly added experts often duplicate or contradict existing knowledge due to unnecessary activation of unrelated ranks and insufficient reuse of relevant ones. 3) Ambiguity: Overlapping features across tasks confuse the router, resulting in unstable expert assignments. As more experts accumulate, earlier task routing degrades, accelerating forgetting. We propose MoRA, a Mixture-of-Rank Adaptive learning approach with self-activated and sparse rank activation for CL. Unlike mixing multiple low-rank matrices, MoRA decomposes each rank-r update into r rank-1 components, each treated as an independent expert, enabling fine-grained mixture of rank-1 expert utilization while mitigating interference and redundancy. To avoid ambiguous routing, we propose that each rank-1 expert can infer its own relevance via intermediate activations. Coupled with our proposed rank pruning and activation budgets, MoRA adaptively selects a sparse mixture of ranks per input. We validate MoRA on continual learning tasks with CLIP and large language models (LLMs), analyzing both in-domain learning and out-of-domain forgetting/generalization during fine-tuning. MoRA shows significant effectiveness on enhancing CL with PTMs, and improving generalization while mitigating forgetting.

nan


Article 1315

Title@2025-06-26 (4): PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling

Title: PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling PCDVQ: Verbesserung der Vector Quantization für große Sprachmodelle über Polar Coordinate Entkopplung PCDVQ:通过极地协调脱钩,加强大语言模型的矢量量化 2506.05432v2

Authors (6): Yuxuan Yue, Zukang Xu, Zhihang Yuan, Dawei Yang, Jianlong Wu, Liqiang Nie

Large Language Models (LLMs) face significant challenges in edge deployment due to their massive parameter scale. Vector Quantization (VQ), a clustering-based quantization method, serves as a prevalent solution to this issue for its extremely low-bit (even at 2-bit) and considerable accuracy. Since a vector is a quantity in mathematics and physics that has both direction and magnitude, existing VQ works typically quantize them in a coupled manner. However, we find that direction exhibits significantly greater sensitivity to quantization compared to the magnitude. For instance, when separately clustering the directions and magnitudes of weight vectors in LLaMA-2-7B, the accuracy drop of zero-shot tasks are 46.5\% and 2.3\%, respectively. This gap even increases with the reduction of clustering centers. Further, Euclidean distance, a common metric to access vector similarities in current VQ works, places greater emphasis on reducing the magnitude error. This property is contrary to the above finding, unavoidably leading to larger quantization errors. To these ends, this paper proposes Polar Coordinate Decoupled Vector Quantization (PCDVQ), an effective and efficient VQ framework consisting of two key modules: 1) Polar Coordinate Decoupling (PCD), which transforms vectors into their polar coordinate representations and perform independent quantization of the direction and magnitude parameters.2) Distribution Aligned Codebook Construction (DACC), which optimizes the direction and magnitude codebooks in accordance with the source distribution. Experimental results show that PCDVQ outperforms baseline methods at 2-bit level by at least 1.5\% zero-shot accuracy, establishing a novel paradigm for accurate and highly compressed LLMs.

nan


Article 1316

Title@2025-06-26 (4): TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence

Title: TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence TRIDENT: Tri-Modal Molecular Representative Learning mit taxonomischen Anmerkungen und lokaler Korrespondenz 三模式分子代表性学习,具有分类说明和当地通讯 2506.21028v1

Authors (9): Feng Jiang, Mangal Prakash, Hehuan Ma, Jianyuan Deng, Yuzhi Guo, Amina Mollaysa, Tommaso Mansi, Rui Liao, Junzhou Huang

Molecular property prediction aims to learn representations that map chemical structures to functional properties. While multimodal learning has emerged as a powerful paradigm to learn molecular representations, prior works have largely overlooked textual and taxonomic information of molecules for representation learning. We introduce TRIDENT, a novel framework that integrates molecular SMILES, textual descriptions, and taxonomic functional annotations to learn rich molecular representations. To achieve this, we curate a comprehensive dataset of molecule-text pairs with structured, multi-level functional annotations. Instead of relying on conventional contrastive loss, TRIDENT employs a volume-based alignment objective to jointly align tri-modal features at the global level, enabling soft, geometry-aware alignment across modalities. Additionally, TRIDENT introduces a novel local alignment objective that captures detailed relationships between molecular substructures and their corresponding sub-textual descriptions. A momentum-based mechanism dynamically balances global and local alignment, enabling the model to learn both broad functional semantics and fine-grained structure-function mappings. TRIDENT achieves state-of-the-art performance on 11 downstream tasks, demonstrating the value of combining SMILES, textual, and taxonomic functional annotations for molecular property prediction.

nan


Article 1317

Title@2025-06-26 (4): Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems

Title: Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems Mischung von Experten-augmented Deep Unfolding für Aktivitätserkennung in IRS-gestützten Systemen IRS辅助系统中活动探测专家加固深载混合体 2502.20183v2

Authors (5): Zeyi Ren, Qingfeng Lin, Jingreng Lei, Yang Li, Yik-Chung Wu

In the realm of activity detection for massive machine-type communications, intelligent reflecting surfaces (IRS) have shown significant potential in enhancing coverage for devices lacking direct connections to the base station (BS). However, traditional activity detection methods are typically designed for a single type of channel model, which does not reflect the complexities of real-world scenarios, particularly in systems incorporating IRS. To address this challenge, this paper introduces a novel approach that combines model-driven deep unfolding with a mixture of experts (MoE) framework. By automatically selecting one of three expert designs and applying it to the unfolded projected gradient method, our approach eliminates the need for prior knowledge of channel types between devices and the BS. Simulation results demonstrate that the proposed MoE-augmented deep unfolding method surpasses the traditional covariance-based method and black-box neural network design, delivering superior detection performance under mixed channel fading conditions.

nan


Article 1318

Title@2025-06-26 (4): HybridQ: Hybrid Classical-Quantum Generative Adversarial Network for Skin Disease Image Generation

Title: HybridQ: Hybrid Classical-Quantum Generative Adversarial Network for Skin Disease Image Generation HybridQ: Hybrid-Klassisch-Quantum Generatives Adversariales Netzwerk für die Bildgenerierung von Hauterkrankungen CCF: 皮肤疾病成像生成的混合古金-量反反转网络 2506.21015v1

Authors (4): Qingyue Jiao, Kangyu Zheng, Yiyu Shi, Zhiding Liang

Machine learning-assisted diagnosis is gaining traction in skin disease detection, but training effective models requires large amounts of high-quality data. Skin disease datasets often suffer from class imbalance, privacy concerns, and object bias, making data augmentation essential. While classical generative models are widely used, they demand extensive computational resources and lengthy training time. Quantum computing offers a promising alternative, but existing quantum-based image generation methods can only yield grayscale low-quality images. Through a novel classical-quantum latent space fusion technique, our work overcomes this limitation and introduces the first classical-quantum generative adversarial network (GAN) capable of generating color medical images. Our model outperforms classical deep convolutional GANs and existing hybrid classical-quantum GANs in both image generation quality and classification performance boost when used as data augmentation. Moreover, the performance boost is comparable with that achieved using state-of-the-art classical generative models, yet with over 25 times fewer parameters and 10 times fewer training epochs. Such results suggest a promising future for quantum image generation as quantum hardware advances. Finally, we demonstrate the robust performance of our model on real IBM quantum machine with hardware noise.

nan


Article 1319

Title@2025-06-26 (4): Efficient Image Generation with Variadic Attention Heads

Title: Efficient Image Generation with Variadic Attention Heads Effiziente Bildgenerierung mit verschiedenen Aufmerksamkeitsköpfen 高效的图像生成,由Variadic关注组织负责人负责 2211.05770v3

Authors (5): Steven Walton, Ali Hassani, Xingqian Xu, Zhangyang Wang, Humphrey Shi

While the integration of transformers in vision models have yielded significant improvements on vision tasks they still require significant amounts of computation for both training and inference. Restricted attention mechanisms significantly reduce these computational burdens but come at the cost of losing either global or local coherence. We propose a simple, yet powerful method to reduce these trade-offs: allow the attention heads of a single transformer to attend to multiple receptive fields. We demonstrate our method utilizing Neighborhood Attention (NA) and integrate it into a StyleGAN based architecture for image generation. With this work, dubbed StyleNAT, we are able to achieve a FID of 2.05 on FFHQ, a 6% improvement over StyleGAN-XL, while utilizing 28% fewer parameters and with 4$\times$ the throughput capacity. StyleNAT achieves the Pareto Frontier on FFHQ-256 and demonstrates powerful and efficient image generation on other datasets. Our code and model checkpoints are publicly available at: https://github.com/SHI-Labs/StyleNAT

nan


Article 1320

Title@2025-06-26 (4): Proximal Point Method for Online Saddle Point Problem

Title: Proximal Point Method for Online Saddle Point Problem Proximale Point-Methode für Online-Sättelpunkt-Problem 在线搭配点问题的近点方法 2407.04591v3

Authors (2): Qing-xin Meng, Jian-wei Liu

This paper focuses on the online saddle point problem, which involves a sequence of two-player time-varying convex-concave games. Considering the nonstationarity of the environment, we adopt the duality gap and the dynamic Nash equilibrium regret as performance metrics for algorithm design. We present three variants of the proximal point method: the Online Proximal Point Method (OPPM), the Optimistic OPPM (OptOPPM), and the OptOPPM with multiple predictors. Each algorithm guarantees upper bounds for both the duality gap and dynamic Nash equilibrium regret, achieving near-optimality when measured against the duality gap. Specifically, in certain benign environments, such as sequences of stationary payoff functions, these algorithms maintain a nearly constant metric bound. Experimental results further validate the effectiveness of these algorithms. Lastly, this paper discusses potential reliability concerns associated with using dynamic Nash equilibrium regret as a performance metric. The technical appendix and code can be found at https://github.com/qingxin6174/PPM-for-OSP.

nan


Article 1321

Title@2025-06-26 (4): Review learning: Real world validation of privacy preserving continual learning across medical institutions

Title: Review learning: Real world validation of privacy preserving continual learning across medical institutions Review learning: Echte Welt-Validierung der Privatsphäre Erhaltung kontinuierlichen Lernens in medizinischen Einrichtungen 审查学习:维护各医疗机构持续学习的隐私的真实世界验证 2210.09394v2

Authors (12): Jaesung Yoo, Sunghyuk Choi, Ye Seul Yang, Suhyeon Kim, Jieun Choi, Dongkyeong Lim, Yaeji Lim, Hyung Joon Joo, Dae Jung Kim, Rae Woong Park, Hyeong-Jin Yoon, Kwangsoo Kim

When a deep learning model is trained sequentially on different datasets, it often forgets the knowledge learned from previous data, a problem known as catastrophic forgetting. This damages the model’s performance on diverse datasets, which is critical in privacy-preserving deep learning (PPDL) applications based on transfer learning (TL). To overcome this, we introduce “review learning” (RevL), a low cost continual learning algorithm for diagnosis prediction using electronic health records (EHR) within a PPDL framework. RevL generates data samples from the model which are used to review knowledge from previous datasets. Six simulated institutional experiments and one real-world experiment involving three medical institutions were conducted to validate RevL, using three binary classification EHR data. In the real-world experiment with data from 106,508 patients, the mean global area under the receiver operating curve was 0.710 for RevL and 0.655 for TL. These results demonstrate RevL’s ability to retain previously learned knowledge and its effectiveness in real-world PPDL scenarios. Our work establishes a realistic pipeline for PPDL research based on model transfers across institutions and highlights the practicality of continual learning in real-world medical settings using private EHR data.

nan


Article 1322

Title@2025-06-26 (4): Distilling Normalizing Flows

Title: Distilling Normalizing Flows Destillieren von Normalisierungsströmen 保持正常流动 2506.21003v1

Authors (6): Steven Walton, Valeriy Klyukin, Maksim Artemev, Denis Derkach, Nikita Orlov, Humphrey Shi

Explicit density learners are becoming an increasingly popular technique for generative models because of their ability to better model probability distributions. They have advantages over Generative Adversarial Networks due to their ability to perform density estimation and having exact latent-variable inference. This has many advantages, including: being able to simply interpolate, calculate sample likelihood, and analyze the probability distribution. The downside of these models is that they are often more difficult to train and have lower sampling quality. Normalizing flows are explicit density models, that use composable bijective functions to turn an intractable probability function into a tractable one. In this work, we present novel knowledge distillation techniques to increase sampling quality and density estimation of smaller student normalizing flows. We seek to study the capacity of knowledge distillation in Compositional Normalizing Flows to understand the benefits and weaknesses provided by these architectures. Normalizing flows have unique properties that allow for a non-traditional forms of knowledge transfer, where we can transfer that knowledge within intermediate layers. We find that through this distillation, we can make students significantly smaller while making substantial performance gains over a non-distilled student. With smaller models there is a proportionally increased throughput as this is dependent upon the number of bijectors, and thus parameters, in the network.

nan


Article 1323

Title@2025-06-26 (4): Genetic Algorithm with Innovative Chromosome Patterns in the Breeding Process

Title: Genetic Algorithm with Innovative Chromosome Patterns in the Breeding Process Genetischer Algorithmus mit innovativen Chromosomenmustern im Zuchtprozess 育种过程中创新性染色体模式的遗传数值 2501.18184v3

Authors (1): Qingchuan Lyu

This paper proposes Genetic Algorithm with Border Trades (GAB), a novel modification of the standard genetic algorithm that enhances exploration by incorporating new chromosome patterns in the breeding process. This approach significantly mitigates premature convergence and improves search diversity. Empirically, GAB achieves up to 8x higher fitness and 10x faster convergence on complex job scheduling problems compared to standard Genetic Algorithms, reaching average fitness scores of 888 versus 106 in under 20 seconds. On the classic Flip-Flop problem, GAB consistently finds optimal or near-optimal solutions in fewer generations, even as input sizes scale to thousands of bits. These results highlight GAB as a highly effective and computationally efficient alternative for solving large-scale combinatorial optimization problems.

nan


Article 1324

Title@2025-06-26 (4): Leveraging Unlabeled Audio-Visual Data in Speech Emotion Recognition using Knowledge Distillation

Title: Leveraging Unlabeled Audio-Visual Data in Speech Emotion Recognition using Knowledge Distillation Nutzung unmarkierter Audio-Visual-Daten in der Sprach-Emotionserkennung mittels Wissensdestillation 利用知识蒸馏在语音情感识别中利用未贴标签的视听数据 2507.00055v1

Authors (3): Varsha Pendyala, Pedro Morgado, William Sethares

Voice interfaces integral to the human-computer interaction systems can benefit from speech emotion recognition (SER) to customize responses based on user emotions. Since humans convey emotions through multi-modal audio-visual cues, developing SER systems using both the modalities is beneficial. However, collecting a vast amount of labeled data for their development is expensive. This paper proposes a knowledge distillation framework called LightweightSER (LiSER) that leverages unlabeled audio-visual data for SER, using large teacher models built on advanced speech and face representation models. LiSER transfers knowledge regarding speech emotions and facial expressions from the teacher models to lightweight student models. Experiments conducted on two benchmark datasets, RAVDESS and CREMA-D, demonstrate that LiSER can reduce the dependence on extensive labeled datasets for SER tasks.

nan


Article 1325

Title@2025-06-26 (4): Bridging the Gap Between Approximation and Learning via Optimal Approximation by ReLU MLPs of Maximal Regularity

Title: Bridging the Gap Between Approximation and Learning via Optimal Approximation by ReLU MLPs of Maximal Regularity Überbrückung der Lücke zwischen Annäherung und Lernen durch Optimale Annäherung durch ReLU MLPs der Maximalregularität 通过最大合规性RELU MLP,通过最佳接近缩小接近与学习之间的差距 2409.12335v4

Authors (2): Ruiyang Hong, Anastasis Kratsios

The foundations of deep learning are supported by the seemingly opposing perspectives of approximation or learning theory. The former advocates for large/expressive models that need not generalize, while the latter considers classes that generalize but may be too small/constrained to be universal approximators. Motivated by real-world deep learning implementations that are both expressive and statistically reliable, we ask: “Is there a class of neural networks that is both large enough to be universal but structured enough to generalize?” This paper constructively provides a positive answer to this question by identifying a highly structured class of ReLU multilayer perceptions (MLPs), which are optimal function approximators and are statistically well-behaved. We show that any $(L,\alpha)$-H"{o}lder function from $[0,1]^d$ to $[-n,n]$ can be approximated to a uniform $\mathcal{O}(1/n)$ error on $[0,1]^d$ with a sparsely connected ReLU MLP with the same H"{o}lder exponent $\alpha$ and coefficient $L$, of width $\mathcal{O}(dn^{d/\alpha})$, depth $\mathcal{O}(\log(d))$, with $\mathcal{O}(dn^{d/\alpha})$ nonzero parameters, and whose weights and biases take values in ${0,\pm 1/2}$ except in the first and last layers which instead have magnitude at-most $n$. Further, our class of MLPs achieves a near-optimal sample complexity of $\mathcal{O}(\log(N)/\sqrt{N})$ when given $N$ i.i.d. normalized sub-Gaussian training samples. We achieve this through a new construction that perfectly fits together linear pieces using Kuhn triangulations, along with a new proof technique which shows that our construction preserves the regularity of not only the H"{o}lder functions, but also any uniformly continuous function. Our results imply that neural networks can solve the McShane extension problem on suitable finite sets.

nan


Article 1326

Title@2025-06-26 (4): SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes

Title: SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes SharpZO: Hybrid Sharpness-Aware Vision Sprachmodell Prompt Tuning via Forward-Only Passes SharpZO: 混合尖锐-敏锐视觉语言模型,通过前向-单行道快速调试 2506.20990v1

Authors (6): Yifan Yang, Zhen Zhang, Rupak Vignesh Swaminathan, Jing Liu, Nathan Susanj, Zheng Zhang

Fine-tuning vision language models (VLMs) has achieved remarkable performance across various downstream tasks; yet, it requires access to model gradients through backpropagation (BP), making them unsuitable for memory-constrained, inference-only edge devices. To address this limitation, previous work has explored various BP-free fine-tuning methods. However, these approaches often rely on high-variance evolutionary strategies (ES) or zeroth-order (ZO) optimization, and often fail to achieve satisfactory performance. In this paper, we propose a hybrid Sharpness-aware Zeroth-order optimization (SharpZO) approach, specifically designed to enhance the performance of ZO VLM fine-tuning via a sharpness-aware warm-up training. SharpZO features a two-stage optimization process: a sharpness-aware ES stage that globally explores and smooths the loss landscape to construct a strong initialization, followed by a fine-grained local search via sparse ZO optimization. The entire optimization relies solely on forward passes. Detailed theoretical analysis and extensive experiments on CLIP models demonstrate that SharpZO significantly improves accuracy and convergence speed, achieving up to 7% average gain over state-of-the-art forward-only methods.

nan


Article 1327

Title@2025-06-26 (4): Can Gradient Descent Simulate Prompting?

Title: Can Gradient Descent Simulate Prompting? Kann Gradient Descent Simulate Prompting? 梯子源模拟能刺激吗? 2506.20989v1

Authors (3): Eric Zhang, Leshem Choshen, Jacob Andreas

There are two primary ways of incorporating new information into a language model (LM): changing its prompt or changing its parameters, e.g. via fine-tuning. Parameter updates incur no long-term storage cost for model changes. However, for many model updates, prompting is significantly more effective: prompted models can generalize robustly from single examples and draw logical inferences that do not occur under standard fine-tuning. Can models be modified so that fine-tuning does emulate prompting? This paper describes a method for meta-training LMs such that gradient updates emulate the effects of conditioning on new information. Our approach uses tools from gradient-based meta-learning but uses an LM’s own prompted predictions as targets, eliminating the need for ground-truth labels. Subsequent gradient descent training recovers some (and occasionally all) of prompted model performance – showing improvement on the ``reversal curse’’ tasks, and answering questions about text passages after a single gradient update. These results suggest that, with appropriate initialization, gradient descent can be surprisingly expressive. Our results suggest new avenues for long-context modeling and offer insight into the generalization capabilities of gradient-based learning.

nan


Article 1328

Title@2025-06-26 (4): Split-Merge: A Difference-based Approach for Dominant Eigenvalue Problem

Title: Split-Merge: A Difference-based Approach for Dominant Eigenvalue Problem Split-Merge: Ein unterschiedsbasierter Ansatz für das Dominante Eigenwertproblem Split-Merge:对支配性电子价值问题采取基于差异的办法 2501.15131v2

Authors (2): Xiaozhi Liu, Yong Xia

The computation of the dominant eigenvector of symmetric positive semidefinite matrices is a cornerstone operation in numerous optimization-driven applications. Traditional methods, typically based on the \textit{Quotient} formulation, often suffer from challenges related to computational efficiency and reliance on prior spectral knowledge. In this work, we leverage the alternative \textit{Difference} formulation to reinterpret the classical power method as a first-order optimization algorithm. This perspective allows for a novel convergence analysis and facilitates the development of accelerated variants with larger step-sizes, achieving faster convergence without additional computational cost. Building on this insight, we introduce a generalized family of Difference-based methods, with the power method as a special case. Within this family, we propose Split-Merge, an algorithm that attains accelerated convergence without requiring spectral knowledge and operates solely via matrix-vector products. Extensive experiments on both synthetic and real-world datasets demonstrate that Split-Merge consistently outperforms state-of-the-art methods in both efficiency and scalability. In particular, it achieves more than a $\boldsymbol{10\times}$ speedup over the classical power method, underscoring its practical effectiveness for large-scale problems.

nan


Article 1329

Title@2025-06-26 (4): Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations

Title: Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations Generalisierte Tensor-basierte Parameter-Effizient Feinsteuerung über Lie Group Transformationen 通过 “ 谎言集团变形 “ 进行通用的Tensor基准参数有效精美调整 2504.00851v2

Authors (6): Chongjie Si, Zhiyi Shi, Xuehui Wang, Yichen Xiao, Xiaokang Yang, Wei Shen

Adapting pre-trained foundation models for diverse downstream tasks is a core practice in artificial intelligence. However, the wide range of tasks and high computational costs make full fine-tuning impractical. To overcome this, parameter-efficient fine-tuning (PEFT) methods like LoRA have emerged and are becoming a growing research focus. Despite the success of these methods, they are primarily designed for linear layers, focusing on two-dimensional matrices while largely ignoring higher-dimensional parameter spaces like convolutional kernels. Moreover, directly applying these methods to higher-dimensional parameter spaces often disrupts their structural relationships. Given the rapid advancements in matrix-based PEFT methods, rather than designing a specialized strategy, we propose a generalization that extends matrix-based PEFT methods to higher-dimensional parameter spaces without compromising their structural properties. Specifically, we treat parameters as elements of a Lie group, with updates modeled as perturbations in the corresponding Lie algebra. These perturbations are mapped back to the Lie group through the exponential map, ensuring smooth, consistent updates that preserve the inherent structure of the parameter space. Extensive experiments on computer vision and natural language processing validate the effectiveness and versatility of our approach, demonstrating clear improvements over existing methods.

nan


Article 1330

Title@2025-06-26 (4): Explainable quantum regression algorithm with encoded data structure

Title: Explainable quantum regression algorithm with encoded data structure Erklärbarer Quantenregressionsalgorithmus mit kodierter Datenstruktur 具有编码数据结构的可解释量子回归算法 2307.03334v5

Authors (6): C. -C. Joseph Wang, F. Perkkola, I. Salmenperä, A. Meijer-van de Griend, J. K. Nurminen, R. S. Bennink

Hybrid variational quantum algorithms (VQAs) are promising for solving practical problems such as combinatorial optimization, quantum chemistry simulation, quantum machine learning, and quantum error correction on noisy quantum computers. However, with typical random ansatz or quantum alternating operator ansatz, derived variational quantum algorithms become a black box that cannot be trusted for model interpretation, not to mention deploying as applications in informing critical decisions: the results of these variational parameters are just rotational angles for the quantum gates and have nothing to do with interpretable values that a model can provide directly. In this paper, we construct the first interpretable quantum regression algorithm, in which the quantum state exactly encodes the classical data table and the variational parameters correspond directly to the regression coefficients, which are real numbers by construction, providing a high degree of model interpretability and minimal cost to optimize due to the right expressiveness. We also take advantage of the encoded data structure to reduce the time complexity of computing the regression map. To shorten the circuit depth for nonlinear regression, our algorithm can be extended by building nonlinear features by classical preprocessing as the independent encoded column vectors. Even though the realization of compressed encoding in superconducting qubits has been achieved by the less noisy compressed encoding recently by the authors, we envision potential quantum utilities with multi-qubit gates implemented in neutral cold atoms and ions.

nan


Article 1331

Title@2025-06-26 (4): EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora

Title: EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora Erarag: Effiziente und inkrementelle retrieval Augmented Generation für wachsende Corpora EraRAG: 增长企业的高效和递增回收增量增殖型增殖型增殖型增殖型增殖型增殖型增殖型 2506.20963v1

Authors (9): Fangyuan Zhang, Zhengjun Huang, Yingli Zhou, Qintian Guo, Zhixun Li, Wensheng Luo, Di Jiang, Yixiang Fang, Xiaofang Zhou

Graph-based Retrieval-Augmented Generation (Graph-RAG) enhances large language models (LLMs) by structuring retrieval over an external corpus. However, existing approaches typically assume a static corpus, requiring expensive full-graph reconstruction whenever new documents arrive, limiting their scalability in dynamic, evolving environments. To address these limitations, we introduce EraRAG, a novel multi-layered Graph-RAG framework that supports efficient and scalable dynamic updates. Our method leverages hyperplane-based Locality-Sensitive Hashing (LSH) to partition and organize the original corpus into hierarchical graph structures, enabling efficient and localized insertions of new data without disrupting the existing topology. The design eliminates the need for retraining or costly recomputation while preserving high retrieval accuracy and low latency. Experiments on large-scale benchmarks demonstrate that EraRag achieves up to an order of magnitude reduction in update time and token consumption compared to existing Graph-RAG systems, while providing superior accuracy performance. This work offers a practical path forward for RAG systems that must operate over continually growing corpora, bridging the gap between retrieval efficiency and adaptability. Our code and data are available at https://github.com/EverM0re/EraRAG-Official.

nan


Article 1332

Title@2025-06-26 (4): Antibody Design and Optimization with Multi-scale Equivariant Graph Diffusion Models for Accurate Complex Antigen Binding

Title: Antibody Design and Optimization with Multi-scale Equivariant Graph Diffusion Models for Accurate Complex Antigen Binding Antikörper-Design und Optimierung mit mehrstufigen äquivarianten Graphen-Diffusions-Modellen für präzise, komplexe Antigen-Bindung 防反体设计和优化,采用多种规模等同图形扩散模型,用于准确的复合抗原装订 2506.20957v1

Authors (4): Jiameng Chen, Xiantao Cai, Jia Wu, Wenbin Hu

Antibody design remains a critical challenge in therapeutic and diagnostic development, particularly for complex antigens with diverse binding interfaces. Current computational methods face two main limitations: (1) capturing geometric features while preserving symmetries, and (2) generalizing novel antigen interfaces. Despite recent advancements, these methods often fail to accurately capture molecular interactions and maintain structural integrity. To address these challenges, we propose \textbf{AbMEGD}, an end-to-end framework integrating \textbf{M}ulti-scale \textbf{E}quivariant \textbf{G}raph \textbf{D}iffusion for antibody sequence and structure co-design. Leveraging advanced geometric deep learning, AbMEGD combines atomic-level geometric features with residue-level embeddings, capturing local atomic details and global sequence-structure interactions. Its E(3)-equivariant diffusion method ensures geometric precision, computational efficiency, and robust generalizability for complex antigens. Furthermore, experiments using the SAbDab database demonstrate a 10.13\% increase in amino acid recovery, 3.32\% rise in improvement percentage, and a 0.062~\AA\ reduction in root mean square deviation within the critical CDR-H3 region compared to DiffAb, a leading antibody design model. These results highlight AbMEGD’s ability to balance structural integrity with improved functionality, establishing a new benchmark for sequence-structure co-design and affinity optimization. The code is available at: https://github.com/Patrick221215/AbMEGD.

nan


Article 1333

Title@2025-06-26 (4): Model State Arithmetic for Machine Unlearning

Title: Model State Arithmetic for Machine Unlearning Modell Staat Arithmetik für Maschine Unlearning 机器脱修示范国 2506.20941v1

Authors (4): Keivan Rezaei, Mehrdad Saberi, Abhilasha Ravichander, Soheil Feizi

Large language models are trained on massive corpora of web data, which may include private data, copyrighted material, factually inaccurate data, or data that degrades model performance. Eliminating the influence of such problematic datapoints through complete retraining – by repeatedly pretraining the model on datasets that exclude these specific instances – is computationally prohibitive. For this reason, unlearning algorithms have emerged that aim to eliminate the influence of particular datapoints, while otherwise preserving the model – at a low computational cost. However, precisely estimating and undoing the influence of individual datapoints has proved to be challenging. In this work, we propose a new algorithm, MSA, for estimating and undoing the influence of datapoints – by leveraging model checkpoints i.e. artifacts capturing model states at different stages of pretraining. Our experimental results demonstrate that MSA consistently outperforms existing machine unlearning algorithms across multiple benchmarks, models, and evaluation metrics, suggesting that MSA could be an effective approach towards more flexible large language models that are capable of data erasure.

nan


Article 1334

Title@2025-06-26 (4): Forecasting Geopolitical Events with a Sparse Temporal Fusion Transformer and Gaussian Process Hybrid: A Case Study in Middle Eastern and U.S. Conflict Dynamics

Title: Forecasting Geopolitical Events with a Sparse Temporal Fusion Transformer and Gaussian Process Hybrid: A Case Study in Middle Eastern and U.S. Conflict Dynamics Prognose geopolitischer Ereignisse mit einem spare Temporal Fusion Transformer und Gaußschen Prozesshybrid: Eine Fallstudie in Nahost und US-Konfliktdynamik 以松散的时空融合变异器和高斯进程混合器预测地缘政治事件:中东和美国冲突动态案例研究 2506.20935v1

Authors (2): Hsin-Hsiung Huang, Hayden Hampton

Forecasting geopolitical conflict from data sources like the Global Database of Events, Language, and Tone (GDELT) is a critical challenge for national security. The inherent sparsity, burstiness, and overdispersion of such data cause standard deep learning models, including the Temporal Fusion Transformer (TFT), to produce unreliable long-horizon predictions. We introduce STFT-VNNGP, a hybrid architecture that won the 2023 Algorithms for Threat Detection (ATD) competition by overcoming these limitations. Designed to bridge this gap, our model employs a two-stage process: first, a TFT captures complex temporal dynamics to generate multi-quantile forecasts. These quantiles then serve as informed inputs for a Variational Nearest Neighbor Gaussian Process (VNNGP), which performs principled spatiotemporal smoothing and uncertainty quantification. In a case study forecasting conflict dynamics in the Middle East and the U.S., STFT-VNNGP consistently outperforms a standalone TFT, showing a superior ability to predict the timing and magnitude of bursty event periods, particularly at long-range horizons. This work offers a robust framework for generating more reliable and actionable intelligence from challenging event data, with all code and workflows made publicly available to ensure reproducibility.

nan


Article 1335

Title@2025-06-26 (4): Lower Bounds on the Size of Markov Equivalence Classes

Title: Lower Bounds on the Size of Markov Equivalence Classes Untere Grenzen auf der Größe der Markov-Äquivalenzklassen 马克夫等等效类大小的下下界界圈 2506.20933v1

Authors (3): Erik Jahn, Frederick Eberhardt, Leonard J. Schulman

Causal discovery algorithms typically recover causal graphs only up to their Markov equivalence classes unless additional parametric assumptions are made. The sizes of these equivalence classes reflect the limits of what can be learned about the underlying causal graph from purely observational data. Under the assumptions of acyclicity, causal sufficiency, and a uniform model prior, Markov equivalence classes are known to be small on average. In this paper, we show that this is no longer the case when any of these assumptions is relaxed. Specifically, we prove exponentially large lower bounds for the expected size of Markov equivalence classes in three settings: sparse random directed acyclic graphs, uniformly random acyclic directed mixed graphs, and uniformly random directed cyclic graphs.

nan


Article 1336

Title@2025-06-26 (4): Extremely Simple Streaming Forest

Title: Extremely Simple Streaming Forest Extrem einfach streaming Wald 极简单流流森林 2110.08483v7

Authors (4): Haoyin Xu, Jayanta Dey, Sambit Panda, Joshua T. Vogelstein

Decision forests, including random forests and gradient boosting trees, remain the leading machine learning methods for many real-world data problems, especially on tabular data. However, most of the current implementations only operate in batch mode, and therefore cannot incrementally update when more data arrive. Several previous works developed streaming trees and ensembles to overcome this limitation. Nonetheless, we found that those state-of-the-art algorithms suffer from a number of drawbacks, including low accuracy on some problems and high memory usage on others. We therefore developed an extremely simple extension of decision trees: given new data, simply update existing trees by continuing to grow them, and replace some old trees with new ones to control the total number of trees. In a benchmark suite containing 72 classification problems (the OpenML-CC18 data suite), we illustrate that our approach, $\textit{Extremely Simple Streaming Forest}$ (XForest), does not suffer from either of the aforementioned limitations. On those datasets, we also demonstrate that our approach often performs as well as, and sometimes even better than, conventional batch decision forest algorithms. With a $\textit{zero-added-node}$ approach, XForest-Zero, we also further extend existing splits to new tasks, and this very efficient method only requires inference time. Thus, XForests establish a simple standard for streaming trees and forests that could readily be applied to many real-world problems.

nan


Article 1337

Title@2025-06-26 (4): Quantum Reinforcement Learning Trading Agent for Sector Rotation in the Taiwan Stock Market

Title: Quantum Reinforcement Learning Trading Agent for Sector Rotation in the Taiwan Stock Market Quantum-Verstärkung-Learning-Trading-Agent für Sektor-Rotation auf dem Aktienmarkt Taiwan 台湾股市部门轮换的量级强化学习贸易代理 2506.20930v1

Authors (3): Chi-Sheng Chen, Xinyu Zhang, Ya-Chuan Chen

We propose a hybrid quantum-classical reinforcement learning framework for sector rotation in the Taiwan stock market. Our system employs Proximal Policy Optimization (PPO) as the backbone algorithm and integrates both classical architectures (LSTM, Transformer) and quantum-enhanced models (QNN, QRWKV, QASA) as policy and value networks. An automated feature engineering pipeline extracts financial indicators from capital share data to ensure consistent model input across all configurations. Empirical backtesting reveals a key finding: although quantum-enhanced models consistently achieve higher training rewards, they underperform classical models in real-world investment metrics such as cumulative return and Sharpe ratio. This discrepancy highlights a core challenge in applying reinforcement learning to financial domains – namely, the mismatch between proxy reward signals and true investment objectives. Our analysis suggests that current reward designs may incentivize overfitting to short-term volatility rather than optimizing risk-adjusted returns. This issue is compounded by the inherent expressiveness and optimization instability of quantum circuits under Noisy Intermediate-Scale Quantum (NISQ) constraints. We discuss the implications of this reward-performance gap and propose directions for future improvement, including reward shaping, model regularization, and validation-based early stopping. Our work offers a reproducible benchmark and critical insights into the practical challenges of deploying quantum reinforcement learning in real-world finance.

nan


Article 1338

Title@2025-06-26 (4): Active Learning for Manifold Gaussian Process Regression

Title: Active Learning for Manifold Gaussian Process Regression Aktives Lernen für manifolde Gaußsche Prozessregression Gaussian 进程倒退的 Manifide Gaussian 正在学习 2506.20928v1

Authors (4): Yuanxing Cheng, Lulu Kang, Yiwei Wang, Chun Liu

This paper introduces an active learning framework for manifold Gaussian Process (GP) regression, combining manifold learning with strategic data selection to improve accuracy in high-dimensional spaces. Our method jointly optimizes a neural network for dimensionality reduction and a Gaussian process regressor in the latent space, supervised by an active learning criterion that minimizes global prediction error. Experiments on synthetic data demonstrate superior performance over randomly sequential learning. The framework efficiently handles complex, discontinuous functions while preserving computational tractability, offering practical value for scientific and engineering applications. Future work will focus on scalability and uncertainty-aware manifold learning.

nan


Article 1339

Title@2025-06-26 (4): Interpretable Representation Learning for Additive Rule Ensembles

Title: Interpretable Representation Learning for Additive Rule Ensembles Interpretable Representative Learning for Additive Rule Ensembles 补充规则会议的解释性代表性学习 2506.20927v1

Authors (4): Shahrzad Behzadimanesh, Pierre Le Bodic, Geoffrey I. Webb, Mario Boley

Small additive ensembles of symbolic rules offer interpretable prediction models. Traditionally, these ensembles use rule conditions based on conjunctions of simple threshold propositions $x \geq t$ on a single input variable $x$ and threshold $t$, resulting geometrically in axis-parallel polytopes as decision regions. While this form ensures a high degree of interpretability for individual rules and can be learned efficiently using the gradient boosting approach, it relies on having access to a curated set of expressive and ideally independent input features so that a small ensemble of axis-parallel regions can describe the target variable well. Absent such features, reaching sufficient accuracy requires increasing the number and complexity of individual rules, which diminishes the interpretability of the model. Here, we extend classical rule ensembles by introducing logical propositions with learnable sparse linear transformations of input variables, i.e., propositions of the form $\mathbf{x}^\mathrm{T}\mathbf{w} \geq t$, where $\mathbf{w}$ is a learnable sparse weight vector, enabling decision regions as general polytopes with oblique faces. We propose a learning method using sequential greedy optimization based on an iteratively reweighted formulation of logistic regression. Experimental results demonstrate that the proposed method efficiently constructs rule ensembles with the same test risk as state-of-the-art methods while significantly reducing model complexity across ten benchmark datasets.

nan


Article 1340

Title@2025-06-26 (4): LLM-guided Chemical Process Optimization with a Multi-Agent Approach

Title: LLM-guided Chemical Process Optimization with a Multi-Agent Approach LLM-geführte chemische Prozessoptimierung mit einem Multi-Agent-Ansatz LLM-LLM-制导化学过程 优化采用多机构办法 2506.20921v1

Authors (5): Tong Zeng, Srivathsan Badrinarayanan, Janghoon Ock, Cheng-Kai Lai, Amir Barati Farimani

Chemical process optimization is crucial to maximize production efficiency and economic performance. Traditional methods, including gradient-based solvers, evolutionary algorithms, and parameter grid searches, become impractical when operating constraints are ill-defined or unavailable, requiring engineers to rely on subjective heuristics to estimate feasible parameter ranges. To address this constraint definition bottleneck, we present a multi-agent framework of large language model (LLM) agents that autonomously infer operating constraints from minimal process descriptions, then collaboratively guide optimization using the inferred constraints. Our AutoGen-based agentic framework employs OpenAI’s o3 model, with specialized agents for constraint generation, parameter validation, simulation execution, and optimization guidance. Through two phases - autonomous constraint generation using embedded domain knowledge, followed by iterative multi-agent optimization - the framework eliminates the need for predefined operational bounds. Validated on the hydrodealkylation process across cost, yield, and yield-to-cost ratio metrics, the framework demonstrated competitive performance with conventional optimization methods while achieving better computational efficiency, requiring fewer iterations to converge. Our approach converged in under 20 minutes, achieving a 31-fold speedup over grid search. Beyond computational efficiency, the framework’s reasoning-guided search demonstrates sophisticated process understanding, correctly identifying utility trade-offs, and applying domain-informed heuristics. This approach shows significant potential for optimization scenarios where operational constraints are poorly characterized or unavailable, particularly for emerging processes and retrofit applications.

nan


Article 1341

Title@2025-06-26 (4): Machine learning of microstructure–property relationships in materials leveraging microstructure representation from foundational vision transformers

Title: Machine learning of microstructure–property relationships in materials leveraging microstructure representation from foundational vision transformers Maschinelles Lernen von Mikrostruktur-Eigenschaftsbeziehungen in Materialien, die die Mikrostrukturdarstellung von grundlegenden Vision-Transformatoren nutzen 利用基础视觉变压器代表微观结构的材料中微型结构-财产关系 2501.18637v2

Authors (2): Sheila E. Whitman, Marat I. Latypov

Machine learning of microstructure–property relationships from data is an emerging approach in computational materials science. Most existing machine learning efforts focus on the development of task-specific models for each microstructure–property relationship. We propose utilizing pre-trained foundational vision transformers for the extraction of task-agnostic microstructure features and subsequent light-weight machine learning of a microstructure-dependent property. We demonstrate our approach with pre-trained state-of-the-art vision transformers (CLIP, DINOv2, SAM) in two case studies on machine-learning: (i) elastic modulus of two-phase microstructures based on simulations data; and (ii) Vicker’s hardness of Ni-base and Co-base superalloys based on experimental data published in literature. Our results show the potential of foundational vision transformers for robust microstructure representation and efficient machine learning of microstructure–property relationships without the need for expensive task-specific training or fine-tuning of bespoke deep learning models.

nan


Article 1342

Title@2025-06-26 (4): Explainable AI for Radar Resource Management: Modified LIME in Deep Reinforcement Learning

Title: Explainable AI for Radar Resource Management: Modified LIME in Deep Reinforcement Learning Erklärbare KI für Radar-Ressourcenmanagement: Modifizierte LIME im Deep Reinforcement Learning 用于雷达资源管理的可解释的AIAI:深强化学习中修改的LIME 2506.20916v1

Authors (4): Ziyang Lu, M. Cenk Gursoy, Chilukuri K. Mohan, Pramod K. Varshney

Deep reinforcement learning has been extensively studied in decision-making processes and has demonstrated superior performance over conventional approaches in various fields, including radar resource management (RRM). However, a notable limitation of neural networks is their ``black box” nature and recent research work has increasingly focused on explainable AI (XAI) techniques to describe the rationale behind neural network decisions. One promising XAI method is local interpretable model-agnostic explanations (LIME). However, the sampling process in LIME ignores the correlations between features. In this paper, we propose a modified LIME approach that integrates deep learning (DL) into the sampling process, which we refer to as DL-LIME. We employ DL-LIME within deep reinforcement learning for radar resource management. Numerical results show that DL-LIME outperforms conventional LIME in terms of both fidelity and task performance, demonstrating superior performance with both metrics. DL-LIME also provides insights on which factors are more important in decision making for radar resource management.

nan


Article 1343

Title@2025-06-26 (4): ZKPROV: A Zero-Knowledge Approach to Dataset Provenance for Large Language Models

Title: ZKPROV: A Zero-Knowledge Approach to Dataset Provenance for Large Language Models ZKPROV: Ein Null-Knowledge-Ansatz zur Datensatzprovenz für große Sprachmodelle ZKPROV:大语言模型数据集验证零知识化办法 2506.20915v1

Authors (3): Mina Namazi, Alexander Nemecek, Erman Ayday

As the deployment of large language models (LLMs) grows in sensitive domains, ensuring the integrity of their computational provenance becomes a critical challenge, particularly in regulated sectors such as healthcare, where strict requirements are applied in dataset usage. We introduce ZKPROV, a novel cryptographic framework that enables zero-knowledge proofs of LLM provenance. It allows users to verify that a model is trained on a reliable dataset without revealing sensitive information about it or its parameters. Unlike prior approaches that focus on complete verification of the training process (incurring significant computational cost) or depend on trusted execution environments, ZKPROV offers a distinct balance. Our method cryptographically binds a trained model to its authorized training dataset(s) through zero-knowledge proofs while avoiding proof of every training step. By leveraging dataset-signed metadata and compact model parameter commitments, ZKPROV provides sound and privacy-preserving assurances that the result of the LLM is derived from a model trained on the claimed authorized and relevant dataset. Experimental results demonstrate the efficiency and scalability of the ZKPROV in generating this proof and verifying it, achieving a practical solution for real-world deployments. We also provide formal security guarantees, proving that our approach preserves dataset confidentiality while ensuring trustworthy dataset provenance.

nan


Article 1344

Title@2025-06-26 (4): Faster Fixed-Point Methods for Multichain MDPs

Title: Faster Fixed-Point Methods for Multichain MDPs Schnellere Fixed-Point-Methoden für Multichain-MDPs 《多链 MDP快速固定点方法》 2506.20910v1

Authors (2): Matthew Zurek, Yudong Chen

We study value-iteration (VI) algorithms for solving general (a.k.a. multichain) Markov decision processes (MDPs) under the average-reward criterion, a fundamental but theoretically challenging setting. Beyond the difficulties inherent to all average-reward problems posed by the lack of contractivity and non-uniqueness of solutions to the Bellman operator, in the multichain setting an optimal policy must solve the navigation subproblem of steering towards the best connected component, in addition to optimizing long-run performance within each component. We develop algorithms which better solve this navigational subproblem in order to achieve faster convergence for multichain MDPs, obtaining improved rates of convergence and sharper measures of complexity relative to prior work. Many key components of our results are of potential independent interest, including novel connections between average-reward and discounted problems, optimal fixed-point methods for discounted VI which extend to general Banach spaces, new sublinear convergence rates for the discounted value error, and refined suboptimality decompositions for multichain MDPs. Overall our results yield faster convergence rates for discounted and average-reward problems and expand the theoretical foundations of VI approaches.

nan


Article 1345

Title@2025-06-26 (4): Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL

Title: Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL Optimale Single-Policy-Probenkomplexität und transiente Abdeckung für durchschnittlich reward Offline-RL 平均离岸平均回报率的 最佳单一政策抽样复杂程度和中度覆盖率 2506.20904v1

Authors (3): Matthew Zurek, Guy Zamir, Yudong Chen

We study offline reinforcement learning in average-reward MDPs, which presents increased challenges from the perspectives of distribution shift and non-uniform coverage, and has been relatively underexamined from a theoretical perspective. While previous work obtains performance guarantees under single-policy data coverage assumptions, such guarantees utilize additional complexity measures which are uniform over all policies, such as the uniform mixing time. We develop sharp guarantees depending only on the target policy, specifically the bias span and a novel policy hitting radius, yielding the first fully single-policy sample complexity bound for average-reward offline RL. We are also the first to handle general weakly communicating MDPs, contrasting restrictive structural assumptions made in prior work. To achieve this, we introduce an algorithm based on pessimistic discounted value iteration enhanced by a novel quantile clipping technique, which enables the use of a sharper empirical-span-based penalty function. Our algorithm also does not require any prior parameter knowledge for its implementation. Remarkably, we show via hard examples that learning under our conditions requires coverage assumptions beyond the stationary distribution of the target policy, distinguishing single-policy complexity measures from previously examined cases. We also develop lower bounds nearly matching our main result.

nan


Article 1346

Title@2025-06-26 (4): Graph-Structured Feedback Multimodel Ensemble Online Conformal Prediction

Title: Graph-Structured Feedback Multimodel Ensemble Online Conformal Prediction Graph-strukturiertes Feedback Multimodel Ensemble Online Conformal Prediction 多模型组合在线非正式预测 2506.20898v1

Authors (2): Erfan Hajihashemi, Yanning Shen

Online conformal prediction has demonstrated its capability to construct a prediction set for each incoming data point that covers the true label with a predetermined probability. To cope with potential distribution shift, multi-model online conformal prediction has been introduced to select and leverage different models from a preselected candidate set. Along with the improved flexibility, the choice of the preselected set also brings challenges. A candidate set that includes a large number of models may increase the computational complexity. In addition, the inclusion of irrelevant models with poor performance may negatively impact the performance and lead to unnecessarily large prediction sets. To address these challenges, we propose a novel multi-model online conformal prediction algorithm that identifies a subset of effective models at each time step by collecting feedback from a bipartite graph, which is refined upon receiving new data. A model is then selected from this subset to construct the prediction set, resulting in reduced computational complexity and smaller prediction sets. Additionally, we demonstrate that using prediction set size as feedback, alongside model loss, can significantly improve efficiency by constructing smaller prediction sets while still satisfying the required coverage guarantee. The proposed algorithms are proven to ensure valid coverage and achieve sublinear regret. Experiments on real and synthetic datasets validate that the proposed methods construct smaller prediction sets and outperform existing multi-model online conformal prediction approaches.

nan