cs.LG @ 2025-07-25: 1261
-
00 07-24 (4) Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift Pseudo-Labeling für Kernel Ridge Regression unter Kovariate Shift 共变移下内核循环脊回归的优多环流 2302.10160v4 -
01 07-24 SIDA: Synthetic Image Driven Zero-shot Domain Adaptation SIDA: Synthetisches Bild angetrieben Null-Schuss Domain-Anpassung SIDA: 合成图像驱动器零弹射域适应 2507.18632v1 -
02 07-24 Gait Recognition Based on Tiny ML and IMU Sensors Gait-Erkennung basierend auf winzigen ML- und IMU-Sensoren 基于小ML和IMU传感器的Gait识别 2507.18627v1 -
03 07-24 Moving Out: Physically-grounded Human-AI Collaboration Ausstieg: physikalisch begründete Mensch-AI-Kollaboration 搬出:基于物理的人类-AI协作 2507.18623v1 -
04 07-24 Diffusion Beats Autoregressive in Data-Constrained Settings Diffusion schlägt Autoregressive in datenbeschränkten Einstellungen 在受数据约束的设置中自动递减 2507.15857v2 -
05 07-24 TRPrompt: Bootstrapping Query-Aware Prompt Optimization from Textual Rewards TRPrompt: Bootstrapping Query-Aware Prompt Optimierung von Textbelohnungen TRPrompt: 基于文本奖励的查询感知提示优化自举 2507.18618v1 -
06 07-24 SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning SynC: Synthetische Bildunterschrift Datensatzverfeinerung mit ein-zu-vielen Mapping für Zero-shot Bildunterschrift SynC: 通过一对多映射精炼合成图像描述数据集,用于零样本图像描述 2507.18616v1 -
07 07-24 BEARCUBS: A benchmark for computer-using web agents BEARCUBS: Benchmark für computergestützte Web-Agenten BEARCUBS:计算机使用网络代理器的基准 2503.07919v3 -
08 07-24 Explainable Mapper: Charting LLM Embedding Spaces Using Perturbation-Based Explanation and Verification Agents Erklärbarer Mapper: LLM-Embedding-Räume mit Perturbation-basierten Erklärungs- und Verifikations-Agenten kartographieren 可解释的成像仪:利用以扰动为基础的解释和核查仪器绘制LLM内嵌空间图 2507.18607v1 -
09 07-24 Hybrid quantum-classical algorithm for near-optimal planning in POMDPs Hybrider quantenklassischer Algorithmus zur nahezu optimalen Planung in POMDPs POMDPs中接近最佳规划的混合量子-古典量子算法 2507.18606v1 -
10 07-24 Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures Beyond Euklid: Ein illustrierter Leitfaden zum modernen maschinellen Lernen mit geometrischen, topologischen und algebraischen Strukturen 欧几里特以外:带有几何、地形学和代数结构的现代机器学习设计指南 2407.09468v2 -
11 07-24 Demystify Protein Generation with Hierarchical Conditional Diffusion Models Entmystifizieren Protein-Generation mit Hierarchische Bedingte Diffusion Modelle 使用等级级有条件扩散模型解密蛋白一代 2507.18603v1 -
12 07-24 Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs Sparse Logit Sampling: Beschleunigung der Wissensdestillation in LLMs 粗略的登录抽样:加速在LLMs中进行知识蒸馏 2503.16870v2 -
13 07-24 Linear Memory SE(2) Invariant Attention Linearer Speicher SE(2) Invariante Aufmerksamkeit 线性内存 SE(2) 惯性注意 2507.18597v1 -
14 07-24 Private Counterfactual Retrieval Private kontraaktische Retrieval 私人反事实检索 2410.13812v2 -
15 07-24 DRWKV: Focusing on Object Edges for Low-Light Image Enhancement DRWKV: Fokussierung auf Objektkanten für Low-Light Image Enhancement DRWKV: 关注低光图像增强对象边缘 2507.18594v1 -
16 07-24 On the Convergence of Gradient Descent on Learning Transformers with Residual Connections Über die Konvergenz des gradienten Abstiegs auf Lerntransformatoren mit residualen Verbindungen 关于有残余连接的学习变异器的 “ 渐渐后代 “ 趋同 2506.05249v3 -
17 07-24 Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning Agentar-Fin-R1: Verbesserung der Finanzintelligenz durch Domain-Expertise, Trainingseffizienz und Advanced Reasoning Agentar-Fin-R1:通过领域专业知识、训练效率和高级推理增强金融智能 2507.16802v3 -
18 07-24 Beyond Internal Data: Constructing Complete Datasets for Fairness Testing Jenseits interner Daten: Konstruieren vollständiger Datensätze für Fairness-Tests 超越内部数据:为公平测试建立完整的数据集 2507.18561v1 -
19 07-24 Neural Tangent Kernels and Fisher Information Matrices for Simple ReLU Networks with Random Hidden Weights Neural Tangent Kernel und Fisher Information Matrizen für einfache ReLU-Netzwerke mit zufälligen versteckten Gewichten 带有随机隐藏重的简单 ReLU 网络神经相垂直内核和渔业信息矩阵 2507.18555v1 -
20 07-24 Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards Omni-Thinker: Skalierung der Cross-Domain-Verallgemeinerung in LLMs über Multi-Task RL mit Hybrid Rewards Omni-Thinker:通过多任务RL与混合奖励在LLMs中扩大跨域通用化 2507.14783v2 -
21 07-24 LagKV: Lag-Relative Information of the KV Cache Tells Which Tokens Are Important LagKV: Lag-Relative Information des KV-Cache erzählt, welche Token wichtig sind LagKV: KV 缓存的滞后相对信息揭示哪些 Token 重要 2504.04704v2 -
22 07-24 The Geometry of LLM Quantization: GPTQ as Babai’s Nearest Plane Algorithm Die Geometrie der LLM-Quantisierung: GPTQ als Babais Nearest-Plane-Algorithmus LLM量化的几何:GPTQ即Babai最近平面算法 2507.18553v1 -
23 07-24 Zeroth-Order Fine-Tuning of LLMs in Random Subspaces Zeroth-Order Feinsteuerung von LLMs in Random Subspaces 随机子空间中LLMs的零级微调微调 2410.08989v3 -
24 07-24 On the Performance of Concept Probing: The Influence of the Data (Extended Version) Zur Performance von Konzept-Probing: Der Einfluss der Daten (Erweiterte Version) 关于 “ 概念检验:数据的影响 “ 的绩效(扩展版) 2507.18550v1 -
25 07-24 The Price equation reveals a universal force-metric-bias law of algorithmic learning and natural selection Die Price-Gleichung offenbart ein universelles Kraft-Metrik-Bias-Gesetz des algorithmischen Lernens und der natürlichen Selektion Price方程揭示了算法学习与自然选择的通用力-度量-偏差定律 2507.18549v1 -
26 07-24 Learning Gentle Grasping Using Vision, Sound, and Touch Sanftes Greifen lernen mit Vision, Sound und Touch 利用愿景、声音和触摸进行轻巧的学习 2503.07926v2 -
27 07-24 Deep Variational Free Energy Calculation of Hydrogen Hugoniot Tiefe Variationsfreie Energieberechnung von Wasserstoff Hugoniot 雨原氢能深变化式自由能源计算 2507.18540v1 -
28 07-24 AI/ML Life Cycle Management for Interoperable AI Native RAN AI/ML Life Cycle Management für interoperable KI Native RAN 面向可互操作AI原生RAN的AI/ML生命周期管理 2507.18538v1 -
29 07-24 External Knowledge Injection for CLIP-Based Class-Incremental Learning Externe Wissensinjektion für CLIP-basiertes Klassen-Inkrementelles Lernen 为基于CLIP的高级类强化学习提供外部知识注射 2503.08510v2 -
30 07-24 Elucidating the Design Space of Arbitrary-Noise-Based Diffusion Models Erklärung des Design-Raums für willkürlich-lärmbasierte Diffusionsmodelle 说明以任意噪音为基础的传播模型的设计空间 2507.18534v1 -
31 07-24 C2G-KD: PCA-Constrained Generator for Data-Free Knowledge Distillation C2G-KD: PCA-Constrained Generator für datenfreie Wissensdestillation C2G-KD: 用于无数据知识蒸馏的PCA约束生成器 2507.18533v1 -
32 07-24 Diffuse and Disperse: Image Generation with Representation Regularization Diffuse und Disperse: Bildgenerierung mit Repräsentationsregularisierung Diffuse and Disperse: 带表征正则化的图像生成 2506.09027v2 -
33 07-24 Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench Sind KI-erzeugte Fixes sicher? LLM und Agent Patches auf der SWE-Bench analysieren AI - 具有安全性吗? 分析SWE-bench 上的LLM 和代理补丁 2507.02976v2 -
34 07-24 The Moral Gap of Large Language Models Die moralische Kluft großer Sprachmodelle 大语言模式的道德差距 2507.18523v1 -
35 07-24 Optimal Transport Regularized Divergences: Application to Adversarial Robustness Optimaler Transport Regularisierte Divergenzen: Anwendung auf widrige Robustheit 优化运输 常规化差异:适用于逆向强力 2309.03791v3 -
36 07-24 GCC-Spam: Spam Detection via GAN, Contrastive Learning, and Character Similarity Networks GCC-Spam: Spam-Erkennung über GAN, Kontrastives Lernen und Zeichenähnlichkeitsnetzwerke GCC-Spam:通过GAN、对比学习和字符相似性网络检测垃圾邮件 2507.14679v2 -
37 07-24 Robust sensitivity control in digital pathology via tile score distribution matching Robuste Sensitivitätskontrolle in der digitalen Pathologie über Kacheln-Score-Verteilungsabgleich 通过瓷砖计分分布匹配对数字病理学中的强力敏感度控制 2502.20144v3 -
38 07-24 GLANCE: Graph Logic Attention Network with Cluster Enhancement for Heterophilous Graph Representation Learning GLANCE: Graph Logic Attention Network mit Cluster Enhancement für heterophiles Graph Representation Learning 图表逻辑关注网络,通过群集增强混合图示代表性学习 2507.18521v1 -
39 07-24 Euclidean Distance Deflation Under High-Dimensional Heteroskedastic Noise Euklidische Distanz Deflation unter hochdimensionalen heteroskedastischen Geräuschen 高多变性热电传噪声下的远距离通缩 2507.18520v1 -
40 07-24 Revisiting Bisimulation Metric for Robust Representations in Reinforcement Learning Revisiting Bisimulation Metric für robuste Darstellungen in Verstärkungs-Lernen 重新研究强化学习中强力代表制的模拟比照模型 2507.18519v1 -
41 07-24 Visual Adaptive Prompting for Compositional Zero-Shot Learning Visuelle Adaptive Prompting für kompositorisches Zero-Shot-Lernen 零热学习的视觉适应性促进 2502.20292v6 -
42 07-24 A Transfer Learning-Based Method for Water Body Segmentation in Remote Sensing Imagery: A Case Study of the Zhada Tulin Area Eine Transfer-Lernmethode für die Segmentierung von Wasserkörpern in Fernerkundungsbildern: Eine Fallstudie des Zhada-Tulin-Gebiets 遥感图像中水体分离的转让学习方法:Zhada Tulin地区的案例研究 2507.10084v2 -
43 07-24 Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems Sublinearer Bedauern für eine Klasse von linear-Quadratischen Lernproblemen 连续时线性强化学习问题分类的子线性遗憾 2407.17226v6 -
44 07-24 Masked Autoencoders that Feel the Heart: Unveiling Simplicity Bias for ECG Analyses Masked Autoencoder, die das Herz fühlen: Enthüllen Einfachheit Bias für EKG-Analysen 感觉心脏的蒙面自动代码器:用于ECG分析的“永存的简单比” 2506.22495v2 -
45 07-24 Multi-Preference Lambda-weighted Listwise DPO for Small-Scale Model Alignment Multi-Preference Lambda-bewertet Listwise DPO für kleine Modellausrichtung 用于小规模模型调整的多参数 Lambda加权列表DPO 2506.19780v5 -
46 07-24 DualXDA: Towards Sparse, Efficient and Explainable Data Attribution in Large AI Models DualXDA: Auf dem Weg zu sparsamen, effizienten und erklärbaren Datenzuweisungen in großen KI-Modellen DualXDA:在大型AI型模型中实现数据分散、高效和可解释的归属 2402.12118v2 -
47 07-24 Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models Nicht alle Funktionen widmen sich der Aufmerksamkeit: Graphengeführtes Abhängigkeitslernen für tabellarische Datengenerierung mit Sprachmodellen 并非所有值得注意的地物:用语言模型编制图表数据时的图表指导依赖性学习 2507.18504v1 -
48 07-24 PLOT-TAL: Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization PLOT-TAL: Schnell lernen mit optimalem Transport für temporale Aktionslokalisierung PLOT-TAL: 以最优化交通方式迅速学习,促进少数时空行动地方化 2403.18915v2 -
49 07-24 EarthLink: A Self-Evolving AI Agent for Climate Science EarthLink: Ein sich selbst entwickelnder KI-Agent für Klimawissenschaften EarthLink:一个自我发展的AI气候科学代理机构 2507.17311v2 -
50 07-24 Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time Unüberwachtes Konzept Drift Erkennung von Deep-Learning-Darstellungen in Echtzeit 基于深度学习表征的实时无监督概念漂移检测 2406.17813v2 -
51 07-24 Faithful, Interpretable Chest X-ray Diagnosis with Anti-Aliased B-cos Networks Treue, interpretierbare Röntgendiagnose im Brustkorb mit Anti-Aliased-B-cos-Netzwerken 忠实、可解释的胸部X光诊断:抗混叠B-cos网络 2507.16761v2 -
52 07-24 DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts DriftMoE: Eine Mischung aus Experten Ansatz zum Umgang mit Konzept Drifts DriftMoE:处理 “ 漂流概念 “ 的混合专家办法 2507.18464v1 -
53 07-24 Restoring Rhythm: Punctuation Restoration Using Transformer Models for Bangla, a Low-Resource Language Wiederherstellung des Rhythmus: Pünktlichkeitsrestaurierung mit Transformer-Modellen für Bangla, eine Sprache mit geringer Ressource 恢复时速:使用孟加拉国低资源语言 “ 孟加拉 “ 变压器模型恢复脉冲 2507.18448v1 -
54 07-24 Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits Ergebnisbasiertes Online-Verstärkungslernen: Algorithmen und grundlegende Grenzen 基于成果的在线强化学习:等级和基本限制 2505.20268v2 -
55 07-24 IPCGRL: Language-Instructed Reinforcement Learning for Procedural Level Generation IPCGRL: Sprachgestütztes Verstärkungslernen für die verfahrenstechnische Level-Generierung IPCGRL: 程序生成阶段语言教学强化学习 2503.12358v4 -
56 07-24 NLML-HPE: Head Pose Estimation with Limited Data via Manifold Learning NLML-HPE: Kopfposenschätzung mit begrenzten Daten über Manifold Learning NLML-HPE:通过流形学习在有限数据下进行头部姿态估计 2507.18429v1 -
57 07-24 How do language models learn facts? Dynamics, curricula and hallucinations Wie lernen Sprachmodelle Fakten? Dynamik, Lehrpläne und Halluzinationen 语言模式如何了解事实?动态、课程和幻觉 2503.21676v2 -
58 07-24 Multi-Model Ensemble and Reservoir Computing for River Discharge Prediction in Ungauged Basins Multi-Model-Ensemble und Reservoir Computing für Flussentladungsvorhersage in ungespurten Becken 多模型组合和储量计算,用于未排出盆地的河流排泄预测 2507.18423v1 -
59 07-24 Residual Prior-driven Frequency-aware Network for Image Fusion Residual Prior-driven Frequency-aware Netzwerk für Bild-Fusion 图像融合超前驱动频率感知网络 2507.06735v2 -
60 07-24 FinDPO: Financial Sentiment Analysis for Algorithmic Trading through Preference Optimization of LLMs FinDPO: Finanz-Sentiment-Analyse für algorithmischen Handel durch Preference-Optimierung von LLMs FinDPO:通过优惠优化LLMs,分析通过高利贷交易的金融敏感度 2507.18417v1 -
61 07-24 Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows Iwin Transformer: Hierarchische Vision Transformer mit Interleaved Windows Iwin 变换器: 使用内部视窗的等级愿景变换器 2507.18405v1 -
62 07-24 CLEAR: Error Analysis via LLM-as-a-Judge Made Easy CLEAR: Fehleranalyse über LLM-as-a-Judge leicht gemacht CLEAR:通过LLM-as-a-Judge轻松进行错误分析 2507.18392v1 -
63 07-24 A Comprehensive Review of Diffusion Models in Smart Agriculture: Progress, Applications, and Challenges Eine umfassende Überprüfung von Diffusionsmodellen in der intelligenten Landwirtschaft: Fortschritt, Anwendungen und Herausforderungen 全面审查 “ 智能农业传播模式:进展、应用和挑战 “ 2507.18376v1 -
64 07-24 On Reconstructing Training Data From Bayesian Posteriors and Trained Models Über die Wiederherstellung von Trainingsdaten aus Bayesischen Nachbildungen und ausgebildeten Modellen 从贝叶斯后验和已训练模型重建训练数据 2507.18372v1 -
65 07-24 Efficient Uncertainty in LLMs through Evidential Knowledge Distillation Effiziente Unsicherheit in LLMs durch Evidential Knowledge Destillation 通过证据知识蒸馏在LLMs中提高效能的不确定性 2507.18366v1 -
66 07-24 Leveraging the Structure of Medical Data for Improved Representation Learning Nutzung der Struktur medizinischer Daten für ein verbessertes Repräsentationslernen 利用医疗数据结构改进代表性学习 2507.02987v3 -
67 07-24 Latent Space Alignment for AI-Native MIMO Semantic Communications Latent Space Alignment für KI-Native MIMO Semantische Kommunikation 用于AI-Native MIMO语义通信的 远程空间对齐 2507.16680v2 -
68 07-24 Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation Tiny ist nicht klein genug: Hochwertige, ressourcenarme Gesichtsanimationsmodelle durch Hybrid-Wissensdestillation 微小不够小:通过混合知识蒸馏,建立高质量、资源低的面部动画模型。 2507.18352v1 -
69 07-24 Low-rank adaptive physics-informed HyperDeepONets for solving differential equations Low-rank adaptive Physik-informiert HyperDeepONets zur Lösung von Differentialgleichungen 用于解决差别方程的低级别适应性物理知情高超深电联 2507.18346v1 -
70 07-24 Remembering the Markov Property in Cooperative MARL Erinnerung an die Markov-Eigenschaft im kooperativen MARL 牢记合作式MARL中的马尔可夫性质 2507.18333v1 -
71 07-24 Hierarchical Dimensionless Learning (Hi-π): A physics-data hybrid-driven approach for discovering dimensionless parameter combinations Hierarchisches dimensionsloses Lernen (Hi-π): Ein physik-data-hybridgetriebener Ansatz zur Entdeckung dimensionsloser Parameterkombinationen 分层无量纲学习(Hi-π):一种发现无量纲参数组合的物理-数据混合驱动方法 2507.18332v1 -
72 07-24 GVCCS: A Dataset for Contrail Identification and Tracking on Visible Whole Sky Camera Sequences GVCCS: Ein Datensatz zur kontrailen Identifizierung und Verfolgung sichtbarer Ganzhimmel-Kamerasequenzen GVCCS:用于在可见光全天空相机序列上识别和跟踪航迹云的数据集 2507.18330v1 -
73 07-24 Position: An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research Position: Eine empirisch begründete Identifizierbarkeitstheorie beschleunigt die selbstüberwachte Lernforschung 职位: 以活性基础的可识别性理论将加速自我监督学习研究 2504.13101v3 -
74 07-24 A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in ECG Delineation Ein Multi-Dataset-Benchmark für semi-überwachte semantische Segmentierung in EKG-Delineation ECG 划定中半超部分解的多数据集基准 2507.18323v1 -
75 07-24 I-CEE: Tailoring Explanations of Image Classification Models to User Expertise I-CEE: Maßgeschneiderte Erläuterungen von Bildklassifikationsmodellen zur Benutzerexpertise I-CEE:根据用户专门知识对图像分类模型的定制解释 2312.12102v3 -
76 07-24 State of Health Estimation of Batteries Using a Time-Informed Dynamic Sequence-Inverted Transformer Zustand der Gesundheit Schätzung von Batterien mit einem zeitinformierten dynamischen Sequenz-invertierten Transformer 使用时间化动态序列反向转换器对电池进行健康状况估计 2507.18320v1 -
77 07-24 Regression-aware Continual Learning for Android Malware Detection Regressions-aware Continual Learning für Android Malware-Erkennung 面向 Android 恶意软件检测的回归感知持续学习 2507.18313v1 -
78 07-24 GNN-ACLP: Graph Neural Networks Based Analog Circuit Link Prediction GNN-ACLP: Graph Neural Networks Based Analog Circuit Link Prediction GNN-ACLP:基于模拟电路链接预测的图表神经网络 2504.10240v4 -
79 07-24 Variational inference for pile-up removal at hadron colliders with diffusion models Variationsableitung zur Stapelabfuhr an Hadron-Kollidern mit Diffusionsmodellen 与扩散模型相撞的hadron相撞器的堆叠式清除的变异推论 2410.22074v2 -
80 07-24 PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving PRIX: Planen lernen von rohen Pixeln für autonomes Fahren Ende-zu-Ende PRIX: 学习从Raw像素到计划用于终端到终端自治驾驶 2507.17596v2 -
81 07-24 Self-Supervised Coarsening of Unstructured Grid with Automatic Differentiation Selbstüberwachte Verzahnung des unstrukturierten Gitters mit automatischer Differenzierung 带有自动差异的无结构网格自操作粗化 2507.18297v1 -
82 07-24 Leveraging Data Augmentation and Siamese Learning for Predictive Process Monitoring Leveraging Data Augmentation und Siamese Learning für vorausschauende Prozessüberwachung 利用数据增强和西亚学习来监测预测过程 2507.18293v1 -
83 07-24 BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning BEAVER: Bauen von Umgebungen mit einschätzbarer Variation zur Bewertung von multi-objektiven Verstärkungslernen BEAVER: 在环境建设中采用可评估的变数评估多目标强化学习 2507.07769v2 -
84 07-24 ReSem3D: Refinable 3D Spatial Constraints via Fine-Grained Semantic Grounding for Generalizable Robotic Manipulation ReSem3D: Verfeinerbare 3D-Raumeinschränkungen durch feinkörnige semantische Erdung für eine generalisierbare Robotermanipulation ReSem3D:通过精密的可通用机器人操纵的语义定位,改进3D空间限制 2507.18262v1 -
85 07-24 Alternative Loss Function in Evaluation of Transformer Models Alternative Verlustfunktion bei der Bewertung von Transformer-Modellen 变换模型评价中的替代损失功能 2507.16548v2 -
86 07-24 SyncMapV2: Robust and Adaptive Unsupervised Segmentation SyncMapV2: Robuste und adaptive unüberwachte Segmentierung 同步马普V2: 强力和适应性不受监督的分割 2506.16297v3 -
87 07-24 Boosting Revisited: Benchmarking and Advancing LP-Based Ensemble Methods Boosting neu betrachtet: Benchmarking und Weiterentwicklung LP-basierter Ensemble-Methoden 重新审视促进:基准制定和推进基于LP的组合组合方法 2507.18242v1 -
88 07-24 Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation Robustes Multi-View-Lernen durch Darstellung Fusion von Sample-Level-Achtung und Ausrichtung der simulierten Perturbation 通过展示抽样关注层的聚合和模拟扰动的调整,通过代表方式进行强有力的多视角学习 2503.04151v2 -
89 07-24 Compositional Coordination for Multi-Robot Teams with Large Language Models Kompositionskoordination für Multi-Roboter-Teams mit großen Sprachmodellen 具有大语言模式的多机器人小组的组成协调 2507.16068v2 -
90 07-24 Why Do Class-Dependent Evaluation Effects Occur with Time Series Feature Attributions? A Synthetic Data Investigation Warum wirken sich klassenabhängige Auswertungseffekte mit Zeitreihen-Feature-Attributionen aus? Eine synthetische Datenuntersuchung 为何类依赖评价效果与时间序列特征属性是否相符? 合成数据调查 2506.11790v2 -
91 07-24 Sparse identification of nonlinear dynamics with library optimization mechanism: Recursive long-term prediction perspective Sparse Identifikation von nichtlinearen Dynamiken mit Bibliotheksoptimierungsmechanismus: Rekursive langfristige Vorhersageperspektive 利用图书馆优化机制粗略地识别非线性动态与图书馆优化机制:递归性长期预测前景 2507.18220v1 -
92 07-24 FedSA-GCL: A Semi-Asynchronous Federated Graph Learning Framework with Personalized Aggregation and Cluster-Aware Broadcasting FedSA-GCL: Ein semi-asynchrones Federated Graph Learning Framework mit personalisierter Aggregation und Cluster-Aware Broadcasting FedSA-GCL:半同步的联邦联邦图表学习框架,配有个性化聚合和集束软件广播 2507.18219v1 -
93 07-24 The Role of the Time-Dependent Hessian in High-Dimensional Optimization Die Rolle des Zeitabhängigen Hessen bei der hochdimensionalen Optimierung 时间依赖的赫西安人在高多样性最佳化中的作用 2403.02418v3 -
94 07-24 Goal-based Trajectory Prediction for improved Cross-Dataset Generalization Zielbasierte Trajektorie-Vorhersage für verbesserte Cross-Dataset-Verallgemeinerung 改进交叉数据通用化的基于目标的轨迹预测 2507.18196v1 -
95 07-24 Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning Jenseits der Low-Rank-Dekomposition: Ein Shortcut-Ansatz für effizientes On-Device-Lernen 超越低级别分解:高效在线学习的捷径方法 2505.05086v2 -
96 07-24 A general language model for peptide identification Ein allgemeines Sprachmodell für die Peptididentifikation 铅化物识别通用语言模式 2502.15610v4 -
97 07-24 ChronoSelect: Robust Learning with Noisy Labels via Dynamics Temporal Memory ChronoSelect: Robustes Lernen mit verrauschten Etiketten über Dynamics Temporal Memory ChronoSelect: 通过动态时序记忆在噪声标签下进行鲁棒学习 2507.18183v1 -
98 07-24 Statistical Runtime Verification for LLMs via Robustness Estimation Statistische Laufzeitprüfung für LLMs mittels Robustheitsschätzung 通过强力估计法对LLMs进行统计运行时间校验 2504.17723v2 -
99 07-24 SDSC: A Structure-Aware Metric for Semantic Signal Representation Learning SDSC: Eine strukturbewusste Metrik für das Lernen semantischer Signalrepräsentationen SDSC:用于语义信号表征学习的结构感知度量 2507.14516v2 -
100 07-24 GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar GeoAvatar: Adaptive geometrische Gaussian Splatting für 3D-Kopf Avatar GeoAvatar: 3D Avatar 头的适应性几何高山喷涂 2507.18155v1 -
101 07-24 When Noisy Labels Meet Class Imbalance on Graphs: A Graph Augmentation Method with LLM and Pseudo Label Wenn geräuschvolle Etiketten die Klassenungleichgewichte auf Graphen treffen: Eine grafische Augmentationsmethode mit LLM und Pseudo-Label 当噪音标签在图表上达到类平衡时:与LLM和Pseudo标签的图表放大法 2507.18153v1 -
102 07-24 Robust Non-adaptive Group Testing under Errors in Group Membership Specifications Robuste, nicht adaptive Gruppenprüfung unter Fehlern in den Gruppenmitgliedschaftsspezifikationen 根据集团成员类别规格错误进行强力非适应性小组测试 2409.05345v2 -
103 07-24 Neuromorphic Computing for Embodied Intelligence in Autonomous Systems: Current Trends, Challenges, and Future Directions Neuromorphes Computing für körpereigene Intelligenz in autonomen Systemen: Aktuelle Trends, Herausforderungen und Zukunftsrichtungen 自治区内渗透情报的神经元化计算:当前趋势、挑战和未来方向 2507.18139v1 -
104 07-24 DAA*: Deep Angular A Star for Image-based Path Planning DAA*: Deep Angular Ein Stern für bildbasierte Pfadplanung DAA*:基于图像的路径规划深角A星 2507.09305v3 -
105 07-24 TOC-UCO: a comprehensive repository of tabular ordinal classification datasets TOC-UCO: ein umfassendes Repository von tabellarischen Klassifikationsdatensätzen TOC-UCO:表格格式分类数据集综合储存库 2507.17348v2 -
106 07-24 Maximizing Prefix-Confidence at Test-Time Efficiently Improves Mathematical Reasoning Maximierung von Prefix-Konfidenz bei Test-Time verbessert mathematische Reasoning effizient 使试验时间有效改进数学理由的预设信息最大化 2507.18122v1 -
107 07-24 A Survey of Deep Learning for Geometry Problem Solving Eine Umfrage über Deep Learning zur Lösung von Geometrieproblemen 解决几何问题深层学习调查 2507.11936v3 -
108 07-24 VCDiag: Classifying Erroneous Waveforms for Failure Triage Acceleration VCDiag: Klassifizierende Erroneous-Wellenformen für Ausfall-Triage-Beschleunigung VCDiag: 失灵千兆字节加速不规则波形分类 2506.03590v3 -
109 07-24 Generalizing Adam to Manifolds for Efficiently Training Transformers Verallgemeinern von Adam zu Manifolds für effizientes Training Transformers 将亚当推广为高效率培训变换器的处理器 2305.16901v4 -
110 07-24 A Two-armed Bandit Framework for A/B Testing Ein zweiarmiges Bandit-Framework für A/B-Tests A/B测试的双臂老虎机框架 2507.18118v1 -
111 07-24 The Impact of Pseudo-Science in Financial Loans Risk Prediction Die Auswirkungen von Pseudo-Science auf die Risikovorhersage von Finanzkrediten 假科学对金融贷款风险预测的影响 2507.16182v2 -
112 07-24 On the Approximation of Stationary Processes using the ARMA Model Zur Annäherung von stationären Prozessen mit dem ARMA-Modell 使用ARMA模型的固定工艺接近情况 2408.10610v3 -
113 07-24 Agentic AI framework for End-to-End Medical Data Inference Agentische KI-Framework für Ende-zu-Ende medizinische Datenableitung 用于端到端医疗数据推断的智能体AI框架 2507.18115v1 -
114 07-24 Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control I: Penalty Approach Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control I: Strafansatz 群分反馈线性水量最佳最佳控制一:惩罚办法的优化框架 2507.18114v1 -
115 07-24 Policy Disruption in Reinforcement Learning: Adversarial Attack with Large Language Models and Critical State Identification Politische Disruption bei der Stärkung des Lernens: Umgekehrter Angriff mit großen Sprachmodellen und kritischer Zustandsidentifikation 强化学习方面的政策混乱:以大语言模式和关键状态识别进行反向攻击 2507.18113v1 -
116 07-24 Percentile-Based Deep Reinforcement Learning and Reward Based Personalization For Delay Aware RAN Slicing in O-RAN Prozentual basierte Deep-Verstärkung-Lernen und Belohnung basierte Personalisierung für Delay Aware RAN Slicing in O-RAN 在O-RAN为延迟了解RAN切片而进行百分百分率深强化学习和奖励性个人化 2507.18111v1 -
117 07-24 A New Pair of GloVes Ein neues Paar GloVes 新的地球之对 2507.18103v1 -
118 07-24 Comparison of Segmentation Methods in Remote Sensing for Land Use Land Cover Vergleich der Segmentierungsmethoden bei der Fernerkundung für die Bodenbedeckung 土地利用、土地利用的变化和林业遥感遥感 分路方法比较 2507.18099v1 -
119 07-24 Learning from Hard Labels with Additional Supervision on Non-Hard-Labeled Classes Lernen von Hardlabels mit zusätzlicher Überwachung auf nicht-Hard-Label-Klassen 从硬标签中学习,并对非硬标签类别施加额外监督 2507.18098v1 -
120 07-24 Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation Lang-Short-Distanz Graph Neural Networks und verbessertes Curriculum-Lernen für Emotionserkennung im Gespräch 长短距离远距神经神经网络和改进课程学习,以在对话中认识情感 2507.15205v2 -
121 07-24 LLM Web Dynamics: Tracing Model Collapse in a Network of LLMs LLM Web Dynamics: Aufspüren eines Modellkollapses in einem Netzwerk von LLMs LLM 网络动态:追踪在LLM网络中的模型崩溃情况 2506.15690v3 -
122 07-24 A Principled Approach for Data Bias Mitigation Ein prinzipieller Ansatz für Daten-Bias-Minderung 减轻数据偏见的原则办法 2405.12312v4 -
123 07-24 Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections Compliant Residual DAgger: Verbesserung der Real-World Kontakt-Rich-Manipulation mit menschlichen Korrekturen Compliant Residual DAgger:利用人类纠正改进现实世界中接触丰富的操作 2506.16685v2 -
124 07-24 Fine-Tuned Language Models Generate Stable Inorganic Materials as Text Feinangepasste Sprachmodelle erzeugen stabile anorganische Materialien als Text 微调语言模型以文本形式生成稳定的无机材料 2402.04379v2 -
125 07-24 Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning Komprimierte und verteilte am wenigsten quadratische Regression: Konvergenzraten mit Anwendungen für Federated Learning 压缩和分布的最小平方回归:与应用到联邦学习的趋同率 2308.01358v2 -
126 07-24 History-Guided Video Diffusion Geschichte-geführte Video-Diffusion 历史引导视频传播 2502.06764v2 -
127 07-24 Squeeze10-LLM: Squeezing LLMs’ Weights by 10 Times via a Staged Mixed-Precision Quantization Method Squeeze10-LLM: Gewichte der LLMs um 10 Mal durch eine stufenweise gemischte Präzisionsquantifizierung Squeeze10-LLM:通过分阶段混合精度量化方法将LLM权重压缩10倍 2507.18073v1 -
128 07-24 C-AAE: Compressively Anonymizing Autoencoders for Privacy-Preserving Activity Recognition in Healthcare Sensor Streams C-AAE: Komprimierend anonymisierende Autoencoder für Datenschutz-Erhaltung Aktivitätserkennung in Healthcare Sensor Streams C-AAE: 压缩匿名自动编码器,以便在保健感应器流中确认隐私保护活动 2507.18072v1 -
129 07-24 Group Sequence Policy Optimization Optimierung der Gruppensequenzpolitik 组序列政策优化 2507.18071v1 -
130 07-24 BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference BlockDialect: Blockweise feinkörnige Mischformat-Quantisierung für energieeffiziente LLM-Inferenz BlockDialect: 面向节能LLM推理的分块细粒度混合格式量化 2501.01144v5 -
131 07-24 Multiscale Neural PDE Surrogates for Prediction and Downscaling: Application to Ocean Currents Multiscale Neural PDE Surrogats für Vorhersage und Downscaling: Anwendung auf Meeresströmungen 预测和缩小预测和缩小尺度的多尺度多神经PDE代号:对洋流的应用 2507.18067v1 -
132 07-24 Fixing the Pitfalls of Probabilistic Time-Series Forecasting Evaluation by Kernel Quadrature Fixierung der Pitfalls der probabilistischen Zeitreihen-Prognosebewertung durch Kernel-Quadratur 由内核二次曲线确定概率时间- 系列预测评价的空隙 2503.06079v2 -
133 07-24 Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias Causally Testing Gender Bias in LLMs: Eine Fallstudie über berufsbezogene Bias LLMs中性别偏见的因果检验:职业偏见案例研究 2212.10678v4 -
134 07-24 A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models Ein Multi-Faceted-Evaluierungsrahmen für die Bewertung synthetischer Daten, erzeugt durch große Sprachmodelle 评估由大语言模型生成的合成数据多面评价框架 2404.14445v2 -
135 07-24 Privacy-Preserving Synthetic Review Generation with Diverse Writing Styles Using LLMs Privacy-Preserving Synthetic Review Generation mit unterschiedlichen Schreibstilen mit LLMs 使用LLMs以多种写作风格生成的隐私-保护合成审查 2507.18055v1 -
136 07-24 Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems Unisoma: Ein Unified Transformer-basierter Solver für Multi-Solid-Systeme Unisoma:多层系统统一变压器解决方案 2506.06021v2 -
137 07-24 ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks ViGText: Deepfake-Bilderkennung mit Vision-Language-Modellerklärungen und Graph-Neural-Netzwerken ViGText: 用视觉语言模型解释和图形神经网络进行深假图像探测 2507.18031v1 -
138 07-24 AI Workflow, External Validation, and Development in Eye Disease Diagnosis KI-Workflow, externe Validierung und Entwicklung in der Augenerkrankungen-Diagnose AI 工作流程、外部验证和眼病诊断的发展 2409.15087v2 -
139 07-24 Does visualization help AI understand data? Hilft die Visualisierung KI, Daten zu verstehen? 可视化能帮助AI理解数据吗? 2507.18022v1 -
140 07-24 Zeroth-order log-concave sampling logkonkav-Probenahme der Nullten Ordnung 零级对数集中取样 2507.18021v1 -
141 07-24 Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models Prädiktive Skalierungsgesetze für ein effizientes GRPO-Training großer Reasoning-Modelle 大型推理模型高效GRPO训练的预测性扩展定律 2507.18014v1 -
142 07-24 Active Learning For Repairable Hardware Systems With Partial Coverage Aktives Lernen für reparable Hardware-Systeme mit teilweiser Abdeckung 为部分覆盖的可修理硬件系统积极学习 2503.16315v3 -
143 07-24 Analyzing Islamophobic Discourse Using Semi-Coded Terms and LLMs Analyse des Islamophoben Diskurses mit semi-kodierten Ausdrücken und LLMs 使用半编码术语和LLMs分析仇视伊斯兰者的情况 2503.18273v2 -
144 07-24 Fine-Grained Uncertainty Quantification via Collisions Feinkörnige Unsicherheit Quantifizierung über Kollisionen 通过碰撞进行精细的不确定性定量 2411.12127v4 -
145 07-23 (3) Machine Unlearning of Traffic State Estimation and Prediction Maschinelles Entlernen von Verkehrsstaatschätzungen und Vorhersagen 取消学习交通国估计和预测 2507.17984v1 -
146 07-23 Pulse-PPG: An Open-Source Field-Trained PPG Foundation Model for Wearable Applications Across Lab and Field Settings Pulse-PPG: Ein Open-Source Feld-Trained PPG Foundation Modell für tragbare Anwendungen über Labor- und Feldeinstellungen hinweg Pulse-PPG:开放源码实地培训的PPG基金会模型,用于跨实验室和实地环境的可穿戴应用 2502.01108v2 -
147 07-23 Machine Learning Workflow for Analysis of High-Dimensional Order Parameter Space: A Case Study of Polymer Crystallization from Molecular Dynamics Simulations Machine Learning Workflow zur Analyse von hochdimensionalen Ordnungsparametern Raum: Eine Fallstudie zur Polymerkristallisation aus molekularen Dynamiksimulationen 分析高多元秩序参数空间的机器学习工作流:分子动态模拟的聚合体晶化案例研究 2507.17980v1 -
148 07-23 SIFOTL: A Principled, Statistically-Informed Fidelity-Optimization Method for Tabular Learning SIFOTL: Eine grundsätzliche, statistisch informierte Methode der Fidelity-Optimierung für tabellarisches Lernen SIFOTL: 表格学习的有原则的、统计化的、统计化的助产性优化方法 2507.17979v1 -
149 07-23 Improving the Computational Efficiency and Explainability of GeoAggregator Verbesserung der Computational Efficiency und Erklärbarkeit von GeoAggregator 提高地理聚合体的计算效率和可解释性 2507.17977v1 -
150 07-23 Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA Zero-Shot Dynamic Concept Personalisierung mit Grid-Based LoRA 以网基LORA为基网格的零热动态个人化概念 2507.17963v1 -
151 07-23 VIBE: Video-Input Brain Encoder for fMRI Response Modeling VIBE: Video-Input Gehirnencoder für fMRI Response Modeling VIBE: 用于FMRI反应建模的视频投入大脑编码器 2507.17958v1 -
152 07-23 Clo-HDnn: A 4.66 TFLOPS/W and 3.78 TOPS/W Continual On-Device Learning Accelerator with Energy-efficient Hyperdimensional Computing via Progressive Search Clo-HDnn: A 4.66 TFLOPS/W und 3.78 TOPS/W Continual On-Device Learning Accelerator mit energieeffizientem Hyperdimensional Computing via Progressive Search Clo-HDnn: 一种4.66 TFLOPS/W和3.78 TOPS/W 通过渐进搜索使用节能超多维电子计算器的不间断远程学习加速器 2507.17953v1 -
153 07-23 Analyzing Fairness of Computer Vision and Natural Language Processing Models Analyse der Fairness von Computer Vision und natürlichen Sprachverarbeitungsmodellen 分析计算机视觉和自然语言处理模式的公平性 2412.09900v3 -
154 07-23 Learning Safe Strategies for Value Maximizing Buyers in Uniform Price Auctions Sichere Strategien für die Wertmaximierung von Käufern in einheitlichen Preisauktionen lernen 统一价格拍卖中价值最大化买方学习安全战略 2406.03674v3 -
155 07-23 Quantum Machine Learning Playground Quantum Machine Learning Spielplatz 量子机器学习游戏场 2507.17931v1 -
156 07-23 Task Priors: Enhancing Model Evaluation by Considering the Entire Space of Downstream Tasks Task Priors: Verbesserung der Modellbewertung unter Berücksichtigung des gesamten Raumes von Downstream-Aufgaben 任务前期:考虑到下游任务的全部空间,加强示范评价 2507.09871v2 -
157 07-23 UrbanPulse: A Cross-City Deep Learning Framework for Ultra-Fine-Grained Population Transfer Prediction UrbanPulse: Ein stadtübergreifendes Deep-Learning-Framework für ultra-reine Bevölkerungstransfer-Vorhersage 城市脉动:关于超精子人口转移预测的跨城市深入学习框架 2507.17924v1 -
158 07-23 From Seed to Harvest: Augmenting Human Creativity with AI for Red-teaming Text-to-Image Models Vom Samen zur Ernte: Augmenting Human Creativity mit KI für Red-Teaming-Text-to-Image-Modelle 从种子到收割:通过国际促进红-红-电制文本到图像模型学会增强人类的创造力 2507.17922v1 -
159 07-23 Sliding Window Informative Canonical Correlation Analysis Sliding Window Informative Canonical Correlation Analysis Sliding 窗口信息化 Canonical 关联分析 2507.17921v1 -
160 07-23 Tuning Sequential Monte Carlo Samplers via Greedy Incremental Divergence Minimization Tuning Sequentielle Monte Carlo Sampler über Greedy Incremental Divergence Minimierung 通过贪婪递增差异最小化调整序列蒙特卡罗采样器 2503.15704v4 -
161 07-23 SETOL: A Semi-Empirical Theory of (Deep) Learning SETOL: Eine semi-empirische Theorie des (Tiefen) Lernens SETOL:半经验学理论(深)学习 2507.17912v1 -
162 07-23 EEG Foundation Models: A Critical Review of Current Progress and Future Directions EEG-Stiftungsmodelle: Ein kritischer Überblick über aktuelle Fortschritte und zukünftige Richtungen EEG基金会模式:对当前进展和未来方向的重要审查 2507.11783v2 -
163 07-23 Deep learning-aided inverse design of porous metamaterials Tiefes Lernen-unterstütztes inverses Design poröser Metamaterialien 多孔元材料的深深学习辅助反向设计 2507.17907v1 -
164 07-23 Federated Learning for Large-Scale Cloud Robotic Manipulation: Opportunities and Challenges Föderiertes Lernen für großräumige Cloud-Robotermanipulation: Chancen und Herausforderungen 大型云层机器人操纵联合会学习:机遇与挑战 2507.17903v1 -
165 07-23 Multimodal Recurrent Ensembles for Predicting Brain Responses to Naturalistic Movies (Algonauts 2025) Multimodale Recurrent-Ensembles zur Vorhersage von Gehirnreaktionen auf naturalistische Filme (Algonauten 2025) 预测对自然电影的脑反应的多式经常性多年度联合会议(2025年8月20日) 2507.17897v1 -
166 07-23 Lower Bounds for Public-Private Learning under Distribution Shift Untere Grenzen für öffentlich-privates Lernen unter Verteilungsverschiebung 分配轮班下公-私学习的下下档次 2507.17895v1 -
167 07-23 Action-List Reinforcement Learning Syndrome Decoding for Binary Linear Block Codes Action-Liste Verstärkungs-Lernsyndrom-Dekodierung für Binary Linear Block Codes 二元线性线性块块代码的标记 2507.17893v1 -
168 07-23 DeepCrossAttention: Supercharging Transformer Residual Connections DeepCrossAttention: Supercharging Transformer Residual Verbindungen 深十字感应:高压变压器残余连接 2502.06785v2 -
169 07-23 Fourier Neural Operators for Non-Markovian Processes:Approximation Theorems and Experiments Fourier-Neural-Betreiber für nicht markovianische Prozesse:Approximationstheorien und Experimente 非 Markovian 进程四神经操作器: 近似理论和实验 2507.17887v1 -
170 07-23 PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding PerceptionLM: Open-Access-Daten und Modelle für ein detailliertes visuelles Verständnis 感知LM:开放存取数据和详细视觉理解模型 2504.13180v3 -
171 07-23 A Supervised Machine Learning Framework for Multipactor Breakdown Prediction in High-Power Radio Frequency Devices and Accelerator Components: A Case Study in Planar Geometry Ein überwachtes Machine Learning Framework für Multipactor-Ausfallvorhersage in hochleistungsfähigen Funkfrequenzgeräten und Accelerator-Komponenten: Eine Fallstudie in der planaren Geometrie 高功率无线电频率装置和加速器部件多光速分解预测监督的机器学习框架:平板几何案例研究 2507.17881v1 -
172 07-23 Look the Other Way: Designing ‘Positive’ Molecules with Negative Data via Task Arithmetic Sehen Sie den anderen Weg: Entwerfen von ‘Positiven’ Molekülen mit negativen Daten über Task-Arithmetik 查看其他方式 : 通过任务亚学用负数据设计“ 功能性” 分子 2507.17876v1 -
173 07-23 Integrating Feature Selection and Machine Learning for Nitrogen Assessment in Grapevine Leaves using In-Field Hyperspectral Imaging Integration von Feature Selection und Machine Learning für die Stickstoffabschätzung in Grapevine Leaves mit Hilfe von Hyperspektralbildgebung im Feld 利用实地超光谱成像法将地物选择和机器学习综合结合,用于在格拉佩维尼叶中进行氮评估 2507.17869v1 -
174 07-23 Learning Individual Reproductive Behavior from Aggregate Fertility Rates via Neural Posterior Estimation Individuelles reproduktives Verhalten von Aggregat Fertilitätsraten über neurale hintere Schätzung lernen 学习个人生殖行为 2506.22607v2 -
175 07-23 Streaming, Fast and Slow: Cognitive Load-Aware Streaming for Efficient LLM Serving Streaming, schnell und langsam: Kognitives Load-Aware Streaming für effizientes LLM Serving 串流、快速和慢速:高效LLM服务认知式负载-软件流 2504.17999v2 -
176 07-23 PALADIN : Robust Neural Fingerprinting for Text-to-Image Diffusion Models PALADIN : Robustes neurales Fingerprinting für Diffusionsmodelle von Text zu Bild PALADIN: 文本到图像传播模型的强力神经指纹打印 2506.03170v2 -
177 07-23 Towards Facilitated Fairness Assessment of AI-based Skin Lesion Classifiers Through GenAI-based Image Synthesis Auf dem Weg zu einer erleichterten Fairnessbewertung von KI-basierten Haut-Lesions-Klassifikatoren durch GenAI-basierte Bildsynthese 通过GenAI基于GenAI的图像合成,促进基于AI的皮肤皮质分类分类的公平评估 2507.17860v1 -
178 07-23 Choosing Public Datasets for Private Machine Learning via Gradient Subspace Distance Auswahl öffentlicher Datensätze für privates maschinelles Lernen über Gradient Subspace Distance 通过梯度子空间距离为私人机器学习选择公共数据集 2303.01256v2 -
179 07-23 On the Energy Distribution of the Galactic Center Excess’ Sources Zur Energieverteilung der Quellen des Galaktischen Zentrums 银河中心能源分配问题 2507.17804v1 -
180 07-23 Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility Große Lernraten gleichzeitig Robustheit zu sauberen Korrelationen und Kompressibilität erreichen 高学习率同时实现对净腐蚀和抑制的强力 2507.17748v1 -
181 07-23 Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains Rubriken als Belohnungen: Verstärktes Lernen jenseits überprüfbarer Domänen ” 奖励 “ :超越可核实域域的强化学习 2507.17746v1 -
182 07-23 SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars SpecCLIP: Richten und Übersetzen spektroskopischer Messungen für Sterne SpecCLIP: 恒星光谱测量的对齐和转换 2507.01939v2 -
183 07-23 Flow Matching Meets Biology and Life Science: A Survey Flow Matching trifft auf Biologie und Life Science: Eine Umfrage 流动匹配满足生物学和生命科学:调查 2507.17731v1 -
184 07-23 Deep Generative Learning of Magnetic Frustration in Artificial Spin Ice from Magnetic Force Microscopy Images Tiefes generatives Lernen der magnetischen Frustration im künstlichen Spin-Eis von magnetischen Kraftmikroskopie-Bildern 从磁力显微镜像图像中深入学习人造脊柱冰中的磁破碎 2507.17726v1 -
185 07-23 On the Interaction of Compressibility and Adversarial Robustness Über die Wechselwirkung von Kompressibilität und adversarialer Robustheit 压缩和反压力相互作用问题 2507.17725v1 -
186 07-23 Towards Generalist Robot Learning from Internet Video: A Survey Auf dem Weg zum generalistischen Roboter Lernen aus dem Internet Video: Eine Umfrage 从互联网视频学习:调查 2404.19664v5 -
187 07-23 Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning Flow-based Single-Step-Abschluss für effizientes und expressives politisches Lernen 以流动为基础的单一步骤完成高效和明确政策学习 2506.21427v2 -
188 07-23 Challenges learning from imbalanced data using tree-based models: Prevalence estimates systematically depend on hyperparameters and can be upwardly biased Herausforderungen beim Lernen aus unausgeglichenen Daten mit Baum-basierten Modellen: Prävalenzschätzungen hängen systematisch von Hyperparametern ab und können nach oben verzerrt sein 利用树基模型从不平衡数据中吸取挑战:流行率估计数系统依赖超参数,可能向上偏偏 2412.16209v3 -
189 07-23 Sequential Bayesian Design for Efficient Surrogate Construction in the Inversion of Darcy Flows Sequential Bayesian Design für effiziente Surrogate Konstruktion in der Inversion von Darcy Flows 有效代用品建造以扭转达西流动的按顺序排列的贝耶斯设计 2507.17713v1 -
190 07-23 The Impact of Feature Scaling In Machine Learning: Effects on Regression and Classification Tasks Die Auswirkungen von Feature Scaling im maschinellen Lernen: Auswirkungen auf Regressions- und Klassifizierungsaufgaben 机械学习中的特质增强效果:对倒退和分类任务的影响 2506.08274v3 -
191 07-23 Diffusion Factor Models: Generating High-Dimensional Returns with Factor Structure Diffusionsfaktormodelle: Erzeugen von hochdimensionalen Rückgaben mit Faktorstruktur 传播因数模型:产生具有因数结构的高差异返回 2504.06566v4 -
192 07-23 HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging HydraOpt: Navigieren des Effizienz-Leistungs-Austauschs von Adapter-Zusammenschlüssen HydraOpt: 管理适应器合并的效率-绩效权衡 2507.17706v1 -
193 07-23 Balans: Multi-Armed Bandits-based Adaptive Large Neighborhood Search for Mixed-Integer Programming Problem Balans: Multi-Armed Bandits-basierte adaptive Großnachbarschaft Suche nach gemischt-integer-Programmierungsproblem Balans:多武装强盗基于适应性的大型邻里搜索混合内插方案拟订问题 2412.14382v3 -
194 07-23 A Mathematical Theory of Discursive Networks Eine mathematische Theorie diskursiver Netzwerke 讨论网络的数学理论 2507.06565v5 -
195 07-23 Joint Asymmetric Loss for Learning with Noisy Labels Gemeinsamer asymmetrischer Lernverlust mit geräuscharmen Etiketten 与Noisy标签的 联合非对称学习损失 2507.17692v1 -
196 07-23 CASCADE: LLM-Powered JavaScript Deobfuscator at Google CASCADE: LLM-Powered JavaScript Deobfuscator bei Google CASCADE: 谷歌的LLM Powered JavaScript Deobfuscator 2507.17691v1 -
197 07-23 In-Trajectory Inverse Reinforcement Learning: Learn Incrementally Before An Ongoing Trajectory Terminates In-Trajektorie Inverse Verstärkung Lernen: Inkrementell lernen, bevor eine laufende Trajektorie endet 轨迹反反强化学习:在持续轨迹终止之前逐步学习 2410.15612v7 -
198 07-23 Mindfulness Meditation and Respiration: Accelerometer-Based Respiration Rate and Mindfulness Progress Estimation to Enhance App Engagement and Mindfulness Skills Achtsamkeitsmeditation und Atmung: Beschleunigungsmesser-basierte Atmungsrate und Achtsamkeitsfortschritt Schätzung zur Verbesserung von App-Verlobung und Achtsamkeits-Fähigkeiten 冥想和呼吸:以加速计为基础的呼吸率和记忆进展估计,以加强应用参与和记忆技能 2507.17688v1 -
199 07-23 Towards Effective Open-set Graph Class-incremental Learning Auf dem Weg zu einem effektiven, offenen, klasseninternen Lernen in der Graphen-Klasse 走向有效的开放设置图表升入级学习 2507.17687v1 -
200 07-23 Debiased maximum-likelihood estimators for hazard ratios under machine-learning adjustment Debiased Maximum-Likelihood-Schätzer für Gefahrenverhältnisse unter Maschinen-Learning-Anpassung 机学习调整下危险比率的偏差最大类似性最高估计估计值 2507.17686v1 -
201 07-23 LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning LoX: Low-Rank-Extrapolation stärkt LLM-Sicherheit gegen Feinabstimmung LoX:低Rank外推法强力推力LLM 安全防止微调 2506.15606v2 -
202 07-23 Generalized Dual Discriminator GANs Generalisierte Dual Discriminator GANs GANs 通用双辨识器 2507.17684v1 -
203 07-23 RAPID-Net: Accurate Pocket Identification for Binding-Site-Agnostic Docking RAPID-Net: Genaue Pocket-Identifikation für das Binden-Site-Agnostic Docking RAPID-Net: 装订性锡石-不可知文件的精确口袋识别 2502.02371v2 -
204 07-23 On the Lipschitz Constant of Deep Networks and Double Descent Auf der Lipschitz-Konstante von Deep Networks und Double Descent 利普西茨深网络和双人后裔中心 2301.12309v5 -
205 07-23 How Should We Meta-Learn Reinforcement Learning Algorithms? Wie sollten wir Meta-Lernen Stärkung lernen Algorithmen? 我们怎样才能提高学习的比喻呢? 2507.17668v1 -
206 07-23 Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography Mammo-Mamba: Hybride State-Space- und Transformer-Architektur mit sequentieller Mischung von Experten für Multi-View-Mammographie Mammo-Mamba:国家空间和变形综合结构及多视力造影学专家顺序混合结构 2507.17662v1 -
207 07-23 XStacking: Explanation-Guided Stacked Ensemble Learning XStacking: Erklärungsgeführtes Gestapeltes Ensemble Lernen XStacking: 解释引导堆叠组合学习 2507.17650v1 -
208 07-23 A Concept-based approach to Voice Disorder Detection Ein konzeptbasierter Ansatz zur Erkennung von Sprachstörungen 一种基于概念的语音疾病检测方法 2507.17799v1 -
209 07-23 WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training WSM: Decay-Free Learning Rate Scheduling via Checkpoint Merging für LLM Pre-Training WSM:通过LLM培训前的检查站合并,制定无下降的学习率表 2507.17634v1 -
210 07-23 Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods 存储和差异缩减立方立方体牛顿方法统一一致理论 2302.11962v5 -
211 07-23 Toward a Lightweight and Robust Design for Caching Hin zu einem leichten und robusten Design für Caching Caching 轻量度和强力设计 2507.16242v2 -
212 07-23 Machine Learning Classification and Portfolio Allocation: with Implications from Machine Uncertainty Machine Learning Klassifizierung und Portfoliozuteilung: mit Implikationen aus der Maschinenunsicherheit 机器学习分类和组合分配:机器不确定性的影响 2108.02283v2 -
213 07-23 Vision Transformer attention alignment with human visual perception in aesthetic object evaluation Vision Transformer Aufmerksamkeitsausrichtung mit menschlicher visueller Wahrnehmung in ästhetischer Objektauswertung 在美学物体评价中,视觉转变器关注与人类视觉认知的一致性 2507.17616v1 -
214 07-23 Time Deep Gradient Flow Method for pricing American options Time Deep Gradient Flow Methode für die Preisgestaltung amerikanischen Optionen 美国选项定价的 “ 深梯度 “ 流程方法 2507.17606v1 -
215 07-23 Trusted Multi-view Learning under Noisy Supervision Vertrauenswürdiges Multi-View-Lernen unter Noisy Supervision 在噪音监督下的可信赖的多观点学习 2404.11944v3 -
216 07-23 Citation Recommendation using Deep Canonical Correlation Analysis Zitationsempfehlung mit tiefer kanonischen Korrelationsanalyse 使用深锥体关联分析的引用建议 2507.17603v1 -
217 07-23 HyDRA: A Hybrid-Driven Reasoning Architecture for Verifiable Knowledge Graphs HyDRA: Eine hybrid-getriebene Grundarchitektur für überprüfbare Wissensgraphen HYDRA:可核实知识图的混合驱动理由结构 2507.15917v2 -
218 07-23 Wasserstein GAN-Based Precipitation Downscaling with Optimal Transport for Enhancing Perceptual Realism Wasserstein GAN-based Niederschlag Downscaling mit optimalem Transport zur Verbesserung des Wahrnehmungsrealismus 瓦森斯坦GAN的降水量降幅与最佳运输的降幅,以加强观念现实主义 2507.17798v1 -
219 07-23 First, Learn What You Don’t Know: Active Information Gathering for Driving at the Limits of Handling Zuerst erfahren Sie, was Sie nicht wissen: Aktive Informationen sammeln für das Fahren an den Grenzen der Handhabung 首先,学习你不知道的东西:为在处理的极限驾驶而积极收集信息 2411.00107v2 -
220 07-23 Constructing Optimal Noise Channels for Enhanced Robustness in Quantum Machine Learning Konstruieren von optimalen Lärmkanälen für verbesserte Robustheit im Quantum Machine Learning 构建量子机器学习中增强强力的最佳噪音通道 2404.16417v2 -
221 07-23 GenSelect: A Generative Approach to Best-of-N GenSelect: Ein generativer Ansatz zum Best-of-N GenSelect: 产生最佳N型的方法 2507.17797v1 -
222 07-23 SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics SToFM: ein Multi-Skala-Stiftungsmodell für räumliche Transkriptomik SToFM:空间转换学多规模基础模型 2507.11588v2 -
223 07-23 Enhancing Quantum Federated Learning with Fisher Information-Based Optimization Verbesserung des Quantum-Federated-Learnings mit Fisher Information-based Optimization 加强以渔业信息为基础的优化的量子联邦学习 2507.17580v1 -
224 07-23 Boosting Ray Search Procedure of Hard-label Attacks with Transfer-based Priors Förderung der Ray-Suche von Hard-Label-Angriffen mit transferbasierten Prioren 采用基于转移的前期程序,推动对硬标签袭击的雷光搜索程序 2507.17577v1 -
225 07-23 Federated Behavioural Planes: Explaining the Evolution of Client Behaviour in Federated Learning Federated Behavioural Planes: Erklärung der Evolution des Kundenverhaltens im Federated Learning 联邦计划:解释联邦学习中客户行为演变的原因 2405.15632v3 -
226 07-23 A Physically Driven Long Short Term Memory Model for Estimating Snow Water Equivalent over the Continental United States Ein physikalisch angetriebenes Langzeit-Speichermodell zur Schätzung von Schneewasser, das über den Kontinent Vereinigte Staaten äquivalent ist 估算美国大陆等效雪水的物理驱动长长短期记忆模型 2504.20129v2 -
227 07-23 Scalable DC Optimization via Adaptive Frank-Wolfe Algorithms Skalierbare DC-Optimierung über adaptive Frank-Wolfe Algorithmen 通过适应性 Frank-Wolfe Algorithms 进行可缩放的DC优化 2507.17545v1 -
228 07-23 Optimal differentially private kernel learning with random projection Optimales differenzielles privates Kernel-Lernen mit Zufallsprojektion 以随机预测的方式进行最佳、有差别的私人内核学习 2507.17544v1 -
229 07-23 Clustering-based hard negative sampling for supervised contrastive speaker verification Clustering-basierte harte Negativprobenahme für überwachte kontrastive Lautsprecherprüfung 分组制硬底抽样,用于有监督的对比式发言者核查 2507.17540v1 -
230 07-23 CoCAI: Copula-based Conformal Anomaly Identification for Multivariate Time-Series CoCAI: Copula-basierte konforme Anomalien-Identifikation für multivariate Zeitreihen COCAI:多变时间序列的常规异常识别 2507.17796v1 -
231 07-23 Federated Majorize-Minimization: Beyond Parameter Aggregation Föderierte Majorize-Minimierung: Jenseits der Parameteraggregation 联邦多数-私有化:超越参数聚合 2507.17534v1 -
232 07-23 HiFi-Stream: Streaming Speech Enhancement with Generative Adversarial Networks HiFi-Stream: Streaming-Sprachverbesserung mit generativen Adversarial-Netzwerken HiFi-Stream:利用创性反对性网络加强语音交流 2503.17141v2 -
233 07-23 Channel Estimation for RIS-Assisted mmWave Systems via Diffusion Models Kanalschätzung für RIS-gestützte mmWave-Systeme über Diffusionsmodelle 通过扩散模型对RIS-辅助毫米防波系统的通道估计 2506.07770v2 -
234 07-23 Sampling-enabled scalable manifold learning unveils discriminative cluster structure of high-dimensional data Samplingfähiges skalierbares, vielfältiges Lernen enthüllt diskriminative Clusterstruktur hochdimensionaler Daten 抽样式可扩缩、可扩缩的多元学习揭开高维数据的歧视性集群结构 2401.01100v3 -
235 07-23 Generalized Advantage Estimation for Distributional Policy Gradients Generalisierte Vorteil Schätzung für Verteilungspolitik Gradienten 分配政策梯度一般有利因素估计 2507.17530v1 -
236 07-23 Generalized Low-Rank Matrix Contextual Bandits with Graph Information Generalisierte Low-Rank Matrix Kontextuelle Banditen mit Graph Information 带有图表信息的通用低射速矩阵背景土匪 2507.17528v1 -
237 07-23 Integrating Physics-Based and Data-Driven Approaches for Probabilistic Building Energy Modeling Integration physikbasierter und datengestützter Ansätze zur probabilistischen Gebäudeenergiemodellierung 将基于物理和数据驱动的综合办法纳入概率建筑能源建模 2507.17526v1 -
238 07-23 LSDM: LLM-Enhanced Spatio-temporal Diffusion Model for Service-Level Mobile Traffic Prediction LSDM: LLM-gesteigertes Spatio-temporales Diffusionsmodell für Service-Level-Mobilverkehrsvorhersage LSDM:LLM-增强的用于服务级移动交通预测的时空传播模型 2507.17795v1 -
239 07-23 Data-Driven Exploration for a Class of Continuous-Time Indefinite Linear–Quadratic Reinforcement Learning Problems Daten-getriebene Exploration für eine Klasse von kontinuierlichen-Zeit-Unbestimmte Linear–Quadratische Verstärkung Lernprobleme 连续-不定期线性-宽压强化学习问题分类数据探索 2507.00358v2 -
240 07-23 HOTA: Hamiltonian framework for Optimal Transport Advection HOTA: Hamiltonsche Rahmenbedingungen für eine optimale Verkehrsanbindung 汉密尔顿最佳交通评估框架 2507.17513v1 -
241 07-23 Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning Kann eine Domain anderen helfen? Eine Data-Centric Studie über Multi-Domain-Reasoning durch Verstärkungslernen 一个域能帮助他人吗? 关于通过强化学习提供多领域理由的数据中心研究。 2507.17512v1 -
242 07-23 Fake or Real: The Impostor Hunt in Texts for Space Operations Fake or Real: Die Impostorjagd in Texten für Weltraumoperationen 虚假或真实:空间业务文字中的伪造者猎杀 2507.13508v3 -
243 07-23 Graph Neural Network Approach to Predicting Magnetization in Quasi-One-Dimensional Ising Systems Graphischer Ansatz des neuralen Netzwerks zur Vorhersage der Magnetisierung in Quasi-One-Dimensional Ising Systemen Quasi-单一二元化离子系统中预测磁化的神经网络方法 2507.17509v1 -
244 07-23 Joint Multi-Target Detection-Tracking in Cognitive Massive MIMO Radar via POMCP Gemeinsames Multi-Target-Erkennungs-Tracking im kognitiven Massiv MIMO Radar über POMCP 通过POMCP在认知性大规模弥集性海事组织雷达上联合进行多目标多目标探测-跟踪 2507.17506v1 -
245 07-23 DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD DNT: ein tief normalisierter Transformer, der von Momentum SGD trainiert werden kann DNT:一种可接受 “ 动力 “ SGD培训的 “ 高度正常化 “ 变异器 2507.17501v1 -
246 07-23 Fast post-process Bayesian inference with Variational Sparse Bayesian Quadrature Schnelle post-process Bayesische Schlussfolgerung mit Variational Sparse Bayesische Quadratur 贝叶斯推断法与变异的斯帕鲁贝伊斯二次夸度 2303.05263v4 -
247 07-23 To Trust or Not to Trust: On Calibration in ML-based Resource Allocation for Wireless Networks Vertrauen oder nicht vertrauen: Kalibrierung in ML-basierte Ressourcenzuteilung für drahtlose Netzwerke 信任或不信任:校准无线网络基于ML的资源分配 2507.17494v1 -
248 07-23 Infinite Video Understanding Unendliches Video-Verständnis 无限视频理解 2507.09068v2 -
249 07-23 Leveraging Diffusion Models for Parameterized Quantum Circuit Generation Nutzung von Diffusionsmodellen für die parameterisierte Quantum Circuit Generation 利用可计量量子电路生成的传播模型 2505.20863v3 -
250 07-23 The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training Die überraschende Vereinbarung zwischen Convex-Optimierungstheorie und Lern-Rate-Scheduling für große Modellausbildung 大型示范培训的 Convex优化理论和学习-学习-进度安排之间令人惊讶的协定 2501.18965v2 -
251 07-23 SRMambaV2: Biomimetic Attention for Sparse Point Cloud Upsampling in Autonomous Driving SRMambaV2: Biomimetische Aufmerksamkeit für Sparse Point Cloud Upsampling im autonomen Fahren SRMambaV2:在自主驾驶中抽取点云取样的生物模拟注意 2507.17479v1 -
252 07-23 BGM-HAN: A Hierarchical Attention Network for Accurate and Fair Decision Assessment on Semi-Structured Profiles BGM-HAN: Hierarchisches Aufmerksamkeitsnetzwerk für eine genaue und faire Entscheidungsbeurteilung von semistrukturierten Profilen BGM-HAN:关于半结构概况的准确和公平决定评估的等级关注网络 2507.17472v1 -
253 07-23 Demonstration of Efficient Predictive Surrogates for Large-scale Quantum Processors Demonstration effizienter vorausschauender Surrogate für große Quantenprozessoren 大型量子处理器高效预测加速器演示演示 2507.17470v1 -
254 07-23 MIRA: Medical Time Series Foundation Model for Real-World Health Data MIRA: Medical Time Series Foundation Modell für real-World Gesundheitsdaten 医疗时间系列基金会实际世界卫生数据模型 2506.07584v3 -
255 07-23 Mapping of Weed Management Methods in Orchards using Sentinel-2 and PlanetScope Data Kartierung von Unkraut-Management-Methoden in Obstgärten mit Sentinel-2 und PlanetScope-Daten 利用哨兵-2和行星域数据绘制果园杂草管理方法图 2504.19991v2 -
256 07-23 C3RL: Rethinking the Combination of Channel-independence and Channel-mixing from Representation Learning C3RL: Die Kombination von Kanal-Unabhängigkeit und Kanal-Mixing aus Repräsentationslernen neu denken C3RL:重新思考将频道独立和频道混合与代表性学习相结合的问题 2507.17454v1 -
257 07-23 Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees Effiziente Neuralnetzverifizierung durch Order Leading Exploration von Zweig-und-Bound-Bäumen 通过分树和环形树的有序主要勘探进行高效神经网络核查 2507.17453v1 -
258 07-23 JEDI: The Force of Jensen-Shannon Divergence in Disentangling Diffusion Models JEDI: Die Macht der Jensen-Shannon-Divergenz bei entwirrenden Diffusionsmodellen JEDI: 詹森-夏农分解扩散模型的分解力量 2505.19166v2 -
259 07-23 Persistent Patterns in Eye Movements: A Topological Approach to Emotion Recognition Persistente Muster in Augenbewegungen: Ein topologischer Ansatz zur Emotionserkennung 眼睛运动中的持久性模式:对情感认识的主观学方法 2507.17450v1 -
260 07-23 Doubly robust outlier resistant inference on causal treatment effect Doppelt robuste aussergewöhnliche resistente Inferenz auf kausalen Behandlungseffekt 关于因果处理效果的断断实有力的外部抗异物抗性推论 2507.17439v1 -
261 07-23 Gathering and Exploiting Higher-Order Information when Training Large Structured Models Sammeln und Ausnutzen von Informationen höherer Ordnung beim Training großer strukturierter Modelle 培训大型结构型模型时收集和利用高级命令信息 2312.03885v4 -
262 07-23 Ctx2TrajGen: Traffic Context-Aware Microscale Vehicle Trajectories using Generative Adversarial Imitation Learning Ctx2TrajGen: Traffic Context-Aware Microscale Fahrzeug-Trajektorien mit Generative Adversarial Imitation Learning Ctx2TrajGen: 利用产生反逆模拟学习的交通环境-软件微型车辆轨迹 2507.17418v1 -
263 07-23 A Comprehensive Evaluation on Quantization Techniques for Large Language Models Eine umfassende Bewertung von Quantisierungstechniken für große Sprachmodelle 对大语言模型量化技术的综合评价 2507.17417v1 -
264 07-23 How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Wie gut versteht GPT-4o Vision? Bewertung multimodaler Basismodelle auf Standard Computer Vision Aufgaben GPT-4o如何理解愿景?评估标准计算机愿景任务多模式基金会模式 2507.01955v2 -
265 07-23 Learning from Scratch: Structurally-masked Transformer for Next Generation Lib-free Simulation Von Scratch lernen: Strukturell maskierter Transformer für Lib-freie Simulation der nächsten Generation 从 Scratch 中学习: 下一代自由模拟的结构性巨型变形器 2507.17396v1 -
266 07-23 Causal Mechanism Estimation in Multi-Sensor Systems Across Multiple Domains Causal Mechanism Abschätzung in Multi-Sensor-Systemen über mehrere Domains 跨多域多传感器系统中因果机制估算 2507.17792v1 -
267 07-23 Helix 1.0: An Open-Source Framework for Reproducible and Interpretable Machine Learning on Tabular Scientific Data Helix 1.0: Ein Open-Source-Framework für reproduzierbares und interpretierbares maschinelles Lernen auf tabellarischen wissenschaftlichen Daten Helix 1.0:关于表格科学数据可复制和可解释的机器学习的开放源码框架 2507.17791v1 -
268 07-23 Confidence Calibration in Vision-Language-Action Models Vertrauenskalibrierung in Vision-Language-Action-Modelle 愿景-语言-行动模式中的信任调和 2507.17383v1 -
269 07-23 Continual Generalized Category Discovery: Learning and Forgetting from a Bayesian Perspective Continual Generalized Category Discovery: Lernen und Vergessen aus einer bayesischen Perspektive 发现:从巴伊西亚角度学习和遗忘 2507.17382v1 -
270 07-23 Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems Auf dem Weg zu einem effizienten generativen großen Sprachmodell: Eine Umfrage von Algorithmen zu Systemen 实现高效产生大型语文示范服务:从等级到系统的调查 2312.15234v2 -
271 07-23 ViRN: Variational Inference and Distribution Trilateration for Long-Tailed Continual Representation Learning ViRN: Variationale Schlussfolgerung und Verteilung Trilateration für langgestrecktes kontinuierliches Repräsentationslernen VIRN: 长期旷课的连续代表制学习的变异推理和分布推推力 2507.17368v1 -
272 07-23 Leveraging RAG-LLMs for Urban Mobility Simulation and Analysis Nutzung von RAG-LLMs für Simulation und Analyse der urbanen Mobilität 为城市流动模拟和分析利用RAG-LLMs进行城市流动模拟和分析 2507.10382v2 -
273 07-23 Artificial Intelligence for Green Hydrogen Yield Prediction and Site Suitability using SHAP-Based Composite Index: Focus on Oman Künstliche Intelligenz für Green Hydrogen Yield Prediction und Site Suitability mit SHAP-Based Composite Index: Fokus auf Oman 利用以SHAP为基础的综合综合指数,对绿色氢氢氢氢、年产量预测和场地适用性进行人工智能:阿曼 2507.14219v2 -
274 07-23 DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Verstärkungslernen DynaSearcher:通过多奖励强化学习增加搜索代理 2507.17365v1 -
275 07-23 Adaptive Repetition for Mitigating Position Bias in LLM-Based Ranking Adaptive Wiederholung für die Abmilderung von Positions-Bias im LLM-basierten Ranking 以LLM为基础的排名中减轻职位偏见的适应性重复 2507.17788v1 -
276 07-23 Monitoring digestate application on agricultural crops using Sentinel-2 Satellite imagery Überwachung der Gärung auf landwirtschaftlichen Nutzpflanzen mit Sentinel-2 Satellitenbildern 利用Sentinel-2卫星图像对农作物施用监测消化法 2504.19996v2 -
277 07-23 Hyperbolic Deep Learning for Foundation Models: A Survey Hyperbolisches Deep Learning für Gründungsmodelle: Eine Umfrage 用于基础模型的超双曲深修:调查 2507.17787v1 -
278 07-23 DeCo-SGD: Joint Optimization of Delay Staleness and Gradient Compression Ratio for Distributed SGD DeCo-SGD: Gemeinsame Optimierung der Verzögerungsstabilität und des Gradienten-Kompressions-Verhältnisses für verteilte SGD DeCo-SGD: 分配的SGD延迟滞缓和逐步压缩比率联合优化 2507.17346v1 -
279 07-23 Reinforcement Learning for Accelerated Aerodynamic Shape Optimisation Verstärktes Lernen zur beschleunigten aerodynamischen Formoptimierung 加速空气动力元件优化强化学习 2507.17786v1 -
280 07-23 Principled Multimodal Representation Learning Grundsatz des multimodalen Repräsentationslernens 注重原则的多模式代表制学习 2507.17343v1 -
281 07-23 Self-similarity Analysis in Deep Neural Networks Selbstähnlichkeitsanalyse in tiefen neuralen Netzwerken 深神经网络中的自我差异分析 2507.17785v1 -
282 07-23 Optimizing Privacy-Utility Trade-off in Decentralized Learning with Generalized Correlated Noise Optimierung der Privatsphäre-Utility-Trade-off im dezentralisierten Lernen mit generalisierter korrelierter Geräuschentwicklung 与普遍相关联的噪音优化分散化学习中的隐私-公用事业交易 2501.14644v2 -
283 07-23 A Learning-based Domain Decomposition Method Eine lernbasierte Methode der Domänenzersetzung 以学习为基础的域分解方法 2507.17328v1 -
284 07-23 RIS-aided Latent Space Alignment for Semantic Channel Equalization RIS-gestützte Latent Space Alignment für semantische Kanalausgleich RIS援助的静语频道平准空间对齐 2507.16450v2 -
285 07-23 Towards Detecting Persuasion on Social Media: From Model Development to Insights on Persuasion Strategies Auf dem Weg zur Erkennbarkeit von Überzeugungen in sozialen Medien: Von der Modellentwicklung zu Erkenntnissen über Überzeugungsstrategien 探索社会媒体的观察:从示范发展到观察社会媒体的观察 2503.13844v2 -
286 07-23 Nearly Minimax Discrete Distribution Estimation in Kullback-Leibler Divergence with High Probability Fast Minimax Diskrete Distribution Schätzung in Kullback-Leibler Divergenz mit hoher Wahrscheinlichkeit Kullback- Leibler 高概率差异中近微小马克分解分布估计值 2507.17316v1 -
287 07-23 Confounded Causal Imitation Learning with Instrumental Variables Konfounded Causal Imitation Learning with Instrumental Variables 带有乐器变量的因果模仿学习 2507.17309v1 -
288 07-23 R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning R-Stitch: Dynamische Trajektorien-Stitching für effiziente Vernunft R-Stitch: 高效理性的动态轨迹切换 2507.17307v1 -
289 07-23 Universal Fourier Neural Operators for Micromechanics Universal Fourier-Neural-Betreiber für Mikromechanik 通用微型机械天体神经操作员 2507.12233v2 -
290 07-23 Cautious Next Token Prediction Vorsichtige Vorhersage des nächsten Tokens 谨慎的下一词元预测 2507.03038v2 -
291 07-23 A Spatio-Temporal Machine Learning Model for Mortgage Credit Risk: Default Probabilities and Loan Portfolios Ein Spatio-Temporal Machine Learning Modell für Hypothekenkreditrisiko: Standardwahrscheinlichkeiten und Kreditportfolios 抵押信贷风险:默认概率和贷款组合的Spadio-临时机械学习模式 2410.02846v2 -
292 07-23 On Temporal Guidance and Iterative Refinement in Audio Source Separation Zur zeitlichen Führung und iterativen Verfeinerung in der Audioquelle Trennung 关于音频源分离的时间指导和动态改进 2507.17297v1 -
293 07-23 VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback VLA-Touch: Erweiterung von Vision-Language-Action-Modellen mit Dual-Level Taktiles Feedback VLA-Touch:加强具有双轨反馈的愿景-语言-行动模式 2507.17294v1 -
294 07-23 Data Virtualization for Machine Learning Datenvirtualisierung für maschinelles Lernen 机器学习数据虚拟化 2507.17293v1 -
295 07-23 Decentralized Federated Learning of Probabilistic Generative Classifiers Dezentrales Föderiertes Lernen von probabilistischen Generativen Klassifikatoren 风险生成分类法的联邦分权分权学习 2507.17285v1 -
296 07-23 Hardware-Efficient Photonic Tensor Core: Accelerating Deep Neural Networks with Structured Compression Hardware-Effizient Photonic Tensor Core: Beschleunigen von tiefen neuralen Netzwerken mit strukturierter Kompression 硬件-高效光学光学时标核心:有结构压缩的加速深神经网络 2502.01670v2 -
297 07-23 Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start Multimodale Reasoning durch verstärktes Lernen mit kaltem Start fördern 通过 “ 冷起 “ 的强化学习推进多模式理由 2505.22334v2 -
298 07-23 Prolonging Tool Life: Learning Skillful Use of General-purpose Tools through Lifespan-guided Reinforcement Learning Verlängerung des Werkzeuglebens: Erlernen eines kompetenzvollen Einsatzes von Allzweck-Werkzeugen durch lebenslanges Stärkungslernen 延长工具寿命:通过终身指导强化学习学习如何熟练使用普通用途工具 2507.17275v1 -
299 07-23 Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance Nutzung von Wissensgraphen und LLM-Gründung zur Ermittlung operativer Engpässe für die Lagerplanung 利用知识图和LLM解释,查明用于仓库规划援助的业务瓶颈 2507.17273v1 -
300 07-23 Bayesian Optimization of Robustness Measures under Input Uncertainty: A Randomized Gaussian Process Upper Confidence Bound Approach Bayesische Optimierung von Robustheitsmaßen unter Input Uncertainty: Ein Randomized Gaussian Prozess Oberer Vertrauensbund Ansatz Bayesian 优化投入不确定性下的有力措施:随机化高斯进程最高信任度办法 2504.03172v2 -
301 07-23 EXGnet: a single-lead explainable-AI guided multiresolution network with train-only quantitative features for trustworthy ECG arrhythmia classification EXGnet: ein erklärbares, KI-geführtes Einkanal-Multiauflösungsnetzwerk mit nur im Training verwendeten quantitativen Merkmalen für eine vertrauenswürdige EKG-Arrhythmieklassifizierung EXGnet:一个单导联、可解释AI引导的多分辨率网络,采用仅在训练时使用的量化特征,用于可信赖的ECG心律失常分类 2506.12404v2 -
302 07-23 Knowledge Abstraction for Knowledge-based Semantic Communication: A Generative Causality Invariant Approach Wissensabstraktion für wissensbasierte semantische Kommunikation: Eine generative Kausalität invarianter Ansatz 基于知识的语义交流知识抽象学知识摘要:产生因果性易变方法 2507.17784v1 -
303 07-23 Rethinking VAE: From Continuous to Discrete Representations Without Probabilistic Assumptions VAE neu denken: Von kontinuierlichen zu diskreten Repräsentationen ohne probabilistische Annahmen 重新思考VAE:从连续到分解的表述,无概率假设 2507.17255v1 -
304 07-23 DistrAttention: An Efficient and Flexible Self-Attention Mechanism on Modern GPUs DistrAttention: Ein effizienter und flexibler Selbstaufmerksamkeitsmechanismus für moderne GPUs DistrAttention:现代GPU上高效灵活的自注意力机制 2507.17245v1 -
305 07-23 Eco-Friendly AI: Unleashing Data Power for Green Federated Learning Eco-friendly KI: Entleashing Data Power für Green Federated Learning 生态友好型AI:绿色联邦学习的释放数据动力 2507.17241v1 -
306 07-23 NeuroHD-RA: Neural-distilled Hyperdimensional Model with Rhythm Alignment NeuroHD-RA: Neural-destilliertes Hyperdimensionales Modell mit Rhythm Alignment NeuroHD-RA:具有同步调整的神经蒸蒸多维模型 2507.14184v3 -
307 07-23 P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices P3SL: Personalisiertes Datenschutz-Erhalten von Split-Lernen auf heterogenen Edge-Geräten P3SL: 个人化隐私保护关于异异异异边缘装置的分离学习 2507.17228v1 -
308 07-23 Dataset Distillation as Data Compression: A Rate-Utility Perspective Datensatzdestillation als Datenkompression: Eine Rate-Utility-Perspektive 将数据集作为数据压缩进行蒸馏:率-功用视角 2507.17221v1 -
309 07-23 A Low-Cost Machine Learning Approach for Timber Diameter Estimation Ein Low-Cost Machine Learning Ansatz für die Schätzung des Holzdurchmessers 木材直径估算的低成本机器学习方法 2507.17219v1 -
310 07-23 APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation APTx Neuron: Unified Trainable Neuron Architecture Integrating Activation and Computation APTx Neuron: 统一可训练的中子建筑综合激活和计算 2507.14270v2 -
311 07-23 Blind Source Separation of Single-Channel Mixtures via Multi-Encoder Autoencoders Blinde Quellentrennung von Einkanal-Mischungen über Multi-Encoder-Autoencoder 通过多编码器自动编码器实现单通道混合信号的盲源分离 2309.07138v4 -
312 07-23 HypoChainer: A Collaborative System Combining LLMs and Knowledge Graphs for Hypothesis-Driven Scientific Discovery HypoChainer: Ein kollaboratives System zur Kombination von LLMs und Wissensgraphen für hypothesengetriebene wissenschaftliche Entdeckungen HypoChainer:一个合作系统,将假设驱动科学发现所利用的LLM和知识图谱结合起来 2507.17209v1 -
313 07-23 AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation AlignDistil: Token-Level-Sprachmodell Alignment als Adaptive Policy Destillation AlignDistil:作为自适应策略蒸馏的词元级语言模型对齐 2503.02832v3 -
314 07-23 Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation Filter-And-Refine: Ein MLLM-basiertes Cascade-System für die Moderation von industriellen Videoinhalten 筛选和注释:一个基于 MLLM 的工业规模视频内容调节系统 2507.17204v1 -
315 07-23 Spintronic Bayesian Hardware Driven by Stochastic Magnetic Domain Wall Dynamics Spintronische Bayes-Hardware, angetrieben von stochastischer magnetischer Domänenwanddynamik 由随机磁畴壁动力学驱动的自旋电子贝叶斯硬件 2507.17193v1 -
316 07-23 Met$^2$Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems Met$^2$Net: Ein entkoppeltes zweistufiges Spatio-Temporal-Prognosemodell für komplexe meteorologische Systeme Met$^2$Net:一个解耦的复杂气象系统两阶段时空预报模型 2507.17189v1 -
317 07-23 GhostUMAP2: Measuring and Analyzing (r,d)-Stability of UMAP GhostUMAP2: Messung und Analyse der (r,d)-Stabilität von UMAP GhostUMAP2:测量和分析UMAP的(r,d)-稳定性 2507.17174v1 -
318 07-23 Privacy-Preserving Multimodal News Recommendation through Federated Learning Datenschutz-Erhaltung multimodaler Nachrichten Empfehlung durch Federated Learning 通过联邦学习促进隐私保护多模式新闻建议 2507.15460v3 -
319 07-23 Unmasking Trees for Tabular Data Entlarvung von Bäumen für tabellarische Daten 用于表格数据解压缩树 2407.05593v5 -
320 07-23 Attention-Based Multiscale Temporal Fusion Network for Uncertain-Mode Fault Diagnosis in Multimode Processes Aufmerksamkeitsbasiertes Multiscale Temporal Fusion Network für unsichere Fehlerdiagnosen in Multimode-Prozessen 多模式进程中不确定-Mode失密诊断多波段时空聚变网络 2504.05172v3 -
321 07-23 Tabular Diffusion based Actionable Counterfactual Explanations for Network Intrusion Detection Tabuläre Diffusion basierte, gegenfaktische Erklärungen zur Netzwerkintrusionserkennung 用于网络入侵探测的基于传播表的可行动反事实解释 2507.17161v1 -
322 07-23 Flexible Coded Distributed Convolution Computing for Enhanced Straggler Resilience and Numerical Stability in Distributed CNNs Flexibles Coded Distributed Convolution Computing für verbesserte Straggler-Resilienz und numerische Stabilität in verteilten CNNs 面向分布式CNN的灵活编码分布式卷积计算,增强抗掉队节点能力和数值稳定性 2411.01579v2 -
323 07-23 JAM: Keypoint-Guided Joint Prediction after Classification-Aware Marginal Proposal for Multi-Agent Interaction JAM: Keypoint-Guided Joint Prediction nach Classification-Aware Marginal-Vorschlag für Multi-Agent-Interaktion JAM:关于多机构互动的分类-软件边际建议之后的关键点指导联合预测 2507.17152v1 -
324 07-23 PICore: Physics-Informed Unsupervised Coreset Selection for Data Efficient Neural Operator Training PICore: Physik-informierte, unüberwachte Coreset-Auswahl für dateneffiziente Neuraloperator-Schulungen PICore: 数据高效神经操作员培训的物理-内建无监督核心集选择 2507.17151v1 -
325 07-23 ScSAM: Debiasing Morphology and Distributional Variability in Subcellular Semantic Segmentation ScSAM: Debiasing Morphology and Distributional Variability in subzellulärer semantischer Segmentierung ScSAM: 子细胞间断分解中减少对分细胞分解的道德和分布变异性的影响 2507.17149v1 -
326 07-23 SADA: Stability-guided Adaptive Diffusion Acceleration SADA: Stabilitätsgeführte Adaptive Diffusions-Beschleunigung SADA: 稳定导向的适应性扩散加速 2507.17135v1 -
327 07-23 Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance Selbstverbessernde Mittel zur Testzeit mit der Anleitung für Mensch-in-der-Loop lernen 使自我改进代理机构能够在测试时以 “ 人工在网上 “ 指南学习 2507.17131v1 -
328 07-23 OkadaTorch: A Differentiable Programming of Okada Model to Calculate Displacements and Strains from Fault Parameters OkadaTorch: Eine differenzierte Programmierung des Okada-Modells zur Berechnung von Displacements und Strains aus Fehlerparametern OkadaTorch: Okada 模型的不同编程,用以计算与故障参数有关的流离失所和 Strains 2507.17126v1 -
329 07-23 Model Compression Engine for Wearable Devices Skin Cancer Diagnosis Modell-Kompressions-Engine für tragbare Geräte Hautkrebs-Diagnose 穿戴设备皮肤癌症诊断模型压缩引擎 2507.17125v1 -
330 07-23 Computer Vision for Real-Time Monkeypox Diagnosis on Embedded Systems Computer Vision für Echtzeit-Monkeypox-Diagnose auf Embedded-Systemen 关于嵌入系统实时猴子天花诊断的计算机愿景 2507.17123v1 -
331 07-23 Robust Five-Class and binary Diabetic Retinopathy Classification Using Transfer Learning and Data Augmentation Robuste Fünf-Klasse und binäre diabetische Retinopathie Klassifizierung mittels Transfer Lernen und Datenvergrößerung 五类强力细胞和二分体糖尿病病理病理学分类,利用转让学习和数据增强 2507.17121v1 -
332 07-23 Probabilistic Graphical Models: A Concise Tutorial Probabilistische Graphische Modelle: Ein kurzes Tutorial 概率概率图形模型:简洁的教学 2507.17116v1 -
333 07-23 EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles EnsemW2S: Verbesserung der Schwach-zu-Strong-Verallgemeinerung mit großsprachigen Modellensembles EnsemW2S:用大语言模型组合加强弱至强的通用化 2410.04571v3 -
334 07-23 Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models Verstärktes Lernen Fine-Tunes ein Sparse Subnetwork in großen Sprachmodellen 以大语言模式建立粗略的子网络 2507.17107v1 -
335 07-23 ZORMS-LfD: Learning from Demonstrations with Zeroth-Order Random Matrix Search ZORMS-LfD: Aus Demonstrationen lernen mit der Zufallsmatrix-Suche der Nullten Ordnung ZORMS-LfD: 学习用零极随机矩阵搜索从演示中学习 2507.17096v1 -
336 07-23 Joint Pedestrian and Vehicle Traffic Optimization in Urban Environments using Reinforcement Learning Gemeinsame Fußgänger- und Fahrzeugverkehrsoptimierung in städtischen Umgebungen mittels Verstärkungslernen 利用强化学习在城市环境中联合优化步行和车辆交通 2504.05018v2 -
337 07-22 (2) Deformable Cluster Manipulation via Whole-Arm Policy Learning Verformbare Clustermanipulation über Ganzarm Policy Learning 通过全Arm政策学习进行变形集束操纵 2507.17085v1 -
338 07-22 A Parameter-Efficient Quantum Anomaly Detection Method on a Superconducting Quantum Processor Eine Parameter-effiziente Quantenanomalie-Erkennungsmethode auf einem supraleitenden Quantenprozessor 超导量子处理器超导量子处理器的参数有效量子异常探测方法 2412.16867v4 -
339 07-22 Advanced U-Net Architectures with CNN Backbones for Automated Lung Cancer Detection and Segmentation in Chest CT Images Erweiterte U-Net-Architekturen mit CNN-Backbones für automatisierte Lungenkrebserkennung und Segmentierung in Brust-CT-Bildern 采用CNN骨干网络的高级U-Net架构,用于胸部CT图像中的肺癌自动检测与分割 2507.09898v2 -
340 07-22 Language model developers should report train-test overlap Entwickler von Sprachmodellen sollten Überlappungen zwischen Trainings- und Testdaten melden 语言模型开发者应报告训练集与测试集的重叠情况 2410.08385v2 -
341 07-22 Sensor Drift Compensation in Electronic-Nose-Based Gas Recognition Using Knowledge Distillation Sensor-Drift-Kompensation in der elektronisch-nasebasierten Gaserkennung mittels Wissensdestillation 利用知识蒸馏在基于电子喷气气体识别中 使用知识蒸馏 2507.17071v1 -
342 07-22 Advancing Robustness in Deep Reinforcement Learning with an Ensemble Defense Approach Robustheit im Deep Reinforcement Learning mit einem Ensemble Defense Approach fördern 以组合防御方法推进深强化学习的强力 2507.17070v1 -
343 07-22 The FIX Benchmark: Extracting Features Interpretable to eXperts Der FIX-Benchmark: Merkmale extrahieren Interpretierbar auf eXperts FIX基准:提取可解释为eXperts的地物 2409.13684v4 -
344 07-22 Risk In Context: Benchmarking Privacy Leakage of Foundation Models in Synthetic Tabular Data Generation Risk In Context: Benchmarking Privacy Leakage of Foundation Models in Synthetic Tabular Data Generation 背景风险:根据基准确定合成图表数据生成基础模型的隐私渗漏 2507.17066v1 -
345 07-22 A Coalition Game for On-demand Multi-modal 3D Automated Delivery System Ein Koalitionsspiel für Multimodales 3D-Automatisiertes Liefersystem auf Abruf 供求多式3D自动交付系统联盟游戏 2412.17252v2 -
346 07-22 Pragmatic Policy Development via Interpretable Behavior Cloning Pragmatische Politikentwicklung durch interpretierbares Verhalten Klonen 通过可解释行为克隆制定实用政策 2507.17056v1 -
347 07-22 Shared Control of Holonomic Wheelchairs through Reinforcement Learning Gemeinsame Kontrolle von Holonomic Rollstuhls durch Verstärkungslernen 通过强化学习共同控制全神轮椅 2507.17055v1 -
348 07-22 Beyond Single-Channel: Multichannel Signal Imaging for PPG-to-ECG Reconstruction with Vision Transformers Beyond Single-Channel: Multichannel Signal Imaging für PPG-zu-ECG-Rekonstruktion mit Vision Transformern 超越单一通道:利用愿景变形器进行PPG到ECG重建的多通道信号成像 2505.21767v2 -
349 07-22 GenMol: A Drug Discovery Generalist with Discrete Diffusion GenMol: Ein Drug Discovery Generalist mit diskreter Diffusion GenMol: 具有分辨扩散作用的药物发现通俗主义者 2501.06158v3 -
350 07-22 CoLT: The conditional localization test for assessing the accuracy of neural posterior estimates CoLT: Der bedingte Lokalisierungstest zur Beurteilung der Genauigkeit neuronaler posteriorer Schätzungen CoLT:评估神经后验估计准确性的条件定位检验 2507.17030v1 -
351 07-22 Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance Koel-TTS: Verbesserung der LLM-basierten Sprachgenerierung mit Präferenz-Ausrichtung und Klassifikator-freier Anleitung Koel-TTS:加强基于LLM的语音生成,提供优先调整和分类免费指导 2502.05236v2 -
352 07-22 The surprising strength of weak classifiers for validating neural posterior estimates Die überraschende Stärke schwacher Klassifikatoren zur Validierung neuraler posteriorer Schätzungen 证实神经后天估计值的薄弱分类师的惊人力量 2507.17026v1 -
353 07-22 CM-UNet: A Self-Supervised Learning-Based Model for Coronary Artery Segmentation in X-Ray Angiography CM-UNet: Ein selbstüberwachtes lernbasiertes Modell für koronare Arteriensegmentierung in der Röntgenangiographie CM-UNet:X射线血管成像的冠状动脉切除自上式学习模型 2507.17779v1 -
354 07-22 BiLO: Bilevel Local Operator Learning for PDE Inverse Problems. Part II: Efficient Uncertainty Quantification with Low-Rank Adaptation BiLO: Bilevel Local Operator Learning für PDE Inverse Probleme. Teil II: Effiziente Unsicherheitsquantifizierung mit Low-Rank-Anpassung BiLO:面向PDE反问题的双层局部算子学习,第二部分:利用低秩适应实现高效的不确定性量化 2507.17019v1 -
355 07-22 Causal Graph Fuzzy LLMs: A First Introduction and Applications in Time Series Forecasting Causal Graph Fuzzy LLMs: Eine erste Einführung und Anwendungen in der Zeitreihenprognose Causal 图形模糊模糊LLMM:时间序列预测的第一介绍和应用 2507.17016v1 -
356 07-22 laplax – Laplace Approximations with JAX laplax – Laplace-Annäherungen mit JAX 与 JAX 的拉位相近 2507.17013v1 -
357 07-22 Towards Trustworthy AI: Secure Deepfake Detection using CNNs and Zero-Knowledge Proofs Auf dem Weg zu vertrauenswürdiger KI: Sichere Deepfake-Erkennung mit CNNs und Zero-Knowledge-Proofs 迈向可信赖的AI:利用CNN和零知识证明实现安全的深度伪造检测 2507.17010v1 -
358 07-22 ORANSight-2.0: Foundational LLMs for O-RAN ORANSight-2.0: Grundlegende LLMs für O-RAN ORANSight-2.0:面向O-RAN的基础大语言模型 2503.05200v2 -
359 07-22 Deep RL Dual Sourcing Inventory Management with Supply and Capacity Risk Awareness Deep RL Dual Sourcing Bestandsmanagement mit Versorgungs- und Kapazitätsrisiko-Bewusstsein 具有供应和能力风险意识的深入RL 双重保值双重保值库存管理 2507.14446v2 -
360 07-22 Revisiting Randomization in Greedy Model Search Randomisierung in der Suche nach Greedy-Modellen erneut besuchen 重新审视贪婪模式搜索中的随机化 2506.15643v2 -
361 07-22 Should Bias Always be Eliminated? A Principled Framework to Use Data Bias for OOD Generation Sollten Bias immer beseitigt werden? Ein prinzipieller Rahmen für die Nutzung von Daten Bias für die OOD-Generierung 是否应该永远消除偏见? 生成OOD时使用数据偏见的主要框架。 2507.17001v1 -
362 07-22 Fine-Grained Alignment and Noise Refinement for Compositional Text-to-Image Generation Feinkörnige Ausrichtung und Geräuschverfeinerung für kompositorische Text-zu-Bild-Generierung 精细调整和噪音改进,以形成成组文字到成象 2503.06506v2 -
363 07-22 Divisive Decisions: Improving Salience-Based Training for Generalization in Binary Classification Tasks Divisive Entscheidungen: Verbesserung der Salience-basierten Ausbildung für Generalisierung in Binary-Klassifikation Aufgaben 不同决定:改进以素养为基础的培训,促进二元分类任务中的普遍化 2507.17000v1 -
364 07-22 Bayesian preference elicitation for decision support in multiobjective optimization Bayesische Präferenz-Elizitation für Entscheidungsunterstützung bei multiobjektiver Optimierung 在多目标优化中争取决策支持的贝耶斯偏好 2507.16999v1 -
365 07-22 Unified Sparse-Matrix Representations for Diverse Neural Architectures Unified Sparse-Matrix-Darstellungen für unterschiedliche Neuralarchitekturen 不同神经神经结构的统一斯普马马马力显示器 2506.01966v3 -
366 07-22 PyG 2.0: Scalable Learning on Real World Graphs PyG 2.0: Scalable Learning on Real World Graphs PyG 2.0: 真实世界图表上的可缩放学习 2507.16991v1 -
367 07-22 Hierarchical Reinforcement Learning Framework for Adaptive Walking Control Using General Value Functions of Lower-Limb Sensor Signals Hierarchisches Verstärkungs-Lern-Framework für adaptive Walking-Steuerung unter Verwendung von allgemeinen Wertfunktionen von Lower-Limb Sensor Signalen 利用低Limb传感器信号的一般价值功能的适应性步行控制梯级强化学习框架 2507.16983v1 -
368 07-22 Fast and Scalable Gene Embedding Search: A Comparative Study of FAISS and ScaNN Schnelle und skalierbare Suche in Gen-Einbettungen: Eine vergleichende Studie von FAISS und ScaNN 快速和可缩放基因嵌入搜索:FAISS和ScaNN的比较研究 2507.16978v1 -
369 07-22 Temporally Consistent Dynamic Scene Graphs: An End-to-End Approach for Action Tracklet Generation Zeitgleich konsistente dynamische Szenendiagramme: Ein End-to-End-Ansatz für die Tracklet-Generierung von Action Tracklets 临时一致的动态场景图:行动轨迹生成的端至端办法 2412.02808v2 -
370 07-22 MRI-CORE: A Foundation Model for Magnetic Resonance Imaging MRI-CORE: Ein Basismodell für Magnetresonanz-Imaging MRI-CORE:磁共振成像基础模型 2506.12186v2 -
371 07-22 A Hybrid CNN-VSSM model for Multi-View, Multi-Task Mammography Analysis: Robust Diagnosis with Attention-Based Fusion Hybrides CNN-VSSM-Modell für Multi-View, Multi-Task Mammographie Analyse: Robuste Diagnose mit aufmerksamkeitsbasierter Fusion 用于多视图、多任务乳腺X线摄影分析的CNN-VSSM混合模型:基于注意力融合的稳健诊断 2507.16955v1 -
372 07-22 Fundamental limits of distributed covariance matrix estimation via a conditional strong data processing inequality Grundlegende Grenzen der verteilten Kovarianz-Matrix-Schätzung über eine bedingt starke Datenverarbeitungsungleichheit 通过有条件的强有力的数据处理不平等状况进行分布式共变量矩阵估计的基本限制 2507.16953v1 -
373 07-22 ResidualPlanner+: a scalable matrix mechanism for marginals and beyond ResidualPlanner+: ein skalierbarer Matrixmechanismus für Randbereiche und darüber hinaus 剩余规划者+:边际和边际外的可缩放矩阵机制 2305.08175v3 -
374 07-22 AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation AURA: Multi-Modal Medical Agent für Verständnis, Vernunft und Annotation AURA:一个多模式医疗代理,用于理解、说明理由和说明 2507.16940v1 -
375 07-22 Enhancing supply chain security with automated machine learning Verbesserung der Sicherheit der Lieferkette durch automatisiertes maschinelles Lernen 通过自动机械学习加强供应链安全 2406.13166v3 -
376 07-22 SiLQ: Simple Large Language Model Quantization-Aware Training SiLQ: Einfaches großsprachiges Modell Quantization-Aware Training SiLQ: 简单大语言模型量化软件培训 2507.16933v1 -
377 07-22 Avoiding spectral pollution for transfer operators using residuals Vermeidung von spektralen Verschmutzungen für Übertragungsbetreiber mit Reststoffen 避免对使用残留物的转移经营者的光谱污染 2507.16915v1 -
378 07-22 ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning ThinkAct: Vision-Language-Action-Reasoning durch verstärkte visuelle Latent-Planung 思考:通过强化视觉预备规划提出愿景-语言-行动理由 2507.16815v1 -
379 07-22 Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning Semi-off-Policy-Verstärkung Lernen für Vision-Sprache langsam denkende Vernunft 愿景-语言-思维迟慢原因半非政策强化学习 2507.16814v1 -
380 07-22 MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning MegaScience: Die Grenzen von Post-Training-Datensätzen für wissenschaftliche Vernunft sprengen 超科学:推进培训后数据集的前沿,促进科学理性 2507.16812v1 -
381 07-22 Revisiting Pre-trained Language Models for Vulnerability Detection Überprüfung vortrainierter Sprachmodelle für die Erkennung von Schwachstellen 重新审查关于脆弱性检测的预培训语言模式 2507.16887v1 -
382 07-22 Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty Über Binäre Belohnungen hinaus: LMs darin trainieren, über ihre Unsicherheit zu schlussfolgern 超越二元奖励:训练语言模型对自身的不确定性进行推理 2507.16806v1 -
383 07-22 Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning Steuerung der Out-of-Distribution-Verallgemeinerung mit Konzeptablation Fine-Tuning 带有 “ 缩算概念 “ 定额概念的 “ 批发外普遍化 “ 指导指导 2507.16795v1 -
384 07-22 Edge of Stochastic Stability: Revisiting the Edge of Stability for SGD Rand der stochastischen Stabilität: Die Kante der Stabilität für SGD 斯托卡稳定边缘:重新审视稳定边缘,促进稳定发展 2412.20553v4 -
385 07-22 Graph Neural Networks Gone Hogwild Schaubild Neurale Netze vor Hogwild 神经网络离开霍格维勒德 2407.00494v2 -
386 07-22 A Partitioned Sparse Variational Gaussian Process for Fast, Distributed Spatial Modeling Ein geteilter Sparse Variational Gaussian Prozess für schnelle, verteilte räumliche Modellierung 快速、分布空间建模的分散分布式平面平面变异高斯进程 2507.16771v1 -
387 07-22 RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment RadAlign: Weiterentwicklung der Radiologie Report Generation mit Vision-Sprachkonzept Ausrichtung 辐射:推进放射学报告的编制,并统一愿景-语言概念 2501.07525v2 -
388 07-22 Learning novel representations of variable sources from multi-modal $\textit{Gaia}$ data via autoencoders Erlernen neuer Darstellungen variabler Quellen aus multimodalen $\textit{Gaia}$ Daten über Autoencoder 通过自动编码器学习多模态 $\textit{Gaia}$ 数据中变源的新表示 2505.16320v2 -
389 07-22 Assessing Adaptive World Models in Machines with Novel Games Bewertung von adaptiven Weltmodellen in Maschinen mit neuen Spielen 评估具有新运动会的机器中适应性世界模型 2507.12821v2 -
390 07-22 Towards Robust Foundation Models for Digital Pathology Auf dem Weg zu robusten Grundmodellen für die digitale Pathologie 走向坚固基金会数字病理学模型 2507.17845v1 -
391 07-22 GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding GUI-G$^2$: Gaussian Reward Modeling für GUI Grounding GUI-G$^2$:面向GUI定位的高斯奖励建模 2507.15846v2 -
392 07-22 Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning Zebra-CoT: Ein Datensatz für interleaved Vision Language Reasoning Zebra-CoT:关于不同视力语言理由的数据集 2507.16746v1 -
393 07-22 SplitMeanFlow: Interval Splitting Consistency in Few-Step Generative Modeling SplitMeanFlow: Intervall-Splitting-Konsistenz in der generativen Modellierung mit wenigen Schritten SplitMeanFlow:少步生成建模中的区间分割一致性 2507.16884v1 -
394 07-22 T-GRAB: A Synthetic Diagnostic Benchmark for Learning on Temporal Graphs T-GRAB: Ein synthetischer Diagnose-Benchmark für das Lernen auf zeitlichen Graphen T-GRAB: 时间图学习的合成诊断基准 2507.10183v2 -
395 07-22 Improving Model Classification by Optimizing the Training Dataset Verbesserung der Modellklassifikation durch Optimierung des Trainingsdatensatzes 通过优化培训数据集改进示范分类 2507.16729v1 -
396 07-22 The Joys of Categorical Conformal Prediction Die Freuden der kategorischen konformen Vorhersage 分类共变预言的欢乐 2507.04441v2 -
397 07-22 Multi-objective Portfolio Optimization Via Gradient Descent Multi-objektive Portfolio-Optimierung durch gradienten Abstieg 多目标组合优化组合 2507.16717v1 -
398 07-22 Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data Erlernen ursächlich vorhersehbarer Ergebnisse aus Psychiatrischen Langzeitdaten 精神病纵向数据产生的可预期的学习结果 2506.16629v3 -
399 07-22 Screen2AX: Vision-Based Approach for Automatic macOS Accessibility Generation Screen2AX: Vision-basierter Ansatz für automatische macOS-Zugänglichkeitsgenerierung Screen2AX:以愿景为基础的自动 MacOS无障碍生成方法 2507.16704v1 -
400 07-22 Pixel-Resolved Long-Context Learning for Turbulence at Exascale: Resolving Small-scale Eddies Toward the Viscous Limit Pixel-Resolved Long-Context Learning for Turbulence at Exascale: Lösung kleiner Eddies auf dem Weg zur Viskosegrenze 用像素解解析的超大型扰动远程学习长像学习:解决小型艾迪问题以达到微声限制 2507.16697v1 -
401 07-22 Confidence Optimization for Probabilistic Encoding Vertrauensoptimierung für die probabilistische Kodierung 概率编码的可信度优化 2507.16881v1 -
402 07-22 FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation FISHER: Ein Basismodell für die umfassende Repräsentation multimodaler industrieller Signale FISHER:多模态工业信号综合表示的基础模型 2507.16696v1 -
403 07-22 Interpretable Topic Extraction and Word Embedding Learning using row-stochastic DEDICOM Interpretierbare Themenextraktion und Wort-Embedding Lernen mit zeilenstochastischem DEDICOM 利用行可查的DEDICOM进行可解释专题抽取和单词嵌入学习 2507.16695v1 -
404 07-22 Universal Model Routing for Efficient LLM Inference Universelle Modellführung für effiziente LLM-Inferenz 高效LLM 推导法通用通用模型规则 2502.08773v2 -
405 07-22 Multilevel Picard approximations and deep neural networks with ReLU, leaky ReLU, and softplus activation overcome the curse of dimensionality when approximating semilinear parabolic partial differential equations in $L^p$-sense Mehrstufige Picard-Annäherungen und tiefe neuronale Netzwerke mit ReLU, Leaky ReLU und Softplus-Aktivierung überwinden den Fluch der Dimensionalität, wenn sie semilineare parabolische partielle Differentialgleichungen im $L^p$-Sinne annähern 多级Picard近似和采用ReLU、leaky ReLU及softplus激活的深度神经网络,在 $L^p$ 意义下逼近半线性抛物型偏微分方程时克服了维数灾难 2409.20431v4 -
406 07-22 Structural Effect and Spectral Enhancement of High-Dimensional Regularized Linear Discriminant Analysis Strukturelle Wirkung und spektrale Verbesserung der hochdimensionalen Regularisierten Linearen Diskriminanzanalyse 结构效应和高分层常规线性分线差异分析的光谱增强 2507.16682v1 -
407 07-22 Deep Unfolding Network for Nonlinear Multi-Frequency Electrical Impedance Tomography Deep Unfolding Netzwerk für nichtlineare Multi-Frequenz elektrische Impedanz Tomographie 非线性多功能多功能电气阻力断层造影的深载网络 2507.16678v1 -
408 07-22 Custom Algorithm-based Fault Tolerance for Attention Layers in Transformers Benutzerdefinierte Algorithmen-basierte Fehlertoleranz für Aufmerksamkeitsschichten in Transformern 面向Transformer注意力层的定制化基于算法的容错 2507.16676v1 -
409 07-22 GASPnet: Global Agreement to Synchronize Phases GASPnet: Globales Abkommen zur Synchronisierung von Phasen GASPnet:同步阶段全球协定 2507.16674v1 -
410 07-22 Meta-Learning for Cold-Start Personalization in Prompt-Tuned LLMs Meta-Learning für die Kaltstart-Personalisierung in Prompt-getunten LLMs 提示微调LLM中冷启动个性化的元学习 2507.16672v1 -
411 07-22 Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed Dori finden: Erinnerung in Text-zu-Bild-Diffusions-Modellen ist weniger lokal als angenommen 查找 Dori : 文本到图像传播模型的记忆比假设的要小 2507.16880v1 -
412 07-22 FLAIN: Mitigating Backdoor Attacks in Federated Learning via Flipping Weight Updates of Low-Activation Input Neurons FLAIN: Hecktürangriffe im Federated Learning durch kippende Gewichtsaktualisierungen von Niedrig-Aktivierungs-Eingangs-Neuronen abmildern FLAIN:通过降低低活性输入神经的重量更新,减少联邦学习中的后门攻击 2408.08655v2 -
413 07-22 Recent Advances in Malware Detection: Graph Learning and Explainability Neueste Fortschritte bei der Malware-Erkennung: Graphisches Lernen und Erklärbarkeit 错误软件探测:图表学习和可解释性方面的最新进展 2502.10556v2 -
414 07-22 Quantum Cognition Machine Learning for Forecasting Chromosomal Instability Quantenkognition Maschinelles Lernen zur Prognose der Chromosomeninstabilität 预测染色体不稳定状况的量子聚合机学习 2506.03199v2 -
415 07-22 Soft Computing Approaches for Predicting Shade-Seeking Behaviour in Dairy Cattle under Heat Stress: A Comparative Study of Random Forests and Neural Networks Soft Computing Ansätze zur Vorhersage von Shade-Seeking Verhalten bei Milchvieh unter Hitzestress: Eine vergleichende Studie von Random Forests und Neuronalen Netzwerken 预测受热压力的奶牛的变形寻找行为的软计算方法:随机森林和神经网络比较研究 2501.05494v2 -
416 07-22 Graph Neural Network-Based Distributed Optimal Control for Linear Networked Systems: An Online Distributed Training Approach Graph Neural Network-based Distributed Optimal Control for Linear Networked Systems: Ein Online Distributed Training Approach 线性网络系统分布式最佳最佳控制:在线分布式培训方法 2504.06439v2 -
417 07-22 Towards Automated Regulatory Compliance Verification in Financial Auditing with Large Language Models Auf dem Weg zu einer automatisierten Überprüfung der regulatorischen Compliance bei der Finanzprüfung mit großen Sprachmodellen 采用大语言模式进行财务审计自动监管合规核查 2507.16642v1 -
418 07-22 Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis Hybrid-Reward-getriebenes Verstärkungslernen für effiziente Quantenschaltungssynthese 高效量子电路合成增强学习 2507.16641v1 -
419 07-22 Risk and cross validation in ridge regression with correlated samples Risiko- und Kreuzvalidierung bei der Regression des Grats mit korrelierten Proben 具有相关样本的山脊回归风险和交叉验证 2408.04607v5 -
420 07-22 Automatic Fine-grained Segmentation-assisted Report Generation Automatische, feinkörnige Segmentierung unterstützte Berichtserstellung 自动精精细分割辅助报告生成 2507.16623v1 -
421 07-22 Towards a deeper GCN: Alleviate over-smoothing with iterative training and fine-tuning Auf dem Weg zu einer tieferen GCN: Überglätten mit iterativem Training und Feinabstimmung 更深入的GCN:通过迭接培训和微调,减轻过度缓解 2506.17576v2 -
422 07-22 Stable and Accurate Orbital-Free DFT Powered by Machine Learning Stabile und genaue Orbital-Free DFT Powered by Machine Learning 借助机器学习的稳定和准确的无轨道无轨道DFT 2503.00443v2 -
423 07-22 Rethinking Data Input for Point Cloud Upsampling Dateneingabe für Punkt-Cloud-Upsampling neu denken 重新思考点云取样的数据输入 2407.04476v3 -
424 07-22 A computational transition for detecting correlated stochastic block models by low-degree polynomials Ein rechnerischer Übergang zur Erkennung korrelierter stochastischer Blockmodelle durch Low-Grad-Polynome 用低度多元度探测相关随机区块模型的计算过渡 2409.00966v2 -
425 07-22 Adaptive Gaussian Mixture Models-based Anomaly Detection for under-constrained Cable-Driven Parallel Robots Adaptive Gaussian Mixture Models-basierte Anomalieerkennung für unterbeschränkte kabelgetriebene Parallelroboter 用于控制不足的有线驱动平行机器人的适应性高斯混合混合模型异常探测 2507.07714v2 -
426 07-22 Spectral Algorithms under Covariate Shift Spektrale Algorithmen unter Kovariate Verschiebung 共变量移动下的频谱值 2504.12625v2 -
427 07-22 Antithetic Sampling for Top-k Shapley Identification Antithetische Probenahme für Top-K Shapley-Identifikation 顶部形状识别的抗抗异性取样 2504.02019v2 -
428 07-22 Scaling Linear Attention with Sparse State Expansion Skalierung linearer Aufmerksamkeit mit Sparse State Expansion 通过稀疏状态扩展缩放线性注意力 2507.16577v1 -
429 07-22 Leveraging Distribution Matching to Make Approximate Machine Unlearning Faster Leveraging Distribution Passend, um annähernde Maschine Unlearning schneller zu machen 利用配配配配的配送让近似机器更快退出学习 2507.09786v2 -
430 07-22 Supernova: Achieving More with Less in Transformer Architectures Supernova: Mit weniger Transformer-Architekturen mehr erreichen 超新星:在变形结构结构中以更少的变形结构实现更大的成就 2507.15773v2 -
431 07-22 Families of Optimal Transport Kernels for Cell Complexes Familien von optimalen Transport-Kerneln für Zellkomplexe 细胞综合体最佳运输核心家庭 2507.16569v1 -
432 07-22 Exploring Gender Bias in Large Language Models: An In-depth Dive into the German Language Gender Bias in großen Sprachmodellen erforschen: Ein tiefer Einblick in die deutsche Sprache 在大语言模式中探索性别偏见:深入跳入德语 2507.16557v1 -
433 07-22 Optimization of DNN-based HSI Segmentation FPGA-based SoC for ADS: A Practical Approach Optimierung der DNN-basierten HSI-Segmentierung FPGA-basierten SoC für ADS: Ein praktischer Ansatz 面向ADS的基于DNN的HSI分割FPGA SoC优化:一种实用方法 2507.16556v1 -
434 07-22 A Comprehensive Data-centric Overview of Federated Graph Learning Ein umfassender datenzentrierter Überblick über das Federated Graph Learning 以数据为核心的联邦图表学习综合概览 2507.16541v1 -
435 07-22 Symbolic Graph Intelligence: Hypervector Message Passing for Learning Graph-Level Patterns with Tsetlin Machines Symbolische Graphenintelligenz: Hypervektor-Nachricht für das Lernen von Graph-Level-Mustern mit Tsetlin-Maschinen 图示情报:用于学习的Tsetlin机器图层模式的超矢量信息传递 2507.16537v1 -
436 07-22 Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report Frontier AI Risk Management Framework in der Praxis: Ein technischer Bericht zur Risikoanalyse 《国际边界风险管理框架实际操作:风险分析技术报告》 2507.16534v1 -
437 07-22 confopt: A Library for Implementation and Evaluation of Gradient-based One-Shot NAS Methods confopt: Eine Bibliothek zur Implementierung und Bewertung von gradient-basierten One-Shot-NAS-Methoden 实施和评价基于梯度的单制热NAS方法图书馆 2507.16533v1 -
438 07-22 Benchmarking machine learning models for predicting aerofoil performance Benchmarking von Machine-Learning-Modellen zur Vorhersage der Leistung von Tragflächenprofilen 用于预测翼型性能的机器学习模型基准测试 2504.15993v2 -
439 07-22 Neural Approaches for Multi-Objective Routing on Multigraphs Neurale Ansätze für multi-objektives Routing auf Multigraphen 多种计量多目的路由的神经方法 2506.22095v2 -
440 07-22 C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning C2-Evo: Co-Evolving multimodale Daten und Modell zur Selbstverbesserung C2-Evo:共同演进的多模式数据和自我改进理由模型 2507.16518v1 -
441 07-22 Analogy making as amortised model construction Analoge Herstellung als amortisierter Modellbau 模拟作为摊还模型建造 2507.16511v1 -
442 07-22 Network Analytics for Anti-Money Laundering – A Systematic Literature Review and Experimental Evaluation Network Analytics for Anti-Money Laundering – Eine systematische Literaturrecherche und experimentelle Auswertung 反洗钱网络分析 – – 系统文献审查和实验评价 2405.19383v4 -
443 07-22 Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation Sparrow: Dateneffizientes Video-LLM mit Text-zu-Bild-Erweiterung 麻雀:数据有效视频LLM,带有文本到图像放大功能 2411.19951v5 -
444 07-22 FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models FlowEdit: Inversionsfreies Text-basiertes Bearbeiten mit vortrainierten Flow-Modellen 流程:使用预先培训的流程模型进行无逆向无文本编辑 2412.08629v2 -
445 07-22 BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning BioMaze: Benchmarking und Verbesserung großer Sprachmodelle für biologische Pathway-Reasoning Biomaze:为生物途径理由确定基准和加强大语言模式 2502.16660v5 -
446 07-22 Canonical Correlation Patterns for Validating Clustering of Multivariate Time Series Canonical Correlation Patterns für die Validierung Clustering von Multivariate Time Series 校验多变量时间序列群集的卡尼相对关系模式 2507.16497v1 -
447 07-22 Combining Language and Topic Models for Hierarchical Text Classification Kombination von Sprach- und Themenmodellen für die Hierarchische Textklassifikation 将等级文字分类的语言和专题模式相结合 2507.16490v1 -
448 07-22 Learning from Data Streams: An Overview and Update Lernen aus Datenströmen: Eine Übersicht und Aktualisierung 从数据流中学习:概览和最新情况 2212.14720v3 -
449 07-22 Comparison of Optimised Geometric Deep Learning Architectures, over Varying Toxicological Assay Data Environments Vergleich von optimierten geometrischen Deep-Learning-Architekturen über unterschiedliche toxikologische Analyse-Datenumgebungen 超过不同毒性分析数据环境的最佳几何深学习结构比较 2507.17775v1 -
450 07-22 Adaptive Bayesian Single-Shot Quantum Sensing Adaptive Bayesian Single-Shot-Quantum Sensing Bayesian 单制热量量遥感 2507.16477v1 -
451 07-22 Estimating Treatment Effects with Independent Component Analysis Abschätzung der Behandlungseffekte mit unabhängiger Komponentenanalyse 利用独立组成部分分析估算治疗效果 2507.16467v1 -
452 07-22 Machine learning-based multimodal prognostic models integrating pathology images and high-throughput omic data for overall survival prediction in cancer: a systematic review Maschinelles Lernen-basierte multimodale prognostische Modelle zur Integration pathologischer Bilder und hochdurchsetzter omischer Daten für die Gesamtüberlebensvorhersage bei Krebs: eine systematische Überprüfung 综合病理图象和高通量血压数据以全面预测癌症存活率的机器学习的多式联运预测模型:系统审查 2507.16876v1 -
453 07-22 The Sweet Danger of Sugar: Debunking Representation Learning for Encrypted Traffic Classification Sweet Danger of Sugar: Debunking Representative Learning für verschlüsselte Verkehrsklassifikation 糖的甜甜危险:加密交通分类的取消代表学习 2507.16438v1 -
454 07-22 Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models Hierarchische Sicherheits-Neuausrichtung: Leichte Wiederherstellung der Sicherheit in beschnittenen großen Vision-Sprachen-Modellen 等级安全调整:谨慎大型视觉语言模型中轻度安全恢复 2505.16104v2 -
455 07-22 From model-based learning to model-free behaviour with Meta-Interpretive Learning Vom modellbasierten Lernen zum modellfreien Verhalten mit Meta-Interpretive Learning 从基于模式的学习到无模式的行为,与 “ 元解释性学习 “ 合作 2507.16434v1 -
456 07-22 Adaptive Multi-task Learning for Multi-sector Portfolio Optimization Adaptives Multi-Task-Lernen für Multi-Sector Portfolio-Optimierung 促进多部门组合优化的适应性多任务学习 2507.16433v1 -
457 07-22 An effective physics-informed neural operator framework for predicting wavefields Ein effektives physik-informiertes neuronales Bediener-Framework zur Vorhersage von Wellenfeldern 有效的物理知情神经操作器框架,用于预测波地 2507.16431v1 -
458 07-22 Combined Image Data Augmentations diminish the benefits of Adaptive Label Smoothing Kombinierte Bilddatenvergrößerungen mindern die Vorteile der adaptiven Labelglättung 合并图像数据放大减少了调适标签平滑的好处 2507.16427v1 -
459 07-22 Practical Insights into Knowledge Distillation for Pre-Trained Models Praktische Einblicke in die Wissensdestillation für vortrainierte Modelle 预培训模式知识提炼的实用透视 2402.14922v2 -
460 07-22 PromptAL: Sample-Aware Dynamic Soft Prompts for Few-Shot Active Learning PromptAL: Sample-Aware Dynamische Soft-Prompts für wenig heißes aktives Lernen 提示: 用于少点热积极学习的样本- 软件动态软提示 2507.16424v1 -
461 07-22 Improving Predictions on Highly Unbalanced Data Using Open Source Synthetic Data Upsampling Verbesserung der Vorhersagen auf sehr unausgewogenen Daten mit Hilfe von Open Source Synthetic Data Upsampling 利用开放源码合成数据抽样改进对高度不平衡数据的预测 2507.16419v1 -
462 07-22 GG-BBQ: German Gender Bias Benchmark for Question Answering GG-BBQ: Deutscher Gender-Bias-Benchmark für Fragenbeantwortung GG-BBQ:德语问答性别偏见基准 2507.16410v1 -
463 07-22 MolPIF: A Parameter Interpolation Flow Model for Molecule Generation MolPIF: Ein Parameter Interpolationsflussmodell für die Molekülerzeugung MoLPIF: 分子一代的参数内插流动模型 2507.13762v2 -
464 07-22 Self-Supervised Inductive Logic Programming Selbstüberwachte induktive Logik-Programmierung 自监督归纳逻辑编程 2507.16405v1 -
465 07-22 Balancing Robustness and Efficiency in Embedded DNNs Through Activation Function Selection Ausbalancierung von Robustheit und Effizienz in eingebetteten DNNs durch Aktivierungsfunktionsauswahl 通过启动职能选择,在嵌入的DNN 中平衡稳健和效率 2504.05119v2 -
466 07-22 Technical report: Impact of Duration Prediction on Speaker-specific TTS for Indian Languages Technischer Bericht: Auswirkungen der Dauervorhersage auf Speakerspezifische TTS für indische Sprachen 技术报告:期限预测对印度语特定演讲人TTS的影响 2507.16875v1 -
467 07-22 Optimization and generalization analysis for two-layer physics-informed neural networks without over-parametrization Optimierungs- und Generalisierungsanalyse für zweischichtige physik-informierte neuronale Netzwerke ohne Überparametrierung 为两层物理学知情神经网络提供优化和概括化分析,不过分对称 2507.16380v1 -
468 07-22 Meta-learning of Gibbs states for many-body Hamiltonians with applications to Quantum Boltzmann Machines Meta-Lernen von Gibbs-Zuständen für Vielteilchen-Hamiltonoperatoren mit Anwendungen auf Quanten-Boltzmann-Maschinen 多体哈密顿量Gibbs态的元学习及其在量子玻尔兹曼机中的应用 2507.16373v1 -
469 07-22 Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts Autonome Datenauswahl mit Zero-shot Generative Klassifikatoren für mathematische Texte 具有数学文本零光生成分类器的自动数据选择 2402.07625v7 -
470 07-22 Physical models realizing the transformer architecture of large language models Physikalische Modelle, die die Transformatorenarchitektur großer Sprachmodelle realisieren 实现大型语言模型变压器结构的物理模型 2507.13354v2 -
471 07-22 Bipartite Patient-Modality Graph Learning with Event-Conditional Modelling of Censoring for Cancer Survival Prediction Bipartite Patienten-Modalität Graphenlernen mit Ereignis-Bedingte Modellierung der Zensur für Krebs-Überlebensvorhersage 两边患者-模式图表学习,以及癌症生存预测审查的有条件事件模型 2507.16363v1 -
472 07-22 Pre-Training LLMs on a budget: A comparison of three optimizers Pre-Training LLMs auf einem Budget: Ein Vergleich von drei Optimierern 预算有限的LLM预训练:三种优化器的比较 2507.08472v2 -
473 07-22 Tri-Learn Graph Fusion Network for Attributed Graph Clustering Tri-Learn Graph Fusion Network für zugeschriebene Graph Clustering Tri- Learn 属性图集集成的三光图融合网络 2507.13620v2 -
474 07-22 Streamlining Prediction in Bayesian Deep Learning Straffung der Vorhersagen in Bayesian Deep Learning 精简贝耶斯深层学习的预测 2411.18425v4 -
475 07-22 Multimodal Coordinated Online Behavior: Trade-offs and Strategies Multimodal koordiniertes Online-Verhalten: Kompromisse und Strategien 多式联运协调在线行为:取舍和战略 2507.12108v2 -
476 07-22 InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain für LLM mit optischen Schaltungsschalter Transceivern InfiniteHBD:利用光电路交换收发器为LLM构建数据中心规模的高带宽域 2502.03885v4 -
477 07-22 Leveraging Personalized PageRank and Higher-Order Topological Structures for Heterophily Mitigation in Graph Neural Networks Leveraging Personalisiertes PageRank und höher geordnete Topologische Strukturen zur heterophilen Milderung in Graph Neural Networks 在图形神经网络中利用个性化平板和高端地形结构进行热缓解 2507.16347v1 -
478 07-22 The Cost of Compression: Tight Quadratic Black-Box Attacks on Sketches for $\ell_2$ Norm Estimation Die Kosten der Kompression: Enge quadratische Black-Box-Angriffe auf Skizzen für $\ell_2$ Normschätzung 压缩的代价:针对 $\ell_2$ 范数估计草图的紧致二次黑盒攻击 2507.16345v1 -
479 07-22 Constructing material network representations for intelligent amorphous alloys design Konstruktion von Materialnetzwerkdarstellungen für intelligente amorphe Legierungen 为智能无定形合金设计建立材料网络示意图 2507.16336v1 -
480 07-22 Higher Gauge Flow Models Modelle mit höherem Messfluss 高压流动模型 2507.16334v1 -
481 07-22 Physics-Driven Neural Network for Solving Electromagnetic Inverse Scattering Problems Physik-getriebenes Neuronales Netzwerk zur Lösung elektromagnetischer Inverse Streuprobleme 解决电磁反向散射问题的物理动力神经网络 2507.16321v1 -
482 07-22 Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance Genaues und effizientes Feintuning von Quantisierten großen Sprachmodellen durch optimale Balance 通过最佳平衡对量化大语言模型进行准确、高效的微调 2407.17029v2 -
483 07-22 Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras Unüberwachtes gemeinsames Lernen von optischem Fluss und Intensität mit Ereigniskameras 利用活动摄像机联合学习光流和强度 2503.17262v2 -
484 07-22 Perovskite-R1: A Domain-Specialized LLM for Intelligent Discovery of Precursor Additives and Experimental Design Perovskite-R1: Eine Domain-Spezialisierte LLM für die intelligente Entdeckung von Precursor-Additiven und experimentellen Design Perovskite-R1: 用于智能发现前体添加剂和实验设计的一个域专用LLM 2507.16307v1 -
485 07-22 Attention-Based Fusion of IQ and FFT Spectrograms with AoA Features for GNSS Jammer Localization Aufmerksamkeitsbasierte Fusion von IQ- und FFT-Spektrogrammen mit AoA-Features für die GNSS-Jammerlokalisierung 基于注意力的IQ与FFT频谱图及AoA特征融合,用于GNSS干扰源定位 2507.14167v2 -
486 07-22 CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning CUDA-L1: Verbesserung der CUDA-Optimierung durch kontrastives Verstärkungslernen CUDA-L1:通过反竞争强化学习改进CUDA优化 2507.14111v3 -
487 07-22 Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning Auf dem Weg zu einem resilienten sicherheitsorientierten Unlearning für Diffusionsmodelle gegen Downstream-Fine-Tuning 面向适应性安全驱动的弹性安全驱动不学习如何利用下游微调传播模型 2507.16302v1 -
488 07-22 Navigation through Non-Compact Symmetric Spaces: a mathematical perspective on Cartan Neural Networks Navigation durch nicht-kompakte Symmetrische Räume: eine mathematische Perspektive auf kartanische Neuralnetze 通过非协议对称空间导航:关于Cartan神经网络的数学视角 2507.16871v1 -
489 07-22 Progressive-Resolution Policy Distillation: Leveraging Coarse-Resolution Simulations for Time-Efficient Fine-Resolution Policy Learning Progressive-Resolution Policy Destillation: Leveraging Coarse-Resolution Simulationen für zeiteffizientes Fine-Resolution Policy Learning 渐进式决议政策蒸馏:利用利用粗制粗制模拟器进行时间效率高的精细决议政策学习 2412.07477v3 -
490 07-22 Note on Follow-the-Perturbed-Leader in Combinatorial Semi-Bandit Problems Hinweis zum Follow-the-Perturbed-Leader bei kombinatorischen Semi-Bandit-Problemen 关于合并半银行问题后续行动说明 2506.12490v2 -
491 07-22 Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers Unisolver: PDE-Conditional Transformer sind universelle PDE-Lösemittel 离子: PDE- 条件变换器为通用 PDE 解答器 2405.17527v4 -
492 07-22 Ownership Verification of DNN Models Using White-Box Adversarial Attacks with Specified Probability Manipulation Eigentumsverifizierung von DNN-Modellen mit White-Box-Adversarial-Angriffen mit spezifizierter Wahrscheinlichkeitsmanipulation DNN 使用白毒对反对反对性袭击模式进行指定概率操纵的DNN自有性核查 2505.17579v2 -
493 07-22 Time to Split: Exploring Data Splitting Strategies for Offline Evaluation of Sequential Recommenders Time to Split: Erforschung von Datenspaltungsstrategien für Offline-Evaluierung von Sequential Recommenders 拆分时间:探索对序列建议者进行离线评价的数据分割战略 2507.16289v1 -
494 07-22 FedMultiEmo: Real-Time Emotion Recognition via Multimodal Federated Learning FedMultiEmo: Echtzeit-Emotionserkennung durch multimodales Federated Learning Fed MultiEmo:通过多模式联邦学习来实时承认情感 2507.15470v2 -
495 07-22 Diffusion-Based Electrocardiography Noise Quantification via Anomaly Detection Diffusionsbasierte Elektrokardiographie Geräuschquantifizierung durch Anomalieerkennung 通过非异常检测进行传播基电动心电心动噪音测量 2506.11815v2 -
496 07-22 Tagging fully hadronic exotic decays of the vectorlike $\mathbf{B}$ quark using a graph neural network Tagging vollhadronischer exotischer Zerfälle des vektorartigen $\mathbf{B}$-Quarks mit einem Graph-Neural-Netzwerk 使用图神经网络标记类矢量 $\mathbf{B}$ 夸克的全强子奇异衰变 2505.07769v2 -
497 07-22 Hierarchical Reasoning Model Hierarchisches Modell der Vernunft 等级推理模型 2506.21734v2 -
498 07-22 Understanding Generalization, Robustness, and Interpretability in Low-Capacity Neural Networks Verallgemeinerung, Robustheit und Dolmetschbarkeit in neuralen Netzwerken mit geringer Kapazität verstehen 理解低能力神经网络的普遍化、强健和可解释性 2507.16278v1 -
499 07-22 Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training Reduzierung der GPU-Speicherfragmentierung durch Spatio-Temporale Planung für effiziente großformatige Modellschulungen 通过SPA-时间规划减少GPU内存碎片化,促进高效大型示范培训 2507.16274v1 -
500 07-22 On exploration of an interior mirror descent flow for stochastic nonconvex constrained problem Auf der Erforschung des inneren Spiegelabflusses für stochastisches nichtkonvexes beschränktes Problem 探索用于随机非凸约束问题的内部镜像下降流 2507.15264v2 -
501 07-22 Probing Ranking LLMs: A Mechanistic Analysis for Information Retrieval Probing Ranking LLMs: Eine mechanistische Analyse für die Informationswiederherstellung 检验排名LMS:信息检索的机械分析 2410.18527v3 -
502 07-22 ToFe: Lagged Token Freezing and Reusing for Efficient Vision Transformer Inference ToFe: Verzögertes Einfrieren und Wiederverwenden von Token für eine effiziente Vision-Transformer-Inferenz ToFe:用于高效视觉Transformer推理的滞后令牌冻结与复用 2507.16260v1 -
503 07-22 Edge-case Synthesis for Fisheye Object Detection: A Data-centric Perspective Edge-Case-Synthese zur Erkennung von Fisheye-Objekten: Eine datenzentrierte Perspektive 鱼眼物体探测边缘综合情况:以数据为中心的视角 2507.16254v1 -
504 07-22 Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping Multi-Agenten-Verstärkungs-Lernen für stichprobeneffiziente Tiefen-Neural-Netzwerk-Mapping 用于抽样有效深神经网络绘图的多机构强化学习 2507.16249v1 -
505 07-22 OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting OPC: One-Point-Contraction Unlearning Toward Deep Feature Vergessen OPC: 一点-合同拆开学习深地地貌的遗忘 2507.07754v2 -
506 07-22 IPPRO: Importance-based Pruning with PRojective Offset for Magnitude-indifferent Structural Pruning IPPRO: Wichtiges Pruning mit PRojective Offset für Magnitude-indifferent Structural Pruning IPPRO: 以重力为根据的谨慎与磁度偏差结构谨慎的倾斜偏移 2507.14171v2 -
507 07-22 MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment MPO: Ein effizientes Post-Processing-Framework zum Mischen unterschiedlicher Präferenzen MPO: 混合多种优惠协调的高效处理后框架 2502.18699v3 -
508 07-22 LLM-Enhanced Reranking for Complementary Product Recommendation LLM-erweitertes Reranking für ergänzende Produktempfehlung LLM-加强补充产品建议书的重新排名 2507.16237v1 -
509 07-22 PAC Off-Policy Prediction of Contextual Bandits PAC Off-Policy Vorhersage von Kontext Banditen PAC 非政策性背景强盗预测 2507.16236v1 -
510 07-22 Aligned Manifold Property and Topology Point Clouds for Learning Molecular Properties Aligned Manifold Property and Topology Point Clouds for Learning Molecular Properties 用于学习分子特性的实心式地产和地形点云 2507.16223v1 -
511 07-22 Toward Routine CSP of Pharmaceuticals: A Fully Automated Protocol Using Neural Network Potentials Auf dem Weg zu einem routinemäßigen CSP of Pharmaceuticals: Ein vollautomatisiertes Protokoll zur Nutzung neuraler Netzwerkpotentiale 迈向药物常规CSP:利用神经网络潜力的全自动协议 2507.16218v1 -
512 07-22 Towards Compute-Optimal Many-Shot In-Context Learning Auf dem Weg zu einem rechnerisch-optimalen, viel scharfen In-Context-Lernen 迈向计算最优化的多个热点内文体学习 2507.16217v1 -
513 07-22 FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization FedWSQ: Effizientes Federated Learning mit Gewichtsstandardisierung und verteilungsbewusster, nicht-einheitlicher Quantisierung FedWSQ:采用权重标准化和分布感知非均匀量化的高效联邦学习 2506.23516v3 -
514 07-22 METER: Multi-modal Evidence-based Thinking and Explainable Reasoning – Algorithm and Benchmark METER: Multimodales Evidenzbasiertes Denken und Erklärbare Begründung – Algorithmen und Benchmark 多式联运循证思考和可解释的理由 – – 等级和基准 2507.16206v1 -
515 07-22 SVAgent: AI Agent for Hardware Security Verification Assertion SVAgent: KI-Agent für Hardware-Sicherheitsprüfung Assertion AI 硬件安全核查认证代理商 2507.16203v1 -
516 07-22 RealBench: Benchmarking Verilog Generation Models with Real-World IP Designs RealBench: Benchmarking von Verilog-Generationsmodellen mit Real-World-IP-Designs RealBench:以真实世界IP设计对Verilog生成模型进行基准测试 2507.16200v1 -
517 07-22 Diffusion-Modeled Reinforcement Learning for Carbon and Risk-Aware Microgrid Optimization Diffusionsmodelliertes Verstärkungslernen für die Optimierung von Kohlenstoff und risikobehafteten Mikrogrids 促进碳和风险软件微型电磁优化的传播模式强化学习 2507.16867v1 -
518 07-22 Learning to Bid in Non-Stationary Repeated First-Price Auctions Lernen, in nicht-stationären wiederholten Erstpreis-Auktionen Gebot 学习在非标准重复第一次价格拍卖中投标 2501.13358v2 -
519 07-22 EBaReT: Expert-guided Bag Reward Transformer for Auto Bidding EBaReT: fachkundiger Taschen-Reward-Transformator für Auto-Bidding EBARET: 自动投标专家指导的袋奖励变换器 2507.16186v1 -
520 07-22 Balanced Image Stylization with Style Matching Score Ausgeglichene Bildstilisierung mit Style Matching Score 带有样式匹配评分的平衡图像同步化 2503.07601v2 -
521 07-22 Feature Construction Using Network Control Theory and Rank Encoding for Graph Machine Learning Feature Konstruktion mit Network Control Theorie und Rang Encoding für Graph Machine Learning 图形机器学习使用网络控制理论和排名编码 2507.15195v2 -
522 07-22 A Goal-Oriented Reinforcement Learning-Based Path Planning Algorithm for Modular Self-Reconfigurable Satellites Ein zielorientierter Verstärkungs-Lernpfadplanungs-Algorithmus für modulare selbstkonfigurierbare Satelliten 面向目标的加强学习学习的模块自可自配置卫星的路线图规划算法 2505.01966v2 -
523 07-22 LLM Data Selection and Utilization via Dynamic Bi-level Optimization LLM-Datenauswahl und -Verwendung über dynamische Bi-Level-Optimierung 通过动态双级优化优化选择和利用LLM数据 2507.16178v1 -
524 07-22 Energy-Efficient and Real-Time Sensing for Federated Continual Learning via Sample-Driven Control Energieeffizientes und Echtzeit-Sensing für ein Federated Continual Learning via Sample-Driven Control 通过抽样分散控制为联邦持续学习提供节能实时遥感 2310.07497v2 -
525 07-22 Curating Demonstrations using Online Experience Kuratierende Demonstrationen mit Online Experience 利用在线经验治理示范活动 2503.03707v2 -
526 07-22 A Collaborative Framework Integrating Large Language Model and Chemical Fragment Space: Mutual Inspiration for Lead Design Ein kollaborativer Rahmen für die Integration von Large Language Model und Chemical Fragment Space: Gegenseitige Inspiration für Lead Design 整合大语言模型和化学碎片空间:铅设计相互促进 2507.13580v2 -
527 07-22 R-Bot: An LLM-based Query Rewrite System R-Bot: Ein LLM-basiertes Abfrage-Rewrite-System R-Bot:一个基于LLM的查询重写系统 2412.01661v2 -
528 07-22 Attacking interpretable NLP systems Angriff auf interpretierbare NLP-Systeme 攻击可解释的NLP系统 2507.16164v1 -
529 07-22 Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment Adapt On-the-Go: Verhaltensmodulierung für Single-Life-Roboter-Einsatz 即时适应:单生机器人部署行为改变 2311.01059v3 -
530 07-22 Learning Patient-Specific Spatial Biomarker Dynamics via Operator Learning for Alzheimer’s Disease Progression Lernen patientenspezifische räumliche Biomarker-Dynamik über den Bediener Lernen für Alzheimer-Krankheitsfortschritt 通过操作员学习阿尔茨海默氏病发展趋势的学习者学习特定病人空间生物标志动力学 2507.16148v1 -
531 07-22 Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks unter $μ$P Parametrization $L$层无限宽神经网络在$μ$P参数化下的全局收敛与丰富特征学习 2503.09565v2 -
532 07-22 Equivariant Goal Conditioned Contrastive Reinforcement Learning Gleichwertiges Ziel Conditioned Kontrastive Verstärkungslernen 有条件的违反规定强化学习 2507.16139v1 -
533 07-22 Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations Aitomia: Ihr intelligenter Assistent für KI-getriebene Atomistische und Quantum Chemical Simulationen Aitomia:您对AI-Driven原子学和量子化学模拟的智能助理 2505.08195v3 -
534 07-22 L4Q: Parameter Efficient Quantization-Aware Fine-Tuning on Large Language Models L4Q: Parameter Effiziente Quantisierungsware Feinsteuerung bei großen Sprachmodellen L4Q:大语言模型参数有效量化-软件精美推荐 2402.04902v6 -
535 07-21 (1) Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization Expertengeführte LLM-Gründung für die Batterieentdeckung: Von der KI-getriebenen Hypothese zur Synthese und Charakterisierung 电池发现原因:从AI-Driven假说到合成和特性 2507.16110v1 -
536 07-21 Recursive Equations For Imputation Of Missing Not At Random Data With Sparse Pattern Support Rekursive Gleichungen für die Imputation von fehlenden nicht zufälligen Daten mit Sparse Pattern Support 支持简化模式支持的非随机数据失踪的计算结果的递归等量 2507.16107v1 -
537 07-21 Analysis of the 2024 BraTS Meningioma Radiotherapy Planning Automated Segmentation Challenge Analyse der Strahlentherapie 2024 BraTS Meningiom Planung Automatisierte Segmentierung Herausforderung 分析2024年BRATS Meningioma辐射治疗规划自动化分割挑战 2405.18383v3 -
538 07-21 TorchAO: PyTorch-Native Training-to-Serving Model Optimization TorchAO: PyTorch-Native Training-to-Serving Modelloptimierung TorchAO:PyTorch原生的从训练到服务的模型优化 2507.16099v1 -
539 07-21 DP-TLDM: Differentially Private Tabular Latent Diffusion Model DP-TLDM: Differential Private Tabular Latent Diffusion Model DP-TLDM:有区别的私人制表式冷流传播模型 2403.07842v2 -
540 07-21 Reinforcement Learning in hyperbolic space for multi-step reasoning Verstärkung Lernen im hyperbolischen Raum für mehrstufiges Denken 用于多步推理的双曲空间强化学习 2507.16864v1 -
541 07-21 Audio Geolocation: A Natural Sounds Benchmark Audio Geolocation: Ein natürlicher Klang Benchmark 音频地理定位:自然声音基准 2505.18726v2 -
542 07-21 Feature Selection and Junta Testing are Statistically Equivalent Feature Selection und Junta-Tests sind statistisch gleichwertig 特征选择和 Junta 测试为统计等值 2505.04604v2 -
543 07-21 Efficient Compositional Multi-tasking for On-device Large Language Models Effizientes kompositorisches Multi-Tasking für On-Device große Sprachmodelle 内部设计大型语言模型的高效组成多任务 2507.16083v1 -
544 07-21 Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests Randomisierung kann sowohl Bias als auch Varianz reduzieren: Eine Fallstudie in Random Forests 随机性可减少偏见和差异:随机森林案例研究 2402.12668v4 -
545 07-21 A Lower Bound for the Number of Linear Regions of Ternary ReLU Regression Neural Networks Eine niedrigere Grenze für die Anzahl der linearen Regionen der Ternary ReLU Regressions-Neural-Netzwerke Ternary ReLU后退神经网络线性区域数目的下界宽度 2507.16079v1 -
546 07-21 AI-driven Orchestration at Scale: Estimating Service Metrics on National-Wide Testbeds KI-getriebene Orchestrierung im Maßstab: Bewertung von Service-Metriken auf national-breiten Testbeds AI驱动的大规模编排:在全国范围测试平台上估算服务指标 2507.16077v1 -
547 07-21 Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder Erforschen, wie Generative MLLMs mehr als CLIP mit dem gleichen Vision Encoder wahrnehmen 使用相同的愿景编码器探索如何产生比 CLIP 更远的多见性大型LLMs 2411.05195v3 -
548 07-21 Antibiotic Resistance Microbiology Dataset (ARMD): A Resource for Antimicrobial Resistance from EHRs Antibiotikaresistenz Mikrobiologie Datensatz (ARMD): Eine Ressource für antimikrobielle Resistenz von EHRs 抗生素抗药性微生物生物学数据集(ARMD):EHR的抗微生物抗药性资源 2503.07664v2 -
549 07-21 Manifold Learning with Normalizing Flows: Towards Regularity, Expressivity and Iso-Riemannian Geometry Manifold Learning mit normalisierenden Strömungen: Auf dem Weg zu Regelmäßigkeit, Expressivität und iso-Riemannsche Geometrie 以正常流动方式进行多重学习:走向规律、直观和Iso-Riemannian 几何 2505.08087v2 -
550 07-21 Interpreting CFD Surrogates through Sparse Autoencoders Interpretation von CFD-Surrogaten durch Sparse Autoencoder 通过稀疏自编码器解释CFD代理模型 2507.16069v1 -
551 07-21 Erasing Conceptual Knowledge from Language Models Auslöschen von konzeptionellen Kenntnissen aus Sprachmodellen 将概念知识从语言模式中除去 2410.02760v3 -
552 07-21 Is memory all you need? Data-driven Mori-Zwanzig modeling of Lagrangian particle dynamics in turbulent flows Ist Gedächtnis alles, was Sie brauchen? Datengesteuerte Mori-Zwanzig Modellierung der lagrangischen Teilchendynamik in turbulenten Strömungen 数据驱动的Mori- Zwanzig 模拟在动荡中流动的拉格朗江粒子动态。 2507.16058v1 -
553 07-21 Radiological and Biological Dictionary of Radiomics Features: Addressing Understandable AI Issues in Personalized Breast Cancer; Dictionary Version BM1.0 Radiologisches und Biologisches Wörterbuch der Radiomik Features: Adressierung verständlicher KI-Probleme in Personalisierte Brustkrebs; Wörterbuch Version BM1.0 放射特征的辐射和生物词典:解决个人化乳腺癌中可理解的AI问题;字典版BM1.0。 2507.16041v1 -
554 07-21 Reactivation: Empirical NTK Dynamics Under Task Shifts Reaktivierung: Empirische NTK-Dynamik unter Aufgabenverschiebungen 重新激活: 任务变换下的NTK实证动态 2507.16039v1 -
555 07-21 Autocomp: LLM-Driven Code Optimization for Tensor Accelerators Autocomp: LLM-gesteuerte Code-Optimierung für Tensor-Beschleuniger 自动comp: LLM- Driven 代码对 Tensor 加速器的优化 2505.18574v3 -
556 07-21 Beyond the ATE: Interpretable Modelling of Treatment Effects over Dose and Time Jenseits der ATE: Interpretierbare Modellierung von Behandlungseffekten über Dosis und Zeit 超越ATE:可解释的剂量和时间处理效果模型 2507.07271v2 -
557 07-21 Neural Probabilistic Shaping: Joint Distribution Learning for Optical Fiber Communications Neurale probabilistische Formgebung: Gemeinsames Vertriebslernen für die optische Faserkommunikation 神经概率形状:光纤通信联合分发学习 2507.16012v1 -
558 07-21 Enhancing Stability of Physics-Informed Neural Network Training Through Saddle-Point Reformulation Verbesserung der Stabilität der physikinformierten neuralen Netzwerkschulung durch Sättel-Punkt-Reformulation 通过散装式点式调整加强物理内成形神经网络培训的稳定 2507.16008v1 -
559 07-21 Risks of AI Scientists: Prioritizing Safeguarding Over Autonomy Risiken von KI-Wissenschaftlern: Priorisierender Schutz vor Autonomie AI 科学家的风险:将保障自治作为优先事项 2402.04247v5 -
560 07-21 AutoMAT: A Hierarchical Framework for Autonomous Alloy Discovery AutoMAT: Hierarchischer Rahmen für die autonome Legierungsentdeckung AutomAT: 自主合金发现等级框架 2507.16005v1 -
561 07-21 Minor Embedding for Quantum Annealing with Reinforcement Learning Geringfügige Einbettung für Quantum Annealing mit Verstärkungslernen 以强化学习为量子安纳林进行小嵌入 2507.16004v1 -
562 07-21 Learning without training: The implicit dynamics of in-context learning Lernen ohne Ausbildung: Die implizite Dynamik des In-Context-Lernens 缺乏培训的学习:内通性学习的隐含动态 2507.16003v1 -
563 07-21 Automated Design of Structured Variational Quantum Circuits with Reinforcement Learning Automatisiertes Design von strukturierten Variations-Quantum-Schaltungen mit Verstärkungs-Lernen 结构变化量子电路的自动设计与强化学习 2507.16001v1 -
564 07-21 Learning Neural Differential Algebraic Equations via Operator Splitting Neurale Differentialalgebraische Gleichungen über Operator-Splitting lernen 通过运算符分割进行学习神经差异 2403.12938v3 -
565 07-21 Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition Omni-Router: Routing-Entscheidungen in Sparse Mixture-of-Experts für die Spracherkennung teilen Omni-Router: 分享语音识别专家的松散混集决定 2507.05724v2 -
566 07-21 Semantic-Aware Gaussian Process Calibration with Structured Layerwise Kernels for Deep Neural Networks Semantisch-bewusste Gaußische Prozesskalibrierung mit strukturierten schichtweisen Kernen für tiefe neurale Netzwerke 深神经网络结构图层内心校准 2507.15987v1 -
567 07-21 Investigation of unsupervised and supervised hyperspectral anomaly detection Untersuchung des nicht überwachten und überwachten hyperspektralen Anomaliennachweises 调查无人监督和监管的超光谱异常现象探测 2408.07114v2 -
568 07-21 On the transferability of Sparse Autoencoders for interpreting compressed models Über die Übertragbarkeit von Sparse Autoencodern zur Interpretation komprimierter Modelle 用于解释压缩模型的稀疏自编码器的可迁移性 2507.15977v1 -
569 07-21 Efficient dataset construction using active learning and uncertainty-aware neural networks for plasma turbulent transport surrogate models Effizienter Datensatzaufbau durch aktives Lernen und unsichere neuronale Netze für turbulente Transportsurrogatmodelle im Plasma 利用积极学习和有不确定性的神经网络,为等离子体动荡运输替代模型建立高效的数据集构建 2507.15976v1 -
570 07-21 The Impact of Language Mixing on Bilingual LLM Reasoning Die Auswirkungen des Sprachmixens auf die zweisprachige LLM-Reasoning 语言混合对双语LLM理由解释的影响 2507.15849v1 -
571 07-21 Transparent Trade-offs between Properties of Explanations Transparente Kompromisse zwischen den Eigenschaften von Erklärungen 解释属性之间的透明权衡取舍 2410.23880v2 -
572 07-21 FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs FASTGEN: Schnelle und kosteneffektive synthetische Tabellendatenerstellung mit LLMs FASTGEN: 利用LLMs快速和成本-效益高的合成图表数据生成 2507.15839v1 -
573 07-21 Optimizing Canaries for Privacy Auditing with Metagradient Descent Optimierung von Kanarien für die Datenschutzprüfung mit Metagradient Descent 优化 “ 与代谢人后裔 “ 进行隐私审计的金库 2507.15836v1 -
574 07-21 Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction Multi-Strategy Verbesserte Schlangenoptimierung beschleunigte CNN-LSTM-Attention-Adaboost für Flugbahnvorhersage 多策略改进蛇优化器加速的CNN-LSTM-Attention-Adaboost轨迹预测 2507.15832v1 -
575 07-21 Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation Fragen Sie einfach nach Musik (JAM): Multimodale und personalisierte natürliche Sprache Musik Empfehlung 仅询问音乐(JAM):多式和个性化自然语言音乐建议 2507.15826v1 -
576 07-21 ACS: An interactive framework for conformal selection ACS: Ein interaktives Framework für die konforme Auswahl ACS: 共形选择的交互式框架 2507.15825v1 -
577 07-21 Efficient Multi-Camera Tokenization with Triplanes for End-to-End Driving Effiziente Multi-Kamera-Tokenisierung mit Triplanes für End-to-End-Fahren 面向端到端驾驶的基于三平面的高效多摄像头标记化 2506.12251v2 -
578 07-21 Federated Split Learning with Improved Communication and Storage Efficiency Federated Split Learning mit verbesserter Kommunikation und Speichereffizienz 改进通信和储存效率的联邦分化学习 2507.15816v1 -
579 07-21 LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra LLM Economist: Große Bevölkerungsmodelle und Mechanism Design in Multi-Agent Generative Simulacra LLM 经济学家:多机构生成模拟中大型人口模型和机制设计 2507.15815v1 -
580 07-21 Splitting criteria for ordinal decision trees: an experimental study Aufteilung der Kriterien für Ordinalentscheidungsbäume: eine experimentelle Studie 例常决策树的分割标准:一项实验研究 2412.13697v3 -
581 07-21 MSGM: A Multi-Scale Spatiotemporal Graph Mamba for EEG Emotion Recognition MSGM: Multi-Scale Spatiotemporal Graph Mamba für EEG-Emotionserkennung MSGM: 用于脑电情绪识别的多尺度时空图Mamba 2507.15914v1 -
582 07-21 Rethinking Inductive Bias in Geographically Neural Network Weighted Regression Induktive Bias in der geographisch neuronalen netzwerkgewichteten Regression neu denken 重新思考地理神经网络加权回归中的归纳偏置 2507.09958v4 -
583 07-21 Automatic dimensionality reduction of Twin-in-the-Loop Observers Automatische Dimensionalitätsreduktion von Twin-in-the-Loop-Beobachtern 双在洛op观察家的自动维度减少 2401.10945v2 -
584 07-21 Diffusion models for multivariate subsurface generation and efficient probabilistic inversion Diffusionsmodelle für multivariate Untergrunderzeugung und effiziente probabilistische Inversion 多变地表下产生和高效概率转换的多变地表下生成扩散模型 2507.15809v1 -
585 07-21 ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction ConformalSAM: Das Potenzial von Basissegmentierungsmodellen in semi-überwachter semantischer Segmentierung mit konformer Vorhersage freisetzen 非正式拼音系统:在半超超语义分解中释放基础分解模型的潜力,同时进行非正式预测 2507.15803v1 -
586 07-21 Hypergraphs on high dimensional time series sets using signature transform Hypergraphen auf hochdimensionalen Zeitreihen-Sets mit Signatur-Transformation 使用签名变换的高维时间序列数 2507.15802v1 -
587 07-21 In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting Detaillierte Analyse der Low-Rank-Matrix-Fabrizierung in einem Federated Setting 深入分析联邦体系中低级别母体因数化 2409.08771v2 -
588 07-21 TensorSocket: Shared Data Loading for Deep Learning Training TensorSocket: Shared Data Loading für Deep Learning Training TensorSocket: 用于深学习培训的共享数据加载 2409.18749v3 -
589 07-21 Optimizer’s Information Criterion: Dissecting and Correcting Bias in Data-Driven Optimization Optimizer’s Information Criterion: Dissektion und Korrektur von Bias in der datengesteuerten Optimierung 优化信息标准:在数据驱动优化中解剖和纠正偏见 2306.10081v4 -
590 07-21 Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning Kleine LLMs lernen keine verallgemeinerbare Theorie des Geistes durch Verstärkungslernen 小型LLM无法通过强化学习习得可泛化的心智理论 2507.15788v1 -
591 07-21 Graph Attention Specialized Expert Fusion Model for Node Classification: Based on Cora and Pubmed Datasets Grafik-Achtung Spezialisiertes Experten-Fusionsmodell für Knotenklassifikation: Basierend auf Cora- und Pubmed-Datensätzen 节点分类专门专家融合模型:以科拉和普米德数据集为基础 2507.15784v1 -
592 07-21 Dissociating model architectures from inference computations Trennen von Modellarchitekturen von Inferenzberechnungen 将模型结构与推断计算分离 2507.15776v1 -
593 07-21 Dynamics is what you need for time-series forecasting! Dynamics ist das, was Sie für die Zeitreihenvorhersage brauchen! 时间序列预测需要动力! 2507.15774v1 -
594 07-21 Deep-Learning Investigation of Vibrational Raman Spectra for Plant-Stress Analysis Deep-Learning-Untersuchung von Vibrations-Raman-Spektren für Pflanzen-Stress-Analysen 用于植物压力分析的振动性拉曼-斯佩特拉深度学习调查 2507.15772v1 -
595 07-21 Multi-Modal Sensor Fusion for Proactive Blockage Prediction in mmWave Vehicular Networks Multi-Modal Sensor Fusion für proaktive Blockierungsvorhersage in mmWave Vehicular Networks 毫米波车载网络中用于主动遮挡预测的多模态传感器融合 2507.15769v1 -
596 07-21 Quantum Learning Theory Beyond Batch Binary Classification Quanten-Lern-Theorie jenseits der Batch Binary Klassifikation 超出批次二进制分类的量子学习理论 2302.07409v5 -
597 07-21 Predictive Planner for Autonomous Driving with Consistency Models Predictive Planer für autonomes Fahren mit konsistenten Modellen 与一致性模式一致自主驾驶的预测规划员 2502.08033v3 -
598 07-21 Reciprocity-Aware Convolutional Neural Networks for Map-Based Path Loss Prediction Reziprocity-Aware Convolutional Neural Networks for Map-Based Path Loss Prediction 地图路径损耗预测对等天体对流神经网络 2504.03625v2 -
599 07-21 Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models Steuerung in neue Einbettungsräume: Analyse der Cross-Lingual Alignment Induziert durch Modellinterventionen in mehrsprachigen Sprachmodellen 指导进入新嵌入空间:分析多语文模式示范干预措施所引出的不同语言之间的横向一致 2502.15639v2 -
600 07-21 Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography Vergleichende Auswertung von Radiomik und Deep-Learning-Modellen zur Erkennung von Krankheiten in der Brustradiographie 比较评价用于在胸针射电摄影中检测疾病辐射学和深学习模型的比较评价 2504.12249v3 -
601 07-21 Towards physician-centered oversight of conversational diagnostic AI Auf dem Weg zur ärztlichen Aufsicht über gesprächsdiagnostische KI 致力于以医生为中心对谈话诊断进行监督 AI 2507.15743v1 -
602 07-21 Conformal and kNN Predictive Uncertainty Quantification Algorithms in Metric Spaces Konforme und kNN Predictive Uncertainty Quantification Algorithmen in Metric Spaces 计量空间中正规和kNN 预测不确定性的量化数值 2507.15741v1 -
603 07-21 Competitive Algorithms for Cooperative Multi-Agent Ski-Rental Problems Wettbewerbsfähige Algorithmen für kooperative Multi-Agenten-Ski-Mietprobleme 合作性多机构天空-天空问题的竞争价值 2507.15727v1 -
604 07-21 A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation Eine Überprüfung der Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation 对贝叶斯不确定因素在深概率图像分割中量化的回顾 2411.16370v6 -
605 07-21 Explainable Anomaly Detection for Electric Vehicles Charging Stations Erklärbare Anomalieerkennung für Elektroautos Ladestationen 电动车辆充电站可解释异常探测 2507.15718v1 -
606 07-21 Model-Based Exploration in Monitored Markov Decision Processes Modellbasierte Exploration in überwachten Markov-Entscheidungsprozessen 在监测的Markov决策过程中进行基于模型的探索 2502.16772v6 -
607 07-21 Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents Groß gewinnen mit kleinen Modellen: Wissensdestillation vs. Selbsttraining zur Reduktion der Halluzination in Produkt-QA-Agenten 以小型模型赢得大奖:知识蒸馏与减少产品质量保证剂中幻觉的自我培训 2502.19545v2 -
608 07-21 CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models CoLD: Counterfactually-Führungslängen-Debiasing für Prozess-Reward-Modelle CoLD: 反事实引导进程奖励模型的长度偏差 2507.15698v1 -
609 07-21 Gradient-Guided Annealing for Domain Generalization Gradient-Guided Annealing für Domain Generalization 域通用化的渐渐引导安纳林 2502.20162v7 -
610 07-21 Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport Verständnis der Ausbildung von unendlich tiefen und breiten ResNets mit Conditional Optimal Transport 理解如何以有条件最佳运输方式培训无限深和宽的ResNet 2403.12887v2 -
611 07-21 Missing value imputation with adversarial random forests – MissARF Fehlender Wert imputation mit konversarischen zufälligen Wäldern – MissARF 对抗性随机随机森林缺失的估算值 – – MissARRF 2507.15681v1 -
612 07-21 GeoHNNs: Geometric Hamiltonian Neural Networks GeoHNNs: Geometrische Hamiltonische Neuronale Netzwerke GeoHNNs:几何汉密尔顿神经网络 2507.15678v1 -
613 07-21 Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems Ausführbare Funktionsabstractions: Ausleiten von Generativen Programmen für fortgeschrittene Math-Probleme 可执行的功能性抽象:为高级数学问题推导产生方案 2504.09763v2 -
614 07-21 Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains Aufmerksamkeit bei Markov: Ein Rahmen für die grundsätzliche Analyse von Transformatoren über Markov Ketten 注意Markov:通过Markov 链条对变形器进行原则分析的框架 2402.04161v2 -
615 07-21 Further exploration of binding energy residuals using machine learning and the development of a composite ensemble model Weitere Erforschung von Bindungsenergieresten mittels maschinellem Lernen und der Entwicklung eines zusammengesetzten Ensemblemodells 利用机器学习进一步探索结合能残差并开发复合集成模型 2503.11066v3 -
616 07-21 Sparsification Under Siege: Defending Against Poisoning Attacks in Communication-Efficient Federated Learning Sparsifikation unter Belagerung: Abwehr gegen vergiftende Angriffe in kommunikativ-effizientem Federated Learning 隔离下的隔离:在通信-高效联邦学习中防范毒物攻击 2505.01454v4 -
617 07-21 Towards Explainable Anomaly Detection in Shared Mobility Systems Auf dem Weg zu einer erklärbaren Anomalienerkennung in gemeinsamen Mobilitätssystemen 共同流动系统中可解释的异常探测 2507.15643v1 -
618 07-21 Ultra-fast feature learning for the training of two-layer neural networks in the two-timescale regime Ultraschnelles Feature-Lernen für das Training von zweischichtigen neuronalen Netzwerken im Zwei-Zeit-Regime 用于培训两层神经网络的超快专题学习 2504.18208v2 -
619 07-21 Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training Data Mixing Agent: Erlernen von Re-Gewicht Domains für kontinuierliches Pre-Training 数据混合代理: 学习为连续培训前学习重新加权域域 2507.15640v1 -
620 07-21 Dual Turing Test: A Framework for Detecting and Mitigating Undetectable AI Dual Turing Test: Ein Framework zur Erkennung und Abmilderung nicht nachweisbarer KI 双重图示试验:检测和减缓不可检测的AI值的框架 2507.15907v1 -
621 07-21 Accelerating HEC-RAS: A Recurrent Neural Operator for Rapid River Forecasting Beschleunigung von HEC-RAS: Ein rekurrenter Neuraloperator für Rapid River Forecasting 加速HEC-RAS:用于快速河流预报的循环神经算子 2507.15614v1 -
622 07-21 Brain-Inspired Online Adaptation for Remote Sensing with Spiking Neural Network Gehirn-inspirierte Online-Anpassung zur Fernerkundung mit Spiking Neural Network 利用Spiking神经网络进行有脑启发的遥感在线适应 2409.02146v2 -
623 07-21 Deep Learning for Computing Convergence Rates of Markov Chains Deep Learning for Computing Convergence Rates of Markov Ketten Markov 链条计算聚合率深入学习 2405.20435v2 -
624 07-21 Optimal Batch-Size Control for Low-Latency Federated Learning with Device Heterogeneity Optimale Batch-Size-Steuerung für Low-Latency-Federated Learning mit Geräte Heterogenität 具有不同设备差异的低长期联邦学习最佳批次和最佳程度控制 2507.15601v1 -
625 07-21 Applying the Chinese Wall Reverse Engineering Technique to Large Language Model Code Editing Anwendung der Technik der chinesischen Wandumkehrtechnik auf die Bearbeitung von großen Sprachmodellen 将中国长墙反向工程技术应用到大语言模式编辑 2507.15599v1 -
626 07-21 Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos Being-H0: Vision-Sprache-Aktion Vorschulung von großformatigen menschlichen Videos 人与人:通过大型人类视频进行视觉-语言-行动预培训 2507.15597v1 -
627 07-21 Red-Team Multi-Agent Reinforcement Learning for Emergency Braking Scenario Red-Team Multi-Agent Verstärkungs-Lernen für Notfall-Brems-Szenario 红队多机构强化学习,用于紧急制动设想方案 2507.15587v1 -
628 07-21 We Need to Rethink Benchmarking in Anomaly Detection Wir müssen Benchmarking bei Anomalienerkennung neu denken 我们需要重新思考异常探测的基准 2507.15584v1 -
629 07-21 Automated Classification of Volcanic Earthquakes Using Transformer Encoders: Insights into Data Quality and Model Interpretability Automatisierte Klassifizierung von Vulkan-Erdbeben mit Transformer-Encodern: Einblicke in Datenqualität und Modellinterpretierbarkeit 利用变换器计算器对火山地震进行自动分类:对数据质量和模型解释的透视 2507.01260v2 -
630 07-21 GeMix: Conditional GAN-Based Mixup for Improved Medical Image Augmentation GeMix: Bedingtes GAN-basiertes Mixup für verbesserte medizinische Bildvergrößerung GeMix:改进医学图像放大条件性 GAN 混合组合 2507.15577v1 -
631 07-21 On the Role of AI in Managing Satellite Constellations: Insights from the ConstellAI Project Über die Rolle der KI bei der Verwaltung von Satellitenkonstellationen: Einblicke aus dem ConstellAI-Projekt 关于AI在管理卫星星座方面的作用:ConstellAI项目透视 2507.15574v1 -
632 07-21 Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI Derivativ-freie Diffusion Manifold-Contrained Gradient für Unified XAI 用于统一 XAI 的衍生-无扩散扩散操纵受训练梯度 2411.15265v2 -
633 07-21 Generalized Consistency Trajectory Models for Image Manipulation Generalisierte Konsistenz-Trajektorien für die Bildmanipulation 用于图像操纵的通用一致轨迹模型 2403.12510v4 -
634 07-21 Towards Reliable, Uncertainty-Aware Alignment Zuverlässige, unsichere Ausrichtung 实现可靠、不确定和不确定的软件统一 2507.15906v1 -
635 07-21 Trade-offs between elective surgery rescheduling and length-of-stay prediction accuracy Kompromisse zwischen der Neuplanung der Wahloperation und der Genauigkeit der Langzeitprognose 选择性外科重新安排与停留期预测准确性之间的权衡取舍 2507.15566v1 -
636 07-21 zkFL: Zero-Knowledge Proof-based Gradient Aggregation for Federated Learning zkFL: Null-Knowledge Proof-based Gradient Aggregation für Federated Learning zkFL: 联邦学习零知识校验渐进汇总 2310.02554v5 -
637 07-21 PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors PhysGym: Benchmarking von LLMs in der interaktiven Physik-Discovery mit kontrollierten Prioren PhysGym: 与受控前科互动物理发现中基准化LLMs 2507.15550v1 -
638 07-21 The added value for MRI radiomics and deep-learning for glioblastoma prognostication compared to clinical and molecular information Der Mehrwert für MRT-Radiomik und Deep-Learning für Glioblastom-Prognostik im Vergleich zu klinischen und molekularen Informationen 与临床和分子信息相比,MRI放射性辐射学和深层学习对于遗传性血浆瘤预测的增加值 2507.15548v1 -
639 07-21 Improving AEBS Validation Through Objective Intervention Classification Leveraging the Prediction Divergence Principle Verbesserung der AEBS-Validierung durch Ziel-Interventions-Klassifikation Begünstigung des Prinzips der Prognoseabweichung 通过利用预测差异原则的客观干预分类,改进对AEBS的验证 2507.07872v2 -
640 07-21 Data Aware Differentiable Neural Architecture Search for Tiny Keyword Spotting Applications Data Aware Differentiable Neural Architecture Search für winzige Keyword-Spotting-Anwendungen 面向微型关键词检测应用的数据感知可微神经架构搜索 2507.15545v1 -
641 07-21 Foundation Models and Transformers for Anomaly Detection: A Survey Grundlagenmodelle und Transformer zur Erkennung von Anomalien: Eine Umfrage 异常探测的基础模型和变形模型:调查 2507.15905v1 -
642 07-21 Controlled Model Debiasing through Minimal and Interpretable Updates Controlled Model Debiasing durch minimale und interpretierbare Updates 通过最小和可解释的更新减少偏差 2502.21284v2 -
643 07-21 Closed-form Solutions: A New Perspective on Solving Differential Equations Closed-form Lösungen: Eine neue Perspektive zur Lösung von Differentialgleichungen 封闭式解决办法:解决差异等量的新视角 2405.14620v4 -
644 07-21 Safe and High-Performance Learning of Model Predicitve Control using Kernel-Based Interpolation Sicheres und hochleistungsfähiges Lernen der Modellprädizitve-Steuerung mittels Kernel-basierter Interpolation 利用以内核为基础的内流内插,安全而高绩效地学习示范先决控制模型 2410.06771v2 -
645 07-21 An Investigation of Test-time Adaptation for Audio Classification under Background Noise Eine Untersuchung der Testzeitanpassung für die Audioklassifikation unter Hintergrundgeräuschen 关于背景噪音下音频分类的试验时间适应情况调查 2507.15523v1 -
646 07-21 Dictionary-Learning-Based Data Pruning for System Identification Wörterbuch-Learning-basierte Datenprüfung für die Systemidentifikation 用于系统识别的词典 – – 以学习为基础的数据保护 2502.11484v2 -
647 07-21 MDNF: Multi-Diffusion-Nets for Neural Fields on Meshes MDNF: Multi-Diffusionsnetze für neurale Felder auf Maschen MDNF:Mshes神经场多传播网络 2409.03034v2 -
648 07-21 Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback Off-Policy korrigierte Prämienmodellierung für verstärktes Lernen aus menschlichem Feedback 利用人类反馈加强学习的非政策纠正奖励模型 2507.15507v1 -
649 07-21 ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution ASPERA: Eine simulierte Umgebung, um Planung für komplexe Aktionen zu bewerten ASPERA:评估复杂行动执行规划的模拟环境 2507.15501v1 -
650 07-21 Ranking-Based At-Risk Student Prediction Using Federated Learning and Differential Features Rankingbasierte At-Risk-Prognose von Studenten mit Federated Learning und Differential Features 利用联邦学习和不同特点,按等级排列的在风险时学生预测 2505.09287v2 -
651 07-21 Fast-VAT: Accelerating Cluster Tendency Visualization using Cython and Numba Fast-VAT: Beschleunigung der Cluster-Tendenzvisualisierung mit Cython und Numba Fast-VAT:使用Cython和Numba加速聚类趋势可视化 2507.15904v1 -
652 07-21 Dense-depth map guided deep Lidar-Visual Odometry with Sparse Point Clouds and Images Tiefe Karte geführte tiefe Lidar-Visual-Odometrie mit Sparse Point Clouds und Bildern 基于稀疏点云和图像的稠密深度图引导深度激光雷达-视觉里程计 2507.15496v1 -
653 07-21 Bayesian Optimization for Molecules Should Be Pareto-Aware Bayesian Optimierung für Moleküle sollte Pareto-Bewusst sein Bayesian Bayesian 分子优化应该是 Pareto- Aware 2507.13704v2 -
654 07-21 OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning OMoE: Diversifizierende Mischung aus Low-Rank-Anpassung durch Orthogonal Finetuning OMoE:通过矫形微调使低Rank适应混合体多样化 2501.10062v2 -
655 07-21 Low-dimensional Functions are Efficiently Learnable under Randomly Biased Distributions Low-dimensionale Funktionen sind unter Randomly Biased Distributions effizient erlernbar 低维函数可在随机偏差分布下高效学习 2502.06443v2 -
656 07-21 Information Preserving Line Search via Bayesian Optimization Informationen Erhaltung der Liniensuche über Bayesian Optimization 通过 Bayesian 最佳优化保存信息 2507.15485v1 -
657 07-21 The Constitutional Controller: Doubt-Calibrated Steering of Compliant Agents Der Verfassungsverantwortliche: Zweifelsfrei gesteuerte Steuerung von konformen Agenten 宪制主计长:经核查的反怀疑管制人员指导员 2507.15478v1 -
658 07-21 How to Leverage Predictive Uncertainty Estimates for Reducing Catastrophic Forgetting in Online Continual Learning Wie man Predictive-Uncertainty-Schätzungen zur Verringerung des katastrophalen Vergessens im Online Continual Learning nutzt 如何利用预测的不确定性估算来减少在线持续学习中的灾难性遗忘 2407.07668v3 -
659 07-21 An Adaptive Random Fourier Features approach Applied to Learning Stochastic Differential Equations Ein adaptives Random Fourier Features Ansatz angewandt, um stochastische Differentialgleichungen zu lernen 用于学习斯托卡差异等量的适应性随机随机四变特性方法 2507.15442v1 -
660 07-21 The calculus of variations of the Transformer on the hyperspherical tangent bundle Die Variationsrechnung des Transformers auf dem hypersphärischen Tangentenbündel 超球正切捆绑上变形器变形的微积分 2507.15431v1 -
661 07-21 SynthCTI: LLM-Driven Synthetic CTI Generation to enhance MITRE Technique Mapping SynthCTI: LLM-getriebene synthetische CTI-Generation zur Verbesserung der MITRE-Technikmapping 合成技术:利用LLM-Driven 合成CTI新一代,加强MITRE技术绘图 2507.16852v1 -
662 07-21 Attend or Perish: Benchmarking Attention in Algorithmic Reasoning Teilnahme oder Perish: Benchmarking-Achtung bei algorithmischer Vernunft 出勤或风险:在算法理由中设定关注基准 2503.01909v2 -
663 07-21 STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning STUN: Strukturierte und dann unstrukturierte Pruning für skalierbare MoE Pruning STUN: 面向可扩展MoE剪枝的先结构化后非结构化剪枝 2409.06211v2 -
664 07-21 Predictive Process Monitoring Using Object-centric Graph Embeddings Predictive Process Monitoring mit objektzentrierten Graphen-Einbettungen 利用以物体为中心的图示嵌入器进行预测过程监测 2507.15411v1 -
665 07-21 Towards Mitigation of Hallucination for LLM-empowered Agents: Progressive Generalization Bound Exploration and Watchdog Monitor Zur Milderung der Halluzination für LLM-fähige Agenten: Progressive Generalization Bound Exploration und Watchdog Monitor 努力减少LLM-动力剂的幻觉:逐步普遍化探矿和监视监测仪表监测 2507.15903v1 -
666 07-21 MAP Estimation with Denoisers: Convergence Rates and Guarantees MAP-Schätzung mit Denoisern: Konvergenzraten und Garantien MAP 与Denoisers的估算:趋同率和保障 2507.15397v1 -
667 07-21 Learning to Gridize: Segment Physical World by Wireless Communication Channel Gridize lernen: Segment Physical World per Wireless Communication Channel 学习网络化:通过无线通信频道进行分形物理世界 2507.15386v1 -
668 07-21 To Label or Not to Label: PALM – A Predictive Model for Evaluating Sample Efficiency in Active Learning Models Beschriftung oder Nichtbeschriftung: PALM - ein vorausschauendes Modell zur Bewertung der Probeneffizienz in aktiven Lernmodellen 标签或非标签标签:PALM – – 积极学习模式样本效率评价预测模型 2507.15381v1 -
669 07-21 RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark RL4CO: ein umfangreiches Verstärkungslernen für kombinatorische Optimierungs-Benchmark RL4CO:综合优化基准的广泛强化学习 2306.17100v6 -
670 07-21 Proficient Graph Neural Network Design by Accumulating Knowledge on Large Language Models Proficient Graph Neural Network Design durch Akkumulation von Wissen über große Sprachmodelle 通过积累关于大语言模型的知识设计精巧的图形神经网络 2408.06717v2 -
671 07-21 EEG-based Epileptic Prediction via a Two-stage Channel-aware Set Transformer Network EEG-basierte epileptische Vorhersage über ein zweistufiges Channel-aware Set Transformer Network 通过两阶段通道感知Set Transformer网络进行基于脑电的癫痫预测 2507.15364v1 -
672 07-21 Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation Meta4XNLI: Ein Crosslingual Parallel Corpus für die Erkennung und Interpretation von Metaphoren Meta4XNLI: 用于识别和解释代名词的跨语言平行体 2404.07053v3 -
673 07-21 Constrained Optimal Fuel Consumption of HEVs under Observational Noise Eingeschränkter optimaler Kraftstoffverbrauch von HEV unter Beobachtungslärm 在观测噪音下控制最佳燃料消耗 2410.20913v2 -
674 07-21 Efficient Visual Appearance Optimization by Learning from Prior Preferences Effiziente optische Erscheinungsbildsoptimierung durch Lernen aus vorherigen Präferenzen 学习从先前优惠制获得最佳优化 2507.15355v1 -
675 07-21 ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events ChronoSense: Erforschen des zeitlichen Verständnisses in großen Sprachmodellen mit Zeitintervallen von Ereignissen ChronoSense:利用事件时间区间探索大型语言模型中的时间理解 2501.03040v2 -
676 07-21 Scaling Decentralized Learning with FLock Skalierung dezentrales Lernen mit FLock 与 FLock 的分散化学习 2507.15349v1 -
677 07-21 Probing Information Distribution in Transformer Architectures through Entropy Analysis Probing Information Distribution in Transformer-Architekturen durch Entropie-Analyse 通过 Entropy 分析在变形结构中进行测试信息发布 2507.15347v1 -
678 07-21 LionGuard 2: Building Lightweight, Data-Efficient & Localised Multilingual Content Moderators LionGuard 2: Leichte, dateneffiziente und lokalisierte Mehrsprachige Inhaltsmoderatoren bauen 狮子座标2:轻量、数据效率和本地化多语种内容主持人 2507.15339v1 -
679 07-21 Beyond Model Base Selection: Weaving Knowledge to Master Fine-grained Neural Network Design Jenseits der Modell-Basis-Auswahl: Wissen weben, um feinkörniges neurales Netzwerk-Design zu meistern 超越示范基础选择:将知识编织到精巧神经网络设计硕士 2507.15336v1 -
680 07-21 Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation Mixture-of-Recursions: Dynamische Rekursive Tiefen für adaptive Token-Level-Computation lernen 混合流流流:学习适应调控级计算法的动态回流深度 2507.10524v2 -
681 07-21 Language Generation in the Limit: Noise, Loss, and Feedback Sprachgenerierung im Limit: Lärm, Verlust und Feedback 限制范围内的语言生成:噪音、损失和反馈 2507.15319v1 -
682 07-21 Universal crystal material property prediction via multi-view geometric fusion in graph transformers Universelle Kristallmaterial-Eigenschaftsvorhersage über Multi-View-Geometrische Fusion in Graphentransformatoren 通过在图形变压器中多视图几几何聚合预测通用晶体物质特性 2507.15303v1 -
683 07-21 JAMUN: Bridging Smoothed Molecular Dynamics and Score-Based Learning for Conformational Ensembles JAMUN: Überbrückung geglätteter molekularer Dynamik und Score-basiertes Lernen für konformationelle Ensembles JAMUN:连接通融的分子动态和基于分数的学习,以便组成组合 2410.14621v2 -
684 07-21 Variational Mode-Driven Graph Convolutional Network for Spatiotemporal Traffic Forecasting Variationelles modegetriebenes Graphenkonvolutionales Netzwerk für die räumliche Verkehrsprognose 瞬时交通流量预测变化模式驱动图集演变网络 2408.16191v3 -
685 07-21 Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown Feel-Good Thompson Sampling für Kontext Bandits: ein Markov-Kette Monte Carlo Showdown 汤普森对背景强盗的抽样:马可夫链条蒙特卡洛秀 2507.15290v1 -
686 07-21 Preferential subspace identification (PSID) with forward-backward smoothing Präferenzielle Subraum-Identifikation (PSID) mit nach vorne gerichteter Glättung 优先次空间识别(PSID),前向平滑 2507.15288v1 -
687 07-21 Mixture of Autoencoder Experts Guidance using Unlabeled and Incomplete Data for Exploration in Reinforcement Learning Mischung von Autoencoder Experten Anleitung mit unmarkierten und unvollständigen Daten für die Exploration in Verstärkungs-Lernen 使用无标签和不完整数据进行强化学习探索的自动编码器混合专家指导 2507.15287v1 -
688 07-21 Machine Unlearning for Streaming Forgetting Maschine-Entlernen für Streaming Vergessen 为流出遗忘而取消机器学习 2507.15280v1 -
689 07-21 Temporal Basis Function Models for Closed-Loop Neural Stimulation Temporale Basis-Funktionsmodelle für die Closed-Loop-Neuralstimulation 闭闭路神经刺激的时时基础功能模型 2507.15274v1 -
690 07-21 Developing Cryptocurrency Trading Strategy Based on Autoencoder-CNN-GANs Algorithms Entwicklung einer Cryptowährungs-Handelsstrategie auf der Grundlage von Autoencoder-CNN-GAN-Algorithmen 制定基于自动编码器-CNN-GANs算法的加密货币交易战略 2412.18202v6 -
691 07-21 Self-Tuning Self-Supervised Image Anomaly Detection Selbst-Tuning Selbst-überwachte Bildanomalie-Erkennung 自调优的自监督图像异常检测 2306.12033v3 -
692 07-21 Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning Token-Space-Gradient-Konflikte lösen: Token Space-Manipulation für transformerbasiertes Multi-Task-Learning 解决 Token- Space 渐变冲突: 用于以变换器为基础的多任务学习的 Token 空间操纵 2507.07485v2 -
693 07-21 CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers CHORDS: Diffusions-Probenahmebeschleuniger mit multicore-hierarchischen ODE-Solvers CHORDS: 多分级等级式极分解解码器扩散取样加速器 2507.15260v1 -
694 07-21 Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training Aufgabenverhalten synchronisieren: Mehrere Aufgaben während der Test-Time-Schulung ausrichten 同步任务行为: 测试时训练中对齐多个任务 2507.07778v2 -
695 07-21 Physics-Informed Learning of Proprietary Inverter Models for Grid Dynamic Studies Physik-informiertes Lernen von proprietären Wechselrichtermodellen für Grid Dynamic Studies 电网动态研究专有反转器模型物理学习 2507.15259v1 -
696 07-21 Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning Interaction-Merged Motion Planning: Diverse Motion-Datensätze für robuste Planung effektiv nutzen 交互式组合式动态规划:有效利用多种移动式数据集进行强力规划 2507.04790v2 -
697 07-21 MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations MEETI: Ein multimodaler EKG-Datensatz von MIMIC-IV-ECG mit Signalen, Bildern, Features und Interpretationen MEETI: 源自MIMIC-IV-ECG的多模态心电图数据集,包含信号、图像、特征和解读 2507.15255v1 -
698 07-21 Disentangling Homophily and Heterophily in Multimodal Graph Clustering Entwirren von Homophilie und Heterophilie in multimodalen Graphenclustern 在多式图表群集中分离同形和异形 2507.15253v1 -
699 07-21 Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer Testen der Spin-Bad-Ansicht der Selbstachtung: Eine Hamiltonian Analyse von GPT-2 Transformer 测试自觉自觉的自吹泡泡视图:汉密尔顿对GPT-2变形器的分析 2507.00683v4 -
700 07-21 A Practical Guide for Evaluating LLMs and LLM-Reliant Systems Ein praktischer Leitfaden für die Bewertung von LLMs und LLM-Reliant Systemen 评估LLM及依赖LLM的系统的实用指南 2506.13023v2 -
701 07-21 Improving the Generation of VAEs with High Dimensional Latent Spaces by the use of Hyperspherical Coordinates Verbesserung der Generierung von VAEs mit hochdimensionalen Latentenräumen durch den Einsatz von Hypersphärischen Koordinaten 通过使用超球坐标改进具有高维维度低空空间的VAE的生成 2507.15900v1 -
702 07-21 Spatio-Temporal Demand Prediction for Food Delivery Using Attention-Driven Graph Neural Networks Spatio-Temporale Nachfragevorhersage für die Lebensmittellieferung mit aufmerksamkeitsgetriebenem Graphen-Neural-Netzwerk 利用引人注意的图形神经网络对食品提供情况进行时需求预测 2507.15246v1 -
703 07-21 EVOLVE-X: Embedding Fusion and Language Prompting for User Evolution Forecasting on Social Media EVOLVE-X: Integration von Fusionen und Sprachen für die Prognose der Nutzerentwicklung in sozialen Medien EVOLVE-X:社会媒体用户演变预测的嵌入融合和语言提示 2507.16847v1 -
704 07-21 Cross-Domain Few-Shot Learning with Coalescent Projections and Latent Space Reservation Domänenübergreifendes Few-Shot-Lernen mit koaleszierenden Projektionen und Latent Space Reservation 基于聚合投影和潜在空间预留的跨域小样本学习 2507.15243v1 -
705 07-21 Exact Reformulation and Optimization for Direct Metric Optimization in Binary Imbalanced Classification Exakte Neuformulierung und Optimierung für die direkte Metrische Optimierung in der binären Imbalanced Classification 二元平衡分类中直接计量优化的精确调整和优化 2507.15240v1 -
706 07-21 HEPPO-GAE: Hardware-Efficient Proximal Policy Optimization with Generalized Advantage Estimation HEPPO-GAE: Hardwareeffiziente proximale Politikoptimierung mit generalisierter Vorteilsschätzung HEPPO-GAE: 采用通用的先进估计法优化政策 2501.12703v2 -
707 07-21 SOI Matters: Analyzing Multi-Setting Training Dynamics in Pretrained Language Models via Subsets of Interest SOI Matters: Analyse von Multi-Setting-Trainingsdynamiken in vorgebildeten Sprachmodellen über Teilmengen von Interesse SOI事项:分析通过利益子集分析培训前语言模式中多设置培训动态 2507.15236v1 -
708 07-21 Accelerated Bayesian Optimal Experimental Design via Conditional Density Estimation and Informative Data Beschleunigte Bayesian Optimal Experimental Design über Conditional Density Abschätzung und Informative Data 通过有条件密度估计和信息数据快速加速的巴伊西亚最佳实验设计 2507.15235v1 -
709 07-21 Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning Plan und Budget: Effektive und effiziente Test-Zeit-Skalierung auf großsprachliche Modell-Reasoning 计划和预算:关于大语言示范理由的高效率、高效益、高效率的测试时间 2505.16122v2 -
710 07-21 Multimodal Fine-grained Reasoning for Post Quality Evaluation Multimodale feinkörnige Begründung für die Bewertung der Post-Qualität 高质量后评价的多式联运优化理由 2507.17934v1 -
711 07-21 Robust and Differentially Private PCA for non-Gaussian data Robuste und differenziell private PCA für nicht-gaußsche Daten 面向非高斯数据的鲁棒差分隐私PCA 2507.15232v1 -
712 07-21 Temporal Conformal Prediction (TCP): A Distribution-Free Statistical and Machine Learning Framework for Adaptive Risk Forecasting Temporal Conformal Prediction (TCP): Ein verteilungsfreies statistisches und maschinelles Lernkonzept für adaptive Risikoprognosen 时空危机预测:用于适应风险预测的不分发的统计和机器学习框架 2507.05470v2 -
713 07-21 Structural DID with ML: Theory, Simulation, and a Roadmap for Applied Research Strukturelle DID mit ML: Theorie, Simulation und Fahrplan für angewandte Forschung ML结构:理论、模拟和应用研究路线图 2507.15899v1 -
714 07-21 Solving Formal Math Problems by Decomposition and Iterative Reflection Formale Math-Probleme durch Zersetzung und iterative Reflexion lösen 通过分解和迭代反射解决正规数学问题 2507.15225v1 -
715 07-21 Return Capping: Sample-Efficient CVaR Policy Gradient Optimisation Return Capping: Sample-Efficient CVaR Policy Gradient Optimierung 返回标记: 样本有效 CVaR 政策级政策优化优化 2504.20887v2 -
716 07-21 Misspecifying non-compensatory as compensatory IRT: analysis of estimated skills and variance Unbestimmtes Nicht-Kompensatorisches als kompensatorisches IRT: Analyse der geschätzten Fähigkeiten und Varianz 排除作为补偿性IRT的非补偿性补偿:估计技能和差异分析 2507.15222v1 -
717 07-21 Detecting PTSD in Clinical Interviews: A Comparative Analysis of NLP Methods and Large Language Models PTSD in klinischen Interviews erkennen: Eine vergleichende Analyse von NLP-Methoden und großen Sprachmodellen 临床访谈中检测创伤后创伤后精神紧张症:国家语言规划方法和大语言模式的比较分析 2504.01216v2 -
718 07-21 Benchmarking Mobile Device Control Agents across Diverse Configurations Benchmarking Mobile Device Control Agents über verschiedene Konfigurationen hinweg 制定跨不同配置的移动设备控制工具基准 2404.16660v3 -
719 07-21 A Large Language Model-Enhanced Q-learning for Capacitated Vehicle Routing Problem with Time Windows Ein großes Sprachmodell-erweitertes Q-Lernen für kapazitierte Fahrzeugrouting-Probleme mit Zeitfenstern 用时间窗口解决机动车辆停放问题大型语文强化快速学习模型 2505.06178v2 -
720 07-21 Federated Continual Instruction Tuning Föderated Continual Instruction Tuning 联邦连续教学 2503.12897v2 -
721 07-21 RetroDiff: Retrosynthesis as Multi-stage Distribution Interpolation RetroDiff: Retrosynthese als mehrstufige Verteilungsinterpolation RetroDiff: 作为多阶段分销的回溯合成 2311.14077v2 -
722 07-21 Joint-Local Grounded Action Transformation for Sim-to-Real Transfer in Multi-Agent Traffic Control Gemeinsam-Lokale Erdungstransformation für Sim-to-Real-Transfer in Multi-Agent Traffic Control 在多机构交通管制中进行即时到实物转移的联合-当地行动转变 2507.15174v1 -
723 07-21 Better Models and Algorithms for Learning Ising Models from Dynamics Bessere Modelle und Algorithmen zum Lernen von Ising-Modellen aus der Dynamik 从动态中学习Ising模型的更好的模型和算法 2507.15173v1 -
724 07-21 ReDi: Rectified Discrete Flow ReDi: Rektifizierter diskreter Fluss Redi: 纠正的分异流 2507.15897v1 -
725 07-21 LaViPlan : Language-Guided Visual Path Planning with RLVR LaViPlan : Sprachgeführte visuelle Pfadplanung mit RLVR Laviplan: RLVR 语言引导视觉路径规划 2507.12911v2 -
726 07-20 (7) Designing User-Centric Metrics for Evaluation of Counterfactual Explanations Designing User-Centric Metrics für die Auswertung von gegenfaktischen Erklärungen 设计用于评价反事实解释的用户中心计量器 2507.15162v1 -
727 07-20 Resonant-Tunnelling Diode Reservoir Computing System for Image Recognition Resonant-Tunnelling Diode Reservoir Computing System für die Bilderkennung 用于图像识别的共振隧穿二极管储备池计算系统 2507.15158v1 -
728 07-20 CBAGAN-RRT: Convolutional Block Attention Generative Adversarial Network for Sampling-Based Path Planning CBAGAN-RRT: Convolutional Block Attention Generatives Adversarial Network für die stichprobengestützte Pfadplanung CBAGAN-RRT: 以抽样为基础的路径规划革命性阻力引引引反向网络 2305.10442v3 -
729 07-20 Constraint-aware Learning of Probabilistic Sequential Models for Multi-Label Classification Constraint-aware Learning of Probabilistic Sequential Models for Multi-Label Classification 严格了解多标签分类概率序列模型 2507.15156v1 -
730 07-20 Studying Classifier(-Free) Guidance From a Classifier-Centric Perspective Studieren Klassifikator(-frei) Anleitung aus einer klassifikator-zentralen Perspektive 从分类分类中心角度研究分类(无)指导 2503.10638v2 -
731 07-20 What Level of Automation is “Good Enough”? A Benchmark of Large Language Models for Meta-Analysis Data Extraction Welche Stufe der Automatisierung ist “Gut genug”? Ein Benchmark für große Sprachmodelle für die Meta-Analyse-Datenextraktion 自动化的等级是“好到好”? 元分析数据提取大语言模式的基准 2507.15152v1 -
732 07-20 Design of an Edge-based Portable EHR System for Anemia Screening in Remote Health Applications Design eines Edge-basierten tragbaren EHR-Systems für die Anämie-Screening in Remote Health-Anwendungen 设计一个以边缘为基础的远程保健应用中贫血筛查的便携EHR系统 2507.15146v1 -
733 07-20 Quantum Machine Learning for Secure Cooperative Multi-Layer Edge AI with Proportional Fairness Quantum Machine Learning für sichere kooperative Multi-Layer Edge KI mit proportionaler Fairness 以比例公平方式进行量子学习,确保多层合作和多层边缘安全合作 2507.15145v1 -
734 07-20 Transforming Datasets to Requested Complexity with Projection-based Many-Objective Genetic Algorithm Transformation von Datensätzen auf geforderte Komplexität mit Projektions-basiertem Viel-Objektive-Genetischen Algorithmus 将数据集转换为具有基于投影的多目标遗传算法的复杂度 2507.15132v1 -
735 07-20 A Semantic-based Optimization Approach for Repairing LLMs: Case Study on Code Generation Ein semantisch-basierter Optimierungsansatz zur Reparatur von LLMs: Fallstudie zur Codegenerierung 修复LLM的基于语义的优化方法:关于代码生成的案例研究 2503.12899v3 -
736 07-20 Restrictions on Physical Stochastic Reservoir Computers Beschränkungen der physikalischen stochastischen Speicherrechner 限制物理储藏电脑 2307.14474v5 -
737 07-20 Are We Overlooking the Dimensions? Learning Latent Hierarchical Channel Structure for High-Dimensional Time Series Forecasting Übersehen wir die Dimensionen? Lernen Latent Hierarchical Channel Struktur für High-Dimensional Time Series Forecasting 我们是不是忽略了维度?学习高维时间序列预测的潜在层次通道结构 2507.15119v1 -
738 07-20 Graph Attention Networks for Detecting Epilepsy from EEG Signals Using Accessible Hardware in Low-Resource Settings Graph Aufmerksamkeit Netzwerke zur Erkennung von Epilepsie von EEG-Signalen mit zugänglicher Hardware in Low-Resource-Einstellungen 低资源设置设置中使用无障碍硬件从EEG信号中检测出癫痫的图示关注网络 2507.15118v1 -
739 07-20 Distributional Unlearning: Forgetting Distributions, Not Just Samples Verteilungsbezogenes Verlernen: Verteilungen vergessen, nicht nur Stichproben 分布式遗忘学习:遗忘分布,而不仅仅是样本 2507.15112v1 -
740 07-20 LoopNet: A Multitasking Few-Shot Learning Approach for Loop Closure in Large Scale SLAM LoopNet: Ein Multitasking-Few-Shot-Lernansatz für Loop Closure in Large Scale SLAM LoopNet:大规模SLAM回环闭合的多任务少样本学习方法 2507.15109v1 -
741 07-20 Beyond Sin-Squared Error: Linear-Time Entrywise Uncertainty Quantification for Streaming PCA Über Sin-Squared-Fehler hinaus: Linear-Time Entrywise Uncertainty Quantification for Streaming PCA 超越Sin-Squared误差:流式PCA的线性时间逐元素不确定性量化 2506.12655v2 -
742 07-20 AnalogFed: Federated Discovery of Analog Circuit Topologies with Generative AI AnalogFed: Federated Discovery of Analog Circuit Topologies with Generative AI 模拟: 具有生成性人工智能的模拟电路地形的联邦发现 2507.15104v1 -
743 07-20 PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation PhysioWave: Multi-Scale Wavelet-Transformer für Physiologische Signaldarstellung PhysioWave: 生理信号代表的多阶段波盘转换器 2506.10351v2 -
744 07-20 Learning under Latent Group Sparsity via Diffusion on Networks Lernen unter Latent Group Sparsity über Diffusion in Netzwerken 通过网络传播在中端群体平等下学习 2507.15097v1 -
745 07-20 Enhancing Lung Disease Diagnosis via Semi-Supervised Machine Learning Verbesserung der Diagnose von Lungenerkrankungen durch semi-überwachtes maschinelles Lernen 通过半监督机器学习加强肺病诊断 2507.16845v1 -
746 07-20 ModelVerification.jl: a Comprehensive Toolbox for Formally Verifying Deep Neural Networks ModelVerification.jl: eine umfassende Toolbox zur formalen Überprüfung tiefer neuraler Netzwerke 模型核查.jl:用于正式核查深神经网络的综合工具箱 2407.01639v2 -
747 07-20 Simulation-Prior Independent Neural Unfolding Procedure Simulation-Prior Unabhängiges Neural-Entfaltungsverfahren 与模拟先验无关的神经展开方法 2507.15084v1 -
748 07-20 Beyond Win Rates: A Clustering-Based Approach to Character Balance Analysis in Team-Based Games Beyond Win Rates: Ein Clustering-basierter Ansatz zur Charakter-Balance-Analyse in Team-Based Games 超越赢率:在团队运动会中采用基于集群办法进行性平衡分析 2502.01250v2 -
749 07-20 PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training PGT-I: Scaling Spatiotemporal GNNs mit speichereffizientem verteiltem Training PGT-I: 通过内存高效的分布式训练扩展时空GNN 2507.11683v2 -
750 07-20 Robust Control with Gradient Uncertainty Robuste Steuerung mit gradienter Unsicherheit 带渐变不确定性的强力控制 2507.15082v1 -
751 07-20 Knowing When to Quit: Probabilistic Early Exits for Speech Separation Zu wissen, wann man aufhören soll: probabilistische frühe Ausgänge für Sprachtrennung 了解何时退出:语言分离的概率早期出场 2507.09768v2 -
752 07-20 Isotonic Quantile Regression Averaging for uncertainty quantification of electricity price forecasts Isotonische Quantile Regression Mittelung der Unsicherheit Quantifizierung der Strompreisprognosen 电价预测量化不确定性的误差 2507.15079v1 -
753 07-20 Reinforcement Learning for Flow-Matching Policies Verstärktes Lernen für Flow-Matching-Politiken 流动派接政策强化学习 2507.15073v1 -
754 07-20 ROBAD: Robust Adversary-aware Local-Global Attended Bad Actor Detection Sequential Model ROBAD: Robustes Adversary-aware Local-Global besucht Bad Actor Detection Sequential Model ROBAD: 强力反逆觉觉悟当地-全球 出席的不良行为器检测序列模型 2507.15067v1 -
755 07-20 Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback Time-RA: Auf dem Weg zu einer Zeitreihe, die mit LLM Feedback zu Anomalie führt 时间-RA:采用LLM反馈办法为异常情况寻找时间序列理由 2507.15066v1 -
756 07-20 Quantum Annealing for Machine Learning: Applications in Feature Selection, Instance Selection, and Clustering Quantenannealing für maschinelles Lernen: Anwendungen in der Feature Selection, Instance Selection und Clustering 机器学习的量子退火:特征选择、实例选择和聚类方面的应用 2507.15063v1 -
757 07-20 Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper Touch in the Wild: Fine-Grained Manipulation mit einem tragbaren Visuo-Taktilen Greifer lernen 野外触觉:学习用便携式视触觉夹爪进行精细操作 2507.15062v1 -
758 07-20 LibLMFuzz: LLM-Augmented Fuzz Target Generation for Black-box Libraries LibLMFuzz: LLM-Augmented Fuzz Target Generation für Black-Box-Bibliotheken LibLMFuzz: 黑盒图书馆LLM- 推荐的模糊目标生成 2507.15058v1 -
759 07-20 Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting Frequenzabhängige Wissensdestillation für leichte Spatiotemporale Vorhersagen 轻量度对时预报知识蒸馏 2507.02939v2 -
760 07-20 Integrating Reason-Based Moral Decision-Making in the Reinforcement Learning Architecture Integrieren von reason-based Moral Decision-Making in die Lernarchitektur der Stärkung 将合理道德决策纳入强化学习架构 2507.15895v1 -
761 07-20 OpenBreastUS: Benchmarking Neural Operators for Wave Imaging Using Breast Ultrasound Computed Tomography OpenBreastUS: Benchmarking Neural Operators für Wave Imaging mit Breast Ultrasound Computed Tomography OpenBreastUS:使用乳腺超声计算机断层成像对波成像神经算子进行基准测试 2507.15035v1 -
762 07-20 The hunt for new pulsating ultraluminous X-ray sources: a clustering approach Die Jagd nach neuen pulsierenden ultrahellen Röntgenquellen: ein Clustering-Ansatz 寻找新的脉动极光X光新来源:集群办法 2507.15032v1 -
763 07-20 Integrating Newton’s Laws with deep learning for enhanced physics-informed compound flood modelling Integration von Newtons Gesetzen mit tiefem Lernen für verbesserte Physik-informierte Mischflutmodellierung 将牛顿法律与深层学习相结合,加强物理学知情复合物洪水建模 2507.15021v1 -
764 07-20 Sampling Decisions Stichprobenentscheidungen 抽样决定 2503.14549v2 -
765 07-20 Can Mental Imagery Improve the Thinking Capabilities of AI Systems? Kann Mental Imagery die Denkfähigkeiten von KI-Systemen verbessern? 精神形象能提高人工智能系统的思考能力吗? 2507.12555v2 -
766 07-20 Credit Risk Analysis for SMEs Using Graph Neural Networks in Supply Chain Kreditrisikoanalyse für KMU mit Hilfe von Graph Neural Networks in der Lieferkette 利用供应链中图表神经网络的中小企业信贷风险分析 2507.07854v2 -
767 07-20 Neural networks for bifurcation and linear stability analysis of steady states in partial differential equations Neurale Netze zur Bifurkation und linearen Stabilitätsanalyse von Steady States in partiellen Differentialgleichungen 以部分差异方程对稳定状态进行双向和线性稳定分析的神经网络 2407.19707v4 -
768 07-20 The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering Der Aufstieg von KI-Teamkollegen in der Software-Engineering (SE) 3.0: Wie autonome Coding-Agenten Software-Engineering umgestalten AI软件工程(SE)3.0:自动编码代理人如何重组软件工程 2507.15003v1 -
769 07-20 Clustered Federated Learning for Generalizable FDIA Detection in Smart Grids with Heterogeneous Data Clustered Federated Learning for Generalizable FDIA Detection in Smart Grids with Heterogeneous Data 面向异构数据智能电网中可泛化FDIA检测的聚类联邦学习 2507.14999v1 -
770 07-20 Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression Sprachintegration in multimodale Großsprachenmodelle für die Bild-basierte Regression 以图像为基础的倒退的精细调整多式大语言模型中的语言融合 2507.14997v1 -
771 07-20 Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees Greedy Low-Rank Gradient Compression für verteiltes Lernen mit Konvergenzgarantien 利用聚合担保分配学习的贪婪低频梯度压缩 2507.08784v2 -
772 07-20 TD-Interpreter: Enhancing the Understanding of Timing Diagrams with Visual-Language Learning TD-Interpreter: Mit Visual-Language-Lernen das Verständnis von Timing-Diagrammen verbessern TD-解释:用视觉语言学习增进对时间图的了解 2507.16844v1 -
773 07-20 AlphaAlign: Incentivizing Safety Alignment with Extremely Simplified Reinforcement Learning AlphaAlign: Förderung der Sicherheitsausrichtung mit extrem vereinfachtem Verstärkungslernen 字母名称:以极其简化的强化学习方式激励安全调整 2507.14987v1 -
774 07-20 FedWCM: Unleashing the Potential of Momentum-based Federated Learning in Long-Tailed Scenarios FedWCM: Entfesseln des Potenzials von Momentum-basiertem Föderierten Lernen in langanhaltenden Szenarien FedWCM:在长期失败情况下释放基于动力的联邦学习潜力 2507.14980v1 -
775 07-20 A Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books Eine vergleichende Analyse statistischer und maschineller Lernmodelle zur Erkennung von Ausreißern in Bitcoin Limit Order Books Bitcoin限价订单簿中用于异常值检测的统计和机器学习模型比较分析 2507.14960v1 -
776 07-20 FullRecall: A Semantic Search-Based Ranking Approach for Maximizing Recall in Patent Retrieval FullRecall: Ein semantischer Search-Based-Ranking-Ansatz zur Maximierung des Recalls im Patent Retrieval 完全回想:在专利检索中最大限度地回想的语义搜索排名法 2507.14946v1 -
777 07-20 Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey Vertrauenswürdige Text-zu-Bild-Diffusionsmodelle: Eine zeitgerechte und fokussierte Umfrage 可信赖的文本到图像传播模型:及时和有重点的调查 2409.18214v2 -
778 07-20 SOC-DGL: Social Interaction Behavior Inspired Dual Graph Learning Framework for Drug-Target Interaction Identification SOC-DGL: Social Interaction Behavior Inspired Dual Graph Learning Framework for Drug-Target Interaction Identification SOC-DGL:由社会互动行为启发的药物目标互动识别双重图示学习框架 2506.01405v2 -
779 07-20 Measuring Leakage in Concept-Based Methods: An Information Theoretic Approach Messung von Leckagen in konzeptbasierten Methoden: Ein informationenstheoretischer Ansatz 衡量基于概念方法中的流失:信息理论方法 2504.09459v2 -
780 07-20 Old Rules in a New Game: Mapping Uncertainty Quantification to Quantum Machine Learning Alte Regeln in einem neuen Spiel: Mapping Uncertainty Quantification to Quantum Machine Learning 新游戏中的旧规则: 将不确定性量化成量子机器学习 2507.14919v1 -
781 07-20 Interactive proofs for verifying (quantum) learning and testing Interaktive Nachweise für das (Quantum-)Lernen und Testen 用于核实(量表)学习和测试的交互式证明 2410.23969v2 -
782 07-20 Partial Symmetry Enforced Attention Decomposition (PSEAD): A Group-Theoretic Framework for Equivariant Transformers in Biological Systems Partielle Symmetrie verstärkter Aufmerksamkeitsabbau (PSEAD): Ein gruppentheoretischer Rahmen für äquivalente Transformer in biologischen Systemen 部分对称强制强制注意力分解:生物系统中等离异变异变异器的集团理论框架 2507.14908v1 -
783 07-20 5G Traffic Prediction with Time Series Analysis 5G Verkehrsvorhersage mit Zeitreihenanalyse 5G 具有时间序列分析的交通预测 2110.03781v2 -
784 07-20 Learning Nonlinear Causal Reductions to Explain Reinforcement Learning Policies Nichtlineares Erlernen von Ursachenreduktionen zur Erklärung von Maßnahmen zur Stärkung des Lernens 解释加强学习政策的非线性因果减量 2507.14901v1 -
785 07-20 Application-Specific Component-Aware Structured Pruning of Deep Neural Networks via Soft Coefficient Optimization Anwendungsspezifische Komponente-Bewusst strukturierte Pruning Deep Neural Networks durch Soft Coefficient Optimization 通过软合效益优化对深神经网络进行调节 2507.14882v1 -
786 07-20 Enhanced Pruning Strategy for Multi-Component Neural Architectures Using Component-Aware Graph Analysis Verbesserte Pruning-Strategie für Mehrkomponenten-Neuralarchitekturen unter Verwendung von Komponenten-Aware Graphenanalyse 利用组件软件图分析,加强多功能神经结构的审慎战略 2504.13296v2 -
787 07-20 Neural Flow Samplers with Shortcut Models Neural Flow Sampler mit Shortcut-Modellen 带有快捷模式的神经流样板 2502.07337v2 -
788 07-20 The Tsetlin Machine Goes Deep: Logical Learning and Reasoning With Graphs Die Tsetlin-Maschine geht tief: Logisches Lernen und Nachdenken mit Graphen Tsetlin 机器深层:逻辑学习和用图表解释 2507.14874v1 -
789 07-20 Recent Advances in Simulation-based Inference for Gravitational Wave Data Analysis Jüngste Fortschritte bei der simulationsbasierten Schlussfolgerung für die Analyse von Gravitationswellendaten 引力波数据分析模拟推导法最近的进展 2507.11192v3 -
790 07-20 Transformers and Ensemble methods: A solution for Hate Speech Detection in Arabic languages Transformer und Ensemble-Methoden: Eine Lösung für Hass-Spracherkennung in arabischen Sprachen 变换器和组合方法:用阿拉伯语探测仇恨言论的解决方案 2303.09823v2 -
791 07-20 A Privacy-Centric Approach: Scalable and Secure Federated Learning Enabled by Hybrid Homomorphic Encryption Ein Datenschutz-Centric-Ansatz: Skalierbares und sicheres Federated Learning durch hybride homomorphe Verschlüsselung ermöglicht 隐私中心方法:通过混合单态加密实现可扩展和安全的联邦学习 2507.14853v1 -
792 07-20 Grounding Degradations in Natural Language for All-In-One Video Restoration Erdungsdegradationen in natürlicher Sprache für die Wiederherstellung eines Video-All-in-One-Videos 全体一体行动,恢复视频 2507.14851v1 -
793 07-20 Hierarchical Multi-Agent Reinforcement Learning with Control Barrier Functions for Safety-Critical Autonomous Systems Hierarchisches Mehr-Agenten-Verstärkungs-Lernen mit Kontrollbarrierefunktionen für sicherheitskritische autonome Systeme 具有控制障碍功能的高级多机构强化学习 2507.14850v1 -
794 07-20 Vector Quantization Prompting for Continual Learning Vector Quantization Prompting für kontinuierliches Lernen 吸引持续学习的矢量量化 2410.20444v2 -
795 07-20 Time-Aware Attention for Enhanced Electronic Health Records Modeling Zeitbewusste Aufmerksamkeit für verbesserte elektronische Gesundheitsdatensysteme 提高电子健康记录强化建模时间意识关注 2507.14847v1 -
796 07-20 Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift Kalibrierte und robuste Fundamentierungsmodelle für Vision-Sprache und medizinische Bildaufgaben unter Verteilungsverschiebung 分配变化下的愿景语言和医疗图像任务模型 2507.09222v2 -
797 07-20 The Invisible Leash: Why RLVR May Not Escape Its Origin Die unsichtbare Leine: Warum RLVR seinem Ursprung nicht entkommen kann 隐形Leash:为什么RLVR不能逃离其起源 2507.14843v1 -
798 07-20 Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments Entschlüsselung des Interplays zwischen Übertragungseffekten und Belohnungsautokorrelationen in Switchback-Experimenten 在回转实验中解开结转效应与回转回实验中回调自动关系之间的交互作用 2403.17285v6 -
799 07-20 An explainable operator approximation framework under the guideline of Green’s function Ein erklärbarer Bediener-Annäherungsrahmen unter der Leitlinie der Green-Funktion Green职能准则下的可解释的运营商近似近似框架 2412.16644v2 -
800 07-20 Differentially Private Synthetic Graphs Preserving Triangle-Motif Cuts Unterschiedlich Private Synthetische Graphen Vorhalten von Dreieck-Motif-Schnitten 不同的私人合成图 保持三角-摩蒂夫剪切 2507.14835v1 -
801 07-20 Interpretable Reward Modeling with Active Concept Bottlenecks Interpretierbare Prämienmodellierung mit Active Concept Engpässen 具有主动概念瓶颈的可解释的奖励模型 2507.04695v2 -
802 07-20 eMargin: Revisiting Contrastive Learning with Margin-Based Separation eMargin: Kontrastives Lernen mit Marge-basierter Trennung eMargin: 重新审查与边际离职的矛盾学习 2507.14828v1 -
803 07-20 Efficient Visual Transformer by Learnable Token Merging Effizienter Visual Transformer durch erlernbares Token Merging 通过可学习Token合并实现高效视觉Transformer 2407.15219v2 -
804 07-20 Benchmarking Foundation Models with Multimodal Public Electronic Health Records Benchmarking-Stiftungsmodelle mit multimodalen Public Electronic Health-Datensätzen 采用多式公共电子健康记录模式的基准基础模型 2507.14824v1 -
805 07-20 A Near-Optimal Single-Loop Stochastic Algorithm for Convex Finite-Sum Coupled Compositional Optimization Ein nahezu optimaler Single-Loop-Stochastischer Algorithmus für Convex-Finite-Sum-gekoppelte kompositorische Optimierung 近于最佳的、 精度- Sum 组合构成优化的近最佳单极单极托盘算法 2312.02277v6 -
806 07-20 Transaction Profiling and Address Role Inference in Tokenized U.S. Treasuries Transaktion Profilierung und Adresse Rolle Inferenz in Tokenized US Treasuries 美国金融债券中的交易分析和处理角色推断 2507.14808v1 -
807 07-20 Subliminal Learning: Language models transmit behavioral traits via hidden signals in data Subliminales Lernen: Sprachmodelle übertragen Verhaltensmerkmale über versteckte Signale in Daten 潜质学习:语言模式通过数据中隐藏的信号传递行为特征 2507.14805v1 -
808 07-20 Lizard: An Efficient Linearization Framework for Large Language Models Lizard: Ein effizienter Linearisierungsrahmen für große Sprachmodelle Lizard:大型语言模型的高效线性框架 2507.09025v2 -
809 07-20 Robust Local Polynomial Regression with Similarity Kernels Robuste lokale polynomische Regression mit Ähnlichkeitskernen 具有相似内核的强力局部多面回归 2501.10729v2 -
810 07-20 Composing Linear Layers from Irreducibles Das Komponieren von linearen Schichten aus Irreduzierbaren 将来自不灵异的线性图层合成成线性图层 2507.11688v2 -
811 07-20 NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection NegRefine: Verfeinerung negativer Label-basierter Zero-Shot-OOD-Erkennung NegRefine: 改进基于负标签的零样本OOD检测 2507.09795v2 -
812 07-20 Flow Equivariant Recurrent Neural Networks Strömungsgleiche recurrente neurale Netzwerke 流动等量经常经常神经网络 2507.14793v1 -
813 07-20 Exploring the In-Context Learning Capabilities of LLMs for Money Laundering Detection in Financial Graphs Erforschung der In-Context-Learning-Fähigkeiten von LLMs für Geldwäscheerkennung in Finanzgraphen 探索金融图中洗钱侦查LLMs的学习能力 2507.14785v1 -
814 07-20 Video-based Exercise Classification and Activated Muscle Group Prediction with Hybrid X3D-SlowFast Network Videobasierte Trainingsklassifikation und Aktivierung der Muskelgruppenvorhersage mit Hybrid X3D-SlowFast Netzwerk 与混合X3D-低速网络的视频作业分类和启动式肌肉组预测 2406.06703v2 -
815 07-20 Uncertainty Quantification for Machine Learning-Based Prediction: A Polynomial Chaos Expansion Approach for Joint Model and Input Uncertainty Propagation Ungewissheitsquantifizierung für Machine Learning-based Prediction: Ein polynomialer Chaos-Expansionsansatz für gemeinsame Modell- und Input-Unsicherheitspropagation 机械学习预测的不确定性量化:用于联合示范和投入不确定性传播的多元混乱扩大办法 2507.14782v1 -
816 07-20 A Mathematical Framework and a Suite of Learning Techniques for Neural-Symbolic Systems Ein mathematischer Rahmen und eine Suite von Lerntechniken für neural-symbolische Systeme 神经-交响系统数学框架和学习技术套件 2407.09693v2 -
817 07-20 Optimal Task Order for Continual Learning of Multiple Tasks Optimale Auftragsreihenfolge für kontinuierliches Lernen mehrerer Aufgaben 继续不断学习多种任务的最佳任务顺序 2502.03350v2 -
818 07-20 MultiKernelBench: A Multi-Platform Benchmark for Kernel Generation MultiKernelBench: Ein Multi-Platform Benchmark für die Kernel-Generierung MultiKernelBench: 内核生成的多平台基准 2507.17773v1 -
819 07-20 HI-PMK: A Data-Dependent Kernel for Incomplete Heterogeneous Data Representation HI-PMK: Ein Data-Dependent-Kernel für unvollständige heterogene Datendarstellung HI-PMK:一个数据依赖核心,用于不完全异基因数据代表 2501.04300v2 -
820 07-20 Improving Group Robustness on Spurious Correlation via Evidential Alignment Verbesserung der Robustheit der Gruppe bei sauberer Korrelation durch Evidential Alignment 通过证据协调改进小组对净关系关联的威力 2506.11347v3 -
821 07-20 Rethinking Memorization Measures and their Implications in Large Language Models Rethinking Memorization Measures and their Implications in Large Language Models 重新思考记忆措施及其对大语言模式的影响 2507.14777v1 -
822 07-20 Conditional Front-door Adjustment for Heterogeneous Treatment Assignment Effect Estimation Under Non-adherence Bedingte Front-Tür-Anpassung für heterogene Behandlung Zuordnungseffektschätzung unter Nichtbefolgung 不遵守规定情况下对不同不同待遇不同待遇的 条件性前门调整 外门调整 2505.05677v4 -
823 07-19 (6) Fine-Tuning Diffusion Generative Models via Rich Preference Optimization Feintuning Diffusion Generative Modelle über Rich Preference Optimization 通过丰富偏好优化微调扩散生成模型 2503.11720v4 -
824 07-19 Collusion-Resilient Hierarchical Secure Aggregation with Heterogeneous Security Constraints Kollusion-Resiliente Hierarchische Sichere Aggregation mit heterogenen Sicherheitsbeschränkungen 协同-抗力强的等级安全聚合与不同不同安全因素的限制 2507.14768v1 -
825 07-19 XplainAct: Visualization for Personalized Intervention Insights XplainAct: Visualisierung für personalisierte Interventions-Insights XPlainAct: 个性干预观察的可视化 2507.14767v1 -
826 07-19 CXR-TFT: Multi-Modal Temporal Fusion Transformer for Predicting Chest X-ray Trajectories CXR-TFT: Multi-Modal Temporal Fusion Transformer zur Vorhersage von Röntgen-Trajektorien im Brustkorb CXR-TFT:用于预测胸透X射线轨迹的多模式时际拆解变换器 2507.14766v1 -
827 07-19 Score-based Causal Representation Learning: Linear and General Transformations Score-based Causal Representation Learning: Lineare und allgemeine Transformationen 基于计分的因果代表制学习:线性转变和一般转变 2402.00849v5 -
828 07-19 RACR-MIL: Rank-aware contextual reasoning for weakly supervised grading of squamous cell carcinoma using whole slide images RACR-MIL: Rank-aware kontextuelle Argumentation für schwach überwachte Einstufung von Plattenepithelkarzinom mit ganzen Diabildern RACR-MIL: 使用整张幻灯片图像对典型细胞癌进行监管不力分类的背景推理 2308.15618v2 -
829 07-19 QUTCC: Quantile Uncertainty Training and Conformal Calibration for Imaging Inverse Problems QUTCC: Quantile Uncertainty Training und Konforme Kalibrierung für bildgebende Inverse Probleme QUTCC: 成反向问题量化不确定性培训和常规校准 2507.14760v1 -
830 07-19 A Structure-Guided Gauss-Newton Method for Shallow ReLU Neural Network Eine strukturgeführte Gauß-Newton-Methode für shallow ReLU Neural Network 浅光 ReLU 神经网络结构引导高斯-牛顿方法 2404.05064v2 -
831 07-19 Iceberg: Enhancing HLS Modeling with Synthetic Data Iceberg: Verbesserung der HLS-Modellierung mit synthetischen Daten 冰山:加强利用合成数据建立HLS模型 2507.09948v2 -
832 07-19 Supervised Graph Contrastive Learning for Gene Regulatory Network Überwachtes Graph Kontrastives Lernen für Gene Regulatory Network 受监督的基因监管网络图表对比性学习 2505.17786v3 -
833 07-19 Domain-Adaptive Small Language Models for Structured Tax Code Prediction Domain-Adaptive kleine Sprachmodelle für strukturierte Steuervorhersage 结构化税法预测结构化税法 2507.10880v2 -
834 07-19 SemiOccam: A Robust Semi-Supervised Image Recognition Network Using Sparse Labels SemiOccam: Ein robustes semi-überwachtes Bilderkennungsnetzwerk mit Sparse-Labels 半 Occam: 使用粗略标签粗略标签的强力半半超图像识别网络 2506.03582v3 -
835 07-19 Skill Learning via Policy Diversity Yields Identifiable Representations for Reinforcement Learning Kompetenzerwerb durch politische Vielfalt führt zu identifizierbaren Repräsentationen für verstärktes Lernen 通过政策多样性学习技能 2507.14748v1 -
836 07-19 Pruning Increases Orderedness in Recurrent Computation Pruning erhöht Ordnung in der recurrent Computation 经常计算中审慎增加的有秩序性 2507.14747v1 -
837 07-19 Sampling from Gaussian Processes: A Tutorial and Applications in Global Sensitivity Analysis and Optimization Probenahme aus gaussischen Prozessen: Ein Tutorial und Anwendungen in der globalen Sensitivitätsanalyse und Optimierung Gaussian进程抽样:全球敏感性分析和优化的教学和应用 2507.14746v1 -
838 07-19 Beyond the Single-Best Model: Rashomon Partial Dependence Profile for Trustworthy Explanations in AutoML Jenseits des Single-Best-Modells: Rashomon Partial Dependence Profile für vertrauenswürdige Erklärungen in AutoML 超越单一最佳模式:自动ML中可信赖解释的Rashomon部分依赖性简介 2507.14744v1 -
839 07-19 Better Training Data Attribution via Better Inverse Hessian-Vector Products Bessere Datenzuweisung durch bessere inverse hessisch-Vektor-Produkte 通过 “ 更好的反向 “ 赫森 – – 选民产品更好地分配培训数据 2507.14740v1 -
840 07-19 Multi-parameter Control for the $(1+(λ,λ))$-GA on OneMax via Deep Reinforcement Learning Multiparameter-Steuerung für das $(1+(λ,λ))$-GA auf OneMax über Deep Reinforcement Learning 通过深度强化学习对OneMax上的$(1+(λ,λ))$-GA进行多参数控制 2505.12982v3 -
841 07-19 Reevaluating Policy Gradient Methods for Imperfect-Information Games Neubewertung der Politik Gradient Methoden für Imperfect-Informations-Spiele 重新评估不完善信息运动会的逐步政策方法 2502.08938v2 -
842 07-19 Balancing Expressivity and Robustness: Constrained Rational Activations for Reinforcement Learning Ausbalancierende Expressivität und Robustheit: eingeschränkte rationale Aktivierungen für verstärktes Lernen 平衡表达性和强力:加强学习的有节制的理性行动 2507.14736v1 -
843 07-19 Attention-Based Reconstruction of Full-Field Tsunami Waves from Sparse Tsunameter Networks Aufmerksamkeitsbasierte Rekonstruktion von Ganzfeld-Tsunamiwellen aus Sparse Tsunameter-Netzwerken 利用微缩起子网络重建全战地海啸波 2411.12948v5 -
844 07-19 Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework Jenseits von atomaren Geometriedarstellungen in der Materialwissenschaft: Ein multimodaler Rahmen für Mensch-in-the-Loop 超出原子几何在材料科学中的代表性以外的原子几何代表性:人类在洛博多模式框架 2506.00302v2 -
845 07-19 Task-Agnostic Continual Prompt Tuning with Gradient-Based Selection and Decoding Task-Agnostic Continual Prompt Tuning mit gradient-based Auswahl und Decodierung 以渐进选择和下限方式进行任务不可确定的持续快速快速调试 2507.14725v1 -
846 07-19 The unknotting number, hard unknot diagrams, and reinforcement learning Die unknotierende Zahl, harte Unknot-Diagramme und das Erlernen von Verstärkungen 点点数, 硬点点点数图表, 和强化学习 2409.09032v2 -
847 07-19 LeanTree: Accelerating White-Box Proof Search with Factorized States in Lean 4 LeanTree: Beschleunigen der White-Box-Proof-Suche mit faktorisierten Zuständen in Lean 4 利安特里:在利安4区与加工业国家加速白纸体校对搜索 2507.14722v1 -
848 07-19 Exploring the Dynamic Scheduling Space of Real-Time Generative AI Applications on Emerging Heterogeneous Systems Erforschung des dynamischen Planungsraums von Echtzeit-Generativen KI-Anwendungen auf entstehenden Heterogenen Systemen 探索新兴异变体系统实时产生AI应用的动态日程安排空间 2507.14715v1 -
849 07-19 Sortformer: A Novel Approach for Permutation-Resolved Speaker Supervision in Speech-to-Text Systems Sortformer: Ein neuartiger Ansatz für Permutations-Resolved Speaker Supervision in Speech-to-Text Systemen Sortformer:语音转文本系统中置换解析的说话人监督新方法 2409.06656v3 -
850 07-19 Fraud is Not Just Rarity: A Causal Prototype Attention Approach to Realistic Synthetic Oversampling Betrug ist nicht nur Seltenheit: Ein kausaler Prototyp Aufmerksamkeit Ansatz zur realistischen synthetischen Oversampling 欺诈不仅仅是稀有性:面向真实感合成过采样的因果原型注意力方法 2507.14706v1 -
851 07-19 APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay APIGen-MT: Agentische Pipeline für die Multi-Turn-Datengenerierung über simuliertes Agent-Human-Interplay APIGen-MT: 通过模拟代理与人类交互生成多轮数据的代理管道 2504.03601v4 -
852 07-19 Towards the Next Frontier in Speech Representation Learning Using Disentanglement Auf dem Weg zur nächsten Front in der Sprachrepräsentanz Lernen mit Entflechtung 走向使用分离手段进行演讲代表学习的下一个前沿 2407.02543v2 -
853 07-19 Spatial-Temporal Transformer with Curriculum Learning for EEG-Based Emotion Recognition Raum-Temporal Transformer mit Curriculum-Lernen für EEG-basierte Emotionserkennung 具有基于EEG的情感识别课程学习的空间时空变换器 2507.14698v1 -
854 07-19 Forecasting Faculty Placement from Patterns in Co-authorship Networks Forecasting Fakultät Platzierung aus Mustern in Co-Autorship Networks 共同领导网络中基于模式的学院定位预测 2507.14696v1 -
855 07-19 Caching Techniques for Reducing the Communication Cost of Federated Learning in IoT Environments Caching-Techniken zur Reduzierung der Kommunikationskosten von Federated Learning in IoT-Umgebungen 降低在IoT环境中联邦学习的传播成本的缓冲技术 2507.17772v1 -
856 07-19 Rethinking Suicidal Ideation Detection: A Trustworthy Annotation Framework and Cross-Lingual Model Evaluation Umdenken bei der Erkennung von Selbstmordgedanken: Ein vertrauensvolles Annotations-Framework und Cross-Lingual Model Evaluation 重新思考潮ideideididation 探测:可信赖的注解框架和跨语言模式评价 2507.14693v1 -
857 07-19 Mind the Gap: A Review of Arabic Post-Training Datasets and Their Limitations Mind the Gap: Eine Überprüfung der arabischen Post-Training-Datensätze und deren Einschränkungen 《思想差距:对阿拉伯培训后数据集及其局限性的审查》 2507.14688v1 -
858 07-19 Revisiting Graph Contrastive Learning on Anomaly Detection: A Structural Imbalance Perspective Überblick auf Graph Kontrastives Lernen über Anomalienerkennung: Eine strukturelle Ungleichgewichtsperspektive 重新审视异常探测方面的对比图表学习:结构不平衡的视角 2507.14677v1 -
859 07-19 Rec-AD: An Efficient Computation Framework for FDIA Detection Based on Tensor Train Decomposition and Deep Learning Recommendation Model Rec-AD: Ein effizienter Berechnungsrahmen für die FDIA-Erkennung auf der Grundlage von Tensor Train Decomposition und Deep Learning Empfehlungsmodell Rec-AD:基于Tensor Train分解和深度学习推荐模型的FDIA检测高效计算框架 2507.14668v1 -
860 07-19 Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences Welche Erfahrungen sind einflussreich für RL-Agenten? Effiziente Einschätzung des Einflusses von Erfahrungen RL代理机构有哪些经验可资借鉴? 有效估计经验的影响 2405.14629v3 -
861 07-19 When few labeled target data suffice: a theory of semi-supervised domain adaptation via fine-tuning from multiple adaptive starts Wenn nur wenige beschriftete Zieldaten ausreichen: eine Theorie der semi-überwachten Domänenanpassung durch Feinabstimmung von mehreren adaptiven Starts 当贴有标签的目标数据数量很少时,只要有以下标记的目标数据就足够:从多重适应开始进行微调,通过半监督的域域适应理论 2507.14661v1 -
862 07-19 Learning to Communicate in Multi-Agent Reinforcement Learning for Autonomous Cyber Defence Lernen zur Kommunikation im Mehr-Agenten-Verstärkungs-Lernen für die autonome Cyber-Verteidigung 学习多机构强化学习,以交流多机构强化学习,促进自动网络防御 2507.14658v1 -
863 07-19 State-observation augmented diffusion model for nonlinear assimilation with unknown dynamics State-observation Augmented Diffusion Modell für nichtlineare Assimilation mit unbekannter Dynamik 国家观测扩大非线性同同化扩散模型,具有未知动态 2407.21314v3 -
864 07-19 Accelerating Hamiltonian Monte Carlo for Bayesian Inference in Neural Networks and Neural Operators Beschleunigen Hamiltonian Monte Carlo für Bayesian Inferenz in neuralen Netzwerken und neuralen Betreibern 加速汉密尔顿·蒙特卡洛的神经网络和神经操作员中的贝耶斯推理速度 2507.14652v1 -
865 07-19 Deep Learning-Based Survival Analysis with Copula-Based Activation Functions for Multivariate Response Prediction Deep Learning-Based Survival Analysis mit Copula-basierten Aktivierungsfunktionen für Multivariate Response Prediction 具有多变量反应预测以科普拉为基础的启动功能的深学习生存分析 2507.14641v1 -
866 07-19 KinForm: Kinetics Informed Feature Optimised Representation Models for Enzyme $k_{cat}$ and $K_{M}$ Prediction KinForm: Kinetics Informiertes Feature Optimierte Darstellungsmodelle für Enzyme $k_{cat}$ und $K_{M}$ Vorhersage KinForm: Enzyme $k_{cat}$ 和 $K_{M}$ 预测值的动因、知情地物最佳代表模型 2507.14639v1 -
867 07-19 Agentic Satellite-Augmented Low-Altitude Economy and Terrestrial Networks: A Survey on Generative Approaches Agentische Satelliten-Augmented Low-Altitude Economy and Terrestrial Networks: Eine Umfrage zu generativen Ansätzen 高空低空经济和地面网络:关于创造方法的调查 2507.14633v1 -
868 07-19 $k$-PCA for (non-squared) Euclidean Distances: Polynomial Time Approximation $k$-PCA für (nicht quadrierte) Euklidische Entfernungen: Polynomzeit-Annäherung $k$-PCA 用于(非平方)欧几里得距离:多项式时间近似 2507.14631v1 -
869 07-19 Knockout: A simple way to handle missing inputs Knockout: Ein einfacher Weg, um fehlende Eingänge zu handhaben Knockout: 处理缺失输入的简单方法 2405.20448v3 -
870 07-19 Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation Hierarchisches Verstärkungslernen für die zeitliche Abstraktion der Listwise-Empfehlung 用于清单建议时间摘要汇编的等级强化学习 2409.07416v2 -
871 07-19 Exp-Graph: How Connections Learn Facial Attributes in Graph-based Expression Recognition Exp-Graph: Wie Verbindungen Gesichtsattribute in Graph-basierter Expression erkennen lernen Exp-Graph: 图形表达式识别中连接如何学习模糊属性 2507.14608v1 -
872 07-19 Understanding Matching Mechanisms in Cross-Encoders Vergleichbare Mechanismen in Cross-Encodern verstehen 跨企业的匹配机制 2507.14604v1 -
873 07-19 Towards a Proactive Autoscaling Framework for Data Stream Processing at the Edge using GRU and Transfer Learning Auf dem Weg zu einem proaktiven Autoscaling-Framework für die Datenstromverarbeitung am Rand mittels GRU und Transfer Learning 争取在边缘使用GRU和转移学习实现数据流处理的主动自动调整框架 2507.14597v1 -
874 07-19 PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity PLADIS: Die Grenzen der Aufmerksamkeit bei Diffusionsmodellen zur Folgezeit durch Sparsamkeit drücken PLADIS:通过杠杆利用公平,在推论时间传播模型中提高注意的限度 2503.07677v3 -
875 07-19 Coordinate Heart System: A Geometric Framework for Emotion Representation Koordinaten-Herzsystem: Ein geometrisches Rahmenwerk für die Emotionsdarstellung 协调心脏系统:情感代表的几何框架 2507.14593v1 -
876 07-19 A Transformer-Based Conditional GAN with Multiple Instance Learning for UAV Signal Detection and Classification Ein transformerbasierter Bedingter GAN mit Multiple Instance-Lernen für UAV-Signalerkennung und -Klassifizierung 以变换器为基础的条件性GAN,具有用于无人驾驶飞行器信号探测和分类的多实例学习 2507.14592v1 -
877 07-19 AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs? AlgoTune: Können Sprachmodelle allgemeine numerische Programme beschleunigen? AlgoTune: 语言模型能加速通用计算程序吗? 2507.15887v1 -
878 07-19 LPS-GNN : Deploying Graph Neural Networks on Graphs with 100-Billion Edges LPS-GNN : Einsatz von Graphen-Neuralnetzwerken auf Graphen mit 100 Milliarden Kanten LPS-GNN:在具有1000亿条边的图上部署图形神经网络 2507.14570v1 -
879 07-19 The Origin of Self-Attention: From Pairwise Affinity Matrices to Transformers Der Ursprung der Self-Attention: Von paarweisen Affinitätsmatrizen zu Transformern 自注意力的起源:从成对亲和矩阵到Transformer 2507.14560v1 -
880 07-19 Maximum Causal Entropy IRL in Mean-Field Games and GNEP Framework for Forward RL Maximale Causal Entropy IRL in Mittelfeldspielen und GNEP-Rahmen für Forward RL 中场运动会和GNEP 前转转场框架的最大因果导入性IRL 2401.06566v2 -
881 07-19 Brain Foundation Models: A Survey on Advancements in Neural Signal Processing and Brain Discovery Brain Foundation Models: Eine Umfrage über Fortschritte bei der Neural Signalverarbeitung und Gehirnentdeckung 脑基础模型:神经信号处理和脑发现进展调查 2503.00580v2 -
882 07-19 Real Time Captioning of Sign Language Gestures in Video Meetings Echtzeit-Beschriftung von Gesten in Gebärdensprache in Video-Treffen 视频会议手语手语手势实时定位 2507.14543v1 -
883 07-19 Characterizing State Space Model (SSM) and SSM-Transformer Hybrid Language Model Performance with Long Context Length Charakterisieren von State Space Model (SSM) und SSM-Transformer Hybrid Language Model Performance mit langer Kontextlänge 确定国家空间模型(SSM)和SSM-过渡混合语言模型长内性性能特点 2507.12442v2 -
884 07-19 Kernel Based Maximum Entropy Inverse Reinforcement Learning for Mean-Field Games Kernelbasiertes maximales Entropie-Inverse-Verstärkung-Lernen für Mittelfeld-Spiele 以核心为核心的中场运动会最大内心反向强化学习 2507.14529v1 -
885 07-19 Positive-Unlabeled Learning for Control Group Construction in Observational Causal Inference Positiv unbeschriftetes Lernen für den Aufbau von Kontrollgruppen in beobachtungsbedingtem Kausalzusammenhang 在观察性因果关系中进行控制组建设的积极无标签学习 2507.14528v1 -
886 07-19 Explainable Graph Neural Networks via Structural Externalities Erklärbare Graph Neuronale Netzwerke über strukturelle Externalitäten 通过结构外貌可解释的图形神经网络 2507.17848v1 -
887 07-19 Diffusion Models for Time Series Forecasting: A Survey Diffusionsmodelle für die Zeitreihenprognose: Eine Umfrage 时间序列预测传播模型:调查 2507.14507v1 -
888 07-19 Generalized Linear Bandits with Limited Adaptivity Generalisierte Linear Banditen mit begrenzter Adaptivität 有限适应性通用直线强盗 2404.06831v5 -
889 07-19 RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer RingFormer: Ein neuraler Vocoder mit Ringaufmerksamkeit und Convolution-Augmented Transformer RingFormer: 具有环形注意力和卷积增强Transformer的神经声码器 2501.01182v2 -
890 07-19 Generative Distribution Distillation Generative Verteilungsdestillation 生成式分布蒸馏 2507.14503v1 -
891 07-19 Neural Brownian Motion Neural Brownian Bewegung 神经棕色运动 2507.14499v1 -
892 07-19 Rethinking Data Protection in the (Generative) Artificial Intelligence Era Datenschutz im Zeitalter der (generativen) Künstlichen Intelligenz neu denken 在人工(人工)情报时代重新思考数据保护问题 2507.03034v3 -
893 07-19 Glitches in Decision Tree Ensemble Models Glitches in Decision Tree Ensemble Modelle 决策树组合模型中的漏洞 2507.14492v1 -
894 07-19 Numerical Artifacts in Learning Dynamical Systems Numerische Artefakte im Lernen dynamischer Systeme 学习动态系统中的数值手法 2507.14491v1 -
895 07-19 Federated Reinforcement Learning in Heterogeneous Environments Föderiertes Stärkungslernen in heterogenen Umgebungen 不同不同环境的联邦强化学习 2507.14487v1 -
896 07-19 ReDiSC: A Reparameterized Masked Diffusion Model for Scalable Node Classification with Structured Predictions ReDiSC: Ein reparameterisiertes Maskiertes Diffusionsmodell für skalierbare Knotenklassifikation mit strukturierten Vorhersagen ReDISC:具有结构预测的可缩放节节点分类可修复的蒙面扩散模型 2507.14484v1 -
897 07-19 Label-semantics Aware Generative Approach for Domain-Agnostic Multilabel Classification Label-Semantik Aware Generativer Ansatz für Domain-Agnostic Multilabel-Klassifikation 域-不可知性多标签分类的认知生成方法 2506.06806v2 -
898 07-19 Learning Stochastic Hamiltonian Systems via Stochastic Generating Function Neural Network Stochastische Hamiltonische Systeme über stochastische Generierungsfunktion neurales Netzwerk lernen 通过Stochatic生成功能神经网络学习斯托卡特·汉密尔顿系统 2507.14467v1 -
899 07-19 SWI: Speaking with Intent in Large Language Models SWI: Sprechen mit Intent in großen Sprachmodellen SWI:用大语言模型表达意向 2503.21544v2 -
900 07-19 AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization AlphaDPO: Adaptive Prämienspanne für direkte Präferenzoptimierung AlphaDPO: 直接优化优惠的适应性回报边缘 2410.10148v4 -
901 07-19 Continual Learning with Neuromorphic Computing: Foundations, Methods, and Emerging Applications Kontinuierliches Lernen mit neuromorphem Rechnen: Grundlagen, Methoden und neu entstehende Anwendungen 与神经陆基计算机的不断学习:基础、方法和新兴应用 2410.09218v3 -
902 07-19 Faster Low-Rank Approximation and Kernel Ridge Regression via the Block-Nyström Method Schnellere Low-Rank-Annäherung und Kernel Ridge-Regression über die Block-Nyström-Methode 通过块-Nyström方法更快地低兰克相近和内核脊回归 2506.17556v2 -
903 07-19 DiCE-Extended: A Robust Approach to Counterfactual Explanations in Machine Learning DiCE-Extended: Ein robuster Ansatz für kontrafaktische Erklärungen im maschinellen Lernen DiCE-Extended: 机器学习中反事实解释的有力方法 2504.19027v2 -
904 07-19 Statistical and Algorithmic Foundations of Reinforcement Learning Statistische und algorithmische Grundlagen des verstärkten Lernens 强化学习的统计和算法基础 2507.14444v1 -
905 07-19 The Perception of Phase Intercept Distortion and its Application in Data Augmentation Die Wahrnehmung von Phase Intercept Distortion und ihre Anwendung in der Datenvergrößerung 阶段拦截干扰的感知及其在数据增加中的应用 2506.14571v2 -
906 07-19 Adversarial bandit optimization for approximately linear functions Adversariale Bandit-Optimierung für etwa lineare Funktionen 大约直线功能的对面土匪优化 2505.20734v4 -
907 07-19 Escaping Saddle Points for Nonsmooth Weakly Convex Functions via Perturbed Proximal Algorithms Escaping Sattel Punkte für nonsmooth schwach Konvex Funktionen über Perturbed Proximal Algorithmen 通过 Perturbed Proximal Proximal 精度算法为非mooth 微弱 Convex 函数解开套接合点 2102.02837v3 -
908 07-19 ShiftKD: Benchmarking Knowledge Distillation under Distribution Shift ShiftKD: Benchmarking Knowledge Destillation unter Distribution Shift ShiftKD: 分配转移下知识蒸馏基准 2312.16242v3 -
909 07-19 Grokking at the Edge of Linear Separability Grokking am Rande der linearen Separierbarkeit 线性可分性边缘的Grokking 2410.04489v2 -
910 07-19 Likelihood-Free Gaussian Process for Regression Likelihood-freier Gauß-Prozess für Regression 用于回归的无似然高斯过程 2006.13456v5 -
911 07-19 Decomposed Quadratization: Efficient QUBO Formulation for Learning Bayesian Network Zersetzte Quadratisierung: Effiziente QUBO-Formulierung für Bayesisches Netzwerk 分解四分化:高效的QUBO 制定学习海湾网络 2006.06926v7 -
912 07-19 It’s Not That Simple. An Analysis of Simple Test-Time Scaling Es ist nicht so einfach. Eine Analyse der einfachen Test-Zeit-Skalierung 不是那么简单 简单的测试时间缩放分析 2507.14419v1 -
913 07-18 (5) BARNN: A Bayesian Autoregressive and Recurrent Neural Network BARNN: Ein bayesisches Autoregressives und recurrentes Neuronales Netzwerk BARNN: 贝叶斯自回归与循环神经网络 2501.18665v2 -
914 07-18 Fail Fast, or Ask: Mitigating the Deficiencies of Reasoning LLMs with Human-in-the-Loop Systems Engineering Fail Fast oder Ask: Die Defizite von LLMs mit Human-in-the-Loop-System-Engineering abzumildern 快速失灵, 或问: 减轻Loop系统人文工程公司在理据有限性方面的缺陷 2507.14406v1 -
915 07-18 ADEPTS: A Capability Framework for Human-Centered Agent Design ADEPTS: Ein Capability Framework für das Design von Human-Centered Agents ADEPTS:以人为中心的制剂设计能力框架 2507.15885v1 -
916 07-18 Incremental Causal Graph Learning for Online Cyberattack Detection in Cyber-Physical Infrastructures Inkrementales Causal Graph Learning für Online Cyberattack Detection in Cyber-Physical Infrastructures 网络物理基础设施在线网络攻击探测的递增因果图表学习 2507.14387v1 -
917 07-18 Statistical learning for constrained functional parameters in infinite-dimensional models Statistisches Lernen für eingeschränkte funktionale Parameter in unendlich-dimensionalen Modellen 关于无限模式中有限功能参数的统计学习 2404.09847v2 -
918 07-18 Combinatorial Optimization for All: Using LLMs to Aid Non-Experts in Improving Optimization Algorithms Kombinatorische Optimierung für alle: Verwendung von LLMs zur Unterstützung von Nicht-Experten bei der Verbesserung von Optimierungsalgorithmen 组合优化全民:利用LLMs帮助非专家改进最佳化算法 2503.10968v2 -
919 07-18 Schemora: schema matching via multi-stage recommendation and metadata enrichment using off-the-shelf llms Schemora: Schema-Matching über mehrstufige Empfehlung und Metadaten-Anreicherung mit Off-the-Shelf-LLMs Schemora:利用现成LLM通过多阶段推荐和元数据增强进行模式匹配 2507.14376v1 -
920 07-18 Prompt Smart, Pay Less: Cost-Aware APO for Real-World Applications Prompt Smart, weniger zahlen: Kosten-Bewusst-APO für Real-World-Anwendungen 即时智能,低薪:用于现实世界应用的成本软件APO 2507.15884v1 -
921 07-18 Smarter Together: Combining Large Language Models and Small Models for Physiological Signals Visual Inspection Smarter Together: Kombination von großen Sprachmodellen und kleinen Modellen für die visuelle Inspektion physiologischer Signale 将大语言模型和生理信号视觉检查小模型结合起来 2501.16215v2 -
922 07-18 Layerwise Recall and the Geometry of Interwoven Knowledge in LLMs Layerwise Recall und die Geometrie des verwobenen Wissens in LLMs 平整图层回溯和LLM 中互交知识的几何 2502.10871v2 -
923 07-18 Oversmoothing Alleviation in Graph Neural Networks: A Survey and Unified View Überglättende Linderung in Graph Neural Networks: Eine Umfrage und Unified View 图形神经网络的压倒性缓解:调查和统一观点 2405.01663v2 -
924 07-18 Comparing skill of historical rainfall data based monsoon rainfall prediction in India with NWP forecasts Vergleich der Fähigkeiten von historischen Niederschlagsdaten basierend auf Monsunregen Vorhersage in Indien mit NWP Prognosen 将印度基于历史降雨数据的历史降雨量数据季风降雨量预测与内罗毕工作方案预测的技能进行比较 2402.07851v2 -
925 07-18 Generative Models and Connected and Automated Vehicles: A Survey in Exploring the Intersection of Transportation and AI Generative Modelle und vernetzte und Automatisierte Fahrzeuge: Eine Umfrage bei der Erforschung der Intersektion von Transport und KI 生成模型以及连接和自动化车辆:探索运输和AI的交叉路口调查 2403.10559v2 -
926 07-18 Relative Entropy Pathwise Policy Optimization Relative Entropie pfadweise politische Optimierung 相对 Entrop 路径式政策优化 2507.11019v2 -
927 07-18 Solo Connection: A Parameter Efficient Fine-Tuning Technique for Transformers Solo-Anschluss: Eine Parameter-Effiziente Feintuning-Technik für Transformatoren Solo 连接: 用于变形器的参数节能微调技术 2507.14353v1 -
928 07-18 Still More Shades of Null: An Evaluation Suite for Responsible Missing Value Imputation Noch mehr Schattierungen von Null: Eine Bewertungs-Suite für verantwortungsbewusste wertvermißte Imputation 更多 “ 无 “ 的阴影:负责任的缺失价值估计评估套件 2409.07510v6 -
929 07-18 Influence Functions for Preference Dataset Pruning Einflussfunktionen für Preference Dataset Pruning 优先数据集缓冲影响函数 2507.14344v1 -
930 07-18 MENO: Hybrid Matrix Exponential-based Neural Operator for Stiff ODEs. Application to Thermochemical Kinetics MENO: Hybrid-Matrix Exponential-basierter Neural-Operator für Stiff-ODEs. Anwendung in der thermochemischen Kinetik MENO: 用于刚性ODE的混合矩阵指数基神经算子。在热化学动力学中的应用 2507.14341v1 -
931 07-18 Topological Social Choice: Designing a Noise-Robust Polar Distance for Persistence Diagrams Topologische soziale Wahl: Entwerfen einer Rausch-Robusten Polardistanz für Persistenzdiagramme 地形社会选择:为持久性图解设计一个噪音-沸流极地距离 2507.14340v1 -
932 07-18 Fiduciary AI for the Future of Brain-Technology Interactions Fiduciary KI für die Zukunft von Brain-Technology Interaktionen 未来脑-技术相互作用协会 2507.14339v1 -
933 07-18 Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark Document Haystack: Ein langer Kontext Multimodales Bild/Dokument Verständnis Vision LLM Benchmark Haystack文件:长期、多模式图像/文件理解愿景LLM基准 2507.15882v1 -
934 07-18 GreenCrossingAI: A Camera Trap/Computer Vision Pipeline for Environmental Science Research Groups GreenCrossingAI: Eine Kamerafalle/Computer Vision Pipeline für Forschungsgruppen der Umweltwissenschaften GreenCrossingAI:环境科学研究小组的相机陷阱/计算机视觉管道 2507.09410v2 -
935 07-18 Development and Deployment of Hybrid ML Models for Critical Heat Flux Prediction in Annulus Geometries Entwicklung und Einsatz von Hybrid-ML-Modellen für kritische Wärmeflussprognosen in Annulus Geometrien 开发和部署安努卢斯地貌特征下临界热量流量预测混合模型模型 2507.14332v1 -
936 07-18 Defending Against Unforeseen Failure Modes with Latent Adversarial Training Verteidigung gegen unvorhergesehene Ausfallmodi mit latenten Adversarial Training 利用远程反反向培训,防范意外失灵模式 2403.05030v5 -
937 07-18 Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models Plan für Geschwindigkeit: Erweitertes Scheduling für maskierte Diffusions-Sprachmodelle 速度计划: 遮蔽传播语言模型的饱和日程安排 2506.19037v2 -
938 07-18 TREET: TRansfer Entropy Estimation via Transformers TREET: TRansfer-Entropieschätzung über Transformatoren TREET: 通过变压器对TRansfer Entropy 估计 2402.06919v4 -
939 07-18 Rethinking Individual Fairness in Deepfake Detection Individuelle Fairness in Deepfake Detection neu denken 重新思考个人在深假探测中的公平性 2507.14326v1 -
940 07-18 The Elicitation Game: Evaluating Capability Elicitation Techniques Das Elicitation-Spiel: Bewertung von Techniken zur Fähigkeits-Elicitation 引出游戏:评估能力引出技术 2502.02180v3 -
941 07-18 FedStrategist: A Meta-Learning Framework for Adaptive and Robust Aggregation in Federated Learning FedStrategist: Ein Meta-Learning-Framework für adaptive und robuste Aggregation im Federated Learning 联邦战略:联邦学习中适应性和强力聚合的元学习框架 2507.14322v1 -
942 07-18 Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning Symbolische Mixture-of-Experts: Adaptives Skill-basiertes Routing für heterogene Vernunft 专家的混合符号:基于适应性技能的异异源理据调离 2503.05641v3 -
943 07-18 Aligning Large Language Models to Low-Resource Languages through LLM-Based Selective Translation: A Systematic Study Ausrichtung großer Sprachmodelle auf ressourcenarme Sprachen durch LLM-basierte Selektive Übersetzung: Eine systematische Studie 通过基于LLM的选择性翻译,使大语言模式与低资源语言相一致:系统研究 2507.14304v1 -
944 07-18 A universal augmentation framework for long-range electrostatics in machine learning interatomic potentials Ein universeller Augmentations-Rahmen für Langstrecken-Elektrostatik in interatomaren Potenzialen des maschinellen Lernens 用于机器学习跨原子潜能的远程电磁学的通用扩增框架 2507.14302v1 -
945 07-18 Age of Information Minimization in UAV-Enabled Integrated Sensing and Communication Systems Alter der Informationsminimierung in UAV-fähigen integrierten Sensing- und Kommunikationssystemen 无人驾驶航空器 – – 使用无人驾驶航空器的 综合遥感和通信系统信息最小化的时代 2507.14299v1 -
946 07-18 A Simple “Try Again” Can Elicit Multi-Turn LLM Reasoning Ein einfaches “Versuch es noch einmal” kann Multi-Turn LLM Reasoning hervorrufen 简单的“再试一次”即可引出多轮LLM推理 2507.14295v1 -
947 07-18 Toward Temporal Causal Representation Learning with Tensor Decomposition Auf dem Weg zur zeitlichen kausalen Repräsentation Lernen mit Tensor-Zersetzung 走向时间性因果代表制学习,使Tensor分解 2507.14126v1 -
948 07-18 A General Framework for Inference-time Scaling and Steering of Diffusion Models Ein allgemeiner Rahmen für Schlussfolgerungs-Zeit-Skalierung und Steuerung von Diffusionsmodellen 传播模型的推推时间缩放和引导总框架 2501.06848v5 -
949 07-18 Kolmogorov Arnold Networks (KANs) for Imbalanced Data – An Empirical Perspective Kolmogorov Arnold Networks (KANs) für unausgewogene Daten – Eine empirische Perspektive Kolmogorov Arnold 数据不平衡网络 – – 经验视角 2507.14121v1 -
950 07-18 Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning Harmonie in Divergenz: Auf dem Weg zu einer schnellen, präzisen und speichereffizienten Null-Order-LLM Feinabstimmung 和谐共存:快速、准确和记忆效率高的零级LLM微调 2502.03304v2 -
951 07-18 NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining NoHumansRequired: Autonome High-Quality Bildbearbeitung Triplet Mining 无人要求:自主高品质图像编辑三线采矿 2507.14119v1 -
952 07-18 Quantum Boltzmann Machines using Parallel Annealing for Medical Image Classification Quantum Boltzmann Maschinen mit parallelem Annealing für medizinische Bildklassifikation 使用并行退火进行医疗图像分类的量子玻尔兹曼机 2507.14116v1 -
953 07-18 An Adversarial-Driven Experimental Study on Deep Learning for RF Fingerprinting Eine adversariell-getriebene Experimentalstudie zum Deep Learning für RF-Fingerprinting 为RF指纹的深入学习进行反versarial-Driven实验研究 2507.14109v1 -
954 07-18 UGPL: Uncertainty-Guided Progressive Learning for Evidence-Based Classification in Computed Tomography UGPL: Ungewissheitsorientiertes Progressives Lernen für evidenzbasierte Klassifizierung in der berechneten Tomographie UGPL: 计算地形学循证分类的不确定性-指导渐进学习 2507.14102v1 -
955 07-18 On Logical Extrapolation for Mazes with Recurrent and Implicit Networks Über Logische Extrapolation für Labyrinthe mit recurrenten und impliziten Netzwerken 经常和隐含网络的磁带逻辑外推法 2410.03020v2 -
956 07-18 Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment Multi-Centre-Validierung eines Deep-Learning-Modells für Skoliose Assessment 多中心校验脊柱病评估深学习模型 2507.14093v1 -
957 07-18 Learning to Reason at the Frontier of Learnability Vernunft lernen an der Grenze der Lernfähigkeit 学习在可学习的前沿学习理性 2502.12272v5 -
958 07-18 Uncertainty-Aware Explanations Through Probabilistic Self-Explainable Neural Networks Ungewissheitsbewusste Erklärungen durch probabilistische selbsterklärbare neurale Netzwerke 通过概率性自我探索的神经神经网络的不确定性—- 软件解释 2403.13740v3 -
959 07-18 DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration DPMT: Dualer Prozess Multi-Skala Theorie des Geistes Rahmen für Echtzeit Mensch-AI-Kollaboration DPMT: 人类-AI实时合作的多规模思维框架的多层次理论 2507.14088v1 -
960 07-18 DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits DENSE: Longitudinal Progress Note Generation mit zeitlicher Modellierung von heterogenen klinischen Anmerkungen über Krankenhausbesuche hinweg DENSE: 医院全程探视不同临床诊断说明的实时建模纵向进展说明的生成 2507.14079v1 -
961 07-18 Glucose-ML: A collection of longitudinal diabetes datasets for development of robust AI solutions Glucose-ML: Sammlung von Längsschnittdatensätzen für die Entwicklung robuster KI-Lösungen Glucose-ML:收集纵向糖尿病数据集,以制定稳健的AI解决方案 2507.14077v1 -
962 07-18 Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances Statistische und rechnerische Garantien von Kern Max-Sliced Wasserstein Distanzen 内核断层断层断层瓦色斯坦距离的统计和计算保障 2405.15441v4 -
963 07-18 Critiques of World Models Kritik an Weltmodellen 世界模式的证明 2507.05169v2 -
964 07-18 The Duality of Generative AI and Reinforcement Learning in Robotics: A Review Die Dualität des Generativen KI- und Verstärkungslernens in der Robotik: Ein Rückblick 机器人学创性人工智能和强化学习的质量:审查 2410.16411v2 -
965 07-18 Preference-based Multi-Objective Reinforcement Learning Präferenzbasiertes Mehrziel-Verstärkungs-Lernen 以优惠为基础的多目标强化学习 2507.14066v1 -
966 07-18 Architect of the Bits World: Masked Autoregressive Modeling for Circuit Generation Guided by Truth Table Architekt der Bits-Welt: Masked Autoregressive Modellierung für Schaltungsgeneration von Truth Table geführt Bits World 建筑师:真相表引导电路生成的蒙面自动递减模型 2502.12751v2 -
967 07-18 Step-DAD: Semi-Amortized Policy-Based Bayesian Experimental Design Schritt-DAD: Semi-amortisiertes politikbasiertes Bayesian Experimental Design 渐进式DAD:半统一政策基巴伊斯实验设计 2507.14057v1 -
968 07-18 Noradrenergic-inspired gain modulation attenuates the stability gap in joint training Noradrenergisch inspirierte Gain Modulation dämpft die Stabilitätslücke im gemeinsamen Training 调整适应,缩小联合培训中的稳定差距 2507.14056v1 -
969 07-18 D2IP: Deep Dynamic Image Prior for 3D Time-sequence Pulmonary Impedance Imaging D2IP: Deep Dynamic Image Prior für 3D-Zeitsequenz Pulmonäre Impedanz-Imaging D2IP: 3D 时间序列肺阻力成像前深动态图像 2507.14046v1 -
970 07-18 DONUT: Physics-aware Machine Learning for Real-time X-ray Nanodiffraction Analysis DONUT: Physik-bewusstes maschinelles Lernen für Echtzeit-Röntgen-Nanodiffraktionsanalyse DONUT: 实时X射线纳米中伤分析物理意识机器学习 2507.14038v1 -
971 07-18 QuantEIT: Ultra-Lightweight Quantum-Assisted Inference for Chest Electrical Impedance Tomography QuantEIT: Ultraleichte Quantum-Assistente Schlussfolgerung für die elektrische Impedanztomographie im Brustkorb QuantEIT: 胸部电阻抗断层成像的超轻量量子辅助推断 2507.14031v1 -
972 07-18 Equivalent and Compact Representations of Neural Network Controllers With Decision Trees Gleichwertige und kompakte Darstellungen von neuralen Netzwerkcontrollern mit Entscheidungsbäumen 神经网络主计长与决策树的等效和契约代表 2304.06049v3 -
973 07-18 Conformalized Regression for Continuous Bounded Outcomes Conformalisierte Regression für kontinuierliche geschlossene Ergebnisse 持续受损害结果的正规回归 2507.14023v1 -
974 07-18 CPC-CMS: Cognitive Pairwise Comparison Classification Model Selection Framework for Document-level Sentiment Analysis CPC-CMS: Kognitives Paarweises Vergleichs-Klassifikation Modellauswahl-Framework für Dokument-Level-Sentimentanalyse CPC-CMS:文件级别感知分析文件级别感应分析的认知对称比较比较分类示范选择框架 2507.14022v1 -
975 07-18 Byzantine-resilient federated online learning for Gaussian process regression Byzantinisch-resilient föderiertes Online-Lernen für Gaußsche Prozessregression Byzantine抗拜占庭弹性联邦联盟在线学习,促进高斯进程回归 2507.14021v1 -
976 07-18 Efficient Temporal Tokenization for Mobility Prediction with Large Language Models Effiziente zeitliche Tokenisierung für Mobilitätsvorhersage mit großen Sprachmodellen 具有大语言模式的流动预测高效时时适调 2507.14017v1 -
977 07-18 On the Fundamental Limitations of Dual Static CVaR Decompositions in Markov Decision Processes Über die grundlegenden Einschränkungen der dualen statischen CVaR-Zersetzungen in Markov-Entscheidungsprozessen 关于Markov决定程序中双重静态CVaR分解的基本限制 2507.14005v1 -
978 07-18 Multi-Objective Reinforcement Learning for Adaptable Personalized Autonomous Driving Multi-Zielives Stärkungslernen für anpassungsfähiges, personalisiertes autonomes Fahren 适应性个性自主驾驶多目标强化学习 2505.05223v2 -
979 07-18 ParallelTime: Dynamically Weighting the Balance of Short- and Long-Term Temporal Dependencies ParallelTime: Dynamische Gewichtung der Balance von kurz- und langfristigen zeitlichen Abhängigkeiten 平行时间:动态加权短期和长期时间依赖的平衡 2507.13998v1 -
980 07-18 Machine learning applications in archaeological practices: a review Anwendungen des maschinellen Lernens in archäologischen Praktiken: eine Rezension 考古学实践中的机械学习应用:审查 2501.03840v3 -
981 07-18 $ε$-rank and the Staircase Phenomenon: New Insights into Neural Network Training Dynamics $ε$-rank und das Staircase-Phänomen: Neue Einblicke in die neurale Netzwerk-Trainingsdynamik $ε$-rank和阶梯现象:对神经网络培训动态的新透视 2412.05144v3 -
982 07-18 Structural Connectome Harmonization Using Deep Learning: The Strength of Graph Neural Networks Structural Connectome Harmonization Using Deep Learning: Die Stärke von Graph Neuronalen Netzwerken 利用深层学习实现结构连接统一:图表神经网络的实力 2507.13992v1 -
983 07-18 Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation Agentische Neuronale Netzwerke: Selbstständige Multi-Agenten-Systeme über textuelle Backpropagation 动态神经网络:通过文字反向分析实现自我演进的多行为者系统 2506.09046v2 -
984 07-18 Interpretable Imitation Learning via Generative Adversarial STL Inference and Control Interpretable Imitation Lernen über generative Adversariale STL-Inferenz und -Kontrolle 通过产生反逆反生成的STL 推断与控制进行可解释的模拟学习 2402.10310v2 -
985 07-18 Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking Ev2R: Evidence Retrieval im automatisierten Fact-Checking bewerten Ev2R:评价自动实况调查中的证据检索 2411.05375v2 -
986 07-18 Signs of the Past, Patterns of the Present: On the Automatic Classification of Old Babylonian Cuneiform Signs Zeichen der Vergangenheit, Muster der Gegenwart: Auf der automatischen Klassifikation der alten babylonischen Kuneiform Zeichen 过去的迹象,现在的模式:关于旧巴比伦古代古代古代符号的自动分类 2507.13959v1 -
987 07-18 DUALRec: A Hybrid Sequential and Language Model Framework for Context-Aware Movie Recommendation DUALRec: Ein hybrides Sequenz- und Sprachmodell-Framework für die Kontext-Bewusste Film-Empfehlung DUALRec:背景软件电影建议混合顺序和语言示范框架 2507.13957v1 -
988 07-18 Robust Anomaly Detection with Graph Neural Networks using Controllability Robuste Anomalieerkennung mit Graphen-Neuralen Netzen mit Kontrollierbarkeit 使用可控性对图形神经网络进行强力异常探测 2507.13954v1 -
989 07-18 MoDyGAN: Combining Molecular Dynamics With GANs to Investigate Protein Conformational Space MoDyGAN: Kombination molekularer Dynamik mit GANs zur Untersuchung des Proteinkonformationsraums MODYGAN:将分子动态与GANs相结合,以调查蛋白质变形空间 2507.13950v1 -
990 07-18 Generalist Forecasting with Frozen Video Models via Latent Diffusion Generalist Prognose mit gefrorenen Videomodellen über Latent Diffusion 利用冷冻视频模型通过冷冻传播进行一般预测 2507.13942v1 -
991 07-18 Machine-Learning Analysis of Radiative Decays to Dark Matter at the LHC Machine-Learning-Analyse von Strahlungszerfällen in Dunkle Materie am LHC LHC上辐射衰变到暗物质的机器学习分析 2410.13799v3 -
992 07-18 Two-Stage Pretraining for Molecular Property Prediction in the Wild Zweistufige Vorschulung für molekulare Property Prediction in the Wild 野生生物分子财产预测两阶段培训前 2411.03537v2 -
993 07-18 Reframing attention as a reinforcement learning problem for causal discovery Neuformulierung der Aufmerksamkeit als Verstärkungs-Lernproblem für kausale Entdeckung 将注意力重新定位为因果发现的一个强化学习问题 2507.13920v1 -
994 07-18 Generalization in Reinforcement Learning for Radio Access Networks Generalisierung im Ausbau-Lernen für Funkzugangsnetze 无线电接入网络强化学习一般化 2507.06602v2 -
995 07-18 Self-supervised learning on gene expression data Selbstüberwachtes Lernen über Genexpressionsdaten 自我监督的基因表达数据学习 2507.13912v1 -
996 07-18 LOCUS: LOcalization with Channel Uncertainty and Sporadic Energy LOCUS: LOcalization mit Kanalunsicherheit und sporadischer Energie LOCUS: 与频道不确定和零散能源的分级 2302.09409v3 -
997 07-18 Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review 从商业文件中提取的基于深学习的关键信息:系统文献审查 2408.06345v2 -
998 07-18 On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks Zur optimalen Annäherung von Sobolev- und Besov-Funktionen mittels tiefer ReLU-Neuralnetze 利用深RELU神经网络在Sobolev 和Besov 功能的最佳近似上使用深RELU神经网络 2409.00901v3 -
999 07-18 Improved DDIM Sampling with Moment Matching Gaussian Mixtures Verbesserte DDIM-Probenahme mit momentgenauen Gauß-Mischungen 改进DDIM抽样,与高山混合体相匹配的时速相匹配 2311.04938v3 -
1000 07-18 Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts Erklärbare KI in der Genomik: Transkriptionsfaktor Bindung Site Prediction mit Mischung von Experten 在基因组学中可解释的AI:与专家混合的转移要素约束性现场预测 2507.09754v2 -
1001 07-18 A Survey of Dimension Estimation Methods Ein Überblick über die Dimensionsschätzungsmethoden 尺寸估计方法调查 2507.13887v1 -
1002 07-18 Safety Certification in the Latent space using Control Barrier Functions and World Models Sicherheitszertifizierung im Latent-Raum mit Control Barrier-Funktionen und Weltmodellen 利用控制障碍功能和世界模型对低端空间使用控制障碍功能和世界模型进行安全认证 2507.13871v1 -
1003 07-18 Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models Auf dem Weg zu wissenschaftlicher Entdeckung mit Wörterbuch-Lernen: Gewinnung biologischer Konzepte aus Mikroskopie-Stiftungsmodellen 以字典学习实现科学发现:从显微镜基础模型中提取生物概念 2412.16247v3 -
1004 07-18 Recalibrating binary probabilistic classifiers Rekalibrierung von binären probabilistischen Klassifikatoren 重新计算二进制概率分解器 2505.19068v2 -
1005 07-18 Towards Regulated Deep Learning Auf dem Weg zu reguliertem Deep Learning 走向监管的深学习 1912.13122v8 -
1006 07-18 VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting VA-MoE: Variables-Adaptive Mischung von Experten für inkrementale Wettervorhersage VA-MoE:增量天气预报专家可变适应混合 2412.02503v2 -
1007 07-18 Linearized Diffusion Map Linearisierte Diffusionskarte 线状扩散地图 2507.14257v1 -
1008 07-18 Load Forecasting for Households and Energy Communities: Are Deep Learning Models Worth the Effort? Lastprognosen für Haushalte und Energiegemeinschaften: Sind Deep-Learning-Modelle die Mühe wert? 家庭和能源界的负载预测:深层学习模式值得努力吗? 2501.05000v5 -
1009 07-18 Conformal Data Contamination Tests for Trading or Sharing of Data Konforme Datenkontaminationstests für den Handel oder die Weitergabe von Daten 交换或分享数据的非正式数据污染测试 2507.13835v1 -
1010 07-18 Scalable Submodular Policy Optimization via Pruned Submodularity Graph Skalierbare submodulare Optimierung der Politik über Pruned Submodularity Graph 通过审慎次模块图实现可缩放子模块政策优化 2507.13834v1 -
1011 07-18 Question-Answer Extraction from Scientific Articles Using Knowledge Graphs and Large Language Models Frage-Antwort-Extraktion aus wissenschaftlichen Artikeln mit Wissensgraphen und großen Sprachmodellen 利用知识图和大语言模型从科学文章中提取问题答案 2507.13827v1 -
1012 07-18 Bridging Local and Global Knowledge via Transformer in Board Games Überbrückung von lokalem und globalem Wissen über Transformer in Brettspielen 通过棋盘运动会变换器连接地方和全球知识 2410.05347v2 -
1013 07-18 Demographic-aware fine-grained classification of pediatric wrist fractures Demografiebewusste feinkörnige Klassifizierung von pädiatrischen Handgelenkfrakturen 人口意识小儿科手腕骨折细细细分分类 2507.12964v2 -
1014 07-18 XpertAI: uncovering regression model strategies for sub-manifolds XpertAI: Aufdecken von Regressionsmodellstrategien für Submanifolds XpertAI:揭示子流形上的回归模型策略 2403.07486v4 -
1015 07-18 DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs DP2Unlearning: Ein effizientes und garantiertes Unlearning Framework für LLMs DP2Unlearning:面向LLM的高效且有保证的遗忘框架 2504.13774v2 -
1016 07-18 Equilibrium Propagation for Learning in Lagrangian Dynamical Systems Equilibrium Propagation für das Lernen in lagrangischen dynamischen Systemen Lagrangian动态系统学习平衡促进平衡 2505.07363v3 -
1017 07-18 DiffGradCAM: A Universal Class Activation Map Resistant to Adversarial Training DiffGradCAM: Eine universelle Aktivierungskarte der Klasse, die dem adversarialen Training standhält DiffGradCAM: 通用级启动地图抗反向培训 2506.08514v2 -
1018 07-18 On-the-Fly Fine-Tuning of Foundational Neural Network Potentials: A Bayesian Neural Network Approach On-the-Fly Fine-Tuning von Grundlagen-Neural-Netzwerk-Potenziale: Ein bayesischer Neural-Netzwerk-Ansatz 基础神经网络潜力的实时微调调整:贝耶斯神经网络方法 2507.13805v1 -
1019 07-18 Exploiting Label Skewness for Spiking Neural Networks in Federated Learning Ausnutzung von Label Skewness für spikende neurale Netzwerke im Federated Learning 联邦学习中面向脉冲神经网络的标签偏斜利用 2412.17305v3 -
1020 07-18 Feature Engineering is Not Dead: Reviving Classical Machine Learning with Entropy, HOG, and LBP Feature Fusion for Image Classification Feature Engineering is Not Dead: Wiederbelebung des klassischen maschinellen Lernens mit Entropie, HOG und LBP-Feature Fusion für die Bildklassifizierung 特色工程没有死:恢复古典机器学习与英音、HOG和LBP图像分类的特征融合 2507.13772v1 -
1021 07-18 Geometry-Informed Neural Networks Geometrie-informierte Neuronale Netzwerke 几何内建神经网络 2402.14009v4 -
1022 07-18 Insights into a radiology-specialised multimodal large language model with sparse autoencoders Einblicke in ein radiologisch spezialisiertes multimodales Großsprachmodell mit spärlichen Autoencodern 深入观察放射学专门化多式联运大型语言模型,无甚多的自动编码器 2507.12950v2 -
1023 07-18 Dual-Center Graph Clustering with Neighbor Distribution Dual-Center Graph Clustering mit Nachbarschaftsverteilung 与邻居分布相邻的双中心图集 2507.13765v1 -
1024 07-18 Learning to Reject Low-Quality Explanations via User Feedback Lernen, Low-Quality-Erklärungen per User Feedback abzulehnen 通过用户反馈学习拒绝低质量解释 2507.12900v2 -
1025 07-18 SIC: Similarity-Based Interpretable Image Classification with Neural Networks SIC: Ähnlichkeitsbasierte Interpretierbare Bildklassifikation mit neuralen Netzwerken SIC: 神经网络的基于相似性的解释性图像分类 2501.17328v3 -
1026 07-18 Convolution-weighting method for the physics-informed neural network: A Primal-Dual Optimization Perspective Convolution-Gewichtungsmethode für das physikinformierte neuronale Netzwerk: Eine primär-duale Optimierungsperspektive 物理学-知情神经网络的革命加权法:原始-多极优化视角 2506.19805v2 -
1027 07-18 A Simple Baseline for Stable and Plastic Neural Networks Eine einfache Basis für stabile und plastische Neuralnetze 稳定神经网络和可塑神经网络的简单基线 2507.10637v2 -
1028 07-18 Robustness Evaluation of Offline Reinforcement Learning for Robot Control Against Action Perturbations Robustheitsbewertung von Offline-Verstärkungslernen für die Robotersteuerung gegen Aktionsstörungen 对用于控制机器人控制行动干扰的离线强化学习的强力评价 2412.18781v2 -
1029 07-18 Search-Optimized Quantization in Biomedical Ontology Alignment Search-Optimierte Quantisierung in der biomedizinischen Ontologie Ausrichtung 生物医学肿瘤协调方面的搜索优化定量化 2507.13742v1 -
1030 07-18 SamGoG: A Sampling-Based Graph-of-Graphs Framework for Imbalanced Graph Classification SamGoG: Ein stichprobenbasierter Graph-of-Graphs-Rahmen für eine unausgewogene Graphenklassifikation SamGoG:用于不平衡图分类的基于采样的图之图框架 2507.13741v1 -
1031 07-18 Eye-tracked Virtual Reality: A Comprehensive Survey on Methods and Privacy Challenges Eye-Tracking in Virtual Reality: Eine umfassende Umfrage zu Methoden und Datenschutz-Herausforderungen 眼动追踪虚拟现实:关于方法和隐私挑战的全面调查 2305.14080v2 -
1032 07-18 Honesty in Causal Forests: When It Helps and When It Hurts Ehrlichkeit im Kausalwald: Wenn es hilft und wenn es weh tut Causal森林中的诚实:当它帮助时,当它伤害时 2506.13107v2 -
1033 07-18 An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC Ein End-to-End DNN-Inferenz-Framework für den SpiNNaker2 Neuromorphic MPSoC SpiNNaker2 神经形态 MPSoC 的端到端 DNN 推理框架 2507.13736v1 -
1034 07-18 Prompt-Tuning Bandits: Enabling Few-Shot Generalization for Efficient Multi-Task Offline RL Prompt-Tuning Bandits: Ermöglichung von Few-Shot-Generalisierung für effiziente Multi-Task Offline RL Prompt-Tuning Bandits:为高效多任务离线RL实现少样本泛化 2502.06358v3 -
1035 07-18 The Judge Variable: Challenging Judge-Agnostic Legal Judgment Prediction Die Richtervariable: Herausfordernde Richter-agnostische rechtliche Urteilsvorhersage 法官变量:挑战法官-不可接受法律判决预测 2507.13732v1 -
1036 07-18 Adversarial Training Improves Generalization Under Distribution Shifts in Bioacoustics Adversariale Ausbildung verbessert Generalisierung unter Verteilungsverschiebungen in der Bioakustik 反向培训改进了生物精算学分布变化下的普及化 2507.13727v1 -
1037 07-18 FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale FourCastNet 3: Ein geometrischer Ansatz zur probabilistischen maschinellen Wettervorhersage im Maßstab FourCastNet 3:大规模概率机器学习天气预报的几何方法 2507.12144v2 -
1038 07-18 Tackling fake images in cybersecurity – Interpretation of a StyleGAN and lifting its black-box In Cybersecurity gefälschte Bilder zu packen – Interpretation eines StyleGAN und Aufhebung seiner Blackbox 在网络安全中处理假图像 – – StyleGAN 的解读和取消黑盒 2507.13722v1 -
1039 07-18 Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion Graphstrukturierte Datenanalyse des Bauteilausfalls in autonomen Frachtschiffen basierend auf Feature Fusion 根据地貌分化对自主货运船舶部件故障进行图结构化数据分析 2507.13721v1 -
1040 07-18 Bi-GRU Based Deception Detection using EEG Signals Bi-GRU-basierte Täuschungserkennung mit EEG-Signalen 基于Bi-GRU的EEG信号欺骗检测 2507.13718v1 -
1041 07-18 Benchmarking of EEG Analysis Techniques for Parkinson’s Disease Diagnosis: A Comparison between Traditional ML Methods and Foundation DL Methods Benchmarking von EEG-Analysetechniken für Parkinson-Krankheitsdiagnose: Ein Vergleich zwischen traditionellen ML-Methoden und Foundation-DL-Methoden 帕金森病诊断的EEG分析技术基准测试:传统ML方法与基础DL方法的比较 2507.13716v1 -
1042 07-18 LLaPipe: LLM-Guided Reinforcement Learning for Automated Data Preparation Pipeline Construction LLaPipe: LLM-geführtes Verstärkungslernen für die automatisierte Datenvorbereitung Pipeline-Konstruktion LLaPipe:LLM-指导强化学习,用于自动数据准备管道建设 2507.13712v1 -
1043 07-18 MuteSwap: Visual-informed Silent Video Identity Conversion MuteSwap: Visuell informierte Silent Video Identity Conversion MuteSwap: 视觉知情的静音视频身份转换 2507.00498v2 -
1044 07-18 CogniQ-H: A Soft Hierarchical Reinforcement Learning Paradigm for Automated Data Preparation CogniQ-H: Ein weiches Hierarchisches Verstärkungs-Lernparadigma für die automatisierte Datenvorbereitung CogniQ-H: 用于自动编制数据的软级级强化学习模型 2507.13710v1 -
1045 07-18 To Code or not to Code? Adaptive Tool Integration for Math Language Models via Expectation-Maximization Um zu kodieren oder nicht zu kodieren? Adaptive Toolintegration für Math Language Models über Erwartungs-Maximierung 代码或非代码?通过期望-最大化将数学语言模型整合的适应性工具集成 2502.00691v4 -
1046 07-18 Mitigating Goal Misgeneralization via Minimax Regret Ziel-Fehlgeneralisierung durch Minimax-Regret abmildern 通过Minimax Regret缓解目标错误泛化 2507.03068v2 -
1047 07-18 Improving DAPO from a Mixed-Policy Perspective Verbesserung von DAPO aus einer Mixed-Policy-Perspektive 从混合策略角度改进DAPO 2507.12931v2 -
1048 07-18 MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling MoTM: Auf dem Weg zu einem Basismodell für Zeitreihen Imputation basierend auf kontinuierlicher Modellierung MoTM:建立基于连续建模的时间序列计算基础模型 2507.13207v2 -
1049 07-18 Policy Verification in Stochastic Dynamical Systems Using Logarithmic Neural Certificates Politikprüfung in stochastischen dynamischen Systemen mit logarithmischen Neuralzertifikaten 使用对数神经神经证书进行斯托卡动态系统的政策核查 2406.00826v4 -
1050 07-18 Learning Deformable Body Interactions With Adaptive Spatial Tokenization Verformbare Körperinteraktionen mit adaptiver räumlicher Tokenisierung lernen 学习与适应性空间拳击的变形身体互动 2507.13707v1 -
1051 07-18 EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos EgoVLA: Vision-Language-Action-Modelle von egozentrischen menschlichen Videos lernen EgoVLA:从以以地球为中心的人类视频中学习愿景-语言-行动模式 2507.12440v3 -
1052 07-18 Binarizing Physics-Inspired GNNs for Combinatorial Optimization Verbindliche Physik-inspirierte GNNs für die kombinatorische Optimierung 联合优化的由物理启发的GNNs 2507.13703v1 -
1053 07-18 Can we ease the Injectivity Bottleneck on Lorentzian Manifolds for Graph Neural Networks? Können wir den Injektivitätsengpass auf Lorentzian Manifolds für Graphen-Neural-Netzwerke erleichtern? 我们能否为图形神经网络 减轻Lorentzian Manifolds的 射入波特内克? 2504.00142v5 -
1054 07-18 Tight Bounds for Answering Adaptively Chosen Concentrated Queries Enge Grenzen für die Antwort auf adaptiv ausgewählte konzentrierte Abfragen 用于回答适应性选择的集中查询的紧闭环环环 2507.13700v1 -
1055 07-18 FedDifRC: Unlocking the Potential of Text-to-Image Diffusion Models in Heterogeneous Federated Learning FedDifRC: Entsperren des Potenzials von Text-zu-Bild-Diffusionsmodellen im Heterogenen Federated Learning FedDifRC:在异质联邦学习中释放文本到图像传播模型的潜力 2507.06482v2 -
1056 07-18 An AI-powered Technology Stack for Solving Many-Electron Field Theory Ein KI-powered Technologie Stack für die Lösung von Viel-Elektronen-Feld-Theorie 用于解决多电场理论的AI-动力技术堆叠 2403.18840v2 -
1057 07-18 Kolmogorov-Arnold Networks-based GRU and LSTM for Loan Default Early Prediction Kolmogorov-Arnold Networks-basierte GRU und LSTM für Kredit-Standard-Frühvorhersage Kolmogorov-Arnold网络基于GRU和LSTM的贷款默认早期预测 2507.13685v1 -
1058 07-18 FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration FireQ: Schnelle INT4-FP8-Kernel- und RoPE-gestützte Quantisierung für LLM-Inferenzbeschleunigung FireQ:面向LLM推理加速的快速INT4-FP8内核与RoPE感知量化 2505.20839v3 -
1059 07-18 MUSO: Achieving Exact Machine Unlearning in Over-Parameterized Regimes MUSO: Exaktes Lernen der Maschine in überparameterisierten Regimes MUSO:在过度测量制度中实现精确的机械脱学 2410.08557v2 -
1060 07-18 HeCoFuse: Cross-Modal Complementary V2X Cooperative Perception with Heterogeneous Sensors HeCoFuse: Cross-Modal Complementary V2X kooperative Wahrnehmung mit Heterogenen Sensoren HeCoFuse:基于异构传感器的跨模态互补V2X协同感知 2507.13677v1 -
1061 07-18 Complex non-backtracking matrix for directed graphs Komplexe Nicht-Rückverfolgungsmatrix für gerichtete Graphen 定向图表的复杂非后跟踪矩阵表 2507.12503v2 -
1062 07-18 Breaking the Illusion of Security via Interpretation: Interpretable Vision Transformer Systems under Attack Breaking the Illusion of Security via Interpretation: Interpretable Vision Transformer Systeme unter Angriff 通过解释打破对安全的幻觉:被攻击的可解释的愿景变形系统 2507.14248v1 -
1063 07-18 When Person Re-Identification Meets Event Camera: A Benchmark Dataset and An Attribute-guided Re-Identification Framework Wenn Person Re-Identification auf Ereigniskamera trifft: Ein Benchmark-Datensatz und ein Attribut-geführtes Re-Identification Framework 当人员重新确认与事件相遇时:基准数据集和属性指导的重新确定框架 2507.13659v1 -
1064 07-18 Towards Foundation Models for Experimental Readout Systems Combining Discrete and Continuous Data Auf dem Weg zu Grundlagenmodellen für experimentelle Auslesesysteme zur Kombination von diskreten und kontinuierlichen Daten 建立分立和连续数据合并的实验读出系统基础模型 2505.08736v2 -
1065 07-18 A Comprehensive Review of Transformer-based language models for Protein Sequence Analysis and Design Eine umfassende Überprüfung von Transformer-basierten Sprachmodellen für Proteinsequenzanalyse und -design 全面审查以变换器为基础的蛋白序列分析和设计语言模型 2507.13646v1 -
1066 07-18 The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Die Illusion des Denkens: Die Stärken und Grenzen von Vernunftmodellen über das Lens of Problem Complexity verstehen 思考的幻觉:通过问题复杂焦点了解理性模型的长处和局限性 2506.06941v2 -
1067 07-18 KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction KEPLA: Ein Wissen-erweitertes Deep Learning Framework für präzise Protein-Ligand Bindung Affinity Prediction KEPLA:一个知识强化的更深层学习框架,用于准确预测蛋白-银-捆绑性近亲关系 2506.13196v3 -
1068 07-18 Differential Privacy in Kernelized Contextual Bandits via Random Projections Differentielle Privatsphäre in Kernelisierten Kontext Bandits über Random Projektionen 通过随机预测在核心环境强盗中的不同隐私 2507.13639v1 -
1069 07-18 State Space Models Naturally Produce Traveling Waves, Time Cells, and Scale to Abstract Cognitive Functions State Space Models erzeugen natürlich reisende Wellen, Zeitzellen und skalieren zu abstrakten kognitiven Funktionen 自然产生旅行波、时格和按抽象认知功能的尺度衡量的自然生成空间模型 2507.13638v1 -
1070 07-18 Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques Große Sprachmodelle in Cybersecurity: Anwendungen, Schwachstellen und Verteidigungstechniken 网络安全大语言模式:应用、脆弱性和国防技术 2507.13629v1 -
1071 07-18 FedSkipTwin: Digital-Twin-Guided Client Skipping for Communication-Efficient Federated Learning FedSkipTwin: Digital-Twin-geführter Client Skipping für kommunikatives und effizientes Federated Learning FedSkipTwin: 数字双向指导客户跳过客户端, 用于沟通高效的联邦学习 2507.13624v1 -
1072 07-18 Deep Q-Learning with Gradient Target Tracking Deep Q-Learning mit gradientem Target Tracking 与渐进目标跟踪进行深度学习 2503.16700v3 -
1073 07-18 ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs ZKP-FedEval: Überprüfbare und datenschutzschonende Federated Evaluation mit Null-Wissensnachweisen ZKP-FedEval:使用零知识证明进行可核查和隐私保护的联邦评价 2507.11649v2 -
1074 07-18 Merge Kernel for Bayesian Optimization on Permutation Space Zusammenführen Kernel für Bayesian Optimierung auf Permutationsraum Bayesian Permodation 空间优化合并核心圈 2507.13263v2 -
1075 07-18 Off-Policy Evaluation and Learning for Matching Markets Off-Policy-Evaluierung und -Lernen für Matching-Märkte 非政策评价和学习以匹配市场 2507.13608v1 -
1076 07-18 Improving Low-Cost Teleoperation: Augmenting GELLO with Force Verbesserung der Low-Cost-Teleoperation: GELLO mit Kraft erweitern 改进低费技术合作:加强GELLLO 2507.13602v1 -
1077 07-18 Position: Untrained Machine Learning for Anomaly Detection by using 3D Point Cloud Data Position: Untrainiertes maschinelles Lernen zur Erkennung von Anomalien durch Verwendung von 3D-Punkt-Cloud-Daten 位置: 使用 3D 点云数据进行异常检测的未经训练的机器学习 2502.03876v2 -
1078 07-18 Accelerating RF Power Amplifier Design via Intelligent Sampling and ML-Based Parameter Tuning Beschleunigung des RF-Leistungsverstärkers über intelligente Probenahme und ML-basierte Parameter-Tuning 通过智能取样和以 ML 为基础的参数图集加速 RF 功率放大器设计 2507.11928v2 -
1079 07-18 GIFT: Gradient-aware Immunization of diffusion models against malicious Fine-Tuning with safe concepts retention GIFT: Gradient-aware Immunisierung von Diffusionsmodellen gegen bösartiges Fein-Tuning mit sicherer Konzeptbindung GIFT: 逐步对防止恶意微调的传播模式进行逐步免疫免疫,并保留安全概念 2507.13598v1 -
1080 07-18 An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model Ein empirischer Risikominimierungsansatz für Offline-Inverse-RL- und Dynamische Diskrete-Choice-Modell 离线反转转流和动态分辨选择模式的经验风险最小化办法 2502.14131v4 -
1081 07-18 BLAST: A Stealthy Backdoor Leverage Attack against Cooperative Multi-Agent Deep Reinforcement Learning based Systems BLAST: Eine stealthy Hintertür Hebelangriff gegen kooperative Multi-Agent Deep-Verstärkung-Learning-basierte Systeme BLAST:对基于合作的多机构加强深层强化学习系统的隐秘后门利用攻击 2501.01593v2 -
1082 07-18 AI-Accelerated Flow Simulation: A Robust Auto-Regressive Framework for Long-Term CFD Forecasting KI-beschleunigte Flusssimulation: Robustes Auto-Regressives Framework für langfristige CFD-Prognose AI-加速流动模拟:长期CFD预测的强有力的自动递减框架 2412.05657v3 -
1083 07-18 FuSeFL: Fully Secure and Scalable Cross-Silo Federated Learning FuSeFL: Vollsicheres und skalierbares Cross-Silo-Federated Learning FuSeFL:完全安全且可扩展的跨孤岛联邦学习 2507.13591v1 -
1084 07-18 A million-scale dataset and generalizable foundation model for nanomaterial-protein interactions Ein millionengroßes Datensatz- und verallgemeinerbares Fundamentmodell für Nanomaterial-Protein-Interaktionen 关于纳米材料-蛋白质相互作用的百万尺度数据集和通用基础模型 2507.14245v1 -
1085 07-17 (4) Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries Pluralistische Benutzerpräferenzen durch mit Reinforcement Learning feinabgestimmte Zusammenfassungen lernen 通过强化学习微调的摘要学习多元化用户偏好 2507.13579v1 -
1086 07-17 Apple Intelligence Foundation Language Models: Tech Report 2025 Apple Intelligence Foundation Sprachmodelle: Tech Report 2025 苹果情报基金会语言模式:2025年技术报告 2507.13575v1 -
1087 07-17 Generative Deep Learning Framework for Inverse Design of Fuels Generatives Deep-Learning-Framework für das Inverse Design von Kraftstoffen 燃料反向设计生成深深学习框架 2504.12075v2 -
1088 07-17 Understanding Reasoning in Thinking Language Models via Steering Vectors Verständnis von Vernunft im Denken von Sprachmodellen über Lenkungs-Vektoren 通过指导矢量来理解思考语言模式的理由 2506.18167v3 -
1089 07-17 An Approach for Auto Generation of Labeling Functions for Software Engineering Chatbots Ein Ansatz zur automatischen Generierung von Beschriftungsfunktionen für Software Engineering Chatbots 软件工程聊天器自动生成标签功能的方法 2410.07094v2 -
1090 07-17 Change of Thought: Adaptive Test-Time Computation Gedankenwechsel: Adaptive Test-Time Computation 改变思想:适应性试验时间计算 2507.13569v1 -
1091 07-17 Why Isn’t Relational Learning Taking Over the World? Warum übernimmt das relationale Lernen nicht die Welt? 为什么关系学习不超越世界? 2507.13558v1 -
1092 07-17 Time Series Forecastability Measures Maße für die Vorhersagbarkeit von Zeitreihen 时间序列可预测性度量 2507.13556v1 -
1093 07-17 Loss-Complexity Landscape and Model Structure Functions Verlust-Komplexität Landschaft und Modellstruktur Funktionen 损失-复杂度地形与模型结构函数 2507.13543v1 -
1094 07-17 Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning Instruct-MusicGen: Entsperren von Text-zu-Musik-Editing für Musik Sprachmodelle über Instruction Tuning 指令-音乐Gen:通过指令调制解锁文字到音乐编辑音乐语言模型 2405.18386v3 -
1095 07-17 Acoustic Index: A Novel AI-Driven Parameter for Cardiac Disease Risk Stratification Using Echocardiography Akustischer Index: Ein neuartiger KI-getriebener Parameter für die Risikoteilung von Herzerkrankungen mittels Echokardiographie 声学指数:使用心心电图进行心电图分析的心病风险分解的新AI-Driven参数 2507.13542v1 -
1096 07-17 Provable Low-Frequency Bias of In-Context Learning of Representations Beweisbarer Niederfrequenz-Bias des In-Context-Lernens von Repräsentationen 表征的上下文学习中可证明的低频偏置 2507.13540v1 -
1097 07-17 How Not to Detect Prompt Injections with an LLM Wie man Prompt-Injektionen mit einem LLM nicht erkennen sollte 如何不应使用LLM检测提示注入 2507.05630v2 -
1098 07-17 Sugar-Beet Stress Detection using Satellite Image Time Series Sugar-Beet-Stress-Erkennung mit Satellitenbild-Zeitreihe 利用卫星图像图像时间序列检测糖甜甜豆应激反应 2507.13514v1 -
1099 07-17 Inverse Synthetic Aperture Fourier Ptychography Inverse Synthetische Apertur-Fourier-Ptychographie 逆合成孔径傅里叶叠层成像 2507.03733v2 -
1100 07-17 PHASE: Passive Human Activity Simulation Evaluation PHASE: Passive Simulation der menschlichen Aktivität PHASE:被动的人类活动模拟评价 2507.13505v1 -
1101 07-17 Gradient Descent Finds Over-Parameterized Neural Networks with Sharp Generalization for Nonparametric Regression Gradient Descent findet überparameterisierte neurale Netzwerke mit scharfer Generalisierung für nichtparametrische Regression 梯度梯度下发现超计神经网络,具有非参数回归的锐化概括化 2411.02904v4 -
1102 07-17 SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet SpecMaskFoley: Steuerung vortrainierter Spektralmasken Generativer Transformer zum synchronisierten Video-zu-Audio-Synthese über ControlNet SpecMaskFoley:通过控制网实现同步录相合成 2505.16195v2 -
1103 07-17 Model-free Reinforcement Learning for Model-based Control: Towards Safe, Interpretable and Sample-efficient Agents Modellfreies Verstärkungslernen für modellbasierte Steuerung: Auf dem Weg zu sicheren, interpretierbaren und mustereffizienten Agenten 示范式控制示范性强化学习:建立安全、可解释和高效采样的代用品 2507.13491v1 -
1104 07-17 ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data ParaPO: Sprachmodelle so ausrichten, dass verbatime Reproduktion von Vortrainingsdaten reduziert wird ParaPO:调整语文模式,减少培训前数据的逐字记录 2504.14452v2 -
1105 07-17 Neural Architecture Search with Mixed Bio-inspired Learning Rules Neurale Architektur Suche mit gemischten bio-inspirierten Lernregeln 具有混合生物启发混合学习规则的神经结构搜索 2507.13485v1 -
1106 07-17 Improving Out-of-distribution Human Activity Recognition via IMU-Video Cross-modal Representation Learning Verbesserung der außerbetrieblichen Anerkennung menschlicher Tätigkeiten durch IMU-Video Cross-modal Representative Learning 通过IMU-Video跨模式代表性学习,改善对人的活动在分配外的认可 2507.13482v1 -
1107 07-17 Multiresolution local smoothness detection in non-uniformly sampled multivariate signals Multiauflösende lokale Glättedetektion in nicht einheitlich abgetasteten multivariaten Signalen 在非统一抽样的多变量信号中多分辨率多分辨率局部平稳探测 2507.13480v1 -
1108 07-17 psifx – Psychological and Social Interactions Feature Extraction Package psifx – Psychologische und soziale Interaktionen Feature Extraction Package psifx – 心理与社会互动特征提取包 2407.10266v4 -
1109 07-17 Base3: a simple interpolation-based ensemble method for robust dynamic link prediction Base3: eine einfache, interpolationsbasierte Ensemble-Methode für robuste dynamische Link-Vorhersage 基数3:一种简单的基于内插的共合方法,用于稳健动态链接预测 2506.12764v2 -
1110 07-17 Graph Neural Network Surrogates for Contacting Deformable Bodies with Necessary and Sufficient Contact Detection Graph Neural Network Surrogates für Kontakt mit deformierbaren Körpern mit notwendiger und ausreichender Kontakterkennung 与必要和足够接触检测器接触变形机体的神经网络代号 2507.13459v1 -
1111 07-17 Domain-randomized deep learning for neuroimage analysis Domain-randomisiertes Deep Learning für Neuroimage-Analysen 用于神经影像分析的内地随机深层学习 2507.13458v1 -
1112 07-17 Hierarchical Rectified Flow Matching with Mini-Batch Couplings Hierarchischer rektifizierter Fluss passend zu Mini-Batch-Kupplungen 与小批量相匹配的梯级校正流程 2507.13350v1 -
1113 07-17 VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning VisionThink: Intelligentes und effizientes Vision-Sprachmodell durch Verstärkungslernen 远景设想:通过强化学习建立聪明、高效的愿景语言模式 2507.13348v1 -
1114 07-17 Latent Policy Steering with Embodiment-Agnostic Pretrained World Models Latent Policy Steering mit prätrainierten Weltmodellen der Embodiment-Agnostik 与Embodiment-Agnnocistic未受训练世界模型的原始政策指导 2507.13340v1 -
1115 07-17 Training Transformers with Enforced Lipschitz Constants Trainingstransformatoren mit verstärkter Lipschitz-Konstanten 培训具有强制立利普施茨常数的变革者 2507.13338v1 -
1116 07-17 GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM GeoReg: Gewichtsbeschränkte Few-Shot-Regression für sozioökonomische Abschätzung mit LLM GeoReg:利用LLM进行社会经济估计的权重约束少样本回归 2507.13323v1 -
1117 07-17 Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence Föderiertes Lernen: Eine Umfrage zum Datenschutz-Schutz Kollaborativer Intelligenz 联邦学习:保护隐私合作情报调查 2504.17703v2 -
1118 07-17 Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models Von reward-free Offline-Daten lernen: Ein Fall für die Planung mit latenten Dynamics-Modellen 从无回报脱离线数据中学习:利用隐时动态模型进行规划的一个案例 2502.14819v2 -
1119 07-17 Boosting Team Modeling through Tempo-Relational Representation Learning Teammodellierung durch Tempo-Relationales Repräsentationslernen fördern 通过Tempo-关系代表制学习促进团队模拟 2507.13305v1 -
1120 07-17 Retraining-Free Merging of Sparse MoE via Hierarchical Clustering Retraining-Free Merging von Sparse MoE über Hierarchical Clustering 通过等级式集束式集成,无培训地重新合并粗微中小部 2410.08589v3 -
1121 07-17 Advancing Seasonal Prediction of Tropical Cyclone Activity with a Hybrid AI-Physics Climate Model Förderung der saisonalen Vorhersage Tropischer Zyklonaktivität mit einem Hybrid-KI-Physik-Klimamodell 采用AI-物理混合气候模型推进热带气旋活动季节性预测 2505.01455v2 -
1122 07-17 SIDDA: SInkhorn Dynamic Domain Adaptation for Image Classification with Equivariant Neural Networks SIDDA: SInkhorn Dynamische Domain-Anpassung für die Bildklassifizierung mit Gleichwertigen Neuronalen Netzwerken SIDDA: 利用等质神经网络进行图像分类的SInkhorn动态域域适应 2501.14048v2 -
1123 07-17 Air Traffic Controller Task Demand via Graph Neural Networks: An Interpretable Approach to Airspace Complexity Flugverkehrskontroller Auftragsnachfrage über Graph Neural Networks: Ein interpretierbarer Ansatz für die Komplexität des Luftraums 通过图形神经网络的空中交通管制主计长任务需求:对空气空间复杂度的一种解释性办法 2507.13423v1 -
1124 07-17 crowd-hpo: Realistic Hyperparameter Optimization and Benchmarking for Learning from Crowds with Noisy Labels crowd-hpo: Realistische Hyperparameter-Optimierung und Benchmarking zum Lernen von Crowds mit Noisy-Labels 现实主义超超参数最佳化和基准化,用噪音标签从人群中学习 2504.09085v2 -
1125 07-17 Optimal Empirical Risk Minimization under Temporal Distribution Shifts Optimale Empirische Risikominimierung unter zeitlichen Verteilungsverschiebungen 时间分布变化下最佳实证风险最小化 2507.13287v1 -
1126 07-17 Stochastic Weakly Convex Optimization Under Heavy-Tailed Noises Stochastische schwach konvexe Optimierung unter Heavy-Tailed-Rauschen 重尾噪声下的随机弱凸优化 2507.13283v1 -
1127 07-17 Generative Diffusion Models for Resource Allocation in Wireless Networks Generative Diffusionsmodelle zur Ressourcenallokation in drahtlosen Netzwerken 无线网络资源分配生成传播模型 2504.20277v2 -
1128 07-17 Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour Bewertung von Stärkungslernen Algorithmen für die Navigation in simulierten Roboter-Quadrupen: Eine vergleichende Studie, inspiriert von Guide Dog Behaviour 评价模拟机器人四重干扰模拟机器人四重干扰导航中强化学习的教学比值:受导狗行为启发的比较研究 2507.13277v1 -
1129 07-17 Do you know what q-means? Weißt du, was q-bedeutet? 你知道什么是q - means吗? 2308.09701v3 -
1130 07-17 Automating Steering for Safe Multimodal Large Language Models Automatisierungslenkung für sichere multimodale große Sprachmodelle 安全多式联运大语言模式自动化指导 2507.13255v1 -
1131 07-17 A Roadmap for Climate-Relevant Robotics Research Ein Fahrplan für die klimarelevante Robotikforschung 气候相关机器人研究路线图 2507.11623v2 -
1132 07-17 Leveraging Asynchronous Cross-border Market Data for Improved Day-Ahead Electricity Price Forecasting in European Markets Nutzung asynchroner grenzübergreifender Marktdaten für eine verbesserte Tagesprognose der Strompreise in den europäischen Märkten 利用非同步跨界市场数据改进欧洲市场日间电力价格预测 2507.13250v1 -
1133 07-17 Approximation Rates for Shallow ReLU$^k$ Neural Networks on Sobolev Spaces via the Radon Transform Approximationsraten für Shallow ReLU$^k$ Neurale Netze auf Sobolev-Räumen über die Radon-Transformation Sobolev空间上浅层ReLU$^k$神经网络基于Radon变换的逼近率 2408.10996v2 -
1134 07-17 The carbon cost of materials discovery: Can machine learning really accelerate the discovery of new photovoltaics? Die CO2-Kosten der Materialentdeckung: Kann maschinelles Lernen die Entdeckung neuer Photovoltaik wirklich beschleunigen? 材料发现的碳成本:机器学习能否真正加速新光伏发电的发现? 2507.13246v1 -
1135 07-17 VectorFit : Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models VectorFit : Adaptive Singular & Bias Vector Fine-Tuning von vortrainierten Foundation-Modellen 矢量Fit:培训前基金会模型的适应性单单项和比亚斯矢量微调 2503.19530v2 -
1136 07-17 Multiple-Frequencies Population-Based Training Populationsbasiertes Training mit mehreren Frequenzen 多频率的基于种群的训练 2506.03225v2 -
1137 07-17 Computational-Statistical Tradeoffs from NP-hardness Computational-Statistical Tradeoffs von NP-Härte 对NP-硬度的计算-统计取舍 2507.13222v1 -
1138 07-17 V-Max: A Reinforcement Learning Framework for Autonomous Driving V-Max: Ein Rahmen für verstärktes Lernen für autonomes Fahren V-Max:加强自主驾驶学习框架 2503.08388v3 -
1139 07-17 Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models Kompositioneller diskreter latenter Code für High Fidelity, Produktive Diffusionsmodelle 用于高保真、高产出扩散模型的组合式离散潜在编码 2507.12318v2 -
1140 07-17 Branching Stein Variational Gradient Descent for sampling multimodal distributions Verzweigung Stein Variational Gradient Descent für die Probenahme multimodaler Verteilungen 用于多模态分布采样的分支Stein变分梯度下降 2506.13916v2 -
1141 07-17 Relation-Aware Slicing in Cross-Domain Alignment Verhältnis-Bewusstsein-Slicing in Cross-Domain-Alignment 跨域对齐中的关系软件切切 2507.13194v1 -
1142 07-17 GradNetOT: Learning Optimal Transport Maps with GradNets GradNetOT: Optimale Transportkarten mit GradNets lernen GradNetOT: 与 GradNets一起学习最佳交通地图 2507.13191v1 -
1143 07-17 Bounding the Worst-class Error: A Boosting Approach Den Fehler der schlechtesten Klasse zu überwinden: Ein Boosting-Ansatz 绕过最坏的错误 : 推动方法 2310.14890v3 -
1144 07-17 Spectral Bellman Method: Unifying Representation and Exploration in RL Spektral Bellman-Methode: Vereinheitliche Darstellung und Exploration in RL 光谱钟门方法:统一代表与探索 2507.13181v1 -
1145 07-17 SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks SHIELD: Ein sicheres und hochverstärktes integriertes Lernen für robuste Deepfake-Erkennung gegen feindliche Angriffe SHIELD: 可靠和高度强化的综合学习,以强有力地发现深假,防止反向攻击 2507.13170v1 -
1146 07-17 Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models Orbis: Herausforderungen der Langzeit-Vorhersage bei treibenden Weltmodellen überwinden Orbis:克服在推动世界模式方面长期预测的挑战 2507.13162v1 -
1147 07-17 Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities Inverses Verstärkungslernen trifft auf Post-Training großer Sprachmodelle: Grundlagen, Fortschritte und Chancen 逆强化学习遇上大语言模型后训练:基础、进展与机遇 2507.13158v1 -
1148 07-17 AI-ming backwards: Vanishing archaeological landscapes in Mesopotamia and automatic detection of sites on CORONA imagery KI-Ming rückwärts: Auslöschende archäologische Landschaften in Mesopotamien und automatische Erkennung von Stätten auf CORONA-Bildern AI-Ming倒向:美索不达米亚消失的考古景观和自动探测CORONA图像上的遗址 2507.13420v1 -
1149 07-17 NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech 非口头翻译:一个以文本为主的非口头演唱的英文公共单位,带有文字对语音情感说明 2507.13155v1 -
1150 07-17 NGTM: Substructure-based Neural Graph Topic Model for Interpretable Graph Generation NGTM: Substrukturbasiertes Neural Graph Topic Model für die interpretierbare Graphengenerierung NGTM: 以次级结构为基础的可解释图形生成神经图专题模型 2507.13133v1 -
1151 07-17 PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data PINT: Physik-informierte Neuralzeit-Serienmodelle mit Anwendungen zur langfristigen Schlussfolgerung auf WeatherBench 2m-Temperaturdaten PINT: 应用气象区2m-温度数据长期推断的物理化神经时间序列模型 2502.04018v2 -
1152 07-17 Search for Z/2 eigenfunctions on the sphere using machine learning Suche nach Z/2 Eigenfunktionen auf der Kugel mittels maschinellem Lernen 使用机器学习在球体上搜索 Z/2 电子元件 2507.13122v1 -
1153 07-17 RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images RS-TinyNet: Stage-wise Feature Fusion Network zur Erkennung winziger Objekte in Bildern der Fernerkundung RS-TinyNet:在遥感图像中探测小物体的分阶段地貌融合网络 2507.13120v1 -
1154 07-17 Generative AI Models for Learning Flow Maps of Stochastic Dynamical Systems in Bounded Domains Generative KI-Modelle zum Lernen von Flusskarten stochastischer dynamischer Systeme in beschränkten Gebieten 用于学习有界域中随机动力系统流映射的生成式AI模型 2507.15990v1 -
1155 07-17 Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression Task-Circuit Quantization: Nutzung von Wissen Lokalisierung und Dolmetschbarkeit für Komprimierung 任务-环境环境定量:利用知识本地化和压缩解释 2504.07389v2 -
1156 07-17 Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction Deep Learning-based Fetal Lung Segmentation aus diffusionsgewichteten MRT-Bildern und Lungenreife-Evaluierung für fetale Wachstumsbeschränkung 从传播加权磁RI图像和对胎儿生长限制的肺期评估中分离出的深学习-基于学习的胎儿肺部切片 2507.13106v1 -
1157 07-17 SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts SemCSE: Semantische kontrastive Satzeinbettungen mit LLM-generierten Zusammenfassungen für wissenschaftliche Abstracts SemCSE: 使用LLM创制的科学摘要摘要 2507.13105v1 -
1158 07-17 Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models Unified Triplet-Level Halluzination Evaluation für große Vision-Sprache Modelle 大型视觉语言模型统一三维级幻觉评价 2410.23114v4 -
1159 07-17 Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction Uni-Instruct: Einstufiges Diffusionsmodell durch Unified Diffusion Divergence Instruction Uni- Instruct: 通过统一扩散分散指令单步扩散模型 2505.20755v2 -
1160 07-17 Unsupervised Ground Metric Learning Unüberwachtes metrisches Lernen am Boden 不受监督的地面计量学习 2507.13094v1 -
1161 07-17 Truthful Elicitation of Imprecise Forecasts Wahre Botschaft von ungenauen Prognosen 不精确预测的真实引出 2503.16395v4 -
1162 07-17 Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces Ungewissheitsbewusste Cross-Modal Knowledge Destillation mit Prototypenlernen für multimodale Gehirn-Computer-Schnittstellen 与多式脑-计算机界面的原型学习相结合的不确定-软件软件的跨模式知识蒸馏 2507.13092v1 -
1163 07-17 Super Resolution for Renewable Energy Resource Data With Wind From Reanalysis Data and Application to Ukraine Super Auflösung für erneuerbare Energien Ressourcendaten mit Wind Von der Reanalyse Daten und Anwendung in die Ukraine 乌克兰可再生能源资源数据利用风向再分析数据和应用于乌克兰的超级分辨率 2407.19086v2 -
1164 07-17 Soft-ECM: An extension of Evidential C-Means for complex data Soft-ECM: Erweiterung von Evidential C-Means für komplexe Daten 软体-电子内容软体:复杂数据的证据性C-方法的扩展 2507.13417v1 -
1165 07-17 MUPAX: Multidimensional Problem Agnostic eXplainable AI MUPAX: Multidimensionales Problem Agnostic eXplainable KI MUPAX: 多元问题Agnostic eXplainable AI 2507.13090v1 -
1166 07-17 DASViT: Differentiable Architecture Search for Vision Transformer DASViT: Differenzierbare Architektursuche für Vision Transformer DASViT:不同建筑搜索视野变异器 2507.13079v1 -
1167 07-17 On the Effectiveness of the z-Transform Method in Quadratic Optimization Über die Wirksamkeit der z-Transform Methode in der quadratischen Optimierung 关于四压压优化中z变形方法有效性问题 2507.03404v2 -
1168 07-17 Single- to multi-fidelity history-dependent learning with uncertainty quantification and disentanglement: application to data-driven constitutive modeling Single- to Multi-Fidelity history-dependent Learning mit Unsicherheitsquantifizierung und Disentanglementierung: Anwendung auf datengesteuerte konstitutive Modellierung 具有不确定性的量化和分解:适用于数据驱动的构成型建模 2507.13416v1 -
1169 07-17 MedPix 2.0: A Comprehensive Multimodal Biomedical Data set for Advanced AI Applications with Retrieval Augmented Generation and Knowledge Graphs MedPix 2.0: Umfassender multimodaler biomedizinischer Datensatz für fortgeschrittene KI-Anwendungen mit retrieval Augmented Generation und Wissensgraphen MedPix 2.0:一套综合多式生物医学数据集,用于高级AI应用,并附有回收增加的生成和知识图 2407.02994v5 -
1170 07-17 On statistical learning of graphs Statistisches Erlernen von Schaubildern 关于统计学图表 2507.13054v1 -
1171 07-17 Mining Voter Behaviour and Confidence: A Rule-Based Analysis of the 2022 U.S. Elections Mining Voter Behaviour and Confidence: Eine regelbasierte Analyse der Wahlen 2022 in den USA 采矿选民行为和信任:对2022年美国选举的基于规则的分析 2507.14236v1 -
1172 07-17 Gauge Flow Models Modelle für den Messfluss Gage 流程模型 2507.13414v1 -
1173 07-17 Uncertainty quantification for White Matter Hyperintensity segmentation detects silent failures and improves automated Fazekas quantification Unsicherheits-Quantifizierung für White Matter Hyperintensitätssegmentierung erkennt leise Ausfälle und verbessert die automatisierte Fazekas-Quantifizierung 白色物质超密度分离的不确定性量化,可检测静态故障,改进自动Fazekas量化 2411.17571v2 -
1174 07-17 The Power of Architecture: Deep Dive into Transformer Architectures for Long-Term Time Series Forecasting Die Kraft der Architektur: Tiefgehen in Transformer-Architekturen für langfristige Zeitreihen 建筑力量:为长期时间序列预测而向变形结构深度下潜 2507.13043v1 -
1175 07-17 Confidence-Filtered Relevance (CFR): An Interpretable and Uncertainty-Aware Machine Learning Framework for Naturalness Assessment in Satellite Imagery Confidence-Filtered Relevance (CFR): Ein interpretierbares und unsicheres Machine Learning Framework für die Bewertung von Natürlichkeit in Satellitenbildern 信任改变的相关性:卫星图像中自然评估的 解释性和不确定性和不确定性-智能学习框架 2507.13034v1 -
1176 07-17 (Exhaustive) Symbolic Regression and model selection by minimum description length (Erschöpfend) Symbolische Regression und Modellauswahl nach minimaler Beschreibungslänge 按最低描述长度分列的符号回归和模型选择 2507.13033v1 -
1177 07-17 When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values Wenn Pattern-by-Pattern arbeitet: Theoretische und Empirische Einblicke für Logistische Modelle mit fehlenden Werten 代代办法:缺少价值的后勤模式理论和经验透视 2507.13024v1 -
1178 07-17 Fault detection and diagnosis for the engine electrical system of a space launcher based on a temporal convolutional autoencoder and calibrated classifiers Fehlererkennung und Diagnose für das elektrische Motorsystem eines Raumwerfers basierend auf einem zeitlich konvolutionären Autoencoder und kalibrierten Klassifikatoren 以时富集自动编码器和校准分类器为基础的空间发射装置发动机电气系统的故障检测和诊断 2507.13022v1 -
1179 07-17 The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks Die Spätphasen-Trainingsdynamik des (stochastischen) subgradienten Abstiegs auf homogenen neuronalen Netzwerken 在同质神经网络上的(随机)亚梯级下降的后阶段培训动态 2502.05668v3 -
1180 07-17 SMART: Relation-Aware Learning of Geometric Representations for Knowledge Graphs SMART: Beziehungsorientiertes Lernen geometrischer Darstellungen für Wissensgraphen SMART:知识图表几何表示法关系-知识学习 2507.13001v1 -
1181 07-17 Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning Differential-informierte Probenauswahl beschleunigt multimodales kontrastives Lernen 不同知情的抽样甄选加速多模式差异学习 2507.12998v1 -
1182 07-17 (Almost) Free Modality Stitching of Foundation Models (Fast) Freie Modalitätsstiche von Stiftungsmodellen (几乎) 基金会模型的免费方式 2507.10015v3 -
1183 07-17 Teach Old SAEs New Domain Tricks with Boosting Lehren Sie alte SAEs neue Domain Tricks mit Förderung 教授旧的 SAEs 新域圈套 2507.12990v1 -
1184 07-17 Variance-Based Pruning for Accelerating and Compressing Trained Networks Varianzbasiertes Pruning für beschleunigte und komprimierende Ausgebildete Netzwerke 加快和压缩经过训练的网络 2507.12988v1 -
1185 07-17 FedGA: A Fair Federated Learning Framework Based on the Gini Coefficient FedGA: Ein faires, auf dem Gini-Koeffizienten basierendes Föderated Learning Framework FedGA:基于基尼系数的公平联邦学习框架 2507.12983v1 -
1186 07-17 A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints Ein verteilter generativer KI-Ansatz für heterogene Multi-Domain-Umgebungen unter Datenfreigabebeschränkungen 在数据共享制约下,对异种多领域不同环境采取分散的AI方法 2507.12979v1 -
1187 07-17 WaveletInception Networks for Drive-by Vibration-Based Infrastructure Health Monitoring WaveletInception-Netzwerke für Drive-by-Vibrationsbasierte Infrastruktur-Gesundheitsüberwachung 驱动振动基础设施健康监测波动感知网络 2507.12969v1 -
1188 07-17 Investigating Forecasting Models for Pandemic Infections Using Heterogeneous Data Sources: A 2-year Study with COVID-19 Untersuchung von Prognosemodellen für Pandemieinfektionen unter Verwendung heterogener Datenquellen: Eine 2-jährige Studie mit COVID-19 利用异源数据源调查利用异源数据对传染病的预测模型:COVID-19的两年期研究 2507.12966v1 -
1189 07-17 A Spectral Interpretation of Redundancy in a Graph Reservoir Eine spektrale Interpretation der Redundanz in einem Graph Reservoir 图表储量中剩余性的旁观解释 2507.12963v1 -
1190 07-17 Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning Dynamische Stabilität des stochastischen Gradienten Absinkens im überparameterisierten Lernen charakterisierend 将过度量化的学习中存储层渐变源的动态稳定化特性化 2407.20209v3 -
1191 07-17 A Progressive Image Restoration Network for High-order Degradation Imaging in Remote Sensing Ein Progressives Bildwiederherstellungsnetzwerk für High-Order Degradation Imaging in Remote Sensing 遥感中高顺序退化成像的逐步图像恢复网络 2412.07195v2 -
1192 07-17 A Brain Tumor Segmentation Method Based on CLIP and 3D U-Net with Cross-Modal Semantic Guidance and Multi-Level Feature Fusion Eine Gehirntumor-Segmentierungsmethode basierend auf CLIP und 3D U-Net mit Cross-Modal Semantic Guidance und Multi-Level-Feature Fusion 以CLIP和3D U-Net为基础的脑肿瘤分解法,并配有跨模式语义指导和多功能融合 2507.09966v2 -
1193 07-17 cIDIR: Conditioned Implicit Neural Representation for Regularized Deformable Image Registration cIDIR: Bedingte implizite Neuraldarstellung für regularisierte, deformierbare Bildregistrierung cIDIR: 定期变形图像注册的有条件的、隐含的神经代表 2507.12953v1 -
1194 07-17 Signal Recovery Using a Spiked Mixture Model Signalwiederherstellung mit einem Spiked Mixture Model 使用斯派混合混合模型恢复信号 2501.01840v2 -
1195 07-17 MMOne: Representing Multiple Modalities in One Scene MMOne: Vertretung mehrerer Modalitäten in einer Szene MMOne: 在一个场景中代表多种模式 2507.11129v2 -
1196 07-17 Probabilistic Soundness Guarantees in LLM Reasoning Chains Probabilistische Solidität garantiert in LLM-Aufklärungsketten LLM 理赔链条的概率稳妥性保障 2507.12948v1 -
1197 07-17 LightAutoDS-Tab: Multi-AutoML Agentic System for Tabular Data LightAutoDS-Tab: Multi-AutoML Agentic System für Tabellendaten LightautoDS-Tab:用于表格数据的多自动ML剂系统 2507.13413v1 -
1198 07-17 Global urban visual perception varies across demographics and personalities Globale urbane visuelle Wahrnehmung variiert je nach Demografie und Persönlichkeit 全球城市视觉认识因人口和个性而异 2505.12758v3 -
1199 07-17 MC$^2$A: Enabling Algorithm-Hardware Co-Design for Efficient Markov Chain Monte Carlo Acceleration MC$^2$A: Algorithm-Hardware Co-Design für effiziente Markov-Kette Monte Carlo Beschleunigung MC$^2$A: 提高Markov链节蒙特卡洛速度加速速度的辅助算法-Hardware共同设计 2507.12935v1 -
1200 07-17 DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization DMQ: Ausreißer von Diffusionsmodellen für die Quantisierung nach dem Training DMQ: 解剖培训后量化传播模型的外源离子 2507.12933v1 -
1201 07-17 Trace Reconstruction with Language Models Trace Rekonstruktion mit Sprachmodellen 使用语言模式进行追踪重建 2507.12927v1 -
1202 07-17 Robust Explanations Through Uncertainty Decomposition: A Path to Trustworthier AI Robuste Erklärungen durch Unsicherheitszersetzung: Ein Weg zu vertrauensvoller KI 通过不确定性的分解作出有力的解释:通往信托的路径 AI 2507.12913v1 -
1203 07-17 Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services Fremer: Leichter und effektiver Frequenztransformator für Workload-Prognose in Cloud Services Fremer:云服务工作量预测的轻型和有效频率变压器 2507.12908v1 -
1204 07-17 A column generation algorithm with dynamic constraint aggregation for minimum sum-of-squares clustering Ein Spaltengenerierungsalgorithmus mit dynamischer Constraint-Aggregation für minimale Summe von Quadraten 为最小平方和组合组合组合组合而具有动态约束聚合的列生成算法 2410.06187v2 -
1205 07-17 Generalist Bimanual Manipulation via Foundation Video Diffusion Models Generalist Bimanual Manipulation über Stiftung Video Diffusion Modelle 通过基金会录像传播模型进行通用二手操作 2507.12898v1 -
1206 07-17 VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks VAR-MATH: Wahre mathematische Vernunft in großen Sprachmodellen anhand symbolischer Multi-Instance-Benchmarks VAR-MATH:通过符号性多因基准在大语言模型中验证真实的数学理由 2507.12885v1 -
1207 07-17 Autonomous Resource Management in Microservice Systems via Reinforcement Learning Autonomes Ressourcenmanagement in Mikroservice-Systemen durch Verstärkungslernen 通过加强学习,对微小服务系统进行自主资源管理 2507.12879v1 -
1208 07-17 Bayesian Modeling and Estimation of Linear Time-Variant Systems using Neural Networks and Gaussian Processes Bayesische Modellierung und Abschätzung von linearen Zeitvariantsystemen unter Verwendung neuraler Netzwerke und Gaußschen Prozessen 利用神经网络和高斯进程模拟和估计线性时间变化系统 2507.12878v1 -
1209 07-17 Topology-Aware Activation Functions in Neural Networks Topologie-Bewusst-Aktivierungsfunktionen in neuralen Netzwerken 神经网络中的地形-软件启动功能 2507.12874v1 -
1210 07-17 An Investigation of Ear-EEG Signals for a Novel Biometric Authentication System Untersuchung von Ohr-EEG-Signalen für ein neuartiges biometrisches Authentifizierungssystem 关于新生物测定鉴定系统耳电信号的调查 2507.12873v1 -
1211 07-17 WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding WhoFi: 通过 Wi-Fi 频道信号编码来识别深层人的身份 2507.12869v1 -
1212 07-17 Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved) Beaufsichtigte Feinabstimmung auf kuratierten Daten ist Verstärktes Lernen (und kann verbessert werden) 受监督的 “ 封闭数据 “ 微调微调是 “ 强化学习 “ (并可以改进) 2507.12856v1 -
1213 07-17 Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application Latent Diffusion Modellbasierter Denoisierungsempfänger für 6G Semantische Kommunikation: Von der stochastischen Differentialtheorie zur Anwendung 用于 6G 语义通讯: 从斯托卡差异理论到应用的 6G 语义通讯的 以 DEM 为基础的前传播模型模型 2506.05710v3 -
1214 07-17 Transformer-Based Person Identification via Wi-Fi CSI Amplitude and Phase Perturbations Transformerbasierte Personenidentifikation über Wi-Fi CSI Amplitude und Phasenstörungen 通过Wi-Fi CSI进行基于变压器的人的识别 2507.12854v1 -
1215 07-17 Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants Site-Level Feintuning mit Progressive Layer Freezing: Auf dem Weg zur robusten Vorhersage der Bronchopulmonalen Dysplasie von Tag 1 Brustradiographen bei extrem prätermen Säuglingen 与累进层冷冻有关的地点级微调级微调:对极期前婴儿每日1号胸前无线电报上的布朗-希波本二元病原体进行强有力的预测 2507.12269v2 -
1216 07-17 Formalising causal inference as prediction on a target population Formalisierende kausale Schlussfolgerungen als Vorhersage für eine Zielpopulation 将因果推断正规化,作为对目标人口的预测 2407.17385v3 -
1217 07-17 Dataset resulting from the user study on comprehensibility of explainable AI algorithms Datensatz aus der Nutzerstudie zur Verständlichkeit erklärbarer KI-Algorithmen 用户关于可解释的AI算法的可理解性研究产生的数据集 2411.02419v2 -
1218 07-17 A Kernel Distribution Closeness Testing Eine Näherungsprüfung der Kernelverteilung A 内核分布近距离测试 2507.12843v1 -
1219 07-17 Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling Aufgabenspezifische Generative Datensatzdestillation mit schwer wiegender Probenahme 利用难于指导的抽样抽样进行任务特定生成数据集蒸馏 2507.03331v2 -
1220 07-17 We should avoid the assumption of data-generating probability distributions in social settings Wir sollten die Annahme von datengenerierenden Wahrscheinlichkeitsverteilungen in sozialen Settings vermeiden 我们应该避免假设在社会环境中产生数据的概率分布 2407.17395v4 -
1221 07-17 Bridging the Gap: Leveraging Retrieval-Augmented Generation to Better Understand Public Concerns about Vaccines Bridging the Gap: Leveraging Retrieval-Augmented Generation zu besser verstehen öffentliche Bedenken über Impfstoffe 缩小差距:利用利用回收-养殖一代来更好地了解公众对疫苗的关切 2507.12840v1 -
1222 07-17 Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability Die Evolution des neuralen Tangentenkerns am Rande der Stabilität verstehen 了解稳定边缘的内心内核核心的演变 2507.12837v1 -
1223 07-17 MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results MVA 2025 Kleines Multi-Objekt-Tracking für die Vogelbeobachtung Herausforderung: Datensatz, Methoden und Ergebnisse MVA 2025 发现鸟类挑战小型多目标跟踪:数据集、方法和结果 2507.12832v1 -
1224 07-17 Autoregressive Speech Enhancement via Acoustic Tokens Autoregressive Sprachverbesserung durch akustische Token 通过声调声调增强自动递减语音 2507.12825v1 -
1225 07-17 Self Balancing Neural Network: A Novel Method to Estimate Average Treatment Effect Self Balancing Neural Network: Eine neuartige Methode zur Schätzung des durchschnittlichen Behandlungseffekts 自我平衡神经网络:估计平均治疗效果的新办法 2507.12818v1 -
1226 07-17 From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning Von der Neuheit zur Imitation: Selbstdestillierte Belohnungen für Offline-Verstärkungslernen 从新闻到消化:为脱线强化学习自行提炼奖项 2507.12815v1 -
1227 07-17 RONOM: Reduced-Order Neural Operator Modeling RONOM: Reduzierte Neuraloperator-Modellierung RONOM: 降低轨道神经操作员模型 2507.12814v1 -
1228 07-17 ZClassifier: Temperature Tuning and Manifold Approximation via KL Divergence on Logit Space ZClassifier: Temperatur-Tuning und Manifold-Annäherung über KL Divergenz auf Logit Space ZClassifier: 通过在登录空间的 KL divergence 进行温度调制和调控相近 2507.10638v2 -
1229 07-17 Holistix: A Dataset for Holistic Wellness Dimensions Analysis in Mental Health Narratives Holistix: Ein Datensatz für ganzheitliche Wellness-Dimensionen Analyse in psychischen Gesundheits-Erzählungen Holistix:心理健康叙事中整体健康层面分析数据集 2507.09565v2 -
1230 07-17 Quantum Long Short-Term Memory for Drug Discovery Quantenlanges Kurzzeitgedächtnis für die Drogenentdeckung 药物发现长期短期记忆 2407.19852v2 -
1231 07-17 PolyServe: Efficient Multi-SLO Serving at Scale PolyServe: Effizientes Multi-SLO Servieren im Maßstab 多边服务:在规模上有效的多种服务 2507.17769v1 -
1232 07-17 Large Language Models’ Internal Perception of Symbolic Music Die innere Wahrnehmung symbolischer Musik durch große Sprachmodelle 大语言模型内部对符号音乐的感知 2507.12808v1 -
1233 07-17 PMKLC: Parallel Multi-Knowledge Learning-based Lossless Compression for Large-Scale Genomics Database PMKLC: Parallele Multi-Knowledge Learning-basierte Lossless-Kompression für großformatige Genomics-Datenbank PMKLC: 大型基因组数据库的平行多知识学习-无损失压缩 2507.12805v1 -
1234 07-17 Physics-Informed Linear Model (PILM): Analytical Representations and Application to Crustal Strain Rate Estimation Physik-informiertes Linearmodell (PILM): Analytische Darstellungen und Anwendung auf Crustal Strain Rate Abschätzung 物理内建线性模型(PILM):对结壳定流速率估计的分析说明和应用 2507.12218v2 -
1235 07-17 FLDmamba: Integrating Fourier and Laplace Transform Decomposition with Mamba for Enhanced Time Series Prediction FLDmamba: Integration von Fourier und Laplace-Transformationszersetzung mit Mamba für verbesserte Zeitreihenvorhersage FLDmamba:将Fourier和Laple变形变形变形与Mamba结合,以提高时间序列预测 2507.12803v1 -
1236 07-17 ReCode: Updating Code API Knowledge with Reinforcement Learning ReCode: Aktualisierung von Code-API-Kenntnissen mit Verstärkungslernen ReCode:更新法规API知识与强化学习 2506.20495v2 -
1237 07-17 Beyond Architectures: Evaluating the Role of Contextual Embeddings in Detecting Bipolar Disorder on Social Media Beyond Architectures: Bewertung der Rolle kontextueller Einbettungen bei der Erkennung bipolarer Störungen in sozialen Medien 超越建筑:评价背景嵌入在发现社会媒体两极分极分崩离析现象中的作用 2507.14231v1 -
1238 07-17 Multi-Channel Graph Neural Network for Financial Risk Prediction of NEEQ Enterprises Multi-Channel Graph Neural Network for Financial Risk Prediction of NEEQ Enterprises NEEQ企业金融风险预测多通道图图神经网络 2507.12787v1 -
1239 07-17 COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark COREVQA: Eine Crowd-Beobachtung und Begründung zur Detaillierung Visual Question Answering Benchmark COREVQA: 聚众观察和理性视觉问题回答基准 2507.13405v1 -
1240 07-17 Compact Vision Transformer by Reduction of Kernel Complexity Kompakter Vision Transformer durch Reduktion der Kernelkomplexität 减少内核复杂度,实现全球契约愿景转型 2507.12780v1 -
1241 07-17 Demystifying MuZero Planning: Interpreting the Learned Model MuZero-Planung entmystifizieren: Das gelernte Modell interpretieren 消除神秘的 “ 零零规划 “ :解释 “ 总结经验 “ 模式 2411.04580v2 -
1242 07-17 A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models Eine umfassende Umfrage zur elektronischen Gesundheitsdatenmodellierung: Von Deep Learning Ansätzen bis hin zu großen Sprachmodellen 《电子健康记录模型综合调查:从深学习方法到大语言模式》 2507.12774v1 -
1243 07-17 Sample-Constrained Black Box Optimization for Audio Personalization Sample-Constrained Black Box Optimierung für Audio-Personalisierung 优化音频个性化 2507.12773v1 -
1244 07-17 AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation AnyPos: Automatisierte Task-Agnostische Aktionen zur bimanuellen Manipulation 任何 波 : 用于二手操纵的自动任务- 不可允许动作 2507.12768v1 -
1245 07-17 Layer Separation Deep Learning Model with Auxiliary Variables for Partial Differential Equations Ebenentrennung Deep Learning Modell mit Hilfsvariablen für partielle Differentialgleichungen 图层分离深学习模型,带有局部差异等量的辅助变量 2507.12766v1 -
1246 07-17 Golden Noise for Diffusion Models: A Learning Framework Goldene Geräusche für Diffusionsmodelle: Ein Lernrahmen 传播模型的黄金噪音:学习框架 2411.09502v5 -
1247 07-17 TBDetector:Transformer-Based Detector for Advanced Persistent Threats with Provenance Graph TBDetector:Transformer-basierter Detektor für erweiterte persistente Bedrohungen mit Provenienzgraph TB 检测器:用证明图测出先进持久性威胁的转移前检测器 2304.02838v2 -
1248 07-17 World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving Weltmodellbasierte End-to-End-Szenengenerierung für Unfallvorhersage im autonomen Fahren 以世界模式为基础的在自主驾驶中事故预防端至终点到终点示范景点一代 2507.12762v1 -
1249 07-17 Faster and Space Efficient Indexing for Locality Sensitive Hashing Schnellere und raumsparende Indexierung für Lokalitätssensitive Hashing 地方敏感散列更快和空间高效索引编制 2503.06737v2 -
1250 07-17 A Comprehensive Survey of Synthetic Tabular Data Generation Eine umfassende Übersicht über die Erstellung von synthetischen Tabellendaten 合成图表数据生成综合调查 2504.16506v3 -
1251 07-17 Domain-Enhanced Dual-Branch Model for Efficient and Interpretable Accident Anticipation Domain-Enhanced Dual-Branch-Modell für effiziente und interpretierbare Unfallvorhersage 高效和可解释的意外事故预测的强化双重-双重-双重强化模式 2507.12755v1 -
1252 07-17 Multimodal-Guided Dynamic Dataset Pruning for Robust and Efficient Data-Centric Learning Multimodal geführtes dynamisches Datenset Pruning für robustes und effizientes datenzentrales Lernen 灵活、高效、高效的数据中心学习的多式指导动态数据集 2507.12750v1 -
1253 07-17 Learning Universal Human Mobility Patterns with a Foundation Model for Cross-domain Data Fusion Lernen von universellen Mobilitätsmustern mit einem Basismodell für die domänenübergreifende Datenfusion 具有跨领域数据融合基础模型的学习通用人类流动模式 2503.15779v2 -
1254 07-17 How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction Wie wirkt sich Beschriftungsfehler auf das kontrasive Lernen aus? Eine Perspektive aus der Datendimensionalitätsreduktion 标签错误影响差异影响学习如何进行? 减少数据多维度的视角 2507.11161v2 -
1255 07-17 Enhancing Quantization-Aware Training on Edge Devices via Relative Entropy Coreset Selection and Cascaded Layer Correction Verbesserung der Quantization-Aware-Schulung auf Edge-Geräten durch relative Entropie-Coreset-Auswahl und kaskaded Layer-Korrektur 通过相对内心核心选择和层层层校正,加强边缘设备量化-软件培训 2507.17768v1 -
1256 07-17 Unifying Explainable Anomaly Detection and Root Cause Analysis in Dynamical Systems Vereinheitlichung der erklärbaren Anomalienerkennung und der Ursachenanalyse in dynamischen Systemen 动态系统中不可解释的异常探测和根本原因分析 2502.12086v3 -
1257 07-17 Multi-View Node Pruning for Accurate Graph Representation Multi-View-Knotenschnitt für eine exakte Graphendarstellung 多查看节点 精确图表代表 2503.11737v4 -
1258 07-17 Scaling Trends for Data Poisoning in LLMs Skalierungstrends für Datenvergiftungen in LLMs LLMM中数据中毒趋势的扩大趋势 2408.02946v6 -
1259 07-17 From SGD to Spectra: A Theory of Neural Network Weight Dynamics Von SGD zu Spectra: Eine Theorie der neuralen Netzwerkgewichtsdynamik 从SGD到Spectra:神经网络强度动态理论 2507.12709v1 -
1260 07-17 PinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform PinFM: Gründungsmodell für Benutzeraktivität Sequenzen auf einer Visual Discovery Platform im Milliardenmaßstab PinFM:十亿规模视觉发现平台用户活动序列基础模型 2507.12704v1
Article 0
Title@2025-07-24 (4): Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift
Title: Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift | Pseudo-Labeling für Kernel Ridge Regression unter Kovariate Shift | 共变移下内核循环脊回归的优多环流 2302.10160v4 |
Authors (1): Kaizheng Wang
We develop and analyze a principled approach to kernel ridge regression under covariate shift. The goal is to learn a regression function with small mean squared error over a target distribution, based on unlabeled data from there and labeled data that may have a different feature distribution. We propose to split the labeled data into two subsets, and conduct kernel ridge regression on them separately to obtain a collection of candidate models and an imputation model. We use the latter to fill the missing labels and then select the best candidate accordingly. Our non-asymptotic excess risk bounds demonstrate that our estimator adapts effectively to both the structure of the target distribution and the covariate shift. This adaptation is quantified through a notion of effective sample size that reflects the value of labeled source data for the target regression task. Our estimator achieves the minimax optimal error rate up to a polylogarithmic factor, and we find that using pseudo-labels for model selection does not significantly hinder performance.
我们开发并分析一种原则性的方法, 在共变式转换下对内核脊回归进行原则性分析。 目标是根据未贴标签的数据和标签数据, 以不同特性分布为基础, 在目标分布上学习一个有小平均值正方差的回归函数。 我们提议将标签数据分成两个子集, 并在它们上分别进行内核脊回归, 以获得候选模型和估算模型的集合。 我们用后者填充缺失的标签, 然后相应选择最佳的候选。 我们的非防腐性超重风险框表明, 我们的测算器能够有效地适应目标分布结构和共变换转移。 这种调整是通过反映目标回归任务中标签源数据值的有效样本规模概念加以量化的。 我们的测算器将最小最大误率提高到一个多logricy系数, 我们发现, 模型选择中使用伪标签不会严重妨碍性能。
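The split-impute-select procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the RBF kernel, the `gamma` bandwidth, the grid `lams`, the imputation penalty `lam_impute`, and all function names are assumptions introduced here.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise squared Euclidean distances between the rows of a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def fit_krr(x, y, lam, gamma=1.0):
    # Kernel ridge regression: solve (K + lam * n * I) alpha = y.
    n = len(x)
    k = rbf_kernel(x, x, gamma)
    alpha = np.linalg.solve(k + lam * n * np.eye(n), y)
    return lambda x_new: rbf_kernel(x_new, x, gamma) @ alpha

def select_by_pseudo_labels(x_src, y_src, x_tgt, lams, lam_impute, seed=0):
    # Split the labeled source data into two halves.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x_src))
    half = len(x_src) // 2
    first, second = idx[:half], idx[half:]
    # First half: one candidate model per ridge penalty (assumed grid).
    candidates = [fit_krr(x_src[first], y_src[first], lam) for lam in lams]
    # Second half: imputation model that fills in the missing target labels.
    imputer = fit_krr(x_src[second], y_src[second], lam_impute)
    pseudo = imputer(x_tgt)
    # Pick the candidate with the smallest error against the pseudo-labels.
    errors = [float(np.mean((f(x_tgt) - pseudo) ** 2)) for f in candidates]
    best = int(np.argmin(errors))
    return candidates[best], lams[best], errors

# Toy covariate shift: source and target features have different distributions.
rng = np.random.default_rng(0)
x_src = rng.normal(0.0, 1.0, size=(80, 1))
y_src = np.sin(2 * x_src[:, 0]) + 0.1 * rng.normal(size=80)
x_tgt = rng.normal(1.5, 0.5, size=(50, 1))
model, chosen_lam, errors = select_by_pseudo_labels(
    x_src, y_src, x_tgt, lams=[1e-3, 1e-1, 10.0], lam_impute=1e-4
)
print("selected penalty:", chosen_lam)
```

Because the selection error is measured on target features only, the chosen penalty reflects fit quality where the target distribution actually places its mass, which is the point of using pseudo-labels here.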
Article 1
Title@2025-07-24 (4): SIDA: Synthetic Image Driven Zero-shot Domain Adaptation
Title: SIDA: Synthetic Image Driven Zero-shot Domain Adaptation | SIDA: Synthetisches Bild angetrieben Null-Schuss Domain-Anpassung | SIDA: 合成图像驱动器零弹射域适应 2507.18632v1 |
Authors (5): Ye-Chan Kim, SeungJu Cha, Si-Woo Kim, Taewhan Kim, Dong-Jin Kim
Zero-shot domain adaptation is a method for adapting a model to a target domain without utilizing target domain image data. To enable adaptation without target images, existing studies utilize CLIP’s embedding space and text description to simulate target-like style features. Despite the previous achievements in zero-shot domain adaptation, we observe that these text-driven methods struggle to capture complex real-world variations and significantly increase adaptation time due to their alignment process. Instead of relying on text descriptions, we explore solutions leveraging image data, which provides diverse and more fine-grained style cues. In this work, we propose SIDA, a novel and efficient zero-shot domain adaptation method leveraging synthetic images. To generate synthetic images, we first create detailed, source-like images and apply image translation to reflect the style of the target domain. We then utilize the style features of these synthetic images as a proxy for the target domain. Based on these features, we introduce Domain Mix and Patch Style Transfer modules, which enable effective modeling of real-world variations. In particular, Domain Mix blends multiple styles to expand the intra-domain representations, and Patch Style Transfer assigns different styles to individual patches. We demonstrate the effectiveness of our method by showing state-of-the-art performance in diverse zero-shot adaptation scenarios, particularly in challenging domains. Moreover, our approach achieves high efficiency by significantly reducing the overall adaptation time.
零点域适应是一种在不使用目标域图像数据的情况下使模型适应目标域的方法。 为了能够不使用目标域图像进行适应, 现有研究使用CLIP的嵌入空间和文本描述来模拟类似目标的样式特性。 尽管先前在零点域适应方面已经取得了成就, 我们观察到, 这些文本驱动的方法在捕捉复杂的真实世界变异, 并且由于它们的校正过程而大大增加了适应时间。 我们不依靠文字描述, 我们探索利用图像数据提供不同和更细微样式提示的解决方案。 在这项工作中, 我们提议SIDA, 一种创新和高效零点域适应方法来利用合成图像。 为了生成合成图像, 我们首先创建详细、 源式图像, 并应用图像翻译来反映目标域的风格。 我们随后利用这些合成图像的样式特征来捕捉复杂的真实世界变异, 我们引入了Domain Mix 和 补式样式转换模块, 从而能够有效地模拟真实世界变异。 特别是, Domaine Mix 混合多种样式来扩大内部的显示内部时段显示, 缩样式转换方式转换方式转换方式在目标域中显示我们高度的升级的功能。
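The two modules named in the abstract, Domain Mix and Patch Style Transfer, can be illustrated with a small sketch. The abstract does not say how style is represented, so this example assumes per-channel feature mean and standard deviation (AdaIN-style statistics) and random convex blending weights; every function name below is hypothetical.

```python
import numpy as np

def style_stats(feat):
    # feat has shape (C, H, W); return per-channel mean and std.
    mu = feat.mean(axis=(1, 2))
    sigma = feat.std(axis=(1, 2)) + 1e-6
    return mu, sigma

def domain_mix(styles, rng):
    # Blend several style statistics with random convex weights (assumed scheme).
    w = rng.dirichlet(np.ones(len(styles)))
    mu = sum(wi * m for wi, (m, _) in zip(w, styles))
    sigma = sum(wi * s for wi, (_, s) in zip(w, styles))
    return mu, sigma

def apply_style(feat, style):
    # Re-normalize feat so its channel statistics match the given style.
    mu_s, sigma_s = style
    mu, sigma = style_stats(feat)
    normed = (feat - mu[:, None, None]) / sigma[:, None, None]
    return normed * sigma_s[:, None, None] + mu_s[:, None, None]

def patch_style_transfer(feat, styles, patch, rng):
    # Give every spatial patch its own freshly mixed style.
    out = np.empty_like(feat)
    _, h, w = feat.shape
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            block = feat[:, top:top + patch, left:left + patch]
            mixed = domain_mix(styles, rng)
            out[:, top:top + patch, left:left + patch] = apply_style(block, mixed)
    return out

rng = np.random.default_rng(0)
source_feat = rng.normal(size=(3, 8, 8))
# Stand-ins for style statistics extracted from synthetic target-like images.
synthetic_styles = [style_stats(rng.normal(loc=m, size=(3, 8, 8))) for m in (1.0, 2.0)]
stylized = patch_style_transfer(source_feat, synthetic_styles, patch=4, rng=rng)
print(stylized.shape)
```

In this sketch, blending widens the range of styles seen within one domain, while per-patch assignment lets a single image carry several of them.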
Article 2
Title@2025-07-24 (4): Gait Recognition Based on Tiny ML and IMU Sensors
Title: Gait Recognition Based on Tiny ML and IMU Sensors | Gait-Erkennung basierend auf winzigen ML- und IMU-Sensoren | 基于小ML和IMU传感器的Gait识别 2507.18627v1 |
Authors (3): Jiahang Zhang, Mingtong Chen, Zhengbao Yang
This project presents the development of a gait recognition system using Tiny Machine Learning (Tiny ML) and Inertial Measurement Unit (IMU) sensors. The system leverages the XIAO-nRF52840 Sense microcontroller and the LSM6DS3 IMU sensor to capture motion data, including acceleration and angular velocity, from four distinct activities: walking, stationary, going upstairs, and going downstairs. The data collected is processed through Edge Impulse, an edge AI platform, which enables the training of machine learning models that can be deployed directly onto the microcontroller for real-time activity classification. The data preprocessing step involves extracting relevant features from the raw sensor data using techniques such as sliding windows and data normalization, followed by training a Deep Neural Network (DNN) classifier for activity recognition. The model achieves over 80% accuracy on a test dataset, demonstrating its ability to classify the four activities effectively. Additionally, the platform enables anomaly detection, further enhancing the robustness of the system. The integration of Tiny ML ensures low-power operation, making it suitable for battery-powered or energy-harvesting devices.
该项目利用小型机器学习(Tiny ML)和惯性测量单位(IMU)传感器开发了运动识别系统,该系统利用XIAO-nRF52840Sense微控制器和LSM6DS3IMU传感器,从四个不同的活动中获取运动数据,包括加速和角速度,这四个不同的活动是行走、固定、上楼和下楼。收集的数据是通过边缘AI平台Edge Impulse处理的,该平台能够对机器学习模型进行培训,这些模型可以直接部署到微控制器,用于实时活动分类。数据预处理步骤包括利用滑动窗口和数据正常化等技术从原始传感器数据中提取相关特征,随后培训深神经网络分类,以确认活动。该模型在测试数据集上达到80%的准确度,表明其有效分类四项活动的能力。此外,该平台能够发现异常现象,进一步提高系统强度。小ML的整合确保低功率操作,使其适合电池动力或能源采集装置。
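The preprocessing step mentioned in the abstract (normalization followed by sliding windows) can be sketched in plain Python. The window length, step, z-score normalization, and the mean/std features are illustrative assumptions; the abstract does not state the settings used in Edge Impulse.

```python
from statistics import mean, pstdev

def normalize(channel):
    # Z-score one sensor channel; a constant channel maps to all zeros.
    mu = mean(channel)
    sd = pstdev(channel) or 1.0
    return [(v - mu) / sd for v in channel]

def sliding_windows(samples, size, step):
    # Overlapping windows of `size` samples, advancing by `step`.
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]

def window_features(window):
    # Per-axis mean and standard deviation as a simple feature vector.
    feats = []
    for axis in zip(*window):
        feats.extend([mean(axis), pstdev(axis)])
    return feats

# Hypothetical recording: rows of (ax, ay, az) accelerometer readings.
recording = [(0.1 * t, 1.0, 9.8 - 0.05 * t) for t in range(20)]
channels = [normalize(list(c)) for c in zip(*recording)]
normalized = list(zip(*channels))
features = [window_features(w) for w in sliding_windows(normalized, size=8, step=4)]
print(len(features), len(features[0]))
```

Each feature vector would then be one training example for the activity classifier.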
Article 3
Title@2025-07-24 (4): Moving Out: Physically-grounded Human-AI Collaboration
Title: Moving Out: Physically-grounded Human-AI Collaboration | Ausstieg: physikalisch begründete Mensch-AI-Kollaboration | 搬出:基于身体的人类 – – AI协作 2507.18623v1 |
Authors (5): Xuhui Kang, Sung-Wook Lee, Haolin Liu, Yuyan Wang, Yen-Ling Kuo
The ability to adapt to physical actions and constraints in an environment is crucial for embodied agents (e.g., robots) to effectively collaborate with humans. Such physically grounded human-AI collaboration must account for the increased complexity of the continuous state-action space and constrained dynamics caused by physical constraints. In this paper, we introduce Moving Out, a new human-AI collaboration benchmark that resembles a wide range of collaboration modes affected by physical attributes and constraints, such as moving heavy items together and maintaining consistent actions to move a big item around a corner. Using Moving Out, we designed two tasks and collected human-human interaction data to evaluate models’ abilities to adapt to diverse human behaviors and unseen physical attributes. To address the challenges in physical environments, we propose a novel method, BASS (Behavior Augmentation, Simulation, and Selection), to enhance the diversity of agents and their understanding of the outcome of actions. Our experiments show that BASS outperforms state-of-the-art models in AI-AI and human-AI collaboration. The project page is available at https://live-robotics-uva.github.io/movingout_ai/.
适应环境中的物理动作和限制的能力,对于内装物剂(如机器人)有效与人类合作至关重要。这种基于物理的人类-AI合作必须说明持续的国家-行动空间和因物理限制造成的受限动态的日益复杂性。在本文中,我们引入了一个新的人类-AI合作基准,类似于受物理属性和限制影响的广泛合作模式,例如将重物品一起移动,并保持一致的行动,以移动一个大项目。我们利用“搬出去”设计了两项任务,并收集了人与人的互动数据,以评价模型适应不同人类行为和看不见的物理特征的能力。为了应对物理环境中的挑战,我们提出了一种新颖的方法,即“BASS”(Behavior 增强、模拟和选择),以加强代理人的多样性和他们对行动结果的理解。我们的实验显示,BASS在AI-AI和人类-AI的协作中超越了“艺术”的状态模式。项目页面可在以下网址查阅:https://live-robotics-uva.github.io/movingout_ai/。
Article 4
Title@2025-07-24 (4): Diffusion Beats Autoregressive in Data-Constrained Settings
Title: Diffusion Beats Autoregressive in Data-Constrained Settings | Diffusion schlägt Autoregressive in datenbeschränkten Einstellungen | 在受数据约束的设置中自动递减 2507.15857v2 |
Authors (5): Mihir Prabhudesai, Mengning Wu, Amir Zadeh, Katerina Fragkiadaki, Deepak Pathak
Autoregressive (AR) models have long dominated the landscape of large language models, driving progress across a wide range of tasks. Recently, diffusion-based language models have emerged as a promising alternative, though their advantages over AR models remain underexplored. In this paper, we systematically study masked diffusion models in data-constrained settings, where training involves repeated passes over limited data, and find that they significantly outperform AR models when compute is abundant but data is scarce. Diffusion models make better use of repeated data, achieving lower validation loss and superior downstream performance. We interpret this advantage as implicit data augmentation: masked diffusion exposes the model to a diverse distribution of token orderings and prediction tasks, unlike AR’s fixed left-to-right factorization. We find new scaling laws for diffusion models and derive a closed-form expression for the critical compute threshold at which diffusion begins to outperform AR. These results suggest that when data, not compute, is the bottleneck, diffusion models offer a compelling alternative to the standard AR paradigm. Our code is available at: https://diffusion-scaling.github.io.
长期以来,自动递减(AR)模型在大型语言模型的景观中占据了主导地位,推动了一系列广泛任务的进展。最近,基于扩散的语言模型作为一种有希望的替代模式出现,尽管它们相对于AR模型的优势仍未得到充分探讨。在本文件中,我们系统地研究数据封闭环境中的蒙面扩散模型,培训涉及对有限数据的反复传递,发现在计算数据时,这些模型明显优于AR模型,但在计算数据丰富但数据稀少。扩散模型更好地利用了重复的数据,实现了较低的验证损失和更高的下游性能。我们将此优势解释为隐含的数据增强:遮蔽的传播将模型暴露在代号订单和预测任务的不同分布上,与AR的固定左对右系数化不同。我们为扩散模型找到新的缩放法,并为关键折算阈值的表达方式,而扩散开始超过AR。这些结果显示,当数据(不是编译的)是瓶颈时,扩散模型为标准的AR模型提供了令人信服的替代方法。我们的代码可以在 https://difcilting-scalinging.github.io.
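As a rough illustration of the "implicit data augmentation" reading in the abstract above, the sketch below contrasts the single left-to-right factorization an AR model trains on with the many masked prediction tasks that a masked diffusion objective can draw from the same sequence. The function names, the `[MASK]` token, and the uniform sampling of the masking rate are illustrative assumptions, not the authors' code.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, chosen for illustration

def ar_examples(tokens):
    """Left-to-right factorization: one fixed set of (prefix, next-token) pairs."""
    return [(tuple(tokens[:i]), tokens[i]) for i in range(1, len(tokens))]

def masked_example(tokens, rng):
    """One masked-diffusion-style example: sample a masking rate, mask each
    position independently at that rate, and predict the masked tokens."""
    rate = rng.uniform(0.0, 1.0)
    masked = [i for i in range(len(tokens)) if rng.random() < rate]
    if not masked:  # always keep at least one prediction target
        masked = [rng.randrange(len(tokens))]
    corrupted = [MASK if i in masked else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in masked}
    return tuple(corrupted), targets

tokens = ["the", "cat", "sat", "on", "the", "mat"]
rng = random.Random(0)

# AR sees the same contexts on every pass over the data.
ar_views = set(ar_examples(tokens))
# Masked diffusion sees a fresh corruption pattern on each pass.
diffusion_views = {masked_example(tokens, rng)[0] for _ in range(200)}

print(len(ar_views), "AR contexts (identical in every epoch)")
print(len(diffusion_views), "distinct masked contexts in 200 draws")
```

Repeating the data therefore shows an AR model the same prediction problems again, while the masked objective keeps producing new ones from the same tokens, which is one way to read the reported advantage under repeated passes.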
Article 5
Title@2025-07-24 (4): TRPrompt: Bootstrapping Query-Aware Prompt Optimization from Textual Rewards
Title: TRPrompt: Bootstrapping Query-Aware Prompt Optimization from Textual Rewards | TRPrompt: Bootstrapping Query-Aware Prompt Optimierung von Textbelohnungen | TRPropt: 从文本奖励中促进解答询问软件快速优化 2507.18618v1 |
Authors (5): Andreea Nica, Ivan Zakazov, Nicolas Mario Baldwin, Saibo Geng, Robert West
Prompt optimization improves the reasoning abilities of large language models (LLMs) without requiring parameter updates to the target model. Following heuristic-based “Think step by step” approaches, the field has evolved in two main directions: while one group of methods uses textual feedback to elicit improved prompts from general-purpose LLMs in a training-free way, a concurrent line of research relies on numerical rewards to train a special prompt model, tailored for providing optimal prompts to the target model. In this paper, we introduce the Textual Reward Prompt framework (TRPrompt), which unifies these approaches by directly incorporating textual feedback into training of the prompt model. Our framework does not require prior dataset collection and is being iteratively improved with the feedback on the generated prompts. When coupled with the capacity of an LLM to internalize the notion of what a “good” prompt is, the high-resolution signal provided by the textual rewards allows us to train a prompt model yielding state-of-the-art query-specific prompts for the problems from the challenging math datasets GSMHard and MATH.
快速优化可以提高大型语言模型(LLMS)的推理能力,而不需要更新目标模型的参数。根据基于超常的“逐步思考”方法,该字段在两个主要方向上演进:一组方法使用文字反馈,以无培训的方式从普通用途LMS中获取更好的提示,同时进行一系列研究,依靠数字奖励来培训特别快速模型,专门为目标模型提供最佳提示。在本文中,我们引入了文本快速框架(TRPrompt),该框架通过直接将文字反馈纳入快速模型的培训而统一了这些方法。我们的框架不需要先前的数据集收集,并且正在随着对生成的提示的反馈的迭代改进。当LLMM有能力将什么是“好的”概念内部化时,文本奖励提供的高分辨率信号使我们能够训练一个迅速的模型,生成具有挑战性的数学数据集 GSMHard 和 MATH 所出现的问题的最新查询提示。
Article 6
Title@2025-07-24 (4): SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning
Title: SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning | SynC: Synthetische Bildunterschrift Datensatzverfeinerung mit ein-zu-vielen Mapping für Zero-shot Bildunterschrift | 合成图像说明: 合成图像说明数据集精化,用一到多个绘图进行零光图像说明的合成图像说明 2507.18616v1 |
Authors (6): Si-Woo Kim, MinJu Jeon, Ye-Chan Kim, Soeun Lee, Taewhan Kim, Dong-Jin Kim
Zero-shot Image Captioning (ZIC) increasingly utilizes synthetic datasets generated by text-to-image (T2I) models to mitigate the need for costly manual annotation. However, these T2I models often produce images that exhibit semantic misalignments with their corresponding input captions (e.g., missing objects, incorrect attributes), resulting in noisy synthetic image-caption pairs that can hinder model training. Existing dataset pruning techniques are largely designed for removing noisy text in web-crawled data. However, these methods are ill-suited for the distinct challenges of synthetic data, where captions are typically well-formed, but images may be inaccurate representations. To address this gap, we introduce SynC, a novel framework specifically designed to refine synthetic image-caption datasets for ZIC. Instead of conventional filtering or regeneration, SynC focuses on reassigning captions to the most semantically aligned images already present within the synthetic image pool. Our approach employs a one-to-many mapping strategy by initially retrieving multiple relevant candidate images for each caption. We then apply a cycle-consistency-inspired alignment scorer that selects the best image by verifying its ability to retrieve the original caption via image-to-text retrieval. Extensive evaluations demonstrate that SynC consistently and significantly improves performance across various ZIC models on standard benchmarks (MS-COCO, Flickr30k, NoCaps), achieving state-of-the-art results in several scenarios. SynC offers an effective strategy for curating refined synthetic data to enhance ZIC.
零点图像显示( ZIC ) 越来越多地使用文本到图像模型( T2I ) 生成的合成数据集, 以减轻成本高昂的人工批注需求。 然而, 这些 T2I 模型往往生成显示语义不匹配的图像, 其相应的输入标题( 例如, 缺失的天体, 不正确的属性) 导致合成图像显示对配对噪音, 从而阻碍模式培训。 现有的数据集调整技术主要设计用于消除网络模拟数据( T2I ) 模型生成的噪音文字。 然而, 这些方法不适合合成数据的独特挑战, 其中标题通常结构完善, 但图像的表达可能不准确。 为了缩小这一差距,我们引入了 SynC , 这是一个专门为 ZIC 改进合成图像显示数据集配置的新型框架。 SynC 重点不是常规的过滤或再生,而是为合成图像库中已经存在的最具有语义一致性的图像重新配置字幕( 我们的方法是先重新定位多个相关候选人图像的配置战略 ) , 将一个状态到一个状态制图战略 , 初步检索多个相关的图像分析候选人 , 将每个图像浏览周期的系统进行不断校正化的校正化的校正校正 。
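The reassignment idea described above can be sketched with toy embeddings: retrieve several candidate images per caption (one-to-many), then keep the candidate whose image-to-text retrieval ranks the original caption best. The vectors, the rank-based scoring rule, and the function names are assumptions made for illustration; the abstract does not specify the exact scorer.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query, pool, k):
    """Indices of the k pool vectors most similar to the query."""
    order = sorted(range(len(pool)), key=lambda i: cosine(query, pool[i]), reverse=True)
    return order[:k]

def reassign(caption_vecs, image_vecs, k_images=2):
    """For each caption, choose the candidate image whose image-to-text
    retrieval ranks that caption highest; ties go to higher text-to-image similarity."""
    assignment = {}
    for c, cvec in enumerate(caption_vecs):
        candidates = top_k(cvec, image_vecs, k_images)  # one-to-many retrieval

        def score(img):
            ranking = top_k(image_vecs[img], caption_vecs, len(caption_vecs))
            return (-ranking.index(c), cosine(cvec, image_vecs[img]))

        assignment[c] = max(candidates, key=score)
    return assignment

# Toy data: image 0 was "generated" from caption 0 but actually depicts caption 1.
captions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
images = [(0.2, 0.9, 0.1), (0.9, 0.1, 0.0), (0.0, 0.05, 0.95)]

assignment = reassign(captions, images)
print(assignment)
```

In this toy pool, caption 0 is moved away from its misaligned generated image and paired with image 1, while image 0 is reused for caption 1, so no image has to be regenerated or discarded.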
Article 7
Title@2025-07-24 (4): BEARCUBS: A benchmark for computer-using web agents
Title: BEARCUBS: A benchmark for computer-using web agents | BEARCUBS: Benchmark für computergestützte Web-Agenten | BEARCUBS:计算机使用网络代理器的基准 2503.07919v3 |
Authors (6): Yixiao Song, Katherine Thai, Chau Minh Pham, Yapei Chang, Mazin Nadaf, Mohit Iyyer
Modern web agents possess computer use abilities that allow them to interact with webpages by sending commands to a virtual keyboard and mouse. While such agents have considerable potential to assist human users with complex tasks, evaluating their capabilities in real-world settings poses a major challenge. To this end, we introduce BEARCUBS, a “small but mighty” benchmark of 111 information-seeking questions designed to evaluate a web agent’s ability to search, browse, and identify factual information from the web. Unlike prior web agent benchmarks, solving BEARCUBS requires (1) accessing live web content rather than synthetic or simulated pages, which captures the unpredictability of real-world web interactions; and (2) performing a broad range of multimodal interactions (e.g., video understanding, 3D navigation) that cannot be bypassed via text-based workarounds. Each question in BEARCUBS has a corresponding short, unambiguous answer and a human-validated browsing trajectory, allowing for transparent evaluation of agent performance and strategies. A human study confirms that BEARCUBS questions are solvable but non-trivial (84.7% human accuracy), revealing domain knowledge gaps and overlooked details as common failure points. We find that ChatGPT Agent significantly outperforms other computer-using agents with an overall accuracy of 65.8% (compared to e.g., Operator’s 23.4%), showcasing substantial progress in tasks involving real computer use, such as playing web games and navigating 3D environments. Nevertheless, closing the gap to human performance requires improvements in areas like fine control, complex data filtering, and execution speed. To facilitate future research, BEARCUBS will be updated periodically to replace invalid or contaminated questions, keeping the benchmark fresh for future generations of web agents.
现代网络代理拥有计算机使用能力,使其能够通过向虚拟键盘和鼠标发送指令与网页互动。 虽然这些代理具有巨大的潜力协助人类用户完成复杂任务, 但评估其在现实世界环境中的能力是一项重大挑战。 为此,我们引入了BEARCUBS, 这是一个“小型但强大的”基准, 共111个信息搜索问题, 旨在评估网络代理商搜索、浏览和识别网络上的事实信息的能力。 与先前的网络代理商基准不同, 解决 BEARCUBS 需要 (1) 访问现场网络内容而不是合成或模拟网页, 从而捕捉到真实世界网络互动的不可预测性; 以及 (2) 开展广泛的多式联运互动( 如视频理解、 3D导航) , 这是一项无法通过基于文本的工作周期绕开的。 BEARCUBS 的每一个问题都有相应的短、 明确的答复和有人类价值的浏览路径的浏览轨迹, 能够透明地评估代理商的绩效和战略。 一项人类研究证实 BECURBS 的改进问题是可避免的, , 但是, 在未来运行过程中, 需要大量使用网络代理商 。
Article 8
Title@2025-07-24 (4): Explainable Mapper: Charting LLM Embedding Spaces Using Perturbation-Based Explanation and Verification Agents
Title: Explainable Mapper: Charting LLM Embedding Spaces Using Perturbation-Based Explanation and Verification Agents | Erklärbarer Mapper: LLM-Embedding-Räume mit Perturbation-basierten Erklärungs- und Verifikations-Agenten kartographieren | 可解释的成像仪:利用以扰动为基础的解释和核查仪器绘制LLM内嵌空间图 2507.18607v1 |
Authors (5): Xinyuan Yan, Rita Sevastjanova, Sinie van der Ben, Mennatallah El-Assady, Bei Wang
Large language models (LLMs) produce high-dimensional embeddings that capture rich semantic and syntactic relationships between words, sentences, and concepts. Investigating the topological structures of LLM embedding spaces via mapper graphs enables us to understand their underlying structures. Specifically, a mapper graph summarizes the topological structure of the embedding space, where each node represents a topological neighborhood (containing a cluster of embeddings), and an edge connects two nodes if their corresponding neighborhoods overlap. However, manually exploring these embedding spaces to uncover encoded linguistic properties requires considerable human effort. To address this challenge, we introduce a framework for semi-automatic annotation of these embedding properties. To organize the exploration process, we first define a taxonomy of explorable elements within a mapper graph such as nodes, edges, paths, components, and trajectories. The annotation of these elements is executed through two types of customizable LLM-based agents that employ perturbation techniques for scalable and automated analysis. These agents help to explore and explain the characteristics of mapper elements and verify the robustness of the generated explanations. We instantiate the framework within a visual analytics workspace and demonstrate its effectiveness through case studies. In particular, we replicate findings from prior research on BERT’s embedding properties across various layers of its architecture and provide further observations into the linguistic properties of topological neighborhoods.
大型语言模型( LLMS) 产生高维嵌入, 捕捉文字、 句子和概念之间丰富的语义和合成关系。 通过映射图形调查 LLM 嵌入空间的地形结构, 使我们能够理解其基础结构。 具体地说, 映像图概述了嵌入空间的地形结构, 每个节点代表着一个地形相邻( 包含嵌入群集) , 如果相应的邻里相重叠, 边际连接两个节点。 但是, 人工探索这些嵌入空间以发现编码的语言特性需要大量人力努力。 为了应对这一挑战, 我们引入了一个框架, 用于这些嵌入属性的半自动注释。 为了组织勘探进程, 我们首先定义了嵌入空间空间空间空间空间空间空间空间的元素的分类结构, 并验证了这些元素在地图、 边缘、 路径、 组成部分和轨迹图中的位置。 通过两种可定制的 LLMM 代理器, 使用渗透和自动分析技术进一步探索和解释。 这些代理器有助于探索和解释这些映射方元素元素的属性的特性特性特征, 我们通过直观的剖析图式的构造, 展示了其先前的构造研究, 展示了它生成的构造, 并验证了它生成了它的直观的构造。
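The mapper graph definition given above (nodes are clusters inside overlapping neighborhoods, edges join nodes whose neighborhoods share points) can be sketched on a one-dimensional toy "embedding". The cover intervals, the gap-based clustering, and all names below are simplifying assumptions for illustration, not the framework's implementation.

```python
from itertools import combinations

def cluster_1d(indices, values, eps):
    """Single-linkage clustering on a line: split wherever the gap exceeds eps."""
    ordered = sorted(indices, key=lambda i: values[i])
    clusters, current = [], []
    for i in ordered:
        if current and values[i] - values[current[-1]] > eps:
            clusters.append(set(current))
            current = []
        current.append(i)
    if current:
        clusters.append(set(current))
    return clusters

def mapper_graph(values, intervals, eps):
    """Build mapper nodes (clusters within each cover interval) and edges
    (pairs of nodes that share at least one point)."""
    nodes = []
    for lo, hi in intervals:
        members = [i for i, v in enumerate(values) if lo <= v <= hi]
        nodes.extend(cluster_1d(members, values, eps))
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]]
    return nodes, edges

# Toy filter values for eight embeddings and an overlapping cover of the range.
values = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1, 3.0, 3.1]
intervals = [(0.0, 1.0), (0.8, 2.0), (1.8, 3.5)]

nodes, edges = mapper_graph(values, intervals, eps=0.3)
print(nodes)
print(edges)
```

Here the points near 0.9 to 1.0 fall into two cover intervals, so the two nodes containing them are linked by an edge; the remaining nodes stay isolated. Elements such as paths and components in the taxonomy are then ordinary graph structures over these nodes and edges.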
Article 9
Title@2025-07-24 (4): Hybrid quantum-classical algorithm for near-optimal planning in POMDPs
Title: Hybrid quantum-classical algorithm for near-optimal planning in POMDPs | Hybrider quantenklassischer Algorithmus zur nahezu optimalen Planung in POMDPs | POMDPs中接近最佳规划的混合量子-古典量子算法 2507.18606v1 |
Authors (5): Gilberto Cunha, Alexandra Ramôa, André Sequeira, Michael de Oliveira, Luís Barbosa
Reinforcement learning (RL) provides a principled framework for decision-making in partially observable environments, which can be modeled as Markov decision processes and compactly represented through dynamic decision Bayesian networks. Recent advances demonstrate that inference on sparse Bayesian networks can be accelerated using quantum rejection sampling combined with amplitude amplification, leading to a computational speedup in estimating acceptance probabilities. Building on this result, we introduce Quantum Bayesian Reinforcement Learning (QBRL), a hybrid quantum-classical look-ahead algorithm for model-based RL in partially observable environments. We present a rigorous, oracle-free time complexity analysis under fault-tolerant assumptions for the quantum device. Unlike standard treatments that assume a black-box oracle, we explicitly specify the inference process, allowing our bounds to more accurately reflect the true computational cost. We show that, for environments whose dynamics form a sparse Bayesian network, horizon-based near-optimal planning can be achieved sub-quadratically faster through quantum-enhanced belief updates. Furthermore, we present numerical experiments benchmarking QBRL against its classical counterpart on simple yet illustrative decision-making tasks. Our results offer a detailed analysis of how the quantum computational advantage translates into decision-making performance, highlighting that the magnitude of the advantage can vary significantly across different deployment settings.
加强学习(RL)为部分可观测环境中的决策提供了一个原则性框架,可以仿照Markov决策过程,并通过动态决策贝叶西亚网络进行精密代表。最近的进展表明,利用量子拒绝抽样,加上振幅放大,可以加快稀有的巴伊西亚网络的推论速度,从而在估计接受概率方面实现计算加速。\ 以这一结果为基础,我们引入了量子贝伊西亚强化学习(QBRL),这是在部分可观测环境中基于模型的RL的混合量子级经典外观算法。我们根据量子装置的容错假设,提出了严格、无孔、无时间复杂性分析。与假定黑盒或触雷的标准处理不同,我们明确规定了推论过程,使我们的界限能够更准确地反映真实的计算成本。我们表明,对于动态形成稀薄贝网络的环境,通过量子但又能见度的信念更新,可以更快地实现次赤道级超前的量级算算算算算法。此外,我们提出了一个数字性实验性实验,将简单的QBRL决定的精确度定位转化为模拟的模型分析,可以大大地将我们简单的计算分析用于简单的计算。
Article 10
Title@2025-07-24 (4): Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
Title: Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures | Beyond Euklid: Ein illustrierter Leitfaden zum modernen maschinellen Lernen mit geometrischen, topologischen und algebraischen Strukturen | 欧几里特以外:带有几何、地形学和代数结构的现代机器学习设计指南 2407.09468v2 |
Authors (11): Mathilde Papillon, Sophia Sanborn, Johan Mathe, Louisa Cornelis, Abby Bertics, Domas Buracas, Hansen J Lillemark, Christian Shewmake, Fatih Dinc, Xavier Pennec, Nina Miolane
The enduring legacy of Euclidean geometry underpins classical machine learning, which, for decades, has been primarily developed for data lying in Euclidean space. Yet, modern machine learning increasingly encounters richly structured data that is inherently non-Euclidean. This data can exhibit intricate geometric, topological and algebraic structure: from the geometry of the curvature of space-time, to topologically complex interactions between neurons in the brain, to the algebraic transformations describing symmetries of physical systems. Extracting knowledge from such non-Euclidean data necessitates a broader mathematical perspective. Echoing the 19th-century revolutions that gave rise to non-Euclidean geometry, an emerging line of research is redefining modern machine learning with non-Euclidean structures. Its goal: generalizing classical methods to unconventional data types with geometry, topology, and algebra. In this review, we provide an accessible gateway to this fast-growing field and propose a graphical taxonomy that integrates recent advances into an intuitive unified framework. We subsequently extract insights into current challenges and highlight exciting opportunities for future development in this field.
古典机器学习是古典机器学习的基础,数十年来,古典机器学习主要是为位于欧几里德空间的数据而开发的。然而,现代机器学习越来越多地遇到具有丰富结构、本质上非欧几里德的数据。这些数据可以展示复杂的几何、地貌学和代数结构:从时空曲线的几何学,到大脑神经神经元之间的地形复杂互动,到描述物理系统对称的代数变迁。从这种非欧几里德数据提取知识需要更广泛的数学视角。回馈19世纪的革命导致非欧几里德地理测量,正在形成的研究线是重新定义与非欧几里德结构的现代机器学习。其目标:将古典方法归纳为非常规数据类型,包括地貌学、表学和代数学。在本次审查中,我们为这个快速增长的字段提供了一个可进入的门户,并提出了将最近的进展纳入一个直观统一框架的图形化的分类学方法。我们随后为当前发展领域探索了各种机遇。
Article 11
Title@2025-07-24 (4): Demystify Protein Generation with Hierarchical Conditional Diffusion Models
Title: Demystify Protein Generation with Hierarchical Conditional Diffusion Models | Entmystifizieren Protein-Generation mit Hierarchische Bedingte Diffusion Modelle | 使用等级级有条件扩散模型解密蛋白一代 2507.18603v1 |
Authors (5): Zinan Ling, Yi Shi, Da Yan, Yang Zhou, Bo Hui
Generating novel and functional protein sequences is critical to a wide range of applications in biology. Recent advancements in conditional diffusion models have shown impressive empirical performance in protein generation tasks. However, reliable protein generation remains an open research question in de novo protein design, especially when it comes to conditional diffusion models. Considering that the biological function of a protein is determined by multi-level structures, we propose a novel multi-level conditional diffusion model that integrates both sequence-based and structure-based information for efficient end-to-end protein design guided by specified functions. By generating representations at different levels simultaneously, our framework can effectively model the inherent hierarchical relations between different levels, resulting in an informative and discriminative representation of the generated protein. We also propose Protein-MMD, a new reliable evaluation metric, to evaluate the quality of proteins generated with conditional diffusion models. Our new metric is able to capture both distributional and functional similarities between real and generated protein sequences while ensuring conditional consistency. We experiment with benchmark datasets, and the results on conditional protein generation tasks demonstrate the efficacy of the proposed generation framework and evaluation metric.
生成新型和功能性蛋白序列对于生物学的广泛应用至关重要。最近有条件传播模型的进步表明,在蛋白质生成任务方面的经验性表现令人印象深刻。然而,可靠的几代蛋白仍然是新蛋白设计中的一个开放研究问题,特别是在有条件传播模型方面。考虑到蛋白质的生物功能是由多层次结构决定的,我们提议了一个新型的多层次有条件传播模型,将基于序列和基于结构的信息结合起来,以特定功能为指导,用于高效端到端蛋白设计。通过在不同层次同时生成代表,我们的框架可以有效地模拟不同层次之间固有的等级关系,从而产生出产蛋白质的信息和歧视性代表。我们还提议了一个新的可靠评估指标,即蛋白质的蛋白质质量和有条件传播模型。我们的新指标能够捕捉到真实蛋白序列和生成蛋白序列之间的分布和功能相似性,同时确保有条件的一致性。我们试验基准数据集,以及有条件蛋白生成任务的结果展示了拟议生成框架和评估指标的功效。
Article 12
Title@2025-07-24 (4): Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
Title: Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs | Sparse Logit Sampling: Beschleunigung der Wissensdestillation in LLMs | 粗略的登录抽样:加速在LLMs中进行知识蒸馏 2503.16870v2 |
Authors (8): Anshumann, Mohd Abbas Zaidi, Akhil Kedia, Jinwoo Ahn, Taehwak Kwon, Kangwook Lee, Haejun Lee, Joohyung Lee
Knowledge distillation can be a cost-effective technique to distill knowledge in Large Language Models, if the teacher output logits can be pre-computed and cached. However, successfully applying this to pre-training remains largely unexplored. In this work, we prove that naive approaches for sparse knowledge distillation such as caching Top-K probabilities, while intuitive, provide biased estimates of teacher probability distribution to the student, resulting in suboptimal performance and calibration. We propose an importance-sampling-based method, ‘Random Sampling Knowledge Distillation’, which provides unbiased estimates, preserves the gradient in expectation, and requires storing significantly sparser logits. Our method enables faster training of student models with marginal overhead (<10%) compared to cross-entropy based training, while maintaining competitive performance compared to full distillation, across a range of model sizes from 300M to 3B.
知识蒸馏可以是一种在大语言模型中蒸馏知识的具有成本效益的技术,如果教师产出记录可以预先计算和缓存的话。然而,成功地将它应用于培训前的学习,基本上尚未探索。在这项工作中,我们证明,对诸如caching Top-K概率等知识蒸馏的幼稚方法,我们虽然直观,但向学生提供教师概率分布的偏差估计,从而导致不尽善的性能和校准。我们建议一种基于重要性的采样方法“兰多姆采样知识蒸馏”,提供不偏颇的估计,保持期望的梯度,并需要储存大量稀疏的登录。我们的方法使得对边际间接( < 10%) 的学生模型比跨机率培训更快地培训,同时保持竞争性的性能与完全蒸馏相比,从300M到3B不等的模型规模。
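The bias of Top-K caching versus the unbiasedness of a sampled target can be checked numerically on a toy teacher distribution. The sketch below assumes the simplest special case, in which tokens are sampled from the teacher distribution itself so that all importance weights are equal; the distribution, sample counts, and function names are illustrative and not taken from the paper.

```python
import random
from collections import Counter

def top_k_target(probs, k):
    """Cache only the k largest teacher probabilities and renormalize them."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}

def sampled_target(probs, n, rng):
    """Cache n tokens sampled from the teacher; each stored weight is count / n."""
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    return {i: c / n for i, c in Counter(draws).items()}

teacher = [0.5, 0.2, 0.1, 0.08, 0.06, 0.04, 0.02]
rng = random.Random(0)

# Top-K is deterministic: tail tokens always get zero mass, head tokens too much.
topk = top_k_target(teacher, 3)

# Averaging many sparse sampled targets approaches the full teacher distribution.
trials = 20000
avg = [0.0] * len(teacher)
for _ in range(trials):
    for i, w in sampled_target(teacher, 3, rng).items():
        avg[i] += w / trials

max_error = max(abs(a - p) for a, p in zip(avg, teacher))
print("top-k target:", topk)
print("max |mean sampled target - teacher|:", max_error)
```

Each sampled target stores at most three entries, the same budget as Top-K with k = 3, yet its expectation matches the teacher on every token, including the tail that Top-K discards.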
Article 13
Title@2025-07-24 (4): Linear Memory SE(2) Invariant Attention
Title: Linear Memory SE(2) Invariant Attention | Linearer Speicher SE(2) Invariante Aufmerksamkeit | 线性内存 SE(2) 惯性注意 2507.18597v1 |
Authors (6): Ethan Pronovost, Neha Boloor, Peter Schleede, Noureldin Hendy, Andres Morales, Nicholas Roy
Processing spatial data is a key component in many learning tasks for autonomous driving such as motion forecasting, multi-agent simulation, and planning. Prior works have demonstrated the value in using SE(2) invariant network architectures that consider only the relative poses between objects (e.g. other agents, scene features such as traffic lanes). However, these methods compute the relative poses for all pairs of objects explicitly, requiring quadratic memory. In this work, we propose a mechanism for SE(2) invariant scaled dot-product attention that requires linear memory relative to the number of objects in the scene. Our SE(2) invariant transformer architecture enjoys the same scaling properties that have benefited large language models in recent years. We demonstrate experimentally that our approach is practical to implement and improves performance compared to comparable non-invariant architectures.
处理空间数据是运动预测、多试剂模拟和规划等许多自主驱动学习任务的一个关键组成部分。先前的工程已经证明了使用SE(2)变量网络结构的价值,这些结构只考虑物体之间的相对构成(例如其他物剂、交通道等场景特征)。然而,这些方法计算了所有对象对一对的相对构成,明确需要二次体积内存。在这项工作中,我们提出了SE(2)变量扩大的点产品关注机制,它要求相对于现场物体的数量进行线性内存。我们的SE(2)变量变异变异变异变异变异器结构具有相同的缩放特性,近年来使大型语言模型受益。我们实验性地证明,我们的方法是切合实际的,可以执行和改进与可比的非变量结构的性能。
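The invariance that motivates this line of work can be verified directly: the relative pose between two objects does not change when a global rigid transform is applied to the whole scene. The sketch below shows the explicit pairwise computation that prior methods rely on, not the linear-memory mechanism proposed in the paper; pose tuples `(x, y, theta)` and the function names are illustrative assumptions.

```python
import math

def compose(g, p):
    """Apply the rigid transform g = (tx, ty, phi) to the pose p = (x, y, theta)."""
    tx, ty, phi = g
    x, y, theta = p
    return (tx + math.cos(phi) * x - math.sin(phi) * y,
            ty + math.sin(phi) * x + math.cos(phi) * y,
            theta + phi)

def relative_pose(pi, pj):
    """Pose of object j expressed in the local frame of object i."""
    dx, dy = pj[0] - pi[0], pj[1] - pi[1]
    c, s = math.cos(pi[2]), math.sin(pi[2])
    diff = pj[2] - pi[2]
    dtheta = math.atan2(math.sin(diff), math.cos(diff))  # wrap the heading difference
    return (c * dx + s * dy, -s * dx + c * dy, dtheta)

poses = [(0.0, 0.0, 0.0), (2.0, 1.0, math.pi / 2), (-1.0, 3.0, -0.3)]
g = (5.0, -2.0, 1.2)  # arbitrary global change of reference frame
moved = [compose(g, p) for p in poses]

n = len(poses)
before = [relative_pose(poses[i], poses[j]) for i in range(n) for j in range(n) if i != j]
after = [relative_pose(moved[i], moved[j]) for i in range(n) for j in range(n) if i != j]

max_diff = max(abs(a - b) for ra, rb in zip(before, after) for a, b in zip(ra, rb))
print(len(before), "ordered pairs, max change after global transform:", max_diff)
```

The list of relative poses has n(n-1) entries, which is the quadratic memory cost the abstract refers to when all pairs are materialized explicitly.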
Article 14
Title@2025-07-24 (4): Private Counterfactual Retrieval
Title: Private Counterfactual Retrieval | Private kontraaktische Retrieval | 私人反事实检索 2410.13812v2 |
Authors (5): Mohamed Nomeir, Pasan Dissanayake, Shreya Meel, Sanghamitra Dutta, Sennur Ulukus
Transparency and explainability are two extremely important aspects to be considered when employing black-box machine learning models in high-stakes applications. Providing counterfactual explanations is one way of fulfilling this requirement. However, this also poses a threat to the privacy of both the institution that is providing the explanation and the user who is requesting it. In this work, we propose multiple schemes inspired by private information retrieval (PIR) techniques which ensure the user’s privacy when retrieving counterfactual explanations. We present a scheme which retrieves the exact nearest neighbor counterfactual explanation from a database of accepted points while achieving perfect (information-theoretic) privacy for the user. While the scheme achieves perfect privacy for the user, some leakage on the database is inevitable, which we quantify using a mutual-information-based metric. Furthermore, we propose strategies to reduce this leakage to achieve an advanced degree of database privacy. We extend these schemes to incorporate the user’s preferences on transforming their attributes, so that a more actionable explanation can be received. Since our schemes rely on finite field arithmetic, we empirically validate our schemes on real datasets to understand the trade-off between the accuracy and the finite field sizes. Finally, we present numerical results to support our theoretical findings, and compare the database leakage of the proposed schemes.
透明度和解释性是使用黑盒机器在高空应用中学习模型时需要考虑的两个极为重要的方面。 提供反事实解释是满足这一要求的一种方式。 但是,这也威胁到提供解释的机构以及要求解释的用户的隐私。 在这项工作中,我们提出了由私人信息检索(PIR)技术启发的多种计划,确保在检索反事实解释时确保用户的隐私。 我们提出了一个计划,从一个公认的点数据库中检索到近邻的反事实解释,同时实现用户的完美(信息理论)隐私。 虽然这个计划实现了用户的完全隐私,但数据库中的某些漏洞是不可避免的,我们用基于共同信息的标准量化。此外,我们提出了减少这种泄漏的战略,以达到数据库隐私的更高程度。我们扩大了这些计划,以纳入用户对改变其属性的偏好,从而能够收到更可操作的解释。 由于我们的计划依赖于有限的字段的计算,我们从实验性地验证了我们目前关于真实数据准确性(信息理论-理论-理论-)的系统,我们用一个量化的数据- 来理解我们最终理解数据交易的结果。
Article 15
Title@2025-07-24 (4): DRWKV: Focusing on Object Edges for Low-Light Image Enhancement
Title: DRWKV: Focusing on Object Edges for Low-Light Image Enhancement | DRWKV: Fokussierung auf Objektkanten für Low-Light Image Enhancement | DRWKV: 关注低光图像增强对象边缘 2507.18594v1 |
Authors (8): Xuecheng Bai, Yuxiang Wang, Boyu Hu, Qinyuan Jie, Chuanzhi Xu, Hongru Xiao, Kechen Li, Vera Chung
Low-light image enhancement remains a challenging task, particularly in preserving object edge continuity and fine structural details under extreme illumination degradation. In this paper, we propose a novel model, DRWKV (Detailed Receptance Weighted Key Value). First, it integrates our proposed Global Edge Retinex (GER) theory, enabling effective decoupling of illumination and edge structures for enhanced edge fidelity. Second, we introduce Evolving WKV Attention, a spiral-scanning mechanism that captures spatial edge continuity and models irregular structures more effectively. Third, we design the Bilateral Spectrum Aligner (Bi-SAB) and a tailored MS2-Loss to jointly align luminance and chrominance features, improving visual naturalness and mitigating artifacts. Extensive experiments on five LLIE benchmarks demonstrate that DRWKV achieves leading performance in PSNR, SSIM, and NIQE while maintaining low computational complexity. Furthermore, DRWKV enhances downstream performance in low-light multi-object tracking tasks, validating its generalization capabilities.
低光图像增强仍然是一项艰巨的任务,特别是在保护极端照明降解下的物体边缘连续性和精细结构细节方面。在本文件中,我们提出一个新的模型,即DRWKV(详细受受重键值),它整合了我们拟议的全球边缘视像(GER)理论,使得照明和边缘结构能够有效脱钩,以提高边缘对等性。第二,我们引入了动态WKV注意,一个螺旋扫描机制,它能够更有效地捕捉空间边缘连续性和模型异常结构。第三,我们设计了双边光谱对称仪(Bi-SAB)和定制的MS2-LOLIE模型,以联合协调发光和染色特征,改善视觉自然性,减轻人工制品。关于五部LIE基准的广泛实验表明,DRWKV在保持低计算复杂性的同时,在PSNRR、SSIM和NIQE中取得了领先的性能。此外,DRWKV提高了低光多点跟踪任务下游的性。
Article 16
Title@2025-07-24 (4): On the Convergence of Gradient Descent on Learning Transformers with Residual Connections
Title: On the Convergence of Gradient Descent on Learning Transformers with Residual Connections | Über die Konvergenz des gradienten Abstiegs auf Lerntransformatoren mit residualen Verbindungen | 关于有残余连接的学习变异器的 “ 渐渐后代 “ 趋同 2506.05249v3 |
Authors (3): Zhen Qin, Jinxin Zhou, Zhihui Zhu
Transformer models have emerged as fundamental tools across various scientific and engineering disciplines, owing to their outstanding performance in diverse applications. Despite this empirical success, the theoretical foundations of Transformers remain relatively underdeveloped, particularly in understanding their training dynamics. Existing research predominantly examines isolated components–such as self-attention mechanisms and feedforward networks–without thoroughly investigating the interdependencies between these components, especially when residual connections are present. In this paper, we aim to bridge this gap by analyzing the convergence behavior of a structurally complete yet single-layer Transformer, comprising self-attention, a feedforward network, and residual connections. We demonstrate that, under appropriate initialization, gradient descent exhibits a linear convergence rate, where the convergence speed is determined by the minimum and maximum singular values of the output matrix from the attention layer. Moreover, our analysis reveals that residual connections serve to ameliorate the ill-conditioning of this output matrix, an issue stemming from the low-rank structure imposed by the softmax operation, thereby promoting enhanced optimization stability. We also extend our theoretical findings to a multi-layer Transformer architecture, confirming the linear convergence rate of gradient descent under suitable initialization. Empirical results corroborate our theoretical insights, illustrating the beneficial role of residual connections in promoting convergence stability.
尽管取得了这一经验性的成功,但变形者的理论基础仍然相对不够发达,特别是在理解培训动态方面。现有研究主要考察孤立的部件,如自我注意机制和进化前网络,而没有彻底调查这些组成部分之间的相互依存关系,特别是在存在剩余连接的情况下。在本文件中,我们的目标是通过分析结构完整但单一层次变形器的趋同行为来弥合这一差距,该变形器包括自我注意、进料向前网络和剩余连接。我们证明,在适当的初始化下,梯度下降趋势显示一种线性趋同率,这种趋同速度是由注意层产出矩阵的最低和最高单值决定的。此外,我们的分析表明,残余联系有助于改善这一产出矩阵的不适当调节,这是由软式操作所强加的低级结构引起的一个问题,从而促进优化稳定性。我们还将我们的理论发现扩大到多层次变形结构,确认了在适当初始化下梯度下降的线性趋同率。
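The conditioning argument above can be illustrated on a deliberately tiny example: a softmax attention matrix with nearly uniform rows is close to rank one, so the attention output is ill-conditioned, and adding the residual input improves the ratio between the largest and smallest singular values. The 2x2 matrices and helper names below are assumptions chosen so that singular values have a closed form; this is a numerical illustration, not the paper's analysis.

```python
import math

def softmax_rows(scores):
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def cond_2x2(m):
    """Condition number (largest / smallest singular value) of a 2x2 matrix,
    from s1^2 + s2^2 = squared Frobenius norm and s1 * s2 = |det|."""
    frob2 = sum(v * v for row in m for v in row)
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(frob2 * frob2 - 4.0 * det * det, 0.0))
    return math.sqrt((frob2 + disc) / (frob2 - disc))

X = [[1.0, 0.2], [0.3, 1.0]]                   # toy token representations
S = softmax_rows([[0.1, 0.0], [0.0, 0.1]])     # nearly uniform attention weights

attn_out = matmul(S, X)                        # attention output without residual
residual_out = add(X, attn_out)                # attention output with residual

cond_plain = cond_2x2(attn_out)
cond_res = cond_2x2(residual_out)
print("condition number without residual:", cond_plain)
print("condition number with residual:   ", cond_res)
```

Because the abstract ties the convergence speed to the minimum and maximum singular values of the attention-layer output, a smaller ratio of this kind is the quantity that the residual connection is said to improve.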
Article 17
Title@2025-07-24 (4): Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning
Title: Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning | Agent-Fin-R1: Verbesserung der Finanzintelligenz durch Domain-Expertise, Trainingseffizienz und Advanced Reasoning | Agentar Fin-Fin-R1:通过域域专门知识、培训效率和高级理由加强金融情报 2507.16802v3 |
Authors (13): Yanjun Zheng, Xiyang Du, Longfei Liao, Xiaoke Zhao, Zhaowen Zhou, Jingze Song, Bo Zhang, Jiawei Liu, Xiang Qi, Zhe Li, Zhiqiang Zhang, Wei Wang, Peng Zhang
Large Language Models (LLMs) exhibit considerable promise in financial applications; however, prevailing models frequently demonstrate limitations when confronted with scenarios that necessitate sophisticated reasoning capabilities, stringent trustworthiness criteria, and efficient adaptation to domain-specific requirements. We introduce the Agentar-Fin-R1 series of financial large language models (8B and 32B parameters), specifically engineered based on the Qwen3 foundation model to enhance reasoning capabilities, reliability, and domain specialization for financial applications. Our optimization approach integrates a high-quality, systematic financial task label system with a comprehensive multi-layered trustworthiness assurance framework. This framework encompasses high-quality trustworthy knowledge engineering, multi-agent trustworthy data synthesis, and rigorous data validation governance. Through label-guided automated difficulty-aware optimization, a two-stage training pipeline, and dynamic attribution systems, we achieve substantial improvements in training efficiency. Our models undergo comprehensive evaluation on mainstream financial benchmarks including Fineva, FinEval, and FinanceIQ, as well as general reasoning datasets such as MATH-500 and GPQA-diamond. To thoroughly assess real-world deployment capabilities, we innovatively propose the Finova evaluation benchmark, which focuses on agent-level financial reasoning and compliance verification. Experimental results demonstrate that Agentar-Fin-R1 not only achieves state-of-the-art performance on financial tasks but also exhibits exceptional general reasoning capabilities, validating its effectiveness as a trustworthy solution for high-stakes financial applications. The Finova bench is available at https://github.com/antgroup/Finova.
大型语言模型(LLMS)在财务应用方面有相当大的希望;然而,现行模型在遇到需要精密推理能力、严格可信标准以及有效适应具体领域要求的情景时往往显示出局限性。我们引入了金融大语言模型(8B和32B参数)的Astrar-Fin-R1系列财务大语言模型(8B和32B参数),具体根据Quen3基础模型设计,以提高金融应用的推理能力、可靠性和领域专业化。我们的优化方法将高质量、系统的金融任务标签系统与全面的多层次可靠保证框架结合起来。这一框架包括高质量的可信赖的知识工程、多剂可信赖的数据合成和严格的数据验证治理。我们通过标签引导自动自动识别困难优化、启动阶段培训管道和动态归属系统,我们在培训效率方面取得了重大改进。我们的模型对主流金融基准(包括Finva、FinEval、FinIQQ,以及MATH-500和GPQA-diamon等通用推理数据集等)进行了全面评价。这一框架包括高质量的实际部署能力评估,我们提出了Finova评估基准评估基准-Final-ILA-ILA-ILA-ILA-I),该基准,该基准也显示其高级的合规性检验能力,该基准,该标准,该基准也显示其用于用于进行高水平性财务水平性财务水平的透明性检验。
Article 18
Title@2025-07-24 (4): Beyond Internal Data: Constructing Complete Datasets for Fairness Testing
Title: Beyond Internal Data: Constructing Complete Datasets for Fairness Testing | Jenseits interner Daten: Konstruieren vollständiger Datensätze für Fairness-Tests | 超越内部数据:为公平测试建立完整的数据集 2507.18561v1 |
Authors (4): Varsha Ramineni, Hossein A. Rahmani, Emine Yilmaz, David Barber
As AI becomes prevalent in high-risk domains and decision-making, it is essential to test for potential harms and biases. This urgency is reflected by the global emergence of AI regulations that emphasise fairness and adequate testing, with some mandating independent bias audits. However, procuring the necessary data for fairness testing remains a significant challenge. Particularly in industry settings, legal and privacy concerns restrict the collection of demographic data required to assess group disparities, and auditors face practical and cultural challenges in gaining access to data. Further, internal historical datasets are often insufficiently representative to identify real-world biases. This work focuses on evaluating classifier fairness when complete datasets including demographics are inaccessible. We propose leveraging separate overlapping datasets to construct complete synthetic data that includes demographic information and accurately reflects the underlying relationships between protected attributes and model features. We validate the fidelity of the synthetic data by comparing it to real data, and empirically demonstrate that fairness metrics derived from testing on such synthetic data are consistent with those obtained from real data. This work, therefore, offers a path to overcome real-world data scarcity for fairness testing, enabling independent, model-agnostic evaluation of fairness, and serving as a viable substitute where real data is limited.
随着人工智能在高风险领域和决策中日益普及,必须测试潜在的伤害和偏见。全球范围内出现的人工智能法规反映了这一紧迫性,这些法规强调公平和充分测试,其中一些还强制要求进行独立的偏见审计。然而,为公平性测试获取必要的数据仍是一项重大挑战。特别是在行业环境中,法律和隐私方面的关切限制了评估群体差异所需人口统计数据的收集,审计员在获取数据方面也面临实际和文化挑战。此外,内部历史数据集往往没有足够的代表性,无法识别真实世界的偏见。这项工作的重点是在无法获得包含人口统计信息的完整数据集时评估分类器的公平性。我们提议利用彼此独立但相互重叠的数据集来构建完整的合成数据,其中包含人口统计信息,并准确反映受保护属性和模型特征之间的内在关系。我们通过将合成数据与真实数据进行比较来验证合成数据的保真度,并通过实验证明,在此类合成数据上测试得到的公平性指标与从真实数据中获得的指标一致。因此,这项工作提供了一条克服公平性测试中真实数据稀缺问题的途径,使独立的、与模型无关的公平性评估成为可能,并可在真实数据有限的情况下作为可行的替代方案。
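To make "fairness metrics derived from testing on such synthetic data are consistent with those obtained from real data" concrete, here is a minimal sketch that computes one common group-fairness metric (demographic parity difference) on two toy datasets. The metric choice and all numbers are illustrative assumptions; the abstract does not say which metrics or data the authors use.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Absolute gap in positive-prediction rate between groups 0 and 1."""
    y_pred = np.asarray(y_pred, dtype=float)
    group = np.asarray(group)
    rate_0 = y_pred[group == 0].mean()
    rate_1 = y_pred[group == 1].mean()
    return float(abs(rate_0 - rate_1))

# Made-up classifier outputs and protected-attribute labels (not from the paper).
real_pred, real_group = [1, 0, 1, 1, 0, 0, 1, 0], [0, 0, 0, 0, 1, 1, 1, 1]
synth_pred, synth_group = [1, 1, 0, 1, 0, 0, 0, 1], [0, 0, 0, 0, 1, 1, 1, 1]

gap_real = demographic_parity_difference(real_pred, real_group)
gap_synth = demographic_parity_difference(synth_pred, synth_group)
print(gap_real, gap_synth)
```

In this toy case both datasets give the same gap, which is the kind of agreement the abstract reports between synthetic and real data.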
Article 19
Title@2025-07-24 (4): Neural Tangent Kernels and Fisher Information Matrices for Simple ReLU Networks with Random Hidden Weights
Title: Neural Tangent Kernels and Fisher Information Matrices for Simple ReLU Networks with Random Hidden Weights | Neural Tangent Kernels und Fisher-Informationsmatrizen für einfache ReLU-Netzwerke mit zufälligen versteckten Gewichten | 具有随机隐藏权重的简单ReLU网络的神经正切核与Fisher信息矩阵 2507.18555v1 |
Authors (6): Jun’ichi Takeuchi, Yoshinari Takeishi, Noboru Murata, Kazushi Mimura, Ka Long Keith Ho, Hiroshi Nagaoka
We study Fisher information matrices and neural tangent kernels (NTKs) for 2-layer ReLU networks with random hidden weights. We discuss the relation between the two notions as a linear transformation and give the spectral decomposition of the NTK, with concrete forms of the eigenfunctions associated with the major eigenvalues. We also obtain an approximation formula for the functions represented by the 2-layer neural networks.
本文研究具有随机隐藏权重的两层ReLU网络的Fisher信息矩阵和神经正切核(NTK)。我们讨论了这两个概念之间作为线性变换的关系,并给出了NTK的谱分解,其中包括对应主要特征值的特征函数的具体形式。我们还获得了两层神经网络所表示函数的近似公式。
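As a rough illustration of the kind of kernel involved, the sketch below assumes that only the output weights of a width-m network f(x) = m^(-1/2) Σ_j a_j ReLU(w_j·x) are trained and that the hidden weights w_j are drawn from a standard Gaussian; both are assumptions, since the abstract does not fix the setup. Under them, the empirical kernel is the average of ReLU(w·x)ReLU(w·y) over hidden units, and its infinite-width limit is the well-known degree-1 arc-cosine kernel.

```python
import numpy as np

def empirical_kernel(x, y, hidden_weights):
    """Output-layer NTK of a width-m ReLU network with fixed random hidden weights."""
    fx = np.maximum(hidden_weights @ x, 0.0)
    fy = np.maximum(hidden_weights @ y, 0.0)
    return float(fx @ fy) / hidden_weights.shape[0]

def arccos_kernel(x, y):
    """Closed form of E_w[ReLU(w.x) ReLU(w.y)] for w ~ N(0, I)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos_t = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return float(nx * ny * (np.sin(theta) + (np.pi - theta) * cos_t) / (2 * np.pi))

rng = np.random.default_rng(0)
W = rng.standard_normal((200_000, 3))   # random hidden weights, never trained
x = np.array([1.0, 0.0, 0.0])
y = np.array([0.6, 0.8, 0.0])

k_emp = empirical_kernel(x, y, W)
k_lim = arccos_kernel(x, y)
print(k_emp, k_lim)
```

The two values should agree up to Monte Carlo error, which shrinks as the width grows.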
Article 20
Title@2025-07-24 (4): Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards
Title: Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards | Omni-Thinker: Skalierung der Cross-Domain-Verallgemeinerung in LLMs über Multi-Task RL mit Hybrid Rewards | Omni-Thinker:通过多任务RL与混合奖励在LLMLM中扩大跨域通用化 2507.14783v2 |
Authors (13): Derek Li, Jiaming Zhou, Amirreza Kazemi, Qianyi Sun, Abbas Ghaddar, Mohammad Ali Alomrani, Liheng Ma, Yu Luo, Dong Li, Feng Wen, Jianye Hao, Mark Coates, Yingxue Zhang
The advancement of general-purpose artificial intelligence relies on large language models (LLMs) that excel across a wide range of tasks, from structured reasoning to creative generation. However, post-training methods like Supervised Fine-Tuning (SFT) often struggle with generalization, favoring memorization over transferable learning. In this work, we introduce Omni-Thinker, a unified reinforcement learning (RL) framework that enhances LLM performance across diverse tasks by combining rule-based verifiable rewards with generative preference signals via LLM-as-a-Judge evaluations. Our approach enables consistent optimization across task types and scales RL-based training to subjective domains. We further investigate training strategies, demonstrating that a curriculum-based progression that orders tasks from structured to open-ended improves performance and reduces forgetting. Experimental results across four domains reveal that curriculum learning improves performance by 5.2% over joint training and 9.1% over model merging. These results highlight the importance of task-aware sampling and hybrid supervision in scaling RL-based post-training for general-purpose LLMs.
通用人工智能的进步依赖于在从结构化推理到创造性生成等广泛任务中表现出色的大型语言模型(LLM)。然而,监督微调(SFT)等后训练方法往往难以泛化,偏向记忆而非可迁移的学习。在这项工作中,我们引入了Omni-Thinker,一个统一的强化学习(RL)框架,通过将基于规则的可验证奖励与通过LLM-as-a-Judge评估获得的生成式偏好信号相结合,提高LLM在各种任务中的性能。我们的方法能够在不同任务类型之间进行一致的优化,并将基于RL的训练扩展到主观领域。我们进一步研究了训练策略,表明按从结构化到开放式的顺序安排任务的课程式进程可以提高性能并减少遗忘。四个领域的实验结果表明,课程学习的性能比联合训练高5.2%,比模型合并高9.1%。这些结果突出了任务感知采样和混合监督对于扩展面向通用LLM的基于RL的后训练的重要性。
Article 21
Title@2025-07-24 (4): LagKV: Lag-Relative Information of the KV Cache Tells Which Tokens Are Important
Title: LagKV: Lag-Relative Information of the KV Cache Tells Which Tokens Are Important | LagKV: Lag-relative Information des KV-Cache verrät, welche Token wichtig sind | LagKV:KV缓存的滞后相对信息揭示哪些词元重要 2504.04704v2 |
Authors (4): Manlai Liang, JiaMing Zhang, Xiong Li, Jinlong Li
The growing size of the Key-Value (KV) cache during long-context inference with Large Language Models is the main obstacle to balancing deployment cost against task accuracy. To reduce the KV cache size in such scenarios, most previous efforts relied on attention weights to evict non-critical cache tokens. But those methods come with a trade-off: they usually require major modification of the inference infrastructure and significant computation overhead. Based on the fact that Large Language Models are autoregressive, we propose LagKV, a KV compression strategy that relies only on straightforward comparison among the KV themselves. It is a totally attention-free method that offers easy integration into mainstream inference platforms and performance comparable to other, more complicated KV compression methods. Results on the RULER benchmark show that our approach outperforms SnapKV and StreamingLLM at different compression ratios. In particular, on the 64-digit passkey retrieval task, our method outperforms the attention-weight-based method $H_2O$ by over $50\%$ at the same compression ratios. Our code is available at https://github.com/AI-Lab-China-Merchants-Bank/LagKV.
在大语言模型的长上下文推理中,键值(KV)缓存不断增大,是平衡部署成本与任务准确性的主要障碍。为了在此类场景中减小KV缓存,以往的大多数工作都利用注意力权重来驱逐非关键的缓存词元。但这些方法存在权衡:它们通常需要对推理基础设施进行重大修改,并带来显著的计算开销。基于大语言模型是自回归模型这一事实,我们提出了LagKV,一种仅依赖KV自身之间直接比较的KV压缩策略。这是一种完全不依赖注意力的方法,易于集成到主流推理平台,性能可与其他复杂的KV压缩方法相当。RULER基准上的结果表明,我们的方法在不同压缩比下均优于SnapKV和StreamingLLM。特别是在64位数字密钥检索任务中,在相同压缩比下,我们的方法比基于注意力权重的方法$H_2O$高出$50\%$以上。我们的代码可在 https://github.com/AI-Lab-China-Merchants-Bank/LagKV 获取。
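The abstract only says that LagKV compares KV entries among themselves, without attention weights. The sketch below is one possible reading, not the authors' implementation: each chunk of keys and values is normalized by the per-channel min and max of a later ("lag") reference chunk, tokens are scored by how much their normalized channels vary, and the top-scoring tokens are kept. The function names, the scoring rule, and the chunk sizes are all assumptions for illustration.

```python
import numpy as np

def lag_relative_scores(chunk, ref_chunk, eps=1e-6):
    """Score each token of `chunk` (tokens x channels) relative to a reference chunk."""
    lo = ref_chunk.min(axis=0, keepdims=True)   # per-channel min of the reference
    hi = ref_chunk.max(axis=0, keepdims=True)   # per-channel max of the reference
    normed = (chunk - lo) / (hi - lo + eps)
    spread = normed.std(axis=1)                 # one value per token
    e = np.exp(spread - spread.max())           # softmax over tokens
    return e / e.sum()

def compress_chunk(keys, values, ref_keys, ref_values, keep):
    """Keep the `keep` highest-scoring tokens, preserving their original order."""
    score = lag_relative_scores(keys, ref_keys) + lag_relative_scores(values, ref_values)
    idx = np.sort(np.argsort(score)[-keep:])
    return keys[idx], values[idx], idx

rng = np.random.default_rng(0)
keys, values = rng.standard_normal((8, 4)), rng.standard_normal((8, 4))
ref_keys, ref_values = rng.standard_normal((8, 4)), rng.standard_normal((8, 4))
kept_k, kept_v, kept_idx = compress_chunk(keys, values, ref_keys, ref_values, keep=4)
print(kept_idx)
```

Nothing here touches attention scores, which is the property the abstract emphasizes for easy integration into existing inference stacks.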
Article 22
Title@2025-07-24 (4): The Geometry of LLM Quantization: GPTQ as Babai’s Nearest Plane Algorithm
Title: The Geometry of LLM Quantization: GPTQ as Babai’s Nearest Plane Algorithm | Die Geometrie der LLM-Quantisierung: GPTQ als Babais Nearest-Plane-Algorithmus | LLM量化的几何:GPTQ即Babai最近平面算法 2507.18553v1 |
Authors (3): Jiale Chen, Torsten Hoefler, Dan Alistarh
Quantizing the weights of large language models (LLMs) from 16-bit to lower bitwidth is the de facto approach to deploy massive transformers onto more affordable accelerators. GPTQ emerged as one of the standard methods for one-shot post-training quantization at LLM scale. Yet, its inner workings are described as a sequence of ad-hoc algebraic updates that obscure any geometric meaning or worst-case guarantees. In this work, we show that, when executed back-to-front (from the last to first dimension) for a linear layer, GPTQ is mathematically identical to Babai’s nearest plane algorithm for the classical closest vector problem (CVP) on a lattice defined by the Hessian matrix of the layer’s inputs. This equivalence is based on a sophisticated mathematical argument, and has two analytical consequences: (i) the GPTQ error propagation step gains an intuitive geometric interpretation; (ii) GPTQ inherits the error upper bound of Babai’s algorithm under the no-clipping condition. Taken together, these results place GPTQ on firm theoretical footing and open the door to importing decades of progress in lattice algorithms towards the design of future quantization algorithms for billion-parameter models.
将大型语言模型(LLM)的权重从16位量化到更低位宽,是把大型Transformer部署到更廉价加速器上的事实标准做法。GPTQ已成为LLM规模下一次性训练后量化的标准方法之一。然而,它的内部机制被描述为一系列临时的代数更新,掩盖了任何几何含义或最坏情况保证。在这项工作中,我们表明,对线性层按从后向前(从最后一维到第一维)的顺序执行时,GPTQ在数学上等同于Babai最近平面算法,该算法求解由该层输入的Hessian矩阵所定义的格上的经典最近向量问题(CVP)。这一等价性基于一个精巧的数学论证,并有两个分析上的推论:(一)GPTQ的误差传播步骤获得了直观的几何解释;(二)在无截断条件下,GPTQ继承了Babai算法的误差上界。综合起来,这些结果为GPTQ奠定了坚实的理论基础,并为把格算法数十年的进展引入未来面向十亿参数模型的量化算法设计打开了大门。
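For readers unfamiliar with the lattice side of this equivalence, here is a minimal sketch of the textbook Babai nearest plane algorithm on a small basis. It shows only the generic algorithm, processed from the last basis vector to the first; it is not GPTQ itself, and the mapping from a layer's Hessian to a lattice basis is not reproduced here.

```python
import numpy as np

def gram_schmidt(basis):
    """Classical Gram-Schmidt on the rows of `basis`, without normalization."""
    ortho = np.zeros_like(basis, dtype=float)
    for i, b in enumerate(basis):
        v = b.astype(float)
        for j in range(i):
            v = v - (b @ ortho[j]) / (ortho[j] @ ortho[j]) * ortho[j]
        ortho[i] = v
    return ortho

def babai_nearest_plane(basis, target):
    """Return integer coefficients and the lattice point close to `target`."""
    basis = np.asarray(basis, dtype=float)
    ortho = gram_schmidt(basis)
    residual = np.asarray(target, dtype=float).copy()
    coeffs = np.zeros(len(basis), dtype=int)
    for i in reversed(range(len(basis))):       # last dimension first
        c = int(np.rint(residual @ ortho[i] / (ortho[i] @ ortho[i])))
        coeffs[i] = c
        residual -= c * basis[i]                # propagate the rounding error
    return coeffs, coeffs @ basis

basis = np.array([[2.0, 0.0], [1.0, 3.0]])
coeffs, point = babai_nearest_plane(basis, [2.6, 3.4])
print(coeffs, point)
```

The `residual -= c * basis[i]` line is the analogue of an error-propagation step: each rounding decision changes the target seen by the remaining dimensions.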
Article 23
Title@2025-07-24 (4): Zeroth-Order Fine-Tuning of LLMs in Random Subspaces
Title: Zeroth-Order Fine-Tuning of LLMs in Random Subspaces | Zeroth-Order Feinsteuerung von LLMs in Random Subspaces | 随机子空间中LLMs的零级微调微调 2410.08989v3 |
Authors (6): Ziming Yu, Pan Zhou, Sike Wang, Jia Li, Mi Tian, Hua Huang
Fine-tuning Large Language Models (LLMs) has proven effective for a variety of downstream tasks. However, as LLMs grow in size, the memory demands for backpropagation become increasingly prohibitive. Zeroth-order (ZO) optimization methods offer a memory-efficient alternative by using forward passes to estimate gradients, but the variance of gradient estimates typically scales linearly with the model’s parameter dimension, a significant issue for LLMs. In this paper, we propose the random Subspace Zeroth-order (SubZero) optimization to address the challenges posed by LLMs’ high dimensionality. We introduce a low-rank perturbation tailored for LLMs that significantly reduces memory consumption while improving training performance. Additionally, we prove that our gradient estimation closely approximates the backpropagation gradient, exhibits lower variance than traditional ZO methods, and ensures convergence when combined with SGD. Experimental results show that SubZero enhances fine-tuning performance and achieves faster convergence compared to standard ZO approaches like MeZO across various language modeling tasks. Code is available at https://github.com/zimingyy/SubZero.
微调大型语言模型(LLM)已被证明对各种下游任务有效。然而,随着LLM规模的扩大,反向传播的内存需求变得越来越令人望而却步。零阶(ZO)优化方法通过使用前向传播来估计梯度,提供了一种节省内存的替代方案,但梯度估计的方差通常随模型的参数维度线性增长,这对LLM来说是一个重大问题。在本文中,我们提出随机子空间零阶(SubZero)优化,以应对LLM高维度带来的挑战。我们引入了一种为LLM量身定制的低秩扰动,可在提高训练性能的同时显著降低内存消耗。此外,我们证明了我们的梯度估计与反向传播梯度非常接近,方差低于传统ZO方法,并且与SGD结合时能保证收敛。实验结果表明,在各种语言建模任务中,与MeZO等标准ZO方法相比,SubZero提高了微调性能并实现了更快的收敛。代码可在 https://github.com/zimingyy/SubZero 获取。
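A minimal sketch of a two-point zeroth-order gradient estimate with a low-rank perturbation follows. The construction Z = U S Vᵀ, with U and V having orthonormal columns and S a small Gaussian matrix, is an assumption chosen to match the abstract's "low-rank perturbation"; the paper's exact recipe may differ. The toy loss is linear, so the finite difference is exact and easy to check.

```python
import numpy as np

def low_rank_perturbation(rows, cols, rank, rng):
    """Random rank-`rank` direction Z = U S V^T with orthonormal U and V."""
    u, _ = np.linalg.qr(rng.standard_normal((rows, rank)))
    v, _ = np.linalg.qr(rng.standard_normal((cols, rank)))
    core = rng.standard_normal((rank, rank))
    return u @ core @ v.T

def zo_gradient(loss, weight, perturbation, eps=1e-3):
    """Two forward passes give a directional derivative; no backpropagation."""
    delta = (loss(weight + eps * perturbation) - loss(weight - eps * perturbation)) / (2 * eps)
    return delta * perturbation

rng = np.random.default_rng(0)
G = rng.standard_normal((6, 5))     # true gradient of the toy linear loss
W = rng.standard_normal((6, 5))     # current weight matrix

def linear_loss(w):
    return float(np.sum(G * w))

Z = low_rank_perturbation(6, 5, rank=2, rng=rng)
g_hat = zo_gradient(linear_loss, W, Z)
print(np.linalg.matrix_rank(g_hat))
```

The estimate always lies along Z, so it is stored and applied as a rank-2 update in this sketch, which is where the memory saving would come from.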
Article 24
Title@2025-07-24 (4): On the Performance of Concept Probing: The Influence of the Data (Extended Version)
Title: On the Performance of Concept Probing: The Influence of the Data (Extended Version) | Zur Performance von Konzept-Probing: Der Einfluss der Daten (Erweiterte Version) | 关于 “ 概念检验:数据的影响 “ 的绩效(扩展版) 2507.18550v1 |
Authors (3): Manuel de Sousa Ribeiro, Afonso Leote, João Leite
Concept probing has recently garnered increasing interest as a way to help interpret artificial neural networks, dealing both with their typically large size and their subsymbolic nature, which ultimately renders them unfeasible for direct human interpretation. Concept probing works by training additional classifiers to map the internal representations of a model into human-defined concepts of interest, thus allowing humans to peek inside artificial neural networks. Research on concept probing has mainly focused on the model being probed or the probing model itself, paying limited attention to the data required to train such probing models. In this paper, we address this gap. Focusing on concept probing in the context of image classification tasks, we investigate the effect of the data used to train probing models on their performance. We also make available concept labels for two widely used datasets.
最近,概念探测作为一种帮助解释人造神经网络的方法引起了越来越多的兴趣,这种解释既涉及其典型的庞大规模,也涉及其亚共振性质,最终使其无法直接进行人类解释; 概念探测工作,培训更多的分类人员将模型的内部表现映射成人类定义的感兴趣概念,从而使人类能够窥视人造神经网络; 概念探测研究主要侧重于被探测的模型或探测模型本身,对培训这种实验模型所需的数据给予有限的关注; 本文讨论了这一差距; 侧重于在图像分类任务中进行概念探测,我们调查用于培训模型的模型对其性能的影响; 我们还为两种广泛使用的数据集提供概念标签。
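As a minimal sketch of what a concept probe is, the code below trains a logistic-regression probe on synthetic "activations" in which a binary concept is linearly encoded, then compares probes trained on few and on many examples. The activations, the concept, and the training sizes are made up for illustration and are not taken from the paper.

```python
import numpy as np

def train_probe(acts, labels, lr=0.5, steps=500):
    """Fit a logistic-regression probe by plain gradient descent."""
    w, b = np.zeros(acts.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        grad = p - labels
        w -= lr * acts.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(acts, labels, w, b):
    return float((((acts @ w + b) > 0) == (labels > 0.5)).mean())

rng = np.random.default_rng(0)
acts = rng.standard_normal((400, 16))            # stand-in for hidden activations
concept = (acts[:, 3] > 0).astype(float)         # hypothetical concept, encoded in unit 3
test_x, test_y = acts[300:], concept[300:]

acc_small = probe_accuracy(test_x, test_y, *train_probe(acts[:20], concept[:20]))
acc_large = probe_accuracy(test_x, test_y, *train_probe(acts[:300], concept[:300]))
print(acc_small, acc_large)
```

Varying the number and choice of training examples, as in the last lines, is the kind of data-side factor the paper sets out to study.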
Article 25
Title@2025-07-24 (4): The Price equation reveals a universal force-metric-bias law of algorithmic learning and natural selection
Title: The Price equation reveals a universal force-metric-bias law of algorithmic learning and natural selection | Die Preisgleichung zeigt ein universelles Gesetz des algorithmischen Lernens und der natürlichen Selektion. | 价格方程式揭示了一种通用的算法学习法和自然选择法 2507.18549v1 |
Authors (1): Steven A. Frank
Diverse learning algorithms, optimization methods, and natural selection share a common mathematical structure, despite their apparent differences. Here I show that a simple notational partitioning of change by the Price equation reveals a universal force-metric-bias (FMB) law: $\Delta\mathbf{\theta} = \mathbf{M}\,\mathbf{f} + \mathbf{b} + \mathbf{\xi}$. The force $\mathbf{f}$ drives improvement in parameters, $\Delta\mathbf{\theta}$, through the covariance between the parameters and performance. The metric $\mathbf{M}$ rescales movement by inverse curvature. The bias $\mathbf{b}$ adds momentum or changes in the frame of reference. The noise $\mathbf{\xi}$ enables exploration. This framework unifies natural selection, Bayesian updating, Newton’s method, stochastic gradient descent, stochastic Langevin dynamics, Adam optimization, and most other algorithms as special cases of the same underlying process. The Price equation also reveals why Fisher information, Kullback-Leibler divergence, and d’Alembert’s principle arise naturally in learning dynamics. By exposing this common structure, the FMB law provides a principled foundation for understanding, comparing, and designing learning algorithms across disciplines.
不同的学习算法、优化方法和自然选择尽管表面上存在差异,却共享一个共同的数学结构。本文表明,用Price方程对变化进行简单的记号划分,可以揭示一条通用的力-度量-偏差(FMB)定律:$\Delta\mathbf{\theta} = \mathbf{M}\,\mathbf{f} + \mathbf{b} + \mathbf{\xi}$。力$\mathbf{f}$通过参数与性能之间的协方差驱动参数的改进$\Delta\mathbf{\theta}$。度量$\mathbf{M}$按逆曲率重新缩放移动。偏差$\mathbf{b}$增加动量或参考系的变化。噪声$\mathbf{\xi}$使探索成为可能。这一框架将自然选择、贝叶斯更新、牛顿法、随机梯度下降、随机Langevin动力学、Adam优化以及大多数其他算法统一为同一基本过程的特例。Price方程还揭示了Fisher信息、Kullback-Leibler散度和达朗贝尔原理为何会在学习动力学中自然出现。通过揭示这一共同结构,FMB定律为跨学科地理解、比较和设计学习算法提供了有原则的基础。
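The update rule $\Delta\mathbf{\theta} = \mathbf{M}\,\mathbf{f} + \mathbf{b} + \mathbf{\xi}$ can be sketched as a single function whose arguments select the algorithm. The mapping used below (gradient ascent as M = learning rate × identity, Newton's method as M = inverse Hessian of the loss) is one plausible reading of the abstract, and the quadratic performance function is a made-up example.

```python
import numpy as np

def fmb_step(theta, force, metric, bias=None, noise=None):
    """One update theta + M f + b + xi; bias and noise default to zero."""
    delta = metric @ force
    if bias is not None:
        delta = delta + bias
    if noise is not None:
        delta = delta + noise
    return theta + delta

# Toy performance: -0.5 * (theta - target)^T A (theta - target).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
target = np.array([1.0, -2.0])
theta0 = np.zeros(2)
force = -A @ (theta0 - target)          # gradient of performance at theta0

gd = fmb_step(theta0, force, 0.1 * np.eye(2))       # plain gradient step
newton = fmb_step(theta0, force, np.linalg.inv(A))  # inverse-curvature metric
print(gd, newton)
```

With the inverse-curvature metric the quadratic is solved in one step, while the scaled identity metric takes a small step in the direction of the force.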
Article 26
Title@2025-07-24 (4): Learning Gentle Grasping Using Vision, Sound, and Touch
Title: Learning Gentle Grasping Using Vision, Sound, and Touch | Sanftes Greifen lernen mit Vision, Sound und Touch | 利用愿景、声音和触摸进行轻巧的学习 2503.07926v2 |
Authors (2): Ken Nakahara, Roberto Calandra
In our daily life, we often encounter objects that are fragile and can be damaged by excessive grasping force, such as fruits. For these objects, it is paramount to grasp gently – not using the maximum amount of force possible, but rather the minimum amount of force necessary. This paper proposes using visual, tactile, and auditory signals to learn to grasp and regrasp objects stably and gently. Specifically, we use audio signals as an indicator of gentleness during the grasping, and then train an end-to-end action-conditional model from raw visuo-tactile inputs that predicts both the stability and the gentleness of future grasping candidates, thus allowing the selection and execution of the most promising action. Experimental results on a multi-fingered hand over 1,500 grasping trials demonstrated that our model is useful for gentle grasping by validating the predictive performance (3.27% higher accuracy than the vision-only variant) and providing interpretations of their behavior. Finally, real-world experiments confirmed that the grasping performance with the trained multi-modal model outperformed other baselines (17% higher rate for stable and gentle grasps than vision-only). Our approach requires neither tactile sensor calibration nor analytical force modeling, drastically reducing the engineering effort to grasp fragile objects. Dataset and videos are available at https://lasr.org/research/gentle-grasping.
在日常生活中,我们经常遇到易碎、可能因抓握力过大而受损的物体,例如水果。对于这些物体,最重要的是轻柔地抓握:不是使用尽可能大的力,而是使用必要的最小的力。本文提出使用视觉、触觉和听觉信号来学习稳定而轻柔地抓握和重新抓握物体。具体来说,我们使用音频信号作为抓握过程中轻柔程度的指标,然后根据原始视觉-触觉输入训练一个端到端的动作条件模型,预测未来候选抓握的稳定性和轻柔程度,从而可以选择并执行最有希望的动作。在多指手上进行的1,500次抓握试验的实验结果表明,我们的模型对轻柔抓握是有用的,这既体现在预测性能的验证上(准确率比仅视觉的变体高3.27%),也体现在对其行为的解释上。最后,真实世界的实验证实,训练后的多模态模型的抓握性能优于其他基线(稳定且轻柔抓握的比率比仅视觉模型高17%)。我们的方法既不需要触觉传感器校准,也不需要解析力建模,大大减少了抓握易碎物体所需的工程工作量。数据集和视频可在 https://lasr.org/research/gentle-grasping 获取。
Article 27
Title@2025-07-24 (4): Deep Variational Free Energy Calculation of Hydrogen Hugoniot
Title: Deep Variational Free Energy Calculation of Hydrogen Hugoniot | Tiefe variationelle Freie-Energie-Berechnung der Wasserstoff-Hugoniot-Kurve | 氢Hugoniot曲线的深度变分自由能计算 2507.18540v1 |
Authors (4): Zihang Li, Hao Xie, Xinyang Dong, Lei Wang
We develop a deep variational free energy framework to compute the equation of state of hydrogen in the warm dense matter region. This method parameterizes the variational density matrix of hydrogen nuclei and electrons at finite temperature using three deep generative models: a normalizing flow model that represents the Boltzmann distribution of the classical nuclei, an autoregressive transformer that models the distribution of electrons in excited states, and a permutational equivariant flow model that constructs backflow coordinates for electrons in Hartree-Fock orbitals. By jointly optimizing the three neural networks to minimize the variational free energy, we obtain the equation of state and related thermodynamic properties of dense hydrogen. We compare our results with other theoretical and experimental results on the deuterium Hugoniot curve, aiming to resolve existing discrepancies. The calculated results provide a valuable benchmark for deuterium in the warm dense matter region.
我们开发了一个深度变分自由能框架,用于计算温稠密物质区域内氢的状态方程。该方法使用三个深度生成模型对有限温度下氢原子核和电子的变分密度矩阵进行参数化:一个表示经典原子核玻尔兹曼分布的标准化流模型,一个对激发态电子分布建模的自回归Transformer,以及一个为Hartree-Fock轨道中的电子构造回流坐标的置换等变流模型。通过联合优化这三个神经网络以最小化变分自由能,我们得到了稠密氢的状态方程和相关热力学性质。我们将结果与氘Hugoniot曲线上的其他理论和实验结果进行比较,旨在解决现有的分歧。计算结果为温稠密物质区域内的氘提供了有价值的基准。
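The principle behind "minimize the variational free energy" can be checked on a three-state toy system: for any trial distribution p, F[p] = Σ p E + T Σ p ln p is an upper bound on the exact free energy −T ln Z, with equality at the Boltzmann distribution. The energies and temperature below are made up, units with k_B = 1 are assumed, and the paper's neural density matrix is of course far richer than this discrete example.

```python
import numpy as np

def variational_free_energy(probs, energies, temperature):
    """F[p] = <E>_p - T * S[p] for a discrete trial distribution p (k_B = 1)."""
    probs = np.asarray(probs, dtype=float)
    nz = probs > 0
    entropy = -np.sum(probs[nz] * np.log(probs[nz]))
    return float(probs @ energies - temperature * entropy)

energies = np.array([0.0, 1.0, 2.0])
T = 1.0

weights = np.exp(-energies / T)
exact_free_energy = float(-T * np.log(weights.sum()))
boltzmann = weights / weights.sum()
uniform = np.ones(3) / 3

f_boltzmann = variational_free_energy(boltzmann, energies, T)
f_uniform = variational_free_energy(uniform, energies, T)
print(exact_free_energy, f_boltzmann, f_uniform)
```

A trained generative model plays the role of `probs` here: the lower its variational free energy, the closer it is to the true thermal state.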
Article 28
Title@2025-07-24 (4): AI/ML Life Cycle Management for Interoperable AI Native RAN
Title: AI/ML Life Cycle Management for Interoperable AI Native RAN | AI/ML Life Cycle Management für interoperable KI Native RAN | 面向可互操作AI原生RAN的AI/ML生命周期管理 2507.18538v1 |
Authors (3): Chu-Hsiang Huang, Chao-Kai Wen, Geoffrey Ye Li
Artificial intelligence (AI) and machine learning (ML) models are rapidly permeating the 5G Radio Access Network (RAN), powering beam management, channel state information (CSI) feedback, positioning, and mobility prediction. However, without a standardized life-cycle management (LCM) framework, challenges, such as model drift, vendor lock-in, and limited transparency, hinder large-scale adoption. 3GPP Releases 16-20 progressively evolve AI/ML from experimental features to managed, interoperable network functions. Beginning with the Network Data Analytics Function (NWDAF) in Rel-16, subsequent releases introduced standardized interfaces for model transfer, execution, performance monitoring, and closed-loop control, culminating in Rel-20’s two-sided CSI-compression Work Item and vendor-agnostic LCM profile. This article reviews the resulting five-block LCM architecture, KPI-driven monitoring mechanisms, and inter-vendor collaboration schemes, while identifying open challenges in resource-efficient monitoring, environment drift detection, intelligent decision-making, and flexible model training. These developments lay the foundation for AI-native transceivers as a key enabler for 6G.
人工智能(AI)和机器学习(ML)模型正在迅速渗透5G无线电接入网络(RAN),授权波束管理、频道国家信息反馈、定位和流动预测,然而,如果没有标准化的生命周期管理框架,挑战,如模型漂移、供应商锁定和有限透明度,妨碍大规模采用。3GP 16-20号版本将AI/ML从实验特征逐步演变为管理、互操作网络功能。从Rel-16的网络数据分析功能(NWDAF)开始,随后的发布引入了用于模式转让、执行、性能监测和闭路控制的标准界面,最终形成雷尔-20号双面CSI压缩项目和供应商保密 LCM概况。本文章回顾了由此形成的5个街区LM结构、KPI驱动的监测机制和供应商间协作计划,同时确定了资源效率监测、环境漂移探测、智能决策以及灵活模式培训方面的公开挑战。这些动态为AI-Nation-National 中转器提供了基础。
Article 29
Title@2025-07-24 (4): External Knowledge Injection for CLIP-Based Class-Incremental Learning
Title: External Knowledge Injection for CLIP-Based Class-Incremental Learning | Externe Wissensinjektion für CLIP-basiertes Klassen-Inkrementelles Lernen | 为基于CLIP的高级类强化学习提供外部知识注射 2503.08510v2 |
Authors (6): Da-Wei Zhou, Kai-Wen Li, Jingyi Ning, Han-Jia Ye, Lijun Zhang, De-Chuan Zhan
Class-Incremental Learning (CIL) enables learning systems to continuously adapt to evolving data streams. With the advancement of pre-training, leveraging pre-trained vision-language models (e.g., CLIP) offers a promising starting point for CIL. However, CLIP makes decisions by matching visual embeddings to class names, overlooking the rich contextual information conveyed through language. For instance, the concept of "cat" can be decomposed into features like tail, fur, and face for recognition. Besides, since the model is continually updated, these detailed features are overwritten in CIL, requiring external knowledge for compensation. In this paper, we introduce ExterNal knowledGe INjEction (ENGINE) for CLIP-based CIL. To enhance knowledge transfer from outside the dataset, we propose a dual-branch injection tuning framework that encodes informative knowledge from both visual and textual modalities. The visual branch is enhanced with data augmentation to enrich the visual features, while the textual branch leverages GPT-4 to rewrite discriminative descriptors. In addition to this on-the-fly knowledge injection, we also implement post-tuning knowledge by re-ranking the prediction results during inference. With the injected knowledge, the model can better capture informative features for downstream tasks as data evolves. Extensive experiments demonstrate the state-of-the-art performance of ENGINE. Code is available at: https://github.com/LAMDA-CL/ICCV25-ENGINE
类增量学习(CIL)使学习系统能够持续适应不断变化的数据流。随着预训练的发展,利用预训练的视觉-语言模型(例如CLIP)为CIL提供了一个有前景的起点。然而,CLIP通过将视觉嵌入与类名匹配来做出决策,忽略了语言所传达的丰富上下文信息。例如,“猫”的概念可以分解为尾巴、皮毛和脸等特征以便识别。此外,由于模型不断更新,这些细节特征在CIL中会被覆盖,需要外部知识加以补偿。在本文中,我们为基于CLIP的CIL引入了外部知识注入方法ExterNal knowledGe INjEction(ENGINE)。为了加强来自数据集之外的知识迁移,我们提出了一个双分支注入调优框架,从视觉和文本两种模态编码信息性知识。视觉分支通过数据增强来丰富视觉特征,而文本分支利用GPT-4重写具有区分性的描述符。除了这种即时的知识注入之外,我们还在推理过程中对预测结果进行重新排序,以实现调优后的知识注入。借助注入的知识,模型可以随着数据的演变更好地捕捉对下游任务有用的特征。大量实验证明了ENGINE的最先进性能。代码可在 https://github.com/LAMDA-CL/ICCV25-ENGINE 获取。
Article 30
Title@2025-07-24 (4): Elucidating the Design Space of Arbitrary-Noise-Based Diffusion Models
Title: Elucidating the Design Space of Arbitrary-Noise-Based Diffusion Models | Erklärung des Design-Raums für willkürlich-lärmbasierte Diffusionsmodelle | 说明以任意噪音为基础的传播模型的设计空间 2507.18534v1 |
Authors (10): Xingyu Qiu, Mengying Yang, Xinghua Ma, Dong Liang, Yuzhen Li, Fanding Li, Gongning Luo, Wei Wang, Kuanquan Wang, Shuo Li
EDM elucidates the unified design space of diffusion models, yet its fixed noise pattern, restricted to pure Gaussian noise, limits advances in image restoration. Our study indicates that forcibly injecting Gaussian noise corrupts the degraded images, overextends the image transformation distance, and increases restoration complexity. To address this problem, we propose EDA, which Elucidates the Design space of Arbitrary-noise-based diffusion models. Theoretically, EDA expands the freedom of the noise pattern while preserving the original module flexibility of EDM, with rigorous proof that increased noise complexity incurs no additional computational overhead during restoration. EDA is validated on three typical tasks: MRI bias field correction (global smooth noise), CT metal artifact reduction (global sharp noise), and natural image shadow removal (local boundary-aware noise). With only 5 sampling steps, EDA outperforms most task-specific methods and achieves state-of-the-art performance in bias field correction and shadow removal.
EDM阐明了扩散模型的统一设计空间,但其噪声模式固定为纯高斯噪声,限制了图像复原方面的进展。我们的研究表明,强行注入高斯噪声会破坏已退化的图像,过度拉长图像变换距离,并增加复原的复杂性。为了解决这一问题,我们提出了EDA,用以阐明基于任意噪声的扩散模型的设计空间。在理论上,EDA扩大了噪声模式的自由度,同时保留了EDM原有的模块灵活性,并严格证明了噪声复杂度的增加不会在复原过程中带来额外的计算开销。EDA在三种典型任务上得到了验证:MRI偏置场校正(全局平滑噪声)、CT金属伪影去除(全局尖锐噪声)和自然图像阴影去除(局部边界感知噪声)。仅用5个采样步骤,EDA就优于大多数任务专用方法,并在偏置场校正和阴影去除方面达到了最先进的性能。
Article 31
Title@2025-07-24 (4): C2G-KD: PCA-Constrained Generator for Data-Free Knowledge Distillation
Title: C2G-KD: PCA-Constrained Generator for Data-Free Knowledge Distillation | C2G-KD: PCA-Constrained Generator für datenfreie Wissensdestillation | C2G-KD:用于无数据知识蒸馏的PCA约束生成器 2507.18533v1 |
Authors (2): Magnus Bengtsson, Kenneth Östberg
We introduce C2G-KD, a data-free knowledge distillation framework where a class-conditional generator is trained to produce synthetic samples guided by a frozen teacher model and geometric constraints derived from PCA. The generator never observes real training data but instead learns to activate the teacher’s output through a combination of semantic and structural losses. By constraining generated samples to lie within class-specific PCA subspaces estimated from as few as two real examples per class, we preserve topological consistency and diversity. Experiments on MNIST show that even minimal class structure is sufficient to bootstrap useful synthetic training pipelines.
我们引入了C2G-KD,一个无数据知识蒸馏框架,其中类条件生成器在冻结的教师模型和由PCA导出的几何约束的引导下,被训练来生成合成样本。生成器从不接触真实的训练数据,而是通过语义损失和结构损失的组合来学习激活教师模型的输出。通过把生成的样本约束在类特定的PCA子空间内(该子空间由每类少至两个真实样本估计得到),我们保持了拓扑一致性和多样性。在MNIST上的实验表明,即使是极少的类结构也足以引导出有用的合成训练流程。
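To illustrate "constraining generated samples to lie within class-specific PCA subspaces estimated from as few as two real examples", the sketch below fits a one-component PCA subspace from two toy vectors and measures how far a candidate sample lies from it. How the paper turns this distance into its structural loss is not stated in the abstract; the penalty function and all numbers here are assumptions.

```python
import numpy as np

def fit_pca_subspace(examples, n_components):
    """Return the class mean and the top principal directions (rows)."""
    mean = examples.mean(axis=0)
    _, _, vt = np.linalg.svd(examples - mean, full_matrices=False)
    return mean, vt[:n_components]

def project_to_subspace(sample, mean, components):
    """Orthogonal projection of `sample` onto the affine PCA subspace."""
    return mean + (sample - mean) @ components.T @ components

def off_subspace_penalty(sample, mean, components):
    """Squared distance from the subspace; zero for samples that lie in it."""
    diff = sample - project_to_subspace(sample, mean, components)
    return float(np.sum(diff ** 2))

# Two made-up "real" examples of one class, flattened to 3-d vectors.
examples = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 0.0]])
mean, components = fit_pca_subspace(examples, n_components=1)

generated = np.array([3.0, 1.0, 5.0])
penalty = off_subspace_penalty(generated, mean, components)
constrained = project_to_subspace(generated, mean, components)
print(penalty, constrained)
```

A generator could either be penalized with `off_subspace_penalty` or have its outputs projected directly; both keep samples near the class geometry without showing it more real data.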
Article 32
Title@2025-07-24 (4): Diffuse and Disperse: Image Generation with Representation Regularization
Title: Diffuse and Disperse: Image Generation with Representation Regularization | Diffuse und Disperse: Bildgenerierung mit Repräsentationsregularisierung | Diffuse & diffperse: 形象生成,有代表性的规范化 2506.09027v2 |
Authors (2): Runqian Wang, Kaiming He
The development of diffusion-based generative models over the past decade has largely proceeded independently of progress in representation learning. These diffusion models typically rely on regression-based objectives and generally lack explicit regularization. In this work, we propose Dispersive Loss, a simple plug-and-play regularizer that effectively improves diffusion-based generative models. Our loss function encourages internal representations to disperse in the hidden space, analogous to contrastive self-supervised learning, with the key distinction that it requires no positive sample pairs and therefore does not interfere with the sampling process used for regression. Compared to the recent method of representation alignment (REPA), our approach is self-contained and minimalist, requiring no pre-training, no additional parameters, and no external data. We evaluate Dispersive Loss on the ImageNet dataset across a range of models and report consistent improvements over widely used and strong baselines. We hope our work will help bridge the gap between generative modeling and representation learning.
在过去十年中,开发基于传播的基因模型的工作基本上与代表性学习的进展无关,这些扩散模型通常依赖基于回归的目标,而且一般缺乏明确的规范化。在这项工作中,我们提议了\ textit{ 分散性损失},这是一个简单的插件和功能常规化器,可以有效地改进基于传播的基因模型。我们的损失功能鼓励内部的表述在隐蔽空间分散,类似于对比式的自我监督学习,其关键区别是它不需要正面的样本配对,因此不干扰用于回归的抽样过程。与最近的代表性调整方法(REPA)相比,我们的方法是自成一体的和最低限度的,不需要培训前的、额外的参数和外部数据。我们评估一系列模型中图像网络数据集的分散性损失,并报告广泛使用和强大的基线的一致改进。我们希望我们的工作将有助于缩小基因化模型和代表性学习之间的差距。
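One way to write a regularizer that "encourages internal representations to disperse" without positive pairs is an InfoNCE-style repulsion term: the log of the batch-average of exp(−‖z_i − z_j‖² / τ). The sketch below uses that form as an assumption; the abstract does not give the formula, and the paper may use a different distance or variant.

```python
import numpy as np

def dispersive_loss(z, tau=1.0):
    """log mean_{i,j} exp(-||z_i - z_j||^2 / tau) over a batch of representations."""
    sq_dist = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    logits = -sq_dist / tau
    m = logits.max()                               # stabilise the log-mean-exp
    return float(m + np.log(np.exp(logits - m).mean()))

collapsed = np.zeros((4, 2))                       # all representations identical
spread = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0]])

loss_collapsed = dispersive_loss(collapsed)
loss_spread = dispersive_loss(spread)
print(loss_collapsed, loss_spread)
```

Minimizing this term pushes batch representations apart; in training it would be added, with a small weight, to the usual regression objective of the diffusion model.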
Article 33
Title@2025-07-24 (4): Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench
Title: Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench | Sind KI-erzeugte Fixes sicher? LLM und Agent Patches auf der SWE-Bench analysieren | AI生成的修复安全吗?分析SWE-bench上的LLM和智能体补丁 2507.02976v2 |
Authors (3): Amirali Sajadi, Kostadin Damevski, Preetha Chatterjee
Large Language Models (LLMs) and their agentic frameworks are increasingly adopted to automate software development tasks such as issue resolution and program repair. While prior work has identified security risks in LLM-generated code, most evaluations have focused on synthetic or isolated settings, leaving open questions about the security of these systems in real-world development contexts. In this study, we present the first large-scale security analysis of LLM-generated patches using 20,000+ issues from the SWE-bench dataset. We evaluate patches produced by a standalone LLM (Llama 3.3) and compare them to developer-written patches. We also assess the security of patches generated by three top-performing agentic frameworks (OpenHands, AutoCodeRover, HoneyComb) on a subset of our data. Finally, we analyze a wide range of code, issue, and project-level factors to understand the conditions under which LLMs and agents are most likely to generate insecure code. Our findings reveal that the standalone LLM introduces nearly 9x more new vulnerabilities than developers, with many of these exhibiting unique patterns not found in developers’ code. Agentic workflows also generate a significant number of vulnerabilities, particularly when granting LLMs more autonomy, potentially increasing the likelihood of misinterpreting project context or task requirements. We find that vulnerabilities are more likely to occur in LLM patches associated with a higher number of files, more lines of generated code, and GitHub issues that lack specific code snippets or information about the expected code behavior and steps to reproduce. These results suggest that contextual factors play a critical role in the security of the generated code and point toward the need for proactive risk assessment methods that account for both code and issue-level information to complement existing vulnerability detection tools.
大型语言模型(LLM)及其智能体框架越来越多地被用于自动化软件开发任务,如问题修复和程序修补。虽然先前的工作已经发现LLM生成代码中的安全风险,但大多数评估侧重于合成或孤立的场景,这些系统在真实开发环境中的安全性仍是悬而未决的问题。在这项研究中,我们使用SWE-bench数据集中的20,000多个问题,首次对LLM生成的补丁进行了大规模安全分析。我们评估了独立LLM(Llama 3.3)生成的补丁,并将其与开发者编写的补丁进行比较。我们还在数据的一个子集上评估了三个表现最好的智能体框架(OpenHands、AutoCodeRover、HoneyComb)所生成补丁的安全性。最后,我们分析了广泛的代码、问题和项目层面的因素,以了解LLM和智能体在哪些条件下最有可能生成不安全的代码。我们的研究结果显示,独立LLM引入的新漏洞数量几乎是开发者的9倍,其中许多漏洞呈现出开发者代码中未见的独特模式。智能体工作流也会产生大量漏洞,特别是在赋予LLM更多自主权时,这可能增加误解项目上下文或任务要求的可能性。我们发现,漏洞更可能出现在涉及文件更多、生成代码行数更多的LLM补丁中,以及缺少具体代码片段或缺少预期代码行为和复现步骤信息的GitHub问题中。这些结果表明,上下文因素对生成代码的安全性起着关键作用,并指出需要同时考虑代码和问题层面信息的主动风险评估方法,以补充现有的漏洞检测工具。
Article 34
Title@2025-07-24 (4): The Moral Gap of Large Language Models
Title: The Moral Gap of Large Language Models | Die moralische Kluft großer Sprachmodelle | 大语言模式的道德差距 2507.18523v1 |
Authors (2): Maciej Skorski, Alina Landowska
Moral foundation detection is crucial for analyzing social discourse and developing ethically-aligned AI systems. While large language models excel across diverse tasks, their performance on specialized moral reasoning remains unclear. This study provides the first comprehensive comparison between state-of-the-art LLMs and fine-tuned transformers across Twitter and Reddit datasets using ROC, PR, and DET curve analysis. Results reveal substantial performance gaps, with LLMs exhibiting high false negative rates and systematic under-detection of moral content despite prompt engineering efforts. These findings demonstrate that task-specific fine-tuning remains superior to prompting for moral reasoning applications.
检测道德基础对于分析社会话语和发展符合道德要求的人工智能系统至关重要。虽然大型语言模型在各种任务中都非常出色,但它们在专门道德推理方面的表现仍然不明确。本研究报告首次全面比较了利用ROC、PR和DET曲线分析的Twitter和Reddit数据集中最先进的LMS和经精细调整的变压器。结果显示,业绩差距很大,尽管迅速进行了工程工作,但LOMS表现出很高的假负率和对道德内容的系统检测不足。这些调查结果表明,具体任务的微调仍然优于道德推理应用。
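Since the comparison rests on ROC, PR, and DET curves and highlights high false negative rates, a minimal sketch of how DET points are computed may help: each threshold on the classifier score yields one (false positive rate, false negative rate) pair. The scores and labels below are made up and do not come from the study.

```python
import numpy as np

def det_points(scores, labels):
    """Return (threshold, FPR, FNR) for every distinct score used as a threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    points = []
    for t in np.unique(scores):
        pred = scores >= t
        fnr = float(np.mean(~pred[labels]))    # positives that were missed
        fpr = float(np.mean(pred[~labels]))    # negatives that were flagged
        points.append((float(t), fpr, fnr))
    return points

# Toy detector that under-detects: two of three positives score below 0.5.
scores = [0.9, 0.4, 0.35, 0.2, 0.1, 0.6]
labels = [1, 1, 1, 0, 0, 0]

points = det_points(scores, labels)
at_06 = [p for p in points if p[0] == 0.6][0]
print(at_06)
```

Plotting FNR against FPR over all thresholds gives the DET curve; a model with systematic under-detection stays high on the FNR axis unless the threshold is lowered far enough to inflate false positives.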
Article 35
Title@2025-07-24 (4): Optimal Transport Regularized Divergences: Application to Adversarial Robustness
Title: Optimal Transport Regularized Divergences: Application to Adversarial Robustness | Optimaler Transport Regularisierte Divergenzen: Anwendung auf widrige Robustheit | 优化运输 常规化差异:适用于逆向强力 2309.03791v3 |
Authors (2): Jeremiah Birrell, Reza Ebrahimi
We introduce a new class of optimal-transport-regularized divergences, $D^c$, constructed via an infimal convolution between an information divergence, $D$, and an optimal-transport (OT) cost, $C$, and study their use in distributionally robust optimization (DRO). In particular, we propose the $ARMOR_D$ methods as novel approaches to enhancing the adversarial robustness of deep learning models. These DRO-based methods are defined by minimizing the maximum expected loss over a $D^c$-neighborhood of the empirical distribution of the training data. Viewed as a tool for constructing adversarial samples, our method allows samples to be both transported, according to the OT cost, and re-weighted, according to the information divergence; the addition of a principled and dynamical adversarial re-weighting on top of adversarial sample transport is a key innovation of $ARMOR_D$. $ARMOR_D$ can be viewed as a generalization of the best-performing loss functions and OT costs in the adversarial training literature; we demonstrate this flexibility by using $ARMOR_D$ to augment the UDR, TRADES, and MART methods and obtain improved performance on CIFAR-10 and CIFAR-100 image recognition. Specifically, augmenting with $ARMOR_D$ leads to 1.9\% and 2.1\% improvement against AutoAttack, a powerful ensemble of adversarial attacks, on CIFAR-10 and CIFAR-100 respectively. To foster reproducibility, we made the code accessible at https://github.com/star-ailab/ARMOR.
我们提出一类新的最优传输正则化散度 $D^c$,它由信息散度 $D$ 与最优传输(OT)代价 $C$ 的下确界卷积构造而成,并研究其在分布鲁棒优化(DRO)中的应用。具体而言,我们提出 $ARMOR_D$ 方法,作为增强深度学习模型对抗鲁棒性的新途径。这些基于 DRO 的方法通过最小化训练数据经验分布的 $D^c$ 邻域上的最大期望损失来定义。作为构造对抗样本的工具,我们的方法既允许样本按 OT 代价被传输,也允许按信息散度被重新加权;在对抗样本传输之上加入有原则的、动态的对抗性重加权,是 $ARMOR_D$ 的关键创新。$ARMOR_D$ 可视为对抗训练文献中表现最佳的损失函数和 OT 代价的推广;我们用 $ARMOR_D$ 增强 UDR、TRADES 和 MART 方法,并在 CIFAR-10 和 CIFAR-100 图像识别上取得了更好的性能,以此展示这种灵活性。具体来说,使用 $ARMOR_D$ 增强后,在 CIFAR-10 和 CIFAR-100 上对抗 AutoAttack(一种强大的对抗攻击集成)的表现分别提升了 1.9% 和 2.1%。为促进可复现性,我们已将代码公开于 https://github.com/star-ailab/ARMOR。
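For orientation, the construction described in the abstract of Article 35 can be written out as below. This is a sketch under assumptions: the argument order of $D^c$, the intermediate distribution $\eta$, the empirical distribution $P_n$, the loss $\mathcal{L}_\theta$, and the neighborhood radius $\epsilon$ are notation chosen here for illustration and are not given in the abstract.

```latex
% Infimal convolution of an information divergence D and an OT cost C
% (eta is an intermediate distribution: re-weighted relative to mu, then transported to nu)
D^c(\nu \,\|\, \mu) \;=\; \inf_{\eta} \bigl\{ D(\eta \,\|\, \mu) + C(\eta, \nu) \bigr\}

% DRO objective: minimize the worst-case expected loss over a
% D^c-neighborhood of the empirical distribution P_n
\min_{\theta} \;\; \sup_{Q \,:\, D^c(Q \,\|\, P_n) \le \epsilon} \;\; \mathbb{E}_{Q}\bigl[ \mathcal{L}_\theta \bigr]
```

Read this way, the inner supremum may both move sample locations (paid for through $C$) and change sample weights (paid for through $D$), which matches the "transported and re-weighted" description in the abstract.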
Article 36
Title@2025-07-24 (4): GCC-Spam: Spam Detection via GAN, Contrastive Learning, and Character Similarity Networks
Title: GCC-Spam: Spam Detection via GAN, Contrastive Learning, and Character Similarity Networks | GCC-Spam: Spam-Erkennung über GAN, Kontrastives Lernen und Charaktergleichheitsnetzwerke | 海合会-Spam:通过全球大气监测网、反竞争学习和特征相似网络探测垃圾邮件 2507.14679v2 |
Authors (3): Zhijie Wang, Zixin Xu, Zhiyuan Pan
The exponential growth of spam text on the Internet necessitates robust detection mechanisms to mitigate risks such as information leakage and social instability. This work addresses two principal challenges: adversarial strategies employed by spammers and the scarcity of labeled data. We propose a novel spam-text detection framework GCC-Spam, which integrates three core innovations. First, a character similarity network captures orthographic and phonetic features to counter character-obfuscation attacks and furthermore produces sentence embeddings for downstream classification. Second, contrastive learning enhances discriminability by optimizing the latent-space distance between spam and normal texts. Third, a Generative Adversarial Network (GAN) generates realistic pseudo-spam samples to alleviate data scarcity while improving model robustness and classification accuracy. Extensive experiments on real-world datasets demonstrate that our model outperforms baseline approaches, achieving higher detection rates with significantly fewer labeled examples.
互联网上垃圾邮件文本的指数增长要求建立强有力的检测机制,以减少信息泄漏和社会不稳定等风险。这项工作解决了两个主要挑战:垃圾邮件使用的对抗策略和标签数据稀缺。我们提出了一个新的垃圾邮件文本检测框架(GCC-Spam),其中结合了三个核心创新。首先,特征相似网络捕捉了拼写和语音特征,以对抗字符模糊攻击,并进一步为下游分类提供了句子嵌入。第二,对比学习通过优化垃圾邮件与正常文本之间的潜空距离,增加了差异性。第三,基因自动网络(GAN)生成现实的假垃圾样本,以减轻数据稀缺,同时提高模型的稳健性和分类准确性。关于现实世界数据集的广泛实验表明,我们的模型比基线方法更符合基准要求,通过少几个标签实例实现更高的检测率。
Article 37
Title@2025-07-24 (4): Robust sensitivity control in digital pathology via tile score distribution matching
Title: Robust sensitivity control in digital pathology via tile score distribution matching | Robuste Sensitivitätskontrolle in der digitalen Pathologie über Kacheln-Score-Verteilungsabgleich | 通过瓷砖计分分布匹配对数字病理学中的强力敏感度控制 2502.20144v3 |
Authors (4): Arthur Pignet, John Klein, Genevieve Robin, Antoine Olivier
Deploying digital pathology models across medical centers is challenging due to distribution shifts. Recent advances in domain generalization improve model transferability in terms of aggregated performance measured by the Area Under Curve (AUC). However, clinical regulations often require controlling the transferability of other metrics, such as prescribed sensitivity levels. We introduce a novel approach to control the sensitivity of whole slide image (WSI) classification models, based on optimal transport and Multiple Instance Learning (MIL). Validated across multiple cohorts and tasks, our method enables robust sensitivity control with only a handful of calibration samples, providing a practical solution for reliable deployment of computational pathology systems.
由于分布变化,在各医疗中心部署数字病理学模型具有挑战性,最近领域一般化的进展提高了模型在 “ 曲线下区域 “ (AUC)所测量的总体性能的可转让性,然而,临床条例往往要求控制其他指标的可转让性,例如规定的敏感度水平。我们采用了一种新办法,根据最佳交通和多例学习(MIL)控制整个幻灯片图像分类模型的灵敏性。经过对多个组群和任务进行验证,我们的方法只提供少量校准样本,从而能够实现稳健的灵敏控制,为可靠部署计算病理系统提供了切实可行的解决办法。
Article 38
Title@2025-07-24 (4): GLANCE: Graph Logic Attention Network with Cluster Enhancement for Heterophilous Graph Representation Learning
Title: GLANCE: Graph Logic Attention Network with Cluster Enhancement for Heterophilous Graph Representation Learning | GLANCE: Graph Logic Attention Network mit Cluster Enhancement für heterophiles Graph Representation Learning | 图表逻辑关注网络,通过群集增强混合图示代表性学习 2507.18521v1 |
Authors (5): Zhongtian Sun, Anoushka Harit, Alexandra Cristea, Christl A. Donnelly, Pietro Liò
Graph Neural Networks (GNNs) have demonstrated significant success in learning from graph-structured data but often struggle on heterophilous graphs, where connected nodes differ in features or class labels. This limitation arises from indiscriminate neighbor aggregation and insufficient incorporation of higher-order structural patterns. To address these challenges, we propose GLANCE (Graph Logic Attention Network with Cluster Enhancement), a novel framework that integrates logic-guided reasoning, dynamic graph refinement, and adaptive clustering to enhance graph representation learning. GLANCE combines a logic layer for interpretable and structured embeddings, multi-head attention-based edge pruning for denoising graph structures, and clustering mechanisms for capturing global patterns. Experimental results in benchmark datasets, including Cornell, Texas, and Wisconsin, demonstrate that GLANCE achieves competitive performance, offering robust and interpretable solutions for heterophilous graph scenarios. The proposed framework is lightweight, adaptable, and uniquely suited to the challenges of heterophilous graphs.
图表神经网络(GNNs)在从图表结构数据中学习方面表现出了巨大的成功,但往往在混合的图表上挣扎,因为相连接的节点在特征或类标签上各不相同。这一限制产生于相邻的任意聚合和高阶结构模式的整合不足。为了应对这些挑战,我们建议GLance(Graph逻辑关注网络与集群增强相结合),这是一个新颖的框架,将逻辑引导推理、动态图形完善和适应性集群结合起来,以加强图表代表性学习。GLance将可解释和结构化嵌入的逻辑层、对解密的图形结构的多头注意边缘划线和捕捉全球模式的集群机制结合起来。在包括康奈尔、得克萨斯和威斯康斯康斯在内的基准数据集中的实验结果表明,GLance取得了竞争性的绩效,为异性图表情景提供了强有力和可解释的解决方案。拟议的框架是轻量、适应性和独特的,适合超重的图表的挑战。
Article 39
Title@2025-07-24 (4): Euclidean Distance Deflation Under High-Dimensional Heteroskedastic Noise
Title: Euclidean Distance Deflation Under High-Dimensional Heteroskedastic Noise | Euklidische Distanz Deflation unter hochdimensionalen heteroskedastischen Geräuschen | 高多变性热电传噪声下的远距离通缩 2507.18520v1 |
Authors (3): Keyi Li, Yuval Kluger, Boris Landa
Pairwise Euclidean distance calculation is a fundamental step in many machine learning and data analysis algorithms. In real-world applications, however, these distances are frequently distorted by heteroskedastic noise, a prevalent form of inhomogeneous corruption characterized by variable noise magnitudes across data observations. Such noise inflates the computed distances in a nontrivial way, leading to misrepresentations of the underlying data geometry. In this work, we address the tasks of estimating the noise magnitudes per observation and correcting the pairwise Euclidean distances under heteroskedastic noise. Perhaps surprisingly, we show that in general high-dimensional settings and without assuming prior knowledge on the clean data structure or noise distribution, both tasks can be performed reliably, even when the noise levels vary considerably. Specifically, we develop a principled, hyperparameter-free approach that jointly estimates the noise magnitudes and corrects the distances. We provide theoretical guarantees for our approach, establishing probabilistic bounds on the estimation errors of both noise magnitudes and distances. These bounds, measured in the normalized $\ell_1$ norm, converge to zero at polynomial rates as both feature dimension and dataset size increase. Experiments on synthetic datasets demonstrate that our method accurately estimates distances in challenging regimes, significantly improving the robustness of subsequent distance-based computations. Notably, when applied to single-cell RNA sequencing data, our method yields noise magnitude estimates consistent with an established prototypical model, enabling accurate nearest neighbor identification that is fundamental to many downstream analyses.
成对欧氏距离的计算是许多机器学习和数据分析算法中的基本步骤。然而在实际应用中,这些距离常常被异方差噪声所扭曲,这是一种普遍存在的非均匀污染,其特点是不同观测的噪声幅度各不相同。这种噪声会以非平凡的方式放大计算出的距离,从而歪曲底层数据的几何结构。在本工作中,我们研究在异方差噪声下估计每个观测的噪声幅度并校正成对欧氏距离这两项任务。或许令人惊讶的是,我们证明在一般的高维情形下,无需假设对干净数据结构或噪声分布的先验知识,即使噪声水平差异很大,这两项任务也都可以可靠地完成。具体而言,我们提出一种有原则的、无超参数的方法,联合估计噪声幅度并校正距离。我们为该方法提供了理论保证,给出了噪声幅度和距离估计误差的概率界。这些界以归一化 $\ell_1$ 范数度量,随着特征维数和数据集规模的增大以多项式速率收敛到零。在合成数据集上的实验表明,我们的方法能在具有挑战性的情形下准确估计距离,显著提高后续基于距离的计算的鲁棒性。值得注意的是,应用于单细胞 RNA 测序数据时,我们的方法得到的噪声幅度估计与一个公认的原型模型一致,从而实现了对许多下游分析至关重要的准确近邻识别。
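The inflation effect described in Article 39 can be illustrated numerically. For independent zero-mean noise, the expected squared distance between two noisy points equals the clean squared distance plus the two noise magnitudes, so subtracting them "deflates" the distance. The sketch below is not the paper's estimator: it assumes the per-observation noise magnitudes `r` are known (the paper estimates them from data), and all names and the synthetic setup are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(D, 0.0)

def deflate(D_noisy, r):
    """Subtract r_i + r_j from each off-diagonal entry.

    r[i] stands for E||noise_i||^2; it is ASSUMED KNOWN here for illustration.
    """
    D = D_noisy - (r[:, None] + r[None, :])
    np.fill_diagonal(D, 0.0)
    return D

rng = np.random.default_rng(0)
n, d = 50, 5000
X = rng.normal(size=(n, d)) / np.sqrt(d)             # clean points with norm close to 1
sigma = rng.uniform(0.5, 2.0, size=n) / np.sqrt(d)   # per-observation noise std (heteroskedastic)
Y = X + rng.normal(size=(n, d)) * sigma[:, None]     # noisy observations
r = d * sigma**2                                     # true noise magnitudes E||noise_i||^2

D_clean = pairwise_sq_dists(X)
D_noisy = pairwise_sq_dists(Y)
D_fixed = deflate(D_noisy, r)

err_noisy = np.abs(D_noisy - D_clean).mean()
err_fixed = np.abs(D_fixed - D_clean).mean()
print(f"mean abs error before: {err_noisy:.3f}, after: {err_fixed:.3f}")
```

In high dimension the residual error after deflation comes only from fluctuations around the expectation, which is why the correction works well when the feature dimension is large.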
Article 40
Title@2025-07-24 (4): Revisiting Bisimulation Metric for Robust Representations in Reinforcement Learning
Title: Revisiting Bisimulation Metric for Robust Representations in Reinforcement Learning | Revisiting Bisimulation Metric für robuste Darstellungen in Verstärkungs-Lernen | 重新研究强化学习中强力代表制的模拟比照模型 2507.18519v1 |
Authors (4): Leiji Zhang, Zeyu Wang, Xin Li, Yao-Hui Li
Bisimulation metric has long been regarded as an effective control-related representation learning technique in various reinforcement learning tasks. However, in this paper, we identify two main issues with the conventional bisimulation metric: 1) an inability to represent certain distinctive scenarios, and 2) a reliance on predefined weights for differences in rewards and subsequent states during recursive updates. We find that the first issue arises from an imprecise definition of the reward gap, whereas the second issue stems from overlooking the varying importance of reward difference and next-state distinctions across different training stages and task settings. To address these issues, by introducing a measure for state-action pairs, we propose a revised bisimulation metric that features a more precise definition of reward gap and novel update operators with adaptive coefficient. We also offer theoretical guarantees of convergence for our proposed metric and its improved representation distinctiveness. In addition to our rigorous theoretical analysis, we conduct extensive experiments on two representative benchmarks, DeepMind Control and Meta-World, demonstrating the effectiveness of our approach.
长期以来,在各种强化学习任务中,生物模拟衡量标准一直被视为一种有效的控制相关代表性学习技术。然而,在本文件中,我们确定了常规强化衡量标准中的两个主要问题:(1) 无法代表某些独特的情景;(2) 在循环更新过程中,依赖对奖赏和随后状态差异的预先界定的权重;我们发现,第一个问题产生于对奖赏差距的不精确定义,而第二个问题则产生于对不同培训阶段和任务环境之间奖赏差异和次州区别的不同重要性的忽视。为了解决这些问题,我们为国家行动对口引入了一项措施,我们提出了一个订正的优惠衡量标准,该标准对奖赏差距作了更精确的定义,并对适应系数的操作者作了新的更新。我们还从理论上保证我们提议的衡量标准及其更好的代表性特点趋于一致。除了我们严格的理论分析外,我们还对两个具有代表性的基准,即DeepMind控制和Meta-World进行了广泛的实验,以证明我们的方法的有效性。
Article 41
Title@2025-07-24 (4): Visual Adaptive Prompting for Compositional Zero-Shot Learning
Title: Visual Adaptive Prompting for Compositional Zero-Shot Learning | Visuelle Adaptive Prompting für kompositorisches Zero-Shot-Lernen | 零热学习的视觉适应性促进 2502.20292v6 |
Authors (4): Kyle Stein, Arash Mahyari, Guillermo Francia, Eman El-Sheikh
Vision-Language Models (VLMs) have demonstrated impressive multimodal capabilities in learning joint representations of visual and textual data, making them powerful tools for tasks such as Compositional Zero-Shot Learning (CZSL). CZSL requires models to generalize to novel combinations of visual primitives, such as attributes and objects, that were not explicitly encountered during training. Recent works in prompting for CZSL have focused on modifying inputs for the text encoder, often using static prompts that do not change across varying visual contexts. However, these approaches struggle to fully capture varying visual contexts, as they focus on text adaptation rather than leveraging visual features for compositional reasoning. To address this, we propose a Visual Adaptive Prompting System (VAPS) that leverages a learnable visual prompt repository and similarity-based retrieval mechanism within the framework of VLMs to bridge the gap between semantic and visual features. Our method introduces a dynamic visual prompt repository mechanism that selects the most relevant attribute and object prompts based on the visual features of the image. Our proposed system includes a visual prompt adapter that encourages the model to learn a more generalizable embedding space. Experiments on three CZSL benchmarks, across both closed and open-world scenarios, demonstrate state-of-the-art results.
视觉-语言模型(VLMS)展示了在学习视觉和文字数据联合表述方面令人印象深刻的多式联运能力,使这些模型成为了用于诸如成文零热学习(CZSL)等任务的强大工具。 CZSL要求模型能够概括成视觉原始(如属性和对象)的新组合,这些在培训过程中没有被明确遇到。最近为CZSL提供的促动工作侧重于修改文本编码器的投入,通常使用静态的提示器,而不会在不同视觉环境中发生变化。然而,这些方法努力充分捕捉不同的视觉环境,因为它们侧重于文字适应,而不是利用视觉特征进行构思。为此,我们提议了一个视觉适应激励系统(VAPS),在VLMS框架内利用可学的视觉快速存储器和类似基于功能的检索机制,以弥合语义和视觉特征之间的差距。我们的方法引入了一个动态的视觉快速存储器,根据图像的视觉特征选择最相关的属性和对象提示器。我们提议的系统包括一个直观的适应器,即即快速调整器,鼓励将视觉适应特性作为空间-SL模型的模型,以学习更加可扩展的三星级的模型。
Article 42
Title@2025-07-24 (4): A Transfer Learning-Based Method for Water Body Segmentation in Remote Sensing Imagery: A Case Study of the Zhada Tulin Area
Title: A Transfer Learning-Based Method for Water Body Segmentation in Remote Sensing Imagery: A Case Study of the Zhada Tulin Area | Eine Transfer-Lernmethode für die Segmentierung von Wasserkörpern in Fernerkundungsbildern: Eine Fallstudie des Zhada-Tulin-Gebiets | 遥感图像中水体分离的转让学习方法:Zhada Tulin地区的案例研究 2507.10084v2 |
Authors (2): Haonan Chen, Xin Tong
The Tibetan Plateau, known as the Asian Water Tower, faces significant water security challenges due to its high sensitivity to climate change. Advancing Earth observation for sustainable water monitoring is thus essential for building climate resilience in this region. This study proposes a two-stage transfer learning strategy using the SegFormer model to overcome domain shift and data scarcity, key barriers in developing robust AI for climate-sensitive applications. After pre-training on a diverse source domain, our model was fine-tuned for the arid Zhada Tulin area. Experimental results show a substantial performance boost: the Intersection over Union (IoU) for water body segmentation surged from 25.50% (direct transfer) to 64.84%. This AI-driven accuracy is crucial for disaster risk reduction, particularly in monitoring flash flood-prone systems. More importantly, the high-precision map reveals a highly concentrated spatial distribution of water, with over 80% of the water area confined to less than 20% of the river channel length. This quantitative finding provides crucial evidence for understanding hydrological processes and designing targeted water management and climate adaptation strategies. Our work thus demonstrates an effective technical solution for monitoring arid plateau regions and contributes to advancing AI-powered Earth observation for disaster preparedness in critical transboundary river headwaters.
西藏高原(称为亚洲水塔)因其对气候变化的高度敏感性而面临巨大的水安全挑战。因此,推进地球观测以促进可持续水监测对于建设该地区气候抗御能力至关重要。本研究报告提出使用SegFormer模型的两阶段转移学习战略,以克服在开发强大的气候敏感应用AI时的域转移和数据尖锐障碍。在对不同源域进行预先培训后,我们的模型对干旱的Zhada Tulin地区进行了精细调整。实验结果显示一个显著的性能提升:水体分解的跨联盟(IoU)从25.50%(直接转移)上升到64.84%。这一由AI驱动的精确性对于减少灾害风险至关重要,特别是在监测易发洪水系统方面。更重要的是,高精确度地图显示水的空间分布高度集中,80%以上的水区被限制在河道长度的20%以下。这一定量调查为了解水文进程和设计有针对性的水资源管理和适应气候变化战略提供了重要证据。因此,我们的工作展示了监测旱地高原观测的有效技术解决方案。
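The Intersection over Union (IoU) metric reported in Article 42 is the ratio of overlapping water pixels to the pixels marked as water in either mask. A minimal sketch for binary masks follows; the function name, the toy masks, and the convention for two empty masks are choices made here, not details from the paper.

```python
import numpy as np

def iou(pred_mask, true_mask):
    """Intersection over Union for two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(np.logical_and(pred, true).sum() / union)

# Toy 2x3 masks: 1 = water, 0 = background
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
true = np.array([[1, 0, 0],
                 [0, 1, 1]])
score = iou(pred, true)  # 2 overlapping pixels out of 4 in the union
print(score)
```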
Article 43
Title@2025-07-24 (4): Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems
Title: Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems | Sublinearer Bedauern für eine Klasse von linear-Quadratischen Lernproblemen | 连续时线性强化学习问题分类的子线性遗憾 2407.17226v6 |
Authors (3): Yilie Huang, Yanwei Jia, Xun Yu Zhou
We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems for diffusions, where states are scalar-valued and running control rewards are absent but volatilities of the state processes depend on both state and control variables. We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an RL algorithm to learn the optimal policy parameter directly. Our main contributions include the introduction of an exploration schedule and a regret analysis of the proposed algorithm. We provide the convergence rate of the policy parameter to the optimal one, and prove that the algorithm achieves a regret bound of $O(N^{\frac{3}{4}})$ up to a logarithmic factor, where $N$ is the number of learning episodes. We conduct a simulation study to validate the theoretical results and demonstrate the effectiveness and reliability of the proposed algorithm. We also perform numerical comparisons between our method and those of the recent model-based stochastic LQ RL studies adapted to the state- and control-dependent volatility setting, demonstrating a better performance of the former in terms of regret bounds.
我们研究的是一组连续时间线性赤道(LQ)控制扩散问题的强化学习(RL)问题,在这个类别中,国家是计算价值和运行控制奖励,但国家过程的波动取决于状态变量和控制变量。我们采用不依赖模型参数知识或不依赖其估计的模型方法,并设计了一种RL算法,直接学习最佳政策参数。我们的主要贡献包括采用探索时间表和对拟议算法的遗憾分析。我们为最佳的参数提供了政策参数的趋同率,并证明该算法达到了美元(Nfrac{344)至对数系数的遗憾度,而美元是学习事件的数量。我们进行了模拟研究,以验证理论结果,并展示拟议算法的有效性和可靠性。我们还对我们的方法和最近基于模型的LQRL研究方法进行了数字比较,这些方法与依赖状态和控制的波动设置相适应,显示了前一个数值的更好表现。
Article 44
Title@2025-07-24 (4): Masked Autoencoders that Feel the Heart: Unveiling Simplicity Bias for ECG Analyses
Title: Masked Autoencoders that Feel the Heart: Unveiling Simplicity Bias for ECG Analyses | Masked Autoencoder, die das Herz fühlen: Enthüllen Einfachheit Bias für EKG-Analysen | 感觉心脏的蒙面自动代码器:用于ECG分析的“永存的简单比” 2506.22495v2 |
Authors (5): He-Yang Xu, Hongxiang Gao, Yuwen Li, Xiu-Shen Wei, Chengyu Liu
The diagnostic value of electrocardiogram (ECG) lies in its dynamic characteristics, ranging from rhythm fluctuations to subtle waveform deformations that evolve across time and frequency domains. However, supervised ECG models tend to overfit dominant and repetitive patterns, overlooking fine-grained but clinically critical cues, a phenomenon known as Simplicity Bias (SB), where models favor easily learnable signals over subtle but informative ones. In this work, we first empirically demonstrate the presence of SB in ECG analyses and its negative impact on diagnostic performance, while simultaneously discovering that self-supervised learning (SSL) can alleviate it, providing a promising direction for tackling the bias. Following the SSL paradigm, we propose a novel method comprising two key components: 1) Temporal-Frequency aware Filters to capture temporal-frequency features reflecting the dynamic characteristics of ECG signals, and 2) building on this, Multi-Grained Prototype Reconstruction for coarse and fine representation learning across dual domains, further mitigating SB. To advance SSL in ECG analyses, we curate a large-scale multi-site ECG dataset with 1.53 million recordings from over 300 clinical centers. Experiments on three downstream tasks across six ECG datasets demonstrate that our method effectively reduces SB and achieves state-of-the-art performance. Code and dataset will be released publicly.
心电图(ECG)的诊断价值在于其动态特性,从节律波动到在时域和频域中演变的细微波形变形。然而,有监督的 ECG 模型往往过度拟合占主导地位的重复性模式,而忽略细粒度但具有临床关键意义的线索,这一现象称为简单性偏差(Simplicity Bias, SB),即模型偏好易于学习的信号而非细微但信息丰富的信号。在本工作中,我们首先通过实验证明 ECG 分析中存在 SB 及其对诊断性能的负面影响,同时发现自监督学习(SSL)可以缓解这一问题,为应对该偏差提供了有前景的方向。遵循 SSL 范式,我们提出一种包含两个关键组件的新方法:1)时频感知滤波器(Temporal-Frequency aware Filters),用于捕捉反映 ECG 信号动态特性的时频特征;2)在此基础上的多粒度原型重建(Multi-Grained Prototype Reconstruction),用于跨两个域进行粗粒度和细粒度表示学习,进一步缓解 SB。为推动 SSL 在 ECG 分析中的发展,我们整理了一个大规模多中心 ECG 数据集,包含来自 300 多个临床中心的 153 万条记录。在六个 ECG 数据集上的三项下游任务实验表明,我们的方法有效降低了 SB,并取得了最先进的性能。代码和数据集将公开发布。
Article 45
Title@2025-07-24 (4): Multi-Preference Lambda-weighted Listwise DPO for Small-Scale Model Alignment
Title: Multi-Preference Lambda-weighted Listwise DPO for Small-Scale Model Alignment | Multi-Preference Lambda-bewertet Listwise DPO für kleine Modellausrichtung | 用于小规模模型调整的多参数 Lambda加权列表DPO 2506.19780v5 |
Authors (5): Yuhui Sun, Xiyao Wang, Zixi Li, Zhenlong Yuan, Jinman Zhao
Large language models (LLMs) demonstrate strong generalization across a wide range of language tasks, but often generate outputs that misalign with human preferences. Reinforcement Learning from Human Feedback (RLHF) addresses this by optimizing models toward human preferences using a learned reward function and reinforcement learning, yielding improved alignment but suffering from high computational cost and instability. Direct Preference Optimization (DPO) simplifies the process by treating alignment as a classification task over binary preference pairs, reducing training overhead while achieving competitive performance. However, it assumes fixed, single-dimensional preferences and only supports pairwise supervision. To address these limitations, we propose Multi-Preference Lambda-weighted Listwise DPO, which allows the model to learn from more detailed human feedback and flexibly balance multiple goals such as helpfulness, honesty, and fluency. Our method models full-ranked preference distributions rather than binary comparisons, enabling more informative learning signals. The lambda vector controls the relative importance of different alignment goals, allowing the model to generalize across diverse human objectives. During inference, lambda can be adjusted without retraining, providing controllable alignment behavior for downstream use. We also introduce a learned scheduler that dynamically samples performant lambda configurations to improve robustness. Notably, our method requires only 20GB of GPU memory for training, making it suitable for compute-constrained settings such as academic labs, educational tools, or on-device assistants. Experiments on 1B-2B scale models show that our method consistently outperforms standard DPO on alignment benchmarks while enabling efficient, controllable, and fine-grained adaptation suitable for real-world deployment.
大型语言模型(LLM)在广泛的语言任务上表现出很强的泛化能力,但常常生成与人类偏好不一致的输出。基于人类反馈的强化学习(RLHF)通过学习到的奖励函数和强化学习使模型向人类偏好优化,从而改善对齐效果,但计算成本高且不稳定。直接偏好优化(DPO)将对齐视为二元偏好对上的分类任务,从而简化了这一过程,在降低训练开销的同时取得了有竞争力的性能。然而,它假设偏好是固定且单一维度的,并且只支持成对监督。为解决这些局限,我们提出多偏好 Lambda 加权列表式 DPO(Multi-Preference Lambda-weighted Listwise DPO),使模型能够从更细致的人类反馈中学习,并灵活平衡有用性、诚实性和流畅性等多个目标。我们的方法对完整排序的偏好分布而非二元比较进行建模,从而提供信息量更大的学习信号。lambda 向量控制不同对齐目标的相对重要性,使模型能够泛化到多样的人类目标。在推理阶段,无需重新训练即可调整 lambda,为下游使用提供可控的对齐行为。我们还引入了一个可学习的调度器,动态采样表现良好的 lambda 配置以提高鲁棒性。值得注意的是,我们的方法训练时仅需 20GB 的 GPU 显存,适用于学术实验室、教育工具或端侧助手等计算资源受限的场景。在 1B 至 2B 规模模型上的实验表明,我们的方法在对齐基准上始终优于标准 DPO,同时实现了适合实际部署的高效、可控和细粒度的适配。
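One way to read the lambda-weighted listwise objective of Article 45 is as a weighted sum of cross-entropies between per-objective target distributions over candidate responses and the distribution induced by DPO-style implicit rewards. The sketch below is an assumption about the general shape of such a loss, not the paper's exact formulation; the function names, the `beta` value, and the toy numbers are hypothetical.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over a 1-D array."""
    z = z - np.max(z)
    return z - np.log(np.sum(np.exp(z)))

def lambda_listwise_dpo_loss(logp_policy, logp_ref, targets, lam, beta=0.1):
    """Hypothetical lambda-weighted listwise loss for one prompt.

    logp_policy, logp_ref: shape (K,), log-probabilities of K candidate responses
    targets: shape (M, K), one target distribution over candidates per objective
    lam: shape (M,), non-negative weights over the M objectives
    """
    logp_policy = np.asarray(logp_policy, dtype=float)
    logp_ref = np.asarray(logp_ref, dtype=float)
    targets = np.asarray(targets, dtype=float)
    lam = np.asarray(lam, dtype=float)
    rewards = beta * (logp_policy - logp_ref)  # DPO-style implicit reward per candidate
    log_q = log_softmax(rewards)               # policy-induced distribution over the list
    per_objective = -(targets @ log_q)         # cross-entropy for each objective, shape (M,)
    return float(lam @ per_objective)

# Toy example: 3 candidates, 2 objectives (say, helpfulness and honesty)
logp_ref = np.array([-10.0, -11.0, -12.0])
targets = np.array([[0.7, 0.2, 0.1],
                    [0.5, 0.3, 0.2]])
lam = np.array([0.6, 0.4])
loss_at_ref = lambda_listwise_dpo_loss(logp_ref, logp_ref, targets, lam)
print(loss_at_ref)
```

Because `lam` enters only as a weighting of the per-objective terms, changing it re-balances the objectives without changing the rest of the computation.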
Article 46
Title@2025-07-24 (4): DualXDA: Towards Sparse, Efficient and Explainable Data Attribution in Large AI Models
Title: DualXDA: Towards Sparse, Efficient and Explainable Data Attribution in Large AI Models | DualXDA: Auf dem Weg zu sparsamen, effizienten und erklärbaren Datenzuweisungen in großen KI-Modellen | DUAXDA:在大型AI型模型中实现数据分散、高效和可解释的归属 2402.12118v2 |
Authors (5): Galip Ümit Yolcu, Moritz Weckbecker, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin
Deep learning models achieve remarkable performance, yet their decision-making processes often remain opaque. In response, the field of eXplainable Artificial Intelligence (XAI) has grown significantly over the last decade, primarily focusing on feature attribution methods. Complementing this perspective, Data Attribution (DA) has emerged as a promising paradigm that shifts the focus from features to data provenance. However, existing DA approaches suffer from prohibitively high computational costs and memory demands. Additionally, current attribution methods exhibit low sparsity, hindering the discovery of decisive patterns in the data. We introduce DualXDA, a framework for sparse, efficient and explainable DA, comprised of two interlinked approaches for Dual Data Attribution (DualDA) and eXplainable Data Attribution (XDA): With DualDA, we propose efficient and effective DA, leveraging Support Vector Machine theory to provide fast and naturally sparse data attributions for AI predictions. We demonstrate that DualDA achieves high attribution quality, excels at solving a series of evaluated downstream tasks, while at the same time improving explanation time by a factor of up to 4,100,000$\times$ compared to the original Influence Functions method, and up to 11,000$\times$ compared to the method’s most efficient approximation from literature. We further introduce XDA, a method for enhancing Data Attribution with capabilities from feature attribution methods to explain why training samples are relevant for the prediction of a test sample in terms of impactful features. Taken together, our contributions in DualXDA ultimately point towards a future of eXplainable AI applied at unprecedented scale, enabling transparent, efficient and novel analysis of even the largest neural architectures fostering a new generation of accountable AI systems. Code at https://github.com/gumityolcu/DualXDA.
深度学习模式取得了显著的绩效,但其决策程序往往仍然不透明。作为回应,在过去十年中,电子XA(XAI)领域大幅增长,主要侧重于特征归属方法。补充这一视角,数据归属(DA)已经成为一个很有希望的范例,将重点从特征转向数据源。然而,现有的DA方法面临着过高的计算成本和记忆要求。此外,目前的归属方法表现出了低度的宽度,阻碍了数据中发现决定性模式的发现。我们引入了透明XDA(DaultX),这是一个稀释、高效和可解释的DA(DA)的框架,由两种相互关联的双重数据归属(DaultDA)和可扩展数据属性方法(XDADA):我们提出高效和有效的DA(DA)模式,利用支持Vector Mactor机理论为AI预测提供快速和自然分散的数据属性。我们证明DA达到了高度的归属质量,在解决一系列经评估的下游任务时优异度,同时改进解释时间为4100,000美元(etime),比原始的DA(DADA)数据属性分析成本分析要从原始的10000美元,比原始数据分析方法要更深入地提高一个比例, 数据数据分析方法到更进一步解释。
Article 47
Title@2025-07-24 (4): Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models
Title: Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models | Nicht alle Funktionen widmen sich der Aufmerksamkeit: Graphengeführtes Abhängigkeitslernen für tabellarische Datengenerierung mit Sprachmodellen | 并非所有值得注意的地物:用语言模型编制图表数据时的图表指导依赖性学习 2507.18504v1 |
Authors (4): Zheyu Zhang, Shuo Yang, Bardh Prenkaj, Gjergji Kasneci
Large Language Models (LLMs) have shown strong potential for tabular data generation by modeling textualized feature-value pairs. However, tabular data inherently exhibits sparse feature-level dependencies, where many feature interactions are structurally insignificant. This creates a fundamental mismatch as LLMs’ self-attention mechanism inevitably distributes focus across all pairs, diluting attention on critical relationships, particularly in datasets with complex dependencies or semantically ambiguous features. To address this limitation, we propose GraDe (Graph-Guided Dependency Learning), a novel method that explicitly integrates sparse dependency graphs into LLMs’ attention mechanism. GraDe employs a lightweight dynamic graph learning module guided by externally extracted functional dependencies, prioritizing key feature interactions while suppressing irrelevant ones. Our experiments across diverse real-world datasets demonstrate that GraDe outperforms existing LLM-based approaches by up to 12% on complex datasets while achieving competitive results with state-of-the-art approaches in synthetic data quality. Our method is minimally intrusive yet effective, offering a practical solution for structure-aware tabular data modeling with LLMs.
大型语言模型(LLMS)显示了通过模拟文本化特效对等生成表格式数据的巨大潜力。然而,表格数据本身显示的特征依赖性很少,许多特征互动在结构上是微不足道的。这造成了一种根本的不匹配,因为LLMS的自我注意机制不可避免地将重点分散到所有对等之间,分散了对关键关系的关注,特别是在具有复杂依赖性或语义模糊特征的数据集中。为解决这一局限性,我们提议Grade(Grade-Guid Dispidence Learning)(Grade)(Grade-Guid Destandings),这是一种新颖的方法,明确将稀少的依赖性图表纳入LLMS的注意机制。 Grade使用了一个由外部提取功能依赖性指导的轻量度动态图形学习模块,在抑制无关性数据的同时将关键特征互动置于优先位置。我们跨越各种现实世界数据集的实验表明,GraredDe 将现有的LMM方法比复杂的数据集高出12%,同时在合成数据质量中以最先进的方法取得竞争性的结果。我们的方法很少具有侵入性,但有效,为与LMLMSMSLM的表式数据模型提供实用的解决办法提供了实用的解决办法。
Article 48
Title@2025-07-24 (4): PLOT-TAL: Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization
Title: PLOT-TAL: Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization | PLOT-TAL: Schnell lernen mit optimalem Transport für temporale Aktionslokalisierung | PLOT-TAL: 以最优化交通方式迅速学习,促进少数时空行动地方化 2403.18915v2 |
Authors (2): Edward Fish, Andrew Gilbert
Few-shot temporal action localization (TAL) methods that adapt large models via single-prompt tuning often fail to produce precise temporal boundaries. This stems from the model learning a non-discriminative mean representation of an action from sparse data, which compromises generalization. We address this by proposing a new paradigm based on multi-prompt ensembles, where a set of diverse, learnable prompts for each action is encouraged to specialize on compositional sub-events. To enforce this specialization, we introduce PLOT-TAL, a framework that leverages Optimal Transport (OT) to find a globally optimal alignment between the prompt ensemble and the video’s temporal features. Our method establishes a new state-of-the-art on the challenging few-shot benchmarks of THUMOS’14 and EPIC-Kitchens, without requiring complex meta-learning. The significant performance gains, particularly at high IoU thresholds, validate our hypothesis and demonstrate the superiority of learning distributed, compositional representations for precise temporal localization.
通过单一即时调整来调整大型模型的微小时间行动本地化方法往往无法产生准确的时间界限。 这是因为模型学习了来自稀少数据的行动的非区别性平均值,从而会损害一般化。 我们通过提出基于多即时组合的新范式来解决这个问题,在多即时组合的基础上,鼓励对每项行动提供一套多样、可学习的提示,以专门研究构成的次活动。为了实施这一专业化,我们引入了PLOT-TAL,这是一个利用最佳交通(OT)来寻找及时组合和视频时间特征之间全球最佳一致的框架。我们的方法为THUMOS’14和ECIC-Kitchens的具有挑战性的几分基准确立了新的状态,而不需要复杂的元学习。显著的绩效收益,特别是在高IOU临界值上,验证了我们的假设,并展示了所分布的学习的优越性,为精确的时间本地化提供了构成演示。
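The alignment step in Article 48 relies on Optimal Transport between a set of prompts and a sequence of temporal features. A common way to approximate such a transport plan is entropy-regularized OT solved with Sinkhorn iterations; whether the paper uses exactly this solver is an assumption here, and the shapes, the cosine cost, the uniform marginals, and the `reg` value are illustrative choices.

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.5, n_iters=300):
    """Entropy-regularized OT plan between histograms a (N,) and b (T,)."""
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # rescale to match column marginals
        u = a / (K @ v)              # rescale to match row marginals
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
n_prompts, n_frames, dim = 4, 6, 16
prompts = rng.normal(size=(n_prompts, dim))   # hypothetical prompt embeddings
frames = rng.normal(size=(n_frames, dim))     # hypothetical per-frame video features

# Cost = 1 - cosine similarity between each prompt and each frame
p = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
f = frames / np.linalg.norm(frames, axis=1, keepdims=True)
cost = 1.0 - p @ f.T

a = np.full(n_prompts, 1.0 / n_prompts)       # uniform mass over prompts
b = np.full(n_frames, 1.0 / n_frames)         # uniform mass over frames
plan = sinkhorn(cost, a, b)
print(plan.round(3))
```

Each row of `plan` shows how one prompt's mass is spread over frames, so prompts can concentrate on different temporal segments rather than collapsing to a single mean representation.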
Article 49
Title@2025-07-24 (4): EarthLink: A Self-Evolving AI Agent for Climate Science
Title: EarthLink: A Self-Evolving AI Agent for Climate Science | EarthLink: Ein sich selbst entwickelnder KI-Agent für Klimawissenschaften | EarthLink:一个自我发展的AI气候科学代理机构 2507.17311v2 |
Authors (17): Zijie Guo, Jiong Wang, Xiaoyu Yue, Wangxu Wei, Zhe Jiang, Wanghan Xu, Ben Fei, Wenlong Zhang, Xinyu Gu, Lijing Cheng, Jing-Jia Luo, Chao Li, Yaqiang Wang, Tao Chen, Wanli Ouyang, Fenghua Ling, Lei Bai
Modern Earth science is at an inflection point. The vast, fragmented, and complex nature of Earth system data, coupled with increasingly sophisticated analytical demands, creates a significant bottleneck for rapid scientific discovery. Here we introduce EarthLink, the first AI agent designed as an interactive copilot for Earth scientists. It automates the end-to-end research workflow, from planning and code generation to multi-scenario analysis. Unlike static diagnostic tools, EarthLink can learn from user interaction, continuously refining its capabilities through a dynamic feedback loop. We validated its performance on a number of core scientific tasks of climate change, ranging from model-observation comparisons to the diagnosis of complex phenomena. In a multi-expert evaluation, EarthLink produced scientifically sound analyses and demonstrated an analytical competency that was rated as comparable to specific aspects of a human junior researcher’s workflow. Additionally, its transparent, auditable workflows and natural language interface empower scientists to shift from laborious manual execution to strategic oversight and hypothesis generation. EarthLink marks a pivotal step towards an efficient, trustworthy, and collaborative paradigm for Earth system research in an era of accelerating global change. The system is accessible at our website https://earthlink.intern-ai.org.cn.
现代地球科学处于一个分流点。地球系统数据的巨大、分散和复杂性质,加上日益复杂的分析需求,为快速科学发现制造了巨大的瓶颈。在这里,我们引入了作为地球科学家互动共同试办的第一个AI代理物EarthLink。它自动化了从规划和代码生成到多情景分析的端到端研究工作流程。与静态诊断工具不同,EarthLink可以从用户互动中学习,通过动态反馈循环不断提高自身能力。我们验证了其在一系列气候变化核心科学任务方面的表现,从模型观测比较到复杂现象诊断。在多专家评估中,EarthLink进行了科学可靠的分析,并展示了与人类初级研究人员工作流程具体方面相比的分析能力。此外,它透明、可审计的工作流程和自然语言界面使科学家能够从劳动手工操作转向战略监督和假设生成。EarthLink标志着在加速全球变化的时代向地球系统研究的有效、可信赖和合作模式迈出的关键一步。这个系统可以在我们的网站 http://earsearlines.
Article 50
Title@2025-07-24 (4): Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time
Title: Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time | Unüberwachtes Konzept Drift Erkennung von Deep-Learning-Darstellungen in Echtzeit | 从实时深层学习代表中检测出 2406.17813v2 |
Authors (4): Salvatore Greco, Bartolomeo Vacchetti, Daniele Apiletti, Tania Cerquitelli
Concept drift is the phenomenon in which the underlying data distributions and statistical properties of a target domain change over time, leading to a degradation in model performance. Consequently, production models require continuous drift detection monitoring. Most drift detection methods to date are supervised, relying on ground-truth labels. However, they are inapplicable in many real-world scenarios, as true labels are often unavailable. Although recent efforts have proposed unsupervised drift detectors, many lack the accuracy required for reliable detection or are too computationally intensive for real-time use in high-dimensional, large-scale production environments. Moreover, they often fail to characterize or explain drift effectively. To address these limitations, we propose DriftLens, an unsupervised framework for real-time concept drift detection and characterization. Designed for deep learning classifiers handling unstructured data, DriftLens leverages distribution distances in deep learning representations to enable efficient and accurate detection. Additionally, it characterizes drift by analyzing and explaining its impact on each label. Our evaluation across classifiers and data-types demonstrates that DriftLens (i) outperforms previous methods in detecting drift in 15/17 use cases; (ii) runs at least 5 times faster; (iii) produces drift curves that align closely with actual drift (correlation $\geq 0.85$); (iv) effectively identifies representative drift samples as explanations.
概念的漂移是一种现象,即目标领域长期变化的基本数据分布和统计特性随时间而变化,导致模型性能的退化。因此,生产模型需要不断的漂移检测监测。迄今为止,大多数漂移检测方法都是依靠地面真相标签进行监督的。然而,这些方法在许多真实世界的情景中不适用,因为真实标签往往无法找到。虽然最近的努力提出了未经监督的漂移探测器,但许多人缺乏可靠检测所需的准确性,或者在高维、大型生产环境中实时使用时,计算速度过快。此外,它们往往无法有效地描述或解释漂移情况。为了应对这些限制,我们建议对迄今为止的漂移方法进行监督。对于实时概念漂移探测和定性而言,这是一个不受监督的框架。 设计用于处理非结构数据、 extsc{DriftrentL} 深度学习分布距离,以便能够高效率和准确的探测。此外,它通过分析和解释对每个标签的影响而具有漂移特征。我们进行的跨分类和数据类型评估表明,我们进行了跨级和数据类型评估,以15美元为最短的流流化分析; 精确的流动解释方式,用前的流流化方法比前的流解分析了15; 精确地分析了流学案例,比前的流化的流化,比前的流化,比前的流化的流分析了更快,比前的流化,比前的流化的流分析了更快。
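The DriftLens abstract above relies on distribution distances computed over deep learning representations, without naming the distance. As a minimal sketch under stated assumptions, the block below fits a diagonal Gaussian to a reference batch of embeddings and to a new window, then compares them with the squared Fréchet distance. The diagonal-covariance simplification, the function names, and the synthetic embeddings are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def fit_diag_gaussian(embeddings):
    """Return per-dimension mean and standard deviation of a list of vectors."""
    n = len(embeddings)
    dim = len(embeddings[0])
    means = [sum(v[d] for v in embeddings) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((v[d] - means[d]) ** 2 for v in embeddings) / n)
        for d in range(dim)
    ]
    return means, stds


def frechet_distance_diag(ref, window):
    """Squared Frechet distance between diagonal Gaussians fitted to two batches.

    With diagonal covariances the trace term reduces to the squared
    difference of the per-dimension standard deviations.
    """
    mu_r, sd_r = fit_diag_gaussian(ref)
    mu_w, sd_w = fit_diag_gaussian(window)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_w))
    cov_term = sum((a - b) ** 2 for a, b in zip(sd_r, sd_w))
    return mean_term + cov_term


def sample(center, n, rng):
    """Synthetic stand-in for embeddings: unit-variance Gaussian around `center`."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]


rng = random.Random(0)
reference = sample([0.0, 0.0, 0.0], 500, rng)      # embeddings seen at training time
same_window = sample([0.0, 0.0, 0.0], 500, rng)    # new window, no drift
drift_window = sample([2.0, 0.0, 0.0], 500, rng)   # new window, shifted mean

score_same = frechet_distance_diag(reference, same_window)
score_drift = frechet_distance_diag(reference, drift_window)
print(score_same, score_drift)
```

In a monitoring loop, a score like this would be computed per window and compared against a threshold estimated from drift-free data; how that threshold is chosen is outside what the abstract states.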
Article 51
Title@2025-07-24 (4): Faithful, Interpretable Chest X-ray Diagnosis with Anti-Aliased B-cos Networks
Title: Faithful, Interpretable Chest X-ray Diagnosis with Anti-Aliased B-cos Networks | Treue, dolmetschbare Röntgendiagnose im Brustkorb mit Anti-Aliased-B-Cos-Netzwerken | 真实的、可解释的胸透透透透透透透透透透透透透透透析与反闭合的B子网络的诊断 2507.16761v2 |
Authors (3): Marcel Kleinmann, Shashank Agnihotri, Margret Keuper
Faithfulness and interpretability are essential for deploying deep neural networks (DNNs) in safety-critical domains such as medical imaging. B-cos networks offer a promising solution by replacing standard linear layers with a weight-input alignment mechanism, producing inherently interpretable, class-specific explanations without post-hoc methods. While maintaining diagnostic performance competitive with state-of-the-art DNNs, standard B-cos models suffer from severe aliasing artifacts in their explanation maps, making them unsuitable for clinical use where clarity is essential. In this work, we address these limitations by introducing anti-aliasing strategies using FLCPooling (FLC) and BlurPool (BP) to significantly improve explanation quality. Our experiments on chest X-ray datasets demonstrate that the modified $\text{B-cos}_\text{FLC}$ and $\text{B-cos}_\text{BP}$ preserve strong predictive performance while providing faithful and artifact-free explanations suitable for clinical application in multi-class and multi-label settings. Code is available at https://github.com/mkleinma/B-cos-medical-paper.
在医疗成像等安全关键领域部署深神经网络(DNN)至关重要。B-cos网络以加权投入校准机制取代标准线性层,提供了很有希望的解决方案,用加权投入校准机制取代标准线性层,产生固有的可解释和类别解释,而没有热后的方法。标准B-cos模型在保持与最先进的DNNs具有竞争力的诊断性能的同时,在解释图中还存在严重的化名文物,使其不适于临床使用,在明确性至关重要的地方不适于临床使用。在这项工作中,我们通过采用使用FLCPooling(FLC)和Blurrpool(BBP)来消除这些局限性,以大幅提高解释质量。我们在胸前X射线数据集的实验表明,经过修改的$\text{B-colt{FLC}$和$\text{B-cos_colt{B_B_BBBPP}保持强有力的预测性能,同时提供忠实和无产物解释,适合在多级和多标签环境中临床应用。在GitHub 存放处(url:http://gismaphy-ma-ma-mabs-mabs-com-com-com/kins-mainpaperper)的代码。
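The abstract above names BlurPool as one of its two anti-aliasing strategies. The following one-dimensional sketch illustrates the general idea only: low-pass filter with a small binomial kernel, then subsample. The `[1, 2, 1] / 4` kernel, reflect padding, and function names are illustrative assumptions; the paper's 2D integration into B-cos networks and FLCPooling are not reproduced here.

```python
def blur_pool_1d(signal, stride=2):
    """Blur with a [1, 2, 1] / 4 kernel, then keep every `stride`-th sample."""
    n = len(signal)
    if n < 2:
        return list(signal)  # reflect padding needs at least two samples
    # Reflect-pad by one sample on each side so the output stays aligned.
    padded = [signal[1]] + list(signal) + [signal[-2]]
    blurred = [
        0.25 * padded[i] + 0.5 * padded[i + 1] + 0.25 * padded[i + 2]
        for i in range(n)
    ]
    return blurred[::stride]


def naive_subsample(signal, stride=2):
    """Plain strided subsampling with no low-pass filter."""
    return list(signal)[::stride]


# Highest-frequency signal the grid can hold, and the same signal shifted by one.
alternating = [1.0, -1.0] * 4
shifted = alternating[1:] + alternating[:1]

print(naive_subsample(alternating), naive_subsample(shifted))
print(blur_pool_1d(alternating), blur_pool_1d(shifted))
```

Naive subsampling turns a one-sample shift into a sign flip of the whole output, which is the aliasing effect; the blurred version removes the unrepresentable frequency before subsampling, so both inputs give the same result.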
Article 52
Title@2025-07-24 (4): DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts
Title: DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts | DriftMoE: Eine Mischung aus Experten Ansatz zum Umgang mit Konzept Drifts | DriftMoE:处理 “ 漂流概念 “ 的混合专家办法 2507.18464v1 |
Authors (4): Miguel Aspis, Sebastián A. Cajas Ordónez, Andrés L. Suárez-Cetrulo, Ricardo Simón Carbajo
Learning from non-stationary data streams subject to concept drift requires models that can adapt on-the-fly while remaining resource-efficient. Existing adaptive ensemble methods often rely on coarse-grained adaptation mechanisms or simple voting schemes that fail to optimally leverage specialized knowledge. This paper introduces DriftMoE, an online Mixture-of-Experts (MoE) architecture that addresses these limitations through a novel co-training framework. DriftMoE features a compact neural router that is co-trained alongside a pool of incremental Hoeffding tree experts. The key innovation lies in a symbiotic learning loop that enables expert specialization: the router selects the most suitable expert for prediction, the relevant experts update incrementally with the true label, and the router refines its parameters using a multi-hot correctness mask that reinforces every accurate expert. This feedback loop provides the router with a clear training signal while accelerating expert specialization. We evaluate DriftMoE’s performance across nine state-of-the-art data stream learning benchmarks spanning abrupt, gradual, and real-world drifts, testing two distinct configurations: one where experts specialize on data regimes (multi-class variant), and another where they focus on single-class specialization (task-based variant). Our results demonstrate that DriftMoE achieves results competitive with state-of-the-art stream learning adaptive ensembles, offering a principled and efficient approach to concept drift adaptation. All code, data pipelines, and reproducibility scripts are available in our public GitHub repository: https://github.com/miguel-ceadar/drift-moe.
从非静止数据流中学习,并受到概念漂移的影响,这就需要一些模型,这些模型可以在飞行中适应,同时保持资源效率。现有的适应性混合方法往往依赖粗糙的调整机制或无法优化利用专门知识的简单投票计划。本文介绍了DriftMoE, 这是一种在线的在线专家混合模型(MoE)架构,通过一个新的原则培训框架解决这些限制。 DriftMoE 具有一个紧凑的神经路由器,该路由器与一组渐进式的Heffeffing树专家共同培训。关键创新在于一个能让专家专业化的共生学习循环:路由器选择最合适的专家进行预测,相关专家以真正的标签逐步更新,路由器改进参数,使用一个能强化每位准确专家的多热校正面掩码。这个反馈循环为路由器提供了一个清晰的培训信号,同时加快了专家专业化。我们评估了9个州的DriftMoE 数据流方法的绩效,在快速、渐进、真实和真实式的学习基准上,从而测试了我们两个不同版本的变式的Slimomomoveal Gal 概念:一个专家展示了另一个的Sildal-dal-dal-dal-dal-dal-ex-drodu略数据,从而展示了我们又展示了另一个的变换了另一个了另一个的版本数据。
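The learning loop in the DriftMoE abstract above (route, predict, update the expert, train the router on a multi-hot correctness mask) can be sketched roughly as follows. This is a toy illustration under loud assumptions: `CentroidExpert` is a hypothetical stand-in for the incremental Hoeffding trees used in the paper, the router is a single linear layer with sigmoid outputs, only the top-1 expert is updated, and the synthetic stream with a label flip at step 200 is invented for the demo.

```python
import math
import random


class CentroidExpert:
    """Hypothetical stand-in expert: incremental nearest-centroid classifier.

    The paper uses incremental Hoeffding trees; this class is NOT one.
    """

    def __init__(self, n_classes, dim):
        self.sums = [[0.0] * dim for _ in range(n_classes)]
        self.counts = [0] * n_classes

    def predict(self, x):
        best, best_dist = 0, float("inf")
        for c, (s, n) in enumerate(zip(self.sums, self.counts)):
            if n == 0:
                continue  # class never seen by this expert
            dist = sum((xi - si / n) ** 2 for xi, si in zip(x, s))
            if dist < best_dist:
                best, best_dist = c, dist
        return best

    def learn(self, x, y):
        self.counts[y] += 1
        self.sums[y] = [si + xi for si, xi in zip(self.sums[y], x)]


class Router:
    """Linear router with one sigmoid output per expert (an assumed design)."""

    def __init__(self, n_experts, dim, rng, lr=0.1):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_experts)]
        self.b = [0.0] * n_experts
        self.lr = lr

    def scores(self, x):
        logits = [sum(wi * xi for wi, xi in zip(w, x)) + b
                  for w, b in zip(self.w, self.b)]
        return [1.0 / (1.0 + math.exp(-z)) for z in logits]

    def update(self, x, mask):
        # One SGD step on binary cross-entropy against the multi-hot mask.
        for e, (p, m) in enumerate(zip(self.scores(x), mask)):
            grad = p - m
            self.w[e] = [wi - self.lr * grad * xi for wi, xi in zip(self.w[e], x)]
            self.b[e] -= self.lr * grad


def step(router, experts, x, y):
    """Process one labelled stream item; return (prediction, correctness mask)."""
    probs = router.scores(x)
    chosen = max(range(len(experts)), key=probs.__getitem__)
    votes = [expert.predict(x) for expert in experts]
    mask = [1 if v == y else 0 for v in votes]  # 1 for every expert that was right
    router.update(x, mask)
    experts[chosen].learn(x, y)                 # only the routed expert sees the label
    return votes[chosen], mask


rng = random.Random(0)
router = Router(n_experts=3, dim=2, rng=rng)
experts = [CentroidExpert(n_classes=2, dim=2) for _ in range(3)]
history = []
for t in range(400):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    y = int(x[0] > 0) if t < 200 else int(x[0] <= 0)  # abrupt drift at t = 200
    history.append(step(router, experts, x, y))
```

The mask rewards every expert that would have been correct, not just the one selected, which is the property the abstract highlights as giving the router a clearer training signal than a one-hot target.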
Article 53
Title@2025-07-24 (4): Restoring Rhythm: Punctuation Restoration Using Transformer Models for Bangla, a Low-Resource Language
Title: Restoring Rhythm: Punctuation Restoration Using Transformer Models for Bangla, a Low-Resource Language | Wiederherstellung des Rhythmus: Pünktlichkeitsrestaurierung mit Transformer-Modellen für Bangla, eine Sprache mit geringer Ressource | 恢复时速:使用孟加拉国低资源语言 “ 孟加拉 “ 变压器模型恢复脉冲 2507.18448v1 |
Authors (4): Md Obyedullahil Mamun, Md Adyelullahil Mamun, Arif Ahmad, Md. Imran Hossain Emu
Punctuation restoration enhances the readability of text and is critical for post-processing tasks in Automatic Speech Recognition (ASR), especially for low-resource languages like Bangla. In this study, we explore the application of transformer-based models, specifically XLM-RoBERTa-large, to automatically restore punctuation in unpunctuated Bangla text. We focus on predicting four punctuation marks: period, comma, question mark, and exclamation mark across diverse text domains. To address the scarcity of annotated resources, we constructed a large, varied training corpus and applied data augmentation techniques. Our best-performing model, trained with an augmentation factor of alpha = 0.20%, achieves an accuracy of 97.1% on the News test set, 91.2% on the Reference set, and 90.2% on the ASR set. Results show strong generalization to reference and ASR transcripts, demonstrating the model’s effectiveness in real-world, noisy scenarios. This work establishes a strong baseline for Bangla punctuation restoration and contributes publicly available datasets and code to support future research in low-resource NLP.
标点恢复会提高文字的可读性,对于自动语音识别(ASR)中的后处理任务至关重要,特别是对孟加拉语等低资源语言而言。在本研究中,我们探索了以变压器为基础的模型的应用,特别是XLM-ROBERTAUG型,以自动恢复未标点的孟加拉语文本中的标点。我们侧重于预测四个标点:时期、逗号、问题标记和在不同文本域的感光标记。为了解决附加说明的资源稀缺的问题,我们建立了一个庞大的、多样的训练资料库和应用的数据增强技术。我们最优秀的模型,以阿尔法=0.20%的扩增因数来培训,在新闻测试集上实现了97.1%的准确度,在参考集上实现了91.2%的准确度,在ASR集上实现了90.2%的准确度。结果显示参考和ASR记录有很强的概括性,展示了模型在现实世界中的效能,噪音情景。这项工作为Bangla punctuation 恢复奠定了一个强大的基线,并且为公众可获取的数据设置和代码以支持未来低资源NPPROP的研究提供了支持未来研究。
Article 54
Title@2025-07-24 (4): Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits
Title: Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits | Ergebnisbasiertes Online-Verstärkungslernen: Algorithmen und grundlegende Grenzen | 基于成果的在线强化学习:等级和基本限制 2505.20268v2 |
Authors (4): Fan Chen, Zeyu Jia, Alexander Rakhlin, Tengyang Xie
Reinforcement learning with outcome-based feedback faces a fundamental challenge: when rewards are only observed at trajectory endpoints, how do we assign credit to the right actions? This paper provides the first comprehensive analysis of this problem in online RL with general function approximation. We develop a provably sample-efficient algorithm achieving $\widetilde{O}({C_{\rm cov} H^3}/{\epsilon^2})$ sample complexity, where $C_{\rm cov}$ is the coverability coefficient of the underlying MDP. By leveraging general function approximation, our approach works effectively in large or infinite state spaces where tabular methods fail, requiring only that value functions and reward functions can be represented by appropriate function classes. Our results also characterize when outcome-based feedback is statistically separated from per-step rewards, revealing an unavoidable exponential separation for certain MDPs. For deterministic MDPs, we show how to eliminate the completeness assumption, dramatically simplifying the algorithm. We further extend our approach to preference-based feedback settings, proving that equivalent statistical efficiency can be achieved even under more limited information. Together, these results constitute a theoretical foundation for understanding the statistical properties of outcome-based reinforcement learning.
以基于结果的反馈加强学习面临一个根本的挑战:当奖励只出现在轨迹端点上时,我们如何为正确的行动分配信用?本文件首次全面分析在线 RL 中的这一问题,同时提供一般功能近似。我们开发了一个可以想象的样本效率算法,实现美元全局化{O}({Crm cov}H}3}/ / ipsilon}2}美元样本复杂性,其中$Crm cov}是基本 MDP的可覆盖系数。我们通过利用一般功能近似法,我们的方法在表格方法失败的大或无限的状态空间中有效发挥作用,只要求适当功能类别代表价值功能和奖励功能。我们的结果还体现在基于结果的反馈在统计上与每一步的回报分开,揭示某些 MDP 不可避免的指数分离。关于确定性 MDP,我们展示了如何消除完整性假设,大大简化了算法。我们进一步将我们的方法扩大到基于优惠的反馈环境,证明即使在有限的信息之下也可以实现同等的统计效率。这些结果构成了一个理论基础,以了解基于统计结果的特性。
Article 55
Title@2025-07-24 (4): IPCGRL: Language-Instructed Reinforcement Learning for Procedural Level Generation
Title: IPCGRL: Language-Instructed Reinforcement Learning for Procedural Level Generation | IPCGRL: Sprachgestütztes Verstärkungslernen für die verfahrenstechnische Level-Generierung | ICPCGRL: 程序生成阶段语言教学强化学习 2503.12358v4 |
Authors (5): In-Chang Baek, Sung-Hyun Kim, Seo-Young Lee, Dong-Hyeon Kim, Kyung-Joong Kim
Recent research has highlighted the significance of natural language in enhancing the controllability of generative models. While various efforts have been made to leverage natural language for content generation, research on deep reinforcement learning (DRL) agents utilizing text-based instructions for procedural content generation remains limited. In this paper, we propose IPCGRL, an instruction-based procedural content generation method via reinforcement learning, which incorporates a sentence embedding model. IPCGRL fine-tunes task-specific embedding representations to effectively compress game-level conditions. We evaluate IPCGRL in a two-dimensional level generation task and compare its performance with a general-purpose embedding method. The results indicate that IPCGRL achieves up to a 21.4% improvement in controllability and a 17.2% improvement in generalizability for unseen instructions. Furthermore, the proposed method extends the modality of conditional input, enabling a more flexible and expressive interaction framework for procedural content generation.
最近的研究突出了自然语言在加强基因模型的可控性方面的重要性。虽然作出了各种努力来利用自然语言来生成内容,但利用基于文本的指示来生成程序内容的深度强化学习(DRL)代理物的研究仍然有限。在本文件中,我们建议采用基于指令的程序性内容生成方法IPCGRL, 这是一种通过强化学习产生程序内容的方法,其中包括一个包含句子的嵌入模式。ICPCGRL 微调具体任务嵌入代表物,以有效压缩游戏级条件。我们评估了二维级生成任务中的IPCGRL, 并将其性能与通用嵌入方法进行比较。结果显示,IPCGRL在控制性方面实现了21.4%的改进,在对无形指令的一般性方面实现了17.2%的改进。此外,拟议方法扩大了有条件投入的方式,为程序内容生成提供了更加灵活和明确的互动框架。
Article 56
Title@2025-07-24 (4): NLML-HPE: Head Pose Estimation with Limited Data via Manifold Learning
Title: NLML-HPE: Head Pose Estimation with Limited Data via Manifold Learning | NLML-HPE: Kopfhosenschätzung mit begrenzten Daten über Manifold Learning | NLML-HPE:通过人工学习用有限数据进行测算的负责人 2507.18429v1 |
Authors (2): Mahdi Ghafourian, Federico M. Sukno
Head pose estimation (HPE) plays a critical role in various computer vision applications such as human-computer interaction and facial recognition. In this paper, we propose a novel deep learning approach for head pose estimation with limited training data via non-linear manifold learning called NLML-HPE. This method is based on the combination of tensor decomposition (i.e., Tucker decomposition) and feedforward neural networks. Unlike traditional classification-based approaches, our method formulates head pose estimation as a regression problem, mapping input landmarks into a continuous representation of pose angles. To this end, our method uses tensor decomposition to split each Euler angle (yaw, pitch, roll) into separate subspaces and models each dimension of the underlying manifold as a cosine curve. We address two key challenges: 1. Almost all HPE datasets suffer from incorrect and inaccurate pose annotations. Hence, we generated a precise and consistent 2D head pose dataset for our training set by rotating 3D head models for a fixed set of poses and rendering the corresponding 2D images. 2. We achieved real-time performance with limited training data as our method accurately captures the nature of rotation of an object from facial landmarks. Once the underlying manifold for rotation around each axis is learned, the model is very fast in predicting unseen data. Our training and testing code is available online along with our trained models: https://github.com/MahdiGhafoorian/NLML_HPE.
头形估计( HHPE) 在各种计算机视觉应用( 如人机互动和面部识别) 中发挥着关键作用。 在本文中, 我们提出一种新的深层次头部估计方法, 以非线性多元学习( NLML- HPE) 的有限培训数据来进行深度估计。 这种方法基于高压分解( 塔克分解) 和前方神经网络的组合。 与传统的基于分类的方法不同, 我们的方法将头部表示成一个回归问题, 将输入的标志绘制成一个连续的显示面孔。 为此, 我们的方法使用 将每个尤拉角度( yaw、 pitch、 roll) 的分解成一个深度的深度的深度估算, 将基础元件的每个层面的子空间和模型作为组合曲线曲线。 我们面对两大挑战: 1. 几乎所有 HPE 数据集都受到不正确和不准确的外形说明。 因此, 我们制作了一个精确和一致的 2D 头部构成数据集数据集, 通过将3D 头模型转换成固定的组合/ 并制作相应的 2D 图像。 2. 目标。 2. 我们实现了实时的模拟的模拟模型, 模拟的模型运行运行, 我们的模拟的模拟数据是一次根据我们所学会所学的 所学会所学会所学会所学会所学会所学会所学会所学会所学会所学会所学会所学的 所学的 所学的精确的 所学的 。
Article 57
Title@2025-07-24 (4): How do language models learn facts? Dynamics, curricula and hallucinations
Title: How do language models learn facts? Dynamics, curricula and hallucinations | Wie lernen Sprachmodelle Fakten? Dynamik, Lehrpläne und Halluzinationen | 语言模式如何了解事实?动态、课程和幻觉 2503.21676v2 |
Authors (6): Nicolas Zucchet, Jörg Bornschein, Stephanie Chan, Andrew Lampinen, Razvan Pascanu, Soham De
Large language models accumulate vast knowledge during pre-training, yet the dynamics governing this acquisition remain poorly understood. This work investigates the learning dynamics of language models on a synthetic factual recall task, uncovering three key findings: First, language models learn in three phases, exhibiting a performance plateau before acquiring precise factual knowledge. Mechanistically, this plateau coincides with the formation of attention-based circuits that support recall. Second, the training data distribution significantly impacts learning dynamics, as imbalanced distributions lead to shorter plateaus. Finally, hallucinations emerge simultaneously with knowledge, and integrating new knowledge into the model through fine-tuning is challenging, as it quickly corrupts its existing parametric memories. Our results emphasize the importance of data distribution in knowledge acquisition and suggest novel data scheduling strategies to accelerate neural network training.
大型语言模型在培训前积累了大量知识,但是关于这一获取的动态仍然不甚了解。这项工作调查了语言模型在合成事实回顾任务方面的学习动态,发现了三个主要结论:首先,语言模型分三个阶段学习,在获得准确事实知识之前展示一个性能高原。机械上,这一高原与形成支持回顾的基于关注的电路相吻合。第二,培训数据分布对学习动态产生重大影响,因为分布不平衡导致高原缩短。最后,幻觉与知识同时出现,通过微调将新知识纳入模型具有挑战性,因为它迅速腐蚀了其现有的准光学记忆。我们的结果强调在获取知识方面进行数据分配的重要性,并建议新的数据列表战略,以加快神经网络培训。
Article 58
Title@2025-07-24 (4): Multi-Model Ensemble and Reservoir Computing for River Discharge Prediction in Ungauged Basins
Title: Multi-Model Ensemble and Reservoir Computing for River Discharge Prediction in Ungauged Basins | Multi-Model-Ensemble und Reservoir Computing für Flussentladungsvorhersage in ungespurten Becken | 多模型组合和储量计算,用于未排出盆地的河流排泄预测 2507.18423v1 |
Authors (2): Mizuki Funato, Yohei Sawada
Despite the critical need for accurate flood prediction and water management, many regions lack sufficient river discharge observations, limiting the skill of rainfall-runoff analyses. Although numerous physically based and machine learning models exist, achieving high accuracy, interpretability, and computational efficiency under data-scarce conditions remains a major challenge. We address this challenge with a novel method, HYdrological Prediction with multi-model Ensemble and Reservoir computing (HYPER), which leverages multi-model ensemble and reservoir computing (RC). Our approach first applies Bayesian model averaging (BMA) to 43 “uncalibrated” catchment-based conceptual hydrological models. An RC model is then trained via linear regression to correct errors in the BMA output, a non-iterative process that ensures high computational efficiency. For ungauged basins, we infer the required BMA and RC weights by linking them to catchment attributes from gauged basins, creating a generalizable framework. We evaluated HYPER using data from 87 river basins in Japan. In a data-rich scenario, HYPER (median Kling-Gupta Efficiency, KGE, of 0.56) performed comparably to a benchmark LSTM (KGE 0.55) but required only 5% of its computational time. In a data-scarce scenario (23% of basins gauged), HYPER maintained robust performance (KGE 0.55) and lower uncertainty, whereas the LSTM’s performance degraded significantly (KGE -0.04). These results reveal that individual conceptual hydrological models do not necessarily need to be calibrated when an effectively large ensemble is assembled and combined with machine-learning-based bias correction. HYPER provides a robust, efficient, and generalizable solution for discharge prediction, particularly in ungauged basins, making it applicable to a wide range of regions.
尽管迫切需要准确的洪水预测和水管理,但许多区域都缺乏足够的河流排放观测,限制了降雨流分析的技能。尽管存在许多基于物理和机器的学习模型,但在数据偏差条件下,实现高度准确性、可解释性和计算效率仍然是一个重大挑战。我们用一种新颖的方法来应对这一挑战,即利用多模型混合和储水计算(HYPER)来利用多模类堆积和储油计算(RC)。我们的方法首先将Bayesian模型平均值(BMA)应用到43个“未经校准”的集水系概念水文模型。然后通过线性回归来培训RC模型,以纠正BMA输出中的错误,这是一个确保高计算效率的非显示过程。对于盆地来说,我们通过将所需的BMA和RC重量与测量盆地的集水分属性联系起来,从而创建一个可概括化的框架。我们用87个河流流域的直径流数据,我们用其直径直的直径直径直径直径直径直的流模型来评估 HYPER。在一种数据丰富的情况下,HYPER(M-medial-GUnial-GUT-GOD-GE-GY-GY-G-GY-GY-GY-GY-GY-GY-GILGY-GY-LGY_BD-LS-LS-C-LS-C-GS-GS-GS-GS-C-GY-C-GY-G-G-GM-GM-G-G-GM-M-GY-G-G-GM-GIS-GIS-GIS-GIS-GY-GIS-GIS-GIS-GIS-GIS-GIS-GIS-GIS-GIS-G-G-G-G-G-G-G-G-G-G-G-GD-GIS-GD-GD-GD-GIS-N-G-G-G-G-G-GIS-N-G-G-G-G-G-G-G-G-G-G-G-GIS-M-GIS-M-G-G-G-G-G-G-G
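The HYPER abstract above reports all of its results as Kling-Gupta Efficiency (KGE) values. For readers unfamiliar with the metric, here is a small sketch of KGE in its original three-component form (correlation, variability ratio, bias ratio). The abstract does not say which KGE variant the authors used, so the choice of the original formulation, along with the function name and sample numbers, is an assumption.

```python
import math


def kling_gupta_efficiency(simulated, observed):
    """KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2).

    r     : Pearson correlation between simulated and observed discharge
    alpha : ratio of standard deviations (simulated / observed)
    beta  : ratio of means (simulated / observed)
    Assumes non-constant series and a non-zero observed mean.
    """
    n = len(observed)
    mean_s = sum(simulated) / n
    mean_o = sum(observed) / n
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in simulated) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(simulated, observed)) / n
    r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o
    beta = mean_s / mean_o
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


observed = [1.0, 3.0, 2.0, 5.0, 4.0]          # made-up discharge values
perfect = list(observed)
doubled = [2.0 * q for q in observed]          # right timing, wrong magnitude

kge_perfect = kling_gupta_efficiency(perfect, observed)
kge_doubled = kling_gupta_efficiency(doubled, observed)
print(kge_perfect, kge_doubled)
```

A perfect simulation scores 1. The doubled series keeps perfect correlation but has alpha = beta = 2, so its KGE is 1 - sqrt(2), which is negative; this helps put the reported LSTM score of -0.04 in the data-scarce scenario into context.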
Article 59
Title@2025-07-24 (4): Residual Prior-driven Frequency-aware Network for Image Fusion
Title: Residual Prior-driven Frequency-aware Network for Image Fusion | Residual Prior-driven Frequency-aware Netzwerk für Bild-Fusion | 图像融合超前驱动频率感知网络 2507.06735v2 |
Authors (5): Guan Zheng, Xue Wang, Wenhua Qian, Peng Liu, Runzhuo Ma
Image fusion aims to integrate complementary information across modalities to generate high-quality fused images, thereby enhancing the performance of high-level vision tasks. While global spatial modeling mechanisms show promising results, constructing long-range feature dependencies in the spatial domain incurs substantial computational costs. Additionally, the absence of ground-truth exacerbates the difficulty of capturing complementary features effectively. To tackle these challenges, we propose a Residual Prior-driven Frequency-aware Network, termed as RPFNet. Specifically, RPFNet employs a dual-branch feature extraction framework: the Residual Prior Module (RPM) extracts modality-specific difference information from residual maps, thereby providing complementary priors for fusion; the Frequency Domain Fusion Module (FDFM) achieves efficient global feature modeling and integration through frequency-domain convolution. Additionally, the Cross Promotion Module (CPM) enhances the synergistic perception of local details and global structures through bidirectional feature interaction. During training, we incorporate an auxiliary decoder and saliency structure loss to strengthen the model’s sensitivity to modality-specific differences. Furthermore, a combination of adaptive weight-based frequency contrastive loss and SSIM loss effectively constrains the solution space, facilitating the joint capture of local details and global features while ensuring the retention of complementary information. Extensive experiments validate the fusion performance of RPFNet, which effectively integrates discriminative features, enhances texture details and salient objects, and can effectively facilitate the deployment of the high-level vision task.
虽然全球空间建模机制显示了有希望的成果,但在空间领域建立远程特征依赖性需要大量计算费用。此外,缺乏地面真相加剧了有效获取互补特征的难度。为了应对这些挑战,我们提议建立一个被称为RPFNet的遗留的先前驱动频率感知网络。具体地说,RPFNet采用一个双部门特征提取框架:遗留式前模块(RPM)从剩余地图中提取特定模式差异信息,从而为整合提供补充性前程;频率多曼融合模块(DFFFM)通过频度-持续演进实现高效的全球特征建模和整合。此外,交叉促进模块(CPM)通过双向特征互动,增强对地方细节和全球结构的协同认识。具体特征提取框架(RPFNet)具体地框架(RPM)采用双部门性脱色和突出结构损失,以加强模型对特定模式差异的敏感性。此外,基于适应性重力的重力定位模型(DMFFM)模块(DFFFM)实现了高效的全球特征建模化和高频度(RFM)的升级,同时有效推进了地方定位(RBI)的升级(RBL)升级(L)和高压(LLL)的升级)系统(LI)系统(LLLLL)系统(L)的升级)系统(LV)的升级)等调制导制导制导制导制导。
Article 60
Title@2025-07-24 (4): FinDPO: Financial Sentiment Analysis for Algorithmic Trading through Preference Optimization of LLMs
Title: FinDPO: Financial Sentiment Analysis for Algorithmic Trading through Preference Optimization of LLMs | FinDPO: Finanz-Sentiment-Analyse für algorithmischen Handel durch Preference-Optimierung von LLMs | FinDPO:通过优惠优化LLMs,分析通过高利贷交易的金融敏感度 2507.18417v1 |
Authors (3): Giorgos Iacovides, Wuyang Zhou, Danilo Mandic
Opinions expressed in online finance-related textual data are having an increasingly profound impact on trading decisions and market movements. This trend highlights the vital role of sentiment analysis as a tool for quantifying the nature and strength of such opinions. With the rapid development of Generative AI (GenAI), supervised fine-tuned (SFT) large language models (LLMs) have become the de facto standard for financial sentiment analysis. However, the SFT paradigm can lead to memorization of the training data and often fails to generalize to unseen samples. This is a critical limitation in financial domains, where models must adapt to previously unobserved events and the nuanced, domain-specific language of finance. To this end, we introduce FinDPO, the first finance-specific LLM framework based on post-training human preference alignment via Direct Preference Optimization (DPO). The proposed FinDPO achieves state-of-the-art performance on standard sentiment classification benchmarks, outperforming existing supervised fine-tuned models by 11% on average. Uniquely, the FinDPO framework enables the integration of a fine-tuned causal LLM into realistic portfolio strategies through a novel ‘logit-to-score’ conversion, which transforms discrete sentiment predictions into continuous, rankable sentiment scores (probabilities). In this way, simulations demonstrate that FinDPO is the first sentiment-based approach to maintain substantial positive returns of 67% annually and strong risk-adjusted performance, as indicated by a Sharpe ratio of 2.0, even under realistic transaction costs of 5 basis points (bps).
在线金融相关文本数据表达的意见正在对贸易决策和市场流动产生日益深刻的影响。这一趋势凸显了情绪分析作为量化这类意见的性质和力度的工具的重要作用。随着GenAI(GenAI)的快速发展,监管的微调(SFT)大型语言模型(LLMS)已成为金融情绪分析的实际标准。然而,SFT模式可能导致培训数据的记忆化,而且往往无法向看不见的样本推广。这是金融领域的一个严重局限性,在这方面,模型必须适应以前未曾观察到的不透明事件以及微调的、特定领域的金融语言。为此,我们引入了FinDPO,这是以培训后的人优惠调整为基础的第一个针对具体财务的LMM框架。拟议的FINDPO在标准情绪分类基准上达到最先进的业绩,在平均上比现有的受监管的微调模式高出11%。 金融领域最强的FinDPO框架能够将一个精确调整的、甚至精确的LLMM(甚至精确的LM ) 纳入到真实的快速的货币组合战略之中,通过新式的货币级的货币级的汇率,以持续地显示不断的货币级的货币级的汇率基础。
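The 'logit-to-score' conversion mentioned in the FinDPO abstract above is described only as turning discrete sentiment predictions into continuous, rankable probabilities. One plausible reading, sketched below, applies a softmax to the logits of the sentiment label tokens and ranks assets by the resulting probabilities. The three-label setup, the `p_pos - p_neg` score, the ticker names, and the logit values are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sentiment_score(label_logits):
    """Map logits for (negative, neutral, positive) to a score in [-1, 1].

    Assumed scoring rule: probability of 'positive' minus probability of 'negative'.
    """
    p_neg, _p_neu, p_pos = softmax(label_logits)
    return p_pos - p_neg


# Hypothetical label-token logits for headlines about three made-up tickers.
logits_by_ticker = {
    "AAA": [0.2, 0.5, 2.5],
    "BBB": [1.0, 1.0, 1.0],
    "CCC": [3.0, 0.1, -1.0],
}
scores = {ticker: sentiment_score(z) for ticker, z in logits_by_ticker.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A continuous score like this allows assets to be sorted into long and short buckets, which a hard positive/negative/neutral label cannot do when many assets share the same label.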
Article 61
Title@2025-07-24 (4): Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows
Title: Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows | Iwin Transformer: Hierarchische Vision Transformer mit Interleaved Windows | Iwin 变换器: 使用内部视窗的等级愿景变换器 2507.18405v1 |
Authors (2): Simin Huo, Ning Li
We introduce Iwin Transformer, a novel position-embedding-free hierarchical vision transformer, which can be fine-tuned directly from low to high resolution, through the collaboration of innovative interleaved window attention and depthwise separable convolution. This approach uses attention to connect distant tokens and applies convolution to link neighboring tokens, enabling global information exchange within a single module, overcoming Swin Transformer’s limitation of requiring two consecutive blocks to approximate global attention. Extensive experiments on visual benchmarks demonstrate that Iwin Transformer exhibits strong competitiveness in tasks such as image classification (87.4 top-1 accuracy on ImageNet-1K), semantic segmentation and video action recognition. We also validate the effectiveness of the core component in Iwin as a standalone module that can seamlessly replace the self-attention module in class-conditional image generation. The concepts and methods introduced by the Iwin Transformer have the potential to inspire future research, like Iwin 3D Attention in video generation. The code and models are available at https://github.com/cominder/Iwin-Transformer.
我们引入了伊温变形器,这是一个新型的无位置组合式高层次视觉变形器,可以通过创新的左间窗口关注和深度分离的分解组合协作,从低分辨率到高分辨率直接进行微调。这个方法利用注意力将远处的象征物连接起来,并运用卷土重来将相邻的象征物连接起来,使全球信息交流在一个模块内得以实现,克服了斯温变形器要求连续两个区块接近全球关注的局限性。关于视觉基准的广泛实验表明,伊温变形器在图像分类(图像Net-1K上87.4最高一级精确度)、语义分解和视频动作识别等任务中表现出很强的竞争力。我们还验证了伊温核心部分作为独立模块的有效性,该模块可以无缝地取代在等级条件图像生成中的自我注意模块。伊温变形器引入的概念和方法有可能激发未来的研究,如视频生成中的Iwin 3D注意。代码和模型可以在https://github.com/comimmer/Iwin-Transforent中找到。
Article 62
Title@2025-07-24 (4): CLEAR: Error Analysis via LLM-as-a-Judge Made Easy
Title: CLEAR: Error Analysis via LLM-as-a-Judge Made Easy | CLEAR: Fehleranalyse über LLM-as-a-Judge leicht gemacht | CLLEAR:通过LLM-as-a法官进行错误分析 2507.18392v1 |
Authors (5): Asaf Yehudai, Lilach Eden, Yotam Perlitz, Roy Bar-Haim, Michal Shmueli-Scheuer
The evaluation of Large Language Models (LLMs) increasingly relies on other LLMs acting as judges. However, current evaluation paradigms typically yield a single score or ranking, answering which model is better but not why. While essential for benchmarking, these top-level scores obscure the specific, actionable reasons behind a model’s performance. To bridge this gap, we introduce CLEAR, an interactive, open-source package for LLM-based error analysis. CLEAR first generates per-instance textual feedback, then it creates a set of system-level error issues, and quantifies the prevalence of each identified issue. Our package also provides users with an interactive dashboard that allows for a comprehensive error analysis through aggregate visualizations, applies interactive filters to isolate specific issues or score ranges, and drills down to the individual instances that exemplify a particular behavioral pattern. We demonstrate CLEAR analysis for RAG and Math benchmarks, and showcase its utility through a user case study.
对大语言模型的评价越来越依赖作为法官的其他LLMs。然而,目前的评价模式通常产生单一的评分或排名,回答哪个模型更好,而不是原因。这些顶级评分虽然对于基准衡量至关重要,但却模糊了模型性能背后的具体、可操作的原因。为了缩小这一差距,我们引入了CLEAR,这是一个用于基于LLM的错误分析的互动式、开放源包。CLEAR首先生成了每份文字反馈,然后生成了一套系统级错误问题,并量化了每个问题的普遍性。我们的软件包还为用户提供了一个互动的仪表板,允许通过综合可视化来进行全面的错误分析,应用互动过滤器来孤立具体问题或得分范围,并钻探出能够体现特定行为模式的单个实例。我们展示了对RAG和数学基准的CLEAR分析,并通过用户案例研究展示其实用性。
Article 63
Title@2025-07-24 (4): A Comprehensive Review of Diffusion Models in Smart Agriculture: Progress, Applications, and Challenges
Title: A Comprehensive Review of Diffusion Models in Smart Agriculture: Progress, Applications, and Challenges | Eine umfassende Überprüfung von Difffusionsmodellen in der intelligenten Landwirtschaft: Fortschritt, Anwendungen und Herausforderungen | 全面审查 “ 智能农业传播模式:进展、应用和挑战 “ 2507.18376v1 |
Authors (9): Xing Hua, Haodong Chen, Qianqian Duan, Danfeng Hong, Ruijiao Li, Huiliang Shang, Linghua Jiang, Haima Yang, Dawei Zhang
With the global population growing and arable land resources becoming increasingly scarce, smart agriculture and precision agriculture have emerged as key directions for the future of agricultural development. Artificial intelligence (AI) technologies, particularly deep learning models, have found widespread applications in areas such as crop monitoring and pest detection. As an emerging generative model, diffusion models have shown significant promise in tasks like agricultural image processing, data augmentation, and remote sensing. Compared to traditional generative adversarial networks (GANs), diffusion models offer superior training stability and generation quality, effectively addressing challenges such as limited agricultural data and imbalanced image samples. This paper reviews the latest advancements in the application of diffusion models in agriculture, focusing on their potential in crop pest and disease detection, remote sensing image enhancement, crop growth prediction, and agricultural resource management. Experimental results demonstrate that diffusion models significantly improve model accuracy and robustness in data augmentation, image generation, and denoising, especially in complex environments. Despite challenges related to computational efficiency and generalization capabilities, diffusion models are expected to play an increasingly important role in smart and precision agriculture as technology advances, providing substantial support for the sustainable development of global agriculture.
随着全球人口增长和可耕地资源日益稀缺,智能农业和精准农业已成为农业发展未来的关键方向。人工智能技术,特别是深层学习模式,在作物监测和虫害检测等领域广泛应用。作为一种新兴的基因化模式,推广模式在农业图像处理、数据增强和遥感等任务方面显示出巨大的希望。与传统的基因对抗网络相比,推广模式提供了较高的培训稳定性和生产质量,有效地应对了农业数据有限和图像样本不平衡等挑战。本文件回顾了农业推广模式应用的最新进展,重点是其在作物虫害和疾病检测、遥感图像增强、作物生长预测和农业资源管理方面的潜力。实验结果表明,推广模式极大地提高了模型在数据增强、图像生成和淡化方面的准确性和稳健性,特别是在复杂环境中。尽管在计算效率和普遍化能力方面存在挑战,但随着技术进步,推广模式可望在智能和精准农业方面发挥越来越重要的作用,为全球农业的可持续发展提供大量支持。
Article 64
Title@2025-07-24 (4): On Reconstructing Training Data From Bayesian Posteriors and Trained Models
Title: On Reconstructing Training Data From Bayesian Posteriors and Trained Models | Über die Wiederherstellung von Trainingsdaten aus Bayesischen Nachbildungen und ausgebildeten Modellen | Bayesian Posides和经过培训的模型的培训数据重建 2507.18372v1 |
Authors (1): George Wynne
Publicly releasing the specification of a model with its trained parameters means an adversary can attempt to reconstruct information about the training data via training data reconstruction attacks, a major vulnerability of modern machine learning methods. This paper makes three primary contributions: establishing a mathematical framework to express the problem, characterising the features of the training data that are vulnerable via a maximum mean discrepancy equivalence, and outlining a score matching framework for reconstructing data in both Bayesian and non-Bayesian models. The Bayesian case is a first in the literature.
公开发布具有经过培训的参数的模型规格意味着对手可以通过培训数据重建攻击来尝试重建有关培训数据的信息,这是现代机器学习方法的主要弱点,本文作出了三项主要贡献:建立一个数学框架来表达问题,说明通过最大平均差异等值而脆弱的培训数据的特点,并概述一个在巴伊西亚和非巴伊西亚模式中重建数据的得分匹配框架,前者是文献中的第一个。
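The abstract above characterises vulnerable training-data features through a maximum mean discrepancy (MMD) equivalence. The paper's equivalence result itself is not reproduced here; the sketch below only shows what the MMD quantity measures, using the standard biased estimator with a Gaussian kernel. The bandwidth, sample sizes, and synthetic data are illustrative assumptions.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""

    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


rng = random.Random(0)
train = [[rng.gauss(0.0, 1.0)] for _ in range(60)]      # stand-in training data
same_dist = [[rng.gauss(0.0, 1.0)] for _ in range(60)]  # candidate from same law
shifted = [[rng.gauss(3.0, 1.0)] for _ in range(60)]    # candidate from shifted law

mmd_same = mmd_squared(train, same_dist)
mmd_shifted = mmd_squared(train, shifted)
print(mmd_same, mmd_shifted)
```

Two data sets with a small MMD agree on the kernel mean embedding, so any quantity that depends on the data only through that embedding cannot tell them apart; this is the intuition behind describing recoverable features in MMD terms.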
Article 65
Title@2025-07-24 (4): Efficient Uncertainty in LLMs through Evidential Knowledge Distillation
Title: Efficient Uncertainty in LLMs through Evidential Knowledge Distillation | Effiziente Unsicherheit in LLMs durch Evidential Knowledge Destillation | 通过证据知识蒸馏在LLMs中提高效能的不确定性 2507.18366v1 |
Authors (3): Lakshmana Sri Harsha Nemani, P. K. Srijith, Tomasz Kuśmierczyk
Accurate uncertainty quantification remains a key challenge for standard LLMs, prompting the adoption of Bayesian and ensemble-based methods. However, such methods typically necessitate computationally expensive sampling, involving multiple forward passes to effectively estimate predictive uncertainty. In this paper, we introduce a novel approach enabling efficient and effective uncertainty estimation in LLMs without sacrificing performance. Specifically, we distill uncertainty-aware teacher models - originally requiring multiple forward passes - into compact student models sharing the same architecture but fine-tuned using Low-Rank Adaptation (LoRA). We compare two distinct distillation strategies: one in which the student employs traditional softmax-based outputs, and another in which the student leverages Dirichlet-distributed outputs to explicitly model epistemic uncertainty via evidential learning. Empirical evaluations on classification datasets demonstrate that such students can achieve comparable or superior predictive and uncertainty quantification performance relative to their teacher models, while critically requiring only a single forward pass. To our knowledge, this is the first demonstration that immediate and robust uncertainty quantification can be achieved in LLMs through evidential distillation.
准确的不确定性量化仍然是标准LLM公司面临的一个关键挑战,促使采用Bayesian和共同法方法,但这种方法通常需要计算昂贵的抽样,涉及多个前方传票,以有效估计预测不确定性。在本文件中,我们引入了一种新颖的方法,使LLMs能够有效和高效地估算不确定性,而不会牺牲业绩。具体地说,我们将具有不确定性的教师模型(最初需要多个前方传票)提炼为具有相同架构但使用低Rank适应(Lora)进行微调的紧凑学生模型。我们比较了两种截然不同的蒸馏战略:一种是学生使用传统的软麦基产出,另一种是学生利用分散派生产出,通过证据性学习明确模拟缩略图的不确定性。关于分类数据集的实证评估表明,这些学生能够比其教师模型取得可比或更高水平的预测和不确定性量化业绩,同时只需要一个前方传票。据我们了解,这是第一个通过证据蒸馏在LMS公司实现直接和稳健的不确定性量化的证明。
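The Dirichlet-output student described above can be illustrated with the usual evidential-learning recipe: non-negative per-class "evidence" is turned into Dirichlet concentration parameters, from which both a prediction and an uncertainty score follow in a single pass. The sketch below is a minimal illustration under that assumption. The function name `dirichlet_summary`, the `evidence + 1` parameterisation, and the vacuity score are illustrative choices, not details confirmed by the abstract.

```python
# Hypothetical sketch of Dirichlet-based prediction and uncertainty.
# Assumes the common evidential-learning convention alpha = evidence + 1;
# this is not the authors' code.

def dirichlet_summary(evidence):
    """Return (expected class probabilities, vacuity) for one input.

    evidence: list of non-negative floats, one per class.
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]    # Dirichlet concentration parameters
    strength = sum(alpha)                  # total evidence plus number of classes
    probs = [a / strength for a in alpha]  # mean of the Dirichlet distribution
    vacuity = num_classes / strength       # in (0, 1]; 1.0 means no evidence at all
    return probs, vacuity


# Strong evidence for class 0 versus no evidence for any class.
confident_probs, confident_u = dirichlet_summary([18.0, 0.0, 0.0])
unsure_probs, unsure_u = dirichlet_summary([0.0, 0.0, 0.0])
print(confident_probs, confident_u)
print(unsure_probs, unsure_u)
```

With no evidence the prediction falls back to a uniform distribution and the vacuity is maximal, which is the behaviour one would want from an explicit epistemic-uncertainty signal.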
Article 66
Title@2025-07-24 (4): Leveraging the Structure of Medical Data for Improved Representation Learning
Title: Leveraging the Structure of Medical Data for Improved Representation Learning | Nutzung der Struktur medizinischer Daten für ein verbessertes Repräsentationslernen | 利用医疗数据结构改进代表性学习 2507.02987v3 |
Authors (10): Andrea Agostini, Sonia Laguna, Alain Ryser, Samuel Ruiperez-Campillo, Moritz Vandenhirtz, Nicolas Deperrois, Farhad Nooralahzadeh, Michael Krauthammer, Thomas M. Sutter, Julia E. Vogt
Building generalizable medical AI systems requires pretraining strategies that are data-efficient and domain-aware. Unlike internet-scale corpora, clinical datasets such as MIMIC-CXR offer limited image counts and scarce annotations, but exhibit rich internal structure through multi-view imaging. We propose a self-supervised framework that leverages the inherent structure of medical datasets. Specifically, we treat paired chest X-rays (i.e., frontal and lateral views) as natural positive pairs, learning to reconstruct each view from sparse patches while aligning their latent embeddings. Our method requires no textual supervision and produces informative representations. Evaluated on MIMIC-CXR, we show strong performance compared to supervised objectives and baselines trained without leveraging structure. This work provides a lightweight, modality-agnostic blueprint for domain-specific pretraining where data is structured but scarce.
与互联网规模公司不同,像MIMIC-CXR这样的临床数据集提供了有限的图像计数和稀缺的注释,但通过多视图成像呈现出丰富的内部结构。我们提议了一个自我监督的框架,利用医疗数据集的固有结构。具体地说,我们把配对的胸腔X光片(即前视和横向观)作为自然正面对子处理,学习从稀疏的片段重建每一面,同时调整其潜伏。我们的方法不需要文字监督,并产生信息说明。对MIMIC-CXR的评价显示,与监督的目标和基线相比,我们表现良好,在没有杠杆结构的情况下,正在培训。这项工作为数据结构化但稀缺的领域特定培训提供了一种轻量、模式-通用的蓝图。
Article 67
Title@2025-07-24 (4): Latent Space Alignment for AI-Native MIMO Semantic Communications
Title: Latent Space Alignment for AI-Native MIMO Semantic Communications | Latent Space Alignment für KI-Native MIMO Semantische Kommunikation | 用于AI-Native MIMO语义通信的 远程空间对齐 2507.16680v2 |
Authors (4): Mario Edoardo Pandolfo, Simone Fiorellino, Emilio Calvanese Strinati, Paolo Di Lorenzo
Semantic communications focus on prioritizing the understanding of the meaning behind transmitted data and ensuring the successful completion of tasks that motivate the exchange of information. However, when devices rely on different languages, logic, or internal representations, semantic mismatches may occur, potentially hindering mutual understanding. This paper introduces a novel approach to addressing latent space misalignment in semantic communications, exploiting multiple-input multiple-output (MIMO) communications. Specifically, our method learns a MIMO precoder/decoder pair that jointly performs latent space compression and semantic channel equalization, mitigating both semantic mismatches and physical channel impairments. We explore two solutions: (i) a linear model, optimized by solving a biconvex optimization problem via the alternating direction method of multipliers (ADMM); (ii) a neural network-based model, which learns a semantic MIMO precoder/decoder under transmission power budget and complexity constraints. Numerical results demonstrate the effectiveness of the proposed approach in a goal-oriented semantic communication scenario, illustrating the main trade-offs between accuracy, communication burden, and complexity of the solutions.
语义通信侧重于优先了解传输数据的含义,并确保成功完成促进信息交流的任务;然而,当装置依赖不同语言、逻辑或内部表现时,可能会出现语义错配,从而可能妨碍相互理解;本文件提出了解决语义通信中潜在空间错配的新办法,利用多投入多输出通信(MIMO)通信;具体地说,我们的方法学习了MIMO前编码器/分解器对配方,该对方共同进行潜伏空间压缩和语义频道平衡,减少语义错配和物理频道缺陷;我们探讨两种解决办法:(一)线性模型,通过乘数交替方向方法解决双孔优化问题,优化其优化;(二) 以神经网络为基础的模型,在传输能力预算和复杂性制约下学习语义 MIMO前编码器/分解码器;数字结果表明拟议方法在面向目标的语义通信情景中的有效性,说明了准确性、通信负担和解决方案的复杂性之间的主要交易。
Article 68
Title@2025-07-24 (4): Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation
Title: Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation | Tiny ist nicht klein genug: Hochwertige, ressourcenarme Gesichtsanimationsmodelle durch Hybrid-Wissensdestillation | 微小不够小:通过混合知识蒸馏,建立高质量、资源低的面部动画模型。 2507.18352v1 |
Authors (4): Zhen Han, Mattias Teye, Derek Yadgaroff, Judith Bütepage
The training of high-quality, robust machine learning models for speech-driven 3D facial animation requires a large, diverse dataset of high-quality audio-animation pairs. To overcome the lack of such a dataset, recent work has introduced large pre-trained speech encoders that are robust to variations in the input audio and, therefore, enable the facial animation model to generalize across speakers, audio quality, and languages. However, the resulting facial animation models are prohibitively large and lend themselves only to offline inference on a dedicated machine. In this work, we explore on-device, real-time facial animation models in the context of game development. We overcome the lack of large datasets by using hybrid knowledge distillation with pseudo-labeling. Given a large audio dataset, we employ a high-performing teacher model to train very small student models. In contrast to the pre-trained speech encoders, our student models only consist of convolutional and fully-connected layers, removing the need for attention context or recurrent updates. In our experiments, we demonstrate that we can reduce the memory footprint to at most 3.4 MB and the required future audio context to at most 81 ms while maintaining high-quality animations. This paves the way for on-device inference, an important step towards realistic, model-driven digital characters.
高品质、强健的3D面部动画机培训高质量、强健的3D面部动画机学习模型的培训需要大量、多样的高质量声动组合数据集。为了克服缺少这种数据集的问题,最近的工作引进了大型预先训练的语音解说器,这些解说器对输入音频的变异非常强大,因此使得面部动画模型能够对发言者、音质和语言进行概括化。然而,由此产生的面部动画模型大得令人望而却步,只能对专用机器进行离线推断。在这项工作中,我们探索了在游戏开发过程中的高级、实时面部动画模型。我们通过使用假贴标签的混合知识蒸馏,克服了大型数据集的缺乏。鉴于大量的音频数据集,我们采用了高水平的教师模型来培训非常小的学生模型。与经过预先训练的语音解说模型相比,我们的学生模型只能包括进化和完全连接的层层,从而消除关注环境或经常性更新的需要。在我们的实验中,我们证明我们可以将记忆的足迹缩到3.4MB和未来的高音感动版,同时需要向81级的高度的动向方向。
Article 69
Title@2025-07-24 (4): Low-rank adaptive physics-informed HyperDeepONets for solving differential equations
Title: Low-rank adaptive physics-informed HyperDeepONets for solving differential equations | Low-rank adaptive Physik-informiert HyperDeepONets zur Lösung von Differentialgleichungen | 用于解决差别方程的低级别适应性物理知情高超深电联 2507.18346v1 |
Authors (3): Etienne Zeudong, Elsa Cardoso-Bihlo, Alex Bihlo
HyperDeepONets were introduced in Lee, Cho and Hwang [ICLR, 2023] as an alternative architecture for operator learning, in which a hypernetwork generates the weights for the trunk net of a DeepONet. While this improves expressivity, it incurs high memory and computational costs due to the large number of output parameters required. In this work we introduce, in the physics-informed machine learning setting, a variation, PI-LoRA-HyperDeepONets, which leverage low-rank adaptation (LoRA) to reduce complexity by decomposing the hypernetwork’s output layer weight matrix into two smaller low-rank matrices. This reduces the number of trainable parameters while introducing an extra regularization of the trunk networks’ weights. Through extensive experiments on both ordinary and partial differential equations we show that PI-LoRA-HyperDeepONets achieve up to a 70% reduction in parameters and consistently outperform regular HyperDeepONets in terms of predictive accuracy and generalization.
Lee、Cho和Hwang[ICLR, 2023]作为操作员学习的替代结构,在Lee、Cho和Hwang[ICLR, 2023]中引入了超网络,作为操作员学习的替代结构,在其中,一个超网络生成了DeepONet中干线网网的重量网的重量网的重量网。虽然这提高了表达性,但由于需要大量的产出参数,它产生较高的内存和计算成本。在这项工作中,我们在物理学知情的机器学习环境中引入了一种变异,即PI-LORA-HyperDepeepONets(PI-LORA-HyperDeepONets),通过将超网络产出层重量矩阵分解成两个较小的低级矩阵来降低复杂性。这减少了可训练参数的数量,同时对干线网的重量作了额外的规范。通过对普通和局部的差别等式的广泛实验,我们表明PI-LORA-HyperDepepONets在预测准确性和概括性方面实现了70,并持续超过常规的超高端-DepeepONets。
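The low-rank decomposition mentioned above can be made concrete with a small parameter count. In this sketch, the hypernetwork's output layer maps a hidden vector of size `hidden_dim` to `num_trunk_params` trunk-net weights, and the full matrix is replaced by a product `B @ A` of rank `rank`. All names and sizes are hypothetical and chosen only for illustration; the paper's actual layer shapes are not given in the abstract.

```python
# Hypothetical sketch: parameter savings from factorising an output layer
# W (num_trunk_params x hidden_dim) as B (num_trunk_params x rank) times
# A (rank x hidden_dim). Pure Python, no external dependencies.

def full_param_count(hidden_dim, num_trunk_params):
    """Weights in the dense output layer."""
    return hidden_dim * num_trunk_params


def low_rank_param_count(hidden_dim, num_trunk_params, rank):
    """Weights in the two low-rank factors A and B."""
    return rank * (hidden_dim + num_trunk_params)


def low_rank_output(A, B, h):
    """Compute B @ (A @ h) without ever forming the full matrix B @ A."""
    z = [sum(a * x for a, x in zip(row, h)) for row in A]     # rank-sized bottleneck
    return [sum(b * v for b, v in zip(row, z)) for row in B]  # trunk-net weights


# Toy example: hidden_dim = 3, num_trunk_params = 4, rank = 1.
A = [[1, 0, 1]]
B = [[2], [0], [1], [3]]
h = [1, 2, 3]
trunk_weights = low_rank_output(A, B, h)
print(trunk_weights)
print(full_param_count(3, 4), low_rank_param_count(3, 4, 1))
```

The saving grows with the size of the trunk net: the dense layer scales with the product of the two dimensions, while the factorised layer scales with their sum times the rank.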
Article 70
Title@2025-07-24 (4): Remembering the Markov Property in Cooperative MARL
Title: Remembering the Markov Property in Cooperative MARL | Erinnerung an das Markov-Grundstück in der Genossenschaft MARL | 记得马尔科夫在MARL合作社中的财产 2507.18333v1 |
Authors (5): Kale-ab Abebe Tessera, Leonard Hinckeldey, Riccardo Zamboni, David Abel, Amos Storkey
Cooperative multi-agent reinforcement learning (MARL) is typically formalised as a Decentralised Partially Observable Markov Decision Process (Dec-POMDP), where agents must reason about the environment and other agents’ behaviour. In practice, current model-free MARL algorithms use simple recurrent function approximators to address the challenge of reasoning about others using partial information. In this position paper, we argue that the empirical success of these methods is not due to effective Markov signal recovery, but rather to learning simple conventions that bypass environment observations and memory. Through a targeted case study, we show that co-adapting agents can learn brittle conventions, which then fail when partnered with non-adaptive agents. Crucially, the same models can learn grounded policies when the task design necessitates it, revealing that the issue is not a fundamental limitation of the learning models but a failure of the benchmark design. Our analysis also suggests that modern MARL environments may not adequately test the core assumptions of Dec-POMDPs. We therefore advocate for new cooperative environments built upon two core principles: (1) behaviours grounded in observations and (2) memory-based reasoning about other agents, ensuring success requires genuine skill rather than fragile, co-adapted agreements.
合作性多试剂强化学习(MARL)通常被正规化为分散化部分可观测的马尔科夫决定程序(Dec-POMDP),代理商必须了解环境和其他代理商的行为。实际上,目前的无模型的MARL算法使用简单的经常性功能相近器来应对对使用部分信息的其他人进行推理的挑战。在本立场文件中,我们争辩说,这些方法的成功经验不是由于有效的Markov信号恢复,而是因为学习绕过环境观测和记忆的简单公约。我们通过有针对性的案例研究,表明共同适应的代理商可以学习易碎的公约,而当与非适应剂合作时,这些公约就会失败。至关重要的是,在任务设计需要时,同样的模型可以学习基于基础的政策,表明这个问题不是学习模式的基本限制,而是基准设计失败。我们的分析还表明,现代的MARL环境可能无法充分测试Dec-POMDPs的核心假设。我们因此倡导基于两个核心原则的新的合作环境:(1)基于观察的行为和(2)基于记忆的推理,而不是基于其他脆弱代理商的真正成功的技能。
Article 71
Title@2025-07-24 (4): Hierarchical Dimensionless Learning (Hi-π): A physics-data hybrid-driven approach for discovering dimensionless parameter combinations
Title: Hierarchical Dimensionless Learning (Hi-π): A physics-data hybrid-driven approach for discovering dimensionless parameter combinations | Hierarchisches dimensionsloses Lernen (Hi-π): Ein physik-data-hybridgetriebener Ansatz zur Entdeckung dimensionsloser Parameterkombinationen | 高层次无尺寸学习(Hi-π):物理学-数据混合驱动的发现无尺寸参数组合的物理-数据混合法 2507.18332v1 |
Authors (3): Mingkun Xia, Haitao Lin, Weiwei Zhang
Dimensional analysis provides a universal framework for reducing physical complexity and revealing inherent laws. However, its application to high-dimensional systems still generates redundant dimensionless parameters, making it challenging to establish physically meaningful descriptions. Here, we introduce Hierarchical Dimensionless Learning (Hi-π), a physics-data hybrid-driven method that combines dimensional analysis and symbolic regression to automatically discover key dimensionless parameter combination(s). We applied this method to classic examples in various research fields of fluid mechanics. For Rayleigh-Bénard convection, this method accurately extracted two intrinsic dimensionless parameters: the Rayleigh number and the Prandtl number, validating its unified representation advantage across multiscale data. For viscous flows in a circular pipe, the method automatically discovers two optimal dimensionless parameters: the Reynolds number and relative roughness, achieving a balance between accuracy and complexity. For the compressibility correction in subsonic flow, the method effectively extracts the classic compressibility correction formulation, while demonstrating its capability to discover hierarchical structural expressions through optimal parameter transformations.
度量分析为降低物理复杂性和揭示固有法则提供了一个通用框架。 但是,它在高维系统中的应用仍然产生多余的无维参数,使得建立具有物理意义的描述变得具有挑战性。 在这里,我们引入了一种物理学-数据混合回归法(Hi- ipi}),它结合了维度分析和符号回归,自动发现关键的无维参数组合。我们将这种方法应用到流体力学不同研究领域的典型例子中。对于Rayleigh-B'enard对流体,这种方法准确地提取了两个内在的无维参数:Rayleigh 和 Prandtl 参数,在多尺度数据中验证了它的统一代表优势。对于圆柱形管流的表面,该方法自动发现了两个最佳的无维度参数:Reynolds 数字和相对粗度,在精度和复杂度之间取得平衡。 对于亚声学流的可容性校正,该方法有效地提取了典型的压缩校正校正校正公式,同时展示其通过最佳参数转换发现等级结构表达的能力。
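The dimensional-analysis step that Hi-π builds on can be sketched with the Buckingham π theorem: dimensionless groups are the null space of the matrix whose columns hold each variable's exponents in the base dimensions. The example below covers only that classical step for smooth pipe flow, not the symbolic regression the paper adds. The variable set (density, velocity, diameter, viscosity) and the helper names are assumptions made for illustration.

```python
# Hypothetical sketch: finding dimensionless groups as the null space of a
# dimension matrix, using exact rational arithmetic from the standard library.
from fractions import Fraction


def nullspace(matrix):
    """Return a basis of the null space of `matrix` via reduced row echelon form."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivots = []
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column, so it is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pv = rows[r][c]
        rows[r] = [x / pv for x in rows[r]]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            vec[pc] = -rows[row_idx][free]
        # Normalise the sign so the first non-zero exponent is positive.
        first = next(v for v in vec if v != 0)
        if first < 0:
            vec = [-v for v in vec]
        basis.append(vec)
    return basis


# Columns: density, velocity, diameter, dynamic viscosity.
# Rows: exponents of mass, length, time.
names = ["rho", "U", "D", "mu"]
dimension_matrix = [
    [1, 0, 0, 1],     # mass
    [-3, 1, 1, -1],   # length
    [0, -1, 0, -1],   # time
]
groups = nullspace(dimension_matrix)
for g in groups:
    print(" * ".join(f"{n}^{e}" for n, e in zip(names, g) if e != 0))
```

For these four variables the null space is one-dimensional, with exponents (1, 1, 1, -1), which is the Reynolds number rho * U * D / mu. With more variables the null space has several basis vectors, and choosing physically meaningful combinations among them is the redundancy problem the abstract describes.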
Article 72
Title@2025-07-24 (4): GVCCS: A Dataset for Contrail Identification and Tracking on Visible Whole Sky Camera Sequences
Title: GVCCS: A Dataset for Contrail Identification and Tracking on Visible Whole Sky Camera Sequences | GVCCS: Ein Datensatz zur kontrailen Identifizierung und Verfolgung sichtbarer Ganzhimmel-Kamerasequenzen | GVCSCS:一个用于识别和跟踪可见全天相摄像机序列的可视全天相摄像头的对照识别和跟踪数据集 2507.18330v1 |
Authors (5): Gabriel Jarry, Ramon Dalmau, Philippe Very, Franck Ballerini, Stephania-Denisa Bocu
Aviation’s climate impact includes not only CO2 emissions but also significant non-CO2 effects, especially from contrails. These ice clouds can alter Earth’s radiative balance, potentially rivaling the warming effect of aviation CO2. Physics-based models provide useful estimates of contrail formation and climate impact, but their accuracy depends heavily on the quality of atmospheric input data and on assumptions used to represent complex processes like ice particle formation and humidity-driven persistence. Observational data from remote sensors, such as satellites and ground cameras, could be used to validate and calibrate these models. However, existing datasets don’t explore all aspects of contrail dynamics and formation: they typically lack temporal tracking, and do not attribute contrails to their source flights. To address these limitations, we present the Ground Visible Camera Contrail Sequences (GVCCS), a new open data set of contrails recorded with a ground-based all-sky camera in the visible range. Each contrail is individually labeled and tracked over time, allowing a detailed analysis of its lifecycle. The dataset contains 122 video sequences (24,228 frames) and includes flight identifiers for contrails that form above the camera. As a reference, we also propose a unified deep learning framework for contrail analysis using a panoptic segmentation model that performs semantic segmentation (contrail pixel identification), instance segmentation (individual contrail separation), and temporal tracking in a single architecture. By providing high-quality, temporally resolved annotations and a benchmark for model evaluation, our work supports improved contrail monitoring and will facilitate better calibration of physical models. This lays the groundwork for more accurate climate impact understanding and assessments.
航空的气候影响不仅包括CO2排放,而且还包括重要的非CO2效应,特别是来自天体的气候效应。这些冰云可以改变地球的辐射平衡,可能与航空CO2的升温效应相对应。基于物理的模型提供了对天体形成和气候影响的有用估计,但其准确性在很大程度上取决于大气输入数据的质量以及用于代表冰粒形成和湿驱动的持久性等复杂过程的假设。来自远程传感器的观测数据,例如卫星和地面照相机,可以用来验证和校准这些模型。然而,现有的数据集无法探索天体动态和形成的所有方面:它们通常缺乏时间跟踪,并且不会将天体对地体与天体的对比与气候影响挂钩。为了应对这些局限性,我们展示了地面可见的可变相相相相相相相相对的相近相近相近相近相近的相近相近相近的相近相近的相近数据组。每套天体模型都可以单独贴标签和追踪,从而详细分析其寿命周期。数据集包含122个视频序列(24,28框架),它们通常没有时间跟踪,我们用直径径径比的轨的轨路路路路路路路路路路段来分析。
Article 73
Title@2025-07-24 (4): Position: An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research
Title: Position: An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research | Position: Eine empirisch begründete Identifizierbarkeitstheorie beschleunigt die selbstüberwachte Lernforschung | 职位: 以活性基础的可识别性理论将加速自我监督学习研究 2504.13101v3 |
Authors (4): Patrik Reizinger, Randall Balestriero, David Klindt, Wieland Brendel
Self-Supervised Learning (SSL) powers many current AI systems. As research interest and investment grow, the SSL design space continues to expand. The Platonic view of SSL, following the Platonic Representation Hypothesis (PRH), suggests that despite different methods and engineering approaches, all representations converge to the same Platonic ideal. However, this phenomenon lacks precise theoretical explanation. By synthesizing evidence from Identifiability Theory (IT), we show that the PRH can emerge in SSL. However, current IT cannot explain SSL’s empirical success. To bridge the gap between theory and practice, we propose expanding IT into what we term Singular Identifiability Theory (SITh), a broader theoretical framework encompassing the entire SSL pipeline. SITh would allow deeper insights into the implicit data assumptions in SSL and advance the field towards learning more interpretable and generalizable representations. We highlight three critical directions for future research: 1) training dynamics and convergence properties of SSL; 2) the impact of finite samples, batch size, and data diversity; and 3) the role of inductive biases in architecture, augmentations, initialization schemes, and optimizers.
由于研究兴趣和投资的增长,SSL设计空间继续扩大。根据Platonic Expresentive Depositions(PRH),SSL的平流视图显示,尽管采用不同的方法和工程方法,所有表达方式都汇合到相同的Paltonic理想。然而,这种现象缺乏准确的理论解释。通过综合来自可辨识性理论(IT)的证据,我们表明PRF可以在SSL中出现。然而,当前的IT无法解释SSL的成功经验。为了缩小理论和实践之间的差距,我们建议将IT扩展为我们称为Singal Interificity Theory(SITH)的更广泛的理论框架(SITH),这个框架涵盖SSL整个管道。SITH将使人们能够更深入地了解SL的隐含数据假设,并推进实地学习更可解和可概括的表达方式。我们强调未来研究的三个关键方向:1) 培训动态和趋同特性;2) 定数样品、批量尺寸和数据多样性的影响;3) 建筑、扩增、初始化、优化和优化的偏差的作用。
Article 74
Title@2025-07-24 (4): A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in ECG Delineation
Title: A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in ECG Delineation | Ein Multi-Dataset-Benchmark für semi-überwachte semantische Segmentierung in EKG-Delineation | ECG 划定中半超部分解的多数据集基准 2507.18323v1 |
Authors (4): Minje Park, Jeonghwa Lim, Taehyung Yu, Sunghoon Joo
Electrocardiogram (ECG) delineation, the segmentation of meaningful waveform features, is critical for clinical diagnosis. Despite recent advances using deep learning, progress has been limited by the scarcity of publicly available annotated datasets. Semi-supervised learning presents a promising solution by leveraging abundant unlabeled ECG data. In this study, we present the first systematic benchmark for semi-supervised semantic segmentation (SemiSeg) in ECG delineation. We curated and unified multiple public datasets, including previously underused sources, to support robust and diverse evaluation. We adopted five representative SemiSeg algorithms from computer vision, implemented them on two different architectures: the convolutional network and the transformer, and evaluated them in two different settings: in-domain and cross-domain. Additionally, we propose ECG-specific training configurations and augmentation strategies and introduce a standardized evaluation framework. Our results show that the transformer outperforms the convolutional network in semi-supervised ECG delineation. We anticipate that our benchmark will serve as a foundation for advancing semi-supervised ECG delineation methods and will facilitate further research in this domain.
在临床诊断中,有意义的波形特征的分解(ECG)划界是临床诊断的关键。尽管最近通过深层学习取得了进步,但进展因公开提供的附加说明的数据集稀缺而受到限制。半监督的学习通过利用大量无标签的ECG数据提供了一个有希望的解决方案。在本研究中,我们介绍了ECG划界中半监督的语义分解(Semiseg)的第一个系统基准。我们整理和统一了多个公共数据集,包括以前未充分利用的来源,以支持稳健和多样的评价。我们采用了来自计算机愿景的五种具有代表性的Semeseg算法,在两种不同的结构上实施这些算法:同源网络和变异器,并在两种不同环境下对其进行评估:即大陆和跨大陆。此外,我们提出了ECG特定培训配置和增强战略,并引入了标准化的评价框架。我们的结果显示,变压器在半监督ECG划界中超越了革命网络。我们预计,我们的基准将成为推进半监督ECG划界方法的基础,并将促进该领域的研究。
Article 75
Title@2025-07-24 (4): I-CEE: Tailoring Explanations of Image Classification Models to User Expertise
Title: I-CEE: Tailoring Explanations of Image Classification Models to User Expertise | I-CEE: Maßgeschneiderte Erläuterungen von Bildklassifikationsmodellen zur Benutzerexpertise | I-CEE:根据用户专门知识对图像分类模型的定制解释 2312.12102v3 |
Authors (4): Yao Rong, Peizhu Qian, Vaibhav Unhelkar, Enkelejda Kasneci
Effectively explaining decisions of black-box machine learning models is critical to responsible deployment of AI systems that rely on them. Recognizing their importance, the field of explainable AI (XAI) provides several techniques to generate these explanations. Yet, there is relatively little emphasis on the user (the explainee) in this growing body of work and most XAI techniques generate “one-size-fits-all” explanations. To bridge this gap and move a step closer towards human-centered XAI, we present I-CEE, a framework that provides Image Classification Explanations tailored to User Expertise. Informed by existing work, I-CEE explains the decisions of image classification models by providing the user with an informative subset of training data (i.e., example images), corresponding local explanations, and model decisions. However, unlike prior work, I-CEE models the informativeness of the example images to depend on user expertise, resulting in different examples for different users. We posit that by tailoring the example set to user expertise, I-CEE can better facilitate users’ understanding and simulatability of the model. To evaluate our approach, we conduct detailed experiments in both simulation and with human participants (N = 100) on multiple datasets. Experiments with simulated users show that I-CEE improves users’ ability to accurately predict the model’s decisions (simulatability) compared to baselines, providing promising preliminary results. Experiments with human participants demonstrate that our method significantly improves user simulatability accuracy, highlighting the importance of human-centered XAI.
有效解释黑箱机器学习模型的决定对于负责任地部署依赖这些模型的AI系统至关重要。认识到其重要性,可解释的AI(XAI)领域提供了几种技术来作出这些解释。然而,在这一不断增长的工作体系中,相对较少强调用户(解释者),而大多数XAI技术产生“一刀切”的解释。为了缩小这一差距,并朝着以人为中心的XAI更接近一步,我们介绍了I-CEEE,这是一个为用户专门知识提供图像分类准确性解释的框架。根据现有工作,ICEE通过向用户提供一系列内容丰富的培训数据(例如,示例图像)、相应的当地解释和模式决定。然而,与以前的工作不同,I-CEE模拟模型模型模型建模了范例图象依赖用户专长的丰富性,为不同用户提供了不同有希望的范例。我们根据用户的专长,I-CEEE框架可以更好地提高用户对模型的准确度的理解和可比性。为了评估我们的方法,我们进行了详细的图像模型实验,我们用模型对用户进行了详细的实验,并且向人类用户们展示了100号的模型的精确的预测力。
Article 76
Title@2025-07-24 (4): State of Health Estimation of Batteries Using a Time-Informed Dynamic Sequence-Inverted Transformer
Title: State of Health Estimation of Batteries Using a Time-Informed Dynamic Sequence-Inverted Transformer | Zustand der Gesundheit Schätzung von Batterien mit einem zeitinformierten dynamischen Sequenz-invertierten Transformer | 使用时间化动态序列反向转换器对电池进行健康状况估计 2507.18320v1 |
Authors (4): Janak M. Patel, Milad Ramezankhani, Anirudh Deodhar, Dagnachew Birru
The rapid adoption of battery-powered vehicles and energy storage systems over the past decade has made battery health monitoring increasingly critical. Batteries play a central role in the efficiency and safety of these systems, yet they inevitably degrade over time due to repeated charge-discharge cycles. This degradation leads to reduced energy efficiency and potential overheating, posing significant safety concerns. Accurate estimation of the State of Health (SoH) of a battery is therefore essential for ensuring operational reliability and safety. Several machine learning architectures, such as LSTMs, transformers, and encoder-based models, have been proposed to estimate SoH from discharge cycle data. However, these models struggle with the irregularities inherent in real-world measurements: discharge readings are often recorded at non-uniform intervals, and the lengths of discharge cycles vary significantly. To address this, most existing approaches extract features from the sequences rather than processing them in full, which introduces information loss and compromises accuracy. To overcome these challenges, we propose a novel architecture: Time-Informed Dynamic Sequence Inverted Transformer (TIDSIT). TIDSIT incorporates continuous time embeddings to effectively represent irregularly sampled data and utilizes padded sequences with temporal attention mechanisms to manage variable-length inputs without discarding sequence information. Experimental results on the NASA battery degradation dataset show that TIDSIT significantly outperforms existing models, achieving over 50% reduction in prediction error and maintaining an SoH prediction error below 0.58%. Furthermore, the architecture is generalizable and holds promise for broader applications in health monitoring tasks involving irregular time-series data.
电池动力车辆和能源储存系统在过去十年中被迅速采用,使得电池健康监测越来越至关重要;电池在这些系统的效率和安全方面发挥着中心作用,但由于反复放电周期,电池不可避免地会随着时间推移而退化;这种退化导致能源效率降低和潜在的过热,从而引起严重的安全问题。因此,准确估计电池的健康状况(SoH)对于确保运行可靠性和安全至关重要。一些机器学习结构,如LSTMS、变压器和基于编码器的更宽泛模型,已经建议从排放周期数据中估算 SoH值。然而,这些模型与现实世界测量中固有的不规则之处作斗争:排放读数往往在非统一间隔中记录,而且排放周期长度也大不相同。为了解决这一问题,大多数现有方法从序列中提取特征,而不是全面处理电池,从而造成信息损失和损害准确性。为了克服这些挑战,我们提议了一个新的结构:时间化动态序列变换变换器(TIDSTDIIT),这些模型包含持续的时间嵌入到低于实际世界测量中固有的承诺值:排放值读数在不统一间隔间隔间隔间隔期间记录中记录,使美国航天系统数据显示不规则的降解数据减少数据,从而使现有变化模型数据在缩小数据中显示现有变压数据结构中的数据减少。
Article 77
Title@2025-07-24 (4): Regression-aware Continual Learning for Android Malware Detection
Title: Regression-aware Continual Learning for Android Malware Detection | Regressions-aware Continual Learning für Android Malware-Erkennung | Android Maware 探测 Android Maware 持续学习 2507.18313v1 |
Authors (9): Daniele Ghiani, Daniele Angioni, Giorgio Piras, Angelo Sotgiu, Luca Minnei, Srishti Gupta, Maura Pintor, Fabio Roli, Battista Biggio
Malware evolves rapidly, forcing machine learning (ML)-based detectors to adapt continuously. With antivirus vendors processing hundreds of thousands of new samples daily, datasets can grow to billions of examples, making full retraining impractical. Continual learning (CL) has emerged as a scalable alternative, enabling incremental updates without full data access while mitigating catastrophic forgetting. In this work, we analyze a critical yet overlooked issue in this context: security regression. Unlike forgetting, which manifests as a general performance drop on previously seen data, security regression captures harmful prediction changes at the sample level, such as a malware sample that was once correctly detected but evades detection after a model update. Although often overlooked, regressions pose serious risks in security-critical applications, as the silent reintroduction of previously detected threats in the system may undermine users’ trust in the whole updating process. To address this issue, we formalize and quantify security regression in CL-based malware detectors and propose a regression-aware penalty to mitigate it. Specifically, we adapt Positive Congruent Training (PCT) to the CL setting, preserving prior predictive behavior in a model-agnostic manner. Experiments on the ELSA, Tesseract, and AZ-Class datasets show that our method effectively reduces regression across different CL scenarios while maintaining strong detection performance over time.
Malware 迅速演化, 迫使机器学习( ML) 探测器不断适应。 随着反病毒供应商每天处理数十万个新样本, 数据集可以发展成数十亿个例子, 使得全面再培训不切实际。 持续学习( CL)已经成为一个可扩展的替代方案, 使得在不完全数据访问的情况下能够进行渐进更新, 而同时减轻灾难性的遗忘。 在这项工作中, 我们分析了这个方面一个重要但被忽略的问题: 安全回归。 而不是忘记, 它表现为以往所见数据的总体性能下降, 安全回归会捕捉到抽样层面的有害预测变化, 例如恶意软件样本曾经被正确检测过, 但却在模型更新后逃避检测。 尽管经常被忽视, 回归在安全关键应用中构成了严重的风险, 因为对系统中先前检测到的威胁的静态重新引入可能会破坏用户对整个更新过程的信任。 为了解决这一问题, 我们正式确定并量化基于 CL 恶意探测器的安全倒退的检测器, 并提议一个减缩的罚法。 具体地说, 我们调整 CLL 培训 (PC ) 适应C 设置, 预先预设, , 在模型中保留强的预测行为, 并同时有效测试 ASLALACT 方法 。
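The difference between forgetting and security regression described above can be shown with a per-sample count. One common way to quantify such regressions is the negative flip rate: the share of samples the old model classified correctly that the updated model gets wrong. Whether the paper uses exactly this metric is an assumption here, and the labels and predictions below are made-up toy values.

```python
# Hypothetical sketch: two model versions with identical accuracy can still
# differ at the sample level. Label 1 = malware, 0 = benign.

def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    return sum(1 for y, p in zip(y_true, y_pred) if y == p) / len(y_true)


def negative_flip_rate(y_true, old_pred, new_pred):
    """Fraction of samples the old model got right and the new model gets wrong."""
    if not (len(y_true) == len(old_pred) == len(new_pred)):
        raise ValueError("all inputs must have the same length")
    if not y_true:
        return 0.0
    flips = sum(
        1
        for y, old, new in zip(y_true, old_pred, new_pred)
        if old == y and new != y
    )
    return flips / len(y_true)


y_true   = [1, 1, 1, 0, 0, 1]
old_pred = [1, 1, 0, 0, 0, 1]   # misses the third sample
new_pred = [1, 0, 1, 0, 0, 1]   # fixes the third sample, now misses the second

old_acc = accuracy(y_true, old_pred)
new_acc = accuracy(y_true, new_pred)
nfr = negative_flip_rate(y_true, old_pred, new_pred)
print(old_acc, new_acc, nfr)
```

Both versions score the same accuracy, so an aggregate metric reports no forgetting, yet one previously detected malware sample now evades detection. That sample-level change is what a regression-aware penalty is meant to discourage.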
Article 78
Title@2025-07-24 (4): GNN-ACLP: Graph Neural Networks Based Analog Circuit Link Prediction
Title: GNN-ACLP: Graph Neural Networks Based Analog Circuit Link Prediction | GNN-ACLP: Graph Neural Networks Based Analog Circuit Link Prediction | GNN-ALLP:基于模拟电路链接预测的图表神经网络 2504.10240v4 |
Authors (9): Guanyuan Pan, Tiansheng Zhou, Bingtao Ma, Yaqi Wang, Jianxiang Zhao, Zhi Li, Yugui Lin, Pietro Lio, Shuai Wang
Circuit link prediction, which identifies missing component connections from incomplete netlists, is crucial in analog circuit design automation. However, existing methods face three main challenges: 1) Insufficient use of topological patterns in circuit graphs reduces prediction accuracy; 2) Data scarcity due to the complexity of annotations hinders model generalization; 3) Limited adaptability to various netlist formats. We propose GNN-ACLP, a graph neural networks (GNNs) based method featuring three innovations to tackle these challenges. First, we introduce the SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction) framework and achieve port-level accuracy in circuit link prediction. Second, we propose Netlist Babel Fish, a netlist format conversion tool leveraging retrieval-augmented generation (RAG) with a large language model (LLM) to improve the compatibility of netlist formats. Finally, we construct SpiceNetlist, a comprehensive dataset that contains 775 annotated circuits across 10 different component classes. Experiments demonstrate accuracy improvements of 16.08% on SpiceNetlist, 11.38% on Image2Net, and 16.01% on Masala-CHAI compared to the baseline in intra-dataset evaluation, while maintaining accuracy from 92.05% to 99.07% in cross-dataset evaluation, exhibiting robust feature transfer capabilities.
在模拟电路设计自动化方面,现有方法面临三大挑战:(1) 电路图中不适当使用地形学模式降低了预测的准确性;(2) 由于说明的复杂性而缺乏数据,妨碍了模型的简单化;(3) 对各种网络列表格式的适应性有限。我们提议GNN-ANALP, 一种基于图形神经网络(GNN-ANNNs)的方法,该方法有三项创新,以应对这些挑战。首先,我们引入SEAL(从Subgraphs、嵌入和链接预测属性中学习)框架,并在电路连接预测中实现港口一级的准确性。第二,我们提议Netlist Babel Fish, 一种使用网络列表格式转换工具,利用检索和推荐生成的生成(RAG),使用大型语言模型(LLM),以提高网络列表格式的兼容性。最后,我们构建了SpiceNetlist,这是一套综合数据集,包含10个不同组成部分的775条附加说明电路。实验显示SpiceNetlist的准确性改进了16.08%,在图像2NetNet上提高了11.38%,在Masala-CHAI上实现了16.01%,在Masala-CHAI的网络上转换工具,从稳度评估,从稳性地段,从9-creabreaccreabilvadational-palvicational-palviidaldationdationdationalitydational-dationalitydationaldationalvialvialvialvicalvicalvialvicildationdationalvicildations。
Article 79
Title@2025-07-24 (4): Variational inference for pile-up removal at hadron colliders with diffusion models
Title: Variational inference for pile-up removal at hadron colliders with diffusion models | Variationsableitung zur Stapelabfuhr an Hadron-Kollidern mit Diffusionsmodellen | 与扩散模型相撞的hadron相撞器的堆叠式清除的变异推论 2410.22074v2 |
Authors (4): Malte Algren, Tobias Golling, Christopher Pollard, John Andrew Raine
In this paper, we present a novel method for pile-up removal of $pp$ interactions using variational inference with diffusion models, called vipr. Instead of using classification methods to identify which particles are from the primary collision, a generative model is trained to predict the constituents of the hard-scatter particle jets with pile-up removed. This results in an estimate of the full posterior over hard-scatter jet constituents, which has not yet been explored in the context of pile-up removal, yielding a clear advantage over existing methods especially in the presence of imperfect detector efficiency. We evaluate the performance of vipr in a sample of jets from simulated $t\bar{t}$ events overlain with pile-up contamination. vipr outperforms softdrop and has comparable performance to puppiml in predicting the substructure of the hard-scatter jets over a wide range of pile-up scenarios.
在本文中,我们提出了一个利用与扩散模型的变异推断法(称为Vepr)堆积去除美元相互作用的新方法。我们没有使用分类法来确定哪些粒子来自一次碰撞,而是训练了一个基因模型来预测堆积式清除的硬散射粒子喷射机的成分。这导致对堆积式清除的喷射机成分的尾部和硬散射喷射机成分的完全外部部分的估计,这种估计尚未在堆积式清除中进行探讨,对现有的方法产生了明显的优势,特别是在不完善的检测器效率的情况下。我们评估了模拟的$t\bar{t}美元喷射机样本中振动在堆积式污染的表面事件上的性能。振荡柔软滑,在预测大量堆积式假设的硬散射喷射喷射机的亚结构方面,其性能与微浮化法相似。
Article 80
Title@2025-07-24 (4): PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving
Title: PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving | PRIX: Planen lernen von rohen Pixeln für autonomes Fahren Ende-zu-Ende | PRIX: 学习从Raw像素到计划用于终端到终端自治驾驶 2507.17596v2 |
Authors (4): Maciej K. Wozniak, Lianhang Liu, Yixi Cai, Patric Jensfelt
While end-to-end autonomous driving models show promising results, their practical deployment is often hindered by large model sizes, a reliance on expensive LiDAR sensors and computationally intensive BEV feature representations. This limits their scalability, especially for mass-market vehicles equipped only with cameras. To address these challenges, we propose PRIX (Plan from Raw Pixels). Our novel and efficient end-to-end driving architecture operates using only camera data, without explicit BEV representation and forgoing the need for LiDAR. PRIX leverages a visual feature extractor coupled with a generative planning head to predict safe trajectories from raw pixel inputs directly. A core component of our architecture is the Context-aware Recalibration Transformer (CaRT), a novel module designed to effectively enhance multi-level visual features for more robust planning. We demonstrate through comprehensive experiments that PRIX achieves state-of-the-art performance on the NavSim and nuScenes benchmarks, matching the capabilities of larger, multimodal diffusion planners while being significantly more efficient in terms of inference speed and model size, making it a practical solution for real-world deployment. Our work is open-source and the code will be at https://maxiuw.github.io/prix.
虽然端到端自主驱动模型显示了有希望的成果,但其实际部署往往受到巨大模型规模的阻碍,依赖昂贵的利达AR传感器和计算密集的BEV特征表示。这限制了其可扩缩性,特别是只配备相机的大众市场车辆。为了应对这些挑战,我们提议PRIX(Raw Pixels的Plan)。我们的新颖而高效的端到端驱动结构仅使用相机数据运作,而没有明确的BEV代表,也不再需要LIDAR。PRIX利用视觉特征提取器和基因规划头直接预测原生像素投入的安全轨迹。我们结构的核心组成部分是环境觉悟再校准变换器(CaRT),这是一个旨在有效提高多级视觉功能以进行更强有力的规划的新模块。我们通过全面实验证明,PRIX在NavSim和Nuscenes基准上实现了最先进的性能,与大型、多式传播规划者的能力相匹配,同时在推导速度和模型大小方面效率显著提高。我们结构的一个核心部分是现实的代码/世界部署。我们的工作将是开放的。
Article 81
Title@2025-07-24 (4): Self-Supervised Coarsening of Unstructured Grid with Automatic Differentiation
Title: Self-Supervised Coarsening of Unstructured Grid with Automatic Differentiation | Selbstüberwachte Verzahnung des unstrukturierten Gitters mit automatischer Differenzierung | 带有自动差异的无结构网格自操作粗化 2507.18297v1 |
Authors (5): Sergei Shumilin, Alexander Ryabov, Nikolay Yavich, Evgeny Burnaev, Vladimir Vanovskiy
Due to the high computational load of modern numerical simulation, there is a demand for approaches that reduce the size of discrete problems while keeping the accuracy reasonable. In this work, we present an original algorithm to coarsen an unstructured grid based on the concepts of differentiable physics. We achieve this by employing k-means clustering, automatic differentiation, and stochastic minimization algorithms. We demonstrate the performance of the designed algorithm on two PDEs: a linear parabolic equation that governs slightly compressible fluid flow in porous media, and the wave equation. Our results show that, in the considered scenarios, we reduce the number of grid points by up to a factor of 10 while preserving the dynamics of the modeled variable at the points of interest. The proposed approach can be applied to the simulation of an arbitrary system described by evolutionary partial differential equations.
由于现代数字模拟的计算负荷很高,需要采用一些方法来减少离散问题的规模,同时保持准确性合理性。在这项工作中,我们提出了一个原始算法,根据不同物理学的概念来腐蚀一个无结构的网格。我们通过使用k- means群集、自动区分和最小化算法来实现这一点。我们展示了两种PDEs设计算法的性能:一种线性抛物线方程,它调节了微压缩的多孔径流和波形方程。我们的结果显示,在考虑的假设情景中,我们把网格点的数量减到10倍,同时将模型变量动态保持在利益点上。提议的方法可以适用于以进化部分差异方程式描述的任意系统的模拟。
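The clustering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a 1D grid and plain Lloyd-style k-means; the function name `kmeans_1d` and the deterministic initialization are hypothetical choices made for this illustration. The automatic-differentiation and stochastic-minimization stages of the paper are not shown, so this is not the authors' algorithm, only the kind of point reduction it starts from.

```python
def kmeans_1d(points, k, iters=50):
    """Coarsen a 1D point set to k representative points (plain Lloyd iterations).

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    pts = sorted(points)
    n = len(pts)
    # Deterministic initialization: evenly spaced samples of the sorted points.
    centers = [pts[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            # Assign each fine-grid point to its nearest current center.
            nearest = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Fine grid of 100 points on [0, 1], coarsened by a factor of 10.
fine_grid = [i / 99 for i in range(100)]
coarse_grid = kmeans_1d(fine_grid, 10)
```

In a differentiable-physics setting, the coarse point positions would then be treated as trainable parameters and adjusted so that the coarse simulation matches the fine one at the points of interest; that part depends on details the abstract does not give.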
Article 82
Title@2025-07-24 (4): Leveraging Data Augmentation and Siamese Learning for Predictive Process Monitoring
Title: Leveraging Data Augmentation and Siamese Learning for Predictive Process Monitoring | Leveraging Data Augmentation und Siamese Learning für vorausschauende Prozessüberwachung | 利用数据增强和西亚学习来监测预测过程 2507.18293v1 |
Authors (3): Sjoerd van Straten, Alessandro Padella, Marwan Hassani
Predictive Process Monitoring (PPM) enables forecasting future events or outcomes of ongoing business process instances based on event logs. However, deep learning PPM approaches are often limited by the low variability and small size of real-world event logs. To address this, we introduce SiamSA-PPM, a novel self-supervised learning framework that combines Siamese learning with Statistical Augmentation for Predictive Process Monitoring. It employs three novel statistically grounded transformation methods that leverage control-flow semantics and frequent behavioral patterns to generate realistic, semantically valid new trace variants. These augmented views are used within a Siamese learning setup to learn generalizable representations of process prefixes without the need for labeled supervision. Extensive experiments on real-life event logs demonstrate that SiamSA-PPM achieves competitive or superior performance compared to the SOTA in both next activity and final outcome prediction tasks. Our results further show that statistical augmentation significantly outperforms random transformations and improves variability in the data, highlighting SiamSA-PPM as a promising direction for training data enrichment in process prediction.
预测过程监测(PPM)能够根据事件日志预测当前业务过程的今后事件或结果;然而,深层次的PPM方法往往受到现实世界事件日志变化性小和规模小的限制。为了解决这个问题,我们引入了SiamSA-PPM,这是一个全新的自我监督学习框架,将Siamse的学习与预测过程监测的统计强化结合起来。它采用了三种基于统计的新颖的转型方法,利用控制流的语义和经常的行为模式产生现实的、具有内在效力的新追踪变异。这些扩大的视角在Siamse学习设置中被使用,以学习一般的流程前缀,而不需要有标签的监督。关于实际活动日志的广泛实验表明,在下一个活动和最后结果预测任务中,SOTA的竞争力或优异性表现都与SOSA-PM相比。我们的结果进一步表明,统计的扩大大大优于随机变异性,突出SiamSA-PPM是过程预测中培训数据浓缩的一个很有希望的方向。
Article 83
Title@2025-07-24 (4): BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning
Title: BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning | BEAVER: Bauen von Umgebungen mit einschätzbarer Variation zur Bewertung von multi-objektiven Verstärkungslernen | BEAVER: 在环境建设中采用可评估的变数评估多目标强化学习 2507.07769v2 |
Authors (3): Ruohong Liu, Jack Umenberger, Yize Chen
Recent years have seen significant advancements in designing reinforcement learning (RL)-based agents for building energy management. While individual success is observed in simulated or controlled environments, the scalability of RL approaches in terms of efficiency and generalization across building dynamics and operational scenarios remains an open question. In this work, we formally characterize the generalization space for the cross-environment, multi-objective building energy management task, and formulate the multi-objective contextual RL problem. Such a formulation helps understand the challenges of transferring learned policies across varied operational contexts such as climate and heat convection dynamics under multiple control objectives such as comfort level and energy consumption. We provide a principled way to parameterize such contextual information in realistic building RL environments, and construct a novel benchmark to facilitate the evaluation of generalizable RL algorithms in practical building control tasks. Our results show that existing multi-objective RL methods are capable of achieving reasonable trade-offs between conflicting objectives. However, their performance degrades under certain environment variations, underscoring the importance of incorporating dynamics-dependent contextual information into the policy learning process.
近年来,在设计建筑能源管理的强化学习(RL)代理物方面取得了显著进展。虽然在模拟或受控环境中观察到个别成功,但在建筑动态和操作设想中,在效率和通用方面,RL方法的可扩展性仍是一个未决问题。在这项工作中,我们正式确定跨环境、多目标建筑能源管理任务的通用空间,并拟订多目标环境RL问题。这种表述有助于理解在多种控制目标(如舒适程度和能源消耗)下,在气候和热对流动态等不同业务背景下转移已学政策的挑战。我们提供了一个原则性方法,将这种背景信息在现实建筑环境中进行参数化,并建立一个新的基准,以便利评估实际建筑控制任务中可通用的RL算法。我们的成果显示,现有的多目标RL方法能够在相互冲突的目标之间实现合理的权衡。然而,其性能在某些环境变化下退化,强调将动态背景信息纳入政策学习过程的重要性。
Article 84
Title@2025-07-24 (4): ReSem3D: Refinable 3D Spatial Constraints via Fine-Grained Semantic Grounding for Generalizable Robotic Manipulation
Title: ReSem3D: Refinable 3D Spatial Constraints via Fine-Grained Semantic Grounding for Generalizable Robotic Manipulation | ReSem3D: Verfeinerbare 3D-Raumeinschränkungen durch feinkörnige semantische Erdung für eine generalisierbare Robotermanipulation | ReSem3D:通过精密的可通用机器人操纵的语义定位,改进3D空间限制 2507.18262v1 |
Authors (5): Chenyu Su, Weiwei Shang, Chen Qian, Fei Zhang, Shuang Cong
Semantics-driven 3D spatial constraints align high-level semantic representations with low-level action spaces, facilitating the unification of task understanding and execution in robotic manipulation. The synergistic reasoning of Multimodal Large Language Models (MLLMs) and Vision Foundation Models (VFMs) enables cross-modal 3D spatial constraint construction. Nevertheless, existing methods have three key limitations: (1) coarse semantic granularity in constraint modeling, (2) lack of real-time closed-loop planning, and (3) compromised robustness in semantically diverse environments. To address these challenges, we propose ReSem3D, a unified manipulation framework for semantically diverse environments that leverages the synergy between VFMs and MLLMs to achieve fine-grained visual grounding and dynamically construct hierarchical 3D spatial constraints for real-time manipulation. Specifically, the framework is driven by hierarchical recursive reasoning in MLLMs, which interact with VFMs to automatically construct 3D spatial constraints from natural language instructions and RGB-D observations in two stages: part-level extraction and region-level refinement. Subsequently, these constraints are encoded as real-time optimization objectives in joint space, enabling reactive behavior under dynamic disturbances. Extensive simulation and real-world experiments are conducted in semantically rich household and sparse chemical lab environments. The results demonstrate that ReSem3D performs diverse manipulation tasks under zero-shot conditions, exhibiting strong adaptability and generalization. Code and videos are available at https://resem3d.github.io.
由3D驱动的语义空间限制使高层次的语义表达与低层次行动空间相协调,促进在机器人操作中统一任务理解和执行任务,促进在机器人操作中统一任务理解和执行任务。多式大语言模型(MLLMs)和愿景基础模型(VFMs)的协同推理使跨式3D空间制约结构得以跨式3D空间构建。然而,现有方法有三个关键局限性:(1) 制约模型中粗略的语义颗粒颗粒质颗粒,(2) 缺乏实时闭路图规划,(3) 破坏语义多样性环境中的稳健性。为了应对这些挑战,我们提议ReSem3D, 一种统一的语义多样性环境操作框架,利用VFMs和MLLMsms之间的协同作用,以实现精细的视觉地面定位,动态3 动态3 动态MLLMS(LMs) 和RGB-D(RGB-D) 多样性观测,在两个阶段中自动构建3D 3级的联合提取和区域级的软体系,这些制约因素使真实的模拟环境的动态模拟演化,在真实的演化过程中演化中演进。
Article 85
Title@2025-07-24 (4): Alternative Loss Function in Evaluation of Transformer Models
Title: Alternative Loss Function in Evaluation of Transformer Models | Alternative Verlustfunktion bei der Bewertung von Transformer-Modellen | 变换模型评价中的替代损失功能 2507.16548v2 |
Authors (3): Jakub Michańków, Paweł Sakowski, Robert Ślepaczuk
The proper design and architecture of testing machine learning models, especially in their application to quantitative finance problems, is crucial. The most important aspect of this process is selecting an adequate loss function for training, validation, estimation purposes, and hyperparameter tuning. Therefore, in this research, through empirical experiments on equity and cryptocurrency assets, we apply the Mean Absolute Directional Loss (MADL) function, which is more adequate for optimizing forecast-generating models used in algorithmic investment strategies. The MADL function results are compared between Transformer and LSTM models, and we show that in almost every case, Transformer results are significantly better than those obtained with LSTM.
测试机器学习模型的恰当设计和结构,特别是在将其应用于量化融资问题时,至关重要,这一过程最重要的方面是选择适当的损失功能,用于培训、验证、估算和超参数调试,因此,在这项研究中,通过对股本和加密货币资产的经验实验,我们应用了平均绝对方向损失(MADL)功能,这一功能更适合于优化算法投资战略中使用的预测生成模型。MADL功能结果在变换器和LSTM模型之间进行了比较,我们表明,几乎在每一种情况下,变换器的结果都比LSTM取得的结果要好得多。
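The abstract does not spell out the MADL formula. The sketch below assumes the definition given in the authors' earlier work on this loss, MADL = (1/N) Σ −sign(R_i · R̂_i) · |R_i|, where R_i is the observed return and R̂_i the predicted return; the function name `madl` is a hypothetical choice for this illustration. A more negative value means the forecasts more often had the right sign on large-magnitude returns.

```python
def madl(returns, predictions):
    """Mean Absolute Directional Loss, under the assumed definition
    mean(-sign(R_i * R_hat_i) * |R_i|). Lower (more negative) is better."""
    if not returns or len(returns) != len(predictions):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for r, r_hat in zip(returns, predictions):
        product = r * r_hat
        # sign(product): +1 if the predicted direction is right, -1 if wrong, 0 if either is zero.
        sign = (product > 0) - (product < 0)
        total += -sign * abs(r)
    return total / len(returns)


# Correct direction on every observation: the loss is minus the mean absolute return.
all_correct = madl([0.02, -0.01], [1.0, -1.0])
# One correct and two wrong directions: the wrong calls on larger moves dominate.
mixed = madl([0.02, -0.01, 0.03], [0.5, 0.2, -0.1])
```

Unlike a squared-error loss, this quantity ignores the magnitude of the prediction and only rewards getting the direction right in proportion to the size of the realized move, which is why it maps naturally onto a long/short trading signal.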
Article 86
Title@2025-07-24 (4): SyncMapV2: Robust and Adaptive Unsupervised Segmentation
Title: SyncMapV2: Robust and Adaptive Unsupervised Segmentation | SyncMapV2: Robuste und adaptive unüberwachte Segmentierung | 同步马普V2: 强力和适应性不受监督的分割 2506.16297v3 |
Authors (3): Heng Zhang, Zikang Wan, Danilo Vasconcellos Vargas
Human vision excels at segmenting visual cues without the need for explicit training, and it remains remarkably robust even as noise severity increases. In contrast, existing AI algorithms struggle to maintain accuracy under similar conditions. Here, we present SyncMapV2, the first to solve unsupervised segmentation with state-of-the-art robustness. SyncMapV2 exhibits a minimal drop in mIoU, only 0.01%, under digital corruption, compared to a 23.8% drop observed in SOTA methods. This superior performance extends across various types of corruption: noise (7.3% vs. 37.7%), weather (7.5% vs. 33.8%), and blur (7.0% vs. 29.5%). Notably, SyncMapV2 accomplishes this without any robust training, supervision, or loss functions. It is based on a learning paradigm that uses self-organizing dynamical equations combined with concepts from random networks. Moreover, unlike conventional methods that require re-initialization for each new input, SyncMapV2 adapts online, mimicking the continuous adaptability of human vision. Thus, we go beyond the accurate and robust results, and present the first algorithm that can do all the above online, adapting to input rather than re-initializing. In adaptability tests, SyncMapV2 demonstrates near-zero performance degradation, which motivates and fosters a new generation of robust and adaptive intelligence in the near future.
人类的视觉在视觉分解线索方面非常出色,无需明确培训,而且即使噪音强度增加,这种优异性能仍然相当强劲。相比之下,现有的AI算法在类似条件下为保持准确性而挣扎。在这里,我们介绍SyncMapV2,这是第一个用最先进的强健性解决未经监督的分解。SyncMapV2在数字腐败下显示MIOU最小下降,只有0.01%,而数字腐败下则只有0.01%,而在SOTA方法中则有23.8%的下降。这种优异性能跨越了各种腐败类型:噪音(7.3%对37.7%)、天气(7.5%对33.8%)和模糊(7.0%对29.5 % ) 。因此,SycMapV2在没有强健的培训、监督或损失功能的情况下实现了这一点。SyncMV2基于学习模式,它使用自我组织动态方程式,同时使用随机网络的概念。此外,常规方法要求对每项新投入进行重新初始化,SycMapV2在网上进行调整,对人的视觉进行第一次的不断调整。因此,在接近于在线的适应,我们可以进行精确和感化的自我调整,对结果进行自我调整,而不能再升级。
Article 87
Title@2025-07-24 (4): Boosting Revisited: Benchmarking and Advancing LP-Based Ensemble Methods
Title: Boosting Revisited: Benchmarking and Advancing LP-Based Ensemble Methods | Revisited Boosting: Benchmarking and Advancing LP-Based Ensemble Methods | 重新审视促进:基准制定和推进基于LP的组合组合方法 2507.18242v1 |
Authors (5): Fabian Akkerman, Julien Ferry, Christian Artigues, Emmanuel Hebrard, Thibaut Vidal
Despite their theoretical appeal, totally corrective boosting methods based on linear programming have received limited empirical attention. In this paper, we conduct the first large-scale experimental study of six LP-based boosting formulations, including two novel methods, NM-Boost and QRLP-Boost, across 20 diverse datasets. We evaluate the use of both heuristic and optimal base learners within these formulations, and analyze not only accuracy, but also ensemble sparsity, margin distribution, anytime performance, and hyperparameter sensitivity. We show that totally corrective methods can outperform or match state-of-the-art heuristics like XGBoost and LightGBM when using shallow trees, while producing significantly sparser ensembles. We further show that these methods can thin pre-trained ensembles without sacrificing performance, and we highlight both the strengths and limitations of using optimal decision trees in this context.
尽管有理论上的吸引力,但完全纠正基于线性编程的提振方法却只得到有限的实证关注。 在本文中,我们对六种基于LP的提振配方进行了第一次大规模实验性研究,包括两种新型方法,即NM-Boost和QRLP-Boost,共20个不同的数据集。我们评估了这些配方中超脂和最佳基底学习者的使用情况,不仅分析了准确性,而且分析了共性、边缘分布、随时性能和超光谱灵敏度。我们表明,完全纠正方法在使用浅树时可以超越或匹配XGBoost和LightGBMM等最先进的超脂质配方,同时生产非常稀薄的聚合物。我们进一步表明,这些方法可以在不牺牲性能的情况下减少预先训练过的聚合物,我们强调在这方面使用最佳决策树的优点和局限性。
Article 88
Title@2025-07-24 (4): Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation
Title: Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation | Robustes Multi-View-Lernen durch Darstellung Fusion von Sample-Level-Achtung und Ausrichtung der simulierten Perturbation | 通过展示抽样关注层的聚合和模拟扰动的调整,通过代表方式进行强有力的多视角学习 2503.04151v2 |
Authors (5): Jie Xu, Na Zhao, Gang Niu, Masashi Sugiyama, Xiaofeng Zhu
Recently, multi-view learning (MVL) has garnered significant attention due to its ability to fuse discriminative information from multiple views. However, real-world multi-view datasets are often heterogeneous and imperfect, which usually causes MVL methods designed for specific combinations of views to lack application potential and limits their effectiveness. To address this issue, we propose a novel robust MVL method (namely RML) with simultaneous representation fusion and alignment. Specifically, we introduce a simple yet effective multi-view transformer fusion network where we transform heterogeneous multi-view data into homogeneous word embeddings, and then integrate multiple views by the sample-level attention mechanism to obtain a fused representation. Furthermore, we propose a simulated perturbation based multi-view contrastive learning framework that dynamically generates the noise and unusable perturbations for simulating imperfect data conditions. The simulated noisy and unusable data obtain two distinct fused representations, and we utilize contrastive learning to align them for learning discriminative and robust representations. Our RML is self-supervised and can also be applied for downstream tasks as a regularization. In experiments, we employ it in multi-view unsupervised clustering, noise-label classification, and as a plug-and-play module for cross-modal hashing retrieval. Extensive comparison experiments and ablation studies validate RML’s effectiveness. Code is available at https://github.com/SubmissionsIn/RML.
最近,多观点学习(MVL)因其能够将多种观点的歧视性信息融合在一起而引起人们的极大关注。然而,现实世界多观点数据集往往多种多样和不完善,通常导致为具体观点组合设计的MVL方法缺乏应用潜力并限制其效力。为解决这一问题,我们提议采用新型强健的MVL方法(即RML),同时配有代表组合和校正。具体地说,我们引入了一个简单而有效的多观点变压器融合网络,将差异多观点数据转化成单一的单词嵌入,然后将抽样关注机制的多重观点整合起来,以获得混合代表制代表制。此外,我们提议采用基于模拟扰动的多视角对比式学习框架,为模拟不完善的数据条件而动态产生噪音和不可用的扰动。模拟的杂乱和不可用数据得到两个不同的组合式表达,我们利用对比式学习将它们对齐,以学习有区别性和稳健的表达。我们的RML是自我监督的,还可以作为正则化应用于下游任务。在实验中,我们将其用于多视角无监督聚类、噪声标签分类,并作为跨模态哈希检索的即插即用模块。大量对比实验和消融研究验证了RML的有效性。代码见 https://github.com/SubmissionsIn/RML。
Article 89
Title@2025-07-24 (4): Compositional Coordination for Multi-Robot Teams with Large Language Models
Title: Compositional Coordination for Multi-Robot Teams with Large Language Models | Kompositionskoordination für Multi-Roboter-Teams mit großen Sprachmodellen | 具有大语言模式的多机器人小组的组成协调 2507.16068v2 |
Authors (5): Zhehui Huang, Guangyao Shi, Yuwei Wu, Vijay Kumar, Gaurav S. Sukhatme
Multi-robot coordination has traditionally relied on a mission-specific and expert-driven pipeline, where natural language mission descriptions are manually translated by domain experts into mathematical formulation, algorithm design, and executable code. This conventional process is labor-intensive, inaccessible to non-experts, and inflexible to changes in mission requirements. Here, we propose LAN2CB (Language to Collective Behavior), a novel framework that leverages large language models (LLMs) to streamline and generalize the multi-robot coordination pipeline. LAN2CB transforms natural language (NL) mission descriptions into executable Python code for multi-robot systems through two core modules: (1) Mission Analysis, which parses mission descriptions into behavior trees, and (2) Code Generation, which leverages the behavior tree and a structured knowledge base to generate robot control code. We further introduce a dataset of natural language mission descriptions to support development and benchmarking. Experiments in both simulation and real-world environments demonstrate that LAN2CB enables robust and flexible multi-robot coordination from natural language, significantly reducing manual engineering effort and supporting broad generalization across diverse mission types. Website: https://sites.google.com/view/lan-cb
多机器人协调历来依赖一个特派团专用和专家驱动的管道,其中自然语言任务说明由域专家人工转换成数学配制、算法设计和可执行代码。这一常规过程是劳动密集型的,非专家无法使用,无法灵活地适应任务要求的变化。在这里,我们提议使用LAN2CB(集体行为语言至集体行为),这是一个利用大型语言模式简化和普及多机器人协调管道的新框架。 LAN2CB将自然语言(NL)任务说明转换成多机器人系统可执行的 Python 代码,通过两个核心模块:(1) 任务分析,将任务描述划为行为树,和(2) 代码生成,利用行为树和结构知识库生成机器人控制代码。我们进一步引入一套自然语言任务说明数据集,以支持发展和基准化。在模拟和现实世界环境中进行的实验表明,LAN2CB使多机器人系统系统的描述能够从自然语言中实现可靠和灵活的多机器人协调,大大减少了手工工程努力,并支持了不同类型任务的一般化。 http://mexiolog/clusional orges.
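The behavior-tree representation that the Mission Analysis module is described as producing can be sketched as follows. This is a generic, simplified behavior tree (no RUNNING status), not LAN2CB's actual data structure; the node classes, the `make_action` helper, and the mission steps are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

SUCCESS, FAILURE = "success", "failure"


@dataclass
class Action:
    """Leaf node: runs a callable against the shared mission state."""
    name: str
    run: Callable[[Dict], str]

    def tick(self, state: Dict) -> str:
        return self.run(state)


@dataclass
class Sequence:
    """Succeeds only if every child succeeds, evaluated in order."""
    children: List = field(default_factory=list)

    def tick(self, state: Dict) -> str:
        for child in self.children:
            if child.tick(state) == FAILURE:
                return FAILURE
        return SUCCESS


@dataclass
class Fallback:
    """Tries children in order and succeeds on the first success."""
    children: List = field(default_factory=list)

    def tick(self, state: Dict) -> str:
        for child in self.children:
            if child.tick(state) == SUCCESS:
                return SUCCESS
        return FAILURE


def make_action(name: str, result: str) -> Action:
    """Hypothetical stand-in for a robot skill: logs its name and returns a fixed status."""
    def run(state: Dict) -> str:
        state["log"].append(name)
        return result
    return Action(name, run)


# "Go to the area and search it; if the search fails, return to base."
mission = Sequence([
    make_action("goto_area", SUCCESS),
    Fallback([
        make_action("search", FAILURE),
        make_action("return_to_base", SUCCESS),
    ]),
])
state = {"log": []}
status = mission.tick(state)
```

A tree of this kind makes the control flow of a mission explicit, which is presumably what lets a later code-generation stage map each leaf to concrete robot control code.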
Article 90
Title@2025-07-24 (4): Why Do Class-Dependent Evaluation Effects Occur with Time Series Feature Attributions? A Synthetic Data Investigation
Title: Why Do Class-Dependent Evaluation Effects Occur with Time Series Feature Attributions? A Synthetic Data Investigation | Warum wirken sich klassenabhängige Auswertungseffekte mit Zeitreihen-Feature-Attributionen aus? Eine synthetische Datenuntersuchung | 为何类依赖评价效果与时间序列特征属性是否相符? 合成数据调查 2506.11790v2 |
Authors (4): Gregor Baer, Isel Grau, Chao Zhang, Pieter Van Gorp
Evaluating feature attribution methods represents a critical challenge in explainable AI (XAI), as researchers typically rely on perturbation-based metrics when ground truth is unavailable. However, recent work reveals that these evaluation metrics can show different performance across predicted classes within the same dataset. These “class-dependent evaluation effects” raise questions about whether perturbation analysis reliably measures attribution quality, with direct implications for XAI method development and evaluation trustworthiness. We investigate under which conditions these class-dependent effects arise by conducting controlled experiments with synthetic time series data where ground truth feature locations are known. We systematically vary feature types and class contrasts across binary classification tasks, then compare perturbation-based degradation scores with ground truth-based precision-recall metrics using multiple attribution methods. Our experiments demonstrate that class-dependent effects emerge with both evaluation approaches, even in simple scenarios with temporally localized features, triggered by basic variations in feature amplitude or temporal extent between classes. Most critically, we find that perturbation-based and ground truth metrics frequently yield contradictory assessments of attribution quality across classes, with weak correlations between evaluation approaches. These findings suggest that researchers should interpret perturbation-based metrics with care, as they may not always align with whether attributions correctly identify discriminating features. By showing this disconnect, our work points toward reconsidering what attribution evaluation actually measures and developing more rigorous evaluation methods that capture multiple dimensions of attribution quality.
在可解释的AI(XAI)中,评估特征归属的方法是一个重大挑战,因为研究人员通常在缺乏地面真相时依赖以扰动为基础的衡量标准,然而,最近的工作表明,这些评估指标可以在同一数据集中显示不同预测类别的不同性能。这些“依级评价效果”提出了关于扰动分析是否可靠地衡量属性质量,对XAI方法的开发和评价可靠性产生直接影响的问题。我们调查在哪些条件下,对已知地面真相特征所在地的合成时间序列数据进行控制实验,从而产生这些依级影响。我们系统地在二进制分类任务中不同特征类型和类别对比,然后用多重归属方法将基于扰动的降解得分与基于地面的精确召回指标进行比较。我们的实验表明,在两种评价方法中,即使是在具有时间上固定特征的简单假设情况下,也会产生与不同的结果。我们发现,根据扰动和地面真相衡量标准,对不同类别之间的归属质量的评估往往产生相互矛盾的评估,而评价方法之间的相关性则很弱。这些研究结果表明,通过更精确地解释我们的工作归属性的方法,是否应该以更精确地区分地解释我们的工作分级评估。
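A perturbation-based degradation score of the kind discussed in the abstract can be sketched as below. The setup is an assumption made for illustration: the "model" is a toy function that responds only to a temporally localized feature, perturbation means replacing the k most-attributed time steps with a baseline value, and the names `degradation_score` and `toy_model` are hypothetical. The paper's actual metrics and models may differ.

```python
def degradation_score(model, series, attribution, k, baseline=0.0):
    """Drop in model output after replacing the k most-attributed time steps
    with a baseline value. A larger drop suggests the attribution found
    time steps the model relies on."""
    ranked = sorted(
        range(len(series)), key=lambda i: abs(attribution[i]), reverse=True
    )
    perturbed = list(series)
    for i in ranked[:k]:
        perturbed[i] = baseline
    return model(series) - model(perturbed)


def toy_model(series):
    """Toy classifier score: mean over a localized window (time steps 2 to 4)."""
    window = series[2:5]
    return sum(window) / len(window)


series = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
good_attribution = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]  # points at the true feature
bad_attribution = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]   # points elsewhere

good_score = degradation_score(toy_model, series, good_attribution, k=3)
bad_score = degradation_score(toy_model, series, bad_attribution, k=3)
```

Note that the score depends on how far the baseline value sits from the class's typical feature amplitude: a low-amplitude class leaves little room for the output to drop, which is one plausible route to the class-dependent effects the abstract describes.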
Article 91
Title@2025-07-24 (4): Sparse identification of nonlinear dynamics with library optimization mechanism: Recursive long-term prediction perspective
Title: Sparse identification of nonlinear dynamics with library optimization mechanism: Recursive long-term prediction perspective | Sparse Identifikation von nichtlinearen Dynamiken mit Bibliotheksoptimierungsmechanismus: Rekursive langfristige Vorhersageperspektive | 利用图书馆优化机制粗略地识别非线性动态与图书馆优化机制:递归性长期预测前景 2507.18220v1 |
Authors (7): Ansei Yonezawa, Heisei Yonezawa, Shuichi Yahagi, Itsuro Kajiwara, Shinya Kijimoto, Hikaru Taniuchi, Kentaro Murakami
The sparse identification of nonlinear dynamics (SINDy) approach can discover the governing equations of dynamical systems based on measurement data, where the dynamical model is identified as the sparse linear combination of the given basis functions. A major challenge in SINDy is the design of a library, which is a set of candidate basis functions, as the appropriate library is not trivial for many dynamical systems. To overcome this difficulty, this study proposes SINDy with library optimization mechanism (SINDy-LOM), which is a combination of the sparse regression technique and the novel learning strategy of the library. In the proposed approach, the basis functions are parametrized. The SINDy-LOM approach involves a two-layer optimization architecture: the inner-layer, in which the data-driven model is extracted as the sparse linear combination of the candidate basis functions, and the outer-layer, in which the basis functions are optimized from the viewpoint of the recursive long-term (RLT) prediction accuracy; thus, the library design is reformulated as the optimization of the parametrized basis functions. The resulting SINDy-LOM model has good interpretability and usability, as the proposed approach yields the parsimonious model. The library optimization mechanism significantly reduces user burden. The RLT perspective improves the reliability of the resulting model compared with the traditional SINDy approach that can only ensure the one-step-ahead prediction accuracy. The validity of the proposed approach is demonstrated by applying it to a diesel engine airpath system, which is a well-known complex industrial system.
对非线性动态(SINDy)的微弱识别方法可以发现以测量数据为基础的动态系统的治理方程式,其中动态模型被确定为特定基础功能的稀少线性组合。SINDIY的一个主要挑战是设计一个图书馆,这是一套候选基础功能,因为对于许多动态系统来说,适当的图书馆并不是微不足道的。为克服这一困难,本研究报告建议SINDIy使用图书馆优化机制(SINDI-LOM),这是稀薄回归技术和图书馆新颖学习战略的组合。在拟议方法中,基础功能是完全的。SINDI-LOM方法涉及一个两层优化结构:内层结构,其中数据驱动模型是作为候选人基础功能的稀薄线性组合提取的,外层,其中基础功能从循环性长期(RLT)预测准确性的观点中优化;因此,图书馆设计只能重新改写为对可辨称的平衡基础功能的优化。由此形成的SINDI-LOM模型具有良好的准确性,由此对IMI系统进行精确性分析,从而将S-RLI的精确性加以改进。
Article 92
Title@2025-07-24 (4): FedSA-GCL: A Semi-Asynchronous Federated Graph Learning Framework with Personalized Aggregation and Cluster-Aware Broadcasting
Title: FedSA-GCL: A Semi-Asynchronous Federated Graph Learning Framework with Personalized Aggregation and Cluster-Aware Broadcasting | FedSA-GCL: Ein semi-asynchrones Federated Graph Learning Framework mit personalisierter Aggregation und Cluster-Aware Broadcasting | FedSA-GCL:半同步的联邦联邦图表学习框架,配有个性化聚合和集束软件广播 2507.18219v1 |
Authors (6): Zhongzheng Yuan, Lianshuai Guo, Xunkai Li, Yinlin Zhu, Wenyu Wang, Meixia Qu
Federated Graph Learning (FGL) is a distributed learning paradigm that enables collaborative training over large-scale subgraphs located on multiple local systems. However, most existing FGL approaches rely on synchronous communication, which leads to inefficiencies and is often impractical in real-world deployments. Meanwhile, current asynchronous federated learning (AFL) methods are primarily designed for conventional tasks such as image classification and natural language processing, without accounting for the unique topological properties of graph data. Directly applying these methods to graph learning can result in semantic drift and representational inconsistency in the global model. To address these challenges, we propose FedSA-GCL, a semi-asynchronous federated framework that leverages both inter-client label distribution divergence and graph topological characteristics through a novel ClusterCast mechanism for efficient training. We evaluate FedSA-GCL on multiple real-world graph datasets using the Louvain and Metis split algorithms, and compare it against 9 baselines. Extensive experiments demonstrate that our method achieves strong robustness and outstanding efficiency, outperforming the baselines by an average of 2.92% with the Louvain split and by 3.4% with the Metis split.
联邦图表学习(FGL)是一种分布式学习模式,它使得在多个地方系统中的大型子图层上进行合作培训成为了合作性培训,但大多数现有的FGL方法依赖于同步通信,这导致效率低下,在现实世界的部署中往往不切实际。与此同时,当前非同步的联邦化学习方法主要针对图像分类和自然语言处理等常规任务设计,而没有考虑到图表数据的独特地形特性。直接将这些方法应用于图形学习可能导致全球模型的语义漂移和代表性不一致。为了应对这些挑战,我们提议FDSA-GCL是一个半同步化的联邦化框架,通过新型的GroupCast培训机制,利用客户间标签分布差异和图表表层特征。我们用Louvain和Metis的分算法对多种真实世界图形数据集进行评估,并将其与9个基线进行比较。广泛的实验表明,我们的方法取得了很强的坚固性和突出的效率,比Louin和Meva平均3.4%的基线高出2.92%。
Article 93
Title@2025-07-24 (4): The Role of the Time-Dependent Hessian in High-Dimensional Optimization
Title: The Role of the Time-Dependent Hessian in High-Dimensional Optimization | Die Rolle des Zeitabhängigen Hessen bei der hochdimensionalen Optimierung | 时间依赖的赫西安人在高多样性最佳化中的作用 2403.02418v3 |
Authors (3): Tony Bonnaire, Giulio Biroli, Chiara Cammarota
Gradient descent is commonly used to find minima in rough landscapes, particularly in recent machine learning applications. However, a theoretical understanding of why good solutions are found remains elusive, especially in strongly non-convex and high-dimensional settings. Here, we focus on the phase retrieval problem as a typical example, which has received a lot of attention recently in theoretical machine learning. We analyze the Hessian during gradient descent, identify a dynamical transition in its spectral properties, and relate it to the ability to escape rough regions in the loss landscape. When the signal-to-noise ratio (SNR) is large enough, an informative negative direction exists in the Hessian at the beginning of the descent, i.e., in the initial condition. While descending, a BBP transition in the spectrum takes place in finite time: the direction is lost, and the dynamics is trapped in a rugged region filled with marginally stable bad minima. Surprisingly, for finite system sizes, this window of negative curvature allows the system to recover the signal well before the theoretical SNR found for infinite sizes, emphasizing the central role of initialization and early-time dynamics for efficiently navigating rough landscapes.
在粗糙的景观中,特别是在最近的机器学习应用中,人们通常会使用渐渐的下降来寻找迷你。然而,对于为什么找到好的解决办法,理论上的理解仍然难以找到,特别是在强烈的非电流和高维环境中。这里,我们把阶段的回收问题作为一个典型的例子,最近理论机学学习中引起了很多注意。我们分析梯度下降期间的赫西安人,找出其光谱特性的动态转变,并将其与在损失地貌中逃离粗糙区域的能力联系起来。当信号对噪音比率(SNR)足够大的时候,赫斯尼人的信号与噪音比率(SNR)就存在信息性的消极方向,即最初处于初始状态。在下降时,BBPP在频谱上的转变是在有限的时间内发生的:方向已经丢失,动态被困在一个崎岖不平的区域内,充满了微不稳定的微小的微小的微小的微微粒子。令人惊讶的是,这个负曲线窗口使得系统能够在理论性SNR发现无限大小之前恢复信号,强调粗小的初始和早期动态的中心作用。
Article 94
Title@2025-07-24 (4): Goal-based Trajectory Prediction for improved Cross-Dataset Generalization
Title: Goal-based Trajectory Prediction for improved Cross-Dataset Generalization | Zielbasierte Trajektorie-Vorhersage für verbesserte Cross-Dataset-Verallgemeinerung | 改进交叉数据通用化的基于目标的轨迹预测 2507.18196v1 |
Authors (3): Daniel Grimm, Ahmed Abouelazm, J. Marius Zöllner
To achieve fully autonomous driving, a good understanding of the surrounding environment is necessary. In particular, predicting the future states of other traffic participants poses a non-trivial challenge. Current state-of-the-art (SotA) models already show promising results when trained on real datasets (e.g., Argoverse2, NuScenes). Problems arise when these models are deployed to new, unseen areas: performance typically drops significantly, indicating that the models lack generalization. In this work, we introduce a new Graph Neural Network (GNN) that utilizes a heterogeneous graph consisting of traffic participants and a vectorized road network. The latter is used to classify goals, i.e., endpoints of the predicted trajectories, in a multi-stage approach, leading to better generalization to unseen scenarios. We show the effectiveness of the goal selection process via cross-dataset evaluation, i.e., training on Argoverse2 and evaluating on NuScenes.
为了实现完全自主驾驶,必须很好地了解周围环境。 特别是预测其他交通参与者的未来状况将带来非三重挑战。 当前的 SotA模型在接受真实数据集培训(例如Argoverse2, NuScenes)时已经显示出有希望的结果。 当这些模型被部署到新的/不见得的地区时会出现问题。 一般来说, 性能明显下降, 表明模型缺乏概括性。 在这项工作中, 我们引入一个新的图形神经网络( GNN) , 使用由交通参与者和传导式道路网络组成的混合图。 光电池用多阶段方法对目标进行分类, 即预测轨迹的终点, 导致更好地概括到不可见的情景。 我们通过交叉数据评估, 即 Argoverse2 培训和 Nuscenes 评估, 显示目标选择过程的有效性。
Article 95
Title@2025-07-24 (4): Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning
Title: Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning | Jenseits der Low-Rank-Dekomposition: Ein Shortcut-Ansatz für effizientes On-Device-Lernen | 超越低级别分解:高效在线学习的捷径方法 2505.05086v2 |
Authors (4): Le-Trung Nguyen, Ael Quelennec, Van-Tam Nguyen, Enzo Tartaglione
On-device learning has emerged as a promising direction for AI development, particularly because of its potential to reduce latency issues and mitigate privacy risks associated with device-server communication, while improving energy efficiency. Despite these advantages, significant memory and computational constraints still represent major challenges for its deployment. Drawing on previous studies on low-rank decomposition methods that address activation memory bottlenecks in backpropagation, we propose a novel shortcut approach as an alternative. Our analysis and experiments demonstrate that our method can reduce activation memory usage, even up to $120.09\times$ compared to vanilla training, while also reducing overall training FLOPs up to $1.86\times$ when evaluated on traditional benchmarks.
尽管存在这些优势,但重大的记忆和计算限制仍然是部署这些优势的主要挑战。我们利用以前关于低级别分解方法的研究,解决反向通信中激活记忆瓶颈的问题,提出了新的捷径方法,作为替代方案。我们的分析和实验表明,我们的方法可以减少激活记忆的使用,甚至比香草培训减少120.09美元,同时在评估传统基准时,将总体培训FLOP减少至186美元。
Article 96
Title@2025-07-24 (4): A general language model for peptide identification
Title: A general language model for peptide identification | Ein allgemeines Sprachmodell für die Peptididentifikation | 铅化物识别通用语言模式 2502.15610v4 |
Authors (8): Jixiu Zhai, Tianchi Lu, Haitian Zhong, Ziyang Xu, Yuhuan Liu, Shengrui Xu, Jingwan Wang, Dan Huang
Accurate identification of bioactive peptides (BPs) and protein post-translational modifications (PTMs) is essential for understanding protein function and advancing therapeutic discovery. However, most computational methods remain limited in their generalizability across diverse peptide functions. Here, we present PDeepPP, a unified deep learning framework that integrates pretrained protein language models with a hybrid transformer-convolutional architecture, enabling robust identification across diverse peptide classes and PTM sites. We curated comprehensive benchmark datasets and implemented strategies to address data imbalance, allowing PDeepPP to systematically extract both global and local sequence features. Through extensive analyses, including dimensionality reduction and comparison studies, PDeepPP demonstrates strong, interpretable peptide representations and achieves state-of-the-art performance in 25 of the 33 biological identification tasks. Notably, PDeepPP attains high accuracy in antimicrobial (0.9726) and phosphorylation site (0.9984) identification, with 99.5% specificity in glycosylation site prediction and a substantial reduction in false negatives in antimalarial tasks. By enabling large-scale, accurate peptide analysis, PDeepPP supports biomedical research and the discovery of novel therapeutic targets for disease treatment. All code, datasets, and pretrained models are publicly available via GitHub: https://github.com/fondress/PDeepPP and Hugging Face: https://huggingface.co/fondress/PDeppPP.
精确地识别生物活性浸泡物(BPs)和蛋白质翻译后修改(PTMs)对于理解蛋白质功能和推进治疗发现至关重要,然而,大多数计算方法在多种浸泡功能中仍然有限。在这里,我们介绍了PDepPP,这是一个统一的深层次学习框架,将经过预先训练的蛋白语言模型与混合变压器-革命结构相结合,使得能够强有力地识别各种浸泡物类和PTM站点。我们制定了全面的基准数据集,并实施了解决数据不平衡的战略,使PEepP能够系统地提取全球和地方序列特征。通过广泛的分析(包括维度减少和比较研究-PDepeptePP)显示其强健、可解释的peptide表示方式,并在33项生物识别任务中的25项中实现最先进的蛋白性表现。 值得注意的是,PIPP在抗微生物学(0.9726)和磷酸化网站(0.9984)的识别方法,在血压变异点的预测中达到99.5%的特性,并在防疟中大量减少不实的负面任务。
Article 97
Title@2025-07-24 (4): ChronoSelect: Robust Learning with Noisy Labels via Dynamics Temporal Memory
Title: ChronoSelect: Robust Learning with Noisy Labels via Dynamics Temporal Memory | ChronoSelect: Robustes Lernen mit lauten Etiketten über Dynamics Temporal Memory | ChronoSect: 通过动态时空内存与新标签进行强力学习 2507.18183v1 |
Authors (5): Jianchao Wang, Qingfeng Li, Pengcheng Zheng, Xiaorong Pu, Yazhou Ren
Training deep neural networks on real-world datasets is often hampered by the presence of noisy labels, which can be memorized by over-parameterized models, leading to significant degradation in generalization performance. While existing methods for learning with noisy labels (LNL) have made considerable progress, they fundamentally suffer from static snapshot evaluations and fail to leverage the rich temporal dynamics of learning evolution. In this paper, we propose ChronoSelect (chrono denoting its temporal nature), a novel framework featuring an innovative four-stage memory architecture that compresses prediction history into compact temporal distributions. Our unique sliding update mechanism with controlled decay maintains only four dynamic memory units per sample, progressively emphasizing recent patterns while retaining essential historical knowledge. This enables precise three-way sample partitioning into clean, boundary, and noisy subsets through temporal trajectory analysis and dual-branch consistency. Theoretical guarantees prove the mechanism’s convergence and stability under noisy conditions. Extensive experiments demonstrate ChronoSelect’s state-of-the-art performance across synthetic and real-world benchmarks.
对现实世界数据集的深层神经网络的培训往往受到噪音标签的阻碍,这些标签可以通过过度参数化的模型进行记忆,从而导致一般性能的显著退化。虽然使用噪音标签的现有学习方法(LNL)取得了相当大的进展,但它们基本上受到静态快照评估的影响,未能利用学习演变的丰富时间动态。在本文中,我们提议ChronoSelect(chrono detno devention its Time tyal situal),这是一个具有创新的四阶段记忆结构的新框架,将历史预测压缩成紧凑的时间分布。我们独特的有控制衰败的滑动更新机制每个样本只保留四个动态的存储器,逐步强调最近的模式,同时保留基本的历史知识。这通过时间轨迹分析以及双曲线的一致性,使得精确的三向样本分割成为清洁、边界和噪音的子集。理论保证证明了该机制在噪音条件下的趋同和稳定性。大规模实验展示了CronoSweite在合成和现实世界基准下的最新性表现。
Article 98
Title@2025-07-24 (4): Statistical Runtime Verification for LLMs via Robustness Estimation
Title: Statistical Runtime Verification for LLMs via Robustness Estimation | Statistische Laufzeitprüfung für LLMs mittels Robustheitsschätzung | 通过强力估计法对LLMs进行统计运行时间校验 2504.17723v2 |
Authors (3): Natan Levy, Adiel Ashrov, Guy Katz
Adversarial robustness verification is essential for ensuring the safe deployment of Large Language Models (LLMs) in runtime-critical applications. However, formal verification techniques remain computationally infeasible for modern LLMs due to their exponential runtime and white-box access requirements. This paper presents a case study adapting and extending the RoMA statistical verification framework to assess its feasibility as an online runtime robustness monitor for LLMs in black-box deployment settings. Our adaptation of RoMA analyzes confidence score distributions under semantic perturbations to provide quantitative robustness assessments with statistically validated bounds. Our empirical validation against formal verification baselines demonstrates that RoMA achieves comparable accuracy (within 1% deviation), and reduces verification times from hours to minutes. We evaluate this framework across semantic, categorial, and orthographic perturbation domains. Our results demonstrate RoMA’s effectiveness for robustness monitoring in operational LLM deployments. These findings point to RoMA as a potentially scalable alternative when formal methods are infeasible, with promising implications for runtime verification in LLM-based systems.
对于确保将大语言模型(LLMs)安全地用于运行中的关键时间应用程序来说,对自动稳健性核查至关重要;然而,由于现代LMs的快速运行时间和白箱访问要求,正式的核查技术在计算上仍然无法对现代LMs具有适用性;本文件介绍了一个案例研究,以调整和扩大RoMA统计核查框架,评估其在黑箱部署环境中作为LLMs在线运行时间稳健性监测的可行性;我们对RoMA的调整,在语义扰动下分析信任分数分布情况,以提供具有统计验证界限的量化稳健性评估;我们对照正式核查基线进行的经验验证表明,RoMA的准确性相当(偏差1),并将核查时间从小时缩短到分钟;我们从语义、分类和方位扰动领域评估这一框架。我们的结果显示,在应用LMMs部署中,RoMA对稳健性监测的有效性。这些结果指出,在正式方法行不通的情况下,RoMA是一种可能推广的替代方法,对LM系统进行实时核查具有潜在影响。
Article 99
Title@2025-07-24 (4): SDSC:A Structure-Aware Metric for Semantic Signal Representation Learning
Title: SDSC:A Structure-Aware Metric for Semantic Signal Representation Learning | SDSC:A Structure-Aware Metric for Semantic Signal Representative Learning | SDSC:用于语义信号代言学习的结构-孔径计量仪 2507.14516v2 |
Authors (2): Jeyoung Lee, Hochul Kang
We propose the Signal Dice Similarity Coefficient (SDSC), a structure-aware metric function for time series self-supervised representation learning. Most self-supervised learning (SSL) methods for signals adopt distance-based objectives such as mean squared error (MSE), which are sensitive to amplitude, invariant to waveform polarity, and unbounded in scale. These properties hinder semantic alignment and reduce interpretability. SDSC addresses this by quantifying structural agreement between temporal signals based on the intersection of signed amplitudes, derived from the Dice Similarity Coefficient (DSC). Although SDSC is defined as a structure-aware metric, it can be used as a loss by subtracting it from 1 and applying a differentiable approximation of the Heaviside function for gradient-based optimization. A hybrid loss formulation is also proposed to combine SDSC with MSE, improving stability and preserving amplitude where necessary. Experiments on forecasting and classification benchmarks demonstrate that SDSC-based pre-training achieves comparable or improved performance over MSE, particularly in in-domain and low-resource scenarios. The results suggest that structural fidelity in signal representations enhances the semantic representation quality, supporting the consideration of structure-aware metrics as viable alternatives to conventional distance-based methods.
我们提议使用信号极相近系数(SDSC),这是对时间序列自监督的代表学习的一种结构认知度功能,用于时间序列自我监督的代表学习。大多数自学(SSL)信号的方法通常采用基于距离的目标,例如平均正方差(MSE),这些对振幅敏感,对波形极差不易,而且规模上没有限制。这些特性妨碍语义一致性和减少可解释性。SDSC通过量化基于来自Dice相近系数(DSC)的已签字数字交叉点的时间信号之间的结构协议来解决这个问题。虽然SDSC的定义是结构自觉的衡量标准,但SDSC通常采用从1中减去,对梯度优化适用海维端功能的不同近似值,从而可以用作一种损失。还提议采用混合损失配方,将SDSC与MSE结合起来,提高稳定性,必要时保持振荡性。基于预测和分类基准的实验表明,SDSC培训前的成绩与MSE相比是可比的或改进的,特别是在多端和低层代表(DSC),但SDSC被定义为结构结构结构,支持了常规代表的可靠结构。建议提高结构结构结构。
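A minimal sketch of a Dice-style overlap on signed amplitudes, to make the SDSC abstract concrete. The abstract does not give the formula, so everything here is an assumption: the function names `sdsc` and `sdsc_loss`, the rule that only same-sign samples contribute to the intersection, and the `eps` stabiliser are all hypothetical. The hard sign check below is the non-differentiable form; the paper replaces it with a smooth Heaviside approximation for training.

```python
def sdsc(a, b, eps=1e-12):
    """Dice-style agreement of two equal-length signals (assumed form).

    Samples contribute to the intersection only when both signals have
    the same sign at that time step; the shared amplitude is the smaller
    of the two magnitudes.
    """
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    intersection = 0.0
    for x, y in zip(a, b):
        if x * y > 0:  # same polarity at this time step
            intersection += min(abs(x), abs(y))
    total = sum(abs(x) for x in a) + sum(abs(y) for y in b)
    return 2.0 * intersection / (total + eps)


def sdsc_loss(a, b):
    """Loss form described in the abstract: one minus the metric."""
    return 1.0 - sdsc(a, b)
```

Under these assumptions the score is bounded in [0, 1]: identical signals score close to 1, and a polarity-flipped copy scores 0, which is the behaviour an MSE objective cannot express on a bounded scale.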
Article 100
Title@2025-07-24 (4): GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar
Title: GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar | GeoAvatar: Adaptive geometrische Gaussian Splatting für 3D-Kopf Avatar | GeoAvatar: 3D Avatar 头的适应性几何高山喷涂 2507.18155v1 |
Authors (5): SeungJun Moon, Hah Min Lew, Seungeun Lee, Ji-Su Kang, Gyeong-Moon Park
Despite recent progress in 3D head avatar generation, balancing identity preservation, i.e., reconstruction, with novel poses and expressions, i.e., animation, remains a challenge. Existing methods struggle to adapt Gaussians to varying geometrical deviations across facial regions, resulting in suboptimal quality. To address this, we propose GeoAvatar, a framework for adaptive geometrical Gaussian Splatting. GeoAvatar leverages Adaptive Pre-allocation Stage (APS), an unsupervised method that segments Gaussians into rigid and flexible sets for adaptive offset regularization. Then, based on mouth anatomy and dynamics, we introduce a novel mouth structure and the part-wise deformation strategy to enhance the animation fidelity of the mouth. Finally, we propose a regularization loss for precise rigging between Gaussians and 3DMM faces. Moreover, we release DynamicFace, a video dataset with highly expressive facial motions. Extensive experiments show the superiority of GeoAvatar compared to state-of-the-art methods in reconstruction and novel animation scenarios.
尽管最近在3D头骨变形方面有所进展,但身份保护平衡,即重建,以新的面容和表达方式,即动画,仍然是一项挑战。现有的方法很难使高斯人适应面部区域不同的几何偏差,导致质量不理想。为了解决这个问题,我们提议GeoAvatar,一个适应性几何高斯面板框架。GeoAvatar利用适应性前定位阶段(APS)这一不受监督的方法,使高斯人成为适应性调整的僵硬和灵活套件。然后,根据口腔解剖和动态,我们推出一种新口结构和部分变形战略,以加强口腔的动动真伪性。最后,我们提议对高斯人和3DMM脸部面部的精确调整进行正规化损失。此外,我们发布了具有高度表情动作的视频数据集“动态法斯”。广泛的实验显示,GeoAvatar的优势与重建和新动动动动画情景中的最新方法相比。
Article 101
Title@2025-07-24 (4): When Noisy Labels Meet Class Imbalance on Graphs: A Graph Augmentation Method with LLM and Pseudo Label
Title: When Noisy Labels Meet Class Imbalance on Graphs: A Graph Augmentation Method with LLM and Pseudo Label | Wenn geräuschvolle Etiketten die Klassenungleichgewichte auf Graphen treffen: Eine grafische Augmentationsmethode mit LLM und Pseudo-Label | 当噪音标签在图表上达到类平衡时:与LLM和Pseudo标签的图表放大法 2507.18153v1 |
Authors (6): Riting Xia, Rucong Wang, Yulin Liu, Anchen Li, Xueyan Liu, Yan Zhang
Class-imbalanced graph node classification is a practical yet underexplored research problem. Although recent studies have attempted to address this issue, they typically assume clean and reliable labels when processing class-imbalanced graphs. This assumption often violates the nature of real-world graphs, where labels frequently contain noise. Given this gap, this paper systematically investigates robust node classification for class-imbalanced graphs with noisy labels. We propose GraphALP, a novel Graph Augmentation framework based on Large language models (LLMs) and Pseudo-labeling techniques. Specifically, we design an LLM-based oversampling method to generate synthetic minority nodes, producing label-accurate minority nodes to alleviate class imbalance. Based on the class-balanced graphs, we develop a dynamically weighted pseudo-labeling method to obtain high-confidence pseudo labels to reduce label noise ratio. Additionally, we implement a secondary LLM-guided oversampling mechanism to mitigate potential class distribution skew caused by pseudo labels. Experimental results show that GraphALP achieves superior performance over state-of-the-art methods on class-imbalanced graphs with noisy labels.
分类平衡的图形节点分类是一个实用的、但探索不足的研究问题。 尽管最近的研究试图解决这个问题, 它们通常在处理类平衡的图表时假定使用清洁和可靠的标签。 这一假设往往违反真实世界图的性质, 标签经常含有噪音。 鉴于这一差距, 本文系统地调查使用噪音标签的分类平衡图的稳健节点分类。 我们提议了基于大语言模型( LLLM) 和 Pseudo 标签技术的新颖的图表强化框架GreatALP 。 具体地说, 我们设计了一种基于 LLM 的过度采样方法来生成合成少数群体节点, 产生标签精确的少数群体节点来缓解类不平衡。 根据分类平衡图, 我们开发了一种动态加权假标签方法, 以获得高自信的假标签, 以降低标签的噪音比率。 此外, 我们实施了二级LMM 指导的过度采样机制, 以缓解由假标签造成的可能等级分布。 实验结果显示, GIAPALP在类平衡性图上, 优于州- Artal- lagal- lags。
Article 102
Title@2025-07-24 (4): Robust Non-adaptive Group Testing under Errors in Group Membership Specifications
Title: Robust Non-adaptive Group Testing under Errors in Group Membership Specifications | Robuste, nicht adaptive Gruppenprüfung unter Fehlern in den Gruppenmitgliedschaftsspezifikationen | 根据集团成员类别规格错误进行强力非适应性小组测试 2409.05345v2 |
Authors (4): Shuvayan Banerjee, Radhendushka Srivastava, James Saunderson, Ajit Rajwade
Given $p$ samples, each of which may or may not be defective, group testing (GT) aims to determine their defect status by performing tests on $n < p$ 'groups', where a group is formed by mixing a subset of the $p$ samples. Assuming that the number of defective samples is very small compared to $p$, GT algorithms have provided excellent recovery of the status of all $p$ samples with even a small number of groups. Most existing methods, however, assume that the group memberships are accurately specified. This assumption may not always be true in all applications, due to various resource constraints. Such errors could occur, e.g., when a technician, preparing the groups in a laboratory, unknowingly mixes together an incorrect subset of samples as compared to what was specified. We develop a new GT method, the Debiased Robust Lasso Test Method (DRLT), that handles such group membership specification errors. The proposed DRLT method is based on an approach to debias, or reduce the inherent bias in, estimates produced by Lasso, a popular and effective sparse regression technique. We also provide theoretical upper bounds on the reconstruction error produced by our estimator. Our approach is then combined with two carefully designed hypothesis tests respectively for (i) the identification of defective samples in the presence of errors in group membership specifications, and (ii) the identification of groups with erroneous membership specifications. The DRLT approach extends the literature on bias mitigation of statistical estimators such as the LASSO, to handle the important case when some of the measurements contain outliers, due to factors such as group membership specification errors. We present numerical results which show that our approach outperforms several baselines and robust regression techniques for identification of defective samples as well as erroneously specified groups.
以美元为单位,每个样本都可能有缺陷或可能没有缺陷,因此,团体测量(GT)的目的是通过对美元 < p$`组’进行测试来确定它们的缺陷状况,因为“组”是一组通过混合美元样品的子集组成的。假设有缺陷的样品数量与美元美元相比非常小,GT算法为所有p美元样品的状况提供了极好的恢复,即使有少量的组别,但大多数现有方法都假定该组成员是准确的。由于各种资源限制,这一假设可能并非在所有应用中都是正确的。例如,当一个技术人员在实验室准备这些组时,这种错误会发生,与美元样品的一组混杂在一起,而与美元比较的样品不正确。我们开发了新的GT方法,即Debised Robust Lasso测试方法(DRLTT方法),处理这类组成员规格错误。拟议的DRT方法基于一种降低误差的方法,或减少由Lasso产生的内在偏差,即精确的精确和精确的回归技术。我们作为成员资格测试中某些组的数值,我们后来的精确的精确的精确的精确的标定的标,我们作为成员的标定的标值的标值检验方法。我们分别提供。 我们的排序的顺序的顺序的标值检验方法,我们作为两个的标值的标值的标值的标值的标值的标值的标值的标值的标值的标值,我们分别提供了一种精确的标值的标值的标值的标值的标值的标值的标值,我们的标值的标值的标值的标值的标值。
Article 103
Title@2025-07-24 (4): Neuromorphic Computing for Embodied Intelligence in Autonomous Systems: Current Trends, Challenges, and Future Directions
Title: Neuromorphic Computing for Embodied Intelligence in Autonomous Systems: Current Trends, Challenges, and Future Directions | Neuromorphes Computing für körpereigene Intelligenz in autonomen Systemen: Aktuelle Trends, Herausforderungen und Zukunftsrichtungen | 自治区内渗透情报的神经元化计算:当前趋势、挑战和未来方向 2507.18139v1 |
Authors (2): Alberto Marchisio, Muhammad Shafique
The growing need for intelligent, adaptive, and energy-efficient autonomous systems across fields such as robotics, mobile agents (e.g., UAVs), and self-driving vehicles is driving interest in neuromorphic computing. By drawing inspiration from biological neural systems, neuromorphic approaches offer promising pathways to enhance the perception, decision-making, and responsiveness of autonomous platforms. This paper surveys recent progress in neuromorphic algorithms, specialized hardware, and cross-layer optimization strategies, with a focus on their deployment in real-world autonomous scenarios. Special attention is given to event-based dynamic vision sensors and their role in enabling fast, efficient perception. The discussion highlights new methods that improve energy efficiency, robustness, adaptability, and reliability through the integration of spiking neural networks into autonomous system architectures. We integrate perspectives from machine learning, robotics, neuroscience, and neuromorphic engineering to offer a comprehensive view of the state of the field. Finally, emerging trends and open challenges are explored, particularly in the areas of real-time decision-making, continual learning, and the development of secure, resilient autonomous systems.
日益需要智能、适应性和节能自主系统,如机器人、移动剂(如无人驾驶飞行器)和自驾车辆等各个领域的智能、适应性和节能自主系统,这促使人们对神经突变计算产生兴趣。通过从生物神经系统的灵感,神经突变方法为增强自主平台的感知、决策和反应能力提供了有希望的途径。本文对神经变形算法、专门硬件和跨层优化战略的最新进展进行了调查,重点是在现实世界自主情景中的部署。特别关注基于事件的动态视觉传感器及其在促成快速、高效感知方面的作用。讨论突出了通过将螺旋神经网络纳入自主系统结构来提高能效、稳健性、适应性和可靠性的新方法。我们综合了机器学习、机器人、神经科学和神经变形工程的观点,以全面了解实地状况。最后,探讨了新出现的趋势和公开挑战,特别是在实时决策、持续学习以及发展安全、有弹性的自主系统等领域。
Article 104
Title@2025-07-24 (4): DAA*: Deep Angular A Star for Image-based Path Planning
Title: DAA*: Deep Angular A Star for Image-based Path Planning | DAA*: Deep Angular Ein Stern für bildbasierte Pfadplanung | DAA*:基于图像的路径规划深角A星 2507.09305v3 |
Authors (1): Zhiwei Xu
Path smoothness is often overlooked in path imitation learning from expert demonstrations. In this paper, we introduce a novel learning method, termed deep angular A* (DAA*), by incorporating the proposed path angular freedom (PAF) into A* to improve path similarity through adaptive path smoothness. The PAF aims to explore the effect of move angles on path node expansion by finding the trade-off between their minimum and maximum values, allowing for high adaptiveness for imitation learning. DAA* improves path optimality by closely aligning with the reference path through joint optimization of path shortening and smoothing, which correspond to heuristic distance and PAF, respectively. Through comprehensive evaluations on 7 datasets, including 4 maze datasets, 2 video-game datasets, and a real-world drone-view dataset containing 2 scenarios, we demonstrate remarkable improvements of our DAA* over neural A* in path similarity between the predicted and reference paths with a shorter path length when the shortest path is plausible, improving by 9.0% SPR, 6.9% ASIM, and 3.9% PSIM. Furthermore, when jointly learning pathfinding with both path loss and path probability map loss, DAA* significantly outperforms the state-of-the-art TransPath by 6.3% SPR, 6.0% PSIM, and 3.7% ASIM. We also discuss the minor trade-off between path optimality and search efficiency where applicable. Our code and model weights are available at https://github.com/zwxu064/DAAStar.git.
在专家演示的模拟学习中,路路样的光滑常常被忽略。 在本文中,我们引入了一种新型学习方法,称为深角A* (DAA),将拟议的角自由路径(PAF)纳入A,通过适应性路径平滑,改善路径的相似性。PAF旨在探索路径节点扩展上移动角度的效果,找到最小值和最大值之间的取舍,允许高适应性模仿学习。DAA* 通过联合优化缩短路径和平滑路径,分别与超光度距离和PAF(DAA)相匹配,从而改进了路径的最佳性。在7个数据集的全面评估中,包括4个迷你数据集、2个视频游戏数据集,以及包含2个假景情景的真实世界无人机视图数据集,我们展示了我们DA 相对于神经A* 在预测和参考模式之间的路径相似性,在最短路径看似的情况下,通过9.0%的改进,6.9%的SPRIM和3.9%的 PSIM* 的全面评价,在共同研究路径上,S-SIM-ral-ral-ral-ral-ral-ral-ral-ral-r-ral-ral-ral-ral-ral-ral-ral-ral-ral-ral-ral-r-r-r-rb_
Article 105
Title@2025-07-24 (4): TOC-UCO: a comprehensive repository of tabular ordinal classification datasets
Title: TOC-UCO: a comprehensive repository of tabular ordinal classification datasets | TOC-UCO: ein umfassendes Repository von tabellarischen Klassifikationsdatensätzen | TOC-UCO:表格格式分类数据集综合储存库 2507.17348v2 |
Authors (6): Rafael Ayllón-Gavilán, David Guijo-Rubio, Antonio Manuel Gómez-Orellana, Francisco Bérchez-Moreno, Víctor Manuel Vargas-Yun, Pedro A. Gutiérrez
An ordinal classification (OC) problem corresponds to a special type of classification characterised by the presence of a natural order relationship among the classes. This type of problem can be found in a number of real-world applications, motivating the design and development of many ordinal methodologies over the last years. However, it is important to highlight that the development of the OC field suffers from one main disadvantage: the lack of a comprehensive set of datasets on which novel approaches to the literature have to be benchmarked. In order to approach this objective, this manuscript from the University of Córdoba (UCO), which has previous experience in the OC field, provides the literature with a publicly available repository of tabular data for a robust validation of novel OC approaches, namely TOC-UCO (Tabular Ordinal Classification repository of the UCO). Specifically, this repository includes a set of 46 tabular ordinal datasets, preprocessed under a common framework and ensured to have a reasonable number of patterns and an appropriate class distribution. We also provide the sources and preprocessing steps of each dataset, along with details on how to benchmark a novel approach using the TOC-UCO repository. For this, indices for 30 different randomised train-test partitions are provided to facilitate the reproducibility of the experiments.
星系分类(OC)问题相当于一种特殊类型的分类,其特点是各类别之间存在自然秩序关系,这种类型的问题见于若干现实世界应用中,在过去几年中鼓励设计和开发许多正统方法,但必须强调,OC字段的发展存在一个主要缺点:缺乏一套综合的数据集,必须据此对文献采取新的方法作为基准;为了实现这一目标,C’o Crdoba大学(UCO)的这份手稿在OC领域具有以往的经验,为文献提供了一个公开的表格数据储存库,用于有力验证新的OC方法,即TOC-UCO(UCO卫星分类储存库)。具体地说,这个储存库包括一套46美元的表格或正本数据集,在共同框架内预先处理,并确保有合理数量的模式和适当的分类分配。我们还提供了每一数据集的来源和预处理步骤,以及如何为使用TRAC-O-CO的模型来作为基准,从而为使用TRAC-RO-S-SARVAR提供该模型的随机性模型。
Article 106
Title@2025-07-24 (4): Maximizing Prefix-Confidence at Test-Time Efficiently Improves Mathematical Reasoning
Title: Maximizing Prefix-Confidence at Test-Time Efficiently Improves Mathematical Reasoning | Maximierung von Prefix-Konfidenz bei Test-Time verbessert mathematische Reasoning effizient | 使试验时间有效改进数学理由的预设信息最大化 2507.18122v1 |
Authors (4): Matthias Otth, Jonas Hübotter, Ido Hakimi, Andreas Krause
Recent work has shown that language models can self-improve by maximizing their own confidence in their predictions, without relying on external verifiers or reward signals. In this work, we study the test-time scaling of language models for mathematical reasoning tasks, where the model’s own confidence is used to select the most promising attempts. Surprisingly, we find that we can achieve significant performance gains by continuing only the most promising attempt, selected by the model’s prefix-confidence. We systematically evaluate prefix-confidence scaling on five mathematical reasoning datasets: the school-level GSM8K and MATH500, and the competition-level AMC23, AIME24, and AIME25. We find that prefix-confidence scaling with prefixes of only 32 tokens achieves a better accuracy-compute trade-off than majority voting. Moreover, prefix-confidence scaling appears less susceptible than BoN to length biases. Finally, we also evaluate test-time training with prefix-confidence and find that, while outperforming the base model, it does not improve over prefix-confidence scaling.
最近的工作表明,语言模型可以自我改善,不依靠外部核查或奖励信号,最大限度地提高自己对预测的信心。在这项工作中,我们研究数学推理任务的语言模型测试时间的尺度,模型自己的信心被用于选择最有希望的尝试。令人惊讶的是,我们发现,我们只能通过继续最有希望的尝试才能取得显著的业绩收益,而这种尝试是由模型的前置信心所选择的。我们系统地评估五个数学推理数据集的测试时间培训:GSM8K和MATH500,以及竞争等级的AMC23、AIME24和AIME25。我们发现,只有32个符号的前置信心,其前置信心的缩放比多数票的缩放更准确。此外,前置信心的缩放似乎比BoN更不易受到长期的偏差。最后,我们还评估具有前置信心的测试时间培训,并发现,虽然比基本模型的缩放得更好,但不会超过前置信心的缩放。
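A small sketch of the selection step described in the abstract above: several attempts are started, each is scored by the model's confidence in its first few tokens, and only the top-scoring attempt is continued. The scoring rule used here (sum of per-token log-probabilities over a fixed-length prefix) is an assumption, as are the names `prefix_confidence` and `select_attempt`; the paper may define confidence differently. The default `prefix_len=32` mirrors the 32-token prefixes mentioned in the abstract.

```python
def prefix_confidence(token_logprobs, prefix_len=32):
    """Assumed confidence score: total log-probability of the prefix.

    `token_logprobs` holds the model's log-probability for each generated
    token of one attempt, in order. Every attempt is assumed to contain
    at least `prefix_len` tokens so that scores are comparable.
    """
    return sum(token_logprobs[:prefix_len])


def select_attempt(attempts, prefix_len=32):
    """Return the index of the attempt whose prefix the model trusts most."""
    scores = [prefix_confidence(lp, prefix_len) for lp in attempts]
    return max(range(len(attempts)), key=scores.__getitem__)


# Two hypothetical attempts: the first starts confidently, the second does not.
attempts = [[-0.5, -0.5, -3.0], [-1.0, -1.0, -0.1]]
best = select_attempt(attempts, prefix_len=2)
```

Only the chosen attempt would then be decoded to completion, which is where the compute saving over majority voting comes from: the other attempts cost only `prefix_len` tokens each.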
Article 107
Title@2025-07-24 (4): A Survey of Deep Learning for Geometry Problem Solving
Title: A Survey of Deep Learning for Geometry Problem Solving | Eine Umfrage über Deep Learning zur Lösung von Geometrieproblemen | 解决几何问题深层学习调查 2507.11936v3 |
Authors (3): Jianzhe Ma, Wenxuan Wang, Qin Jin
Geometry problem solving is a key area of mathematical reasoning, which is widely involved in many important fields such as education, mathematical ability assessment of artificial intelligence, and multimodal ability assessment. In recent years, the rapid development of deep learning technology, especially the rise of multimodal large language models, has triggered a widespread research boom. This paper provides a survey of the applications of deep learning in geometry problem solving, including (i) a comprehensive summary of the relevant tasks in geometry problem solving; (ii) a thorough review of related deep learning methods; (iii) a detailed analysis of evaluation metrics and methods; and (iv) a critical discussion of the current challenges and future directions that can be explored. Our goal is to provide a comprehensive and practical reference of deep learning for geometry problem solving to promote further developments in this field. We create a continuously updated list of papers on GitHub: https://github.com/majianz/dl4gps.
解决几何问题是数学推理的一个关键领域,它广泛涉及许多重要领域,例如教育、人工智能数学能力评估和多式联运能力评估。近年来,深层次学习技术的迅速发展,特别是多式联运大型语言模型的兴起,引发了广泛的研究繁荣。本文调查了深层次学习在解决几何问题方面的应用,包括:(一) 全面概述几何问题解决中的相关任务;(二) 彻底审查相关的深层次学习方法;(三) 详细分析评价指标和方法;(四) 批判性地讨论目前的挑战和今后可探讨的方向。我们的目标是为解决几何问题的深层次学习提供全面和实用的参考,以促进该领域的进一步发展。我们不断更新关于GitHub的文件清单:https://github.com/majianz/dl4gps。
Article 108
Title@2025-07-24 (4): VCDiag: Classifying Erroneous Waveforms for Failure Triage Acceleration
Title: VCDiag: Classifying Erroneous Waveforms for Failure Triage Acceleration | VCDiag: Klassifizierende Erroneous-Wellenformen für Ausfall-Triage-Beschleunigung | VCDiag: 失灵千兆字节加速不规则波形分类 2506.03590v3 |
Authors (7): Minh Luu, Surya Jasper, Khoi Le, Evan Pan, Michael Quinn, Aakash Tyagi, Jiang Hu
Failure triage in design functional verification is critical but time-intensive, relying on manual specification reviews, log inspections, and waveform analyses. While machine learning (ML) has improved areas like stimulus generation and coverage closure, its application to RTL-level simulation failure triage, particularly for large designs, remains limited. VCDiag offers an efficient, adaptable approach using VCD data to classify failing waveforms and pinpoint likely failure locations. In the largest experiment, VCDiag achieves over 94% accuracy in identifying the top three most likely modules. The framework introduces a novel signal selection and statistical compression approach, achieving over 120x reduction in raw data size while preserving features essential for classification. It can also be integrated into diverse Verilog/SystemVerilog designs and testbenches.
设计功能核查中的故障分类十分关键,但需要时间,依靠人工规格审查、日志检查和波形分析。虽然机器学习(ML)改善了刺激生成和覆盖关闭等领域,但其在RTL级模拟故障分类中的应用仍然有限,特别是对于大型设计而言。VCDiag利用VCD数据提供了一个高效、适应性的方法,使用VCD数据对故障的波形进行分类和确定可能的故障地点。在最大的实验中,VCDiag在确定最可能的三个模块时实现了94%的准确性。这个框架引入了一种新的信号选择和统计压缩方法,实现了原始数据尺寸的120x减少,同时保留了分类所必需的特征。它也可以被纳入不同的Verilog/SystemVerilog设计和测试箱中。
Article 109
Title@2025-07-24 (4): Generalizing Adam to Manifolds for Efficiently Training Transformers
Title: Generalizing Adam to Manifolds for Efficiently Training Transformers | Verallgemeinern von Adam zu Manifolds für effizientes Training Transformers | 将亚当推广为高效率培训变换器的处理器 2305.16901v4 |
Authors (1): Benedikt Brantner
One of the primary reasons behind the success of neural networks has been the emergence of an array of new, highly-successful optimizers, perhaps most importantly the Adam optimizer. It is widely used for training neural networks, yet notoriously hard to interpret. Lacking a clear physical intuition, Adam is difficult to generalize to manifolds. Some attempts have been made to directly apply parts of the Adam algorithm to manifolds or to find an underlying structure, but a full generalization has remained elusive. In this work a new approach is presented that leverages the special structure of the manifolds which are relevant for optimization of neural networks, such as the Stiefel manifold, the symplectic Stiefel manifold and the Grassmann manifold: all of these are homogeneous spaces and as such admit a global tangent space representation - a common vector space (Lie subspace) in which all tangent spaces can easily be represented. This global tangent space representation is used to perform all of the steps in the Adam optimizer and we are able to fully generalize the optimizer to manifolds without a projection step. The resulting algorithm is then applied to train a transformer for which orthogonality constraints are enforced up to machine precision and we observe significant speed-ups in the training process.
神经网络成功的主要原因之一是出现了一系列新的、高度成功的优化优化器,也许最重要的是亚当优化器。它被广泛用于培训神经网络,但臭名昭著地难以解释。缺乏明确的物理直觉,亚当难以概括成多种。一些尝试试图将亚当算法的某些部分直接应用于元件或寻找一个基础结构,但完全的概括化仍然难以实现。在这项工作中,提出了一种新办法,利用与神经网络优化相关的各种元件的特殊结构,如Stiefel 元件、静默式Stiefel 元件和格拉斯曼元件:所有这些空间都是同质的,因此都接受全球相近空间代表——一个共同的矢量空间(Lie 子空间),所有相近的空间都可以很容易被代表。这种全球相近空间代表用于执行亚当优化器的所有步骤,我们可以将优化器完全的元件综合到没有投影步骤。随后产生的算法被用来培养一个显著的变压速度,从而测量机器的变压或变压速度。
Article 110
Title@2025-07-24 (4): A Two-armed Bandit Framework for A/B Testing
Title: A Two-armed Bandit Framework for A/B Testing | Ein zweiarmiges Bandit-Framework für A/B-Tests | A/B测试有两武装的土匪框架 2507.18118v1 |
Authors (5): Jinjuan Wang, Qianglin Wen, Yu Zhang, Xiaodong Yan, Chengchun Shi
A/B testing is widely used in modern technology companies for policy evaluation and product deployment, with the goal of comparing the outcomes under a newly-developed policy against a standard control. Various causal inference and reinforcement learning methods developed in the literature are applicable to A/B testing. This paper introduces a two-armed bandit framework designed to improve the power of existing approaches. The proposed procedure consists of three main steps: (i) employing doubly robust estimation to generate pseudo-outcomes, (ii) utilizing a two-armed bandit framework to construct the test statistic, and (iii) applying a permutation-based method to compute the $p$-value. We demonstrate the efficacy of the proposed method through asymptotic theories, numerical experiments and real-world data from a ridesharing company, showing its superior performance in comparison to existing methods.
A/B测试在现代技术公司广泛用于政策评价和产品部署,目的是将新制订的政策下的成果与标准控制进行比较,文献中开发的各种因果推论和强化学习方法适用于A/B测试,本文件介绍了一个两武装土匪框架,目的是提高现有方法的力量,拟议程序包括三个主要步骤:(一) 采用加倍有力的估计来产生假结果,(二) 使用两武装土匪框架来构建测试统计数字,(三) 采用基于固定法的方法计算美元价值,我们通过一个搭载公司提供的无药理论、数字实验和真实世界数据,展示拟议方法的功效,显示其优于现有方法。
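Steps (i) and (iii) of the procedure above can be sketched with textbook building blocks. This is an illustration under stated assumptions, not the authors' method: the pseudo-outcome uses the standard augmented inverse-probability-weighted (AIPW) form, the bandit-based test statistic of step (ii) is not specified in the abstract and is replaced here by a plain difference in means, and all function names are hypothetical.

```python
import random


def dr_pseudo_outcomes(y, a, e, mu0, mu1):
    """Standard AIPW pseudo-outcomes (assumed form of step (i)).

    y: outcomes, a: 0/1 treatment flags, e: propensity scores P(a=1 | x),
    mu0 / mu1: fitted outcome predictions under control / treatment.
    """
    return [
        m1 - m0 + ai * (yi - m1) / ei - (1 - ai) * (yi - m0) / (1 - ei)
        for yi, ai, ei, m0, m1 in zip(y, a, e, mu0, mu1)
    ]


def diff_in_means(y, a):
    """Placeholder statistic standing in for the paper's bandit statistic."""
    treated = [yi for yi, ai in zip(y, a) if ai == 1]
    control = [yi for yi, ai in zip(y, a) if ai == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def permutation_p_value(y, a, n_perm=2000, seed=0):
    """Two-sided permutation p-value obtained by shuffling arm labels."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(y, a))
    labels = list(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # keeps arm sizes fixed, breaks any real effect
        if abs(diff_in_means(y, labels)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

The pseudo-outcome stays unbiased for the treatment effect if either the propensity model or the outcome model is correct, which is the sense in which the estimate is doubly robust.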
Article 111
Title@2025-07-24 (4): The Impact of Pseudo-Science in Financial Loans Risk Prediction
Title: The Impact of Pseudo-Science in Financial Loans Risk Prediction | Die Auswirkungen von Pseudo-Science auf die Risikovorhersage von Finanzkrediten | 假科学对金融贷款风险预测的影响 2507.16182v2 |
Authors (2): Bruno Scarone, Ricardo Baeza-Yates
We study the societal impact of pseudo-scientific assumptions for predicting the behavior of people in a straightforward application of machine learning to risk prediction in financial lending. This use case also exemplifies the impact of survival bias in loan return prediction. We analyze the models in terms of their accuracy and social cost, showing that the socially optimal model may not imply a significant accuracy loss for this downstream task. Our results are verified for commonly used learning methods and datasets. Our findings also show that there is a natural dynamic when training models that suffer survival bias where accuracy slightly deteriorates, and whose recall and precision improves with time. These results act as an illusion, leading the observer to believe that the system is getting better, when in fact the model is suffering from increasingly more unfairness and survival bias.
我们研究假科学假设的社会影响,以直接应用机器学习来预测金融贷款的风险预测,预测人们的行为。这个使用案例还表明贷款回报预测中生存偏差的影响。我们从准确性和社会成本的角度分析模型,表明社会最佳模式并不意味着这一下游任务会损失大量准确性。我们的结果通过常用的学习方法和数据集来核实。我们的研究结果还表明,当培训模型在生存偏差中遭遇到准确性稍有恶化、其回溯和精确度随着时间而提高时,就会有一种自然的动态。这些结果是一种幻觉,使观察家相信系统正在好转,而事实上,当模型正遭受越来越不公平和生存偏差的影响时。
Article 112
Title@2025-07-24 (4): On the Approximation of Stationary Processes using the ARMA Model
Title: On the Approximation of Stationary Processes using the ARMA Model | Zur Annäherung von stationären Prozessen mit dem ARMA-Modell | 使用ARMA模型的固定工艺接近情况 2408.10610v3 |
Authors (3): Anand Ganesh, Babhrubahan Bose, Anand Rajagopalan
We revisit an old problem related to Autoregressive Moving Average (ARMA) models, on quantifying and bounding the approximation error between a true stationary process $X_t$ and an ARMA model $Y_t$. We take the transfer function representation of an ARMA model and show that the associated $L^{\infty}$ norm provides a valid alternate norm that controls the $L^2$ norm and has structural properties comparable to the cepstral norm. We show that a certain subspace of stationary processes, which includes ARMA models, forms a Banach algebra under the $L^{\infty}$ norm that respects the group structure of $H^{\infty}$ transfer functions. The natural definition of invertibility in this algebra is consistent with the original definition of ARMA invertibility, and generalizes better to non-ARMA processes than Wiener’s $\ell^1$ condition. Finally, we calculate some explicit approximation bounds in the simpler context of continuous transfer functions, and critique some heuristic ideas on Padé approximations and parsimonious models.
我们重新审视了与自动递减平均移动(ARMA)模型有关的一个老问题,即如何量化和约束真实固定过程$x_t$和ARMA模型$Y_t$之间的近似误差。我们采用ARMA模型的转移函数代表,并表明相关的$Linfty}美元规范提供了一个有效的替代规范,控制了$L2美元规范,并具有与 cepstral 规范相类似的结构属性。我们发现,包括ARMA模型在内的固定过程的某个子空间,在$Linfty}$的规范下形成一个Banach代数,以尊重$Hinfty}$的组合结构转移函数。这一代数中的不可翻转性自然定义与ARMA的原始不可转性定义相一致,并且比Wiener的$\ell_1美元条件更适用于非ARMA进程。最后,我们计算了一些更简单的连续转移功能背景下的明确的近似界限,并批评了Pad’近似值和parsicious模型上的一些超度概念。
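To make the $L^{\infty}$ norm on transfer functions concrete, the sketch below evaluates the ARMA transfer function $H(z) = \theta(z)/\phi(z)$ on the unit circle and approximates the sup-norm distance between two models on a frequency grid. The sign convention ($\phi(z) = 1 - \sum_k \phi_k z^k$, $\theta(z) = 1 + \sum_k \theta_k z^k$), the grid approximation, and the function names are assumptions for illustration; the paper's analysis is not grid-based. The AR polynomial is assumed to have no roots on the unit circle.

```python
import cmath
import math


def transfer(ar, ma, omega):
    """ARMA transfer function at frequency omega (assumed sign convention)."""
    z = cmath.exp(-1j * omega)
    numerator = 1 + sum(t * z ** (k + 1) for k, t in enumerate(ma))
    denominator = 1 - sum(p * z ** (k + 1) for k, p in enumerate(ar))
    return numerator / denominator


def linf_distance(model_a, model_b, n_grid=4096):
    """Grid approximation of sup over omega of |H_a - H_b|.

    Each model is a pair (ar_coefficients, ma_coefficients).
    """
    worst = 0.0
    for j in range(n_grid):
        omega = 2 * math.pi * j / n_grid
        gap = abs(transfer(*model_a, omega) - transfer(*model_b, omega))
        worst = max(worst, gap)
    return worst


# AR(1) with coefficient 0.5 versus white noise (no AR or MA terms).
ar1 = ([0.5], [])
white_noise = ([], [])
distance = linf_distance(ar1, white_noise)
```

For this pair the gap is $|0.5z/(1 - 0.5z)|$, which peaks at $\omega = 0$ with value 1, so the grid value matches the analytic supremum.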
Article 113
Title@2025-07-24 (4): Agentic AI framework for End-to-End Medical Data Inference
Title: Agentic AI framework for End-to-End Medical Data Inference | Agentische KI-Framework für Ende-zu-Ende medizinische Datenableitung | 最终至最终医疗数据推断的AA AA 框架框架 2507.18115v1 |
Authors (5): Soorya Ram Shimgekar, Shayan Vassef, Abhay Goyal, Navin Kumar, Koustuv Saha
Building and deploying machine learning solutions in healthcare remains expensive and labor-intensive due to fragmented preprocessing workflows, model compatibility issues, and stringent data privacy constraints. In this work, we introduce an Agentic AI framework that automates the entire clinical data pipeline, from ingestion to inference, through a system of modular, task-specific agents. These agents handle both structured and unstructured data, enabling automatic feature selection, model selection, and preprocessing recommendation without manual intervention. We evaluate the system on publicly available datasets from geriatrics, palliative care, and colonoscopy imaging. For example, in the case of structured data (anxiety data) and unstructured data (colonoscopy polyps data), the pipeline begins with file-type detection by the Ingestion Identifier Agent, followed by the Data Anonymizer Agent, which ensures privacy compliance: the data type is identified first and the data are then anonymized. The Feature Extraction Agent identifies features using an embedding-based approach for tabular data, extracting all column names, and a multi-stage MedGemma-based approach for image data, which infers modality and disease name. These features guide the Model-Data Feature Matcher Agent in selecting the best-fit model from a curated repository. The Preprocessing Recommender Agent and Preprocessing Implementor Agent then apply tailored preprocessing based on data type and model requirements. Finally, the Model Inference Agent runs the selected model on the uploaded data and generates interpretable outputs using tools like SHAP, LIME, and DETR attention maps. By automating these high-friction stages of the ML lifecycle, the proposed framework reduces the need for repeated expert intervention, offering a scalable, cost-efficient pathway for operationalizing AI in clinical environments.
由于处理前工作流程支离破碎、模型兼容问题和严格的数据隐私限制,在保健领域建设和部署机器学习解决方案仍然昂贵和劳动密集型。在这项工作中,我们引入了一个代理AI框架,将整个临床数据管道从摄入到推断自动化,通过模块化、任务特定代理器系统,从吸收到推断。这些代理器处理结构化和非结构化数据,允许自动选择特征、模式选择和未经人工干预的预处理建议。我们评估了来自老年医学、缓和护理和结肠镜图像的公开数据集系统。例如,在结构化数据(焦虑数据)和无结构化的数据(结肠镜化聚谱数据数据数据数据数据)方面,我们引入了一种代理AI框架,通过接受感化识别器检测文件类型,确保隐私的合规性,我们首先确定数据类型,然后在不进行人工干预的情况下对数据进行匿名处理。我们使用基于嵌入模型的方法,提取所有专栏名称,以及基于多级MDGemma-emma的方法,用于图像数据数据流流数据,在模型中,并用最精选的机序流流流化数据流化数据格式和流化数据流化数据流化工具提供。
Article 114
Title@2025-07-24 (4): Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control I: Penalty Approach
Title: Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control I: Penalty Approach | Nonconvex Optimization Framework for Group-Spasse Feedback Linear-Quadratic Optimal Control I: Strafansatz | 群分反馈线性水量最佳最佳控制一:惩罚办法的优化框架 2507.18114v1 |
Authors (3): Lechen Feng, Xun Li, Yuan-Hua Ni
This paper develops a unified nonconvex optimization framework for the design of group-sparse feedback controllers in infinite-horizon linear-quadratic (LQ) problems. We address two prominent extensions of the classical LQ problem: the distributed LQ problem with fixed communication topology (DFT-LQ) and the sparse feedback LQ problem (SF-LQ), both of which are motivated by the need for scalable and structure-aware control in large-scale systems. Unlike existing approaches that rely on convex relaxations or are limited to block-diagonal structures, we directly formulate the controller synthesis as a finite-dimensional nonconvex optimization problem with group $\ell_0$-norm regularization, capturing general sparsity patterns. We establish a connection between DFT-LQ and SF-LQ problems, showing that both can be addressed within our unified framework. Furthermore, we propose a penalty-based proximal alternating linearized minimization (PALM) algorithm and provide a rigorous convergence analysis under mild assumptions, overcoming the lack of coercivity in the objective function. The proposed method admits efficient solvers for all subproblems and guarantees global convergence to critical points. Our results fill a key gap in the literature by enabling the direct design of group-sparse feedback gains with theoretical guarantees, without resorting to convex surrogates or restrictive structural assumptions.
本文开发了一个统一的非convex优化框架, 用于设计无限正弦线性赤道(LQ)问题中组分流反馈控制器。 我们处理经典LQ问题的两个显著扩展: 固定通信地形(DFT-LQ)的分布式LQ问题和微弱反馈LQ问题(SF-LQ), 两者的动机都是需要在大型系统中进行可缩放和结构认知的控制。 与依赖 convex 放松或限于块对角结构的现有方法不同, 我们直接将控制器合成作为有限维非convex优化问题, 与 $\ ell_ 0$- 诺姆的正规化组问题相提并论, 捕捉到一般垃圾模式的模式模式。 我们把DFT- LQ和SF-LQ之间的分散式反馈LQ问题联系起来, 表明这两种问题都可以在我们的统一框架内解决。 此外, 我们提议基于惩罚的准准偏向线性最小最小化最小化算法(PALM)的算法, 并在简单假设下提供严格的趋同分析, 克服目标函数缺乏共度的共度, 。 提议的精度的精度 , 将核心的精度的精度的精度 的精度的精度的精度假设中, 将精度的精度的精度的精度的精度的精度的精度的精度的精度的精度的精度的精度的精度的精度的精度的精度的精度的精度的精度 。
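To make the group $\ell_0$ penalty concrete, the sketch below counts nonzero groups of a feedback gain and applies the standard proximal map of a group $\ell_0$ term (group-wise hard thresholding), the kind of step a proximal method such as PALM would call. It is an illustration only: the grouping by columns, the names `group_l0_norm` and `prox_group_l0`, and the weight `lam` are assumptions, and the paper's actual subproblems may differ.

```python
import numpy as np

def group_l0_norm(K, groups):
    """Number of column groups of K that contain at least one nonzero entry."""
    return sum(1 for cols in groups if np.any(K[:, cols] != 0))

def prox_group_l0(V, groups, lam):
    """Minimizer of lam * (number of nonzero groups of K) + 0.5 * ||K - V||_F^2.

    Each group is kept unchanged when 0.5 * ||V_g||^2 > lam, otherwise set to zero.
    """
    K = np.zeros_like(V)
    for cols in groups:
        block = V[:, cols]
        if 0.5 * np.sum(block ** 2) > lam:
            K[:, cols] = block
    return K

# Hypothetical 2x4 gain with two column groups; the second group is weak.
V = np.array([[2.0, 1.0, 0.1, -0.2],
              [0.0, -1.0, 0.0, 0.1]])
groups = [[0, 1], [2, 3]]
K = prox_group_l0(V, groups, lam=0.5)
print(K, group_l0_norm(K, groups))
```

The thresholding rule follows from comparing the two candidate costs per group: zeroing a group costs $\tfrac12\|V_g\|^2$, keeping it costs `lam`.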
Article 115
Title@2025-07-24 (4): Policy Disruption in Reinforcement Learning: Adversarial Attack with Large Language Models and Critical State Identification
Title: Policy Disruption in Reinforcement Learning: Adversarial Attack with Large Language Models and Critical State Identification | Politische Disruption bei der Stärkung des Lernens: Umgekehrter Angriff mit großen Sprachmodellen und kritischer Zustandsidentifikation | 强化学习方面的政策混乱:以大语言模式和关键状态识别进行反向攻击 2507.18113v1 |
Authors (5): Junyong Jiang, Buwei Tian, Chenxing Xu, Songze Li, Lu Dong
Reinforcement learning (RL) has achieved remarkable success in fields like robotics and autonomous driving, but adversarial attacks designed to mislead RL systems remain challenging. Existing approaches often rely on modifying the environment or policy, limiting their practicality. This paper proposes an adversarial attack method in which existing agents in the environment guide the target policy to output suboptimal actions without altering the environment. We propose a reward iteration optimization framework that leverages large language models (LLMs) to generate adversarial rewards explicitly tailored to the vulnerabilities of the target agent, thereby enhancing the effectiveness of inducing the target agent toward suboptimal decision-making. Additionally, a critical state identification algorithm is designed to pinpoint the target agent’s most vulnerable states, where suboptimal behavior from the victim leads to significant degradation in overall performance. Experimental results in diverse environments demonstrate the superiority of our method over existing approaches.
强化学习(RL)在机器人和自主驾驶等领域取得了显著成功,但旨在误导RL系统的对抗性攻击仍然具有挑战性。现有方法往往依赖于改变环境或政策,限制其实用性。本文建议了一种对抗性攻击方法,即环境中的现有行为主体指导目标政策,在不改变环境的情况下产生次优行动。我们提议了一个奖励性迭代优化框架,利用大型语言模型(LLLMs)产生明确针对目标主体脆弱性的对抗性奖励,从而增强引导目标主体做出不最佳决策的实效。此外,一个关键的国家识别算法旨在确定目标主体最脆弱的状态,在这些状态下,受害人的次优行为导致总体表现的显著退化。不同环境中的实验结果显示了我们方法优于现有方法。
Article 116
Title@2025-07-24 (4): Percentile-Based Deep Reinforcement Learning and Reward Based Personalization For Delay Aware RAN Slicing in O-RAN
Title: Percentile-Based Deep Reinforcement Learning and Reward Based Personalization For Delay Aware RAN Slicing in O-RAN | Prozentual basierte Deep-Verstärkung-Lernen und Belohnung basierte Personalisierung für Delay Aware RAN Slicing in O-RAN | 在O-RAN为延迟了解RAN切片而进行百分百分率深强化学习和奖励性个人化 2507.18111v1 |
Authors (2): Peyman Tehrani, Anas Alsoliman
In this paper, we tackle the challenge of radio access network (RAN) slicing within an open RAN (O-RAN) architecture. Our focus centers on a network that includes multiple mobile virtual network operators (MVNOs) competing for physical resource blocks (PRBs) with the goal of meeting probabilistic delay upper bound constraints for their clients while minimizing PRB utilization. Initially, we derive a reward function based on the law of large numbers (LLN), then implement practical modifications to adapt it for real-world experimental scenarios. We then propose our solution, the Percentile-based Delay-Aware Deep Reinforcement Learning (PDA-DRL), which demonstrates its superiority over several baselines, including DRL models optimized for average delay constraints, by achieving a 38% reduction in resultant average delay. Furthermore, we delve into the issue of model weight sharing among multiple MVNOs to develop a robust personalized model. We introduce a reward-based personalization method where each agent prioritizes other agents’ model weights based on their performance. This technique surpasses traditional aggregation methods, such as federated averaging, and strategies reliant on traffic patterns and model weight distance similarities.
在本文中,我们处理无线电接入网络在开放的RAN(O-RAN)架构内剪切的难题。我们的焦点集中在一个包括多个移动虚拟网络运营商(MVNOs)的网络上,该网络竞相争夺有形资源区块(PRBs),目的是应对其客户的概率延迟上限限制,同时尽量减少PRB的利用率。我们最初根据大量法律(LLN)获得奖励功能,然后实施实际的调整,以适应现实世界的实验情景。然后我们提出我们的解决办法,即基于百分位的延迟软件深强化学习(PDA-DRL),它显示其优于若干基线,包括为平均延迟制约而优化的DRL模型,其方法是在平均延迟方面实现38的优化。此外,我们探讨了多MVNOs之间的模型权重共享问题,以开发一个强大的个性化模型。我们采用了一种以奖励为基础的个性化方法,使每个代理商根据自己的性能优先选择其他代理商的模型重量。这一技术超越了传统的汇总方法,例如节率平均率和对交通量和距离模型的相似性战略。
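The difference between an average-delay constraint and a probabilistic (percentile) delay bound can be seen on a toy trace. The sketch below is a hedged illustration: the delay values, the bound of 5 ms, the 95% target, and both function names are made up for the example and do not come from the paper.

```python
def mean_delay_ok(delays, bound):
    """Average-delay constraint: the mean delay must not exceed the bound."""
    return sum(delays) / len(delays) <= bound

def percentile_delay_ok(delays, bound, target):
    """Probabilistic constraint: at least `target` of the packets meet the bound."""
    within = sum(1 for d in delays if d <= bound)
    return within / len(delays) >= target

# Hypothetical trace (ms): most packets are fast, two are slow.
delays_ms = [1.0] * 8 + [10.0, 10.0]
print(mean_delay_ok(delays_ms, bound=5.0))                     # mean is 2.8 ms
print(percentile_delay_ok(delays_ms, bound=5.0, target=0.95))  # only 80% within bound
```

A slice can therefore satisfy the average constraint while violating the percentile one, which is the motivation for optimizing the latter directly.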
Article 117
Title@2025-07-24 (4): A New Pair of GloVes
Title: A New Pair of GloVes | Ein neues Paar GloVes | 新的地球之对 2507.18103v1 |
Authors (3): Riley Carlson, John Bauer, Christopher D. Manning
This report documents, describes, and evaluates new 2024 English GloVe (Global Vectors for Word Representation) models. While the original GloVe models built in 2014 have been widely used and found useful, languages and the world continue to evolve and we thought that current usage could benefit from updated models. Moreover, the 2014 models were not carefully documented as to the exact data versions and preprocessing that were used, and we rectify this by documenting these new models. We trained two sets of word embeddings using Wikipedia, Gigaword, and a subset of Dolma. Evaluation through vocabulary comparison, direct testing, and NER tasks shows that the 2024 vectors incorporate new culturally and linguistically relevant words, perform comparably on structural tasks like analogy and similarity, and demonstrate improved performance on recent, temporally dependent NER datasets such as non-Western newswire data.
本报告文件、描述和评价了新的2024年英文GloVe(全球语言代言人)模型。2014年建立的原GloVe模型已被广泛使用,并被认为有用,但语言和世界继续演变,我们认为当前使用可受益于更新模型。此外,2014年模型没有仔细记录使用的确切数据版本和预处理,我们通过记录这些新模型来纠正这一点。我们用Wikipedia、Gigaword和Dolma的子集来培训了两套单词嵌入。通过词汇比较、直接测试和NER任务进行的评估表明,2024年的矢量含有新的文化和语言相关词汇,在类比和相似性等结构任务上可比较,并展示了近期具有时间依赖的NER数据集(如非西方新闻线数据)的性能。
Article 118
Title@2025-07-24 (4): Comparison of Segmentation Methods in Remote Sensing for Land Use Land Cover
Title: Comparison of Segmentation Methods in Remote Sensing for Land Use Land Cover | Vergleich der Segmentierungsmethoden bei der Fernerkundung für die Bodenbedeckung | 土地利用、土地利用的变化和林业遥感遥感 分路方法比较 2507.18099v1 |
Authors (5): Naman Srivastava, Joel D Joy, Yash Dixit, Swarup E, Rakshit Ramesh
Land Use Land Cover (LULC) mapping is essential for urban and resource planning, and is one of the key elements in developing smart and sustainable cities. This study evaluates advanced LULC mapping techniques, focusing on Look-Up Table (LUT)-based Atmospheric Correction applied to Cartosat Multispectral (MX) sensor images, followed by supervised and semi-supervised learning models for LULC prediction. We explore DeeplabV3+ and Cross-Pseudo Supervision (CPS). The CPS model is further refined with dynamic weighting, enhancing pseudo-label reliability during training. This comprehensive approach analyses the accuracy and utility of LULC mapping techniques for various urban planning applications. A case study of Hyderabad, India, illustrates significant land use changes due to rapid urbanization. By analyzing Cartosat MX images over time, we highlight shifts such as urban sprawl, shrinking green spaces, and expanding industrial areas. This demonstrates the practical utility of these techniques for urban planners and policymakers.
土地使用覆盖(LULC)绘图对于城市和资源规划至关重要,是发展智能和可持续城市的关键要素之一。 本研究评估了先进的LULC绘图技术,重点是以LUT为基础的大气校正,适用于Cartosat多光谱(MX)传感器图像,随后是LULC预测的监督和半监督的学习模式。我们探索DeeplabV3+和交叉监督(CPS)。CPS模型以动态加权方式进一步完善,提高了培训过程中的假标签可靠性。这一综合方法分析了LULC绘图技术在各种城市规划应用中的准确性和实用性。印度海得拉巴的案例研究说明了快速城市化导致的土地使用变化。通过分析Cartosat MX图像,我们强调城市无计划、绿地缩小和扩大工业区等变化。这显示了这些技术对城市规划者和决策者的实际用途。
Article 119
Title@2025-07-24 (4): Learning from Hard Labels with Additional Supervision on Non-Hard-Labeled Classes
Title: Learning from Hard Labels with Additional Supervision on Non-Hard-Labeled Classes | Lernen von Hardlabels mit zusätzlicher Überwachung auf nicht-Hard-Label-Klassen | 从硬标签中学习,并对非硬标签类别提供附加监督 2507.18098v1 |
Authors (2): Kosuke Sugiyama, Masato Uchida
In scenarios where training data is limited due to observation costs or data scarcity, enriching the label information associated with each instance becomes crucial for building high-accuracy classification models. In such contexts, it is often feasible to obtain not only hard labels but also additional supervision, such as the confidences for the hard labels. This setting naturally raises fundamental questions: What kinds of additional supervision are intrinsically beneficial? And how do they contribute to improved generalization performance? To address these questions, we propose a theoretical framework that treats both hard labels and additional supervision as probability distributions, and constructs soft labels through their affine combination. Our theoretical analysis reveals that the essential component of additional supervision is not the confidence score of the assigned hard label, but rather the information of the distribution over the non-hard-labeled classes. Moreover, we demonstrate that the additional supervision and the mixing coefficient contribute to the refinement of soft labels in complementary roles. Intuitively, in the probability simplex, the additional supervision determines the direction in which the deterministic distribution representing the hard label should be adjusted toward the true label distribution, while the mixing coefficient controls the step size along that direction. Through generalization error analysis, we theoretically characterize how the additional supervision and its mixing coefficient affect both the convergence rate and asymptotic value of the error bound. Finally, we experimentally demonstrate that, based on our theory, designing additional supervision can lead to improved classification accuracy, even when utilized in a simple manner.
在培训数据因观察成本或数据稀缺而受到限制的情况下,丰富与每个实例相关的标签信息对于建立高准确性分类模型至关重要。 在这样的情况下,获取不仅硬性标签而且额外的监管通常都是可行的,例如对硬性标签的信任度。这种设置自然提出了根本性问题:什么类型的额外监督具有内在好处?}和它们如何有助于改进一般化绩效?}为了解决这些问题,我们提出了一个理论框架,既将硬性标签和额外的监督作为概率分布处理,又通过它们的灵巧组合构建软性标签。在这种背景下,我们理论分析表明,额外监督的基本组成部分不是指定硬性标签的可信度,而是非硬性标签类别分布的信息。此外,我们证明,额外的监督和混合系数有助于改进软性标签的互补性性能。在可能性方面,额外的监督可以确定一个方向,即硬性标签的可靠性分布应该调整到硬性标签的准确性分布,并通过其组合组合组合来建立软性标签。我们的理论分析表明,在设计真实性标签的理论性分配过程中,我们如何调整其进一步的浓度,同时将系数控制,我们如何测量其最终的浓度率,我们如何调整。
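The construction of a soft label as an affine combination of the hard label and the additional supervision can be written in a few lines. This is a minimal sketch under assumptions: the parametrization $(1-\lambda)\,e_y + \lambda\,a$, the function name `soft_label`, and the example numbers are illustrative and may not match the paper's notation.

```python
import numpy as np

def soft_label(hard_label, extra, lam):
    """Return (1 - lam) * one_hot(hard_label) + lam * extra.

    `extra` is the additional supervision, given as a probability vector;
    `lam` is the mixing coefficient (the step size along the direction set by `extra`).
    """
    extra = np.asarray(extra, dtype=float)
    one_hot = np.zeros_like(extra)
    one_hot[hard_label] = 1.0
    return (1.0 - lam) * one_hot + lam * extra

# Three classes, hard label 0; the extra supervision only describes
# how probability mass is spread over the non-hard-labeled classes 1 and 2.
q = soft_label(0, [0.0, 0.75, 0.25], lam=0.2)
print(q)  # mass 0.8 stays on class 0, the rest is split 3:1 between classes 1 and 2
```

In this picture the vector `extra` fixes the direction of adjustment inside the probability simplex and `lam` fixes how far to move, matching the complementary roles described above.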
Article 120
Title@2025-07-24 (4): Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation
Title: Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation | Lang-Short-Distanz Graph Neural Networks und verbessertes Curriculum-Lernen für Emotionserkennung im Gespräch | 长短距离远距神经神经网络和改进课程学习,以在对话中认识情感 2507.15205v2 |
Authors (3): Xinran Li, Xiujuan Xu, Jiaqi Qiao
Emotion Recognition in Conversation (ERC) is a practical and challenging task. This paper proposes a novel multimodal approach, the Long-Short Distance Graph Neural Network (LSDGNN). Based on the Directed Acyclic Graph (DAG), it constructs a long-distance graph neural network and a short-distance graph neural network to obtain multimodal features of distant and nearby utterances, respectively. To ensure that long- and short-distance features are as distinct as possible in representation while enabling mutual influence between the two modules, we employ a Differential Regularizer and incorporate a BiAffine Module to facilitate feature interaction. In addition, we propose an Improved Curriculum Learning (ICL) to address the challenge of data imbalance. By computing the similarity between different emotions to emphasize the shifts in similar emotions, we design a “weighted emotional shift” metric and develop a difficulty measurer, enabling a training process that prioritizes learning easy samples before harder ones. Experimental results on the IEMOCAP and MELD datasets demonstrate that our model outperforms existing benchmarks.
交流中情感认知(ERC)是一项实际而具有挑战性的任务。本文件提出了一种新型的多式联运方法,即长短距离图像神经网络(LSDGN) 。基于直接环形图(DAG),它构建了一个长距离平面神经网络和一个短距离平面神经网络,以获得相距遥远和相近言论的多式特征。为了确保长距离和短距离特征在代表性上尽可能不同,同时能够使两个模块之间产生相互影响,我们使用一个差异调节器,并纳入一个比阿芬模块,以促进特征互动。此外,我们提出一个改进课程学习(ICL),以应对数据不平衡的挑战。通过计算不同情感之间的相似性以强调类似情感的转变,我们设计了一个“加权情感转变”指标,并开发一个困难测量器,使培训过程能够优先学习较难的样本。IEMOCAP和MELD数据集的实验结果表明,我们的模型超过了现有的基准。
Article 121
Title@2025-07-24 (4): LLM Web Dynamics: Tracing Model Collapse in a Network of LLMs
Title: LLM Web Dynamics: Tracing Model Collapse in a Network of LLMs | LLM Web Dynamics: Aufspüren eines Modellkollapses in einem Netzwerk von LLMs | LLM 网络动态:追踪在LLM网络中的模型崩溃情况 2506.15690v3 |
Authors (4): Tianyu Wang, Akira Horiguchi, Lingyou Pang, Carey E. Priebe
The increasing use of synthetic data from the public Internet has enhanced data usage efficiency in large language model (LLM) training. However, the potential threat of model collapse remains insufficiently explored. Existing studies primarily examine model collapse in a single model setting or rely solely on statistical surrogates. In this work, we introduce LLM Web Dynamics (LWD), an efficient framework for investigating model collapse at the network level. By simulating the Internet with a retrieval-augmented generation (RAG) database, we analyze the convergence pattern of model outputs. Furthermore, we provide theoretical guarantees for this convergence by drawing an analogy to interacting Gaussian Mixture Models.
越来越多地使用公共互联网的合成数据提高了大语言模型培训的数据使用效率,然而,模型崩溃的潜在威胁仍未得到充分探讨。现有研究主要在单一模型设置中研究模型崩溃问题,或者仅仅依靠统计代孕。在这项工作中,我们引入了LLM网络动态系统(LWD),这是在网络一级调查模型崩溃问题的有效框架。我们通过一个检索和升级的生成数据库模拟互联网,分析了模型产出的趋同模式。此外,我们通过类推互动的高山混合模型,为这种趋同提供了理论保障。
Article 122
Title@2025-07-24 (4): A Principled Approach for Data Bias Mitigation
Title: A Principled Approach for Data Bias Mitigation | Ein prinzipieller Ansatz für Daten-Bias-Minderung | 减轻数据偏见的原则办法 2405.12312v4 |
Authors (4): Bruno Scarone, Alfredo Viola, Renée J. Miller, Ricardo Baeza-Yates
The widespread use of machine learning and data-driven algorithms for decision making has been steadily increasing over many years. Bias in the data can adversely affect this decision-making. We present a new mitigation strategy to address data bias. Our methods are explainable and come with mathematical guarantees of correctness. They can take advantage of new work on table discovery to find new tuples that can be added to a dataset to create real datasets that are unbiased or less biased. Our framework covers data with non-binary labels and with multiple sensitive attributes. Hence, we are able to measure and mitigate bias that does not appear over a single attribute (or feature), but only intersectionally, when considering a combination of attributes. We evaluate our techniques on publicly available datasets and provide a theoretical analysis of our results, highlighting novel insights into data bias.
多年来,在决策中广泛使用机器学习和数据驱动算法的情况一直在稳步增加。在数据中, \ emph{Bias} 会对决策产生不利影响。 我们提出了一个新的缓解战略,以解决数据偏差问题。 我们的方法是可以解释的,并且具有数学上的正确性保证。 他们可以利用表格发现的新工作找到新的图例,可以在数据集中添加新的图例,以创建公正或较少偏差的真实数据集。 我们的框架覆盖了非二进制标签和多重敏感属性的数据。 因此,我们能够测量和减少不出现在单一属性(或特征)上的偏差,但在考虑各种属性组合时只能相互交织。 我们评估了我们关于公开数据集的技术,并对我们的结果进行理论分析,突出了对数据偏差的新洞察力。
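Bias that is invisible on each attribute alone but appears on their combination can be reproduced with a tiny synthetic table. The sketch below is an assumption-laden illustration: the attributes `a` and `b`, the counts, and the use of a positive-rate gap as the bias measure are all hypothetical and are not the measure defined in the paper.

```python
from collections import defaultdict

def positive_rates(rows, keys):
    """Positive-label rate per group, a group being the tuple of values of `keys`."""
    pos, tot = defaultdict(int), defaultdict(int)
    for r in rows:
        g = tuple(r[k] for k in keys)
        tot[g] += 1
        pos[g] += r["y"]
    return {g: pos[g] / tot[g] for g in tot}

def rate_gap(rates):
    """Largest difference in positive rate between any two groups."""
    return max(rates.values()) - min(rates.values())

# Synthetic data: 10 rows per (a, b) cell, with 8, 2, 2, 8 positive labels.
rows = []
for (a, b), n_pos in {(0, 0): 8, (0, 1): 2, (1, 0): 2, (1, 1): 8}.items():
    rows += [{"a": a, "b": b, "y": 1}] * n_pos
    rows += [{"a": a, "b": b, "y": 0}] * (10 - n_pos)

gap_a = rate_gap(positive_rates(rows, ["a"]))        # each value of a has rate 0.5
gap_b = rate_gap(positive_rates(rows, ["b"]))        # each value of b has rate 0.5
gap_ab = rate_gap(positive_rates(rows, ["a", "b"]))  # cells have rates 0.8 and 0.2
print(gap_a, gap_b, gap_ab)
```

Both single-attribute gaps vanish while the intersectional gap is large, which is why a per-attribute audit would miss this kind of bias.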
Article 123
Title@2025-07-24 (4): Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections
Title: Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections | Compliant Residual DAgger: Verbesserung der Real-World Kontakt-Rich-Manipulation mit menschlichen Korrekturen | 共同残存挖掘者:改进现实世界接触-Rich 人教管管管 2506.16685v2 |
Authors (4): Xiaomeng Xu, Yifan Hou, Zeyi Liu, Shuran Song
We address key challenges in Dataset Aggregation (DAgger) for real-world contact-rich manipulation: how to collect informative human correction data and how to effectively update policies with this new data. We introduce Compliant Residual DAgger (CR-DAgger), which contains two novel components: 1) a Compliant Intervention Interface that leverages compliance control, allowing humans to provide gentle, accurate delta action corrections without interrupting the ongoing robot policy execution; and 2) a Compliant Residual Policy formulation that learns from human corrections while incorporating force feedback and force control. Our system significantly enhances performance on precise contact-rich manipulation tasks using minimal correction data, improving base policy success rates by over 50% on two challenging tasks (book flipping and belt assembly) while outperforming both retraining-from-scratch and finetuning approaches. Through extensive real-world experiments, we provide practical guidance for implementing effective DAgger in real-world robot learning tasks. Result videos are available at: https://compliant-residual-dagger.github.io/
我们解决了用于现实世界接触程度高的操纵的数据集聚合(Dagger)中的关键挑战:如何收集信息丰富的人类校正数据,以及如何以新数据有效地更新政策。我们引入了Complient 遗留式评估器(CR-Dagger),它包含两个新颖组成部分:1)一个复合干预界面,利用合规控制,使人类能够提供温和、准确的三角洲行动校正,而同时又不干扰正在进行的机器人政策执行;2)一个复合残余政策制定,既学习人类校正,又结合武力反馈和武力控制。我们的系统大大提高了精确的超超链接操作性能,利用最小校正数据,将两项具有挑战性的任务(翻转和带组装)的基本政策成功率提高50以上,同时超越了从转接和微调方法。我们通过广泛的现实世界实验,为在现实世界机器人学习任务中实施有效的Dagger提供了实用指导。结果视频可在以下网址上提供:https://confons-residual-redual-daguger.gitub.github.io/ 。
Article 124
Title@2025-07-24 (4): Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
Title: Fine-Tuned Language Models Generate Stable Inorganic Materials as Text | Feinangepasste Sprachmodelle erzeugen stabile anorganische Materialien als Text | 精精精导语言模型生成稳定无机材料作为文本 2402.04379v2 |
Authors (6): Nate Gruver, Anuroop Sriram, Andrea Madotto, Andrew Gordon Wilson, C. Lawrence Zitnick, Zachary Ulissi
We propose fine-tuning large language models for generation of stable materials. While unorthodox, fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable, with around 90% of sampled structures obeying physical constraints on atom positions and charges. Using energy above hull calculations from both learned ML potentials and gold-standard DFT calculations, we show that our strongest model (fine-tuned LLaMA-2 70B) can generate materials predicted to be metastable at about twice the rate (49% vs 28%) of CDVAE, a competing diffusion model. Because of text prompting’s inherent flexibility, our models can simultaneously be used for unconditional generation of stable material, infilling of partial structures and text-conditional generation. Finally, we show that language models’ ability to capture key symmetries of crystal structures improves with model scale, suggesting that the biases of pretrained LLMs are surprisingly well-suited for atomistic data.
我们建议微调大型语言模型以生成稳定材料。 虽然在文本编码的原子数据中,非正统、微调大型语言模型简单易行,但可靠,大约90%的抽样结构在原子位置和电荷上服从物理限制。利用从学到的ML潜力和金质标准DFT计算得出的高于船体的能量计算,我们显示我们最强的模型(fine-dord LalaMA-2-70B)可以产生材料,预测材料的可转换率约为CDVAE(CDVAE)的两倍左右(49%对28 % ) , 这是一种相互竞争的传播模型。由于文本催化的内在灵活性,我们的模型可以同时用于无条件生成稳定材料、填充部分结构和文本条件生成。 最后,我们表明语言模型捕捉晶体结构关键对称的能力随着模型规模的提高,表明受过训练的LMMS的偏差对于原子学数据来说是惊人的。
Article 125
Title@2025-07-24 (4): Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning
Title: Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning | Komprimierte und verteilte am wenigsten quadratische Regression: Konvergenzraten mit Anwendungen für Federated Learning | 压缩和分布的最小平方回归:与应用到联邦学习的趋同率 2308.01358v2 |
Authors (2): Constantin Philippenko, Aymeric Dieuleveut
In this paper, we investigate the impact of compression on stochastic gradient algorithms for machine learning, a technique widely used in distributed and federated learning. We underline differences in terms of convergence rates between several unbiased compression operators that all satisfy the same condition on their variance, thus going beyond the classical worst-case analysis. To do so, we focus on the case of least-squares regression (LSR) and analyze a general stochastic approximation algorithm for minimizing quadratic functions relying on a random field. We consider weak assumptions on the random field, tailored to the analysis (specifically, expected Hölder regularity), and on the noise covariance, enabling the analysis of various randomizing mechanisms, including compression. We then extend our results to the case of federated learning. More formally, we highlight the impact on the convergence of the covariance $\mathfrak{C}_{\mathrm{ania}}$ of the additive noise induced by the algorithm. We demonstrate that, despite the non-regularity of the stochastic field, the limit variance term scales with $\mathrm{Tr}(\mathfrak{C}_{\mathrm{ania}} H^{-1})/K$ (where $H$ is the Hessian of the optimization problem and $K$ the number of iterations), generalizing the rate for the vanilla LSR case where it is $\sigma^2 \mathrm{Tr}(H H^{-1}) / K = \sigma^2 d / K$ (Bach and Moulines, 2013). Then, we analyze the dependency of $\mathfrak{C}_{\mathrm{ania}}$ on the compression strategy and ultimately its impact on convergence, first in the centralized case, then in two heterogeneous FL frameworks.
本文研究压缩对机器学习中随机梯度算法的影响,压缩是分布式学习和联邦学习中广泛使用的技术。我们强调了若干无偏压缩算子在收敛速率上的差异,这些算子的方差均满足相同条件,从而超越了经典的最坏情况分析。为此,我们聚焦于最小二乘回归(LSR),分析一种依赖随机场、用于最小化二次函数的一般随机逼近算法。我们对随机场采用适合该分析的弱假设(具体而言,期望意义下的 Hölder 正则性),并对噪声协方差作出假设,从而能够分析包括压缩在内的多种随机化机制。随后,我们将结果推广到联邦学习的情形。更正式地说,我们强调了算法所引入的加性噪声的协方差 $\mathfrak{C}_{\mathrm{ania}}$ 对收敛的影响。我们证明,尽管随机场不具正则性,极限方差项的量级为 $\mathrm{Tr}(\mathfrak{C}_{\mathrm{ania}} H^{-1})/K$(其中 $H$ 为优化问题的 Hessian 矩阵,$K$ 为迭代次数),这推广了普通 LSR 情形下的速率 $\sigma^2 \mathrm{Tr}(H H^{-1}) / K = \sigma^2 d / K$(Bach and Moulines, 2013)。最后,我们分析了 $\mathfrak{C}_{\mathrm{ania}}$ 对压缩策略的依赖性及其对收敛的最终影响,先考虑集中式情形,再考虑两种异构联邦学习框架。
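An unbiased compression operator with a variance condition of the form $\mathbb{E}\|C(x) - x\|^2 \le \omega \|x\|^2$ can be verified exactly on a small vector. The sketch below uses rand-$k$ sparsification as one standard example of such an operator; choosing this operator, the helper names, and the test vector are assumptions made for illustration, not a statement of which operators the paper analyzes.

```python
import itertools
import numpy as np

def rand_k(x, kept):
    """Rand-k sparsification for a given index set: zero the rest, rescale by d/k."""
    out = np.zeros_like(x)
    idx = list(kept)
    out[idx] = x[idx] * (len(x) / len(idx))
    return out

def exact_moments(x, k):
    """Average over all equally likely index sets of size k.

    Returns E[C(x)] and E||C(x) - x||^2, computed by enumeration rather than sampling.
    """
    subsets = list(itertools.combinations(range(len(x)), k))
    outs = np.array([rand_k(x, s) for s in subsets])
    mean = outs.mean(axis=0)
    var = float(np.mean(np.sum((outs - x) ** 2, axis=1)))
    return mean, var

x = np.array([1.0, -2.0, 3.0, 0.5])
mean, var = exact_moments(x, k=2)
omega = len(x) / 2 - 1  # d/k - 1 for rand-k
print(mean, var, omega * float(np.sum(x ** 2)))
```

Two operators can share the same $\omega$ yet induce different noise covariances, which is the distinction the abstract draws between a worst-case variance bound and the covariance $\mathfrak{C}_{\mathrm{ania}}$.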
Article 126
Title@2025-07-24 (4): History-Guided Video Diffusion
Title: History-Guided Video Diffusion | Geschichte-geführte Video-Diffusion | 历史引导视频传播 2502.06764v2 |
Authors (6): Kiwhan Song, Boyuan Chen, Max Simchowitz, Yilun Du, Russ Tedrake, Vincent Sitzmann
Classifier-free guidance (CFG) is a key technique for improving conditional generation in diffusion models, enabling more accurate control while enhancing sample quality. It is natural to extend this technique to video diffusion, which generates video conditioned on a variable number of context frames, collectively referred to as history. However, we find two key challenges to guiding with variable-length history: architectures that only support fixed-size conditioning, and the empirical observation that CFG-style history dropout performs poorly. To address this, we propose the Diffusion Forcing Transformer (DFoT), a video diffusion architecture and theoretically grounded training objective that jointly enable conditioning on a flexible number of history frames. We then introduce History Guidance, a family of guidance methods uniquely enabled by DFoT. We show that its simplest form, vanilla history guidance, already significantly improves video generation quality and temporal consistency. A more advanced method, history guidance across time and frequency, further enhances motion dynamics, enables compositional generalization to out-of-distribution history, and can stably roll out extremely long videos. Project website: https://boyuan.space/history-guidance
无分类指导(CFG)是改进传播模型中有条件生成的关键技术,在提高样本质量的同时能够更准确地控制。将这一技术推广到视频传播是自然的,视频传播是自然的,以可变的上下文框架(统称为历史)为条件生成视频。然而,我们发现在以可变长历史为指导方面有两大挑战:仅支持固定尺寸调节的架构,以及CFG式历史辍学表现不佳的经验性观察。为了解决这个问题,我们提议了Difmulting Forninger(DFoT),一个视频传播架构和基于理论的培训目标,可以共同对一些灵活的历史框架进行调节。我们随后引入了历史指导,这是一套由DFoT独有的辅助指导方法。我们展示了它的最简单的形式,即Vanilla历史指导,已经大大改善了视频生成质量和时间一致性。一个更先进的方法、历史指导时间和频率进一步增强运动的动态,使组成一般化能够超越分配历史,并且可以刺动极长的视频。项目网站:https://boyuperuan.space/histrytal
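For readers unfamiliar with classifier-free guidance, the core operation is a linear extrapolation between a conditional and an unconditional model prediction. The sketch below shows one common parametrization of plain CFG only; the function name and example vectors are hypothetical, and it does not implement the History Guidance variants or DFoT described above.

```python
import numpy as np

def cfg_combine(pred_uncond, pred_cond, scale):
    """Plain classifier-free guidance: move from the unconditional prediction
    toward (and, for scale > 1, beyond) the conditional prediction."""
    pred_uncond = np.asarray(pred_uncond, dtype=float)
    pred_cond = np.asarray(pred_cond, dtype=float)
    return pred_uncond + scale * (pred_cond - pred_uncond)

# Toy predictions; in video diffusion the "condition" would be the history frames.
uncond = [0.0, 1.0]
cond = [1.0, 3.0]
print(cfg_combine(uncond, cond, scale=2.0))  # extrapolates past the conditional prediction
```

Setting `scale=1` recovers the conditional prediction and `scale=0` the unconditional one.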
Article 127
Title@2025-07-24 (4): Squeeze10-LLM: Squeezing LLMs’ Weights by 10 Times via a Staged Mixed-Precision Quantization Method
Title: Squeeze10-LLM: Squeezing LLMs’ Weights by 10 Times via a Staged Mixed-Precision Quantization Method | Squeeze10-LLM: Gewichte der LLMs um 10 Mal durch eine stufenweise gemischte Präzisionsquantifizierung | Squeze10-LLLM:通过分阶段混合精密量化方法用10 Times挤压LLMs的重量 2507.18073v1 |
Authors (12): Qingcheng Zhu, Yangyang Ren, Linlin Yang, Mingbao Lin, Yanjing Li, Sheng Xu, Zichao Feng, Haodong Zhu, Yuguang Yang, Juan Zhang, Runqi Wang, Baochang Zhang
Deploying large language models (LLMs) is challenging due to their massive parameters and high computational costs. Ultra low-bit quantization can significantly reduce storage and accelerate inference, but extreme compression (i.e., mean bit-width <= 2) often leads to severe performance degradation. To address this, we propose Squeeze10-LLM, effectively “squeezing” 16-bit LLMs’ weights by 10 times. Specifically, Squeeze10-LLM is a staged mixed-precision post-training quantization (PTQ) framework and achieves an average of 1.6 bits per weight by quantizing 80% of the weights to 1 bit and 20% to 4 bits. We introduce Squeeze10-LLM with two key innovations: Post-Binarization Activation Robustness (PBAR) and Full Information Activation Supervision (FIAS). PBAR is a refined weight significance metric that accounts for the impact of quantization on activations, improving accuracy in low-bit settings. FIAS is a strategy that preserves full activation information during quantization to mitigate cumulative error propagation across layers. Experiments on LLaMA and LLaMA2 show that Squeeze10-LLM achieves state-of-the-art performance for sub-2-bit weight-only quantization, improving average accuracy from 43% to 56% on six zero-shot classification tasks, a significant boost over existing PTQ methods. Our code will be released upon publication.
部署大型语言模型(LLMS)具有挑战性,因为其参数庞大,计算成本高。超低位量计可以大幅降低存储量和加速推导速度,但极端压缩(即平均比重=位元++2)往往会导致严重性能退化。为此,我们提议使用Squeze10-LLMM, 有效“挤压” 16位LMS的重量10倍。具体地说,Squeeze10-LLM是一个分阶段混合精度的训练后分类框架(PTQ),通过将80%的重量量化到1位和20%到4位来达到平均1.6位的重量。我们采用Squeze10LLLMM(Squenizion10-PIG),在SQIMA(SQILMA)的升级过程中,将全面提升到SQLMA(SQMA)的升级到SQLM(SQA)的累积性差。
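The "10 times" figure follows directly from the stated bit allocation. The short check below reproduces the arithmetic with exact fractions; it assumes that only the weight payload is counted and ignores any scales, zero-points, or index metadata, which the abstract does not quantify. The function name is hypothetical.

```python
from fractions import Fraction

def mean_bits(allocation):
    """Average bits per weight for a list of (fraction_of_weights, bit_width) pairs."""
    return sum(frac * bits for frac, bits in allocation)

# 80% of the weights at 1 bit and 20% at 4 bits, as stated in the abstract.
avg = mean_bits([(Fraction(4, 5), 1), (Fraction(1, 5), 4)])
ratio = Fraction(16) / avg  # compression relative to 16-bit weights
print(float(avg), float(ratio))
```

The average is 0.8 + 0.8 = 1.6 bits per weight, and 16 / 1.6 gives the tenfold reduction.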
Article 128
Title@2025-07-24 (4): C-AAE: Compressively Anonymizing Autoencoders for Privacy-Preserving Activity Recognition in Healthcare Sensor Streams
Title: C-AAE: Compressively Anonymizing Autoencoders for Privacy-Preserving Activity Recognition in Healthcare Sensor Streams | C-AAE: Komprimierend anonymisierende Autoencoder für Datenschutz-Erhaltung Aktivitätserkennung in Healthcare Sensor Streams | C-AAE: 压缩匿名自动编码器,以便在保健感应器流中确认隐私保护活动 2507.18072v1 |
Authors (3): Ryusei Fujimoto, Yugo Nakamura, Yutaka Arakawa
Wearable accelerometers and gyroscopes encode fine-grained behavioural signatures that can be exploited to re-identify users, making privacy protection essential for healthcare applications. We introduce C-AAE, a compressive anonymizing autoencoder that marries an Anonymizing AutoEncoder (AAE) with Adaptive Differential Pulse-Code Modulation (ADPCM). The AAE first projects raw sensor windows into a latent space that retains activity-relevant features while suppressing identity cues. ADPCM then differentially encodes this latent stream, further masking residual identity information and shrinking the bitrate. Experiments on the MotionSense and PAMAP2 datasets show that C-AAE cuts user re-identification F1 scores by 10-15 percentage points relative to AAE alone, while keeping activity-recognition F1 within 5 percentage points of the unprotected baseline. ADPCM also reduces data volume by roughly 75%, easing transmission and storage overheads. These results demonstrate that C-AAE offers a practical route to balancing privacy and utility in continuous, sensor-based activity recognition for healthcare.
我们引入了C-AAE,这是一个与匿名自动编码器(AAE)结合的压缩匿名自动编码器(AAE),配有适应性差异脉冲-阴极调动(ADPCM),AAE首先将原始传感器窗口投入一个潜在空间,保留与活动相关的特征,同时抑制身份提示。ADPCM随后对这个潜在流进行了不同的编码,进一步掩蔽了剩余身份信息,并缩小了比特率。关于MtionSense和PAMAP2数据集的实验显示,C-AE将用户重新识别F1分数的比值单独减少10-15个百分点,同时将活动识别F1分保持在不受保护基线的5个百分点以内。ADPCM还将数据量减少约75%,缓解了传输和存储管理。这些结果表明,C-AE为持续平衡隐私和通用传感器活动提供了一条切实可行的途径。
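The idea of differential encoding can be sketched with a fixed-step DPCM codec, a simplification of ADPCM in which the quantizer step size does not adapt. Everything here is an assumption for illustration: the step size, the sine-wave input standing in for a latent stream, and the function names; the C-AAE pipeline itself applies ADPCM to the autoencoder's latent output.

```python
import numpy as np

def dpcm_encode(signal, step):
    """Quantize the difference between each sample and the running reconstruction."""
    codes = np.empty(len(signal), dtype=np.int64)
    pred = 0.0
    for i, s in enumerate(signal):
        codes[i] = int(np.round((s - pred) / step))
        pred += codes[i] * step  # mirror the decoder so errors do not accumulate
    return codes

def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized differences."""
    return np.cumsum(codes * step)

signal = np.sin(np.linspace(0.0, 2.0 * np.pi, 50))
step = 0.05
codes = dpcm_encode(signal, step)
recon = dpcm_decode(codes, step)
print(np.max(np.abs(codes)), np.max(np.abs(signal - recon)))
```

Because the encoder predicts from its own reconstruction, each sample's error stays within half a step, and the transmitted codes are small integers that need far fewer bits than the raw samples. ADPCM additionally adjusts `step` over time.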
Article 129
Title@2025-07-24 (4): Group Sequence Policy Optimization
Title: Group Sequence Policy Optimization | Optimierung der Gruppensequenzpolitik | 组序列政策优化 2507.18071v1 |
Authors (12): Chujie Zheng, Shixuan Liu, Mingze Li, Xiong-Hui Chen, Bowen Yu, Chang Gao, Kai Dang, Yuqiong Liu, Rui Men, An Yang, Jingren Zhou, Junyang Lin
This paper introduces Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant reinforcement learning algorithm for training large language models. Unlike previous algorithms that adopt token-level importance ratios, GSPO defines the importance ratio based on sequence likelihood and performs sequence-level clipping, rewarding, and optimization. We demonstrate that GSPO achieves superior training efficiency and performance compared to the GRPO algorithm, notably stabilizes Mixture-of-Experts (MoE) RL training, and has the potential for simplifying the design of RL infrastructure. These merits of GSPO have contributed to the remarkable improvements in the latest Qwen3 models.
本文件介绍群体序列政策优化(GSPO),这是我们为培训大型语言模式而采用的稳定、高效和绩效强化学习算法,与以往采用象征性重要性比率的算法不同,PSPO根据序列概率确定重要性比率,并进行顺序剪切、奖赏和优化,我们证明,与GROP算法相比,PSPO实现了较高的培训效率和绩效,特别是稳定了Mixture-Experts(MOE)RL培训,并有可能简化RL基础设施的设计,PSPO的这些优点促进了最新的Quen3模型的显著改进。
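The contrast between token-level and sequence-level importance ratios can be sketched in a few lines. This is a hedged illustration rather than the paper's exact objective: the length normalization (`length_normalize`) and the clipping range `eps=0.2` are assumptions that the abstract does not specify, and the inputs are per-token log-probabilities of one sampled response under the new and old policies.

```python
import math


def sequence_importance_ratio(new_logps, old_logps, length_normalize=True):
    """Ratio of whole-sequence likelihoods under the new and old policies."""
    if not new_logps or len(new_logps) != len(old_logps):
        raise ValueError("need two non-empty log-prob lists of equal length")
    # Summing token log-probs gives the sequence log-likelihood difference.
    log_ratio = sum(n - o for n, o in zip(new_logps, old_logps))
    if length_normalize:              # assumption: damp the effect of long responses
        log_ratio /= len(new_logps)
    return math.exp(log_ratio)


def clipped_sequence_objective(ratio, advantage, eps=0.2):
    """PPO-style clipped surrogate applied once per sequence, not per token."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```

With this shape, one ratio and one clipping decision apply to the entire response, so a single outlier token cannot be clipped differently from the rest of its sequence.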
Article 130
Title@2025-07-24 (4): BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference
Title: BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference | BlockDialekt: Blockweise feinkörnige Mischformat-Quantisierung für energieeffiziente LLM-Inferenz | BlockDiaect: 节能LLM 推论的粗件精细混合格式量化 2501.01144v5 |
Authors (2): Wonsuk Jang, Thierry Tambe
The rapidly increasing size of large language models (LLMs) presents significant challenges in memory usage and computational costs. Quantizing both weights and activations can address these issues, with hardware-supported fine-grained scaling emerging as a promising solution to mitigate outliers. However, existing methods struggle to capture nuanced block data distributions. We propose BlockDialect, a block-wise fine-grained mixed format technique that assigns a per-block optimal number format from a formatbook for better data representation. Additionally, we introduce DialectFP4, a formatbook of FP4 variants (akin to dialects) that adapt to diverse data distributions. To leverage this efficiently, we propose a two-stage approach for online DialectFP4 activation quantization. Importantly, DialectFP4 ensures energy efficiency by selecting representable values as scaled integers compatible with low-precision integer arithmetic. BlockDialect achieves 10.78% (7.48%) accuracy gain on the LLaMA3-8B (LLaMA2-7B) model compared to MXFP4 format with lower bit usage per data, while being only 5.45% (2.69%) below full precision even when quantizing full-path matrix multiplication. Focusing on how to represent over how to scale, our work presents a promising path for energy-efficient LLM inference.
大型语言模型(LLMS) 快速增长的大小在记忆使用和计算成本方面提出了重大挑战。 量化权重和激活都能够解决这些问题, 硬件支持的微微缩缩缩放正在形成一个有希望的缓解离子的解决方案。 但是, 现有的方法很难捕捉细块数据分布。 我们提议了BlockDiacle, 这是一种块式细微的混合格式技术, 从一个格式手册中为更好的数据代表性指定了每个区块的最佳数字格式。 此外, 我们引入了 Dialec FP4 4 格式手册, 一种适应不同数据分布的FP4变体( 类似方言的方言) 。 为了高效地利用这个方法, 我们建议了双阶段的方法, 用于在线的 DialectFP4 激活四倍的四分级化。 重要的是, DialectF4 能够确保能源效率, 选择可代表值为与低精度缩缩缩缩缩缩图相匹配的整整数值。 将LLLAMA3- 8B( LLMA2-7B) 的精度模型(LLMA2-7B) 与MFP4格式相比, 模型的精度模型的精度增长为MXFP-4格式, 将比小比小的精度格式, 显示为5.45- Plexmexmexmexmalmax, 的全缩图图图仅为5-x。
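The idea of assigning each block its own number format from a formatbook can be sketched as a small search over candidate value grids. The sketch below is illustrative only: the two grids in `FORMATBOOK` are hypothetical and are not the DialectFP4 variants, and picking the format by minimum squared error is an assumed criterion standing in for the paper's selection procedure.

```python
import math

# Hypothetical formatbook: each entry lists representable magnitudes as integers.
FORMATBOOK = {
    "uniform": [0, 1, 2, 3],
    "coarse_top": [0, 1, 2, 4],
}


def quantize_block(block, grid):
    """Quantize one block onto scale * grid; return (quantized values, squared error)."""
    peak = max(abs(v) for v in block)
    scale = peak / max(grid) if peak > 0 else 1.0   # per-block scaling factor
    quantized = []
    for v in block:
        level = min(grid, key=lambda g: abs(abs(v) / scale - g))
        quantized.append(math.copysign(level * scale, v))
    error = sum((v - q) ** 2 for v, q in zip(block, quantized))
    return quantized, error


def pick_format(block, formatbook=FORMATBOOK):
    """Return the name and error of the format that represents this block best."""
    best = min(formatbook, key=lambda name: quantize_block(block, formatbook[name])[1])
    return best, quantize_block(block, formatbook[best])[1]
```

Two blocks with the same peak value can therefore end up in different formats, depending on where their remaining values fall.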
Article 131
Title@2025-07-24 (4): Multiscale Neural PDE Surrogates for Prediction and Downscaling: Application to Ocean Currents
Title: Multiscale Neural PDE Surrogates for Prediction and Downscaling: Application to Ocean Currents | Multiscale Neural PDE Surrogats für Vorhersage und Downscaling: Anwendung auf Meeresströmungen | 预测和缩小预测和缩小尺度的多尺度多神经PDE代号:对洋流的应用 2507.18067v1 |
Authors (4): Abdessamad El-Kabid, Loubna Benabbou, Redouane Lguensat, Alex Hernández-García
Accurate modeling of physical systems governed by partial differential equations is a central challenge in scientific computing. In oceanography, high-resolution current data are critical for coastal management, environmental monitoring, and maritime safety. However, available satellite products, such as Copernicus data for sea water velocity at ~0.08 degrees spatial resolution and global ocean models, often lack the spatial granularity required for detailed local analyses. In this work, we (a) introduce a supervised deep learning framework based on neural operators for solving PDEs and providing arbitrary resolution solutions, and (b) propose downscaling models with an application to Copernicus ocean current data. Additionally, our method can model surrogate PDEs and predict solutions at arbitrary resolution, regardless of the input resolution. We evaluated our model on real-world Copernicus ocean current data and synthetic Navier-Stokes simulation datasets.
在海洋学中,高分辨率的当前数据对沿海管理、环境监测和海洋安全至关重要,然而,现有的卫星产品,如哥白尼数据,用于海水速度的约0.08度空间分辨率和全球海洋模型,往往缺乏进行详细地方分析所需的空间颗粒。在这项工作中,我们(a) 引入一个有监督的深层次学习框架,以神经操作者为基础,解决PDEs和提供任意解析解决方案,以及(b) 提出对哥白尼海洋当前数据应用的降尺度模型。此外,我们的方法可以模拟代位式PDEs,并预测任意解析的解决方案,而不论投入分辨率如何。我们评估了我们关于现实世界哥白尼洋流数据和合成-斯图克斯模拟数据集的模式。
Article 132
Title@2025-07-24 (4): Fixing the Pitfalls of Probabilistic Time-Series Forecasting Evaluation by Kernel Quadrature
Title: Fixing the Pitfalls of Probabilistic Time-Series Forecasting Evaluation by Kernel Quadrature | Fixierung der Pitfalls der probabilistischen Zeitreihen-Prognosebewertung durch Kernel-Quadratur | 由内核二次曲线确定概率时间- 系列预测评价的空隙 2503.06079v2 |
Authors (3): Masaki Adachi, Masahiro Fujisawa, Michael A Osborne
Despite the significance of probabilistic time-series forecasting models, their evaluation metrics often involve intractable integrations. The most widely used metric, the continuous ranked probability score (CRPS), is a strictly proper scoring function; however, its computation requires approximation. We found that popular CRPS estimators, specifically the quantile-based estimator implemented in the widely used GluonTS library and the probability-weighted moment approximation, both exhibit inherent estimation biases. These biases lead to crude approximations, resulting in improper rankings of forecasting model performance when CRPS values are close. To address this issue, we introduced a kernel quadrature approach that leverages an unbiased CRPS estimator and employs cubature construction for scalable computation. Empirically, our approach consistently outperforms the two widely used CRPS estimators.
尽管概率性的时间序列预测模型意义重大,但其评价指标往往涉及难以解决的整合。最广泛使用的指标,即连续排名概率评分(CRPS)是一个严格适当的评分功能;然而,其计算需要近似值。我们发现,广受使用的GluonTS图书馆使用的CPS估测器,以及概率加权的瞬间近似估测器,都显示出固有的估计偏差。这些偏差导致粗略的近似值,导致在CRPS值接近时对预测模型的性能进行不适当的排名。为了解决这一问题,我们引入了内核圈梯式方法,利用无偏倚的CRPS估测算器,并采用可缩放的构造来进行可缩放计算。我们的方法始终优于两种广泛使用的CRPS估测器。
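To make the bias issue concrete, the sketch below estimates CRPS from forecast samples using the standard energy-form identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|. It is not the kernel quadrature method of the paper; it only shows how the normalization of the pairwise term (n(n-1) versus n^2) separates an unbiased sample estimate from a biased one. The function name and the `unbiased` flag are illustrative.

```python
def crps_energy(samples, y, unbiased=True):
    """Sample-based CRPS estimate for one observation y."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two forecast samples")
    # First term: mean absolute error of the samples against the observation.
    accuracy = sum(abs(x - y) for x in samples) / n
    # Second term: mean absolute difference between pairs of samples.
    pair_sum = sum(abs(a - b) for a in samples for b in samples)
    # Excluding the zero-valued i == j pairs from the count removes the bias.
    denom = n * (n - 1) if unbiased else n * n
    return accuracy - 0.5 * pair_sum / denom
```

The gap between the two variants shrinks as the number of samples grows, so it matters most when few samples are available or when the models being compared have close scores.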
Article 133
Title@2025-07-24 (4): Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias
Title: Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias | Causally Testing Gender Bias in LLMs: Eine Fallstudie über berufsbezogene Bias | 《LLMM中因果测试性别偏见:职业偏见案例研究》 2212.10678v4 |
Authors (5): Yuen Chen, Vethavikashini Chithrra Raghuram, Justus Mattern, Rada Mihalcea, Zhijing Jin
Generated texts from large language models (LLMs) have been shown to exhibit a variety of harmful, human-like biases against various demographics. These findings motivate research efforts aiming to understand and measure such effects. This paper introduces a causal formulation for bias measurement in generative language models. Based on this theoretical foundation, we outline a list of desiderata for designing robust bias benchmarks. We then propose a benchmark called OccuGender, with a bias-measuring procedure to investigate occupational gender bias. We test several state-of-the-art open-source LLMs on OccuGender, including Llama, Mistral, and their instruction-tuned versions. The results show that these models exhibit substantial occupational gender bias. Lastly, we discuss prompting strategies for bias mitigation and an extension of our causal formulation to illustrate the generalizability of our framework. Our code and data are available at https://github.com/chenyuen0103/gender-bias.
从大型语言模型(LLMs)中生成的文字显示,对各种人口结构存在各种有害、人性的偏见,这些调查结果激发了旨在理解和衡量这些影响的研究工作;本文件介绍了在基因化语言模型中进行偏见衡量的因果表述;根据这一理论基础,我们概述了设计稳健的偏见基准的偏差清单;然后我们提出了一个称为Occu Gender的基准,并提出了调查职业性别偏见的偏见衡量程序;我们测试了包括Llama、Mistral在内的一些关于奥克库性别的先进开放源的开源LMS,及其经指导的版本;结果显示这些模型表现出严重的职业性别偏见;最后,我们讨论了如何推动减少偏见的战略,并扩展我们的因果关系表述,以说明我们框架的可概括性。我们的代码和数据是https://github.com/chenyuen0103/gender-bials。
Article 134
Title@2025-07-24 (4): A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models
Title: A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models | Ein Multi-Faceted-Evaluierungsrahmen für die Bewertung synthetischer Daten, erzeugt durch große Sprachmodelle | 评估由大语言模型生成的合成数据多面评价框架 2404.14445v2 |
Authors (3): Yefeng Yuan, Yuhong Liu, Liang Cheng
The rapid advancements in generative AI and large language models (LLMs) have opened up new avenues for producing synthetic data, particularly in the realm of structured tabular formats, such as product reviews. Despite the potential benefits, concerns regarding privacy leakage have surfaced, especially when personal information is utilized in the training datasets. In addition, there is an absence of a comprehensive evaluation framework capable of quantitatively measuring the quality of the generated synthetic data and their utility for downstream tasks. In response to this gap, we introduce SynEval, an open-source evaluation framework designed to assess the fidelity, utility, and privacy preservation of synthetically generated tabular data via a suite of diverse evaluation metrics. We validate the efficacy of our proposed framework, SynEval, by applying it to synthetic product review data generated by three state-of-the-art LLMs: ChatGPT, Claude, and Llama. Our experimental findings illuminate the trade-offs between various evaluation metrics in the context of synthetic data generation. Furthermore, SynEval stands as a critical instrument for researchers and practitioners engaged with synthetic tabular data, empowering them to judiciously determine the suitability of the generated data for their specific applications, with an emphasis on upholding user privacy.
基因化的AI和大型语言模型(LLMS)的迅速发展为合成数据的生产开辟了新的途径,特别是在结构化的表格格式领域,如产品审查。尽管可能带来好处,但是对隐私泄漏的担忧已经浮现出来,特别是在培训数据集中利用个人信息的情况下。此外,缺乏能够量化计量所生成的合成数据质量及其在下游任务中的效用的综合评价框架。针对这一差距,我们引入了SynEval,这是一个开放源评价框架,目的是通过一套不同的评估指标,评估合成生成的表格数据的准确性、实用性和隐私性。我们验证了我们拟议框架——SynEval——的有效性,将它应用到三个最先进的LMS:ChatGPT、Claude和Llama综合数据的综合产品审查数据。我们的实验结果说明了在合成数据生成方面各种评价指标之间的取舍。此外,SynEval是从事合成表格数据的研究人员和从业人员的关键工具,使他们能明智地确定所生成的数据是否适合其具体应用,强调用户的隐私。
Article 135
Title@2025-07-24 (4): Privacy-Preserving Synthetic Review Generation with Diverse Writing Styles Using LLMs
Title: Privacy-Preserving Synthetic Review Generation with Diverse Writing Styles Using LLMs | Privacy-Preserving Synthetic Review Generation mit unterschiedlichen Schreibstilen mit LLMs | 使用LLMMs以多种写作风格生成的隐私-保护合成审查 2507.18055v1 |
Authors (6): Tevin Atwal, Chan Nam Tieu, Yefeng Yuan, Zhan Shi, Yuhong Liu, Liang Cheng
The increasing use of synthetic data generated by Large Language Models (LLMs) presents both opportunities and challenges in data-driven applications. While synthetic data provides a cost-effective, scalable alternative to real-world data to facilitate model training, its diversity and privacy risks remain underexplored. Focusing on text-based synthetic data, we propose a comprehensive set of metrics to quantitatively assess the diversity (i.e., linguistic expression, sentiment, and user perspective), and privacy (i.e., re-identification risk and stylistic outliers) of synthetic datasets generated by several state-of-the-art LLMs. Experiment results reveal significant limitations in LLMs’ capabilities in generating diverse and privacy-preserving synthetic data. Guided by the evaluation results, a prompt-based approach is proposed to enhance the diversity of synthetic reviews while preserving reviewer privacy.
虽然合成数据为实际世界数据提供了一种成本效益高、可扩展的替代方法,以便利模式培训,但其多样性和隐私风险仍未得到充分探讨。我们以基于文本的合成数据为重点,提出了一套综合指标,用于定量评估多种合成数据(即语言表达、情绪和用户视角)和若干最新水平的LLM生成的合成数据集的隐私(即重新识别风险和外星体)。实验结果表明,LLMS生成多样性和隐私保护合成数据的能力存在重大限制。在评价结果的指导下,建议采取迅速依据的办法,在保护审查人的隐私的同时,加强合成审查的多样性。
Article 136
Title@2025-07-24 (4): Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems
Title: Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems | Unisoma: Ein Unified Transformer-basierter Solver für Multi-Solid-Systeme | Unisoma:多层系统统一变压器解决方案 2506.06021v2 |
Authors (5): Shilong Tao, Zhe Feng, Haonan Sun, Zhanxing Zhu, Yunhuai Liu
Multi-solid systems are foundational to a wide range of real-world applications, yet modeling their complex interactions remains challenging. Existing deep learning methods predominantly rely on implicit modeling, where the factors influencing solid deformation are not explicitly represented but are instead indirectly learned. However, as the number of solids increases, these methods struggle to accurately capture intricate physical interactions. In this paper, we introduce a novel explicit modeling paradigm that incorporates factors influencing solid deformation through structured modules. Specifically, we present Unisoma, a unified and flexible Transformer-based model capable of handling variable numbers of solids. Unisoma directly captures physical interactions using contact modules and an adaptive interaction allocation mechanism, and learns the deformation through a triplet relationship. Compared to implicit modeling techniques, explicit modeling is better suited for multi-solid systems with diverse coupling patterns, as it enables detailed treatment of each solid while preventing information blending and confusion. Experimentally, Unisoma achieves consistent state-of-the-art performance across seven well-established datasets and two complex multi-solid tasks. Code is available at https://github.com/therontau0054/Unisoma.
多固体系统是一系列广泛现实世界应用的基础,但模拟其复杂互动仍然具有挑战性。现有的深层次学习方法主要依靠隐含模型,其中影响固态变形的因素没有明确体现,而是间接学习。然而,随着固体数量的增加,这些方法难以准确地捕捉复杂的物理互动。在本文中,我们引入了一个新的明确的模型模式,其中包括通过结构化模块影响固态变形的因素。具体地说,我们介绍了Unisoma,一个统一和灵活的基于变异体的模型,能够处理各种固体的变异数量。Unisoma直接利用接触模块和适应性互动分配机制获取物理互动,并通过三重关系学习变形。与隐含的模型技术相比,明确的模型更适合于具有不同组合模式的多固体系统,因为它能够详细处理每一种固体,同时防止信息的混合和混乱。实验性,Unisoma在7个完善的数据集和两个复杂的多固体任务中实现了一致的状态性表现。代码可以在 https://githubub.comtheruma.00。
Article 137
Title@2025-07-24 (4): ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks
Title: ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks | ViGText: Deepfake-Bilderkennung mit Vision-Language-Modellerklärungen und Graph-Neural-Netzwerken | ViGText: 用视觉语言模型解释和图形神经网络进行深假图像探测 2507.18031v1 |
Authors (5): Ahmad ALBarqawi, Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, NhatHai Phan
The rapid rise of deepfake technology, which produces realistic but fraudulent digital content, threatens the authenticity of media. Traditional deepfake detection approaches often struggle with sophisticated, customized deepfakes, especially in terms of generalization and robustness against malicious attacks. This paper introduces ViGText, a novel approach that integrates images with Vision Large Language Model (VLLM) Text explanations within a Graph-based framework to improve deepfake detection. The novelty of ViGText lies in its integration of detailed explanations with visual data, as it provides a more context-aware analysis than captions, which often lack specificity and fail to reveal subtle inconsistencies. ViGText systematically divides images into patches, constructs image and text graphs, and integrates them for analysis using Graph Neural Networks (GNNs) to identify deepfakes. Through the use of multi-level feature extraction across spatial and frequency domains, ViGText captures details that enhance its robustness and accuracy in detecting sophisticated deepfakes. Extensive experiments demonstrate that ViGText significantly enhances generalization and achieves a notable performance boost when it detects user-customized deepfakes. Specifically, average F1 scores rise from 72.45% to 98.32% under generalization evaluation, reflecting the model’s superior ability to generalize to unseen, fine-tuned variations of stable diffusion models. As for robustness, ViGText achieves an increase of 11.1% in recall compared to other deepfake detection approaches. When facing targeted attacks that exploit its graph-based architecture, ViGText limits classification performance degradation to less than 4%. ViGText uses detailed visual and textual analysis to set a new standard for detecting deepfakes, helping ensure media authenticity and information integrity.
深假技术的迅速崛起,它产生了现实但欺诈性的数字内容,威胁到媒体的真实性。传统的深假探测方法往往与精密的、定制的深假方法挣扎,特别是在一般化和抵御恶意攻击方面。本文介绍了VigText,这是将图像与视野大语言模型(VLLM)文本解释整合到一个基于图表的框架里的一种创新方法,目的是改进深假检测。 Vigtext的新颖之处在于它将详细的解释与视觉数据整合在一起,因为它提供了比字幕更深背景的认知分析,而这些分析往往缺乏具体性,无法揭示出微妙的不一致之处。 Vigtext系统将图像分为补丁、构建图像和文本图表,并将这些图像整合到分析中,使用Georg Ne Neal 网络(GNPs)来进行深度分析。通过使用多层次的特征提取, Vigtextext获取细节可以提高其精度和准确性能检测精密的深度数据。ViGText 广泛的实验表明,在检测用户-直观性攻击时,能够显著地提升其精度,在精确度分析中进行精确度分析。
Article 138
Title@2025-07-24 (4): AI Workflow, External Validation, and Development in Eye Disease Diagnosis
Title: AI Workflow, External Validation, and Development in Eye Disease Diagnosis | KI-Workflow, externe Validierung und Entwicklung in der Augenerkrankungen-Diagnose | AI 工作流程、外部验证和眼病诊断的发展 2409.15087v2 |
Authors (38): Qingyu Chen, Tiarnan D L Keenan, Elvira Agron, Alexis Allot, Emily Guan, Bryant Duong, Amr Elsawy, Benjamin Hou, Cancan Xue, Sanjeeb Bhandari, Geoffrey Broadhead, Chantal Cousineau-Krieger, Ellen Davis, William G Gensheimer, David Grasic, Seema Gupta, Luis Haddock, Eleni Konstantinou, Tania Lamba, Michele Maiberger, Dimosthenis Mantopoulos, Mitul C Mehta, Ayman G Nahri, Mutaz AL-Nawaflh, Arnold Oshinsky, Brittany E Powell, Boonkit Purt, Soo Shin, Hillary Stiefel, Alisa T Thavikulwat, Keith James Wroblewski, Tham Yih Chung, Chui Ming Gemmy Cheung, Ching-Yu Cheng, Emily Y Chew, Michelle R. Hribar, Michael F. Chiang, Zhiyong Lu
Timely disease diagnosis is challenging due to increasing disease burdens and limited clinician availability. AI shows promise in diagnosis accuracy but faces real-world application issues due to insufficient validation in clinical workflows and diverse populations. This study addresses gaps in medical AI downstream accountability through a case study on age-related macular degeneration (AMD) diagnosis and severity classification. We designed and implemented an AI-assisted diagnostic workflow for AMD, comparing diagnostic performance with and without AI assistance among 24 clinicians from 12 institutions with real patient data sampled from the Age-Related Eye Disease Study (AREDS). Additionally, we demonstrated continual enhancement of an existing AI model by incorporating approximately 40,000 additional medical images (named AREDS2 dataset). The improved model was then systematically evaluated using both AREDS and AREDS2 test sets, as well as an external test set from Singapore. AI assistance markedly enhanced diagnostic accuracy and classification for 23 out of 24 clinicians, with the average F1-score increasing by 20% from 37.71 (Manual) to 45.52 (Manual + AI) (P-value < 0.0001), achieving an improvement of over 50% in some cases. In terms of efficiency, AI assistance reduced diagnostic times for 17 out of the 19 clinicians tracked, with time savings of up to 40%. Furthermore, a model equipped with continual learning showed robust performance across three independent datasets, recording a 29% increase in accuracy, and elevating the F1-score from 42 to 54 in the Singapore population.
由于疾病负担增加,诊所数量有限,因此及时的疾病诊断具有挑战性,因为疾病负担增加,临床工作流程和不同人群的验证不足,AI显示诊断准确性有希望,但面临现实世界的应用问题。本研究通过对年龄相关肌肉畸形(AMD)诊断和严重程度分类进行案例研究,解决医疗AI下游问责方面的差距。我们设计并实施了AMAD的AI辅助诊断工作流程,将12个机构的真正病人数据抽样的12个机构的24名临床医生的诊断性能与不提供AI援助相比较,将诊断性能从37.71(手册)提高到45.52(手册+新加坡AI)。此外,我们通过纳入约40 000多张医疗图像(名为AREDS2数据集),持续加强现有的AI模式。随后,通过使用AREDS和AREDS2测试组以及新加坡的外部测试,系统评估改进的模型。我们设计并实施了AIDS诊断性能,24名临床医生中有23名的诊断性能和分类明显提高了20%,从37.71(手册+AI)增加到45.52(手册+AI) (P-0001),在一些病例中实现了50%以上的改进了50 %。此外,通过持续的临床数据跟踪跟踪记录显示了19个诊断时间。
Article 139
Title@2025-07-24 (4): Does visualization help AI understand data?
Title: Does visualization help AI understand data? | Hilft die Visualisierung KI, Daten zu verstehen? | 可视化能帮助AI理解数据吗? 2507.18022v1 |
Authors (3): Victoria R. Li, Johnathan Sun, Martin Wattenberg
Charts and graphs help people analyze data, but can they also be useful to AI systems? To investigate this question, we perform a series of experiments with two commercial vision-language models: GPT 4.1 and Claude 3.5. Across three representative analysis tasks, the two systems describe synthetic datasets more precisely and accurately when raw data is accompanied by a scatterplot, especially as datasets grow in complexity. Comparison with two baselines – providing a blank chart and a chart with mismatched data – shows that the improved performance is due to the content of the charts. Our results are initial evidence that AI systems, like humans, can benefit from visualization.
图表和图表有助于人们分析数据,但是否对AI系统有用?为了调查这一问题,我们用两种商业视觉语言模型进行了一系列实验:GPT 4.1和Claude 3.5。在三项具有代表性的分析任务中,当原始数据附有散射图时,这两个系统更准确和准确地描述合成数据集,特别是当数据集日益复杂时。与两个基线 – – 提供一个空白图表和一个有不匹配数据的图表 – – 的比较表明,业绩的改善是由于图表的内容所致。我们的结果初步证明,AI系统与人类一样,能够从可视化中受益。
Article 140
Title@2025-07-24 (4): Zeroth-order log-concave sampling
Title: Zeroth-order log-concave sampling | logkonkav-Probenahme der Nullten Ordnung | 零级对数集中取样 2507.18021v1 |
Authors (1): Yunbum Kook
We study the zeroth-order query complexity of log-concave sampling, specifically uniform sampling from convex bodies using membership oracles. We propose a simple variant of the proximal sampler that achieves the query complexity with matched Rényi orders between the initial warmness and output guarantee. Specifically, for any $\varepsilon>0$ and $q\geq2$, the sampler, initialized at $\pi_{0}$, outputs a sample whose law is $\varepsilon$-close in $q$-Rényi divergence to $\pi$, the uniform distribution over a convex body in $\mathbb{R}^{d}$, using $\widetilde{O}(qM_{q}^{q/(q-1)}d^{2}\,\lVert\operatorname{cov}\pi\rVert\log\frac{1}{\varepsilon})$ membership queries, where $M_{q}=\lVert\text{d}\pi_{0}/\text{d}\pi\rVert_{L^{q}(\pi)}$. We further introduce a simple annealing scheme that produces a warm start in $q$-Rényi divergence (i.e., $M_{q}=O(1)$) using $\widetilde{O}(qd^{2}R^{3/2}\,\lVert\operatorname{cov}\pi\rVert^{1/4})$ queries, where $R^{2}=\mathbb{E}_{\pi}[\lvert\cdot\rvert^{2}]$. This interpolates between known complexities for warm-start generation in total variation and Rényi-infinity divergence. To relay a Rényi warmness across the annealing scheme, we establish hypercontractivity under simultaneous heat flow and translate it into an improved mixing guarantee for the proximal sampler under a logarithmic Sobolev inequality. These results extend naturally to general log-concave distributions accessible via evaluation oracles, incurring additional quadratic queries.
我们研究对co- concavel 取样的零顺序查询复杂性, 特别是使用成员或星座从 convex 机构进行的统一采样。 我们提出一个简单的原始采样器变式, 使查询复杂, 初始温暖和输出保证之间匹配 R\ enyi 命令。 具体来说, 对于任何$\ varepsilon> 0美元和 $q\ geqqq2美元, 采样器, 初始化为$\ pí0} , 输出一种样本, 法律是 $- varepsil $- closeal , 以 $- R\\ enye 利差到 $ 美元, 以 $\\ diotr= diotr= lix lax a laxalqrqr=qr=qrqr\\\\\\\\ dirdeal a lax a lax a fal lax lax modeal a a lax laxiqqqqqr=rqqqqqr=oqr=_ dromode, lax_\\\\\ lax modeal a a a a a mox lax lax lax a a a a a a a a modeal_\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ drodeal a a a a a a modrodeal de a a a a a modeal de a a a a modrodeal d d d d d d d d d d droxxxx\\\\\\\\\\\\\\\\\
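For orientation, a basic proximal sampler for uniform sampling with a membership oracle can be sketched as below. It shows the generic forward/backward scheme and not the specific variant analyzed in the paper: the unit ball as the convex body, the step size `h`, and the retry budget `max_tries` are all assumptions made for illustration.

```python
import math
import random


def in_unit_ball(x):
    """Membership oracle for the Euclidean unit ball (an illustrative convex body)."""
    return sum(v * v for v in x) <= 1.0


def proximal_step(x, h, member, rng, max_tries=1000):
    """One forward/backward step that touches the body only through `member`."""
    root_h = math.sqrt(h)
    # Forward step: y ~ N(x, h I), which may leave the body.
    y = [v + root_h * rng.gauss(0.0, 1.0) for v in x]
    # Backward step: sample N(y, h I) restricted to the body by rejection.
    for _ in range(max_tries):
        z = [v + root_h * rng.gauss(0.0, 1.0) for v in y]
        if member(z):
            return z
    return x  # retry budget exhausted: stay put (a simplification of this sketch)


def sample_chain(x0, steps, h, member, seed=0):
    """Run the chain from a point inside the body and return every iterate."""
    rng = random.Random(seed)
    x = list(x0)
    chain = []
    for _ in range(steps):
        x = proximal_step(x, h, member, rng)
        chain.append(x)
    return chain
```

Each rejection trial costs one membership query, which is why query counts are the natural complexity measure here; for example, `sample_chain([0.0, 0.0, 0.0], 200, 0.05, in_unit_ball)` produces 200 iterates that all lie inside the ball.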
Article 141
Title@2025-07-24 (4): Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models
Title: Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models | vorausschauende Skalierungsgesetze für eine effiziente GRPO-Schulung großer vernünftiger Modelle | GROPP 高效培训大理由模型的预测增强法律 2507.18014v1 |
Authors (5): Datta Nimmaturi, Vaishnavi Bhargava, Rajat Ghosh, Johnu George, Debojyoti Dutta
Fine-tuning large language models (LLMs) for reasoning tasks using reinforcement learning methods like Group Relative Policy Optimization (GRPO) is computationally expensive. To address this, we propose a predictive framework that models training dynamics and helps optimize resource usage. Through experiments on Llama and Qwen models (3B and 8B), we derive an empirical scaling law based on model size, initial performance, and training progress. This law predicts reward trajectories and identifies three consistent training phases: slow start, rapid improvement, and plateau. We find that training beyond a certain number of epochs offers little gain, suggesting earlier stopping can significantly reduce compute without sacrificing performance. Our approach generalizes across model types, providing a practical guide for efficient GRPO-based fine-tuning.
使用强化学习方法(如集体相对政策优化(GROP))对大型语言模型(LLMs)进行推理工作微调是计算上昂贵的。为了解决这个问题,我们提出了一个预测框架,以模拟动态和帮助优化资源使用。通过Llama和Quen模型(3B8B)的实验,我们根据模型规模、初步业绩和培训进展,得出了经验性的规模法。这部法律预测了奖励轨迹,并确定了三个连贯的培训阶段:缓慢开始、快速改进和高原。我们发现,超过某一个时代的培训不会带来什么好处,表明提前停止会大大降低计算而不牺牲业绩。我们的方法将各种模型加以概括,为以GROP为基础的高效微调提供实用指南。
Article 142
Title@2025-07-24 (4): Active Learning For Repairable Hardware Systems With Partial Coverage
Title: Active Learning For Repairable Hardware Systems With Partial Coverage | Aktives Lernen für reparable Hardware-Systeme mit teilweiser Abdeckung | 为部分覆盖的可修理硬件系统积极学习 2503.16315v3 |
Authors (4): Michael Potter, Beyza Kalkanlı, Deniz Erdoğmuş, Michael Everett
Identifying the optimal diagnostic test and hardware system instance to infer reliability characteristics using field data is challenging, especially when constrained by fixed budgets and minimal maintenance cycles. Active Learning (AL) has shown promise for parameter inference with limited data and budget constraints in machine learning/deep learning tasks. However, AL for reliability model parameter inference remains underexplored for repairable hardware systems. It requires specialized AL Acquisition Functions (AFs) that consider hardware aging and the fact that a hardware system consists of multiple sub-systems, which may undergo only partial testing during a given diagnostic test. To address these challenges, we propose a relaxed Mixed Integer Semidefinite Program (MISDP) AL AF that incorporates Diagnostic Coverage (DC), Fisher Information Matrices (FIMs), and diagnostic testing budgets. Furthermore, we design empirical-based simulation experiments focusing on two diagnostic testing scenarios: (1) partial tests of a hardware system with overlapping subsystem coverage, and (2) partial tests where one diagnostic test fully subsumes the subsystem coverage of another. We evaluate our proposed approach against the most widely used AL AF in the literature (entropy), as well as several intuitive AL AFs tailored for reliability model parameter inference. Our proposed AF ranked best on average among the alternative AFs across 6,000 experimental configurations, with respect to Area Under the Curve (AUC) of the Absolute Total Expected Event Error (ATEER) and Mean Squared Error (MSE) curves, with statistical significance calculated at a 0.05 alpha level using a Friedman hypothesis test.
主动学习(AL)显示,在机器学习/深层学习任务中,有有限的数据和预算限制,有希望进行参数推断;然而,对于可修理的硬件系统,有可靠模型参数推断仍然没有得到充分探讨;需要专门的AL 获取功能(AF),考虑到硬件老化,以及硬件系统由多个子系统组成,在特定诊断测试期间只能进行部分测试;为应对这些挑战,我们建议采用宽松的混合 Integer Sidfinite 程序(MISP),该方案包含诊断覆盖(DC)、Fisherish Infricises(FIMS)和诊断测试预算方面的有限数据和预算;此外,我们设计基于经验的模拟实验,侧重于两种诊断测试假设:(1) 对硬件系统进行部分测试,其子覆盖范围重叠;(2) 部分测试,其中诊断测试完全反映另一个子系统的覆盖;我们根据文献中最广泛使用的AL AFA(entripropy Indefinite) 质量评估我们的拟议方法(IMIS) ,以及若干次直径(AFAFAL) 的直径直径直径直径测试(AAF),根据AFAFAFAF) 的准确度标准,根据AFAL 的正确度标准,对AFAL 和AFAL 的正确度进行。
Article 143
Title@2025-07-24 (4): Analyzing Islamophobic Discourse Using Semi-Coded Terms and LLMs
Title: Analyzing Islamophobic Discourse Using Semi-Coded Terms and LLMs | Analyse des Islamophoben Diskurses mit semi-kodierten Ausdrücken und LLMs | 使用半编码术语和LLMs分析仇视伊斯兰者的情况 2503.18273v2 |
Authors (5): Raza Ul Mustafa, Roi Dupart, Gabrielle Smith, Noman Ashraf, Nathalie Japkowicz
In recent years, Islamophobia has gained significant traction across Western societies, fueled by the rise of digital communication networks. This paper performs a large-scale analysis of specialized, semi-coded Islamophobic terms (such as muzrat, pislam, mudslime, mohammedan, and muzzies) circulated on extremist social platforms, e.g., 4Chan, Gab, and Telegram. Many of these terms appear lexically neutral or ambiguous outside of specific contexts, making them difficult for both human moderators and automated systems to reliably identify as hate speech. First, we use Large Language Models (LLMs) to show their ability to understand these terms. Second, the Google Perspective API suggests that Islamophobic posts tend to receive higher toxicity scores than other categories of hate speech like Antisemitism. Finally, we use a BERT topic modeling approach to extract different topics and Islamophobic discourse on these social platforms. Our findings indicate that LLMs understand these Out-Of-Vocabulary (OOV) slurs; however, further improvements in moderation strategies and algorithmic detection are necessary to address such discourse effectively. Our topic modeling also indicates that Islamophobic text is found across various political, conspiratorial, and far-right movements and is particularly directed against Muslim immigrants. Taken altogether, we performed one of the first studies on Islamophobic semi-coded terms and shed a global light on Islamophobia.
近些年来,仇视伊斯兰教在西方社会得到了显著的推动,并受到数字通信网络的崛起的推动。本文对在极端主义社会平台上(如4Chan、Gab、Telegram等)漂浮在极端主义社会平台上的(muzrat、pislam、mudslime、mudslime、mohammedan、muzzies)等专门和半编码的伊斯兰恐惧词汇(muzrat、pislam、modslime、mohamedan、muzzies)进行了大规模分析。许多这些术语似乎在具体背景之外具有法律中立性或模糊性,使得人类主持人和自动化系统难以可靠地识别这些仇恨言论。首先,我们使用大语言模型(LLLMs)来显示它们理解这些术语的能力。第二,GoogleoPour APIAPI表示,仇视伊斯兰教的言论往往比其他类型的仇恨言论(如反犹太主义)得到更高的毒性分数。最后,我们使用BERT主题模型方法在这些社会平台上提取不同的话题和仇视伊斯兰教言论。我们的研究结果表明,LLMMSM理解这些“O-O(OVO)“Oibyal ” 和“关于伊斯兰的理论”的理论研究首先必须进一步改进和“针对这种政治运动。
Article 144
Title@2025-07-24 (4): Fine-Grained Uncertainty Quantification via Collisions
Title: Fine-Grained Uncertainty Quantification via Collisions | Feinkörnige Unsicherheit Quantifizierung über Kollisionen | 通过碰撞进行精细的不确定性定量 2411.12127v4 |
Authors (3): Jesse Friedbaum, Sudarshan Adiga, Ravi Tandon
We propose a new and intuitive metric for aleatoric uncertainty quantification (UQ), the prevalence of class collisions defined as the same input being observed in different classes. We use the rate of class collisions to define the collision matrix, a novel and uniquely fine-grained measure of uncertainty. For a classification problem involving $K$ classes, the $K\times K$ collision matrix $S$ measures the inherent difficulty in distinguishing between each pair of classes. We discuss several applications of the collision matrix, establish its fundamental mathematical properties, and show its relationship with existing UQ methods, including the Bayes error rate (BER). We also address the new problem of estimating the collision matrix using one-hot labeled data by proposing a series of innovative techniques to estimate $S$. First, we learn a pair-wise contrastive model which accepts two inputs and determines if they belong to the same class. We then show that this contrastive model (which is PAC learnable) can be used to estimate the Gramian matrix of $S$, defined as $G=S^TS$. Finally, we show that under reasonable assumptions, $G$ can be used to uniquely recover $S$, a new result on non-negative matrices which could be of independent interest. With a method to estimate $S$ established, we demonstrate how this estimate of $S$, in conjunction with the contrastive model, can be used to estimate the posterior class probability distribution of any point. Experimental results are also presented to validate our methods of estimating the collision matrix and class posterior distributions on several datasets.
我们提出一个新的和直观的计量标准,用于确定不同类别中观察到的同一输入值,即分类碰撞的发生率。我们使用舱碰撞率来定义碰撞矩阵,这是一个新颖的和独特的细微的不确定性度量。对于涉及K美元等级的分类问题,美元/日元碰撞矩阵用美元衡量区分每一类之间固有的困难。我们讨论碰撞矩阵的若干应用,确立其基本数学特性,并显示其与现有克郎方法的关系,包括拜斯误差率(BER)。我们还使用舱碰撞率来估计碰撞矩阵的新问题,用一热标签数据来定义碰撞矩阵。首先,我们学习了一种双向对比模型,接受两种投入并确定它们是否属于同一类别。然后我们表明,这一对比模型(PAC可以学习)可以用来估计美元(GG=S)的基值,并显示其与现有克兰德基值(Bayes)的差差值(BER)之间的关系。最后,我们还通过提出一个合理的假设,用一热的标签数据来估算碰撞矩阵的碰撞矩阵的估算结果,用美元-美元,我们用这个固定的基值的基值数据可以用来证明一个固定的基值的基值的比值。
Article 145
Title@2025-07-23 (3): Machine Unlearning of Traffic State Estimation and Prediction
Title: Machine Unlearning of Traffic State Estimation and Prediction | Maschinelles Entlernen von Verkehrsstaatschätzungen und Vorhersagen | 取消学习交通国估计和预测 2507.17984v1 |
Authors (3): Xin Wang, R. Tyrrell Rockafellar, Xuegang Ban
Data-driven traffic state estimation and prediction (TSEP) relies heavily on data sources that contain sensitive information. While the abundance of data has fueled significant breakthroughs, particularly in machine learning-based methods, it also raises concerns regarding privacy, cybersecurity, and data freshness. These issues can erode public trust in intelligent transportation systems. Recently, regulations have introduced the “right to be forgotten”, allowing users to request the removal of their private data from models. Because machine learning models can remember old data, simply removing it from back-end databases is insufficient in such systems. To address these challenges, this study introduces a novel learning paradigm for TSEP, Machine Unlearning TSEP, which enables a trained TSEP model to selectively forget privacy-sensitive, poisoned, or outdated data. By empowering models to “unlearn,” we aim to enhance the trustworthiness and reliability of data-driven TSEP.
由数据驱动的交通状况估计和预测(TESP)严重依赖包含敏感信息的数据来源。 大量数据刺激了重大突破,特别是机器学习方法方面的突破,但也引起了对隐私、网络安全和数据新鲜性的担忧。 这些问题可能会削弱公众对智能运输系统的信任。 最近, 法规引入了“ 被遗忘的权利 ” , 允许用户请求从模型中移除其私人数据。 由于机器学习模型可以记住旧数据, 仅仅从后端数据库中去除数据是不够的。 为了应对这些挑战,本研究为TESEP-Machine unlearning TESEP引入了一种新的学习模式, 使训练有素的TESEP模型能够有选择地忘记对隐私敏感、有毒或过时的数据。 通过赋予模型“ 忽略 ” 能力, 我们的目标是提高数据驱动的TESP的可靠性和可靠性。
Article 146
Title@2025-07-23 (3): Pulse-PPG: An Open-Source Field-Trained PPG Foundation Model for Wearable Applications Across Lab and Field Settings
Title: Pulse-PPG: An Open-Source Field-Trained PPG Foundation Model for Wearable Applications Across Lab and Field Settings | Pulse-PPG: Ein Open-Source Feld-Trained PPG Foundation Modell für tragbare Anwendungen über Labor- und Feldeinstellungen hinweg | Pulse-PPG:开放源码实地培训的PPG基金会模型,用于跨实验室和实地环境的可穿戴应用 2502.01108v2 |
Authors (6): Mithun Saha, Maxwell A. Xu, Wanting Mao, Sameer Neupane, James M. Rehg, Santosh Kumar
Photoplethysmography (PPG)-based foundation models are gaining traction due to the widespread use of PPG in biosignal monitoring and their potential to generalize across diverse health applications. In this paper, we introduce Pulse-PPG, the first open-source PPG foundation model trained exclusively on raw PPG data collected over a 100-day field study with 120 participants. Existing PPG foundation models are either open-source but trained on clinical data or closed-source, limiting their applicability in real-world settings. We evaluate Pulse-PPG across multiple datasets and downstream tasks, comparing its performance against a state-of-the-art foundation model trained on clinical data. Our results demonstrate that Pulse-PPG, trained on uncurated field data, exhibits superior generalization across clinical and mobile health applications in both lab and field settings. This suggests that exposure to real-world variability enables the model to learn fine-grained representations, making it more adaptable across tasks. Furthermore, pre-training on field data surprisingly outperforms pre-training on clinical data in many tasks, reinforcing the importance of training on real-world, diverse datasets. To encourage further advancements in robust foundation models leveraging field data, we plan to release Pulse-PPG, providing researchers with a powerful resource for developing more generalizable PPG-based models.
由于在生物信号监测中广泛使用PPG, 并有可能推广各种健康应用,基于光谱截图的基建模型正在获得牵引力。在本文件中,我们引入了Pulse-PPG,这是在100天实地研究中专门收集的原始PPG数据方面受过专门培训的首个公开源PPG基础模型,有120人参加。现有的PPPG基础模型不是开放源,而是临床数据或封闭源培训,限制了其在现实世界环境中的适用性。我们评估了多数据集和下游任务中的Pulse-PPG,对照临床数据最先进的基础模型对它的业绩进行比较。我们的成果表明,在未经保密的实地数据数据方面受过培训的PPGPG,在实验室和实地环境中的临床和移动健康应用中表现出超强的普及性。这表明,与现实世界变异性模型的接触使得模型能够学习精细的表述,使其在现实世界环境中更加适应性。此外,关于实地数据的培训前,令人惊讶地超越了它在许多任务中接受临床数据培训前的训练,加强了关于不精确的PPPG基础,加强了我们数据库的基础。
Article 147
Title@2025-07-23 (3): Machine Learning Workflow for Analysis of High-Dimensional Order Parameter Space: A Case Study of Polymer Crystallization from Molecular Dynamics Simulations
Title: Machine Learning Workflow for Analysis of High-Dimensional Order Parameter Space: A Case Study of Polymer Crystallization from Molecular Dynamics Simulations | Machine Learning Workflow zur Analyse von hochdimensionalen Ordnungsparametern Raum: Eine Fallstudie zur Polymerkristallisation aus molekularen Dynamiksimulationen | 分析高多元秩序参数空间的机器学习工作流:分子动态模拟的聚合体晶化案例研究 2507.17980v1 |
Authors (3): Elyar Tourani, Brian J. Edwards, Bamin Khomami
Currently, crystallization pathways in polymers are identified from molecular simulation data by applying a preset cut-off to a single order parameter (OP) to define nucleated or crystallized regions. Aside from sensitivity to the cut-off, each of these OPs introduces its own systematic biases. In this study, an integrated machine learning workflow is presented to accurately quantify crystallinity in polymeric systems using atomistic molecular dynamics data. Each atom is represented by a high-dimensional feature vector that combines geometric, thermodynamic-like, and symmetry-based descriptors. Low-dimensional embeddings are employed to expose latent structural fingerprints within atomic environments. Subsequently, unsupervised clustering on the embeddings identifies crystalline and amorphous atoms with high fidelity. After generating high-quality labels with multidimensional data, we use supervised learning techniques to identify a minimal set of order parameters that can fully capture this label. Various tests were conducted to reduce the feature set, demonstrating that only three order parameters are sufficient to recreate the crystallization labels. Based on these observed OPs, the crystallinity index (C-index) is defined as the logistic regression model’s probability of crystallinity, remaining bimodal throughout the process and achieving over 0.98 classification performance (AUC). Notably, a model trained on one or a few snapshots enables efficient on-the-fly computation of crystallinity. Lastly, we demonstrate how the optimal C-index fit evolves during various stages of crystallization, supporting the hypothesis that entropy dominates early nucleation, while symmetry gains relevance later. This workflow provides a data-driven strategy for OP selection and a metric to monitor structural transformations in large-scale polymer simulations.
目前,正在利用一个单顺序参数(OP)上一个预设断点的分子模拟数据,对聚合物中的晶化路径进行鉴定。除了对截断的敏感外,每个这些分离点还引入了自己的系统偏差。在本研究中,提出一个集成的机器学习工作流程,以精确量化聚合系统中的晶化路径,使用原子分子动态数据。每个原子都由高维特性矢量代表,该特性矢量结合了几何、热动力学类和对称标定值。低维嵌嵌入用于在原子环境中暴露潜在的结构指纹。随后,除了对截断的敏感外,每个分离点还引入了其自身的系统偏差。在生成高质的多维数据标签后,我们使用监督的学习技术来确定一套最起码的秩序参数,可以完全捕捉这个标签。进行了各种测试,以降低地谱值集,表明仅使用三个顺序参数足以支持在原子环境环境中重新建立直径直径的直径直线性标值。在嵌的直径直径直径直值上观测到一个直径直径直径直径的直径的直径直径直径直值的精确度变变。
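The C-index described in the abstract above is the crystallinity probability output by a logistic regression over a small set of order parameters. A minimal sketch of that idea follows, assuming three synthetic order parameters per atom and a plain gradient-descent fit; the data, the helper names (`fit_logistic`, `c_index`), and the hyperparameters are illustrative assumptions, not the authors' workflow.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Fit weights and bias by plain gradient descent on the mean log-loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        grad = p - y                      # derivative of log-loss w.r.t. the logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def c_index(X, w, b):
    """Hypothetical C-index: per-atom probability of being crystalline."""
    return sigmoid(X @ w + b)

# Synthetic stand-in for per-atom order parameters (assumption: 3 OPs per atom).
rng = np.random.default_rng(0)
amorphous = rng.normal(loc=-1.0, scale=0.5, size=(200, 3))
crystalline = rng.normal(loc=1.0, scale=0.5, size=(200, 3))
X = np.vstack([amorphous, crystalline])
y = np.concatenate([np.zeros(200), np.ones(200)])

w, b = fit_logistic(X, y)
scores = c_index(X, w, b)                 # values in [0, 1], one per atom
accuracy = ((scores > 0.5) == y).mean()
```

Because the fitted model is just a weight vector and a bias, scoring a new snapshot costs one matrix-vector product, which is consistent with the on-the-fly use the abstract mentions.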
Article 148
Title@2025-07-23 (3): SIFOTL: A Principled, Statistically-Informed Fidelity-Optimization Method for Tabular Learning
Title: SIFOTL: A Principled, Statistically-Informed Fidelity-Optimization Method for Tabular Learning | SIFOTL: Eine grundsätzliche, statistisch informierte Methode der Fidelity-Optimierung für tabellarisches Lernen | SIFOTL: 表格学习的有原则的、统计化的、统计化的助产性优化方法 2507.17979v1 |
Authors (2): Shubham Mohole, Sainyam Galhotra
Identifying the factors driving data shifts in tabular datasets is a significant challenge for analysis and decision support systems, especially those focusing on healthcare. Privacy rules restrict data access, and noise from complex processes hinders analysis. To address this challenge, we propose SIFOTL (Statistically-Informed Fidelity-Optimization Method for Tabular Learning) that (i) extracts privacy-compliant data summary statistics, (ii) employs twin XGBoost models to disentangle intervention signals from noise with assistance from LLMs, and (iii) merges XGBoost outputs via a Pareto-weighted decision tree to identify interpretable segments responsible for the shift. Unlike existing analyses which may ignore noise or require full data access for LLM-based analysis, SIFOTL addresses both challenges using only privacy-safe summary statistics. Demonstrating its real-world efficacy, for a MEPS panel dataset mimicking a new Medicare drug subsidy, SIFOTL achieves an F1 score of 0.85, substantially outperforming BigQuery Contribution Analysis (F1=0.46) and statistical tests (F1=0.20) in identifying the segment receiving the subsidy. Furthermore, across 18 diverse EHR datasets generated based on Synthea ABM, SIFOTL sustains F1 scores of 0.86-0.96 without noise and >= 0.75 even with injected observational noise, whereas baseline average F1 scores range from 0.19-0.67 under the same tests. SIFOTL, therefore, provides an interpretable, privacy-conscious workflow that is empirically robust to observational noise.
为了应对这一挑战,我们提议采用SIFOTL(统计化的Fidelity-Optimation Froundation Flatal Learning) (SIFOTL) (SIFOTL) (统计化的Fidility-Opitimation Fround) (SIFOTL) 应对两种挑战,即(一) 提取符合隐私的数据摘要统计数据,(二) 使用双XGBOost 模型,在LLMS的协助下,将干扰信号从噪音中分离出来;(三) 通过Pareto加权决策树合并 XGBOost 输出,以确定可解释的变化责任部分。 与现有的分析不同,可能忽略噪音或需要完全使用LLM(LM) 分析的数据,SIFOTL(SIF) 应对两种挑战。 SIPS-W 显示其真实世界的功效,为一个新的Medicare药品补贴,SIFOTL(SIF) 平均值为0.85(F=0.46) 和统计测试(FIFSIF) 18xxxxxxxxxxxxxxxxxxxxxx) 。
Article 149
Title@2025-07-23 (3): Improving the Computational Efficiency and Explainability of GeoAggregator
Title: Improving the Computational Efficiency and Explainability of GeoAggregator | Verbesserung der Computational Efficiency und Erklärbarkeit von GeoAggregator | 提高地理聚合体的计算效率和可解释性 2507.17977v1 |
Authors (3): Rui Deng, Ziqi Li, Mingshu Wang
Accurately modeling and explaining geospatial tabular data (GTD) is critical for understanding geospatial phenomena and their underlying processes. Recent work has proposed a novel transformer-based deep learning model named GeoAggregator (GA) for this purpose, and has demonstrated that it outperforms other statistical and machine learning approaches. In this short paper, we further improve GA by 1) developing an optimized pipeline that accelerates the dataloading process and streamlines the forward pass of GA to achieve better computational efficiency; and 2) incorporating a model ensembling strategy and a post-hoc model explanation function based on the GeoShapley framework to enhance model explainability. We validate the functionality and efficiency of the proposed strategies by applying the improved GA model to synthetic datasets. Experimental results show that our implementation improves the prediction accuracy and inference speed of GA compared to the original implementation. Moreover, explanation experiments indicate that GA can effectively capture the inherent spatial effects in the designed synthetic dataset. The complete pipeline has been made publicly available for community use (https://github.com/ruid7181/GA-sklearn).
准确的建模和解释地理空间表数据(GTD)对于理解地理空间现象及其基本过程至关重要;最近的工作为此目的提出了一个新型的基于变压器的深学习模型,名为GeoAggragator(GA),并表明它优于其他统计和机器学习方法;在这份简短的论文中,我们进一步改进GA,1)开发一个最佳管道,加速数据加载过程,精简GA的前方通道,以达到更好的计算效率;2)纳入一个基于地理空间框架的模型组合战略和热后模型解释功能,以加强模型解释性;我们通过将改进的GA模型应用于合成数据集来验证拟议战略的功能和效率;实验结果显示,我们的实施提高了GA的预测准确性和推断速度,与最初的实施相比;此外,解释实验表明,GA能够有效捕捉设计合成数据集中固有的空间影响;已经公布了完整的管道供社区使用(https://github.com/ruid7181/GA-sklearn)。
Article 150
Title@2025-07-23 (3): Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA
Title: Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA | Zero-Shot Dynamic Concept Personalisierung mit Grid-Based LoRA | 以网基LORA为基网格的零热动态个人化概念 2507.17963v1 |
Authors (8): Rameen Abdal, Or Patashnik, Ekaterina Deyneka, Hao Chen, Aliaksandr Siarohin, Sergey Tulyakov, Daniel Cohen-Or, Kfir Aberman
Recent advances in text-to-video generation have enabled high-quality synthesis from text and image prompts. While the personalization of dynamic concepts, which capture subject-specific appearance and motion from a single video, is now feasible, most existing methods require per-instance fine-tuning, limiting scalability. We introduce a fully zero-shot framework for dynamic concept personalization in text-to-video models. Our method leverages structured 2x2 video grids that spatially organize input and output pairs, enabling the training of lightweight Grid-LoRA adapters for editing and composition within these grids. At inference, a dedicated Grid Fill module completes partially observed layouts, producing temporally coherent and identity-preserving outputs. Once trained, the entire system operates in a single forward pass, generalizing to previously unseen dynamic concepts without any test-time optimization. Extensive experiments demonstrate high-quality and consistent results across a wide range of subjects beyond trained concepts and editing scenarios.
文本到视频生成的最新进展使得文本和图像提示的高质量合成成为了文本和图像提示中的高质量合成。虽然现在可以使动态概念的个性化,这些动态概念从单一的视频中捕捉到特定主题的外观和运动,但大多数现有方法都需要逐个微调,限制可缩放性。我们在文本到视频模型中引入了动态概念个化的完全零射框架。我们的方法利用了结构化的2x2视频网格,对输入和输出配对进行空间组织,使得能够培训这些网格中的轻量级网格-LORA调适器进行编辑和组成。推断,专用网格填充模块完成部分观测到的布局,产生时间一致和身份保护产出。经过培训后,整个系统将采用单一的向前传球,在没有任何测试时间优化的情况下将以往看不见的动态概念归纳为通用概念。广泛的实验展示了超越经过培训的概念和编辑情景的广泛主题的高质量和一致的结果。
Article 151
Title@2025-07-23 (3): VIBE: Video-Input Brain Encoder for fMRI Response Modeling
Title: VIBE: Video-Input Brain Encoder for fMRI Response Modeling | VIBE: Video-Input Gehirnencoder für fMRI Response Modeling | VIBE: 用于FMRI反应建模的视频投入大脑编码器 2507.17958v1 |
Authors (6): Daniel Carlstrom Schad, Shrey Dixit, Janis Keck, Viktor Studenyak, Aleksandr Shpilevoi, Andrej Bicanski
We present VIBE, a two-stage Transformer that fuses multi-modal video, audio, and text features to predict fMRI activity. Representations from open-source models (Qwen2.5, BEATs, Whisper, SlowFast, V-JEPA) are merged by a modality-fusion transformer and temporally decoded by a prediction transformer with rotary embeddings. Trained on 65 hours of movie data from the CNeuroMod dataset and ensembled across 20 seeds, VIBE attains mean parcel-wise Pearson correlations of 0.3225 on in-distribution Friends S07 and 0.2125 on six out-of-distribution films. An earlier iteration of the same architecture obtained 0.3198 and 0.2096, respectively, winning Phase-1 and placing second overall in the Algonauts 2025 Challenge.
我们介绍了两阶段变换器VIBE,该变换器结合了多式视频、音频和文字功能,以预测FMRI活动。开放源码模型(Quen2.5、BEATs、Whisper、Slowfast、V-JEPA)的表示方式由模式集成变压器合并,由带有旋转嵌入器的预测变压器暂时解码。通过利用CNeuroMod数据集提供的65小时电影数据培训,将20世纪20种种子混合在一起,VIBE获得分布式Friends S07和21.25六部发行外胶片中平均包裹式比重32.25和21.25的皮尔逊相。同一结构早期的循环分别获得了0.3198和0.296,赢得了第1阶段1和0.2025年的Algoouts挑战。
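The evaluation metric named in the abstract above, a mean parcel-wise Pearson correlation, can be sketched as follows. The array shapes and the synthetic data are assumptions for illustration; this is not the challenge's official scoring code.

```python
import numpy as np

def parcelwise_pearson(pred, true):
    """Pearson r per parcel for arrays shaped (timepoints, parcels)."""
    pred_c = pred - pred.mean(axis=0)
    true_c = true - true.mean(axis=0)
    num = (pred_c * true_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (true_c ** 2).sum(axis=0))
    return num / den

# Synthetic example (assumption): 500 timepoints, 10 parcels, noisy predictions.
rng = np.random.default_rng(0)
true = rng.normal(size=(500, 10))
pred = true + rng.normal(size=(500, 10))

r = parcelwise_pearson(pred, true)   # one correlation per parcel
mean_r = r.mean()                    # the single summary score
```

Averaging over parcels after computing each correlation separately keeps a few well-predicted parcels from being confused with uniformly good predictions, since every parcel contributes equally to the mean.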
Article 152
Title@2025-07-23 (3): Clo-HDnn: A 4.66 TFLOPS/W and 3.78 TOPS/W Continual On-Device Learning Accelerator with Energy-efficient Hyperdimensional Computing via Progressive Search
Title: Clo-HDnn: A 4.66 TFLOPS/W and 3.78 TOPS/W Continual On-Device Learning Accelerator with Energy-efficient Hyperdimensional Computing via Progressive Search | Clo-HDnn: A 4.66 TFLOPS/W und 3.78 TOPS/W Continual On-Device Learning Accelerator mit energieeffizientem Hyperdimensional Computing via Progressive Search | Clo-HDnn: 一种4.66 TFLOPS/W和3.78 TOPS/W 通过渐进搜索使用节能超多维电子计算器的不间断远程学习加速器 2507.17953v1 |
Authors (13): Chang Eun Song, Weihong Xu, Keming Fan, Soumil Jain, Gopabandhu Hota, Haichao Yang, Leo Liu, Kerem Akarvardar, Meng-Fan Chang, Carlos H. Diaz, Gert Cauwenberghs, Tajana Rosing, Mingu Kang
Clo-HDnn is an on-device learning (ODL) accelerator designed for emerging continual learning (CL) tasks. Clo-HDnn integrates hyperdimensional computing (HDC) along with low-cost Kronecker HD Encoder and weight clustering feature extraction (WCFE) to optimize accuracy and efficiency. Clo-HDnn adopts gradient-free CL to efficiently update and store the learned knowledge in the form of class hypervectors. Its dual-mode operation enables bypassing costly feature extraction for simpler datasets, while progressive search reduces complexity by up to 61% by encoding and comparing only partial query hypervectors. Achieving 4.66 TFLOPS/W (FE) and 3.78 TOPS/W (classifier), Clo-HDnn delivers 7.77x and 4.85x higher energy efficiency compared to SOTA ODL accelerators.
Clo-HDnn是设计用于新的持续学习任务的一种在线学习(ODL)加速器。Clo-HDnn将高维计算(HDC)与低成本的Kronecker HD编码器和重量组合特性提取(WCFE)相结合,以优化准确性和效率。Clo-HDnn采用无梯度的CL,以便有效地更新和储存以类高压器形式的知识。它的双式操作使更简单的数据集绕过昂贵的特征提取,而渐进式搜索则通过仅对部分查询高压器进行编码和比较,使复杂性降低高达61%。实现4.66 TFLOPS/W(FE)和3.78 TPS/W(分类器),Clo-HDnn提供7.77x和4.85x更高的能源效率,而SOTA ODL加速器则高于SA ODL加速器。
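The progressive search mentioned in the abstract above compares only part of a query hypervector when that is enough to decide the class. A minimal software sketch of the general idea follows, assuming bipolar hypervectors, chunked dot-product accumulation, and a heuristic early-exit rule; the `chunk` and `margin` parameters and the stopping rule are illustrative assumptions, not the accelerator's actual scheme.

```python
import numpy as np

def progressive_search(query, class_hvs, chunk=256, margin=0.5):
    """Return (predicted class, number of dimensions actually compared).

    Similarity is accumulated chunk by chunk; the search stops early once
    the best class leads the runner-up by at least margin * dims_seen.
    Requires at least two classes.
    """
    n_classes, dim = class_hvs.shape
    scores = np.zeros(n_classes)
    used = 0
    while used < dim:
        end = min(used + chunk, dim)
        scores += class_hvs[:, used:end] @ query[used:end]
        used = end
        top2 = np.sort(scores)[-2:]       # [runner-up, best]
        if top2[1] - top2[0] >= margin * used:
            break                         # confident enough: skip the rest
    return int(np.argmax(scores)), used

# Synthetic example (assumption): 10 random bipolar class hypervectors.
rng = np.random.default_rng(0)
D, K = 2048, 10
class_hvs = rng.choice([-1, 1], size=(K, D))
clean = class_hvs[3]
noisy = np.where(rng.random(D) < 0.1, -clean, clean)   # flip about 10% of entries

label, used = progressive_search(noisy, class_hvs)
```

The early exit is a heuristic: a larger `margin` compares more dimensions before committing, trading work for robustness on ambiguous queries.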
Article 153
Title@2025-07-23 (3): Analyzing Fairness of Computer Vision and Natural Language Processing Models
Title: Analyzing Fairness of Computer Vision and Natural Language Processing Models | Analyse der Fairness von Computer Vision und natürlichen Sprachverarbeitungsmodellen | 分析计算机视觉和自然语言处理模式的公平性 2412.09900v3 |
Authors (3): Ahmed Rashed, Abdelkrim Kallich, Mohamed Eltayeb
Machine learning (ML) algorithms play a critical role in decision-making across various domains, such as healthcare, finance, education, and law enforcement. However, concerns about fairness and bias in these systems have raised significant ethical and social challenges. To address these challenges, this research utilizes two prominent fairness libraries, Fairlearn by Microsoft and AIF360 by IBM. These libraries offer comprehensive frameworks for fairness analysis, providing tools to evaluate fairness metrics, visualize results, and implement bias mitigation algorithms. The study focuses on assessing and mitigating biases for unstructured datasets using Computer Vision (CV) and Natural Language Processing (NLP) models. The primary objective is to present a comparative analysis of the performance of mitigation algorithms from the two fairness libraries. This analysis involves applying the algorithms individually, one at a time, in one of the stages of the ML lifecycle, pre-processing, in-processing, or post-processing, as well as sequentially across more than one stage. The results reveal that some sequential applications improve the performance of mitigation algorithms by effectively reducing bias while maintaining the model’s performance. Publicly available datasets from Kaggle were chosen for this research, providing a practical context for evaluating fairness in real-world machine learning workflows.
机器学习(ML)算法在保健、金融、教育和执法等各个领域的决策中发挥着关键作用,然而,对于这些系统中的公平和偏见的关切提出了重大的道德和社会挑战。为应对这些挑战,这项研究利用了两个著名的公平图书馆,即微软的Fairlearn图书馆和IBM的AIF360。这些图书馆为公平分析提供了全面框架,提供了评价公平度、可视化结果和实施减少偏向算法的工具。研究的重点是评估和减少利用计算机视野和自然语言处理(NLP)模型对非结构化数据集的偏差。主要目的是对两个公平图书馆的缓解算法的绩效进行比较分析。这一分析涉及在ML生命周期的一个阶段、预处理、处理、后处理、以及一个以上阶段连续应用算法。研究结果表明,一些顺序应用提高了减缓算法的性,既有效减少偏差,又维持模型的性能。从卡格公司为评估这一实际数据流环境而选择了实际数据流。
Article 154
Title@2025-07-23 (3): Learning Safe Strategies for Value Maximizing Buyers in Uniform Price Auctions
Title: Learning Safe Strategies for Value Maximizing Buyers in Uniform Price Auctions | Sichere Strategien für die Wertmaximierung von Käufern in einheitlichen Preisauktionen lernen | 统一价格拍卖中价值最大化买方学习安全战略 2406.03674v3 |
Authors (2): Negin Golrezaei, Sourav Sahoo
We study the bidding problem in repeated uniform price multi-unit auctions from the perspective of a value-maximizing buyer. The buyer aims to maximize their cumulative value over $T$ rounds while adhering to per-round return-on-investment (RoI) constraints in a strategic (or adversarial) environment. Using an $m$-uniform bidding format, the buyer submits $m$ bid-quantity pairs $(b_i, q_i)$ to demand $q_i$ units at bid $b_i$, with $m \ll M$ in practice, where $M$ denotes the maximum demand of the buyer. We introduce the notion of safe bidding strategies as those that satisfy the RoI constraints irrespective of competing bids. Despite the stringent requirement, we show that these strategies satisfy a mild no-overbidding condition, depend only on the valuation curve of the bidder, and the bidder can focus on a finite subset without loss of generality. Though the subset size is $O(M^m)$, we design a polynomial-time learning algorithm that achieves sublinear regret, both in full-information and bandit settings, relative to the hindsight-optimal safe strategy. We assess the robustness of safe strategies against the hindsight-optimal strategy from a richer class. We define the richness ratio $\alpha \in (0,1]$ as the minimum ratio of the value of the optimal safe strategy to that of the optimal strategy from a richer class and construct hard instances showing the tightness of $\alpha$. Our algorithm achieves $\alpha$-approximate sublinear regret against these stronger benchmarks. Simulations on semi-synthetic auction data show that empirical richness ratios significantly outperform the theoretical worst-case bounds. The proposed safe strategies and learning algorithm extend naturally to more nuanced buyer and competitor models.
我们从价值最大化买主的角度研究多次统一价格多单位拍卖的投标问题。 买主的目标是在战略( 或对立) 环境下, 将累计价值最大化于美元, 而同时遵守每轮投资回报( ROI) 限制, 在战略( 或对立) 环境中, 遵守每轮投资回报( ROI) 限制。 买主采用美元统一投标( b_ i, q_ i) 格式, 要求以美元为单位, 以美元为单位, 以美元为单位, 以美元为单位, 以美元为单位, 以美元为单位, 以美元为单位, 美元为单位, 以美元为单位, 美元为单位, 以美元为单位, 美元为单位, 以美元为单位, 以美元为单位, 美元为单位, 以美元为单位, 美元为单位, 美元, 以美元为单位, 美元为单位, 美元表示买家, 美元为美元, 美元 美元 美元 美元 美元 美元 美元 美元 美元 美元 美元 美元 美元 美元 美元 美元 , 表示买家, 美元 美元 美元 美元 美元 美元 美元 美元 表示买家 美元 美元 美元 美元 美元 美元 美元 美元 美元 美元 美元 买家 。 我们提出一个 安全投标 的 安全投标战略 , 我们提出 安全投标 的 战略 概念 概念 概念 概念 概念 , , , , , 以 , , , , , 以 以 以 以 以 以 以 , 以 以 以 以 以 以 以 最 最 最 , 最 , , 以 最 , , 最 , , , 最 最 , , , , , , 最 最 最 最 最 , 最 最 最 最 , 最 最 , 最 最 最 以 以 最 最 以 以 最 最 最 最 最 最 最 最 最 最
Article 155
Title@2025-07-23 (3): Quantum Machine Learning Playground
Title: Quantum Machine Learning Playground | Quantum Machine Learning Spielplatz | 量子机器学习游戏场 2507.17931v1 |
Authors (3): Pascal Debus, Sebastian Issel, Kilian Tscharke
This article introduces an innovative interactive visualization tool designed to demystify quantum machine learning (QML) algorithms. Our work is inspired by the success of classical machine learning visualization tools, such as TensorFlow Playground, and aims to bridge the gap in visualization resources specifically for the field of QML. The article includes a comprehensive overview of relevant visualization metaphors from both quantum computing and classical machine learning, the development of an algorithm visualization concept, and the design of a concrete implementation as an interactive web application. By combining common visualization metaphors for the so-called data re-uploading universal quantum classifier as a representative QML model, this article aims to lower the entry barrier to quantum computing and encourage further innovation in the field. The accompanying interactive application is a proposal for the first version of a quantum machine learning playground for learning and exploring QML models.
文章介绍了一个创新的交互式可视化工具,旨在解开量子机器学习(QML)算法的神秘性。我们的工作受到TensorFlow游戏场等古典机器学习可视化工具的成功启发,目的是弥合专门为QML领域而提供的可视化资源之间的差距。文章包括全面概述量子计算和经典机器学习中的相关可视化隐喻,开发算法可视化概念,以及设计一个具体实施程序,作为交互式网络应用程序。通过将所谓的数据重新加载通用量子分类器的通用可视化隐喻合并为具有代表性的QML模型,本文旨在降低进入量子计算的障碍,并鼓励进一步实地创新。伴随的交互式应用程序是关于量子机器学习游戏的第一版的建议,用于学习和探索QML模型。
Article 156
Title@2025-07-23 (3): Task Priors: Enhancing Model Evaluation by Considering the Entire Space of Downstream Tasks
Title: Task Priors: Enhancing Model Evaluation by Considering the Entire Space of Downstream Tasks | Task Priors: Verbesserung der Modellbewertung unter Berücksichtigung des gesamten Raumes von Downstream-Aufgaben | 任务前期:考虑到下游任务的全部空间,加强示范评价 2507.09871v2 |
Authors (2): Niket Patel, Randall Balestriero
The grand goal of AI research, and particularly Self Supervised Learning (SSL), is to produce systems that can successfully solve any possible task. In contrast, current evaluation methods available to AI researchers typically rely on a fixed collection of hand-picked downstream benchmarks. Hence, a large amount of effort is put into designing and searching for large collections of evaluation tasks that can serve as a proxy for our grand goal. We argue that such a rigid evaluation protocol creates a silent bottleneck in AI research. To remedy that, we define a probabilistic space of downstream tasks obtained by adopting a distribution of tasks and by defining Task Priors. Under this view, one can evaluate a model’s performance over the set of all possible downstream tasks. Our framework is the first to provide answers to key questions such as (i) what is the average performance of my model over all possible downstream tasks weighted by the probability to encounter each task? or (ii) what is the variance of my model’s performance across all downstream tasks under the defined Task Priors? Beyond establishing a new standard for evaluation, we believe that Task Priors will accelerate the pace of research in SSL - where downstream task evaluation is the sole qualitative signal that researchers have access to.
AI研究,特别是自我监督学习(SSL)的宏伟目标是建立能够成功解决任何可能的任务的系统。相比之下,AI研究人员目前可用的评估方法通常依赖固定的、手工挑选的下游基准集。因此,在设计和搜索大量可替代我们宏伟目标的评价工作中投入了大量精力。我们争辩说,这种僵硬的评价协议在AI研究中造成了一个沉默的瓶颈。为了纠正这一点,我们界定了通过分配任务和界定任务前程而获得的下游任务的概率空间。根据这一观点,人们可以评估模型对所有可能的下游任务集的效绩。我们的框架首先为关键问题提供答案,如(一) 我的模式对所有可能的下游任务的平均效绩如何?或者(二) 我的模式在确定的任务前期的所有下游任务中的性能有何差异?除了确定新的评价标准之外,我们认为任务前期任务将加快SSL的研究步伐,而下游任务评估是获得研究人员的唯一质量信号。
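Questions (i) and (ii) in the abstract above can be written as a mean and a variance under a distribution over tasks. The notation below is an assumption made for illustration: $p(T)$ stands for the Task Prior, $\mathrm{perf}(f, T)$ for the score of model $f$ on task $T$, and the right-hand sides are ordinary Monte Carlo estimates from $n$ sampled tasks. The paper's own definitions may differ.

```latex
\[
\mu(f) = \mathbb{E}_{T \sim p(T)}\big[\mathrm{perf}(f, T)\big]
\approx \hat{\mu}(f) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{perf}(f, T_i),
\qquad T_1, \dots, T_n \sim p(T),
\]
\[
\sigma^2(f) = \mathbb{E}_{T \sim p(T)}\Big[\big(\mathrm{perf}(f, T) - \mu(f)\big)^2\Big]
\approx \frac{1}{n-1} \sum_{i=1}^{n} \big(\mathrm{perf}(f, T_i) - \hat{\mu}(f)\big)^2 .
\]
```

Under this reading, a fixed benchmark suite corresponds to one particular finite sample of tasks, while the prior makes the weighting of tasks explicit.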
Article 157
Title@2025-07-23 (3): UrbanPulse: A Cross-City Deep Learning Framework for Ultra-Fine-Grained Population Transfer Prediction
Title: UrbanPulse: A Cross-City Deep Learning Framework for Ultra-Fine-Grained Population Transfer Prediction | UrbanPulse: Ein stadtübergreifendes Deep-Learning-Framework für ultra-reine Bevölkerungstransfer-Vorhersage | 城市脉动:关于超精子人口转移预测的跨城市深入学习框架 2507.17924v1 |
Authors (2): Hongrong Yang, Markus Schlaepfer
Accurate population flow prediction is essential for urban planning, transportation management, and public health. Yet existing methods face key limitations: traditional models rely on static spatial assumptions, deep learning models struggle with cross-city generalization, and Large Language Models (LLMs) incur high computational costs while failing to capture spatial structure. Moreover, many approaches sacrifice resolution by clustering Points of Interest (POIs) or restricting coverage to subregions, limiting their utility for city-wide analytics. We introduce UrbanPulse, a scalable deep learning framework that delivers ultra-fine-grained, city-wide OD flow predictions by treating each POI as an individual node. It combines a temporal graph convolutional encoder with a transformer-based decoder to model multi-scale spatiotemporal dependencies. To ensure robust generalization across urban contexts, UrbanPulse employs a three-stage transfer learning strategy: pretraining on large-scale urban graphs, cold-start adaptation, and reinforcement learning fine-tuning. Evaluated on over 103 million cleaned GPS records from three metropolitan areas in California, UrbanPulse achieves state-of-the-art accuracy and scalability. Through efficient transfer learning, UrbanPulse takes a key step toward making high-resolution, AI-powered urban forecasting deployable in practice across diverse cities.
准确的人口流动预测对于城市规划、交通管理和公共卫生至关重要。然而,现有方法面临关键限制:传统模型依赖于静态空间假设,深学习模型与跨城市的概括性斗争,而大语言模型(LLMS)在未能捕捉空间结构的同时,也产生了高昂的计算成本。此外,许多方法通过利益集群(POIs)或限制对次区域的覆盖而牺牲分辨率,限制了其对全城市分析的实用性。我们引入了可扩展的深层次学习框架 “ 城市规划 “ ,这是一个可扩展的深层次学习框架,通过将每个POI作为单个节点处理,对全城市范围的OD流动预测进行超纯度的、全城市范围的预测。它将时间图混集器和基于变异器的解调器与模型的多尺度跨时间依赖性依赖性结合起来。为了确保城市环境的稳健普及化,城市规划采用三阶段转移战略:对大型城市图表进行预先培训,冷启动适应,并加强学习的微调。从加利福尼亚三个大都市地区对超过1.03万个经过清理的全球定位系统记录进行估价,在城市的快速的移动,在城市中实现高度的快速的精确的学习。
Article 158
Title@2025-07-23 (3): From Seed to Harvest: Augmenting Human Creativity with AI for Red-teaming Text-to-Image Models
Title: From Seed to Harvest: Augmenting Human Creativity with AI for Red-teaming Text-to-Image Models | Vom Samen zur Ernte: Augmenting Human Creativity mit KI für Red-Teaming-Text-to-Image-Modelle | 从种子到收割:通过国际促进红-红-电制文本到图像模型学会增强人类的创造力 2507.17922v1 |
Authors (7): Jessica Quaye, Charvi Rastogi, Alicia Parrish, Oana Inel, Minsuk Kahng, Lora Aroyo, Vijay Janapa Reddi
Text-to-image (T2I) models have become prevalent across numerous applications, making their robust evaluation against adversarial attacks a critical priority. Continuous access to new and challenging adversarial prompts across diverse domains is essential for stress-testing these models for resilience against novel attacks from multiple vectors. Current techniques for generating such prompts are either entirely authored by humans or synthetically generated. On the one hand, datasets of human-crafted adversarial prompts are often too small in size and imbalanced in their cultural and contextual representation. On the other hand, datasets of synthetically-generated prompts achieve scale, but typically lack the realistic nuances and creative adversarial strategies found in human-crafted prompts. To combine the strengths of both human and machine approaches, we propose Seed2Harvest, a hybrid red-teaming method for guided expansion of culturally diverse, human-crafted adversarial prompt seeds. The resulting prompts preserve the characteristics and attack patterns of human prompts while maintaining comparable average attack success rates (0.31 NudeNet, 0.36 SD NSFW, 0.12 Q16). Our expanded dataset achieves substantially higher diversity with 535 unique geographic locations and a Shannon entropy of 7.48, compared to 58 locations and 5.28 entropy in the original dataset. Our work demonstrates the importance of human-machine collaboration in leveraging human creativity and machine computational capacity to achieve comprehensive, scalable red-teaming for continuous T2I model safety evaluation.
文本到图像模型(T2I)在许多应用中变得十分普遍,使得它们针对对抗性攻击的强有力评价成为重要优先事项。持续获得不同领域具有挑战性的对抗性新信号对于测试这些针对多种矢量新攻击的抗御能力模型至关重要。目前生成这种信号的技术要么完全由人来制作,要么是合成生成的。一方面,人造对抗性闪电数据集在规模上往往太小,在文化和背景代表性上往往不平衡。另一方面,合成生成的速率数据集达到规模,但通常缺乏在人造的提示中发现的现实微妙之处和创造性的对抗性战略。为了将人类和机器方法的优势结合起来,我们提议采用Sed2Harvest这一混合的红色组合方法来引导文化多样性、人造对抗性闪烁种子的扩展。因此,迅速保存了人类速的特征和攻击模式,同时保持了可比的平均攻击成功率(NudeNet,0.36 SD NSFW,0.12 Q16)。我们扩大的数据集成的58个原始地理位置和滚动数据,显示了我们原始的535个原始地理位置,并展示了我们原始的滚动的滚动的地理位置。
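The diversity figures in the abstract above are Shannon entropies over geographic locations. A minimal sketch of that computation follows, assuming one location label per prompt and a base-2 logarithm; the base is an assumption, though it is consistent with the reported 5.28 for 58 locations, which exceeds the natural-log maximum of ln 58 ≈ 4.06. The location names are made up for illustration.

```python
import math
from collections import Counter

def shannon_entropy(labels, base=2.0):
    """Shannon entropy of the empirical distribution of the given labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())

# Hypothetical location labels attached to prompts.
locations = ["Accra", "Lagos", "Accra", "Lima", "Oslo", "Lima", "Accra"]
diversity = shannon_entropy(locations)
```

Entropy rises both with the number of distinct locations and with how evenly prompts are spread across them, so it captures more than the raw location count.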
Article 159
Title@2025-07-23 (3): Sliding Window Informative Canonical Correlation Analysis
Title: Sliding Window Informative Canonical Correlation Analysis | Sliding Window Informative Canonical Correlation Analysis | Sliding 窗口信息化 Canonical 关联分析 2507.17921v1 |
Authors (1): Arvind Prasadan
Canonical correlation analysis (CCA) is a technique for finding correlated sets of features between two datasets. In this paper, we propose a novel extension of CCA to the online, streaming data setting: Sliding Window Informative Canonical Correlation Analysis (SWICCA). Our method uses a streaming principal component analysis (PCA) algorithm as a backend and uses these outputs combined with a small sliding window of samples to estimate the CCA components in real time. We motivate and describe our algorithm, provide numerical simulations to characterize its performance, and provide a theoretical performance guarantee. The SWICCA method is applicable and scalable to extremely high dimensions, and we provide a real-data example that demonstrates this capability.
Canonical 相关分析(CCA)是一种在两个数据集之间寻找相关特征的技术。在本文中,我们提议将共同国家评估的新型扩展扩展至在线流数据设置: Sliding Window Informationive Canonical Connational Consurational 分析(SWICCA 分析 ) 。我们的方法使用流式主要元件分析算法作为后端,并使用这些输出与一个小型的滑动窗口结合,实时估计CCA 组件。我们激励和描述我们的算法,提供数字模拟来描述其性能特征,并提供理论性能保障。 SWICCA 方法在极高的维度上是适用和可伸缩的,我们提供了一个真实的数据示例来证明这一能力。
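To make the sliding-window setting above concrete, the sketch below recomputes ordinary batch CCA on each window of the two streams, using the standard QR-plus-SVD route to canonical correlations. It is a naive baseline under stated assumptions (window longer than the feature dimension, full-rank windows) and does not include the streaming PCA backend that the abstract describes; the function names are illustrative.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X and Y (rows = samples)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)              # orthonormal basis of each centered view
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)           # guard against round-off above 1

def sliding_window_cca(X_stream, Y_stream, window):
    """Recompute CCA on the most recent `window` samples at every time step."""
    out = []
    for t in range(window, X_stream.shape[0] + 1):
        out.append(canonical_correlations(X_stream[t - window:t],
                                          Y_stream[t - window:t]))
    return np.array(out)

# Synthetic example (assumption): Y is an invertible linear map of X.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
A = rng.normal(size=(3, 3))
Y = X @ A
corrs = sliding_window_cca(X, Y, window=50)   # shape (151, 3)
```

Recomputing from scratch costs a QR factorization per step and scales poorly with dimension, which is the kind of cost a streaming backend is meant to avoid.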
Article 160
Title@2025-07-23 (3): Tuning Sequential Monte Carlo Samplers via Greedy Incremental Divergence Minimization
Title: Tuning Sequential Monte Carlo Samplers via Greedy Incremental Divergence Minimization | Tuning Sequentielle Monte Carlo Sampler über Greedy Incremental Divergence Minimierung | 通过贪婪递增差异最小化, 2503.15704v4 |
Authors (4): Kyurae Kim, Zuheng Xu, Jacob R. Gardner, Trevor Campbell
The performance of sequential Monte Carlo (SMC) samplers heavily depends on the tuning of the Markov kernels used in the path proposal. For SMC samplers with unadjusted Markov kernels, standard tuning objectives, such as the Metropolis-Hastings acceptance rate or the expected-squared jump distance, are no longer applicable. While stochastic gradient-based end-to-end optimization has been explored for tuning SMC samplers, such approaches often incur excessive training costs, even for tuning just the kernel step sizes. In this work, we propose a general adaptation framework for tuning the Markov kernels in SMC samplers by minimizing the incremental Kullback-Leibler (KL) divergence between the proposal and target paths. For step size tuning, we provide a gradient- and tuning-free algorithm that is generally applicable for kernels such as Langevin Monte Carlo (LMC). We further demonstrate the utility of our approach by providing a tailored scheme for tuning kinetic LMC used in SMC samplers. Our implementations are able to obtain a full schedule of tuned parameters at the cost of a few vanilla SMC runs, which is a fraction of the cost of gradient-based approaches.
连续的蒙特卡洛(SMC)采样器的性能在很大程度上取决于路径建议中使用的Markov内核的调试。对于未调整的Markov内核的SMC采样器,标准调试目标,如大都会-Hasting(KL)接受率或预期的平方跳跃距离等标准调试目标已不再适用。虽然为调控SMC采样器探索了基于梯度梯度的端到端优化,但是它们往往产生过高的培训费用,即使是对内核级尺寸的调试。在这项工作中,我们提议了一个通用的调适框架,以通过尽量减少Kullback-Leiber(KL)在建议和目标路径之间的递增差异来调控管SMC样品中的Markov内核。对于步骤的调试样器,我们提供了一种一般适用于Langevin Montecar(LMC)等内核取样器的梯度和调无调算法。我们进一步证明了我们的方法的效用,为调控管SMC样品中使用的电动式LMC级 LMC提供了一个定制的调制办法。我们执行的甚小的SMIL的进度是能够完全调整的进度的进度图。
Article 161
Title@2025-07-23 (3): SETOL: A Semi-Empirical Theory of (Deep) Learning
Title: SETOL: A Semi-Empirical Theory of (Deep) Learning | SETOL: Eine semi-empirische Theorie des (Tiefen) Lernens | SETOL:半经验学理论(深)学习 2507.17912v1 |
Authors (2): Charles H Martin, Christopher Hinrichs
We present a Semi-Empirical Theory of Learning (SETOL) that explains the remarkable performance of State-Of-The-Art (SOTA) Neural Networks (NNs). We provide a formal explanation of the origin of the fundamental quantities in the phenomenological theory of Heavy-Tailed Self-Regularization (HTSR): the heavy-tailed power-law layer quality metrics, alpha and alpha-hat. In prior work, these metrics have been shown to predict trends in the test accuracies of pretrained SOTA NN models, importantly, without needing access to either testing or training data. Our SETOL uses techniques from statistical mechanics as well as advanced methods from random matrix theory and quantum chemistry. The derivation suggests new mathematical preconditions for ideal learning, including a new metric, ERG, which is equivalent to applying a single step of the Wilson Exact Renormalization Group. We test the assumptions and predictions of SETOL on a simple 3-layer multilayer perceptron (MLP), demonstrating excellent agreement with the key theoretical assumptions. For SOTA NN models, we show how to estimate the individual layer qualities of a trained NN by simply computing the empirical spectral density (ESD) of the layer weight matrices and plugging this ESD into our SETOL formulas. Notably, we examine the performance of the HTSR alpha and the SETOL ERG layer quality metrics, and find that they align remarkably well, both on our MLP and on SOTA NNs.
我们提出半经验学习理论(SETOL),用以解释最先进(SOTA)神经网络(NN)的出色性能。我们对重尾自正则化(HTSR)唯象理论中的基本量,即重尾幂律层质量指标 alpha 和 alpha-hat 的来源给出了形式化解释。已有工作表明,这些指标无需访问测试或训练数据即可预测预训练 SOTA NN 模型测试精度的趋势。SETOL 使用了统计力学的技术以及随机矩阵理论和量子化学中的高级方法。推导给出了理想学习的新数学前提条件,其中包括一个新指标 ERG,它等价于执行一步 Wilson 精确重整化群。我们在一个简单的三层多层感知机(MLP)上检验了 SETOL 的假设和预测,结果与关键理论假设高度吻合。对于 SOTA NN 模型,我们展示了如何仅通过计算层权重矩阵的经验谱密度(ESD)并将其代入 SETOL 公式来估计已训练 NN 的各层质量。值得注意的是,我们考察了 HTSR alpha 与 SETOL ERG 这两个层质量指标的表现,发现二者在 MLP 和 SOTA NN 上都高度一致。
Article 162
Title@2025-07-23 (3): EEG Foundation Models: A Critical Review of Current Progress and Future Directions
Title: EEG Foundation Models: A Critical Review of Current Progress and Future Directions | EEG-Stiftungsmodelle: Ein kritischer Überblick über aktuelle Fortschritte und zukünftige Richtungen | EEG基金会模式:对当前进展和未来方向的重要审查 2507.11783v2 |
Authors (3): Gayal Kuruppu, Neeraj Wagh, Yogatheesan Varatharajah
Patterns of electrical brain activity recorded via electroencephalography (EEG) offer immense value for scientific and clinical investigations. The inability of supervised EEG encoders to learn robust EEG patterns and their over-reliance on expensive signal annotations have sparked a transition towards general-purpose self-supervised EEG encoders, i.e., EEG foundation models (EEG-FMs), for robust and scalable EEG feature extraction. However, the real-world readiness of early EEG-FMs and the rubric for long-term research progress remain unclear. A systematic and comprehensive review of first-generation EEG-FMs is therefore necessary to understand the current state-of-the-art and identify key directions for future EEG-FMs. To that end, this study reviews 10 early EEG-FMs and presents a critical synthesis of their methodology, empirical findings, and outstanding research gaps. We find that most EEG-FMs adopt a sequence-based modeling scheme that relies on transformer-based backbones and the reconstruction of masked sequences for self-supervision. However, model evaluations remain heterogeneous and largely limited, making it challenging to assess their practical off-the-shelf utility. In addition to adopting standardized and realistic evaluations, future work should demonstrate more substantial scaling effects and make principled and trustworthy choices throughout the EEG representation learning pipeline. We believe that developing benchmarks, software tools, technical methodologies, and applications in collaboration with domain experts may further advance the translational utility and real-world adoption of EEG-FMs.
通过电子脑图学(EEG)记录的电子脑活动模式为科学和临床调查提供了巨大的价值。受监督的电子脑脑活动模式无法为科学和临床调查提供巨大的价值。受监督的EEG编码者无法学习稳健的EEG模式,而且过度依赖昂贵的信号说明,这导致向通用自我监督的EEG编码模式(EEEG-FMs)过渡,即EEEEG基础模型(EEEG-FMs),以进行稳健和可扩缩的EEEG特征提取。然而,早期电子脑脑活动模式(EEG-FMs)和用于早期研究进步的标本的实时世界性模式准备状态仍然不明确。因此,对第一代的EEEG-FMs进行系统系统的全面系统审查是必要的,以便了解当前的最新状况,并查明未来EEEG-FM系统的关键方向。为此,我们发现,大多数EEEG-FFFFM采用基于变换式的骨架和进一步重建隐蔽的序列结构。然而,模型评估仍然具有挑战性,在现实的实用的实用性、可理解性、可理解性评估中进行。
Article 163
Title@2025-07-23 (3): Deep learning-aided inverse design of porous metamaterials
Title: Deep learning-aided inverse design of porous metamaterials | Tiefes Lernen-unterstütztes inverses Design poröser Metamaterialien | 多孔元材料的深深学习辅助反向设计 2507.17907v1 |
Authors (4): Phu Thien Nguyen, Yousef Heider, Dennis M. Kochmann, Fadi Aldakheel
The ultimate aim of the study is to explore the inverse design of porous metamaterials using a deep learning-based generative framework. Specifically, we develop a property-variational autoencoder (pVAE), a variational autoencoder (VAE) augmented with a regressor, to generate structured metamaterials with tailored hydraulic properties, such as porosity and permeability. While this work uses the lattice Boltzmann method (LBM) to generate intrinsic permeability tensor data for limited porous microstructures, a convolutional neural network (CNN) is trained using a bottom-up approach to predict effective hydraulic properties. This significantly reduces the computational cost compared to direct LBM simulations. The pVAE framework is trained on two datasets: a synthetic dataset of artificial porous microstructures and CT-scan images of volume elements from real open-cell foams. The encoder-decoder architecture of the VAE captures key microstructural features, mapping them into a compact and interpretable latent space for efficient structure-property exploration. The study provides a detailed analysis and interpretation of the latent space, demonstrating its role in structure-property mapping, interpolation, and inverse design. This approach facilitates the generation of new metamaterials with desired properties. The datasets and codes used in this study will be made open-access to support further research.
这项研究的最终目的是利用一个深层次学习的基因化框架,探索多孔元材料的反向设计。具体地说,我们开发了一种财产变换自动电解码器(pVAE),一种变式自动电解码器(VAE),配以一个递增器,以产生结构化的元材料,配有定制的液压特性,如孔径和渗透性。虽然这项工作使用lattice Boltzmann 方法(LBM),为有限的多孔微结构生成内在的渗透性振荡器数据,但利用自下而上的方法对一个革命性神经网络进行了培训,以预测有效的液压特性。这大大降低了计算成本,与直接的液压模拟相比。PVAE框架在两个数据集上进行了培训:人工孔径微结构的合成数据集和真实的开放型细胞泡沫的体积元素的CT扫描图像。 VAE的电解分解器结构将捕捉到关键的微结构特征,将其映射成一个紧凑和可解释的隐藏空间空间空间空间空间空间定位空间空间空间定位支持的模型研究中,该研究将展示模型的模型的模型分析与模型分析与模型分析与模型分析。
Article 164
Title@2025-07-23 (3): Federated Learning for Large-Scale Cloud Robotic Manipulation: Opportunities and Challenges
Title: Federated Learning for Large-Scale Cloud Robotic Manipulation: Opportunities and Challenges | Föderiertes Lernen für großräumige Cloud-Robotermanipulation: Chancen und Herausforderungen | 大型云层机器人操纵联合会学习:机遇与挑战 2507.17903v1 |
Authors (4): Obaidullah Zaland, Chanh Nguyen, Florian T. Pokorny, Monowar Bhuyan
Federated Learning (FL) is an emerging distributed machine learning paradigm, where the collaborative training of a model involves dynamic participation of devices to achieve broad objectives. Classical machine learning (ML) typically requires data to be located on-premises for training, whereas FL leverages numerous user devices to train a shared global model without the need to share private data. Current robotic manipulation tasks are constrained by the individual capabilities and speed of robots due to limited low-latency computing resources. Consequently, the concept of cloud robotics has emerged, allowing robotic applications to harness the flexibility and reliability of computing resources, effectively alleviating their computational demands across the cloud-edge continuum. Within this distributed computing context, as exemplified in cloud robotic manipulation scenarios, FL offers manifold advantages while also presenting several challenges and opportunities. In this paper, we present fundamental concepts of FL and their connection to cloud robotic manipulation. Additionally, we envision the opportunities and challenges associated with realizing efficient and reliable cloud robotic manipulation at scale through FL, where researchers design and verify FL models in either centralized or decentralized settings.
联邦学习联盟(FL)是一个新兴的分布式机器学习模式,在这种模式中,合作培训涉及各种装置的动态参与,以实现广泛的目标。相比之下,古典机器学习(ML)通常要求将数据定位在培训地点,而FL则利用许多用户设备来培训一个共同的全球模型,而无需分享私人数据。当前机器人操作任务受到机器人个人能力和速度的限制,原因是低弹性计算资源有限。因此,云型机器人的概念已经出现,允许机器人应用程序利用计算资源的灵活性和可靠性,有效地减轻其在云型连续体中的计算需求。毫无疑问,在这一分布式计算机背景下,如云型机器人操作情景所示,FL提供了多种优势,同时也提出了若干挑战和机遇。在本文件中,我们介绍了FL的基本概念及其与云型机器人操作的联系。此外,我们设想了通过FL实现规模高效和可靠的云型机器人操作的机会和挑战,研究人员通过FL在集中或分散的环境中设计和核实FL模型。
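The idea of training a shared global model without sharing private data can be illustrated with a weighted parameter average. This is a minimal sketch in the style of federated averaging, which is an assumption here: the abstract does not name an aggregation rule, and the names `federated_average`, `robot_a`, and `robot_b` are hypothetical.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, List[float]]


def federated_average(client_updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client parameters, weighting each client by its local sample count.

    Only parameters and counts leave a client; raw data never does.
    """
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("clients must report at least one sample in total")
    averaged: Weights = {}
    for name, reference in client_updates[0][0].items():
        acc = [0.0] * len(reference)
        for weights, n in client_updates:
            for i, value in enumerate(weights[name]):
                acc[i] += value * n / total
        averaged[name] = acc
    return averaged


# Two hypothetical robots with different amounts of local data.
robot_a = ({"w": [1.0, 2.0]}, 10)
robot_b = ({"w": [3.0, 4.0]}, 30)
global_model = federated_average([robot_a, robot_b])
```

The robot with more local samples pulls the global parameters further toward its own values, which is the usual motivation for count-weighted averaging.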
Article 165
Title@2025-07-23 (3): Multimodal Recurrent Ensembles for Predicting Brain Responses to Naturalistic Movies (Algonauts 2025)
Title: Multimodal Recurrent Ensembles for Predicting Brain Responses to Naturalistic Movies (Algonauts 2025) | Multimodale Recurrent-Ensembles zur Vorhersage von Gehirnreaktionen auf naturalistische Filme (Algonauten 2025) | 预测对自然电影的脑反应的多式经常性多年度联合会议(2025年8月20日) 2507.17897v1 |
Authors (3): Semih Eren, Deniz Kucukahmetler, Nico Scherf
Accurately predicting distributed cortical responses to naturalistic stimuli requires models that integrate visual, auditory and semantic information over time. We present a hierarchical multimodal recurrent ensemble that maps pretrained video, audio, and language embeddings to fMRI time series recorded while four subjects watched almost 80 hours of movies provided by the Algonauts 2025 challenge. Modality-specific bidirectional RNNs encode temporal dynamics; their hidden states are fused and passed to a second recurrent layer, and lightweight subject-specific heads output responses for 1000 cortical parcels. Training relies on a composite MSE-correlation loss and a curriculum that gradually shifts emphasis from early sensory to late association regions. Averaging 100 model variants further boosts robustness. The resulting system ranked third on the competition leaderboard, achieving an overall Pearson r = 0.2094 and the highest single-parcel peak score (mean r = 0.63) among all participants, with particularly strong gains for the most challenging subject (Subject 5). The approach establishes a simple, extensible baseline for future multimodal brain-encoding benchmarks.
准确预测对自然刺激的分布分布式线性反应需要将视觉、听觉和语义信息长期整合在一起的模式。 我们展示了一个等级式多式经常性组合,将视频、音频和语言预先嵌入到FMRI时间序列中进行记录,而四个对象则观看了2025年Algoauts挑战提供的近80小时的电影。 模式性特定的双向双向RNN(双向双向RNN)将时间动态编码化;它们的隐藏状态被整合并传递到第二个经常层,而其隐藏状态则被传递到第二个经常层,而1000个包包的轻量级主题头目输出反应。 培训依赖于综合的MSE-conlation损失和将重点从早期感官区域逐步转移到后期联系区域的课程。 动态100模型变异体将进一步增强稳健性。 由此形成的系统在竞争领导板上排名第三,在所有参与者中达到总体皮尔森r=0. 2094和最高单级峰分数(平均0.63分),在最具挑战性的主题上特别强的收益(第5项)。
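A composite MSE-correlation loss of the kind mentioned in the abstract can be sketched as a weighted sum of mean squared error and one minus the Pearson correlation. The mixing weight `mix` and the exact form of the combination are assumptions made for illustration; the abstract does not state them.

```python
import math
from typing import Sequence


def pearson_r(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    denom = math.sqrt(var_p * var_t)
    # A constant sequence has undefined correlation; treat it as zero.
    return cov / denom if denom > 0 else 0.0


def composite_loss(pred: Sequence[float], target: Sequence[float],
                   mix: float = 0.5) -> float:
    """mix * MSE + (1 - mix) * (1 - r); `mix` is a hypothetical weight."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return mix * mse + (1.0 - mix) * (1.0 - pearson_r(pred, target))


perfect = composite_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
reversed_loss = composite_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

The MSE term penalizes amplitude errors, while the correlation term rewards matching the temporal shape of the response, which is what the leaderboard metric (Pearson r) measures.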
Article 166
Title@2025-07-23 (3): Lower Bounds for Public-Private Learning under Distribution Shift
Title: Lower Bounds for Public-Private Learning under Distribution Shift | Untere Grenzen für öffentlich-privates Lernen unter Verteilungsverschiebung | 分配轮班下公-私学习的下下档次 2507.17895v1 |
Authors (3): Amrith Setlur, Pratiksha Thaker, Jonathan Ullman
The most effective differentially private machine learning algorithms in practice rely on an additional source of purportedly public data. This paradigm is most interesting when the two sources combine to be more than the sum of their parts. However, there are settings such as mean estimation where we have strong lower bounds, showing that when the two data sources have the same distribution, there is no complementary value to combining the two data sources. In this work we extend the known lower bounds for public-private learning to settings where the two data sources exhibit significant distribution shift. Our results apply to both Gaussian mean estimation where the two distributions have different means, and to Gaussian linear regression where the two distributions exhibit parameter shift. We find that when the shift is small (relative to the desired accuracy), either public or private data must be sufficiently abundant to estimate the private parameter. Conversely, when the shift is large, public data provides no benefit.
在实践上,最有效的、有差别的私人机器学习算法依赖于据称公共数据的额外来源。当两个来源合并在一起时,这一范式最有趣,因为这两个来源的数值大于其部分的总和。然而,有些环境,例如平均估计,我们有很强的下限,表明当两个数据来源分布相同时,将两个数据来源合并起来没有补充价值。在这项工作中,我们将已知的公私学习下限扩大到设定两种数据来源显著分布变化的地点。我们的结果适用于两种分布方式不同的高斯平均估计,以及两种分布显示参数变化的高斯线性回归。我们发现,当变化小时(相对于预期的准确性),公共或私人数据必须足以估计私人参数。反之,如果变化大,公共数据就没有好处。
Article 167
Title@2025-07-23 (3): Action-List Reinforcement Learning Syndrome Decoding for Binary Linear Block Codes
Title: Action-List Reinforcement Learning Syndrome Decoding for Binary Linear Block Codes | Action-Liste Verstärkungs-Lernsyndrom-Dekodierung für Binary Linear Block Codes | 二元线性线性块块代码的标记 2507.17893v1 |
Authors (2): Milad Taghipour, Bane Vasic
This paper explores the application of reinforcement learning techniques to enhance the decoding performance of linear block codes based on flipping bits and finding optimal decisions. We describe the methodology for mapping the iterative decoding process into Markov Decision Processes (MDPs) and propose different methods to reduce the number of states in the MDP. A truncated MDP is proposed to reduce the number of states in the MDP by learning a Hamming ball with a specified radius around codewords. We then propose a general scheme for reinforcement learning based decoders applicable to any class of codes to improve the performance of decoders. We call this scheme action-list decoding. We design an action-list decoder based on the Deep-Q network values that substantially enhances performance. We also exploit the automorphism group of the code to further improve the code performance. Additionally, we propose a feedback-based method to exploit and enhance the performance of existing high-performing decoders by applying reinforcement learning algorithms after the existing decoders. These approaches effectively reduce the complexity of the reinforcement learning block. Finally, we present experimental results for Low-Density Parity Check (LDPC) codes over the Binary Symmetric Channel (BSC) to demonstrate the efficiency of the proposed methods.
本文探讨如何应用强化学习技术,通过比特翻转和寻找最优决策来提升线性分组码的译码性能。我们描述了将迭代译码过程映射为马尔可夫决策过程(MDP)的方法,并提出了减少 MDP 状态数的多种方法。我们提出一种截断 MDP,通过学习码字周围指定半径的汉明球来减少状态数。随后,我们提出一种适用于任意码类的基于强化学习的译码器通用方案,称为动作列表译码。我们基于深度 Q 网络的值设计了动作列表译码器,显著提升了性能。我们还利用码的自同构群进一步改进码的性能。此外,我们提出一种基于反馈的方法,在现有高性能译码器之后应用强化学习算法,以利用并增强其性能。这些方法有效降低了强化学习模块的复杂度。最后,我们给出了低密度奇偶校验(LDPC)码在二元对称信道(BSC)上的实验结果,以证明所提方法的有效性。
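The mapping from bit-flipping decoding to a decision process can be sketched with the syndrome as the state and "flip bit i" as the action. In the sketch below a greedy rule (pick the flip that minimizes syndrome weight) stands in for the learned Q-values, so it is not the paper's decoder; the (7,4) Hamming code and the function names are illustrative assumptions.

```python
from typing import List

# Parity-check matrix of the (7,4) Hamming code.
# Column j is the binary expansion of j + 1 (first row = least significant bit).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(word: List[int]) -> List[int]:
    """State of the decision process: H * word over GF(2)."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]


def flipped(word: List[int], i: int) -> List[int]:
    """Action: return a copy of `word` with bit i flipped."""
    return word[:i] + [word[i] ^ 1] + word[i + 1:]


def greedy_flip_decode(received: List[int], max_flips: int = 3) -> List[int]:
    """Flip one bit per step until the syndrome is zero or the budget runs out.

    The greedy score (syndrome weight after the flip) is a placeholder for
    a learned action value.
    """
    word = list(received)
    for _ in range(max_flips):
        if not any(syndrome(word)):
            break
        best = min(range(len(word)), key=lambda i: sum(syndrome(flipped(word, i))))
        word = flipped(word, best)
    return word


received = [0, 0, 0, 0, 1, 0, 0]  # all-zero codeword with bit 4 in error
decoded = greedy_flip_decode(received)
```

Because the state depends only on the syndrome, the number of states is bounded by the number of distinct syndromes rather than the number of possible received words, which is why reducing the state space is a natural design concern.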
Article 168
Title@2025-07-23 (3): DeepCrossAttention: Supercharging Transformer Residual Connections
Title: DeepCrossAttention: Supercharging Transformer Residual Connections | DeepCrossAchtung: Supercharging Transformer Residual Verbindungen | 深十字感应:高压变压器残余连接 2502.06785v2 |
Authors (6): Mike Heddes, Adel Javanmard, Kyriakos Axiotis, Gang Fu, MohammadHossein Bateni, Vahab Mirrokni
Transformer networks have achieved remarkable success across diverse domains, leveraging a variety of architectural innovations, including residual connections. However, traditional residual connections, which simply sum the outputs of previous layers, can dilute crucial information. This work introduces DeepCrossAttention (DCA), an approach that enhances residual learning in transformers. DCA employs learnable, input-dependent weights to dynamically combine layer outputs, enabling the model to selectively focus on the most relevant information in any of the previous layers. Furthermore, DCA incorporates depth-wise cross-attention, allowing for richer interactions between layers at different depths. Our language modeling experiments show that DCA achieves improved perplexity for a given training time. Moreover, DCA obtains the same model quality up to 3x faster while adding a negligible number of parameters. Theoretical analysis confirms that DCA provides an improved trade-off between accuracy and model size when the ratio of collective layer ranks to the ambient dimension falls below a critical threshold.
变换器网络在不同领域取得了显著的成功,利用了各种建筑创新,包括剩余连接。然而,传统的剩余连接,只是将前层产出相加,可以淡化重要信息。这项工作引入了DeepCrossAtention(DCA),这是一种加强变压器剩余学习的方法。DCA将可学习的、依赖投入的权重用于动态组合层产出,使模型能够有选择地侧重于前层中任何层中最相关的信息。此外,DCA纳入了深层次的交叉关注,允许不同深度的层间互动更加丰富。我们的语言模拟实验显示,DCA在特定培训时间里实现了更高的不易解性。此外,DCA获得了相同的模型质量,最高达3x,同时增加了可忽略的参数。理论分析证实,当集体层与环境层面的比率低于临界阈值时,DCA提供了更精确和模型大小之间的权衡。
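The contrast between a plain residual sum and an input-dependent combination of earlier layer outputs can be sketched as follows. The dot-product scoring and the softmax normalization are assumptions chosen for illustration, not the DCA mechanism itself, and `gate` stands for a hypothetical learnable vector.

```python
import math
from typing import List, Tuple


def combine_layers(layer_outputs: List[List[float]],
                   gate: List[float]) -> Tuple[List[float], List[float]]:
    """Mix earlier layer outputs with weights that depend on the outputs themselves.

    Each layer output is scored by its dot product with `gate`; a softmax
    turns the scores into mixing weights.
    """
    scores = [sum(g * x for g, x in zip(gate, out)) for out in layer_outputs]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    dim = len(layer_outputs[0])
    mixed = [sum(w * out[i] for w, out in zip(weights, layer_outputs))
             for i in range(dim)]
    return mixed, weights


outputs = [[1.0, 0.0], [0.0, 1.0]]
uniform_mix, uniform_weights = combine_layers(outputs, gate=[0.0, 0.0])
focused_mix, focused_weights = combine_layers(outputs, gate=[5.0, 0.0])
```

With a zero gate the combination reduces to a uniform average, close in spirit to a standard residual sum; a non-zero gate lets the model emphasize one layer's output over the others.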
Article 169
Title@2025-07-23 (3): Fourier Neural Operators for Non-Markovian Processes: Approximation Theorems and Experiments
Title: Fourier Neural Operators for Non-Markovian Processes: Approximation Theorems and Experiments | Fourier-Neural-Betreiber für nicht markovianische Prozesse: Approximationstheorien und Experimente | 非 Markovian 进程四神经操作器: 近似理论和实验 2507.17887v1 |
Authors (3): Wonjae Lee, Taeyoung Kim, Hyungbin Park
This paper introduces an operator-based neural network, the mirror-padded Fourier neural operator (MFNO), designed to learn the dynamics of stochastic systems. MFNO extends the standard Fourier neural operator (FNO) by incorporating mirror padding, enabling it to handle non-periodic inputs. We rigorously prove that MFNOs can approximate solutions of path-dependent stochastic differential equations and Lipschitz transformations of fractional Brownian motions to an arbitrary degree of accuracy. Our theoretical analysis builds on Wong–Zakai type theorems and various approximation techniques. Empirically, the MFNO exhibits strong resolution generalization, a property rarely seen in standard architectures such as LSTMs, TCNs, and DeepONet. Furthermore, our model achieves performance that is comparable or superior to these baselines while offering significantly faster sample path generation than classical numerical schemes.
本文介绍一个以操作者为基础的神经网络,即以镜相加的Fourier神经操作员(MFNO),目的是学习随机系统的动态。MFO通过吸收镜相垫扩展标准的Fourier神经操作员(FNO),使其能够处理非定期投入。我们严格证明,MFRO可以任意地将依赖路径的随机差异方程式和分数布朗运动的利普施奇茨转化的解决方案推向任意的精确度。我们的理论分析以Wong-Zakai类型的理论和各种近似技术为基础。 简而言之,MFRO在LSTMS、TCNs和DeepONet等标准结构中鲜见的强烈分辨率一般化属性。此外,我们的模型取得了与这些基线相近或优于这些基线的性能,同时提供了比传统数字方法更快的样本生成速度。
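Why mirror padding helps with non-periodic inputs can be seen from the wrap-around jump. A Fourier layer implicitly treats its input as periodic, so a path whose endpoints differ has a discontinuity at the boundary. The sketch below assumes padding by appending the reversed signal; the exact padding convention used by MFNO is not given in the abstract, and the helper names are hypothetical.

```python
from typing import List


def mirror_pad(signal: List[float]) -> List[float]:
    """Append the reversed signal so the periodic extension has no boundary jump."""
    return list(signal) + list(reversed(signal))


def wrap_jump(signal: List[float]) -> float:
    """Size of the discontinuity seen when the signal is treated as periodic."""
    return abs(signal[0] - signal[-1])


ramp = [0.0, 1.0, 2.0, 3.0]   # a non-periodic path
padded = mirror_pad(ramp)     # [0, 1, 2, 3, 3, 2, 1, 0]

jump_before = wrap_jump(ramp)
jump_after = wrap_jump(padded)
```

After the Fourier layers run on the padded signal, only the first half of the output would be kept, at the price of doubling the sequence length.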
Article 170
Title@2025-07-23 (3): PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
Title: PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding | PerceptionLM: Open-Access-Daten und Modelle für ein detailliertes visuelles Verständnis | 感知LM:开放存取数据和详细视觉理解模型 2504.13180v3 |
Authors (29): Jang Hyun Cho, Andrea Madotto, Effrosyni Mavroudi, Triantafyllos Afouras, Tushar Nagarajan, Muhammad Maaz, Yale Song, Tengyu Ma, Shuming Hu, Suyog Jain, Miguel Martin, Huiyu Wang, Hanoona Rasheed, Peize Sun, Po-Yao Huang, Daniel Bolya, Nikhila Ravi, Shashank Jain, Tammy Stark, Shane Moon, Babak Damavandi, Vivian Lee, Andrew Westbury, Salman Khan, Philipp Krähenbühl, Piotr Dollár, Lorenzo Torresani, Kristen Grauman, Christoph Feichtenhofer
Vision-language models are integral to computer vision research, yet many high-performing models remain closed-source, obscuring their data, design and training recipe. The research community has responded by using distillation from black-box models to label training data, achieving strong benchmark results, at the cost of measurable scientific progress. However, without knowing the details of the teacher model and its data sources, scientific progress remains difficult to measure. In this paper, we study building a Perception Language Model (PLM) in a fully open and reproducible framework for transparent research in image and video understanding. We analyze standard training pipelines without distillation from proprietary models and explore large-scale synthetic data to identify critical data gaps, particularly in detailed video understanding. To bridge these gaps, we release 2.8M human-labeled instances of fine-grained video question-answer pairs and spatio-temporally grounded video captions. Additionally, we introduce PLM-VideoBench, a suite for evaluating challenging video understanding tasks focusing on the ability to reason about “what”, “where”, “when”, and “how” of a video. We make our work fully reproducible by providing data, training recipes, code & models. https://github.com/facebookresearch/perception_models
许多高性能模型是计算机愿景研究不可或缺的组成部分,然而,许多高性能模型仍然是封闭源,掩盖了它们的数据、设计和培训食谱。研究界的反应是,利用黑盒模型的蒸馏法来标出培训数据,以可衡量的科学进步为代价,取得强有力的基准成果;然而,在不了解教师模型及其数据来源的细节的情况下,科学进步仍然难以衡量。在本文件中,我们研究如何在完全开放和可复制的图像和视频理解透明研究框架内建立一个概念语言模型(PLM ) 。我们分析标准培训管道,而不从专有模型中提炼,并探索大规模合成数据,以查明关键的数据差距,特别是在详细的视频理解方面。为弥补这些差距,我们发布了2.8M 人类标签的精细视频问答配对和调制成的视频字幕。此外,我们引入了PLM-VideoBench,这是一套评估具有挑战性的视频理解任务的套件,侧重于“什么”、“在哪里”、“在哪里”、“什么时候”、“什么时候”和“展示”等大规模合成数据差距数据差距。我们的工作是用视频/制的模型。我们完全重订数据/制的模型。我们的工作,我们通过提供数据/制版数据和制模。我们的工作,我们完全重制的版本。我们用数据和制模版。我们的工作,我们用数据和制模版的模型来进行重制数据和制。我们的工作。
Article 171
Title@2025-07-23 (3): A Supervised Machine Learning Framework for Multipactor Breakdown Prediction in High-Power Radio Frequency Devices and Accelerator Components: A Case Study in Planar Geometry
Title: A Supervised Machine Learning Framework for Multipactor Breakdown Prediction in High-Power Radio Frequency Devices and Accelerator Components: A Case Study in Planar Geometry | Ein überwachtes Machine Learning Framework für Multipactor-Ausfallvorhersage in hochleistungsfähigen Funkfrequenzgeräten und Accelerator-Komponenten: Eine Fallstudie in der planaren Geometrie | 高功率无线电频率装置和加速器部件多光速分解预测监督的机器学习框架:平板几何案例研究 2507.17881v1 |
Authors (3): Asif Iqbal, John Verboncoeur, Peng Zhang
Multipactor is a nonlinear electron avalanche phenomenon that can severely impair the performance of high-power radio frequency (RF) devices and accelerator systems. Accurate prediction of multipactor susceptibility across different materials and operational regimes remains a critical yet computationally intensive challenge in accelerator component design and RF engineering. This study presents the first application of supervised machine learning (ML) for predicting multipactor susceptibility in two-surface planar geometries. A simulation-derived dataset spanning six distinct secondary electron yield (SEY) material profiles is used to train regression models - including Random Forest (RF), Extra Trees (ET), Extreme Gradient Boosting (XGBoost), and funnel-structured Multilayer Perceptrons (MLPs) - to predict the time-averaged electron growth rate, ${\delta}_{avg}$. Performance is evaluated using Intersection over Union (IoU), Structural Similarity Index (SSIM), and Pearson correlation coefficient. Tree-based models consistently outperform MLPs in generalizing across disjoint material domains. MLPs trained using a scalarized objective function that combines IoU and SSIM during Bayesian hyperparameter optimization with 5-fold cross-validation outperform those trained with single-objective loss functions. Principal Component Analysis reveals that performance degradation for certain materials stems from disjoint feature-space distributions, underscoring the need for broader dataset coverage. This study demonstrates both the promise and limitations of ML-based multipactor prediction and lays the groundwork for accelerated, data-driven modeling in advanced RF and accelerator system design.
二次电子倍增(multipactor)是一种非线性电子雪崩现象,会严重损害高功率射频(RF)器件和加速器系统的性能。在不同材料和工作区间内准确预测 multipactor 敏感性,仍是加速器部件设计和射频工程中关键且计算量很大的难题。本研究首次将监督机器学习(ML)用于预测双表面平面几何结构中的 multipactor 敏感性。我们使用涵盖六种不同二次电子产额(SEY)材料特性的仿真数据集训练回归模型,包括随机森林(RF)、极端随机树(ET)、极端梯度提升(XGBoost)和漏斗结构多层感知机(MLP),以预测时间平均电子增长率 ${\delta}_{avg}$。性能通过交并比(IoU)、结构相似性指数(SSIM)和皮尔逊相关系数进行评估。在跨不相交材料域的泛化上,基于树的模型始终优于 MLP。在采用 5 折交叉验证的贝叶斯超参数优化中,使用结合 IoU 与 SSIM 的标量化目标函数训练的 MLP,优于使用单目标损失函数训练的 MLP。主成分分析表明,某些材料上的性能下降源于特征空间分布不相交,说明需要更广的数据集覆盖。本研究展示了基于 ML 的 multipactor 预测的前景与局限,为先进射频和加速器系统设计中加速的数据驱动建模奠定了基础。
Article 172
Title@2025-07-23 (3): Look the Other Way: Designing ‘Positive’ Molecules with Negative Data via Task Arithmetic
Title: Look the Other Way: Designing ‘Positive’ Molecules with Negative Data via Task Arithmetic | Sehen Sie den anderen Weg: Entwerfen von ‘Positiven’ Molekülen mit negativen Daten über Task-Arithmetik | 查看其他方式 : 通过任务亚学用负数据设计“ 功能性” 分子 2507.17876v1 |
Authors (3): Rıza Özçelik, Sarah de Ruiter, Francesca Grisoni
The scarcity of molecules with desirable properties (i.e., ‘positive’ molecules) is an inherent bottleneck for generative molecule design. To sidestep such an obstacle, here we propose molecular task arithmetic: training a model on diverse and abundant negative examples to learn ‘property directions’, without accessing any positively labeled data, and moving models in the opposite property directions to generate positive molecules. When analyzed on 20 zero-shot design experiments, molecular task arithmetic generated more diverse and successful designs than models trained on positive molecules. Moreover, we employed molecular task arithmetic in dual-objective and few-shot design tasks. We find that molecular task arithmetic can consistently increase the diversity of designs while maintaining desirable design properties. With its simplicity, data efficiency, and performance, molecular task arithmetic bears the potential to become the de facto transfer learning strategy for de novo molecule design.
缺少具有适当特性的分子(即“正分子”)是基因分子设计的一个固有的瓶颈。为了避开这种障碍,我们在这里建议分子任务计算:在不获取任何被贴有正面标签的数据的情况下,用大量不同的负面例子来学习“财产方向”$-美元,并在相反的属性方向上移动模型以产生正分子。在分析20个零射设计实验时,分子任务算术产生的设计比对正分子模型所训练的模型更加多样化和成功。此外,我们在双重目标和少量设计任务中使用分子任务算术。我们发现,分子任务算术可以不断增加设计的多样性,同时保持理想的设计特性。由于它的简单性、数据效率和性能,分子任务算术具有成为 $textit{de-facto} 转移学习战略的潜力。
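The "move in the opposite property direction" step can be written as simple parameter arithmetic. This sketch assumes the usual task-vector formulation (fine-tuned minus pretrained weights) and a hypothetical scaling factor `scale`; the abstract does not give the exact update rule.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def task_vector(pretrained: Params, finetuned: Params) -> Params:
    """Property direction: the parameter change induced by fine-tuning."""
    return {k: [f - p for f, p in zip(finetuned[k], pretrained[k])]
            for k in pretrained}


def move_opposite(pretrained: Params, negative_finetuned: Params,
                  scale: float = 1.0) -> Params:
    """Step away from the direction learned on negative examples."""
    direction = task_vector(pretrained, negative_finetuned)
    return {k: [p - scale * d for p, d in zip(pretrained[k], direction[k])]
            for k in pretrained}


base = {"w": [1.0, 1.0]}
tuned_on_negatives = {"w": [1.5, 0.0]}
positive_model = move_opposite(base, tuned_on_negatives)
```

Only negative examples are needed to compute the direction, which is what makes the approach usable when positive molecules are scarce.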
Article 173
Title@2025-07-23 (3): Integrating Feature Selection and Machine Learning for Nitrogen Assessment in Grapevine Leaves using In-Field Hyperspectral Imaging
Title: Integrating Feature Selection and Machine Learning for Nitrogen Assessment in Grapevine Leaves using In-Field Hyperspectral Imaging | Integration von Feature Selection und Machine Learning für die Stickstoffabschätzung in Grapevine Leaves mit Hilfe von Hyperspektralbildgebung im Feld | 利用实地超光谱成像法将地物选择和机器学习综合结合,用于在格拉佩维尼叶中进行氮评估 2507.17869v1 |
Authors (11): Atif Bilal Asad, Achyut Paudel, Safal Kshetri, Chenchen Kang, Salik Ram Khanal, Nataliya Shcherbatyuk, Pierre Davadant, R. Paul Schreiner, Santosh Kalauni, Manoj Karkee, Markus Keller
Nitrogen (N) is one of the most crucial nutrients in vineyards, affecting plant growth and subsequent products such as wine and juice. Because soil N has high spatial and temporal variability, it is desirable to accurately estimate the N concentration of grapevine leaves and manage fertilization at the individual plant level to optimally meet plant needs. In this study, we used in-field hyperspectral images with wavelengths ranging from 400 to 1000 nm of four different grapevine cultivars collected from distinct vineyards and over two growth stages during two growing seasons to develop models for predicting N concentration at the leaf-level and canopy-level. After image processing, two feature selection methods were employed to identify the optimal set of spectral bands that were responsive to leaf N concentrations. The selected spectral bands were used to train and test two different Machine Learning (ML) models, Gradient Boosting and XGBoost, for predicting nitrogen concentrations. The comparison of selected bands for both leaf-level and canopy-level datasets showed that most of the spectral regions identified by the feature selection methods were shared across both methods and the dataset types (leaf- and canopy-level datasets), particularly in the key regions, 500-525 nm, 650-690 nm, 750-800 nm, and 900-950 nm. These findings indicated the robustness of these spectral regions for predicting nitrogen content. The results for N prediction demonstrated that the ML model achieved an R-squared of 0.49 for canopy-level data and an R-squared of 0.57 for leaf-level data, despite using different sets of selected spectral bands for each analysis level. The study demonstrated the potential of using in-field hyperspectral imaging and the use of spectral data in integrated feature selection and ML techniques to monitor N status in vineyards.
硝基( N) 是葡萄园中最重要的营养素之一, 影响植物生长以及葡萄酒和果汁等后续产品。 由于土壤N具有较高的空间和时间变异性, 有必要准确估计葡萄树叶的浓度, 并在单个植物一级管理肥化, 以最佳地满足植物需求。 在这项研究中, 我们使用四个不同的葡萄园和两个生长季节中两个生长阶段采集的波长从400至1000纳米不等的超光谱图像, 以预测叶层和甘油层的浓度。 由于土壤N具有较高的空间和时间变异性, 因此有必要准确估计葡萄叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶叶的浓度, 使用两种地特征选择方法来确定最优的光谱带组, 用于培训和测试两种不同的机器学习模型, 即Gradicentic Tows和XGBOstt, 用于这些叶叶层和甘基层的精度 , 在500- 水平上, 和Slex- silal SA 的精确 数据分析中, 一级, 数据显示, 和Silal- sal- sal- dal- sal- sal- sal- salmal- 的每个数据 的每个 的每个 数据, 和Sali- sal- sal- sal- sal- sal- salma 数据 的数值 的数值的数值的数值的数值为这些结果。
Article 174
Title@2025-07-23 (3): Learning Individual Reproductive Behavior from Aggregate Fertility Rates via Neural Posterior Estimation
Title: Learning Individual Reproductive Behavior from Aggregate Fertility Rates via Neural Posterior Estimation | Individuelles reproduktives Verhalten von Aggregat Fertilitätsraten über neurale hintere Schätzung lernen | 学习个人生殖行为 学习个人生殖行为 2506.22607v2 |
Authors (4): Daniel Ciganda, Ignacio Campón, Iñaki Permanyer, Jakob H Macke
Age-specific fertility rates (ASFRs) provide the most extensive record of reproductive change, but their aggregate nature obscures the individual-level behavioral mechanisms that drive fertility trends. To bridge this micro-macro divide, we introduce a likelihood-free Bayesian framework that couples a demographically interpretable, individual-level simulation model of the reproductive process with Sequential Neural Posterior Estimation (SNPE). We show that this framework successfully recovers core behavioral parameters governing contemporary fertility, including preferences for family size, reproductive timing, and contraceptive failure, using only ASFRs. The framework’s effectiveness is validated on cohorts from four countries with diverse fertility regimes. Most compellingly, the model, estimated solely on aggregate data, successfully predicts out-of-sample distributions of individual-level outcomes, including age at first sex, desired family size, and birth intervals. Because our framework yields complete synthetic life histories, it significantly reduces the data requirements for building microsimulation models and enables behaviorally explicit demographic forecasts.
具体年龄生育率(ASFRs)是生育变化的最广泛记录,但其总体性质掩盖了驱动生育趋势的个人层面行为机制。为了弥合这一微观宏观差异,我们引入了一个无可能性的巴伊西亚框架,即夫妇双方的生殖过程可按人口、解释、个人层面模拟模型,与序列神经人口变化和生育间隔(SNPE)相匹配。我们表明,这一框架成功地恢复了当代生育的核心行为参数,包括家庭规模、生育时机和避孕失败的偏好,只使用ASFRs。框架的有效性由四个生育率制度不同的国家的组群验证。最令人信服的是,该模型仅根据综合数据估算,成功预测了个人层面成果的不全面分布,包括初性别年龄、理想家庭规模和生育间隔。由于我们的框架生成了完整的合成生命史,它大大降低了建立微观模拟模型的数据要求,并使得行为清晰的人口预测成为可能。
Article 175
Title@2025-07-23 (3): Streaming, Fast and Slow: Cognitive Load-Aware Streaming for Efficient LLM Serving
Title: Streaming, Fast and Slow: Cognitive Load-Aware Streaming for Efficient LLM Serving | Streaming, schnell und langsam: Kognitives Load-Aware Streaming für effizientes LLM Serving | 串流、快速和慢速:高效LLM服务认知式负载-软件流 2504.17999v2 |
Authors (2): Chang Xiao, Brenda Yang
Generative conversational interfaces powered by large language models (LLMs) typically stream output token-by-token at a rate determined by computational budget, often neglecting actual human reading speeds and the cognitive load associated with the content. This mismatch frequently leads to inefficient use of computational resources. For example, in cloud-based services, streaming content faster than users can read appears unnecessary, resulting in wasted computational resources and potential delays for other users, particularly during peak usage periods. To address this issue, we propose an adaptive streaming method that dynamically adjusts the pacing of LLM streaming output in real time based on inferred cognitive load. Our approach estimates the cognitive load associated with streaming content and strategically slows down the stream during complex or information-rich segments, thereby freeing computational resources for other users. We conducted a statistical analysis and simulation based on a statistical model derived from data collected in a crowdsourced user study across various types of LLM-generated content. Our results show that this adaptive method can effectively reduce computational consumption while largely maintaining streaming speed above users’ normal reading speed.
由大型语言模型(LLMS)驱动的典型流式对话界面,通常以计算预算确定的速率生成流式逐项输出,往往忽视实际的人类阅读速度和与内容相关的认知负荷。这种不匹配经常导致计算资源使用效率低下。例如,在云型服务中,流内容流速度比用户看得快,似乎没有必要,导致计算资源浪费,其他用户可能出现延误,特别是在高峰使用期。为解决这一问题,我们建议了一种适应性流方法,根据推断的认知负荷动态,动态调整LLM流输出实时速度。我们的方法估算了与流内容相关的认知负荷,并在复杂或信息丰富部分战略性地放慢流流流流流速度,从而为其他用户腾出计算资源。我们根据从群集用户对各类LLM生成内容进行的数据收集的统计模型,进行了统计分析和模拟。我们的结果显示,这种适应性方法可以有效减少计算消耗量,同时基本上保持流速高于用户正常阅读速度。
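A minimal sketch of the pacing idea in this abstract, assuming a per-segment cognitive-load score in [0, 1] is already available. The function name `paced_rate`, the linear slowdown, and the `margin` and `slowdown` parameters are illustrative assumptions, not the authors' statistical model.

```python
def paced_rate(load, max_rate, reading_rate, margin=1.1, slowdown=0.5):
    """Return a streaming rate (tokens/s) for one segment.

    load         -- estimated cognitive load in [0, 1] (hypothetical score)
    max_rate     -- fastest rate the server could stream at
    reading_rate -- the user's normal reading rate on easy content
    margin       -- keep the stream this factor above the expected reading rate
    slowdown     -- assumed fraction by which reading slows when load == 1
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must lie in [0, 1]")
    # Assumption: readers slow down linearly as content gets harder.
    expected_reading = reading_rate * (1.0 - slowdown * load)
    # Stream just ahead of the reader, never faster than the server allows.
    return min(max_rate, margin * expected_reading)


# Three segments of increasing difficulty, streamed to a 5 tokens/s reader.
rates = [paced_rate(load, max_rate=50.0, reading_rate=5.0) for load in (0.0, 0.5, 1.0)]
```

Under these assumptions, harder segments receive a lower rate, and the capacity not spent on them is what a scheduler could hand to other requests.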
Article 176
Title@2025-07-23 (3): PALADIN : Robust Neural Fingerprinting for Text-to-Image Diffusion Models
Title: PALADIN : Robust Neural Fingerprinting for Text-to-Image Diffusion Models | PALADIN : Robustes neurales Fingerprinting für Diffusionsmodelle von Text zu Bild | PALADIN: 文本到图像传播模型的强力神经指纹打印 2506.03170v2 |
Authors (2): Murthy L, Subarna Tripathi
The risk of misusing text-to-image generative models for malicious purposes, especially given the open-source development of such models, has become a serious concern. As a risk mitigation strategy, attributing generative models with neural fingerprinting is emerging as a popular technique. A plethora of recent work aims to address neural fingerprinting, and the trade-off between attribution accuracy and generation quality of such models has been studied extensively. None of the existing methods has yet achieved 100% attribution accuracy. However, any model with less than 100% accuracy is practically non-deployable. In this work, we propose an accurate method to incorporate neural fingerprinting into text-to-image diffusion models, leveraging the concept of cyclic error-correcting codes from the coding theory literature.
恶意使用文字到图像的基因化模型被误用的风险,特别是由于这类模型的公开来源开发,已成为一个严重关切的问题。作为一种风险缓解战略,将神经指纹的基因化模型归为一种流行技术正在出现。最近为处理神经指纹问题做了大量工作。对此类模型的归因准确性和生成质量进行了广泛研究。现有方法都未达到100%归因准确性。然而,任何精确度低于%的模型实际上都无法使用。在这项工作中,我们提出了一个精确的方法,将神经指纹归为图像扩散模型,利用循环错误概念来纠正编码理论文献中的编码。
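To illustrate what a cyclic error-correcting code contributes here, the sketch below uses the textbook cyclic (7,4) Hamming code with generator polynomial g(x) = x^3 + x + 1. The abstract does not say which code PALADIN uses, so the code choice, the 4-bit `fingerprint`, and all function names are assumptions for illustration only.

```python
G = 0b1011  # g(x) = x^3 + x + 1; polynomials over GF(2) are stored as bit masks
N = 7       # codeword length in bits


def poly_mul(a, b):
    """Multiply two GF(2) polynomials (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_divmod(a, g=G):
    """Return (quotient, remainder) of a(x) / g(x) over GF(2)."""
    deg_g = g.bit_length() - 1
    quotient = 0
    while a.bit_length() - 1 >= deg_g:
        shift = a.bit_length() - 1 - deg_g
        quotient |= 1 << shift
        a ^= g << shift
    return quotient, a


def encode(message):
    """Encode a 4-bit message as the 7-bit codeword m(x) * g(x)."""
    return poly_mul(message, G)


def decode(word):
    """Correct at most one flipped bit, then recover the 4-bit message."""
    _, syndrome = poly_divmod(word)
    if syndrome:
        # Each single-bit error x^i leaves a distinct nonzero syndrome.
        for i in range(N):
            if poly_divmod(1 << i)[1] == syndrome:
                word ^= 1 << i
                break
    return poly_divmod(word)[0]


fingerprint = 0b1011                       # hypothetical 4-bit fingerprint
received = encode(fingerprint) ^ (1 << 5)  # one bit corrupted during extraction
recovered = decode(received)
```

This small code corrects only one bit error per codeword; two or more errors would be miscorrected, which is why longer cyclic codes are used when more noise must be tolerated.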
Article 177
Title@2025-07-23 (3): Towards Facilitated Fairness Assessment of AI-based Skin Lesion Classifiers Through GenAI-based Image Synthesis
Title: Towards Facilitated Fairness Assessment of AI-based Skin Lesion Classifiers Through GenAI-based Image Synthesis | Auf dem Weg zu einer erleichterten Fairnessbewertung von KI-basierten Haut-Lesions-Klassifikatoren durch GenAI-basierte Bildsynthese | 通过GenAI基于GenAI的图像合成,促进基于AI的皮肤皮质分类分类的公平评估 2507.17860v1 |
Authors (4): Ko Watanabe, Stanislav Frolov, Adriano Lucieri, Andreas Dengel
Recent advancements in Deep Learning and its application on the edge hold great potential for the revolution of routine screenings for skin cancers like Melanoma. Along with the anticipated benefits of this technology, potential dangers arise from unforeseen and inherent biases. Thus, assessing and improving the fairness of such systems is of utmost importance. A key challenge in fairness assessment is to ensure that the evaluation dataset is sufficiently representative of different Personally Identifiable Information (PII) (sex, age, and race) and other minority groups. Against the backdrop of this challenge, this study leverages the state-of-the-art Generative AI (GenAI) LightningDiT model to assess the fairness of publicly available melanoma classifiers. The results suggest that fairness assessment using highly realistic synthetic data is a promising direction. Yet, our findings indicate that verifying fairness becomes difficult when the melanoma-detection model used for evaluation is trained on data that differ from the dataset underpinning the synthetic images. Nonetheless, we propose that our approach offers a valuable new avenue for employing synthetic data to gauge and enhance fairness in medical-imaging GenAI systems.
最近深层学习的进展及其在边缘的应用极有可能对诸如梅兰诺马等皮肤癌进行常规筛查革命。除了这一技术的预期好处外,还可能出现不可预见的和固有的偏差。因此,评估和提高这种系统的公平性极为重要。公平评估的一个关键挑战是确保评价数据集充分代表不同的个人识别信息(PII)(性别、年龄和种族)和其他少数群体。在这一挑战的背景下,本研究利用最先进的GenAI(GenAI)点亮DTI模型来评估公开提供的黑兰诺玛分类者的公平性。研究结果表明,利用高度现实的合成数据进行公平评估是一个大有希望的方向。然而,我们的调查结果表明,当用于评价的黑兰诺玛检测模型就不同于支持合成图像的数据集的数据进行培训时,很难核实公平性。然而,我们提议,我们的方法为利用合成数据测量和提高GenAI系统医学计量的公平性提供了宝贵的新途径。
Article 178
Title@2025-07-23 (3): Choosing Public Datasets for Private Machine Learning via Gradient Subspace Distance
Title: Choosing Public Datasets for Private Machine Learning via Gradient Subspace Distance | Auswahl öffentlicher Datensätze für privates maschinelles Lernen über Gradient Subspace Distance | 通过梯度子空间距离为私人机器学习选择公共数据集 2303.01256v2 |
Authors (3): Xin Gu, Gautam Kamath, Zhiwei Steven Wu
Differentially private stochastic gradient descent privatizes model training by injecting noise into each iteration, where the noise magnitude increases with the number of model parameters. Recent works suggest that we can reduce the noise by leveraging public data for private machine learning, by projecting gradients onto a subspace prescribed by the public data. However, given a choice of public datasets, it is not a priori clear which one may be most appropriate for the private task. We give an algorithm for selecting a public dataset by measuring a low-dimensional subspace distance between gradients of the public and private examples. We provide theoretical analysis demonstrating that the excess risk scales with this subspace distance. This distance is easy to compute and robust to modifications in the setting. Empirical evaluation shows that trained model accuracy is monotone in this distance.
不同私人的随机梯度梯度梯度梯度私有化模式培训,在每迭代中注入噪音,使噪音数量随着模型参数的增加而增加。最近的工作表明,我们可以通过利用公共数据进行私人机器学习,将梯度投射到公共数据指定的子空间上,从而减少噪音。然而,考虑到公共数据集的选择,并不清楚哪些数据最适合私人任务。我们给出了一种算法,通过测量公共与私人数字梯度之间的低维次空间距离来选择公共数据集。我们提供了理论分析,表明这种子空间距离的超重风险尺度。这种距离很容易计算,而且对于环境的修改也比较有力。经验评估显示,经过培训的模型精度是这一距离的单体。
Article 179
Title@2025-07-23 (3): On the Energy Distribution of the Galactic Center Excess’ Sources
Title: On the Energy Distribution of the Galactic Center Excess’ Sources | Zur Energieverteilung der Quellen des Galaktischen Zentrums | 银河中心能源分配问题 2507.17804v1 |
Authors (5): Florian List, Yujin Park, Nicholas L. Rodd, Eve Schoen, Florian Wolf
The Galactic Center Excess (GCE) remains one of the defining mysteries uncovered by the Fermi $\gamma$-ray Space Telescope. Although it may yet herald the discovery of annihilating dark matter, weighing against that conclusion are analyses showing the spatial structure of the emission appears more consistent with a population of dim point sources. Technical limitations have restricted prior analyses to studying the point-source hypothesis purely spatially. All spectral information that could help disentangle the GCE from the complex and uncertain astrophysical emission was discarded. We demonstrate that a neural network-aided simulation-based inference approach can overcome such limitations and thereby confront the point source explanation of the GCE with spatial and spectral data. The addition is profound: energy information drives the putative point sources to be significantly dimmer, indicating either the GCE is truly diffuse in nature or made of an exceptionally large number of sources. Quantitatively, for our best fit background model, the excess is essentially consistent with Poisson emission as predicted by dark matter. If the excess is instead due to point sources, our median prediction is ${\cal O}(10^5)$ sources in the Galactic Center, or more than 35,000 sources at 90% confidence, both significantly larger than the hundreds of sources preferred by earlier point-source analyses of the GCE.
银河中心超量(GCE)仍然是Fermi $\gamma$-射入空间望远镜所发现的决定性谜题之一。虽然它可能预示着暗物质被发现,但根据这一结论,其分析表明,排放的空间结构似乎更符合暗点源群。技术限制将先前的分析限制在纯空间研究点源假设上。所有有助于将GCE与复杂和不确定的天体物理排放分离的光谱信息都被丢弃。我们证明,由网络辅助的神经网络模拟推断法可以克服这些限制,从而用空间和光谱数据面对GCE的点源解释。增加的内容是:能源信息将假定的点源大大模糊起来,表明GCE在性质上确实扩散,或者由非常大量的源组成。对于我们最合适的背景模型来说,超量与暗物质预测的Poisson排放基本一致。如果超量是点源,则我们的中位预测值来源是美元(10°5),从而用空间和光谱数据来对抗GCE的点解释。增加点源在银河中心前的百至更深层的源源中,或更深层源的GC355。
Article 180
Title@2025-07-23 (3): Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility
Title: Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility | Große Lernraten gleichzeitig Robustheit zu sauberen Korrelationen und Kompressibilität erreichen | 高学习率同时实现对净腐蚀和抑制的强力 2507.17748v1 |
Authors (4): Melih Barsbey, Lucas Prieto, Stefanos Zafeiriou, Tolga Birdal
Robustness and resource-efficiency are two highly desirable properties for modern machine learning models. However, achieving them jointly remains a challenge. In this paper, we position high learning rates as a facilitator for simultaneously achieving robustness to spurious correlations and network compressibility. We demonstrate that large learning rates also produce desirable representation properties such as invariant feature utilization, class separation, and activation sparsity. Importantly, our findings indicate that large learning rates compare favorably to other hyperparameters and regularization methods in consistently satisfying these properties in tandem. In addition to demonstrating the positive effect of large learning rates across diverse spurious correlation datasets, models, and optimizers, we also present strong evidence that the previously documented success of large learning rates in standard classification tasks is likely due to their effect on addressing hidden/rare spurious correlations in the training dataset.
强力和资源效率是现代机器学习模式的两个非常可取的特性。然而,共同实现这些特性仍是一个挑战。在本文件中,我们把高学习率定位为同时实现对虚假关联和网络压缩的稳健性的一个促进因素。我们证明,高学习率还产生一些可取的代表性特性,如无差异特征的利用、阶级分离和激活宽度。重要的是,我们的调查结果表明,高学习率与其他超常参数和正规化方法相比,能够同时一致满足这些特性。除了表明各种假相相关数据集、模型和优化者的高学习率的积极影响外,我们还提出有力证据,证明以往记录的标准分类任务中高学习率的成功可能会对解决培训数据集中隐藏/错误的相互关系产生影响。
Article 181
Title@2025-07-23 (3): Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
Title: Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains | Rubriken als Belohnungen: Verstärktes Lernen jenseits überprüfbarer Domänen | ” 奖励 “ :超越可核实域域的强化学习 2507.17746v1 |
Authors (6): Anisha Gunjal, Anthony Wang, Elaine Lau, Vaskar Nath, Bing Liu, Sean Hendryx
Extending Reinforcement Learning with Verifiable Rewards (RLVR) to real-world tasks often requires balancing objective and subjective evaluation criteria. However, many such tasks lack a single, unambiguous ground truth, making it difficult to define reliable reward signals for post-training language models. While traditional preference-based methods offer a workaround, they rely on opaque reward functions that are difficult to interpret and prone to spurious correlations. We introduce Rubrics as Rewards (RaR), a framework that uses structured, checklist-style rubrics as interpretable reward signals for on-policy training with GRPO. Our best RaR method yields up to a 28% relative improvement on HealthBench-1k compared to simple Likert-based approaches, while matching or surpassing the performance of reward signals derived from expert-written references. By treating rubrics as structured reward signals, we show that RaR enables smaller-scale judge models to better align with human preferences and sustain robust performance across model scales.
将强化学习与可验证的奖励(RLVR)扩大到现实世界的任务往往需要平衡客观和主观的评价标准。然而,许多这类任务缺乏单一、明确的实地真相,难以为培训后语言模式确定可靠的奖赏信号。传统的优惠制方法提供了一种变通办法,但它们依赖不透明的奖赏功能,难以解释,容易产生虚假的关联。我们引入了$textbf{rubrics作为奖赏$(RAR),这个框架使用结构化的、清单式的标志作为与GROP进行政策培训的可解释的奖赏信号。我们的最佳奖赏方法在健康Bench-1k上比简单的类似奖赏方法取得28美元相对的改善,同时匹配或超过从专家编写的参考资料中获得的奖赏信号的性能。我们通过将奖赏作为结构化奖赏信号来对待,我们表明拉R使规模较小的法官模型能够更好地与人类的偏好并保持各种模式的强性能。
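One way to read "checklist-style rubrics as interpretable reward signals" is as a weighted fraction of satisfied checklist items. The sketch below assumes that aggregation. The `RubricItem` type, the weights, and the keyword check standing in for a judge model are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricItem:
    description: str  # human-readable criterion
    weight: float     # importance of the criterion (assumed to be given)
    keyword: str      # stand-in for a judge-model verdict on this criterion


def rubric_reward(response, rubric):
    """Return the weighted fraction of rubric items the response satisfies."""
    total = sum(item.weight for item in rubric)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    text = response.lower()
    # A real system would ask a judge model; a keyword match keeps this runnable.
    earned = sum(item.weight for item in rubric if item.keyword.lower() in text)
    return earned / total


rubric = [
    RubricItem("Advises seeing a clinician", 3.0, "doctor"),
    RubricItem("Mentions dosage limits", 1.0, "dose"),
    RubricItem("Avoids a definitive diagnosis", 1.0, "may"),
]
reward = rubric_reward("You may want to see a doctor soon.", rubric)
```

Because each item contributes a visible share of the score, a low reward can be traced back to the specific criteria that were missed.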
Article 182
Title@2025-07-23 (3): SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars
Title: SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars | SpecCLIP: Richten und Übersetzen spektroskopischer Messungen für Sterne | spectCLIP: 恒星光谱测量的对齐和转换 2507.01939v2 |
Authors (9): Xiaosheng Zhao, Yang Huang, Guirong Xue, Xiao Kong, Jifeng Liu, Xiaoyu Tang, Timothy C. Beers, Yuan-Sen Ting, A-Li Luo
In recent years, large language models (LLMs) have transformed natural language understanding through vast datasets and large-scale parameterization. Inspired by this success, we present SpecCLIP, a foundation model framework that extends LLM-inspired methodologies to stellar spectral analysis. Stellar spectra, akin to structured language, encode rich physical and chemical information about stars. By training foundation models on large-scale spectral datasets, our goal is to learn robust and informative embeddings that support diverse downstream applications. As a proof of concept, SpecCLIP involves pre-training on two spectral types (LAMOST low-resolution and Gaia XP), followed by contrastive alignment using the CLIP (Contrastive Language-Image Pre-training) framework, adapted to associate spectra from different instruments. This alignment is complemented by auxiliary decoders that preserve spectrum-specific information and enable translation (prediction) between spectral types, with the former achieved by maximizing mutual information between embeddings and input spectra. The result is a cross-spectrum framework enabling intrinsic calibration and flexible applications across instruments. We demonstrate that fine-tuning these models on moderate-sized labeled datasets improves adaptability to tasks such as stellar-parameter estimation and chemical-abundance determination. SpecCLIP also enhances the accuracy and precision of parameter estimates benchmarked against external survey data. Additionally, its similarity search and cross-spectrum prediction capabilities offer potential for anomaly detection. Our results suggest that contrastively trained foundation models enriched with spectrum-aware decoders can advance precision stellar spectroscopy.
近年来,大型语言模型(LLMS)通过庞大的数据集和大规模参数化改变了自然语言理解。受这一成功启发,我们展示了SpecCLIP,这是一个基础模型框架,将受LLM启发的方法扩展至星光分析。Starr光谱与结构化语言相似,对关于恒星的丰富物理和化学信息进行编码。通过对大型光谱数据集的基础模型进行培训,我们的目标是学习支持不同下游应用的强大和知情的嵌入。作为概念的证明,SpecCLIP涉及两个光谱类型LAMOST低分辨率和Gaia XP的预先培训,其基础框架是利用CLIP(Contrastem Stateal-Imaage Stregy)框架将LLLLLMSFSF(CR)推导出对比性调整方法,使其适应于不同仪器的相联光谱化光谱化数据。这种校准由辅助的解算器进行补充,使光谱型类型之间能够通过存储和输入光谱光谱谱谱谱化的相互信息。结果显示我们精度精确的精确的精确的精确的精确精确精确度估算基础基础基础基础基础基础,并显示其内部校正和弹性的精确度评估工具,使这些精确度的校正的校正的校正。我们的校正的校正的校正和弹性的校正的校正。
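The contrastive alignment step can be sketched with the standard CLIP-style symmetric cross-entropy over a similarity matrix. The version below is a small pure-Python illustration on toy vectors. The function names, the temperature value, and the two-dimensional "embeddings" are assumptions and say nothing about SpecCLIP's actual architecture or hyperparameters.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cross_entropy(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[target]


def contrastive_loss(emb_a, emb_b, temperature=0.1):
    """Symmetric contrastive loss; row i of emb_a is paired with row i of emb_b."""
    a = [_normalize(v) for v in emb_a]
    b = [_normalize(v) for v in emb_b]
    n = len(a)
    # Cosine similarities scaled by the temperature.
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    loss_a_to_b = sum(_cross_entropy(logits[i], i) for i in range(n)) / n
    loss_b_to_a = sum(
        _cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (loss_a_to_b + loss_b_to_a)


# Toy embeddings of the same two stars seen by two instruments.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 3.0], [2.0, 0.0]])
```

The loss is small when matching spectra point in the same direction and large when the pairing is scrambled, which is the signal that pulls the two embedding spaces together.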
Article 183
Title@2025-07-23 (3): Flow Matching Meets Biology and Life Science: A Survey
Title: Flow Matching Meets Biology and Life Science: A Survey | Flow Matching trifft auf Biologie und Life Science: Eine Umfrage | 流动匹配满足生物学和生命科学:调查 2507.17731v1 |
Authors (12): Zihao Li, Zhichen Zeng, Xiao Lin, Feihao Fang, Yanru Qu, Zhe Xu, Zhining Liu, Xuying Ning, Tianxin Wei, Ge Liu, Hanghang Tong, Jingrui He
Over the past decade, advances in generative modeling, such as generative adversarial networks, masked autoencoders, and diffusion models, have significantly transformed biological research and discovery, enabling breakthroughs in molecule design, protein generation, drug discovery, and beyond. At the same time, biological applications have served as valuable testbeds for evaluating the capabilities of generative models. Recently, flow matching has emerged as a powerful and efficient alternative to diffusion-based generative modeling, with growing interest in its application to problems in biology and life sciences. This paper presents the first comprehensive survey of recent developments in flow matching and its applications in biological domains. We begin by systematically reviewing the foundations and variants of flow matching, and then categorize its applications into three major areas: biological sequence modeling, molecule generation and design, and peptide and protein generation. For each, we provide an in-depth review of recent progress. We also summarize commonly used datasets and software tools, and conclude with a discussion of potential future directions. The corresponding curated resources are available at https://github.com/Violet24K/Awesome-Flow-Matching-Meets-Biology.
过去十年来,基因对抗网络、蒙面自动编码器和传播模型等基因模型的进展,大大改变了生物研究和发现,使分子设计、蛋白质生成、药物发现等方面的突破得以实现。与此同时,生物应用成为评价基因模型能力的宝贵测试台。最近,流动匹配作为基于传播的基因模型的一个强大而有效的替代方法,日益关注其应用于生物学和生命科学方面的问题。本文介绍了对流动匹配及其在生物领域应用的最新发展动态的第一次全面调查。我们首先系统地审查流量匹配的基础和变体,然后将其应用分类为三个主要领域:生物序列模型、分子生成和设计、以及生化和蛋白质生成。我们每个领域都对最近的进展进行深入审查。我们还总结了常用的数据集和软件工具,最后讨论了潜在的未来方向。相应的调整资源见https://github.com/Violet24K/Aweomen-Flow-MatchMetres-Mets-Bistrialogy。
Article 184
Title@2025-07-23 (3): Deep Generative Learning of Magnetic Frustration in Artificial Spin Ice from Magnetic Force Microscopy Images
Title: Deep Generative Learning of Magnetic Frustration in Artificial Spin Ice from Magnetic Force Microscopy Images | Tiefes generatives Lernen der magnetischen Frustration im künstlichen Spin-Eis von magnetischen Kraftmikroskopie-Bildern | 从磁力显微镜像图像中深入学习人造脊柱冰中的磁破碎 2507.17726v1 |
Authors (8): Arnab Neogi, Suryakant Mishra, Prasad P Iyer, Tzu-Ming Lu, Ezra Bussmann, Sergei Tretiak, Andrew Crandall Jones, Jian-Xin Zhu
Increasingly large datasets of microscopic images with atomic resolution facilitate the development of machine learning methods to identify and analyze subtle physical phenomena embedded within the images. In this work, microscopic images of honeycomb lattice spin-ice samples serve as datasets from which we automate the calculation of net magnetic moments and directional orientations of spin-ice configurations. In the first stage of our workflow, machine learning models are trained to accurately predict magnetic moments and directions within spin-ice structures. Variational Autoencoders (VAEs), an emergent unsupervised deep learning technique, are employed to generate high-quality synthetic magnetic force microscopy (MFM) images and extract latent feature representations, thereby reducing experimental and segmentation errors. The second stage of the proposed methodology enables precise identification and prediction of frustrated vertices and nanomagnetic segments, effectively correlating structural and functional aspects of microscopic images. This facilitates the design of optimized spin-ice configurations with controlled frustration patterns, enabling potential on-demand synthesis.
越来越多的具有原子分辨率的微显微图像数据集有助于开发机器学习方法,以识别和分析图像中隐含的微妙物理现象。在这项工作中,蜂窝的微显微图像作为数据集,我们从中自动计算净磁时点和旋冰配置的方向方向。在我们工作流程的第一阶段,对机器学习模型进行了培训,以准确预测脊柱结构内的磁时点和方向。挥发式自动镜(VAE)是一种新兴的不受监督的深层学习技术,用于产生高质量的合成磁力显微镜(MFM)图像并提取潜在特征显示,从而减少实验和分化错误。拟议方法的第二阶段使得能够精确地识别和预测挫败的脊椎和纳米磁层,有效地将微粒图像的结构和功能方面联系起来。这有利于设计带有受控的挫折模式的优化的旋冰配置,从而有可能进行需要的合成。
Article 185
Title@2025-07-23 (3): On the Interaction of Compressibility and Adversarial Robustness
Title: On the Interaction of Compressibility and Adversarial Robustness | Über die Wechselwirkung von Kompressibilität und adversarialer Robustheit | 压缩和反压力相互作用问题 2507.17725v1 |
Authors (4): Melih Barsbey, Antônio H. Ribeiro, Umut Şimşekli, Tolga Birdal
Modern neural networks are expected to simultaneously satisfy a host of desirable properties: accurate fitting to training data, generalization to unseen inputs, parameter and computational efficiency, and robustness to adversarial perturbations. While compressibility and robustness have each been studied extensively, a unified understanding of their interaction still remains elusive. In this work, we develop a principled framework to analyze how different forms of compressibility - such as neuron-level sparsity and spectral compressibility - affect adversarial robustness. We show that these forms of compression can induce a small number of highly sensitive directions in the representation space, which adversaries can exploit to construct effective perturbations. Our analysis yields a simple yet instructive robustness bound, revealing how neuron and spectral compressibility impact $L_\infty$ and $L_2$ robustness via their effects on the learned representations. Crucially, the vulnerabilities we identify arise irrespective of how compression is achieved - whether via regularization, architectural bias, or implicit learning dynamics. Through empirical evaluations across synthetic and realistic tasks, we confirm our theoretical predictions, and further demonstrate that these vulnerabilities persist under adversarial training and transfer learning, and contribute to the emergence of universal adversarial perturbations. Our findings show a fundamental tension between structured compressibility and robustness, and suggest new pathways for designing models that are both efficient and secure.
现代神经网络预计将同时满足一系列可取的特性:精确地适应培训数据,普及到看不见的投入、参数和计算效率,以及稳健地适应对抗性扰动。虽然对每个系统都进行了广泛的压缩和稳健性研究,但对其相互作用的统一理解仍然难以实现。在这项工作中,我们制定原则框架,分析不同形式的压缩(如神经水平的宽度和光谱压缩)如何影响对抗性的稳健性。我们表明,这些压缩形式可以在代表空间中产生少量高度敏感的方向,对手可以利用这些高度敏感的方向来建立有效的扰动。我们的分析产生了简单但具有启发性的稳健性约束,通过对所学表现的影响,揭示神经和光谱压缩对美元和美元之间的稳健性影响。关键是,无论通过正规化、建筑偏差或隐含的学习动态,我们所查明的脆弱性都会产生。通过对合成和现实性任务的实证性评估,我们证实了我们的理论预测,并进一步证明这些脆弱性在对抗性和对抗性培训中持续存在,并且通过对立性分析性分析,表明我们的结构性的研究和对立性分析结果显示,并表明,这种紧张性模式之间的普遍地展示。
Article 186
Title@2025-07-23 (3): Towards Generalist Robot Learning from Internet Video: A Survey
Title: Towards Generalist Robot Learning from Internet Video: A Survey | Auf dem Weg zum generalistischen Roboter Lernen aus dem Internet Video: Eine Umfrage | 从互联网视频学习:调查 2404.19664v5 |
Authors (8): Robert McCarthy, Daniel C. H. Tan, Dominik Schmidt, Fernando Acero, Nathan Herr, Yilun Du, Thomas G. Thuruthel, Zhibin Li
Scaling deep learning to massive and diverse internet data has driven remarkable breakthroughs in domains such as video generation and natural language processing. Robot learning, however, has thus far failed to replicate this success and remains constrained by a scarcity of available data. Learning from videos (LfV) methods aim to address this data bottleneck by augmenting traditional robot data with large-scale internet video. This video data provides foundational information regarding physical dynamics, behaviours, and tasks, and can be highly informative for general-purpose robots. This survey systematically examines the emerging field of LfV. We first outline essential concepts, including detailing fundamental LfV challenges such as distribution shift and missing action labels in video data. Next, we comprehensively review current methods for extracting knowledge from large-scale internet video, overcoming LfV challenges, and improving robot learning through video-informed training. The survey concludes with a critical discussion of future opportunities. Here, we emphasize the need for scalable foundation model approaches that can leverage the full range of available internet video and enhance the learning of robot policies and dynamics models. Overall, the survey aims to inform and catalyse future LfV research, driving progress towards general-purpose robots.
深入学习大规模和多样化的互联网数据,在视频生成和自然语言处理等领域取得了显著突破。然而,机器人学习迄今未能复制这一成功,并且仍然受到现有数据稀缺的限制。从视频(LfV)中学习的方法旨在通过大规模互联网视频来增加传统机器人数据,从而解决数据瓶颈问题。这种视频数据提供有关物理动态、行为和任务的基础信息,并且可以为通用机器人提供大量信息。这项调查系统地审查了LfV的新兴领域。我们首先概述了基本概念,包括详细介绍LfV的基本挑战,如视频数据中的分销转换和缺失动作标签。接下来,我们全面审查目前从大型互联网视频中提取知识的方法,克服LfV的挑战,并通过视频知情培训改进机器人学习。调查最后对未来的机会进行了重要的讨论。在这里,我们强调需要有可扩展的基础模型方法,能够利用现有的全方位互联网视频,加强机器人政策和动态模型的学习。总体而言,调查的目的是为未来LfV的研究提供信息和催化未来LfV的研究,推动通俗用途机器人的进步。
Article 187
Title@2025-07-23 (3): Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning
Title: Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning | Flow-based Single-Step-Abschluss für effizientes und expressives politisches Lernen | 以流动为基础的单一步骤完成高效和明确政策学习 2506.21427v2 |
Authors (2): Prajwal Koirala, Cody Fleming
Generative models such as diffusion and flow-matching offer expressive policies for offline reinforcement learning (RL) by capturing rich, multimodal action distributions, but their iterative sampling introduces high inference costs and training instability due to gradient propagation across sampling steps. We propose the Single-Step Completion Policy (SSCP), a generative policy trained with an augmented flow-matching objective to predict direct completion vectors from intermediate flow samples, enabling accurate, one-shot action generation. In an off-policy actor-critic framework, SSCP combines the expressiveness of generative models with the training and inference efficiency of unimodal policies, without requiring long backpropagation chains. Our method scales effectively to offline, offline-to-online, and online RL settings, offering substantial gains in speed and adaptability over diffusion-based baselines. We further extend SSCP to goal-conditioned RL, enabling flat policies to exploit subgoal structures without explicit hierarchical inference. SSCP achieves strong results across standard offline RL and behavior cloning benchmarks, positioning it as a versatile, expressive, and efficient framework for deep RL and sequential decision-making.
推广和流程匹配等生成模型,通过捕捉丰富的多式联运行动分布,为离线强化学习提供表达式政策(RL),通过捕获富集的多式联运行动分布,但其迭代抽样采样带来了高推价和培训不稳定性,因为跨采样步骤的梯度传播。我们提议了\textit{Sing-Single-Step Forpulation Policy}(SSCP),这是经过强化流程匹配目标培训的基因化政策,以预测中间流样本的直接完成矢量,从而能够准确、一分球行动生成。在一个离政策性行为者-批评框架内,SSCP将基因化模型的表达性与单式政策的培训性和推论效率结合起来,而无需长长的后向推进链。我们的方法尺度可以有效地实现离线、离线到离线到在线和在线的RL环境,在对基于扩散的基线的速度和适应性方面带来巨大收益。我们进一步扩展了SSCP,使平板政策能够在没有明确的等级推论的情况下利用子目标目标性目标性目标性矢量结构。SSCP在标准离离线下和行为克隆基准之间取得了强有力的结果,将它定位定位为一个可操作性、连续和深度决策。
Article 188
Title@2025-07-23 (3): Challenges learning from imbalanced data using tree-based models: Prevalence estimates systematically depend on hyperparameters and can be upwardly biased
Title: Challenges learning from imbalanced data using tree-based models: Prevalence estimates systematically depend on hyperparameters and can be upwardly biased | Herausforderungen beim Lernen aus unausgeglichenen Daten mit Baum-basierten Modellen: Prävalenzschätzungen hängen systematisch von Hyperparametern ab und können nach oben verzerrt sein | 利用树基模型从不平衡数据中吸取挑战:流行率估计数系统依赖超参数,可能向上偏偏 2412.16209v3 |
Authors (3): Nathan Phelps, Daniel J. Lizotte, Douglas G. Woolford
Imbalanced binary classification problems arise in many fields of study. When using machine learning models for these problems, it is common to subsample the majority class (i.e., undersampling) to create a (more) balanced dataset for model training. This biases the model’s predictions because the model learns from a dataset that does not follow the same data generating process as new data. One way of accounting for this bias is to analytically map the resulting predictions to new values based on the sampling rate for the majority class, which was used to create the training dataset. While this approach may work well for some machine learning models, we show that calibrating a random forest this way has unintended negative consequences, including prevalence estimates that can be upwardly biased. These prevalence estimates depend on both i) the number of predictors considered at each split in the random forest; and ii) the sampling rate used. We explain the former using known properties of random forests and analytical calibration. However, in investigating the latter issue, we made a surprising discovery - contrary to the widespread belief that decision trees are biased towards the majority class, they actually can be biased towards the minority class.
在许多研究领域出现了平衡的二进制分类问题。 当使用机器学习模型来对这些问题进行分类时,通常会分解多数阶层(即低抽样),为模型培训建立一个(更多)平衡的数据集。这偏向模型的预测,因为模型从一个数据集中学习,而该数据集并不遵循与新数据相同的数据生成过程。这种偏差的一个计算方法就是根据多数阶层的抽样率对预测结果进行新的值分析图解,而这种抽样率被用来创建培训数据集。虽然这种方法对某些机器学习模型可能效果良好,但我们发现随机调整森林会产生意想不到的负面后果,包括流行性估计可能向上偏差。这些流行率估计取决于在随机森林中每一分割的预测数;以及所使用的抽样率。我们用随机森林和分析校准的已知特性来解释前者。然而,在调查后一问题时,我们发现一个惊人的发现,与普遍认为决策树对多数阶层有偏向相反,它们实际上可能偏向少数阶层。
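The analytic mapping mentioned in this abstract can be sketched as follows, assuming the majority class is the negative class and each majority example is kept with probability `beta`. Undersampling then divides the negative count by roughly 1/`beta`, so the true odds equal `beta` times the odds learned on the balanced data. The function name and the example values are illustrative, not taken from the paper.

```python
def correct_for_undersampling(p_sampled, beta):
    """Map a probability learned on undersampled data back to the original scale.

    p_sampled -- predicted probability of the minority class from the model
    beta      -- fraction of majority-class examples kept during undersampling
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    if not 0.0 <= p_sampled <= 1.0:
        raise ValueError("p_sampled must lie in [0, 1]")
    # Equivalent to odds_true = beta * odds_sampled, rewritten for probabilities.
    return beta * p_sampled / (beta * p_sampled - p_sampled + 1.0)


# Predictions from a model trained after keeping 10% of the majority class.
raw_predictions = [0.1, 0.5, 0.9]
corrected = [correct_for_undersampling(p, beta=0.1) for p in raw_predictions]
prevalence_estimate = sum(corrected) / len(corrected)
```

Averaging the corrected predictions gives a prevalence estimate. The abstract's point is that for random forests this estimate can still be biased upward, so the correction should not be assumed to be sufficient on its own.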
Article 189
Title@2025-07-23 (3): Sequential Bayesian Design for Efficient Surrogate Construction in the Inversion of Darcy Flows
Title: Sequential Bayesian Design for Efficient Surrogate Construction in the Inversion of Darcy Flows | Sequential Bayesian Design für effiziente Surrogate Konstruktion in der Inversion von Darcy Flows | 有效代用品建造以扭转达西流动的按顺序排列的贝耶斯设计 2507.17713v1 |
Authors (4): Hongji Wang, Hongqiao Wang, Jinyong Ying, Qingping Zhou
Inverse problems governed by partial differential equations (PDEs) play a crucial role in various fields, including computational science, image processing, and engineering. In particular, the Darcy flow equation is a fundamental equation in fluid mechanics and is central to understanding fluid flow through porous media. Bayesian methods provide an effective approach for solving PDE inverse problems, but their numerical implementation requires numerous evaluations of computationally expensive forward solvers. Therefore, the adoption of surrogate models with lower computational costs is essential. However, constructing a globally accurate surrogate model for high-dimensional complex problems demands high model capacity and large amounts of data. To address this challenge, this study proposes an efficient locally accurate surrogate that focuses on the high-probability regions of the true likelihood in inverse problems, with relatively low model complexity and few training data requirements. Additionally, we introduce a sequential Bayesian design strategy to acquire the proposed surrogate, since the high-probability region of the likelihood is unknown. The strategy treats the posterior evolution process of sequential Bayesian design as a Gaussian process, enabling algorithmic acceleration through a one-step-ahead prior. The complete algorithmic framework is referred to as Sequential Bayesian Design for Locally Accurate Surrogate (SBD-LAS). Finally, three experiments based on the Darcy flow equation demonstrate the advantages of the proposed method in terms of both inversion accuracy and computational speed.
由部分差异方程式(PDEs)所支配的反面问题在各个领域,包括计算科学、图像处理和工程等领域都起着关键的作用。特别是,达西流程方程式是流体力学中一个根本的方程式,在通过多孔媒体了解流体流动方面发挥着关键作用。拜耳斯方法为解决PDE逆向问题提供了有效的方法,而其数字实施则要求对计算成本昂贵的远方求解器进行大量评估。因此,采用计算成本较低的代金模型至关重要。然而,为高维复杂问题构建一个全球准确的代金模型需要高模型容量和大量数据。为了应对这一挑战,本研究提出了一个高效的当地准确代金化代金,侧重于反向问题真实可能性高的概率区域,而模型复杂性相对较低,培训数据要求很少。此外,我们推出一个连续的Bayesian设计策略以获得拟议的代金质代金,因为这一可能性高概率区域是未知的。该战略将Bayesian设计的后端演进化过程视为一个高频进程,能够通过一阶速度加速进行当地SBSBSBSB的升级方法。 之前的完整地算法测试框架,其最后显示Sqrial-qal-qal-sal-sal-sal-sal 方向的升级方法的升级方法是先先向前先显示SBSBSBSBSBSBSBSBSBSBSBSB的精确性的方法。
Article 190
Title@2025-07-23 (3): The Impact of Feature Scaling In Machine Learning: Effects on Regression and Classification Tasks
Title: The Impact of Feature Scaling In Machine Learning: Effects on Regression and Classification Tasks | Die Auswirkungen von Feature Scaling im maschinellen Lernen: Auswirkungen auf Regressions- und Klassifizierungsaufgaben | 机械学习中的特质增强效果:对倒退和分类任务的影响 2506.08274v3 |
Authors (8): João Manoel Herrera Pinheiro, Suzana Vilas Boas de Oliveira, Thiago Henrique Segreto Silva, Pedro Antonio Rabelo Saraiva, Enzo Ferreira de Souza, Ricardo V. Godoy, Leonardo André Ambrosio, Marcelo Becker
This research addresses the critical lack of comprehensive studies on feature scaling by systematically evaluating 12 scaling techniques - including several less common transformations - across 14 different Machine Learning algorithms and 16 datasets for classification and regression tasks. We meticulously analyzed impacts on predictive performance (using metrics such as accuracy, MAE, MSE, and $R^2$) and computational costs (training time, inference time, and memory usage). Key findings reveal that while ensemble methods (such as Random Forest and gradient boosting models like XGBoost, CatBoost and LightGBM) demonstrate robust performance largely independent of scaling, other widely used models such as Logistic Regression, SVMs, TabNet, and MLPs show significant performance variations highly dependent on the chosen scaler. This extensive empirical analysis, with all source code, experimental results, and model parameters made publicly available to ensure complete transparency and reproducibility, offers model-specific crucial guidance to practitioners on the need for an optimal selection of feature scaling techniques.
这项研究通过系统评估14种不同的机器学习算法和16个数据集用于分类和回归任务的12种缩放技术(包括若干较不常见的变换)和16个数据集,解决了严重缺乏关于特征缩放的全面研究的问题。我们仔细分析了对预测性业绩的影响(使用精确度、MAE、MSE和$R%2美元等指标)和计算成本(培训时间、推算时间和记忆使用)和计算成本的影响(培训时间、试验结果和记忆使用)。主要调查结果显示,尽管混合方法(如随机森林和梯度加速模型,如XGBoost、CatBoost和LightGBM)显示强健的性能基本上独立于缩放,但其他广泛使用的模式(如物流递增、SVMS、TabNet和MLPs)显示,显著的性能差异在很大程度上取决于选定的缩放尺度。这一广泛的实证分析提供了所有源代码、实验结果和模型参数,以确保完全透明和可复制性,向从业人员提供关于最佳选择特征缩放技术需要的模型的关键指导。
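To make the notion of "feature scaling" in the abstract above concrete, here is a minimal sketch of two common scalers, z-score standardization and min-max scaling, written with the Python standard library only. The function names and the toy `incomes` feature are illustrative assumptions; the abstract does not list which 12 techniques the study evaluates, and this is not the authors' code.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_scale(values, lo=0.0, hi=1.0):
    """Min-max scaling: map the observed range linearly onto [lo, hi]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [lo for _ in values]
    span = v_max - v_min
    return [lo + (v - v_min) * (hi - lo) / span for v in values]


# Hypothetical feature with a large raw magnitude.
incomes = [20_000.0, 40_000.0, 60_000.0, 80_000.0]
z = standardize(incomes)    # zero mean, unit variance
mm = min_max_scale(incomes)  # values in [0, 1]
```

Tree ensembles split on thresholds, so a monotone rescaling like either of these leaves their splits essentially unchanged, which is consistent with the robustness the abstract reports; distance- and gradient-based models see very different geometry before and after scaling.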
Article 191
Title@2025-07-23 (3): Diffusion Factor Models: Generating High-Dimensional Returns with Factor Structure
Title: Diffusion Factor Models: Generating High-Dimensional Returns with Factor Structure | Diffusionsfaktormodelle: Erzeugen von hochdimensionalen Rückgaben mit Faktorstruktur | 传播因数模型:产生具有因数结构的高差异返回 2504.06566v4 |
Authors (4): Minshuo Chen, Renyuan Xu, Yumin Xu, Ruixun Zhang
Financial scenario simulation is essential for risk management and portfolio optimization, yet it remains challenging, especially in the high-dimensional and small data settings common in finance. We propose a diffusion factor model that integrates latent factor structure into generative diffusion processes, bridging econometrics with modern generative AI to address the challenges of the curse of dimensionality and data scarcity in financial simulation. By exploiting the low-dimensional factor structure inherent in asset returns, we decompose the score function, a key component in diffusion models, using time-varying orthogonal projections, and this decomposition is incorporated into the design of neural network architectures. We derive rigorous statistical guarantees, establishing nonasymptotic error bounds for both score estimation at $O(d^{5/2} n^{-2/(k+5)})$ and generated distribution at $O(d^{5/4} n^{-1/2(k+5)})$, primarily driven by the intrinsic factor dimension $k$ rather than the number of assets $d$, surpassing the dimension-dependent limits in the classical nonparametric statistics literature and making the framework viable for markets with thousands of assets. Numerical studies confirm superior performance in latent subspace recovery under small data regimes. Empirical analysis demonstrates the economic significance of our framework in constructing mean-variance optimal portfolios and factor portfolios. This work presents the first theoretical integration of factor structure with diffusion models, offering a principled approach for high-dimensional financial simulation with limited data. Our code is available at https://github.com/xymmmm00/diffusion_factor_model.
金融假设情景模拟对于风险管理和组合优化至关重要,但特别是在金融中常见的高维和小型数据设置中,它仍然具有挑战性。我们提出一个扩散要素模型,将潜伏因素结构纳入基因扩散过程,将计量经济学与现代基因大赦国际连接起来,以应对在金融模拟中存在的维度诅咒和数据稀缺的挑战。我们利用资产回报中固有的低维因素结构,将分数函数-关键组成部分分解成传播模型中使用时间变化或直线预测的关键组成部分,这种分数已被纳入神经网络结构的设计中。我们提出了严格的统计保证,为O(d5/2}n-2(k+5})的得分估算设定了非否定性错误界限,并在O(d5/4}n1/2(k+5})中生成了分布。我们主要受内在因素因素层面K而不是资产数量驱动的分数分分分分解,超过了经典非参数统计文献中的维度限制,并使有数千个资产市场的框架变得可行。Numeralicalalisal-latial 研究证实了我们在历史模型中进行最佳数据回收框架的高级分析。
Article 192
Title@2025-07-23 (3): HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging
Title: HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging | HydraOpt: Navigieren des Effizienz-Leistungs-Austauschs von Adapter-Zusammenschlüssen | Hydjopt: 管理适应器合并的效率-绩效权衡 2507.17706v1 |
Authors (7): Taha Ceritli, Ondrej Bohdal, Mete Ozay, Jijoong Moon, Kyeng-Hun Lee, Hyeonmok Ko, Umberto Michieli
Large language models (LLMs) often leverage adapters, such as low-rank-based adapters, to achieve strong performance on downstream tasks. However, storing a separate adapter for each task significantly increases memory requirements, posing a challenge for resource-constrained environments such as mobile devices. Although model merging techniques can reduce storage costs, they typically result in substantial performance degradation. In this work, we introduce HydraOpt, a new model merging technique that capitalizes on the inherent similarities between the matrices of low-rank adapters. Unlike existing methods that produce a fixed trade-off between storage size and performance, HydraOpt allows us to navigate this spectrum of efficiency and performance. Our experiments show that HydraOpt significantly reduces storage size (48% reduction) compared to storing all adapters, while achieving competitive performance (0.2-1.8% drop). Furthermore, it outperforms existing merging techniques in terms of performance at the same or slightly worse storage efficiency.
大型语言模型(LLMS)常常利用低级适应器等低级适应器等大型适应器,在下游任务上取得强劲的成绩。然而,为每项任务分别储存一个适应器,大大增加了记忆要求,对诸如移动设备等资源受限制的环境构成了挑战。虽然模型合并技术可以降低存储成本,但通常会导致显著的性能退化。在这项工作中,我们引入了一种新型合并技术,即HydraOpt,这是一种利用低级适应器矩阵之间内在相似之处的新模式。与现有的在存储规模和性能之间实现固定平衡的方法不同,HydryOpt允许我们掌握这一效率和性能的范围。我们的实验表明,与储存所有适应器相比,HydraOpt大大降低了存储规模(48%的减少量),同时实现了竞争性性能(0.2-1.8 %的下降 ) 。此外,它比现有的合并技术在同样或稍差的存储效率方面优于现有的合并技术。
Article 193
Title@2025-07-23 (3): Balans: Multi-Armed Bandits-based Adaptive Large Neighborhood Search for Mixed-Integer Programming Problem
Title: Balans: Multi-Armed Bandits-based Adaptive Large Neighborhood Search for Mixed-Integer Programming Problem | Balans: Multi-Armed Bandits-basierte adaptive Großnachbarschaft Suche nach gemischt-integer-Programmierungsproblem | Balans:多武装强盗基于适应性的大型邻里搜索混合内插方案拟订问题 2412.14382v3 |
Authors (3): Junyang Cai, Serdar Kadioglu, Bistra Dilkina
Mixed-integer programming (MIP) is a powerful paradigm for modeling and solving various important combinatorial optimization problems. Recently, learning-based approaches have shown a potential to speed up MIP solving via offline training that then guides important design decisions during the search. However, a significant drawback of these methods is their heavy reliance on offline training, which requires collecting training datasets and computationally costly training epochs yet offers only limited generalization to unseen (larger) instances. In this paper, we propose Balans, an adaptive meta-solver for MIPs with online learning capability that does not require any supervision or a priori training. At its core, Balans is based on adaptive large-neighborhood search, operating on top of an MIP solver by successive applications of destroy and repair neighborhood operators. During the search, the selection among different neighborhood definitions is guided on the fly for the instance at hand via multi-armed bandit algorithms. Our extensive experiments on hard optimization instances show that Balans offers significant performance gains over the default MIP solver, is better than committing to any single best neighborhood, and improves over the state-of-the-art large-neighborhood search for MIPs. Finally, we release Balans as highly configurable, MIP-solver-agnostic, open-source software.
混合内插程序(MIP)是建模和解决各种重要的组合优化问题的强大范例。最近,基于学习的方法显示出通过离线培训加快MIP解决离线培训的潜力,从而在搜索过程中指导重要的设计决定。然而,这些方法的一个重大缺陷是严重依赖离线培训,这需要收集培训数据集和计算成本高昂的培训时代,但仅对看不见(大)案例提供有限的概括化。在本文中,我们提议Balans,这是具有无需任何监督或优先培训的在线学习能力的MIP的适应性元软件。在其核心方面,Balans基于适应性大型邻里搜索,通过连续的摧毁和修理社区运营商应用在MIP解决方案解决方案解决方案解决方案解决方案解决方案解决方案的顶端运作。在搜索过程中,不同街区定义的选择以多种手边际(大)算法为指导。我们在硬性优化实例上的广泛实验显示,Balans为默认的MIP解决方案解决方案解决方案解决方案解决方案解决方案解决方案解决方案提供显著的绩效收益,比致力于任何单一的最佳邻里搜索工具更好,我们最终将高端的搜索系统升级。
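The abstract above describes choosing among neighborhood definitions on the fly with multi-armed bandits. The following is a rough sketch of that idea using a plain epsilon-greedy bandit. It is not the Balans implementation: the class name, the arm labels, the reward model, and the success probabilities are all hypothetical stand-ins, and in a real large-neighborhood search the reward would come from whether a destroy-and-repair step improved the incumbent solution.

```python
import random


class EpsilonGreedyBandit:
    """Epsilon-greedy bandit over a fixed set of arms (hypothetical helper)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}  # running mean reward

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental update of the arm's mean reward.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def toy_reward(arm, rng):
    """Stand-in for 'did this neighborhood improve the incumbent?'."""
    # Assumed success probabilities, chosen only for illustration.
    success_prob = {"mutation": 0.2, "local_branching": 0.5, "rins": 0.8}
    return 1.0 if rng.random() < success_prob[arm] else 0.0


bandit = EpsilonGreedyBandit(["mutation", "local_branching", "rins"], epsilon=0.1, seed=0)
env_rng = random.Random(1)
for _ in range(2000):
    arm = bandit.select()
    bandit.update(arm, toy_reward(arm, env_rng))
```

Because the estimates are learned during the search itself, no offline training data is needed, which is the property the abstract emphasizes. Other bandit policies (for example Thompson sampling or UCB) could be swapped in behind the same `select`/`update` interface.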
Article 194
Title@2025-07-23 (3): A Mathematical Theory of Discursive Networks
Title: A Mathematical Theory of Discursive Networks | Eine mathematische Theorie diskursiver Netzwerke | 讨论网络的数学理论 2507.06565v5 |
Authors (1): Juan B. Gutiérrez
Large language models (LLMs) turn writing into a live exchange between humans and software. We characterize this new medium as a discursive network that treats people and LLMs as equal nodes and tracks how their statements circulate. We define the generation of erroneous information as invalidation (any factual, logical, or structural breach) and show it follows four hazards: drift from truth, self-repair, fresh fabrication, and external detection. We develop a general mathematical model of discursive networks that shows that a network governed only by drift and self-repair stabilizes at a modest error rate. Giving each false claim even a small chance of peer review shifts the system to a truth-dominant state. We operationalize peer review with the open-source Flaws-of-Others (FOO) algorithm: a configurable loop in which any set of agents critique one another while a harmonizer merges their verdicts. We identify an ethical transgression, epithesis, that occurs when humans fail to engage in the discursive network. The takeaway is practical and cultural: reliability in this new medium comes not from perfecting single models but from connecting imperfect ones into networks that enforce mutual accountability.
大型语言模型( LLMs) 将写作变成人类和软件之间的实时交换。 我们把这个新媒体描述为一个不准确的网络, 将人和LLMs视为平等的节点, 并跟踪其声明的传播方式。 我们把错误信息的生成定义为无效( 任何事实、 逻辑或结构性违反) , 并显示它有四种危险: 从真理、 自我修复、 新鲜制造和外部检测中漂移出来。 我们开发了一个迷惑网络的一般数学模型, 显示一个仅受漂移和自我修复制约的网络以微小的错误速度稳定下来。 给每个错误的网络一个很小的同行审查机会, 将系统转换成一个以真理为主的状态 。 我们使用开放源法( FOO) 算法( ) 来进行同行审查: 一个可配置的循环, 任何一组代理人互相批评, 而一个协调者将其判断合并在一起。 我们发现一个道德上的违法现象, 即当人类无法参与不透明网络时会发生。 摘取是实用的和文化的: 新介质介质的介质不是来自完善的单一模式, 而是连接的网络。
Article 195
Title@2025-07-23 (3): Joint Asymmetric Loss for Learning with Noisy Labels
Title: Joint Asymmetric Loss for Learning with Noisy Labels | Gemeinsamer asymmetrischer Lernverlust mit geräuscharmen Etiketten | 与Noisy标签的 联合非对称学习损失 2507.17692v1 |
Authors (7): Jialiang Wang, Xianming Liu, Xiong Zhou, Gangfeng Hu, Deming Zhai, Junjun Jiang, Xiangyang Ji
Learning with noisy labels is a crucial task for training accurate deep neural networks. To mitigate label noise, prior studies have proposed various robust loss functions, particularly symmetric losses. Nevertheless, symmetric losses usually suffer from the underfitting issue due to the overly strict constraint. To address this problem, the Active Passive Loss (APL) jointly optimizes an active and a passive loss to mutually enhance the overall fitting ability. Within APL, symmetric losses have been successfully extended, yielding advanced robust loss functions. Despite these advancements, emerging theoretical analyses indicate that asymmetric losses, a new class of robust loss functions, possess superior properties compared to symmetric losses. However, existing asymmetric losses are not compatible with advanced optimization frameworks such as APL, limiting their potential and applicability. Motivated by this theoretical gap and the prospect of asymmetric losses, we extend the asymmetric loss to the more complex passive loss scenario and propose the Asymmetric Mean Square Error (AMSE), a novel asymmetric loss. We rigorously establish the necessary and sufficient condition under which AMSE satisfies the asymmetric condition. By substituting the traditional symmetric passive loss in APL with our proposed AMSE, we introduce a novel robust loss framework termed Joint Asymmetric Loss (JAL). Extensive experiments demonstrate the effectiveness of our method in mitigating label noise. Code available at: https://github.com/cswjl/joint-asymmetric-loss
为了减少标签噪音,先前的研究提出了各种稳健的损失功能,特别是对称损失。然而,由于过于严格的限制,对称损失通常会因不适当的问题而受到影响。为了解决这一问题,主动被动损失(APL)共同优化了主动和被动损失,以相互增强总体适应能力。在杀伤人员地雷中,对称损失得到成功延长,产生了先进的强力损失功能。尽管取得了这些进展,但新出现的理论分析表明,不对称损失是新型的稳健损失功能,拥有与对称损失相比的更高性能。然而,现有的对称损失通常由于过于严格的限制而存在不适当的问题。为了解决这一问题,主动被动被动损失(APL)联合优化损失(APL) 联合优化框架不兼容性损失(AMS) 和不对称损失的前景(AMSE) 。我们提出了“非对称中性中性中性差” , 并提出了“新式的“偏重性差” 。我们严格地建立了“AMSE” 满足不对称条件的必要和充分条件。通过替代了APLA-PL的传统的对准被动被动损失和“联合标准” 标准框架,我们提出了“ABLASSILAD” 。我们提出的“高压”的“联合标准” 。我们提出的“AMPLALLARSUD” 。我们提出的“UDRDRD” 的“UDRDRDRD” 。
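As background for the Active Passive Loss (APL) structure mentioned in the abstract above, here is a minimal sketch of one commonly cited APL pairing: normalized cross entropy as the active term plus a mean-absolute-error style passive term. This illustrates only the "active + passive" combination; it does not implement AMSE or JAL, whose definitions are not given in the abstract. The weights `alpha` and `beta` and all function names are assumptions.

```python
import math


def normalized_cross_entropy(probs, label):
    """Active term: CE for the given label divided by the sum of CE over all labels."""
    log_probs = [math.log(p) for p in probs]  # requires every prob > 0
    return log_probs[label] / sum(log_probs)


def mean_absolute_error(probs, label):
    """Passive term: 1 - p_label (MAE against a one-hot target, up to a factor of 2)."""
    return 1.0 - probs[label]


def active_passive_loss(probs, label, alpha=1.0, beta=1.0):
    """Weighted sum of an active and a passive loss for one sample."""
    return (alpha * normalized_cross_entropy(probs, label)
            + beta * mean_absolute_error(probs, label))


uniform = [1.0 / 3.0] * 3
confident = [0.98, 0.01, 0.01]
loss_uniform = active_passive_loss(uniform, label=0)
loss_correct = active_passive_loss(confident, label=0)
loss_wrong = active_passive_loss(confident, label=1)
```

In the framework the abstract describes, the passive slot is where the proposed AMSE would replace the symmetric passive loss; the sketch keeps a symmetric passive term purely to show the structure.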
Article 196
Title@2025-07-23 (3): CASCADE: LLM-Powered JavaScript Deobfuscator at Google
Title: CASCADE: LLM-Powered JavaScript Deobfuscator at Google | CASCADE: LLM-Powered JavaScript Deobfuscator bei Google | CASCADE: 谷歌的LLM Powered JavaScript Deobfuscator 2507.17691v1 |
Authors (4): Shan Jiang, Pranoy Kovuri, David Tao, Zhixun Tan
Software obfuscation, particularly prevalent in JavaScript, hinders code comprehension and analysis, posing significant challenges to software testing, static analysis, and malware detection. This paper introduces CASCADE, a novel hybrid approach that integrates the advanced coding capabilities of Gemini with the deterministic transformation capabilities of a compiler Intermediate Representation (IR), specifically JavaScript IR (JSIR). By employing Gemini to identify critical prelude functions, the foundational components underlying the most prevalent obfuscation techniques, and leveraging JSIR for subsequent code transformations, CASCADE effectively recovers semantic elements like original strings and API names, and reveals original program behaviors. This method overcomes limitations of existing static and dynamic deobfuscation techniques, eliminating hundreds to thousands of hardcoded rules while achieving reliability and flexibility. CASCADE is already deployed in Google’s production environment, demonstrating substantial improvements in JavaScript deobfuscation efficiency and reducing reverse engineering efforts.
软件模糊化,特别是在爪哇史克里普特,阻碍代码理解和分析,对软件测试、静态分析和恶意检测构成重大挑战。本文介绍CASCADE,这是一种新型混合方法,将Gemini的先进编码能力与编译器中级代表(IR),特别是JavaScript IR(JSIR)的确定性转化能力相结合。通过使用Gemini来识别关键前端功能,即最流行的模糊化技术的基本组成部分,利用JSIR进行随后的代码转换,CASCADE有效地回收了原始字符串和API名称等语义元素,并揭示了原始程序行为。这种方法克服了现有静态和动态脱色技术的局限性,消除了数百至数千条硬编码规则,同时实现了可靠性和灵活性。CASCADE已经部署在谷歌的生产环境中,展示了JavaScript deobfuscation效率的大幅改进,并减少了逆向工程努力。
Article 197
Title@2025-07-23 (3): In-Trajectory Inverse Reinforcement Learning: Learn Incrementally Before An Ongoing Trajectory Terminates
Title: In-Trajectory Inverse Reinforcement Learning: Learn Incrementally Before An Ongoing Trajectory Terminates | In-Trajektorie Inverse Verstärkung Lernen: Inkrementell lernen, bevor eine laufende Trajektorie endet | 轨迹反反强化学习:在持续轨迹终止之前逐步学习 2410.15612v7 |
Authors (2): Shicheng Liu, Minghui Zhu
Inverse reinforcement learning (IRL) aims to learn a reward function and a corresponding policy that best fit the demonstrated trajectories of an expert. However, current IRL methods cannot learn incrementally from an ongoing trajectory because they must wait until at least one complete trajectory has been collected. To bridge the gap, this paper considers the problem of learning a reward function and a corresponding policy while observing the initial state-action pair of an ongoing trajectory, and continuing to update the learned reward and policy as new state-action pairs of the ongoing trajectory are observed. We formulate this problem as an online bi-level optimization problem where the upper level dynamically adjusts the learned reward according to the newly observed state-action pairs with the help of a meta-regularization term, and the lower level learns the corresponding policy. We propose a novel algorithm to solve this problem and guarantee that the algorithm achieves sub-linear local regret $O(\sqrt{T}+\log T+\sqrt{T}\log T)$. If the reward function is linear, we prove that the proposed algorithm achieves sub-linear regret $O(\log T)$. Experiments are used to validate the proposed algorithm.
反强化学习(IRL)旨在学习最符合专家所显示轨迹的奖赏功能和相应政策。然而,当前的IRL工作无法从持续轨迹中逐步学习,因为他们必须等待收集至少一个完整的轨迹才能学习。为了缩小差距,本文件考虑了学习奖赏功能和相应政策的问题,同时观察当前轨迹的初始州-行动对,并在观察到新的州-行动对当前轨迹时不断更新所学的奖赏和政策。我们将此问题表述为在线双级优化问题,即上层根据新观察到的州-行动对对等在超常化术语的帮助下对所学的奖赏进行动态调整,而下层则学习相应的政策。我们提出一种新的算法来解决这个问题,并保证算法能够达到当地亚线性遗憾$O(sqrt{Tsqrt{Tçlog T). 。如果奖励功能是线性,我们证明拟议的算法能够实现亚线性后悔$O(\log T) 。
Article 198
Title@2025-07-23 (3): Mindfulness Meditation and Respiration: Accelerometer-Based Respiration Rate and Mindfulness Progress Estimation to Enhance App Engagement and Mindfulness Skills
Title: Mindfulness Meditation and Respiration: Accelerometer-Based Respiration Rate and Mindfulness Progress Estimation to Enhance App Engagement and Mindfulness Skills | Achtsamkeitsmeditation und Atmung: Beschleunigungsmesser-basierte Atmungsrate und Achtsamkeitsfortschritt Schätzung zur Verbesserung von App-Verlobung und Achtsamkeits-Fähigkeiten | 冥想和呼吸:以加速计为基础的呼吸率和记忆进展估计,以加强应用参与和记忆技能 2507.17688v1 |
Authors (8): Mohammad Nur Hossain Khan, David Creswell, Jordan Albert, Patrick O’Connell, Shawn Fallon, Mathew Polowitz, Xuhai “Orson” Xu, Bashima Islam
Mindfulness training is widely recognized for its benefits in reducing depression, anxiety, and loneliness. With the rise of smartphone-based mindfulness apps, digital meditation has become more accessible, but sustaining long-term user engagement remains a challenge. This paper explores whether respiration biosignal feedback and mindfulness skill estimation enhance system usability and skill development. We develop a respiration tracking algorithm based on the smartphone’s accelerometer, eliminating the need for additional wearables. Unlike existing methods, our approach accurately captures the slow breathing patterns typical of mindfulness meditation. Additionally, we introduce the first quantitative framework to estimate mindfulness skills (concentration, sensory clarity, and equanimity) based on accelerometer-derived respiration data. We develop and test our algorithms on 261 mindfulness sessions in both controlled and real-world settings. A user study comparing an experimental group receiving biosignal feedback with a control group using a standard app shows that respiration feedback enhances system usability. Our respiration tracking model achieves a mean absolute error (MAE) of 1.6 breaths per minute, closely aligning with ground truth data, while our mindfulness skill estimation attains F1 scores of 80-84% in tracking skill progression. By integrating respiration tracking and mindfulness estimation into a commercial app, we demonstrate the potential of smartphone sensors to enhance digital mindfulness training.
在减少抑郁、焦虑和孤独方面,人们广泛认识到了意识培训的好处。随着基于智能手机的注意应用软件的兴起,数字冥想变得更加容易获取,但维持长期用户参与仍然是一个挑战。本文件探讨的是,重新发布生物信号反馈和注意技能估计是否增强了系统的可用性和技能开发。我们开发了基于智能手机的加速度计呼吸跟踪算法,消除了对额外穿损的需求。与现有方法不同,我们的方法准确地捕捉了典型的以注意冥想为特点的缓慢呼吸模式。此外,我们引入了第一个量化框架,以估计注意技能的集中、感官清晰度和适量性为基础,以加速计得出的呼吸数据为基础。我们开发并测试了我们在受控和现实世界环境中举办的261次注意技能评估会的算法,提高了系统使用标准应用程序对接收生物信号反馈的实验组的算法,表明,呼吸反馈可以提高系统的可用性。我们的呼吸跟踪模型实现了每分钟1.6个呼吸口(MAE)的中度错误,与地面真理数据密切吻一致,同时我们开发和测试265年的智能技能评估,同时将智能学习能力跟踪了我们80年的智能学习的学习水平,以提升的进度,从而显示我们学习的学习的学习的学习的学习的学习的学习。
Article 199
Title@2025-07-23 (3): Towards Effective Open-set Graph Class-incremental Learning
Title: Towards Effective Open-set Graph Class-incremental Learning | Auf dem Weg zu einem effektiven, offenen, klasseninternen Lernen in der Graphen-Klasse | 走向有效的开放设置图表升入级学习 2507.17687v1 |
Authors (6): Jiazhen Chen, Zheng Ma, Sichao Fu, Mingbin Feng, Tony S. Wirjanto, Weihua Ou
Graph class-incremental learning (GCIL) allows graph neural networks (GNNs) to adapt to evolving graph analytical tasks by incrementally learning new class knowledge while retaining knowledge of old classes. Existing GCIL methods primarily focus on a closed-set assumption, where all test samples are presumed to belong to previously known classes. Such an assumption restricts their applicability in real-world scenarios, where unknown classes naturally emerge during inference, and are absent during training. In this paper, we explore a more challenging open-set graph class-incremental learning scenario with two intertwined challenges: catastrophic forgetting of old classes, which impairs the detection of unknown classes, and inadequate open-set recognition, which destabilizes the retention of learned knowledge. To address the above problems, a novel OGCIL framework is proposed, which utilizes pseudo-sample embedding generation to effectively mitigate catastrophic forgetting and enable robust detection of unknown classes. To be specific, a prototypical conditional variational autoencoder is designed to synthesize node embeddings for old classes, enabling knowledge replay without storing raw graph data. To handle unknown classes, we employ a mixing-based strategy to generate out-of-distribution (OOD) samples from pseudo in-distribution and current node embeddings. A novel prototypical hypersphere classification loss is further proposed, which anchors in-distribution embeddings to their respective class prototypes, while repelling OOD embeddings away. Instead of assigning all unknown samples into one cluster, our proposed objective function explicitly models them as outliers through prototype-aware rejection regions, ensuring a robust open-set recognition. Extensive experiments on five benchmarks demonstrate the effectiveness of OGCIL over existing GCIL and open-set GNN methods.
图表类入层学习( GCIL) 使图形神经系统网络( GNNS) 适应不断变化的图表分析任务, 通过渐进学习新类知识, 并保留旧类的知识。 现有的 GCIL 方法主要侧重于封闭式假设, 假设所有测试样本都属于先前已知的类别。 这种假设限制了它们在真实世界情景中的适用性, 未知类自然在推论期间自然出现, 而且在培训期间没有。 本文中, 我们探索一种更具挑战性的开放式图形类入层学习方案, 有两个相互交织的挑战: 灾难性地忘记旧类, 从而妨碍对未知类的检测, 以及缺乏开放性功能的识别, 从而破坏对所学知识的留存。 为了解决上述问题, 提出了一个新的 OGCIL 框架, 使用假的模组嵌入生成模型来有效减轻灾难性的遗忘, 并且能够有力地探测未知类。 具体地说, 一种原型的有条件的自变式自动电解码, 旨在合成旧类的结点, , 将原始图表数据储存中的知识重新显示。 为了处理未知类的类的类, 我们用一个不为正态的变型的变式的O的变式的变式的GODILD, , , 将一个混合的模型的升级的升级的代为新的的模型, 。
Article 200
Title@2025-07-23 (3): Debiased maximum-likelihood estimators for hazard ratios under machine-learning adjustment
Title: Debiased maximum-likelihood estimators for hazard ratios under machine-learning adjustment | Debiased Maximum-Likelihood-Schätzer für Gefahrenverhältnisse unter Maschinen-Learning-Anpassung | 机学习调整下危险比率的偏差最大类似性最高估计估计值 2507.17686v1 |
Authors (2): Takashi Hayakawa, Satoshi Asai
Previous studies have shown that hazard ratios between treatment groups estimated with the Cox model are uninterpretable because the indefinite baseline hazard of the model fails to identify temporal change in the risk set composition due to treatment assignment and unobserved factors among multiple, contradictory scenarios. To alleviate this problem, especially in studies based on observational data with uncontrolled dynamic treatment and real-time measurement of many covariates, we propose abandoning the baseline hazard and using machine learning to explicitly model the change in the risk set with or without latent variables. For this framework, we clarify the context in which hazard ratios can be causally interpreted, and then develop a method based on Neyman orthogonality to compute debiased maximum-likelihood estimators of hazard ratios. Computing the constructed estimators is more efficient than computing those based on weighted regression with marginal structural Cox models. Numerical simulations confirm that the proposed method identifies the ground truth with minimal bias. These results lay the foundation for developing a useful, alternative method for causal inference with uncontrolled, observational data in modern epidemiology.
先前的研究显示,使用Cox模型估计的治疗群体之间的危险比率是无法解释的,因为该模型的无限期基准危险未能查明由于治疗分配和多种相互矛盾的假设情况中未观察到的因素,风险构成的时间变化。为了缓解这一问题,特别是在基于观测数据的研究中,对许多共差进行不受控制的动态处理和实时测量,我们提议放弃基准危险,并利用机器学习来明确模拟有潜在变量或没有潜在变量的风险组合的变化。对于这个框架,我们澄清了危险比率可以因果解释的背景,然后根据Neyman或thogoality 来计算危险比率的偏差最大相似性估计数字。计算得出的估计数字比根据边际结构Cox模型加权回归计算的结果更有效。数字模拟证实,拟议的方法以最小的偏差来识别地面真相。这些结果为在现代流行病学中以不受控制的观测数据为基础,制定一种有用的、可替代的因果关系推断方法奠定了基础。
Article 201
Title@2025-07-23 (3): LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning
Title: LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning | LoX: Low-Rank-Extrapolation stärkt LLM-Sicherheit gegen Feinabstimmung | LoX:低Rank外推法强力推力LLM 安全防止微调 2506.15606v2 |
Authors (6): Gabriel J. Perin, Runjin Chen, Xuxi Chen, Nina S. T. Hirata, Zhangyang Wang, Junyuan Hong
Large Language Models (LLMs) have become indispensable in real-world applications. However, their widespread adoption raises significant safety concerns, particularly in responding to socially harmful questions. Despite substantial efforts to improve model safety through alignment, aligned models can still have their safety protections undermined by subsequent fine-tuning - even when the additional training data appears benign. In this paper, we empirically demonstrate that this vulnerability stems from the sensitivity of safety-critical low-rank subspaces in LLM parameters to fine-tuning. Building on this insight, we propose a novel training-free method, termed Low-Rank Extrapolation (LoX), to enhance safety robustness by extrapolating the safety subspace of an aligned LLM. Our experimental results confirm the effectiveness of LoX, demonstrating significant improvements in robustness against both benign and malicious fine-tuning attacks while preserving the model’s adaptability to new tasks. For instance, LoX leads to 11% to 54% absolute reductions in attack success rates (ASR) facing benign or malicious fine-tuning attacks. By investigating the ASR landscape of parameters, we attribute the success of LoX to the extrapolation moving LLM parameters to a flatter zone, where they are less sensitive to perturbations. The code is available at github.com/VITA-Group/LoX.
大型语言模型(LLMS)在现实世界应用中变得不可或缺。然而,广泛采用这些模型引起了重大的安全问题,特别是在应对对社会有害的问题时。尽管做出了大量努力,通过调整来改善模型安全,但统一模型仍然可以受到随后微调的破坏,即使额外的培训数据看起来是无害的。在本文件中,我们从经验上证明,这种脆弱性源于LLM参数中安全临界低级别子空间对微调的敏感度。根据这一认识,我们建议采用一种新的无培训方法,称为Low-Rank外推法(LOX),通过对一个匹配的LMM的安全子空间进行外推法,加强安全稳健性。我们的实验结果证实LOX的有效性,在防止良性攻击和恶意微调攻击的同时,在保持模型适应新任务方面都取得了显著的稳健性改进。例如,LOX使面临良性或恶意微调的攻击成功率的绝对下降11%至54%。我们通过调查ASR参数的景观,将LX的成功归因于的成功归因于LMM/LAVX的参数转移到一个不敏感程度。
Article 202
Title@2025-07-23 (3): Generalized Dual Discriminator GANs
Title: Generalized Dual Discriminator GANs | Generalisierte Dual Discriminator GANs | GANs 通用双辨识器 2507.17684v1 |
Authors (4): Penukonda Naga Chandana, Tejas Srivastava, Gowtham R. Kurri, V. Lalitha
Dual discriminator generative adversarial networks (D2 GANs) were introduced to mitigate the problem of mode collapse in generative adversarial networks. In D2 GANs, two discriminators are employed alongside a generator: one discriminator rewards high scores for samples from the true data distribution, while the other favors samples from the generator. In this work, we first introduce dual discriminator $\alpha$-GANs (D2 $\alpha$-GANs), which combine the strengths of dual discriminators with the flexibility of a tunable loss function, $\alpha$-loss. We further generalize this approach to arbitrary functions defined on positive reals, leading to a broader class of models we refer to as generalized dual discriminator generative adversarial networks. For each of these proposed models, we provide theoretical analysis and show that the associated min-max optimization reduces to the minimization of a linear combination of an $f$-divergence and a reverse $f$-divergence. This generalizes the known simplification for D2-GANs, where the objective reduces to a linear combination of the KL-divergence and the reverse KL-divergence. Finally, we perform experiments on 2D synthetic data and use multiple performance metrics to capture various advantages of our GANs.
引入了双重歧视基因对抗网络(D2GANs)以缓解基因对抗网络模式崩溃的问题。在D2GANs中,两个歧视者与产生者一起使用:一个歧视者奖励真实数据分布样本的高分,而另一个则优于生成者的样本。在这项工作中,我们首先引入双重歧视者 $alpha$-GANs (D2$\alpha$-GANs),将双重歧视者的优势与金枪鱼损失功能的灵活性($\alpha$-GANs)结合起来。我们进一步推广了这一方法,将其适用于在正真实中界定的任意功能,导致我们称之为普遍双重歧视或基因对抗网络的更广泛的模型类别。对于每一种拟议的模型,我们提供理论分析,并表明相关的微量最大优化将美元波动和反价调的线性组合减少到最小化。这样,D2GANs的已知简化功能将D2-GANs(即将目标降低到KL-G-RM)的直线性组合,将我们最终使用G-GRM数据并进行各种反向性实验。
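The abstract above notes that the D2 GAN objective reduces to a linear combination of the KL-divergence and the reverse KL-divergence. As a small numerical illustration of that quantity, the sketch below evaluates such a combination for discrete distributions. The weights `alpha` and `beta`, the function names, and the toy distributions are assumptions made for illustration; any additive constants in the reduced objective are omitted, and the generalized $f$-divergence case is not covered.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions on a shared support (natural log)."""
    # Terms with p_i == 0 contribute nothing; q_i must be > 0 wherever p_i > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def forward_reverse_kl(p_data, p_gen, alpha=1.0, beta=1.0):
    """alpha * KL(p_data || p_gen) + beta * KL(p_gen || p_data)."""
    return alpha * kl_divergence(p_data, p_gen) + beta * kl_divergence(p_gen, p_data)


# Toy example: three modes in the data, with a generator that mostly
# collapses onto the first mode versus one that covers all of them.
p_data = [0.45, 0.45, 0.10]
collapsed = [0.90, 0.05, 0.05]
covering = [0.40, 0.40, 0.20]

d_collapsed = forward_reverse_kl(p_data, collapsed)
d_covering = forward_reverse_kl(p_data, covering)
```

The forward term penalizes a generator that puts little mass where the data has a lot (missed modes), while the reverse term penalizes mass placed where the data has little; weighting the two is what lets this family trade off mode coverage against sample fidelity.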
Article 203
Title@2025-07-23 (3): RAPID-Net: Accurate Pocket Identification for Binding-Site-Agnostic Docking
Title: RAPID-Net: Accurate Pocket Identification for Binding-Site-Agnostic Docking | RAPID-Net: Genaue Pocket-Identifikation für das Binden-Site-Agnostic Docking | RAPID-Net: 装订性锡石-不可知文件的精确口袋识别 2502.02371v2 |
Authors (4): Yaroslav Balytskyi, Inna Hubenko, Alina Balytska, Christopher V. Kelly
Accurate identification of druggable pockets and their features is essential for structure-based drug design and effective downstream docking. Here, we present RAPID-Net, a deep learning-based algorithm designed for the accurate prediction of binding pockets and seamless integration with docking pipelines. On the PoseBusters benchmark, RAPID-Net-guided AutoDock Vina achieves 54.9% of Top-1 poses with RMSD < 2 Å and satisfying the PoseBusters chemical-validity criterion, compared to 49.1% for DiffBindFR. On the most challenging time split of PoseBusters aiming to assess generalization ability (structures submitted after September 30, 2021), RAPID-Net-guided AutoDock Vina achieves 53.1% of Top-1 poses with RMSD < 2 Å and PB-valid, versus 59.5% for AlphaFold 3. Notably, in 92.2% of cases, RAPID-Net-guided Vina samples at least one pose with RMSD < 2 Å (regardless of its rank), indicating that pose ranking, rather than sampling, is the primary accuracy bottleneck. The lightweight inference, scalability, and competitive accuracy of RAPID-Net position it as a viable option for large-scale virtual screening campaigns. Across diverse benchmark datasets, RAPID-Net outperforms other pocket prediction tools, including PUResNet and Kalasanty, in both docking accuracy and pocket-ligand intersection rates. Furthermore, we demonstrate the potential of RAPID-Net to accelerate the development of novel therapeutics by highlighting its performance on pharmacologically relevant targets. RAPID-Net accurately identifies distal functional sites, offering new opportunities for allosteric inhibitor design. In the case of the RNA-dependent RNA polymerase of SARS-CoV-2, RAPID-Net uncovers a wider array of potential binding pockets than existing predictors, which typically annotate only the orthosteric pocket and overlook secondary cavities.
在PoseBusters基准中,Top-1的浓度为54.9%,而RMSD的浓度为2 A,符合PoseNAsters的化学性能标准,而DiffBind FR的浓度为49.1%。 在PoseBad-NetNet的最具挑战性的时间分割中,旨在评估总体性能(9月30日、2021日之后提交的结构)的基于深层次学习的运算算算网-Net网,并与对接的管道。 在PoseBusters的Top-1 基准中,RMSD < 2 A和PB-Dock Vina的浓度为54.9%,而Top-1的浓度为RMSD的浓度为59.5%。 在92.2%的案例中,RAPID-Net的准确性能样本至少为1个, Rifferent Vina的样本与RMD的直流-Net的直径比值为2 A(级别较弱的), 显示其快速性、可变性、可变性、 数据比标的浓度为大数据的排序比标、 向更低性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变现性、可变性、可变性、可变性、可变性、可变现性、可变现性、可变现性、可变现性、可变现性、可变现性、可变现性、可变现性、可变现性、可变现、可变现性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变性、可变
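The docking results above are reported as the share of Top-1 poses with RMSD below 2 Å. As a minimal sketch of how such a figure can be computed, the code below evaluates a plain coordinate RMSD between a predicted and a reference ligand pose and then the fraction of poses under a threshold. It assumes atoms are already matched one-to-one and in the same frame, and it does not include symmetry correction or the PoseBusters chemical-validity checks; the function names and toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def success_rate(rmsds, threshold=2.0):
    """Fraction of poses whose RMSD (in angstroms) is strictly below the threshold."""
    return sum(r < threshold for r in rmsds) / len(rmsds)


# Toy reference pose and a prediction shifted by 1 angstrom along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
predicted = [(x + 1.0, y, z) for (x, y, z) in reference]

pose_rmsd = rmsd(reference, predicted)
rate = success_rate([0.5, 1.9, 2.0, 3.0])
```

The abstract's distinction between sampling and ranking maps onto which pose's RMSD is fed into `success_rate`: the Top-1 figure uses only the highest-ranked pose per complex, whereas the 92.2% figure uses the best RMSD among all sampled poses.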
Article 204
Title@2025-07-23 (3): On the Lipschitz Constant of Deep Networks and Double Descent
Title: On the Lipschitz Constant of Deep Networks and Double Descent | Auf der Lipschitz-Konstante von Deep Networks und Double Descent | 利普西茨深网络和双人后裔中心 2301.12309v5 |
Authors (3): Matteo Gamba, Hossein Azizpour, Mårten Björkman
Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable, falling short of investigating the mechanisms controlling such factors in practice. In this work, we present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent, and highlight non-monotonic trends strongly correlating with the test error. Building a connection between parameter-space and input-space gradients for SGD around a critical point, we isolate two important factors – namely loss landscape curvature and distance of parameters from initialization – respectively controlling optimization dynamics around a critical point and bounding model function complexity, even beyond the training data. Our study presents novel insights on implicit regularization via overparameterization, and effective model complexity for networks trained in practice.
在这项工作中,我们介绍了对经历双向下降的深层网络的经验性Lipschitz常数的广泛实验性研究,并突出强调了与试验错误密切相关的非分子趋势。在SGD的参数-空间和输入-空间梯度之间围绕一个临界点建立联系,我们分离了两个重要因素 – – 即损失地貌曲线曲线和初始化参数的距离 – – 分别控制一个临界点周围的优化动态和约束性模型功能复杂性,甚至超出培训数据。我们的研究提出了关于通过超分法实现隐含的正规化的新观点,以及在实践中受过培训的网络的有效模型复杂性。
Article 205
Title@2025-07-23 (3): How Should We Meta-Learn Reinforcement Learning Algorithms?
Title: How Should We Meta-Learn Reinforcement Learning Algorithms? | Wie sollten wir Meta-Lernen Stärkung lernen Algorithmen? | 我们怎样才能提高学习的比喻呢? 2507.17668v1 |
Authors (4): Alexander David Goldie, Zilin Wang, Jakob Nicolaus Foerster, Shimon Whiteson
The process of meta-learning algorithms from data, instead of relying on manual design, is growing in popularity as a paradigm for improving the performance of machine learning systems. Meta-learning shows particular promise for reinforcement learning (RL), where algorithms are often adapted from supervised or unsupervised learning despite their suboptimality for RL. However, until now there has been a severe lack of comparison between different meta-learning algorithms, such as using evolution to optimise over black-box functions or LLMs to propose code. In this paper, we carry out this empirical comparison of the different approaches when applied to a range of meta-learned algorithms which target different parts of the RL pipeline. In addition to meta-train and meta-test performance, we also investigate factors including the interpretability, sample cost and train time for each meta-learning algorithm. Based on these findings, we propose several guidelines for meta-learning new RL algorithms which will help ensure that future learned algorithms are as performant as possible.
从数据中进行元学习算法的过程,而不是依靠手工设计,越来越受欢迎,作为改善机器学习系统绩效的范例。元学习显示,对强化学习特别有希望(RL),因为尽管对RL来说算法不尽理想,但往往从监督或不受监督的学习中改编算法。然而,到目前为止,不同元学习算法之间严重缺乏比较,例如利用进化法优化黑盒功能或LLMS来提议代码。在本文中,我们对适用于针对RL管道不同部分的一系列元学习算法的不同方法进行了实证比较。除了元培训和元测试绩效外,我们还调查了各种因素,包括每个元学习算法的可解释性、抽样成本和培训时间。根据这些调查结果,我们提出了几项元学习新RL算法的指导方针,这将有助于确保未来学习的算法尽可能表现良好。
Article 206
Title@2025-07-23 (3): Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography
Title: Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography | Mammo-Mamba: Hybride State-Space- und Transformer-Architektur mit sequentieller Mischung von Experten für Multi-View-Mammographie | Mammo-Mamba:国家空间和变形综合结构及多视力造影学专家顺序混合结构 2507.17662v1 |
Authors (4): Farnoush Bayatmakou, Reza Taleei, Nicole Simone, Arash Mohammadi
Breast cancer (BC) remains one of the leading causes of cancer-related mortality among women, despite recent advances in Computer-Aided Diagnosis (CAD) systems. Accurate and efficient interpretation of multi-view mammograms is essential for early detection, driving a surge of interest in Artificial Intelligence (AI)-powered CAD models. While state-of-the-art multi-view mammogram classification models are largely based on Transformer architectures, their computational complexity scales quadratically with the number of image patches, highlighting the need for more efficient alternatives. To address this challenge, we propose Mammo-Mamba, a novel framework that integrates Selective State-Space Models (SSMs), transformer-based attention, and expert-driven feature refinement into a unified architecture. Mammo-Mamba extends the MambaVision backbone by introducing the Sequential Mixture of Experts (SeqMoE) mechanism through its customized SecMamba block. The SecMamba is a modified MambaVision block that enhances representation learning in high-resolution mammographic images by enabling content-adaptive feature refinement. These blocks are integrated into the deeper stages of MambaVision, allowing the model to progressively adjust feature emphasis through dynamic expert gating, effectively mitigating the limitations of traditional Transformer models. Evaluated on the CBIS-DDSM benchmark dataset, Mammo-Mamba achieves superior classification performance across all key metrics while maintaining computational efficiency.
尽管最近在计算机辅助诊断系统(CAD)方面有所进展,但乳腺癌仍然是妇女癌症相关死亡率的主要原因之一,尽管近来在计算机辅助诊断系统(CAD)方面有所进步,但乳腺癌仍然是妇女癌症相关死亡率的主要原因之一。 准确和高效地解释多视乳房照片对于早期发现至关重要,促使人工智能(AI)驱动的CAD模型引起人们的兴趣激增。 虽然最先进的多视乳房X光照片分类模型主要基于变异器结构,但其计算复杂性尺度与图像补丁数量之比突出需要更有效的替代方法。为了应对这一挑战,我们建议Mammo-Mamba(Mammo-Mamba)是一个新颖的框架,将选择性的国家空间模型(SSMMM)、基于变异器的注意和专家驱动的特征改进纳入一个统一的架构中。Mammo-Mamba(Mammo-M)通过定制的Secaltical Mexcialimal 模型的升级和不断升级的升级,这些矩阵模型的升级后,将扩展Mamba(Samba)分类基准部分加强在高解乳房图像图像图像图像图像图像模型中的代表性学习。这些综合的升级,通过不断的升级的升级的模型,通过不断调整的模型的升级的模型,使整个的模型的升级的模型的升级的升级的模型的升级的升级的升级的模型的模型的升级的模型的模型的模型的升级的升级。
Article 207
Title@2025-07-23 (3): XStacking: Explanation-Guided Stacked Ensemble Learning
Title: XStacking: Explanation-Guided Stacked Ensemble Learning | XStacking: Erklärungsgeführtes Gestapeltes Ensemble Lernen | XStacking: 解释引导堆叠组合学习 2507.17650v1 |
Authors (3): Moncef Garouani, Ayah Barhrhouj, Olivier Teste
Ensemble Machine Learning (EML) techniques, especially stacking, have been shown to improve predictive performance by combining multiple base models. However, they are often criticized for their lack of interpretability. In this paper, we introduce XStacking, an effective and inherently explainable framework that addresses this limitation by integrating dynamic feature transformation with model-agnostic Shapley additive explanations. This enables stacked models to retain their predictive accuracy while becoming inherently explainable. We demonstrate the effectiveness of the framework on 29 datasets, achieving improvements in both the predictive effectiveness of the learning space and the interpretability of the resulting models. XStacking offers a practical and scalable solution for responsible ML.
综合机体学习技术,特别是堆叠式技术,通过结合多个基数模型,已证明可以提高预测性能,但往往因其缺乏可解释性而受到批评。在本文中,我们引入了XSacking,这是一个有效且内在可以解释的框架,通过将动态特征转换与模型不可知的形状添加解释结合起来来解决这一局限性。这使得堆叠式模型能够保持预测性准确性,而同时又可以内在地解释。我们展示了29个数据集框架的有效性,在学习空间的预测有效性和由此形成的模型的可解释性两方面都取得了改进。 XStacking为负责任的ML提供了一个实用且可扩展的解决方案。
Article 208
Title@2025-07-23 (3): A Concept-based approach to Voice Disorder Detection
Title: A Concept-based approach to Voice Disorder Detection | Ein konzeptbasierter Ansatz zur Erkennung von Sprachstörungen | 一种基于概念的语音疾病检测方法 2507.17799v1 |
Authors (7): Davide Ghia, Gabriele Ciravegna, Alkis Koudounas, Marco Fantini, Erika Crosetti, Giovanni Succo, Tania Cerquitelli
Voice disorders affect a significant portion of the population, and the ability to diagnose them using automated, non-invasive techniques would represent a substantial advancement in healthcare, improving the quality of life of patients. Recent studies have demonstrated that artificial intelligence models, particularly Deep Neural Networks (DNNs), can effectively address this task. However, due to their complexity, the decision-making process of such models often remains opaque, limiting their trustworthiness in clinical contexts. This paper investigates an alternative approach based on Explainable AI (XAI), a field that aims to improve the interpretability of DNNs by providing different forms of explanations. Specifically, this work focuses on concept-based models such as the Concept Bottleneck Model (CBM) and the Concept Embedding Model (CEM) and how they can achieve performance comparable to traditional deep learning methods, while offering a more transparent and interpretable decision framework.
最近的研究表明,人工智能模型,特别是深神经网络(DNNs)能够有效完成这项任务,然而,由于这些模型的复杂性,这些模型的决策过程往往不透明,限制了其在临床环境中的可信赖性。本文调查了基于可解释的AI(XAI)的替代方法,这个领域的目的是通过提供不同形式的解释来改进DNS的可解释性。具体地说,这项工作侧重于基于概念的模型,如“Bottleneck模型”(CBM)和“概念嵌入模型”(CEM),以及这些模型如何取得与传统的深层学习方法相类似的业绩,同时提供一个更加透明和可解释的决策框架。
Article 209
Title@2025-07-23 (3): WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
Title: WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training | WSM: Decay-Free Learning Rate Scheduling via Checkpoint Merging für LLM Pre-Training | WSM:通过LLM培训前的检查站合并,制定无下降的学习率表 2507.17634v1 |
Authors (10): Changxin Tian, Jiapeng Wang, Qian Zhao, Kunlong Chen, Jia Liu, Ziqi Liu, Jiaxin Mao, Wayne Xin Zhao, Zhiqiang Zhang, Jun Zhou
Recent advances in learning rate (LR) scheduling have demonstrated the effectiveness of decay-free approaches that eliminate the traditional decay phase while maintaining competitive performance. Model merging techniques have emerged as particularly promising solutions in this domain. We present Warmup-Stable and Merge (WSM), a general framework that establishes a formal connection between learning rate decay and model merging. WSM provides a unified theoretical foundation for emulating various decay strategies (including cosine decay, linear decay, and inverse square root decay) as principled model averaging schemes, while remaining fully compatible with diverse optimization methods. Through extensive experiments, we identify merge duration (the training window for checkpoint aggregation) as the most critical factor influencing model performance, surpassing the importance of both checkpoint interval and merge quantity. Our framework consistently outperforms the widely adopted Warmup-Stable-Decay (WSD) approach across multiple benchmarks, achieving significant improvements of +3.5% on MATH, +2.9% on HumanEval, and +5.5% on MMLU-Pro. The performance advantages extend to supervised fine-tuning scenarios, highlighting WSM’s potential for long-term model refinement.
最近的学习率(LR)列表进展显示了消除传统衰变阶段同时保持竞争性绩效的无腐化方法的有效性。示范合并技术是这一领域的特别有希望的解决办法。我们介绍的是温度和合并(WSM),这是一个在学习率衰变和模式合并之间建立正式联系的总框架。世界学习率(WSM)为模拟各种衰变战略提供了统一的理论基础,包括共弦衰减、线性衰变和反平方根衰变平均模式,同时仍然与多种优化方法完全兼容。我们通过广泛的实验,确定检查站合并培训窗口是影响模型性能的最关键因素,超越了检查站间隔和合并数量的重要性。我们的框架始终超越了广泛采用的WSDSD(WD)方法的多重基准,大大改进了MATH的+3.5%、HumanEval的+2.9%和MLU-Pro的+5.5%。绩效优势扩大到监督的微调情景,突出WSMU的长期模型改进潜力。
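The checkpoint-merging idea in the WSM abstract above can be illustrated with a minimal sketch. The abstract does not state how merge weights are chosen, so the uniform default below is an assumption, and `merge_checkpoints` is a hypothetical helper name rather than anything from the paper. Checkpoints are modeled as plain dictionaries of NumPy arrays.

```python
import numpy as np

def merge_checkpoints(checkpoints, weights=None):
    """Weighted average of parameter dicts that share keys and shapes.

    Assumption: uniform weights unless given; the paper's mapping from
    decay schedules to merge weights is not reproduced here.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    if weights is None:
        weights = np.full(n, 1.0 / n)
    else:
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()  # normalize so weights sum to 1
    merged = {}
    for name in checkpoints[0]:
        # Accumulate the weighted sum of this parameter across checkpoints.
        merged[name] = sum(w * ckpt[name] for w, ckpt in zip(weights, checkpoints))
    return merged

# Toy example: three checkpoints saved during the stable-LR phase.
ckpts = [
    {"w": np.array([1.0, 2.0])},
    {"w": np.array([3.0, 4.0])},
    {"w": np.array([5.0, 6.0])},
]
merged = merge_checkpoints(ckpts)
```

In this sketch the number of checkpoints passed in plays the role of the merge window; non-uniform `weights` would be the place to experiment with emulating different decay shapes.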
Article 210
Title@2025-07-23 (3): Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods
Title: Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods | Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods | 存储和差异缩减立方立方体牛顿方法统一一致理论 2302.11962v5 |
Authors (3): El Mahdi Chayti, Nikita Doikov, Martin Jaggi
We study stochastic Cubic Newton methods for solving general possibly non-convex minimization problems. We propose a new framework, which we call the helper framework, that provides a unified view of the stochastic and variance-reduced second-order algorithms equipped with global complexity guarantees. It can also be applied to learning with auxiliary information. Our helper framework offers the algorithm designer high flexibility for constructing and analyzing the stochastic Cubic Newton methods, allowing arbitrary size batches, and the use of noisy and possibly biased estimates of the gradients and Hessians, incorporating both the variance reduction and the lazy Hessian updates. We recover the best-known complexities for the stochastic and variance-reduced Cubic Newton, under weak assumptions on the noise. A direct consequence of our theory is the new lazy stochastic second-order method, which significantly improves the arithmetic complexity for large dimension problems. We also establish complexity bounds for the classes of gradient-dominated objectives, that include convex and strongly convex problems. For Auxiliary Learning, we show that using a helper (auxiliary function) can outperform training alone if a given similarity measure is small.
我们研究了解决一般的、可能非混凝土的最小化问题的随机立方体牛顿方法。我们提出了一个新的框架,我们称之为帮助框架,对具有全球复杂度保障的随机和差异减少的二级算法提供了统一的观点,还可以用于辅助信息的学习。我们的帮助框架为算法设计者提供了建造和分析随机立方体牛顿方法的高度灵活性,允许任意的尺寸批量,以及使用对梯度和海相的杂音和可能偏差的估计,包括减少差异和懒惰的海相更新。我们在噪音的微弱假设下恢复了已知的随机和差异减少的二次算法的复杂性。我们理论的一个直接后果是新的懒惰的二次排序方法,该方法大大改进了大尺寸问题的算术复杂性。我们还为梯度强的目标类别建立了复杂度界限,其中包括调控器和强烈的调控重问题。对于辅助性学习来说,我们展示了使用类似辅助性测量法的类似功能。
Article 211
Title@2025-07-23 (3): Toward a Lightweight and Robust Design for Caching
Title: Toward a Lightweight and Robust Design for Caching | Hin zu einem leichten und robusten Design für Caching | Caching 轻量度和强力设计 2507.16242v2 |
Authors (6): Peng Chen, Hailiang Zhao, Jiaji Zhang, Xueyan Tang, Yixuan Wang, Shuiguang Deng
The online caching problem aims to minimize cache misses when serving a sequence of requests under a limited cache size. While naive learning-augmented caching algorithms achieve ideal $1$-consistency, they lack robustness guarantees. Existing robustification methods either sacrifice $1$-consistency or introduce significant computational overhead. In this paper, we introduce Guard, a lightweight robustification framework that enhances the robustness of a broad class of learning-augmented caching algorithms to $2H_k + 2$, while preserving their $1$-consistency. Guard achieves the current best-known trade-off between consistency and robustness, with only $O(1)$ additional per-request overhead, thereby maintaining the original time complexity of the base algorithm. Extensive experiments across multiple real-world datasets and prediction models validate the effectiveness of Guard in practice.
在线缓存问题的目的是在有限缓存规模下满足一系列请求时,最大限度地减少缓存误差。 虽然天真的学习缓存算法实现了理想的1美元一致性,但它们缺乏稳健性保障。 现有的稳健化方法要么牺牲1美元一致性,要么引入大量计算间接费用。 在本文中,我们引入了一个轻量级强力化框架,将广泛的经学习缓存算法的稳健性提高到2美元,同时保持其1美元的一贯性。 卫兵在一致性和稳健性之间实现了目前最著名的平衡,只有1美元的额外人均请求管理费,从而保持了基础算法的最初时间复杂性。 跨多个现实世界数据集和预测模型的大规模实验验证了卫士在实践中的有效性。
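To give a feel for the robustness bound $2H_k + 2$ quoted in the abstract above, the following sketch evaluates it for a few cache sizes. The abstract does not define $H_k$; the code assumes it is the $k$-th harmonic number with $k$ the cache size, as is common in the caching literature. The function names are illustrative only.

```python
from fractions import Fraction

def harmonic(k):
    """k-th harmonic number H_k = 1 + 1/2 + ... + 1/k, computed exactly."""
    return sum(Fraction(1, i) for i in range(1, k + 1))

def guard_robustness_bound(k):
    """Bound 2*H_k + 2 from the abstract (assuming k is the cache size)."""
    return 2 * harmonic(k) + 2

# The bound grows only logarithmically with the cache size.
for k in (1, 4, 64, 1024):
    print(k, float(guard_robustness_bound(k)))
```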
Article 212
Title@2025-07-23 (3): Machine Learning Classification and Portfolio Allocation: with Implications from Machine Uncertainty
Title: Machine Learning Classification and Portfolio Allocation: with Implications from Machine Uncertainty | Machine Learning Klassifizierung und Portfoliozuteilung: mit Implikationen aus der Maschinenunsicherheit | 机器学习分类和组合分配:机器不确定性的影响 2108.02283v2 |
Authors (2): Yang Bai, Kuntara Pukthuanthong
We use multi-class machine learning classifiers to identify the stocks that outperform or underperform other stocks. The resulting long-short portfolios achieve annual Sharpe ratios of 1.67 (value-weighted) and 3.35 (equal-weighted), with annual alphas ranging from 29% to 48%. These results persist after controlling for machine learning regressions and remain robust among large-cap stocks. Machine uncertainty, as measured by predicted probabilities, impairs the prediction performance. Stocks with higher machine uncertainty experience lower returns, particularly when human proxies of information uncertainty align with machine uncertainty. Consistent with the literature, such an effect is driven by the past underperformers.
我们使用多级机能学习分类法来查明业绩超过或业绩低于其他库存的库存。因此,长期短缺的投资组合每年的夏普比率为1.67(价值加权)和3.35(相等加权),每年的α值介于29至48之间。这些结果在对机器学习回归进行控制后持续存在,在大型库存中保持强劲。根据预测的概率衡量,机器不确定性会损害预测性能。机器不确定性较高的库存的回报率较低,特别是当人类信息不确定性的代身与机器不确定性相一致时。根据文献,这种效果是由过去的落后者驱动的。
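For readers unfamiliar with the metric reported above, here is a minimal sketch of an annualized Sharpe ratio for a long-short portfolio. It assumes monthly excess returns and the usual square-root-of-time annualization; the abstract does not specify the return frequency or the exact estimator, and both function names are hypothetical.

```python
import math
import statistics

def long_short_returns(long_leg, short_leg):
    """Per-period return of a portfolio long one leg and short the other."""
    return [l - s for l, s in zip(long_leg, short_leg)]

def annualized_sharpe(excess_returns, periods_per_year=12):
    """Mean over sample standard deviation, scaled by sqrt(periods per year).

    Assumption: returns are already in excess of the risk-free rate.
    """
    mean = statistics.mean(excess_returns)
    std = statistics.stdev(excess_returns)
    if std == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return mean / std * math.sqrt(periods_per_year)

# Toy monthly returns for the predicted outperformers and underperformers.
spread = long_short_returns([0.05, 0.02, 0.04], [0.01, 0.03, 0.00])
sharpe = annualized_sharpe(spread)
```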
Article 213
Title@2025-07-23 (3): Vision Transformer attention alignment with human visual perception in aesthetic object evaluation
Title: Vision Transformer attention alignment with human visual perception in aesthetic object evaluation | Vision Transformer Aufmerksamkeitsausrichtung mit menschlicher visueller Wahrnehmung in ästhetischer Objektauswertung | 在美学物体评价中,视觉转变器关注与人类视觉认知的一致性 2507.17616v1 |
Authors (4): Miguel Carrasco, César González-Martín, José Aranda, Luis Oliveros
Visual attention mechanisms play a crucial role in human perception and aesthetic evaluation. Recent advances in Vision Transformers (ViTs) have demonstrated remarkable capabilities in computer vision tasks, yet their alignment with human visual attention patterns remains underexplored, particularly in aesthetic contexts. This study investigates the correlation between human visual attention and ViT attention mechanisms when evaluating handcrafted objects. We conducted an eye-tracking experiment with 30 participants (9 female, 21 male, mean age 24.6 years) who viewed 20 artisanal objects comprising basketry bags and ginger jars. Using a Pupil Labs eye-tracker, we recorded gaze patterns and generated heat maps representing human visual attention. Simultaneously, we analyzed the same objects using a pre-trained ViT model with DINO (Self-DIstillation with NO Labels), extracting attention maps from each of the 12 attention heads. We compared human and ViT attention distributions using Kullback-Leibler divergence across varying Gaussian parameters (sigma = 0.1 to 3.0). Statistical analysis revealed optimal correlation at sigma = 2.4 ± 0.03, with attention head #12 showing the strongest alignment with human visual patterns. Significant differences were found between attention heads, with heads #7 and #9 demonstrating the greatest divergence from human attention (p < 0.05, Tukey HSD test). Results indicate that while ViTs exhibit more global attention patterns compared to human focal attention, certain attention heads can approximate human visual behavior, particularly for specific object features like buckles in basketry items. These findings suggest potential applications of ViT attention mechanisms in product design and aesthetic evaluation, while highlighting fundamental differences in attention strategies between human perception and current AI models.
视觉关注机制在人类感知和美学评估中发挥着关键作用。 视觉变异器(ViTs)最近的进展在计算机视觉任务中表现出非凡的能力, 然而它们与人类视觉关注模式的匹配率仍然未得到充分探索, 特别是在美学背景下。 本研究调查人类视觉关注和ViT关注机制在评估手工艺物体时的关联性。 我们用30名参与者(9名女性,21名男性,平均年龄24.6岁)观看了20个手工物体, 包括篮子袋和姜罐。 使用普皮尔实验室的眼跟踪器,我们记录了视觉模式的视觉模式,并制作了代表人类视觉关注的热图。 与此同时,我们用事先训练过的ViT模型(与NO Labels的自我蒸馏法)分析了相同的应用目标,从12个关注头部的每个关注点中提取了关注度图示。 我们用Kullback-Leepter的注意力分布对比了30岁差异。 统计分析显示Sigbil=2.4+0.03, 以及基本产品评估机制之间的最佳关联性相关性, , 以及关注12头部的注意显示人类视觉 显示人类视觉判断值 和视觉判断值 。
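The comparison step described above, Kullback-Leibler divergence between a human heat map and a ViT attention map, might look like the following sketch. It assumes both maps are non-negative arrays of the same shape, normalizes them to probability distributions, and takes the human map as the reference distribution; the direction of the divergence and the smoothing constant `eps` are assumptions, and the Gaussian smoothing step from the study is omitted.

```python
import numpy as np

def kl_divergence(p_map, q_map, eps=1e-12):
    """KL(P || Q) between two non-negative maps treated as distributions."""
    p = np.asarray(p_map, dtype=float).ravel() + eps  # eps avoids log(0)
    q = np.asarray(q_map, dtype=float).ravel() + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Toy 2x2 maps: the human map is focal, the ViT map is more diffuse.
human = np.array([[0.7, 0.1], [0.1, 0.1]])
vit = np.array([[0.25, 0.25], [0.25, 0.25]])
score = kl_divergence(human, vit)
```

A lower score means the attention head's map is closer to the human gaze distribution, so ranking heads by this value is one way to find the best-aligned head.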
Article 214
Title@2025-07-23 (3): Time Deep Gradient Flow Method for pricing American options
Title: Time Deep Gradient Flow Method for pricing American options | Time Deep Gradient Flow Methode für die Preisgestaltung amerikanischen Optionen | 美国选项定价的 “ 深梯度 “ 流程方法 2507.17606v1 |
Authors (1): Jasper Rou
In this research, we explore neural network-based methods for pricing multidimensional American put options under the Black-Scholes and Heston models, extending up to five dimensions. We focus on two approaches: the Time Deep Gradient Flow (TDGF) method and the Deep Galerkin Method (DGM). We extend the TDGF method to handle the free-boundary partial differential equation inherent in American options. We carefully design the sampling strategy during training to enhance performance. Both TDGF and DGM achieve high accuracy while outperforming conventional Monte Carlo methods in terms of computational speed. In particular, TDGF tends to be faster during training than DGM.
在这一研究中,我们探索以神经网络为基础的方法,根据BlackScholes和Heston模式确定美国多层面的定价方案,其范围可达五个方面,我们侧重于两个方面:时深梯流方法和深加勒金方法。我们扩展了TDGF方法,以处理美国备选方案所固有的自由边界部分差异方程式。我们在培训期间仔细设计抽样战略,以提高绩效。TDGF和DGM在计算速度方面都非常精准,但优于传统的蒙特卡洛方法。特别是,在培训期间,TDGF往往比DGM更快。
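As background for the free-boundary problem mentioned above, the textbook one-dimensional Black-Scholes formulation of the American put can be written as a complementarity problem. This is a standard form stated under the usual assumptions (constant rate $r$, volatility $\sigma$, strike $K$, maturity $T$); it is not taken from the paper, whose multidimensional and Heston settings are more general.

```latex
\min\left\{ -\frac{\partial V}{\partial t} - \mathcal{L}V,\; V - \Phi \right\} = 0,
\qquad V(T, S) = \Phi(S),
\]
\[
\mathcal{L}V = \tfrac{1}{2}\sigma^{2} S^{2} \frac{\partial^{2} V}{\partial S^{2}}
             + r S \frac{\partial V}{\partial S} - r V,
\qquad \Phi(S) = \max(K - S,\, 0).
```

Where $V > \Phi$ the option is held and the Black-Scholes equation holds with equality; where $V = \Phi$ early exercise is optimal. The boundary between the two regions is unknown in advance, which is what makes the problem a free-boundary one.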
Article 215
Title@2025-07-23 (3): Trusted Multi-view Learning under Noisy Supervision
Title: Trusted Multi-view Learning under Noisy Supervision | Vertrauenswürdiges Multi-View-Lernen unter Noisy Supervision | 在噪音监督下的可信赖的多观点学习 2404.11944v3 |
Authors (7): Yilin Zhang, Cai Xu, Han Jiang, Ziyu Guan, Wei Zhao, Xiaofei He, Murat Sensoy
Multi-view learning methods often focus on improving decision accuracy while neglecting the decision uncertainty, which significantly restricts their applications in safety-critical scenarios. To address this, trusted multi-view learning methods estimate prediction uncertainties by learning class distributions from each instance. However, these methods heavily rely on high-quality ground-truth labels. This motivates us to delve into a new problem: how to develop a reliable multi-view learning model under the guidance of noisy labels? We propose the Trusted Multi-view Noise Refining (TMNR) method to address this challenge by modeling label noise arising from low-quality data features and easily-confused classes. TMNR employs evidential deep neural networks to construct view-specific opinions that capture both beliefs and uncertainty. These opinions are then transformed through noise correlation matrices to align with the noisy supervision, where matrix elements are constrained by sample uncertainty to reflect label reliability. Furthermore, considering the challenge of jointly optimizing the evidence network and noise correlation matrices under noisy supervision, we further propose Trusted Multi-view Noise Re-Refining (TMNR^2), which disentangles this complex co-training problem by establishing different training objectives for distinct modules. TMNR^2 identifies potentially mislabeled samples through evidence-label consistency and generates pseudo-labels from neighboring information. By assigning clean samples to optimize evidential networks and noisy samples to guide noise correlation matrices, respectively, TMNR^2 reduces mapping interference and achieves stable training. Experimental results demonstrate that TMNR^2 significantly outperforms baseline methods, with average accuracy improvements of 7% on datasets with 50% label noise.
多视角学习方法往往侧重于提高决策的准确性,而忽视决定不确定性,这极大地限制了其在安全临界情景中的应用。为了解决这个问题,值得信赖的多视角学习方法通过学习每类的分布来估计预测不确定性。然而,这些方法在很大程度上依赖高质量的地面真相标签。这促使我们深入到一个新的问题:如何在噪音标签的指导下开发可靠的多视角学习模式?我们建议采用信任的多视角改进(TMNR)方法,通过模拟低质量数据特征和容易配置的等级产生的标签噪音来应对这一挑战。为了解决这个问题,多视角学习方法通过学习每类的分类分布来估计不确定性。但是,这些方法在很大程度上依赖高质量的地面真相标签标签标签标签标签标签标签标签标签标签标签。此外,考虑到在噪音标签标签标签监管下联合优化证据网络和噪音相关关系矩阵的挑战,我们进一步建议信任的多视角改进(TMNRNR2) 改进标准(TMNRR2) , IMRNR使用精确度培训网络的精确度网络来构建清晰的准确性神经网络,通过确定不同的培训目标,通过透明性模型来大幅测量结果。
Article 216
Title@2025-07-23 (3): Citation Recommendation using Deep Canonical Correlation Analysis
Title: Citation Recommendation using Deep Canonical Correlation Analysis | Zitationsempfehlung mit tiefer kanonischen Korrelationsanalyse | 使用深锥体关联分析的引用建议 2507.17603v1 |
Authors (2): Conor McNamara, Effirul Ramlan
Recent advances in citation recommendation have improved accuracy by leveraging multi-view representation learning to integrate the various modalities present in scholarly documents. However, effectively combining multiple data views requires fusion techniques that can capture complementary information while preserving the unique characteristics of each modality. We propose a novel citation recommendation algorithm that improves upon linear Canonical Correlation Analysis (CCA) methods by applying Deep CCA (DCCA), a neural network extension capable of capturing complex, non-linear relationships between distributed textual and graph-based representations of scientific articles. Experiments on the large-scale DBLP (Digital Bibliography & Library Project) citation network dataset demonstrate that our approach outperforms state-of-the-art CCA-based methods, achieving relative improvements of over 11% in Mean Average Precision@10, 5% in Precision@10, and 7% in Recall@10. These gains reflect more relevant citation recommendations and enhanced ranking quality, suggesting that DCCA’s non-linear transformations yield more expressive latent representations than CCA’s linear projections.
引用建议最近的进展提高了准确性,利用多视角代表学习将学术文件中的各种模式结合起来。然而,要有效地将多重数据观点结合起来,就需要采用能够捕捉补充信息并同时保持每种模式独特性的综合技术。我们提出一个新的引用建议算法,通过应用深海CCCA(DCCA)改进线性Canonical Corrolation分析(CCA)方法,这是一种神经网络扩展,能够捕捉分布式文本和图表形式的科学文章表述之间的复杂、非线性关系。对大型DBLP(数字文献目录和图书馆项目)引用网络数据集的实验表明,我们的方法比基于CCA的线性预测更明显,在平均精度@10、5%的Precision@10和7%的Recall@10中相对改进了11%以上的平均平均精度。这些收益反映了更相关的引用建议和更高的排名质量,表明DCC的非线性转变比CC的线性潜在表述方式要大于CC的线性预测。
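The ranking metrics reported above (Precision@10, Recall@10, and Mean Average Precision@10) can be sketched for a single query as follows. Normalization conventions for average precision at a cutoff vary between papers; the version below divides by min(number of relevant items, k), which is an assumption and may differ from the authors' evaluation code.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k=10):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)

def average_precision_at_k(ranked, relevant, k=10):
    """Average of precision values at each rank where a hit occurs."""
    hits = 0
    score = 0.0
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            score += hits / rank
    denom = min(len(relevant), k)  # assumed normalization
    return score / denom if denom else 0.0

# Toy query: four recommended papers, three truly cited ones.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "x"}
```

Mean Average Precision@10 is then the mean of `average_precision_at_k` over all query papers.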
Article 217
Title@2025-07-23 (3): HyDRA: A Hybrid-Driven Reasoning Architecture for Verifiable Knowledge Graphs
Title: HyDRA: A Hybrid-Driven Reasoning Architecture for Verifiable Knowledge Graphs | HyDRA: Eine hybrid-getriebene Grundarchitektur für überprüfbare Wissensgraphen | HYDRA:可核实知识图的混合驱动理由结构 2507.15917v2 |
Authors (5): Adrian Kaiser, Claudiu Leoveanu-Condrei, Ryan Gold, Marius-Constantin Dinu, Markus Hofmarcher
The synergy between symbolic knowledge, often represented by Knowledge Graphs (KGs), and the generative capabilities of neural networks is central to advancing neurosymbolic AI. A primary bottleneck in realizing this potential is the difficulty of automating KG construction, which faces challenges related to output reliability, consistency, and verifiability. These issues can manifest as structural inconsistencies within the generated graphs, such as the formation of disconnected "isolated islands" of data or the inaccurate conflation of abstract classes with specific instances. To address these challenges, we propose HyDRA, a Hybrid-Driven Reasoning Architecture designed for verifiable KG automation. Given a domain or an initial set of documents, HyDRA first constructs an ontology via a panel of collaborative neurosymbolic agents. These agents collaboratively agree on a set of competency questions (CQs) that define the scope and requirements the ontology must be able to answer. Given these CQs, we build an ontology graph that subsequently guides the automated extraction of triplets for KG generation from arbitrary documents. Inspired by design-by-contract (DbC) principles, our method leverages verifiable contracts as the primary control mechanism to steer the generative process of Large Language Models (LLMs). To verify the output of our approach, we extend beyond standard benchmarks and propose an evaluation framework that assesses the functional correctness of the resulting KG by leveraging symbolic verifications as described by the neurosymbolic AI framework, SymbolicAI. This work contributes a hybrid-driven architecture for improving the reliability of automated KG construction and the exploration of evaluation methods for measuring the functional integrity of its output. The code is publicly available.
象征性知识(通常以知识图(KGs)为代表)和神经网络基因能力之间的协同作用,是推进神经同步AI的关键。实现这一潜力的一个主要瓶颈是难以实现KG建筑的自动化,这面临着与产出可靠性、一致性和可核查性有关的挑战。这些问题可以表现为生成的图表中的结构不一致,例如数据断开 $\ textit{孤立的岛屿}$,或者将抽象类别与具体实例不准确地拼接。为了应对这些挑战,我们提议HyDRA,一个$\ textb{Hy}$Brid$\textb{D}的神经神经同步网络。我们建议HyDRA,一个$textbrialtyal syblationalal AI(Screyleblientybligal) 标定一个定义范围和要求 IMBlical IMLILILILILLIA(通过我们定义的直径直译方法,通过直译的直译的直译方法,通过直译的直译的直译的ILILLILILILILILILILILLL) 解释法,这些解释性文件的流程必须随后通过直译的直路路。
Article 218
Title@2025-07-23 (3): Wasserstein GAN-Based Precipitation Downscaling with Optimal Transport for Enhancing Perceptual Realism
Title: Wasserstein GAN-Based Precipitation Downscaling with Optimal Transport for Enhancing Perceptual Realism | Wasserstein GAN-based Niederschlag Downscaling mit optimalem Transport zur Verbesserung des Wahrnehmungsrealismus | 瓦森斯坦GAN的降水量降幅与最佳运输的降幅,以加强观念现实主义 2507.17798v1 |
Authors (4): Kenta Shiraishi, Yuka Muto, Atsushi Okazaki, Shunji Kotsuki
High-resolution (HR) precipitation prediction is essential for reducing damage from stationary and localized heavy rainfall; however, HR precipitation forecasts using process-driven numerical weather prediction models remain challenging. This study proposes using a Wasserstein Generative Adversarial Network (WGAN) to perform precipitation downscaling with an optimal transport cost. In contrast to a conventional neural network trained with mean squared error, the WGAN generated visually realistic precipitation fields with fine-scale structures even though the WGAN exhibited slightly lower performance on conventional evaluation metrics. The learned critic of the WGAN correlated well with human perceptual realism. Case-based analysis revealed that large discrepancies in critic scores can help identify both unrealistic WGAN outputs and potential artifacts in the reference data. These findings suggest that the WGAN framework not only improves perceptual realism in precipitation downscaling but also offers a new perspective for evaluating and quality-controlling precipitation datasets.
高分辨率降水预测对于减少固定和局部大降雨造成的损害至关重要;然而,使用过程驱动的数字天气预测模型进行的人力资源降水预测仍然具有挑战性。本研究报告提议使用瓦塞斯特因基因反versarial网络(WGAN)进行降水降水降水降水,并采用最佳运输成本。与经过中度平方误差培训的传统神经网络相比,降水预测生成了视觉现实的降水场,并具有微小结构。即使降水预测网络在常规评估指标上表现略低。WGAN的熟知评论家认为,降水降水降水量指标与人类的观念现实性密切相关。基于案例的分析表明,评分中的巨大差异有助于识别参考数据中不切实际的WGAN产出和潜在文物。这些研究结果表明,WGAN框架不仅改善了降水降水降水降水的想象现实主义,而且还为评估和质量控制降水数据集提供了新的视角。
Article 219
Title@2025-07-23 (3): First, Learn What You Don’t Know: Active Information Gathering for Driving at the Limits of Handling
Title: First, Learn What You Don’t Know: Active Information Gathering for Driving at the Limits of Handling | Zuerst erfahren Sie, was Sie nicht wissen: Aktive Informationen sammeln für das Fahren an den Grenzen der Handhabung | 首先,学习你不知道的东西:为在处理的极限驾驶而积极收集信息 2411.00107v2 |
Authors (7): Alexander Davydov, Franck Djeumou, Marcus Greiff, Makoto Suminaka, Michael Thompson, John Subosits, Thomas Lew
Combining data-driven models that adapt online and model predictive control (MPC) has enabled effective control of nonlinear systems. However, when deployed on unstable systems, online adaptation may not be fast enough to ensure reliable simultaneous learning and control. For example, a controller on a vehicle executing highly dynamic maneuvers, such as drifting to avoid an obstacle, may push the vehicle’s tires to their friction limits, destabilizing the vehicle and allowing modeling errors to quickly compound and cause a loss of control. To address this challenge, we present an active information gathering framework for identifying vehicle dynamics as quickly as possible. We propose an expressive vehicle dynamics model that leverages Bayesian last-layer meta-learning to enable rapid online adaptation. The model’s uncertainty estimates are used to guide informative data collection and quickly improve the model prior to deployment. Dynamic drifting experiments on a Toyota Supra show that (i) the framework enables reliable control of a vehicle at the edge of stability, (ii) online adaptation alone may not suffice for zero-shot control and can lead to undesirable transient errors or spin-outs, and (iii) active data collection helps achieve reliable performance.
将在线和模型预测控制(MPC)的由数据驱动的模型结合起来,使得能够有效控制非线性系统。然而,如果在不稳定的系统上部署,在线适应可能不够快,无法确保可靠的同时学习和控制。例如,对车辆进行高度动态操纵,如漂流,以避免出现障碍,可能将车辆轮胎推向摩擦极限,破坏车辆稳定,允许模型错误迅速复合并造成控制丧失。为了应对这一挑战,我们提出了一个积极的信息收集框架,以便尽快查明车辆动态。我们提议了一个清晰的车辆动态模型,利用Bayesian的上层元数据学习,以便能够快速在线适应。模型的不确定性估计用于指导信息化数据收集工作,并在部署之前迅速改进模型。丰田苏普拉的动态漂浮实验显示:(一) 框架能够可靠地控制处于稳定边缘的车辆,(二) 仅靠在线适应不足以进行零发控制,并可能导致不可取的中转错误或旋转尾出,以及(三) 积极数据收集有助于实现可靠的性业绩。
Article 220
Title@2025-07-23 (3): Constructing Optimal Noise Channels for Enhanced Robustness in Quantum Machine Learning
Title: Constructing Optimal Noise Channels for Enhanced Robustness in Quantum Machine Learning | Konstruieren von optimalen Lärmkanälen für verbesserte Robustheit im Quantum Machine Learning | 构建量子机器学习中增强强力的最佳噪音通道 2404.16417v2 |
Authors (3): David Winderl, Nicola Franco, Jeanette Miriam Lorenz
With the rapid advancement of Quantum Machine Learning (QML), the critical need to enhance security measures against adversarial attacks and protect QML models becomes increasingly evident. In this work, we outline the connection between quantum noise channels and differential privacy (DP), by constructing a family of noise channels which are inherently $\epsilon$-DP: $(\alpha, \gamma)$-channels. Through this approach, we successfully replicate the $\epsilon$-DP bounds observed for depolarizing and random rotation channels, thereby affirming the broad generality of our framework. Additionally, we use a semi-definite program to construct an optimally robust channel. In a small-scale experimental evaluation, we demonstrate the benefits of using our optimal noise channel over depolarizing noise, particularly in enhancing adversarial accuracy. Moreover, we assess how the variables $\alpha$ and $\gamma$ affect the certifiable robustness and investigate how different encoding methods impact the classifier’s robustness.
随着量子机器学习(QML)的快速发展,加强安全措施防止对抗性攻击和保护QML模式的迫切需要越来越明显。在这项工作中,我们通过建立一个本质上是$@epslon$-DP的噪音频道大家庭($(alpha,\gamma)-DP),概述了量子噪声频道和差别隐私(DP)之间的联系。通过这种方法,我们成功地复制了为分解和随机旋转频道观测到的美元-DP界限,从而肯定了我们框架的广泛普遍性。此外,我们使用半定型程序来构建一个最佳强健的频道。在一次小规模的实验评估中,我们展示了利用我们最佳噪音频道克服噪声除极化的好处,特别是提高对抗性噪音的准确性。此外,我们评估变量$\alpha$和$\gamma$如何影响可验证的坚固度,并调查不同的编码方法如何影响分类者的坚固度。
Article 221
Title@2025-07-23 (3): GenSelect: A Generative Approach to Best-of-N
Title: GenSelect: A Generative Approach to Best-of-N | GenSelect: Ein generativer Ansatz zum Best-of-N | GenSect: 产生最佳N型的方法 2507.17797v1 |
Authors (5): Shubham Toshniwal, Ivan Sorokin, Aleksander Ficek, Ivan Moshkov, Igor Gitman
Generative reward models with parallel sampling have enabled effective test-time scaling for reasoning tasks. Current approaches employ pointwise scoring of individual solutions or pairwise comparisons. However, pointwise methods underutilize LLMs’ comparative abilities, while pairwise methods scale inefficiently with larger sampling budgets. We introduce GenSelect, where the LLM uses long reasoning to select the best solution among N candidates. This leverages LLMs’ comparative strengths while scaling efficiently across parallel sampling budgets. For math reasoning, we demonstrate that reasoning models, such as QwQ and DeepSeek-R1-0528, excel at GenSelect, outperforming existing scoring approaches with simple prompting.
具有平行抽样的创用奖励模式使得能够有效地测试推理任务的时间比例。目前的方法采用有分数的个别解决办法评分或对称比较。但是,有分数的方法没有充分利用LLMs的比较能力,而有分数的方法与较大的采样预算相比却没有效率。我们引入了GenSelect, LLM利用长期推理在N候选人中选择最佳解决办法。这在平行采样预算之间利用LLMs的相对优势,同时有效推广。关于数学推理,我们证明,QwQ和DeepSeek-R1-0528等推理模型优于GenSelect,以简单快速的方式优于现有的评分方法。
Article 222
Title@2025-07-23 (3): SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics
Title: SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics | SToFM: ein Multi-Skala-Stiftungsmodell für räumliche Transkriptomik | SToFM:空间转换学多规模基础模型 2507.11588v2 |
Authors (6): Suyuan Zhao, Yizhen Luo, Ganbo Yang, Yan Zhong, Hao Zhou, Zaiqing Nie
Spatial Transcriptomics (ST) technologies provide biologists with rich insights into single-cell biology by preserving the spatial context of cells. Building foundational models for ST can significantly enhance the analysis of vast and complex data sources, unlocking new perspectives on the intricacies of biological tissues. However, modeling ST data is inherently challenging due to the need to extract multi-scale information from tissue slices containing vast numbers of cells. This process requires integrating macro-scale tissue morphology, micro-scale cellular microenvironment, and gene-scale gene expression profile. To address this challenge, we propose SToFM, a multi-scale Spatial Transcriptomics Foundation Model. SToFM first performs multi-scale information extraction on each ST slice, to construct a set of ST sub-slices that aggregate macro-, micro- and gene-scale information. Then an SE(2) Transformer is used to obtain high-quality cell representations from the sub-slices. Additionally, we construct SToCorpus-88M, the largest high-resolution spatial transcriptomics corpus for pretraining. SToFM achieves outstanding performance on a variety of downstream tasks, such as tissue region semantic segmentation and cell type annotation, demonstrating its comprehensive understanding of ST data through capturing and integrating multi-scale information.
空间转换技术(ST)为生物学家提供了对单细胞生物学的丰富洞察力,保护了细胞的空间环境。为ST建立基础模型可以大大增强对广泛而复杂的数据源的分析,打开关于生物组织复杂性的新视角。然而,由于需要从含有大量细胞的组织切片中提取多尺度信息,ST数据本身就具有挑战性。这一过程需要将大型组织形态学、微尺度细胞微型环境以及基因尺度基因表达剖析结合起来。为了应对这一挑战,我们建议SToFM(一个多尺度空间转换基础模型模型)。SToFM首先对每个ST切片进行多尺度的信息提取,以构建一套综合宏观、微和基因尺度信息的ST子切切片。然后,SE(2)变异器用于从子虱中获取高质量的细胞表象。此外,我们建造了最大高分辨率空间缩图集模型,用于培训前。SToFMM首先对每个ST切片进行多尺度的信息提取,以构建一套包含宏观、微型和基因尺度信息的ST子切切片。STRM(Stographal)将一系列数据整合成成成,并展示其跨级的跨级的跨区域数据,通过Stegradudestration(Stradududududududududududududududude),通过S&dedududududududududududududududude)的多级的跨结构部分取得一系列数据。
Article 223
Title@2025-07-23 (3): Enhancing Quantum Federated Learning with Fisher Information-Based Optimization
Title: Enhancing Quantum Federated Learning with Fisher Information-Based Optimization | Verbesserung des Quantum-Federated-Learnings mit Fisher Information-based Optimization | 利用基于Fisher信息的优化增强量子联邦学习 2507.17580v1 |
Authors (2): Amandeep Singh Bhatia, Sabre Kais
Federated Learning (FL) has become increasingly popular across different sectors, offering a way for clients to work together to train a global model without sharing sensitive data. It involves multiple rounds of communication between the global model and participating clients, which introduces several challenges such as high communication costs, heterogeneous client data, prolonged processing times, and increased vulnerability to privacy threats. In recent years, the convergence of federated learning and parameterized quantum circuits has sparked significant research interest, with promising implications for fields such as healthcare and finance. By enabling decentralized training of quantum models, it allows clients or institutions to collaboratively enhance model performance and outcomes while preserving data privacy. Fisher information can quantify the amount of information that a quantum state carries under parameter changes, thereby providing insight into its geometric and statistical properties, and we leverage this property to address the aforementioned challenges. In this work, we propose a Quantum Federated Learning (QFL) algorithm that makes use of the Fisher information computed on local client models, with data distributed across heterogeneous partitions. This approach identifies the critical parameters that significantly influence the quantum model’s performance, ensuring they are preserved during the aggregation process. Our research assessed the effectiveness and feasibility of QFL by comparing its performance against other variants, and exploring the benefits of incorporating Fisher information in QFL settings. Experimental results on ADNI and MNIST datasets demonstrate the effectiveness of our approach in achieving better performance and robustness against the quantum federated averaging method.
联邦学习联盟(FL)在不同部门越来越受欢迎,为客户提供了一个合作的方式,以在不共享敏感数据的情况下培训一个全球模式;它涉及全球模式与参与客户之间的多轮沟通,提出了通信成本高、客户数据不一、处理时间长和隐私威胁脆弱性增加等若干挑战;近年来,联邦学习联盟和参数化量子电路的趋同引起了巨大的研究兴趣,对医疗保健和金融等领域产生了有希望的影响;通过对量子模型进行分散化培训,它使客户或机构能够合作提高模型的绩效和成果,同时保护数据隐私;认识到渔业信息可以量化量子体状态在参数变化下带来的信息数量,从而深入了解其地理计量和统计特性;我们打算利用这种属性应对上述挑战;在这项工作中,我们建议采用量子学习联盟(QFL)算法算出利用根据当地客户模式计算的渔业信息,数据分布在各种差异区之间,从而产生很有希望的影响;这一方法确定了对量子模型绩效有重大影响的关键参数,确保它们在汇总过程中得到保存;我们的研究评估了一个量子质国家数据质量模型的准确性效益,并比较了我们FLFLQ的进度方法的成本效益。
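As a rough classical illustration of how Fisher information can steer aggregation, the sketch below computes an empirical diagonal Fisher estimate from per-sample gradients and uses it to weight each client's parameters coordinate-wise. This is an assumed stand-in (Fisher-weighted averaging on plain parameter vectors), not the paper's quantum-circuit procedure; all names are hypothetical.

```python
import numpy as np


def diagonal_fisher(per_sample_grads):
    """Empirical diagonal Fisher: mean of squared per-sample gradients."""
    grads = np.asarray(per_sample_grads, dtype=float)
    return np.mean(grads ** 2, axis=0)


def fisher_weighted_average(params, fishers, eps=1e-12):
    """Coordinate-wise average of client parameters, weighted by each
    client's Fisher estimate so high-information coordinates dominate."""
    params = np.asarray(params, dtype=float)    # shape (clients, dim)
    weights = np.asarray(fishers, dtype=float) + eps
    return (weights * params).sum(axis=0) / weights.sum(axis=0)
```

A coordinate that one client marks as informative (large Fisher value) is then kept close to that client's value, which is one simple reading of "preserving critical parameters during aggregation".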
Article 224
Title@2025-07-23 (3): Boosting Ray Search Procedure of Hard-label Attacks with Transfer-based Priors
Title: Boosting Ray Search Procedure of Hard-label Attacks with Transfer-based Priors | Förderung der Ray-Suche von Hard-Label-Angriffen mit transferbasierten Prioren | 采用基于转移的前期程序,推动对硬标签袭击的雷光搜索程序 2507.17577v1 |
Authors (4): Chen Ma, Xinjie Xu, Shuyu Cheng, Qi Xuan
One of the most practical and challenging types of black-box adversarial attacks is the hard-label attack, where only the top-1 predicted label is available. One effective approach is to search for the optimal ray direction from the benign image that minimizes the $\ell_p$-norm distance to the adversarial region. The unique advantage of this approach is that it transforms the hard-label attack into a continuous optimization problem. The objective function value is the ray’s radius, which can be obtained via binary search at a high query cost. Existing methods use a “sign trick” in gradient estimation to reduce the number of queries. In this paper, we theoretically analyze the quality of this gradient estimation and propose a novel prior-guided approach to improve ray search efficiency both theoretically and empirically. Specifically, we utilize the transfer-based priors from surrogate models, and our gradient estimators appropriately integrate them by approximating the projection of the true gradient onto the subspace spanned by these priors and random directions, in a query-efficient manner. We theoretically derive the expected cosine similarities between the obtained gradient estimators and the true gradient, and demonstrate the improvement achieved by incorporating priors. Extensive experiments on the ImageNet and CIFAR-10 datasets show that our approach significantly outperforms 11 state-of-the-art methods in terms of query efficiency.
黑盒对抗攻击中最实用也最具挑战性的类型之一是硬标签攻击,此时只能获得top-1预测标签。一种有效的方法是从良性图像出发搜索最优射线方向,使到对抗区域的$\ell_p$范数距离最小。这种方法的独特优势在于它将硬标签攻击转化为连续优化问题。目标函数值是射线的半径,可通过二分搜索获得,但查询成本很高。现有方法在梯度估计中使用“符号技巧”来减少查询次数。本文从理论上分析了这种梯度估计的质量,并提出一种新的先验引导方法,在理论和实验上提高射线搜索效率。具体而言,我们利用来自替代模型的基于迁移的先验,我们的梯度估计器以查询高效的方式,通过近似真实梯度在这些先验与随机方向所张成的子空间上的投影,将它们恰当地整合起来。我们从理论上推导了所得梯度估计器与真实梯度之间的期望余弦相似度,并证明了引入先验所带来的改进。在ImageNet和CIFAR-10数据集上的大量实验表明,我们的方法在查询效率方面显著优于11种最先进的方法。
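The two ingredients named in the abstract, the ray radius obtained by binary search and the "sign trick" that spends one query per random direction, can be sketched as follows. The oracle `is_adversarial`, the search bound `r_max`, and the smoothing scale `sigma` are assumptions for illustration; the prior-guided estimator proposed in the paper is not reproduced here.

```python
import numpy as np


def ray_radius(is_adversarial, x, direction, r_max=10.0, tol=1e-4):
    """Binary-search the smallest radius along `direction` at which the
    hard-label oracle reports an adversarial label (inf if none up to r_max)."""
    d = direction / np.linalg.norm(direction)
    if not is_adversarial(x + r_max * d):
        return float("inf")
    lo, hi = 0.0, r_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_adversarial(x + mid * d):
            hi = mid
        else:
            lo = mid
    return hi


def sign_gradient_estimate(is_adversarial, x, theta, r_theta,
                           n_dirs=20, sigma=1e-2, rng=None):
    """Sign trick: a single query per random direction u tells whether the
    perturbed direction reaches the adversarial region at the old radius."""
    rng = np.random.default_rng(0) if rng is None else rng
    grad = np.zeros_like(theta, dtype=float)
    for _ in range(n_dirs):
        u = rng.standard_normal(theta.shape)
        new = theta + sigma * u
        new = new / np.linalg.norm(new)
        # Adversarial at the old radius means the boundary is closer along `new`.
        sign = -1.0 if is_adversarial(x + r_theta * new) else 1.0
        grad += sign * u
    return grad / n_dirs
```

Each call to `ray_radius` costs roughly log2(r_max / tol) queries, which is why replacing it with one sign query per direction reduces the query budget.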
Article 225
Title@2025-07-23 (3): Federated Behavioural Planes: Explaining the Evolution of Client Behaviour in Federated Learning
Title: Federated Behavioural Planes: Explaining the Evolution of Client Behaviour in Federated Learning | Federated Behavioural Planes: Erklärung der Evolution des Kundenverhaltens im Federated Learning | 联邦计划:解释联邦学习中客户行为演变的原因 2405.15632v3 |
Authors (6): Dario Fenoglio, Gabriele Dominici, Pietro Barbiero, Alberto Tonda, Martin Gjoreski, Marc Langheinrich
Federated Learning (FL), a privacy-aware approach in distributed deep learning environments, enables many clients to collaboratively train a model without sharing sensitive data, thereby reducing privacy risks. However, enabling human trust and control over FL systems requires understanding the evolving behaviour of clients, whether beneficial or detrimental for the training, which still represents a key challenge in the current literature. To address this challenge, we introduce Federated Behavioural Planes (FBPs), a novel method to analyse, visualise, and explain the dynamics of FL systems, showing how clients behave under two different lenses: predictive performance (error behavioural space) and decision-making processes (counterfactual behavioural space). Our experiments demonstrate that FBPs provide informative trajectories describing the evolving states of clients and their contributions to the global model, thereby enabling the identification of clusters of clients with similar behaviours. Leveraging the patterns identified by FBPs, we propose a robust aggregation technique named Federated Behavioural Shields to detect malicious or noisy client models, thereby enhancing security and surpassing the efficacy of existing state-of-the-art FL defense mechanisms. Our code is publicly available on GitHub.
联邦学习联合会(FL)是分布式深层学习环境中的一种隐私意识方法,它使许多客户能够在不共享敏感数据的情况下合作培训模型,从而减少隐私风险。然而,使人类信任和控制FL系统需要了解客户不断变化的行为,无论是对培训有益还是有害,这仍然是当前文献中的一个关键挑战。为了应对这一挑战,我们引入了联邦行为计划(FBPs),这是分析、视觉化和解释FL系统动态的一种新颖方法,表明客户在两种不同透镜下的行为方式:预测性表现(行为空间错误)和决策过程(行为空间),我们的实验表明FBPs提供了信息性轨迹,描述客户不断变化的状态及其对全球模式的贡献,从而能够识别具有类似行为的客户群。我们利用FBPs确定的模式,提出了一种名为联邦行为盾牌的强健的集成技术,用以检测恶意或噪音客户模式,从而加强安全和超过现有先进的FL防御机制的功效。我们的代码在GiHub上公开提供。
Article 226
Title@2025-07-23 (3): A Physically Driven Long Short Term Memory Model for Estimating Snow Water Equivalent over the Continental United States
Title: A Physically Driven Long Short Term Memory Model for Estimating Snow Water Equivalent over the Continental United States | Ein physikalisch angetriebenes Langzeit-Speichermodell zur Schätzung von Schneewasser, das über den Kontinent Vereinigte Staaten äquivalent ist | 估算美国大陆等效雪水的物理驱动长长短期记忆模型 2504.20129v2 |
Authors (6): Arun M. Saranathan, Mahmoud Saeedimoghaddam, Brandon Smith, Deepthi Raghunandan, Grey Nearing, Craig Pelissier
Snow is an essential input for various land surface models. Seasonal snow estimates are available as snow water equivalent (SWE) from process-based reanalysis products or locally from in situ measurements. While the reanalysis products are computationally expensive and available at only fixed spatial and temporal resolutions, the in situ measurements are highly localized and sparse. To address these issues and enable the analysis of the effect of a large suite of physical, morphological, and geological conditions on the presence and amount of snow, we build a Long Short-Term Memory (LSTM) network, which is able to estimate the SWE based on time series input of the various physical/meteorological factors as well as static spatial/morphological factors. Specifically, this model breaks down the SWE estimation into two separate tasks: (i) a classification task that indicates the presence/absence of snow on a specific day and (ii) a regression task that indicates the height of the SWE on a specific day in the case of snow presence. The model is trained using physical/in situ SWE measurements from the SNOw TELemetry (SNOTEL) snow pillows in the western United States. We will show that trained LSTM models have a classification accuracy of $\geq 93\%$ for the presence of snow and a coefficient of correlation of $\sim 0.9$ concerning their SWE estimates. We will also demonstrate that the models can generalize both spatially and temporally to previously unseen data.
雪是各类陆面模型的重要输入。季节性积雪估计以雪水当量(SWE)的形式提供,来源于基于过程的再分析产品或当地的原位测量。再分析产品计算成本高,且只能以固定的空间和时间分辨率提供,而原位测量则高度局部化且稀疏。为解决这些问题,并分析大量物理、形态和地质条件对积雪存在与否及积雪量的影响,我们构建了一个长短期记忆(LSTM)网络,它能够根据各种物理/气象因素的时间序列输入以及静态的空间/形态因素来估计SWE。具体而言,该模型将SWE估计分解为两个独立任务:(i)分类任务,指示特定日期是否有积雪;(ii)回归任务,指示有积雪时特定日期的SWE高度。该模型使用美国西部SNOw TELemetry(SNOTEL)雪枕的物理/原位SWE测量数据进行训练。我们将表明,训练后的LSTM模型对积雪存在的分类准确率$\geq 93\%$,其SWE估计的相关系数为$\sim 0.9$。我们还将证明,这些模型在空间和时间上都能泛化到此前未见过的数据。
Article 227
Title@2025-07-23 (3): Scalable DC Optimization via Adaptive Frank-Wolfe Algorithms
Title: Scalable DC Optimization via Adaptive Frank-Wolfe Algorithms | Skalierbare DC-Optimierung über adaptive Frank-Wolfe Algorithmen | 通过适应性 Frank-Wolfe Algorithms 进行可缩放的DC优化 2507.17545v1 |
Authors (1): Sebastian Pokutta
We consider the problem of minimizing a difference of (smooth) convex functions over a compact convex feasible region $P$, i.e., $\min_{x \in P} f(x) - g(x)$, with smooth $f$ and Lipschitz continuous $g$. This computational study builds upon and complements the framework of Maskan et al. [2025] by integrating advanced Frank-Wolfe variants to reduce computational overhead. We empirically show that constrained DC problems can be efficiently solved using a combination of the Blended Pairwise Conditional Gradients (BPCG) algorithm [Tsuji et al., 2022] with warm-starting and the adaptive error bound from Maskan et al. [2025]. The result is a highly efficient and scalable projection-free algorithm for constrained DC optimization.
我们考虑在紧致凸可行域$P$上最小化两个(光滑)凸函数之差的问题,即$\min_{x \in P} f(x) - g(x)$,其中$f$光滑,$g$ Lipschitz连续。这项计算研究建立在Maskan et al. [2025]的框架之上并对其加以补充,通过整合先进的Frank-Wolfe变体来减少计算开销。我们通过实验表明,将Blended Pairwise Conditional Gradients(BPCG)算法[Tsuji et al., 2022]与热启动以及Maskan et al. [2025]的自适应误差界相结合,可以高效求解带约束的DC问题。其结果是一种高效且可扩展的、无需投影的带约束DC优化算法。
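The overall structure described above (linearize $g$ at the current point, then solve the resulting convex subproblem without projections) can be sketched on a toy instance. The sketch assumes $f(x) = \tfrac12\|x-a\|^2$, $g(x) = \tfrac{c}{2}\|x\|^2$, and $P$ the probability simplex, and it uses vanilla Frank-Wolfe with exact line search and warm starts in place of BPCG and the adaptive error bound; those substitutions are mine, chosen to keep the example short.

```python
import numpy as np


def dc_frank_wolfe(a, c, x0, outer=20, inner=50):
    """Minimize 0.5*||x-a||^2 - 0.5*c*||x||^2 over the probability simplex."""
    phi = lambda x: 0.5 * np.sum((x - a) ** 2) - 0.5 * c * np.sum(x ** 2)
    x = x0.astype(float).copy()
    history = [phi(x)]
    for _ in range(outer):
        s = c * x                      # gradient of g at the outer iterate
        for _ in range(inner):         # Frank-Wolfe on the surrogate f(x) - <s, x>
            grad = x - a - s
            v = np.zeros_like(x)
            v[np.argmin(grad)] = 1.0   # linear minimization oracle on the simplex
            d = v - x
            gap = -grad @ d            # Frank-Wolfe gap, zero at a surrogate optimum
            if gap <= 1e-12:
                break
            gamma = min(1.0, gap / (d @ d))  # exact line search for a quadratic
            x = x + gamma * d          # warm start: the next outer loop continues from x
        history.append(phi(x))
    return x, history
```

Because the linearized surrogate upper-bounds the DC objective and touches it at the outer iterate, every surrogate decrease is also a decrease of $f - g$, so `history` is non-increasing.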
Article 228
Title@2025-07-23 (3): Optimal differentially private kernel learning with random projection
Title: Optimal differentially private kernel learning with random projection | Optimales differenzielles privates Kernel-Lernen mit Zufallsprojektion | 以随机预测的方式进行最佳、有差别的私人内核学习 2507.17544v1 |
Authors (3): Bonwoo Lee, Cheolwoo Park, Jeongyoun Ahn
Differential privacy has become a cornerstone in the development of privacy-preserving learning algorithms. This work addresses optimizing differentially private kernel learning within the empirical risk minimization (ERM) framework. We propose a novel differentially private kernel ERM algorithm based on random projection in the reproducing kernel Hilbert space using Gaussian processes. Our method achieves minimax-optimal excess risk for both the squared loss and Lipschitz-smooth convex loss functions under a local strong convexity condition. We further show that existing approaches based on alternative dimension reduction techniques, such as random Fourier feature mappings or $\ell_2$ regularization, yield suboptimal generalization performance. Our key theoretical contribution also includes the derivation of dimension-free generalization bounds for objective perturbation-based private linear ERM – marking the first such result that does not rely on noisy gradient-based mechanisms. Additionally, we obtain sharper generalization bounds for existing differentially private kernel ERM algorithms. Empirical evaluations support our theoretical claims, demonstrating that random projection enables statistically efficient and optimally private kernel learning. These findings provide new insights into the design of differentially private algorithms and highlight the central role of dimension reduction in balancing privacy and utility.
差异隐私已成为发展隐私保护学习算法的基石。 这项工作涉及在实验风险最小化( ERM) 框架内优化差异私人核心学习。 我们提出基于利用 Gaussian 进程在复制核心空间生成的Hilbert 空间中随机投影的新颖、 差异私人核心机构风险管理算法。 我们的方法为平方损失和Lipschitz- smoot- smovex 损失功能带来了最小最大风险。 我们进一步表明, 以其他减少规模技术为基础的现有方法, 如随机的 Fourier 地貌特征映射或$\ell_2美元正规化, 产生非最佳的通用性绩效。 我们的主要理论贡献还包括为客观的扰动性私人线性机构风险管理设定无维通用约束的衍生值。 这是不依赖噪音的梯度机制的第一个此类结果。 此外, 我们获得了现有差异性私人核心机构风险管理算法的更精确的概括性约束。 实证评估支持我们的新理论主张, 证明随机性预测有助于统计效率和优化的私人核心视野设计中的差异性定位, 以及中央范围研究这些解释。
Article 229
Title@2025-07-23 (3): Clustering-based hard negative sampling for supervised contrastive speaker verification
Title: Clustering-based hard negative sampling for supervised contrastive speaker verification | Clustering-basierte harte Negativprobenahme für überwachte kontrastive Lautsprecherprüfung | 分组制硬底抽样,用于有监督的对比式发言者核查 2507.17540v1 |
Authors (5): Piotr Masztalski, Michał Romaniuk, Jakub Żak, Mateusz Matuszewski, Konrad Kowalczyk
In speaker verification, contrastive learning is gaining popularity as an alternative to the traditionally used classification-based approaches. Contrastive methods can benefit from an effective use of hard negative pairs, which are different-class samples that are particularly challenging for a verification model due to their similarity. In this paper, we propose CHNS, a clustering-based hard negative sampling method dedicated to supervised contrastive speaker representation learning. Our approach clusters embeddings of similar speakers, and adjusts batch composition to obtain an optimal ratio of hard and easy negatives during contrastive loss calculation. Experimental evaluation shows that CHNS outperforms a baseline supervised contrastive approach with and without loss-based hard negative sampling, as well as a state-of-the-art classification-based approach to speaker verification, by as much as 18% relative EER and minDCF on the VoxCeleb dataset using two lightweight model architectures.
在语音校验中,对比式学习作为一种传统使用的基于分类的方法的替代方法越来越受欢迎。对比式方法可以受益于对硬式负对的有效利用,硬式负对的样本是不同类别的样本,由于其相似性对核查模式特别具有挑战性。在本文中,我们建议使用基于集群的硬式负面抽样方法CHNS(一种基于集群的硬式否定式抽样方法),专门用于监督的对比式演讲人代表制学习。我们的方法组群包含类似的发言者,并调整批量组成,以便在对比式损失计算中获得硬式和易式否定的最佳比率。实验性评估表明,CHNS(CHNS)优于一种有监督的、有监督的、无基于损失的硬式负面抽样的基线对比方法,以及在VoxCeleb数据集中采用两种轻量模型结构对发言者进行高达18%的相对EER和minDCF的最先进的分类法。
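A minimal sketch of the batch-composition idea: given a cluster id per speaker (assumed to come from some clustering of speaker embeddings, e.g. k-means), draw a fixed share of the batch from the anchor speaker's own cluster as hard negatives and fill the rest from other clusters. The function name, the `hard_ratio` parameter, and the one-anchor-per-batch layout are hypothetical simplifications rather than the paper's exact procedure.

```python
import numpy as np


def sample_batch(cluster_of, anchor, batch_size, hard_ratio, rng):
    """Return speaker indices for one batch: the anchor, hard negatives
    from the anchor's cluster, and easy negatives from other clusters."""
    cluster_of = np.asarray(cluster_of)
    speakers = np.arange(len(cluster_of))
    same = speakers[(cluster_of == cluster_of[anchor]) & (speakers != anchor)]
    other = speakers[cluster_of != cluster_of[anchor]]
    n_hard = min(len(same), int(round(hard_ratio * (batch_size - 1))))
    n_easy = min(len(other), batch_size - 1 - n_hard)
    hard = rng.choice(same, size=n_hard, replace=False)
    easy = rng.choice(other, size=n_easy, replace=False)
    return np.concatenate([[anchor], hard, easy])
```

Tuning `hard_ratio` is the knob that corresponds to the "ratio of hard and easy negatives" mentioned in the abstract.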
Article 230
Title@2025-07-23 (3): CoCAI: Copula-based Conformal Anomaly Identification for Multivariate Time-Series
Title: CoCAI: Copula-based Conformal Anomaly Identification for Multivariate Time-Series | CoCAI: Copula-basierte konforme Anomalien-Identifikation für multivariate Zeitreihen | COCAI:多变时间序列的常规异常识别 2507.17796v1 |
Authors (5): Nicholas A. Pearson, Francesca Zanello, Davide Russo, Luca Bortolussi, Francesca Cairoli
We propose a novel framework that harnesses the power of generative artificial intelligence and copula-based modeling to address two critical challenges in multivariate time-series analysis: delivering accurate predictions and enabling robust anomaly detection. Our method, Copula-based Conformal Anomaly Identification for Multivariate Time-Series (CoCAI), leverages a diffusion-based model to capture complex dependencies within the data, enabling high quality forecasting. The model’s outputs are further calibrated using a conformal prediction technique, yielding predictive regions which are statistically valid, i.e., cover the true target values with a desired confidence level. Starting from these calibrated forecasts, robust outlier detection is performed by combining dimensionality reduction techniques with copula-based modeling, providing a statistically grounded anomaly score. CoCAI benefits from an offline calibration phase that allows for minimal overhead during deployment and delivers actionable results rooted in established theoretical foundations. Empirical tests conducted on real operational data derived from water distribution and sewerage systems confirm CoCAI’s effectiveness in accurately forecasting target sequences of data and in identifying anomalous segments within them.
我们提议了一个新框架,利用基因人工智能和以合金为基础的模型的力量,应对多变时间序列分析中的两大挑战:提供准确的预测,并促成有力的异常检测。我们的方法,即基于 Copula 的多变时间序列共正异常识别(CoCAI),利用基于扩散的模型来捕捉数据中复杂的依赖性,从而能够进行高质量的预报。模型的产出进一步使用符合逻辑的预测技术加以校准,产生统计上有效的预测区域,即以预期的信任度覆盖真正的目标值。从这些校准的预测开始,强有力的外部检测是通过将维度减少技术与基于Copula的模型相结合,提供基于统计的异常分数。 CoCAI从一个离线校准阶段中受益,这一阶段使得部署期间的间接费用最小化,并带来基于既定理论基础的可操作的结果。对从水分配和下水道系统得出的实际操作数据进行的经验性测试证实了CAI在准确预测数据目标序列和确定数据内部的异常部分方面的有效性。
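The calibration step mentioned above can be illustrated with standard split conformal prediction: absolute residuals on a held-out calibration set give a radius such that forecast ± radius covers the target with probability at least 1 − α under exchangeability. This sketch covers only the generic conformal part for a scalar forecast; the diffusion forecaster and the copula-based anomaly score are not shown, and the function name is an assumption.

```python
import numpy as np


def conformal_radius(cal_residuals, alpha=0.1):
    """Split-conformal interval half-width from calibration residuals."""
    scores = np.sort(np.abs(np.asarray(cal_residuals, dtype=float)))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the conformal quantile
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return float(scores[k - 1])
```

Since the radius depends only on the calibration set, it can be computed offline once, which is consistent with the minimal deployment overhead the abstract describes.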
Article 231
Title@2025-07-23 (3): Federated Majorize-Minimization: Beyond Parameter Aggregation
Title: Federated Majorize-Minimization: Beyond Parameter Aggregation | Föderierte Majorize-Minimierung: Jenseits der Parameteraggregation | 联邦多数-私有化:超越参数聚合 2507.17534v1 |
Authors (4): Aymeric Dieuleveut, Gersende Fort, Mahmoud Hegazy, Hoi-To Wai
This paper proposes a unified approach for designing stochastic optimization algorithms that robustly scale to the federated learning setting. Our work studies a class of Majorize-Minimization (MM) problems, which possesses a linearly parameterized family of majorizing surrogate functions. This framework encompasses (proximal) gradient-based algorithms for (regularized) smooth objectives, the Expectation Maximization algorithm, and many problems seen as variational surrogate MM. We show that our framework motivates a unifying algorithm called Stochastic Approximation Stochastic Surrogate MM (SSMM), which includes previous stochastic MM procedures as special instances. We then extend SSMM to the federated setting, while taking into consideration common bottlenecks such as data heterogeneity, partial participation, and communication constraints; this yields QSMM. The originality of QSMM is to learn locally and then aggregate information characterizing the surrogate majorizing function, contrary to classical algorithms which learn and aggregate the original parameter. Finally, to showcase the flexibility of this methodology beyond our theoretical setting, we use it to design an algorithm for computing optimal transport maps in the federated setting.
本文提出了一种统一的方法,用于设计能够稳健扩展到联邦学习环境的随机优化算法。我们研究一类Majorize-Minimization(MM)问题,这类问题具有线性参数化的优化上界替代函数族。该框架涵盖用于(正则化)光滑目标的(近端)梯度类算法、期望最大化算法,以及许多可视为变分替代MM的问题。我们表明,该框架引出了一种统一的算法,称为Stochastic Approximation Stochastic Surrogate MM(SSMM),它将以往的随机MM方法作为特例包含在内。随后,我们将SSMM扩展到联邦环境,同时考虑数据异质性、部分参与和通信约束等常见瓶颈,由此得到QSMM。QSMM的独创性在于在本地学习并聚合刻画替代上界函数的信息,而经典算法学习并聚合的是原始参数。最后,为了展示该方法在理论设定之外的灵活性,我们用它设计了一种在联邦环境中计算最优传输映射的算法。
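Expectation Maximization is one of the MM instances the abstract lists, and it gives a compact picture of what "aggregating the surrogate rather than the parameter" means: clients send expected sufficient statistics, and the server minimizes the aggregated surrogate. The sketch below assumes a 1-D Gaussian mixture with known unit variance, full participation, and no compression, so it illustrates the principle only and is not QSMM itself.

```python
import numpy as np


def local_statistics(x, means, weights, var=1.0):
    """E-step on one client: expected sufficient statistics, shape (2, K)."""
    log_p = -0.5 * (x[:, None] - means[None, :]) ** 2 / var + np.log(weights)[None, :]
    resp = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)
    return np.array([resp.sum(axis=0), (resp * x[:, None]).sum(axis=0)])


def server_update(client_stats):
    """Sum the clients' surrogate statistics, then minimize the aggregated
    surrogate in closed form (the M-step)."""
    counts, sums = np.sum(client_stats, axis=0)
    return sums / counts, counts / counts.sum()
```

Because the statistics are additive, one round over partitioned data matches a centralized EM step on the pooled data, which averaging the clients' locally fitted means would not guarantee under heterogeneous partitions.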
Article 232
Title@2025-07-23 (3): HiFi-Stream: Streaming Speech Enhancement with Generative Adversarial Networks
Title: HiFi-Stream: Streaming Speech Enhancement with Generative Adversarial Networks | HiFi-Stream: Streaming-Sprachverbesserung mit generativen Adversarial-Netzwerken | HiFi-Stream:利用创性反对性网络加强语音交流 2503.17141v2 |
Authors (2): Ekaterina Dmitrieva, Maksim Kaledin
Speech enhancement techniques have become core technologies in mobile devices and voice software. Still, modern deep learning solutions often require a large amount of computational resources, which makes their use on low-resource devices challenging. We present HiFi-Stream, an optimized version of the recently published HiFi++ model. Our experiments demonstrate that HiFi-Stream preserves most of the quality of the original model while reducing size and computational complexity relative to the original HiFi++, making it one of the smallest and fastest models available. The model is evaluated in a streaming setting, where it demonstrates superior performance in comparison to modern baselines.
语音增强技术已成为移动设备和语音软件的核心技术,然而,现代深层学习解决方案往往需要大量计算资源,这使得其在低资源设备上的使用具有挑战性。我们介绍了最近出版的HiFi++模型的优化版本HiFi-Stream。我们的实验表明,HiFi-Stream保存了原模型的大部分质量,尽管其规模和计算复杂性与原HiFi++相比有所改善,使其成为现有最小和最快的模型之一。该模型在流程设置中进行了评估,显示其与现代基线相比的优异性。
Article 233
Title@2025-07-23 (3): Channel Estimation for RIS-Assisted mmWave Systems via Diffusion Models
Title: Channel Estimation for RIS-Assisted mmWave Systems via Diffusion Models | Kanalschätzung für RIS-gestützte mmWave-Systeme über Diffusionsmodelle | 通过扩散模型对RIS-辅助毫米防波系统的通道估计 2506.07770v2 |
Authors (9): Yang Wang, Yin Xu, Cixiao Zhang, Zhiyong Chen, Mingzeng Dai, Haiming Wang, Bingchao Liu, Dazhi He, Meixia Tao
Reconfigurable intelligent surface (RIS) has been recognized as a promising technology for next-generation wireless communications. However, the performance of RIS-assisted systems critically depends on accurate channel state information (CSI). To address this challenge, this letter proposes a novel channel estimation method for RIS-aided millimeter-wave (mmWave) systems based on diffusion models (DMs). Specifically, the forward diffusion process of the original signal is formulated to model the received signal as a noisy observation within the framework of DMs. Subsequently, the channel estimation task is formulated as the reverse diffusion process, and a sampling algorithm based on denoising diffusion implicit models (DDIMs) is developed to enable effective inference. Furthermore, a lightweight neural network, termed BRCNet, is introduced to replace the conventional U-Net, significantly reducing the number of parameters and computational complexity. Extensive experiments conducted under various scenarios demonstrate that the proposed method consistently outperforms existing baselines.
重新配置智能表面(RIS)已被公认为是下一代无线通信的一种有希望的技术,然而,RIS辅助系统的性能严重依赖准确的频道状态信息(CSI),为了应对这一挑战,本信提议以扩散模型(DMs)为基础,为RIS辅助毫米波(mmWave)系统采用新的频道估计方法。具体地说,最初信号的前方扩散过程是为了在MDS框架内将接收的信号作为噪音观测模型来建模。随后,频道估计任务被拟订为反向扩散进程,并开发了基于分辨扩散隐含模型(DDIMs)的取样算法,以便能够有效地推断。此外,还引入了称为BRCNet的轻量神经网络,以取代传统的U-Net,大大减少参数和计算复杂性。在各种假设下进行的广泛实验表明,拟议的方法始终超越了现有基线。
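For reference, the deterministic DDIM update (η = 0) that such a sampler iterates can be written in a few lines. Here `eps_pred` stands for the output of a noise-prediction network (BRCNet in the letter, treated as a black box here) and `alpha_bar_*` are cumulative noise-schedule products; how the received pilot signal is mapped to a diffusion step is specific to the letter and not reproduced.

```python
import numpy as np


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) from step t to the previous step."""
    # Estimate the clean signal implied by the current noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Re-noise that estimate to the (lower) noise level of the previous step.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred
```

If the noise prediction is exact and the previous step is noise-free (`alpha_bar_prev = 1`), the update returns the clean signal, which is a quick sanity check for an implementation.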
Article 234
Title@2025-07-23 (3): Sampling-enabled scalable manifold learning unveils discriminative cluster structure of high-dimensional data
Title: Sampling-enabled scalable manifold learning unveils discriminative cluster structure of high-dimensional data | Samplingfähiges skalierbares, vielfältiges Lernen enthüllt diskriminative Clusterstruktur hochdimensionaler Daten | 抽样式可扩缩、可扩缩的多元学习揭开高维数据的歧视性集群结构 2401.01100v3 |
Authors (7): Dehua Peng, Zhipeng Gui, Wenzhang Wei, Fa Li, Jie Gui, Huayi Wu, Jianya Gong
As a pivotal branch of machine learning, manifold learning uncovers the intrinsic low-dimensional structure within complex nonlinear manifolds in high-dimensional space for visualization, classification, clustering, and gaining key insights. Although existing techniques have achieved remarkable successes, they suffer from extensive distortions of cluster structure, which hinders the understanding of underlying patterns. Scalability issues also limit their applicability for handling large-scale data. We hence propose a sampling-based Scalable manifold learning technique that enables Uniform and Discriminative Embedding, namely SUDE, for large-scale and high-dimensional data. It starts by seeking a set of landmarks to construct the low-dimensional skeleton of the entire data, and then incorporates the non-landmarks into the learned space based on the constrained locally linear embedding (CLLE). We empirically validated the effectiveness of SUDE on synthetic datasets and real-world benchmarks, and applied it to analyze single-cell data and detect anomalies in electrocardiogram (ECG) signals. SUDE exhibits a distinct advantage in scalability with respect to data size and embedding dimension, and shows promising performance in cluster separation, integrity, and global structure preservation. The experiments also demonstrate notable robustness in embedding quality as the sampling rate decreases.
作为机器学习的关键分支,多方面的学习揭示了在高维空间的复杂非线性构件中内在的低维结构,以便进行可视化、分类、集群和获得关键洞察力。虽然现有技术取得了显著的成功,但它们受到群集结构的广泛扭曲,这妨碍了对基本模式的理解。可缩放问题也限制了其对处理大规模数据的适用性。因此,我们提出了一种基于取样的可缩放的多元学习技术,使大规模和高维数据的统一和分裂性嵌入系统,即SUDE能够对大规模和高维数据进行统一和分裂性嵌入。SUDE首先寻求一系列里程碑,以构建整个数据的低维层骨架,然后将非地标纳入基于有限本地线性嵌入(CLLE)的学习空间。我们实证了SUDE在合成数据集和实际世界基准方面的有效性,并应用它来分析单细胞数据并检测电卡图信号中的异常现象。SUDE在数据大小和嵌入层面方面表现出明显的优势,并且在集群分离、完整性、稳健的全球结构保存率中表现出良好的业绩。
Article 235
Title@2025-07-23 (3): Generalized Advantage Estimation for Distributional Policy Gradients
Title: Generalized Advantage Estimation for Distributional Policy Gradients | Generalisierte Vorteil Schätzung für Verteilungspolitik Gradienten | 分配政策梯度一般有利因素估计 2507.17530v1 |
Authors (3): Shahil Shaik, Jonathon M. Smereka, Yue Wang
Generalized Advantage Estimation (GAE) has been used to mitigate the computational complexity of reinforcement learning (RL) by employing an exponentially weighted estimation of the advantage function to reduce the variance in policy gradient estimates. Despite its effectiveness, GAE is not designed to handle value distributions integral to distributional RL, which can capture the inherent stochasticity in systems and is hence more robust to system noises. To address this gap, we propose a novel approach that utilizes the optimal transport theory to introduce a Wasserstein-like directional metric, which measures both the distance and the directional discrepancies between probability distributions. Using the exponentially weighted estimation, we leverage this Wasserstein-like directional metric to derive distributional GAE (DGAE). Similar to traditional GAE, our proposed DGAE provides a low-variance advantage estimate with controlled bias, making it well-suited for policy gradient algorithms that rely on advantage estimation for policy updates. We integrated DGAE into three different policy gradient methods. Algorithms were evaluated across various OpenAI Gym environments and compared with the baselines with traditional GAE to assess the performance.
通用优势估计法(GAE)已被用于减轻强化学习(RL)的计算复杂性,其方法是对优势功能进行指数加权估计,以减少政策梯度估计的差异。尽管具有效力,但GAE的设计目的不是处理分布式RL所不可或缺的价值分配,因为它可以捕捉系统中固有的随机性,因此对系统噪音更为有力。为弥补这一差距,我们提议采用一种新的方法,利用最佳运输理论引入瓦塞斯坦式方向性指标,以衡量概率分布之间的距离和方向差异。我们利用指数加权估计法,利用瓦塞斯坦式的方向性指示性指标获取分布式GAE(DGAE)。与传统的GAE相似,我们提议的DGAE提供了一种低差异性优势估计,具有受控的偏差,因此很适合政策梯度算法,依靠对政策更新的优势估计。我们将DGAEE纳入三种不同的政策梯度方法。在各种OpenAI Gym环境中进行了评估,并与传统的GAE的基线比较。
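For readers unfamiliar with the baseline being generalized, the classical scalar GAE recursion is short enough to state in code: the advantage is an exponentially weighted sum of TD residuals, computed backwards over a trajectory. This is the standard estimator, not the distributional DGAE variant proposed in the paper; the argument layout (a bootstrap value appended to `values`) is a convention chosen for this sketch.

```python
import numpy as np


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Classical GAE. `values` has len(rewards) + 1 entries; the last one
    is the bootstrap value of the state after the final reward."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    adv = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running                 # weight (gamma*lam)^l
        adv[t] = running
    return adv
```

Setting `lam=0` recovers the one-step TD advantage (low variance, more bias), while `lam=1` recovers the Monte Carlo return minus the value baseline (no bootstrap bias, more variance).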
Article 236
Title@2025-07-23 (3): Generalized Low-Rank Matrix Contextual Bandits with Graph Information
Title: Generalized Low-Rank Matrix Contextual Bandits with Graph Information | Generalisierte Low-Rank Matrix Kontextuelle Banditen mit Graph Information | 带有图信息的广义低秩矩阵上下文老虎机 2507.17528v1 |
Authors (5): Yao Wang, Jiannan Li, Yue Kang, Shanxing Gao, Zhenxin Xiao
The matrix contextual bandit (CB), as an extension of the well-known multi-armed bandit, is a powerful framework that has been widely applied in sequential decision-making scenarios involving low-rank structure. In many real-world scenarios, such as online advertising and recommender systems, additional graph information often exists beyond the low-rank structure; that is, the similarity relationships among users/items can be naturally captured through the connectivity among nodes in the corresponding graphs. However, existing matrix CB methods fail to explore such graph information, which makes it difficult for them to generate effective decision-making policies. To fill this void, we propose in this paper a novel matrix CB algorithmic framework that builds upon the classical upper confidence bound (UCB) framework. This new framework can effectively integrate both the low-rank structure and graph information in a unified manner. Specifically, it involves first solving a joint nuclear norm and matrix Laplacian regularization problem, followed by the implementation of a graph-based generalized linear version of the UCB algorithm. Rigorous theoretical analysis demonstrates that our procedure outperforms several popular alternatives in terms of cumulative regret bound, owing to the effective utilization of graph information. A series of synthetic and real-world data experiments are conducted to further illustrate the merits of our procedure.
矩阵背景土匪(CB)是众所周知的多武装土匪的延伸,是一个强大的框架,在涉及低层次结构的连续决策情景中广泛应用。在许多现实世界情景中,如在线广告和建议系统,额外的图表信息往往存在于低层次结构之外,即用户/项目之间的类似关系可以通过相应图表中的节点之间的连接自然地捕捉。然而,现有的矩阵CB方法未能探索这种图表信息,从而难以制定有效的决策政策。为了填补这一空白,我们在本文件中提议了一个新的矩阵CB算法框架,以传统的高信任约束框架为基础。这个新框架可以统一地有效地将低层次结构和图形信息结合起来。具体地说,它涉及首先解决联合核规范和拉帕帕卡的规范化问题,然后实施基于图表的通用直线版UCB算法。严格的理论分析表明,我们的程序在累积性遗憾方面优劣了几个流行的替代方法,因为有效地利用了图形信息,而合成数据系列则是我们进行的实际实验的优点。
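One plausible shape for the "joint nuclear norm and matrix Laplacian regularization problem" is a data-fit term plus a nuclear-norm penalty (encouraging low rank) plus a Laplacian quadratic form (encouraging rows of connected nodes to be similar). The sketch below only evaluates such an objective; the squared loss, the choice to apply the Laplacian to rows, and all names are assumptions, and the paper's generalized linear formulation may differ.

```python
import numpy as np


def regularized_objective(theta, contexts, rewards, laplacian, lam_nuc, lam_graph):
    """Squared-loss fit + nuclear-norm penalty + graph-smoothness penalty."""
    # Reward model assumed linear: r ~ <X, Theta> (trace inner product).
    fit = sum((r - np.sum(x * theta)) ** 2 for x, r in zip(contexts, rewards))
    nuclear = np.linalg.norm(theta, ord="nuc")       # sum of singular values
    smooth = np.trace(theta.T @ laplacian @ theta)   # small when linked rows agree
    return fit + lam_nuc * nuclear + lam_graph * smooth
```

The Laplacian term equals the sum over graph edges of squared differences between the corresponding rows of `theta`, which is how connectivity among users or items enters the estimate.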
Article 237
Title@2025-07-23 (3): Integrating Physics-Based and Data-Driven Approaches for Probabilistic Building Energy Modeling
Title: Integrating Physics-Based and Data-Driven Approaches for Probabilistic Building Energy Modeling | Integration physikbasierter und datengestützter Ansätze zur probabilistischen Gebäudeenergiemodellierung | 融合基于物理与数据驱动的方法进行概率建筑能耗建模 2507.17526v1 |
Authors (3): Leandro Von Krannichfeldt, Kristina Orehounig, Olga Fink
Building energy modeling is a key tool for optimizing the performance of building energy systems. Historically, a wide spectrum of methods has been explored, ranging from conventional physics-based models to purely data-driven techniques. Recently, hybrid approaches that combine the strengths of both paradigms have gained attention. These include strategies such as learning surrogates for physics-based models, modeling residuals between simulated and observed data, fine-tuning surrogates with real-world measurements, using physics-based outputs as additional inputs for data-driven models, and integrating the physics-based output into the loss function of the data-driven model. Despite this progress, two significant research gaps remain. First, most hybrid methods focus on deterministic modeling, often neglecting the inherent uncertainties caused by factors like weather fluctuations and occupant behavior. Second, there has been little systematic comparison within a probabilistic modeling framework. This study addresses these gaps by evaluating five representative hybrid approaches for probabilistic building energy modeling, focusing on quantile predictions of building thermodynamics in a real-world case study. Our results highlight two main findings. First, the performance of hybrid approaches varies across different building room types, but residual learning with a Feedforward Neural Network performs best on average. Notably, the residual approach is the only model that produces physically intuitive predictions when applied to out-of-distribution test data. Second, Quantile Conformal Prediction is an effective procedure for calibrating quantile predictions in the case of indoor temperature modeling.
建立能源模型是优化建设能源系统绩效的关键工具。历史上,人们一直在探索一系列广泛的方法 – – 从传统的基于物理的模型到纯粹的数据驱动技术。最近,结合这两种模式优势的混合方法得到了关注。最近,结合这两种模式优势的混合方法获得了关注。其中包括物理学模型的学习代用器、模拟数据和观测数据之间的剩余模型、以真实世界测量的微调代代用法进行微调,将基于物理的产出作为数据驱动模型的补充投入,并将基于物理的产出纳入数据驱动模型的损失函数。尽管取得了这一进展,但依然存在着两个重大的研究差距。首先,大多数混合方法侧重于确定性模型,往往忽视由气候波动和悬浮行为等因素造成的内在不确定性。第二,在概率模型框架内几乎没有进行系统比较,用真实数据模型微调来微调替代这些差距,通过评估五种具有代表性的混合方法来建立稳定性能源模型,侧重于在现实世界案例研究中建立热力动力学模型的微变压模型。我们的成果突出两个主要的模型发现。首先,混合方法的运行情况是稳定性模型,在构建一种最佳的实地测试模型时,只有在模拟模型中进行。
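The residual strategy listed above has a simple structure: keep the physics-based prediction and train a data-driven model only on what it gets wrong. The sketch below uses ordinary least squares as the residual learner in place of the Feedforward Neural Network reported in the abstract, and predicts a point value rather than quantiles; both simplifications and all names are mine.

```python
import numpy as np


def fit_residual_model(features, physics_pred, observed):
    """Least-squares fit of (observed - physics_pred) on [features, 1]."""
    design = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(design, observed - physics_pred, rcond=None)
    return coef


def hybrid_predict(features, physics_pred, coef):
    """Physics-based prediction plus the learned residual correction."""
    design = np.column_stack([features, np.ones(len(features))])
    return physics_pred + design @ coef
```

Because the physics model still carries the bulk of the prediction, the hybrid falls back to physically plausible values when the learned correction is small, which may be one reason the abstract reports sensible out-of-distribution behavior for this approach.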
Article 238
Title@2025-07-23 (3): LSDM: LLM-Enhanced Spatio-temporal Diffusion Model for Service-Level Mobile Traffic Prediction
Title: LSDM: LLM-Enhanced Spatio-temporal Diffusion Model for Service-Level Mobile Traffic Prediction | LSDM: LLM-gesteigertes Spatio-temporales Diffusionsmodell für Service-Level-Mobilverkehrsvorhersage | LSDM:LLM-增强的用于服务级移动交通预测的时空传播模型 2507.17795v1 |
Authors (5): Shiyuan Zhang, Tong Li, Zhu Xiao, Hongyang Du, Kaibin Huang
Service-level mobile traffic prediction for individual users is essential for network efficiency and quality of service enhancement. However, current prediction methods are limited in their adaptability across different urban environments and produce inaccurate results due to the high uncertainty in personal traffic patterns, the lack of detailed environmental context, and the complex dependencies among different network services. These challenges demand advanced modeling techniques that can capture dynamic traffic distributions and rich environmental features. Inspired by the recent success of diffusion models in distribution modeling and Large Language Models (LLMs) in contextual understanding, we propose an LLM-Enhanced Spatio-temporal Diffusion Model (LSDM). LSDM integrates the generative power of diffusion models with the adaptive learning capabilities of transformers, augmented by the ability to capture multimodal environmental information for modeling service-level patterns and dynamics. Extensive evaluations on real-world service-level datasets demonstrate that the model excels in traffic usage predictions, showing outstanding generalization and adaptability. After incorporating contextual information via LLM, the performance improves by at least 2.83% in terms of the coefficient of determination. Compared to models of a similar type, such as CSDI, the root mean squared error can be reduced by at least 8.29%. The code and dataset will be available at: https://github.com/SoftYuaneR/LSDM.
然而,由于个人交通模式的高度不确定性、缺乏详细的环境背景以及不同网络服务之间复杂的依赖性,目前的预测方法在不同的城市环境中的适应性有限,并产生不准确的结果。这些挑战要求采用先进的模型技术,能够捕捉动态交通分布和丰富的环境特征。受最近分销模型推广模式的成功和根据背景理解的大型语言模型(LLLM)的启发,我们提议了一个LLM-Enhanced Spatio-stopal Dimotion motion 模型(LSDM)。LSDM将传播模型的基因化能力与变异器的适应性学习能力相结合,并辅之以为服务级别模式和动态建模获取多式环境信息的能力。对现实世界服务级数据集的广泛评估表明,模型在交通使用预测方面十分出色,显示了出色的一般化和适应性。在通过LMM将背景信息纳入背景信息后,在确定系数方面至少提高了2.83%的性能。在类似类型的模型中,可以比R-YDMDM/DMR/SDMR/SDMLO的模型更小一些。在平方根上是最低的,可以降低的。
Article 239
Title@2025-07-23 (3): Data-Driven Exploration for a Class of Continuous-Time Indefinite Linear–Quadratic Reinforcement Learning Problems
Title: Data-Driven Exploration for a Class of Continuous-Time Indefinite Linear–Quadratic Reinforcement Learning Problems | Daten-getriebene Exploration für eine Klasse von kontinuierlichen-Zeit-Unbestimmte Linear–Quadratische Verstärkung Lernprobleme | 连续-不定期线性-宽压强化学习问题分类数据探索 2507.00358v2 |
Authors (2): Yilie Huang, Xun Yu Zhou
We study reinforcement learning (RL) for the same class of continuous-time stochastic linear–quadratic (LQ) control problems as in \cite{huang2024sublinear}, where volatilities depend on both states and controls while states are scalar-valued and running control rewards are absent. We propose a model-free, data-driven exploration mechanism that adaptively adjusts entropy regularization by the critic and policy variance by the actor. Unlike the constant or deterministic exploration schedules employed in \cite{huang2024sublinear}, which require extensive tuning in implementation and ignore learning progress during iterations, our adaptive exploratory approach boosts learning efficiency with minimal tuning. Despite its flexibility, our method achieves a sublinear regret bound that matches the best-known model-free results for this class of LQ problems, which were previously derived only with fixed exploration schedules. Numerical experiments demonstrate that adaptive exploration accelerates convergence and improves regret performance compared to the non-adaptive model-free and model-based counterparts.
我们研究的是与在\cite{huang2024 sublinear}中相同的连续随机线性赤道控制问题(LQ)的强化学习(RL),在这些问题上,挥发性取决于国家和控制,而国家则缺乏卡路里价值和运行控制回报。我们建议了一个无模型的、数据驱动的探索机制,以适应性的方式调整批评者和行为方的政策差异,不同于在\cite{huang2024 sublinear}中使用的固定或确定性的勘探时间表,该时间表要求对执行过程进行广泛调整,忽视迭代期间的学习进展,我们的适应性探索方法提高了学习效率,尽管它具有灵活性,但我们的方法还是取得了亚线性遗憾,它与这一类LQ问题最著名的无模型的结果相匹配,而以前只有固定的勘探时间表才得出。Numerical实验表明,与非适应性模型和基于模型的对应方相比,适应性探索加快了趋同性并改进了遗憾表现。
Article 240
Title@2025-07-23 (3): HOTA: Hamiltonian framework for Optimal Transport Advection
Title: HOTA: Hamiltonian framework for Optimal Transport Advection | HOTA: Hamiltonsche Rahmenbedingungen für eine optimale Verkehrsanbindung | 汉密尔顿最佳交通评估框架 2507.17513v1 |
Authors (4): Nazar Buzun, Daniil Shlenskii, Maxim Bobrin, Dmitry V. Dylov
Optimal transport (OT) has become a natural framework for guiding probability flows. Yet, the majority of recent generative models assume trivial geometry (e.g., Euclidean) and rely on strong density-estimation assumptions, yielding trajectories that do not respect the true principles of optimality in the underlying manifold. We present Hamiltonian Optimal Transport Advection (HOTA), a Hamilton-Jacobi-Bellman based method that tackles the dual dynamical OT problem explicitly through Kantorovich potentials, enabling efficient and scalable trajectory optimization. Our approach effectively evades the need for explicit density modeling and performs well even when the cost functionals are non-smooth. Empirically, HOTA outperforms all baselines in standard benchmarks, as well as in custom datasets with non-differentiable costs, both in terms of feasibility and optimality.
最佳运输(OT)已成为指导概率流动的自然框架。然而,大多数最近的基因模型都假设了微不足道的几何(例如,欧几里德),并依赖强烈的密度估计假设,产生不遵守根本方方面面最佳性原则的轨迹。 我们介绍了汉密尔顿最佳交通评估(HOTA),这是以汉密尔顿-雅科比-贝勒曼为基础的一种方法,它通过康托罗维奇的潜力明确解决双重动态的OT问题,从而能够实现高效和可缩放的轨迹优化。 我们的方法有效地避免了明确密度模型的需求,即使成本功能不光滑,也能够运行。 随机地,HOMTA在标准基准中超越了所有基线,在定制数据集中也超越了在可行性和最佳性方面无差别的成本。
Article 241
Title@2025-07-23 (3): Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning
Title: Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning | Kann eine Domain anderen helfen? Eine Data-Centric Studie über Multi-Domain-Reasoning durch Verstärkungslernen | 一个域能帮助他人吗? 关于通过强化学习提供多领域理由的数据中心研究。 2507.17512v1 |
Authors (6): Yu Li, Zhuoshi Pan, Honglin Lin, Mengyuan Sun, Conghui He, Lijun Wu
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of LLMs. Existing research has predominantly concentrated on isolated reasoning domains such as mathematical problem-solving, coding tasks, or logical reasoning. However, real-world reasoning scenarios inherently demand an integrated application of multiple cognitive skills. Despite this, the interplay among these reasoning skills under reinforcement learning remains poorly understood. To bridge this gap, we present a systematic investigation of multi-domain reasoning within the RLVR framework, explicitly focusing on three primary domains: mathematical reasoning, code generation, and logical puzzle solving. We conduct a comprehensive study comprising four key components: (1) Leveraging the GRPO algorithm and the Qwen-2.5-7B model family, our study thoroughly evaluates the models’ in-domain improvements and cross-domain generalization capabilities when trained on single-domain datasets. (2) Additionally, we examine the intricate interactions, including mutual enhancements and conflicts, that emerge during combined cross-domain training. (3) To further understand the influence of SFT on RL, we also analyze and compare performance differences between base and instruct models under identical RL configurations. (4) Furthermore, we delve into critical RL training details, systematically exploring the impacts of curriculum learning strategies, variations in reward design, and language-specific factors. Through extensive experiments, our results offer significant insights into the dynamics governing domain interactions, revealing key factors influencing both specialized and generalizable reasoning performance. These findings provide valuable guidance for optimizing RL methodologies to foster comprehensive, multi-domain reasoning capabilities in LLMs.
现有研究主要集中于数学解决问题、编码任务或逻辑推理等孤立的推理领域。然而,现实世界推理假设本身要求综合应用多种认知技能。尽管如此,这些推理技能在强化学习过程中的相互作用仍然没有得到很好的理解。为了缩小这一差距,我们提出了对RLVR框架内多领域推理的系统调查,明确侧重于三个主要领域:数学推理、代码生成和逻辑解谜。我们开展了一项由四个关键部分组成的全面研究:(1)利用GROP逻辑算法和Qwen-2.5-7B模型组等孤立推理领域,我们的研究彻底评估了模型的日常改进和交叉概括能力。尽管如此,这些在强化学习过程中的推理技能之间的相互作用仍然不甚为清楚。为了弥合这一差距,我们研究了RL框架中的相互加强和冲突,明确侧重于数学推理、代码生成和逻辑解解谜题。 我们还分析并比较了基础和指示模型之间的不同性差异,根据相同的RL系统推理学模型,提出了在设计过程中的精细的精细度分析、对常规分析结果的精细分析结果。
Article 242
Title@2025-07-23 (3): Fake or Real: The Impostor Hunt in Texts for Space Operations
Title: Fake or Real: The Impostor Hunt in Texts for Space Operations | Fake or Real: Die Impostorjagd in Texten für Weltraumoperationen | 虚假或真实:空间业务文字中的伪造者猎杀 2507.13508v3 |
Authors (9): Agata Kaczmarek, Dawid Płudowski, Piotr Wilczyński, Krzysztof Kotowski, Ramez Shendy, Evridiki Ntagiou, Jakub Nalepa, Artur Janicki, Przemysław Biecek
The “Fake or Real” competition hosted on Kaggle (https://www.kaggle.com/competitions/fake-or-real-the-impostor-hunt) is the second part of a series of follow-up competitions and hackathons related to the “Assurance for Space Domain AI Applications” project funded by the European Space Agency (https://assurance-ai.space-codev.org/). The competition idea is based on two real-life AI security threats identified within the project – data poisoning and overreliance in Large Language Models. The task is to distinguish between the proper output of an LLM and the output generated under malicious modification of the LLM. As this problem has not been extensively researched, participants are required to develop new techniques to address this issue or adapt existing ones to this problem statement.
以Kaggle为主的“假或真”竞赛(https://www.kaggle.com/competitions/fake-or-real-the-impostor-hunt)是一系列后续竞赛和黑客赛的第二部分,与欧洲空间局资助的“支持空间域AI应用”项目(https://assurance-ai.space-codv.org/)有关。 竞争理念的基础是该项目中查明的两个真实的AI安全威胁 – – 数据中毒和大语言模型中的过度依赖。任务是区分LLM的适当产出和LLM恶意修改产生的产出。 由于这一问题没有得到广泛的研究,与会者必须开发新的技术来解决这一问题,或根据这一问题的说明调整已有的技术。
Article 243
Title@2025-07-23 (3): Graph Neural Network Approach to Predicting Magnetization in Quasi-One-Dimensional Ising Systems
Title: Graph Neural Network Approach to Predicting Magnetization in Quasi-One-Dimensional Ising Systems | Graphischer Ansatz des neuralen Netzwerks zur Vorhersage der Magnetisierung in Quasi-One-Dimensional Ising Systemen | Quasi-单一二元化离子系统中预测磁化的神经网络方法 2507.17509v1 |
Authors (3): V. Slavin, O. Kryvchikov, D. Laptev
We present a graph-based deep learning framework for predicting the magnetic properties of quasi-one-dimensional Ising spin systems. The lattice geometry is encoded as a graph and processed by a graph neural network (GNN) followed by fully connected layers. The model is trained on Monte Carlo simulation data and accurately reproduces key features of the magnetization curve, including plateaus, critical transition points, and the effects of geometric frustration. It captures both local motifs and global symmetries, demonstrating that GNNs can infer magnetic behavior directly from structural connectivity. The proposed approach enables efficient prediction of magnetization without the need for additional Monte Carlo simulations.
我们提出了一个基于图表的深度学习框架,用于预测准一维的Ising旋转系统的磁性特性。拉蒂几何制成图解,由图形神经网络(GNN)进行编码,然后用完全连接的层进行处理。模型接受蒙特卡洛模拟数据培训,并准确地复制磁化曲线的关键特征,包括高原、关键过渡点和几何挫败的影响。它既包括局部的模型,也包括全球的对称,表明GNN可以直接从结构连接中推断磁性行为。拟议的方法使得能够有效地预测磁化,而不需要额外的蒙特卡洛模拟。
Article 244
Title@2025-07-23 (3): Joint Multi-Target Detection-Tracking in Cognitive Massive MIMO Radar via POMCP
Title: Joint Multi-Target Detection-Tracking in Cognitive Massive MIMO Radar via POMCP | Gemeinsames Multi-Target-Erkennungs-Tracking im kognitiven Massiv MIMO Radar über POMCP | 通过POMCP在认知性大规模弥集性海事组织雷达上联合进行多目标多目标探测-跟踪 2507.17506v1 |
Authors (4): Imad Bouhou, Stefano Fortunati, Leila Gharsalli, Alexandre Renaux
This correspondence presents a power-aware cognitive radar framework for joint detection and tracking of multiple targets in a massive multiple-input multiple-output (MIMO) radar environment. Building on a previous single-target algorithm based on Partially Observable Monte Carlo Planning (POMCP), we extend it to the multi-target case by assigning each target an independent POMCP tree, enabling scalable and efficient planning. Departing from uniform power allocation, which is often suboptimal with varying signal-to-noise ratios (SNRs), our approach predicts each target’s future angular position and expected received power, based on its estimated range and radar cross-section (RCS). These predictions guide adaptive waveform design via a constrained optimization problem that allocates transmit energy to enhance the detectability of weaker or distant targets, while ensuring sufficient power for high-SNR targets. The reward function in the underlying partially observable Markov decision process (POMDP) is also modified to prioritize accurate spatial and power estimation. Simulations involving multiple targets with different SNRs confirm the effectiveness of our method. The proposed framework for the cognitive radar improves detection probability for low-SNR targets and achieves more accurate tracking compared to approaches using uniform or orthogonal waveforms. These results demonstrate the potential of the POMCP-based framework for adaptive, efficient multi-target radar systems.
该对应文件为在大规模多投入多输出(MIMO)雷达环境中联合探测和跟踪多个目标提供了一个能觉察和跟踪多目标的认知雷达框架。根据以前基于部分可观测的蒙特卡洛规划(POMCP)的单一目标算法,我们将其扩展至多目标情况,为每个目标指定了独立的POMCP树,从而能够进行可扩缩和有效的规划。从统一的权力分配(往往不理想,信号到噪音比率不尽相同)-我们的方法预测每个目标的未来角位置和预期获得的能量,基于其估计的射程和雷达交叉路段(RCS)。这些预测通过有限的优化问题引导适应波形设计,分配能量以加强较弱或较远目标的可探测性,同时确保高分辨率目标有足够的动力。部分可观测的Markov决策程序(POMDP)的奖励功能也作了修改,以便确定准确的空间和电量估计的先后顺序。与不同的SNRRs模拟涉及多个目标的模拟,证实了我们的方法的有效性。这些预测通过有限的优化的优化问题设计框架指导了适应性波形格式的设计设计,即利用认知雷达测测算法或更精确的分辨率测测算结果,以显示低空间和测算结果框架。
Article 245
Title@2025-07-23 (3): DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD
Title: DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD | DNT: ein tief normalisierter Transformer, der von Momentum SGD trainiert werden kann | DNT:一种可接受 “ 动力 “ SPGD培训的 “ 高度正常化 “ 变异器 2507.17501v1 |
Authors (7): Xianbiao Qi, Marco Chen, Wenjie Xiao, Jiaquan Ye, Yelin He, Chun-Guang Li, Zhouchen Lin
Transformers have become the de facto backbone of modern deep learning, yet their training typically demands an advanced optimizer with an adaptive learning rate, such as AdamW, rather than a momentum SGDW (mSGDW). Previous works show that this is mainly due to a heavy-tailed distribution of the gradients. In this paper, we introduce a Deeply Normalized Transformer (DNT), which is meticulously engineered to overcome this limitation, enabling seamless training with vanilla mSGDW while yielding performance comparable to Transformers trained via AdamW. Specifically, in DNT, we strategically integrate normalization techniques at proper positions in the Transformers to effectively modulate the Jacobian matrices of each layer, balance the influence of weights, activations, and their interactions, and thus make the gradient distributions concentrated. We provide both theoretical justifications of the normalization technique used in our DNT and extensive empirical evaluation on two popular Transformer architectures to validate that: a) DNT outperforms its counterparts (i.e., ViT and GPT), and b) DNT can be effectively trained with vanilla mSGDW.
改革者已成为现代深层学习的事实上的支柱,然而,他们的培训通常要求一种先进的优化,适应性学习率,如AdamW,而不是SGDW(MSDW)的动力。以前的工作表明,这主要是由于梯度的分布非常繁琐。在本文中,我们引入了一种高度正常化的变异器(DNT),它经过精心设计,以克服这一局限性,使香草混凝固的变异器能够进行无缝的培训,同时使通过亚当W培训的变异器产生类似的性能。具体地说,在DNT中,我们从战略上整合了变异器的适当位置上的正常化技术,以有效调节每个层的雅各基质矩阵,平衡重量、活化及其相互作用的影响,从而使得梯度的分布得以集中。我们为我们DNT使用的正常化技术提供了理论上的理由,并对两种流行的变异器结构进行了广泛的经验评价,以证实:(a) DNT比其对应方(\,VIT和GPT)和b)DNT可以有效地与Villa MSD进行训练。
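For readers unfamiliar with the optimizer the DNT abstract above contrasts with AdamW, the following is a minimal sketch of one momentum-SGD step with decoupled weight decay. It reads "SGDW" as SGD with decoupled weight decay, which is an assumption here; the abstract does not spell out the exact update rule, and the function name and hyperparameter values are hypothetical.

```python
import numpy as np


def msgdw_step(w, grad, buf, lr, momentum=0.9, weight_decay=0.0):
    """One heavy-ball momentum step with decoupled weight decay (assumed form).

    The same scalar step size is applied to every coordinate, in contrast
    to Adam-style optimizers, which rescale each coordinate by a running
    estimate of its gradient magnitude.
    """
    buf = momentum * buf + grad                 # accumulate the momentum buffer
    w = w - lr * buf - lr * weight_decay * w    # decay is applied to w directly
    return w, buf


# Toy usage: minimize 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([1.0, -2.0])
buf = np.zeros_like(w)
for _ in range(200):
    w, buf = msgdw_step(w, grad=w, buf=buf, lr=0.05, momentum=0.9, weight_decay=0.01)
final_norm = float(np.linalg.norm(w))
```

Because there is no per-coordinate rescaling, a few very large gradient entries can dominate the step, which is one intuition for why heavy-tailed gradients make this kind of optimizer harder to use.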
Article 246
Title@2025-07-23 (3): Fast post-process Bayesian inference with Variational Sparse Bayesian Quadrature
Title: Fast post-process Bayesian inference with Variational Sparse Bayesian Quadrature | Schnelle post-process Bayesische Schlussfolgerung mit Variational Sparse Bayesische Quadratur | 贝叶斯推断法与变异的斯帕鲁贝伊斯二次夸度 2303.05263v4 |
Authors (4): Chengkun Li, Grégoire Clarté, Martin Jørgensen, Luigi Acerbi
In applied Bayesian inference scenarios, users may have access to a large number of pre-existing model evaluations, for example from maximum-a-posteriori (MAP) optimization runs. However, traditional approximate inference techniques make little to no use of this available information. We propose the framework of post-process Bayesian inference as a means to obtain a quick posterior approximation from existing target density evaluations, with no further model calls. Within this framework, we introduce Variational Sparse Bayesian Quadrature (VSBQ), a method for post-process approximate inference for models with black-box and potentially noisy likelihoods. VSBQ reuses existing target density evaluations to build a sparse Gaussian process (GP) surrogate model of the log posterior density function. Subsequently, we leverage sparse-GP Bayesian quadrature combined with variational inference to achieve fast approximate posterior inference over the surrogate. We validate our method on challenging synthetic scenarios and real-world applications from computational neuroscience. The experiments show that VSBQ builds high-quality posterior approximations by post-processing existing optimization traces, with no further model evaluations.
在应用的Bayesian 推断假设中,用户可能有机会获得大量先前存在的模型评价,例如从最大黑箱和潜在噪音可能性模型的后处理近似推理方法;然而,传统的近似推理技术很少或根本没有利用这种现有信息;我们提议了后处理巴耶斯推断框架,作为从现有目标密度评价中获得快速后近似近似,而没有进一步的模型呼叫;在此框架内,我们采用Variational Sparse Bayesian 二次曲线(VSBQ),这是黑箱和潜在噪音可能性模型的后处理近似推理方法;VSBQ再利用现有目标密度评价,以建立稀少的Gossian进程(GP)代金模型,以构建日志后密度函数的稀释模型;随后,我们利用稀有-GP Bayesian 二次曲线和变异推论,以达到近似近似后推论;我们验证了我们质疑合成假设和从计算神经科学中实际应用的方法。实验显示,VSBQ的目前目标密度评价是没有高质量的后视镜。
Article 247
Title@2025-07-23 (3): To Trust or Not to Trust: On Calibration in ML-based Resource Allocation for Wireless Networks
Title: To Trust or Not to Trust: On Calibration in ML-based Resource Allocation for Wireless Networks | Vertrauen oder nicht vertrauen: Kalibrierung in ML-basierte Ressourcenzuteilung für drahtlose Netzwerke | 信任或不信任:校准无线网络基于ML的资源分配 2507.17494v1 |
Authors (5): Rashika Raina, Nidhi Simmons, David E. Simmons, Michel Daoud Yacoub, Trung Q. Duong
In next-generation communications and networks, machine learning (ML) models are expected to deliver not only accurate predictions but also well-calibrated confidence scores that reflect the true likelihood of correct decisions. This paper studies the calibration performance of an ML-based outage predictor within a single-user, multi-resource allocation framework. We first establish key theoretical properties of this system’s outage probability (OP) under perfect calibration. Importantly, we show that as the number of resources grows, the OP of a perfectly calibrated predictor approaches the expected output conditioned on it being below the classification threshold. In contrast, when only one resource is available, the system’s OP equals the model’s overall expected output. We then derive the OP conditions for a perfectly calibrated predictor. These findings guide the choice of the classification threshold to achieve a desired OP, helping system designers meet specific reliability requirements. We also demonstrate that post-processing calibration cannot improve the system’s minimum achievable OP, as it does not introduce new information about future channel states. Additionally, we show that well-calibrated models are part of a broader class of predictors that necessarily improve OP. In particular, we establish a monotonicity condition that the accuracy-confidence function must satisfy for such improvement to occur. To demonstrate these theoretical properties, we conduct a rigorous simulation-based analysis using post-processing calibration techniques: Platt scaling and isotonic regression. As part of this framework, the predictor is trained using an outage loss function specifically designed for this system. Furthermore, this analysis is performed on Rayleigh fading channels with temporal correlation captured by Clarke’s 2D model, which accounts for receiver mobility.
在下一代通信和网络中,机器学习(ML)模型预计不仅会提供准确的预测,而且会提供反映正确决策真实可能性的准确度校准信任分数。本文研究基于 ML 的断流预测器在单一用户、多资源配置框架内的校准性能。我们首先在完全校准的情况下为这个系统的断流概率(OP)建立关键的理论属性。重要的是,我们表明,随着资源数量的增加,一个精确校准的预测器的OP接近以它低于分类阈值为条件的预测结果。相比之下,当只有一个资源可用时,这个系统的运作轨迹等于模型的总体预期产出。我们随后为一个完全校准的预测器设计程序设定了运行条件。这些结果指导了分类阈值的选择,帮助系统设计者达到特定的可靠性要求。我们还表明,后处理校准无法改进系统基于最小的模型OP,因为它不会引入关于未来频道状况的新信息。此外,我们所培训的模型是更宽度模型的一部分, 用于更精确的预测器的精确度功能。我们必须用这种精确度分析来提高这种精确度。
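Platt scaling, one of the two post-processing calibration techniques named in the abstract above, can be sketched as fitting a two-parameter sigmoid to held-out scores. This is a generic illustration rather than the authors' pipeline: the gradient-descent fit, the synthetic scores, and every name below are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(p, labels, eps=1e-12):
    """Mean binary cross-entropy between probabilities p and 0/1 labels."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))


def fit_platt(scores, labels, lr=0.1, n_iter=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss."""
    a, b = 1.0, 0.0
    for _ in range(n_iter):
        err = sigmoid(a * scores + b) - labels
        a -= lr * float(np.mean(err * scores))
        b -= lr * float(np.mean(err))
    return a, b


# Hypothetical miscalibrated predictor: the true outage probability is
# sigmoid(2 * s - 1), but the raw model reports sigmoid(s).
rng = np.random.default_rng(1)
scores = rng.normal(0.0, 1.0, size=5000)
labels = (rng.random(5000) < sigmoid(2.0 * scores - 1.0)).astype(float)

loss_before = log_loss(sigmoid(scores), labels)
a, b = fit_platt(scores, labels)
loss_after = log_loss(sigmoid(a * scores + b), labels)
```

When the fitted slope `a` is positive, the map is strictly increasing in the raw score, so the ranking of inputs is unchanged; this is consistent with the abstract's remark that post-processing calibration introduces no new information about future channel states.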
Article 248
Title@2025-07-23 (3): Infinite Video Understanding
Title: Infinite Video Understanding | Unendliches Video-Verständnis | 无限视频理解 2507.09068v2 |
Authors (9): Dell Zhang, Xiangyu Chen, Jixiang Luo, Mengxi Jia, Changzhi Sun, Ruilong Ren, Jingren Liu, Hao Sun, Xuelong Li
The rapid advancements in Large Language Models (LLMs) and their multimodal extensions (MLLMs) have ushered in remarkable progress in video understanding. However, a fundamental challenge persists: effectively processing and comprehending video content that extends beyond minutes or hours. While recent efforts like Video-XL-2 have demonstrated novel architectural solutions for extreme efficiency, and advancements in positional encoding such as HoPE and VideoRoPE++ aim to improve spatio-temporal understanding over extensive contexts, current state-of-the-art models still encounter significant computational and memory constraints when faced with the sheer volume of visual tokens from lengthy sequences. Furthermore, maintaining temporal coherence, tracking complex events, and preserving fine-grained details over extended periods remain formidable hurdles, despite progress in agentic reasoning systems like Deep Video Discovery. This position paper posits that a logical, albeit ambitious, next frontier for multimedia research is Infinite Video Understanding – the capability for models to continuously process, understand, and reason about video data of arbitrary, potentially never-ending duration. We argue that framing Infinite Video Understanding as a blue-sky research objective provides a vital north star for the multimedia, and the wider AI, research communities, driving innovation in areas such as streaming architectures, persistent memory mechanisms, hierarchical and adaptive representations, event-centric reasoning, and novel evaluation paradigms. Drawing inspiration from recent work on long/ultra-long video understanding and several closely related fields, we outline the core challenges and key research directions towards achieving this transformative capability.
大语言模型及其多式扩展的快速进展在视频理解方面带来了显著的进步。然而,一个根本性的挑战依然存在:有效处理和理解超过几分钟或小时的视频内容。虽然最近的努力,如视频XL-2展示了超效率的新建筑解决方案,以及定位编码(如HOPE和VideoROPE+++)的进步,目的是增进对广泛背景的时空理解,但当前最先进的模型在面对长顺序的视觉标本数量之多时,仍然在计算和记忆上遇到巨大的限制。此外,保持时间一致性,跟踪复杂事件,保存长时期的精细微细节,仍然是巨大的障碍,尽管深视频探索系统等代理推理系统取得了进展。 这份立场文件认为,一个逻辑(尽管雄心勃勃勃的)和下方域多媒体研究的下一个前沿是无限的视频理解 – – 模型能够持续处理、理解和解释任意性、可能永无止的视频数据。我们把视频理解作为蓝天空研究目标,跟踪复杂视频理解,并保存长时期的精细微的详情细节细节细节,为最新的历史结构、历史结构、历史结构、历史结构图象学和历史图象学、历史图理学、历史图解。
Article 249
Title@2025-07-23 (3): Leveraging Diffusion Models for Parameterized Quantum Circuit Generation
Title: Leveraging Diffusion Models for Parameterized Quantum Circuit Generation | Nutzung von Diffusionsmodellen für die parameterisierte Quantum Circuit Generation | 利用可计量量子电路生成的传播模型 2505.20863v3 |
Authors (4): Daniel Barta, Darya Martyniuk, Johannes Jung, Adrian Paschke
Quantum computing holds immense potential, yet its practical success depends on multiple factors, including advances in quantum circuit design. In this paper, we introduce a generative approach based on denoising diffusion models (DMs) to synthesize parameterized quantum circuits (PQCs). Extending the recent diffusion model pipeline of Fürrutter et al. [1], our model effectively conditions the synthesis process, enabling the simultaneous generation of circuit architectures and their continuous gate parameters. We demonstrate our approach in synthesizing PQCs optimized for generating high-fidelity Greenberger-Horne-Zeilinger (GHZ) states and achieving high accuracy in quantum machine learning (QML) classification tasks. Our results indicate a strong generalization across varying gate sets and scaling qubit counts, highlighting the versatility and computational efficiency of diffusion-based methods. This work illustrates the potential of generative models as a powerful tool for accelerating and optimizing the design of PQCs, supporting the development of more practical and scalable quantum applications.
量子计算具有巨大的潜力,但其实际成功取决于多种因素,包括量子电路设计的进步。在本文中,我们采用了基于分解扩散模型(DMs)的基因化方法,以合成参数化量子电路(PQCs)。扩展了F"urrutter et al.[1]的最新扩散模型管道(F"urrutter et al.[1]),我们的模型有效地为合成过程创造了条件,使得能够同时生成电路结构及其连续门参数。我们展示了我们的方法,将优化生成高纤维化格林伯格-豪尔尼-泽林格(GHGHZ)州的PQCs(GHZ)州并实现量子机学习(QML)分类任务的高度精确性。我们的结果显示,在不同的门套件和缩放计中都非常普遍,突出了基于扩散方法的多功能和计算效率。这项工作展示了基因化模型作为加速和优化PQC设计的一个强大工具的潜力,支持开发更实用和可扩展的量子应用。
Article 250
Title@2025-07-23 (3): The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training
Title: The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training | Die überraschende Vereinbarung zwischen Convex-Optimierungstheorie und Lern-Rate-Scheeduling für große Modellausbildung | 大型示范培训的 Convex优化理论和学习-学习-进度安排之间令人惊讶的协定 2501.18965v2 |
Authors (5): Fabian Schaipp, Alexander Hägele, Adrien Taylor, Umut Simsekli, Francis Bach
We show that learning-rate schedules for large model training behave surprisingly similarly to a performance bound from non-smooth convex optimization theory. We provide a bound for the constant schedule with linear cooldown; in particular, the practical benefit of cooldown is reflected in the bound due to the absence of logarithmic terms. Further, we show that this surprisingly close match between optimization theory and practice can be exploited for learning-rate tuning: we achieve noticeable improvements for training 124M and 210M Llama-type models by (i) extending the schedule for continued training with the optimal learning rate, and (ii) transferring the optimal learning rate across schedules.
我们发现,大型示范培训的学习进度表与非悬浮性锥形优化理论的绩效表有惊人的相似之处。 我们为固定进度表提供了线性冷却的约束;特别是,冷却的实际好处反映在约束表中,因为没有对数术语。 此外,我们表明,优化理论与实践之间的这种令人惊讶的近似性可以用来调整学习速度:我们通过(一) 将继续培训的时间表延长,采用最佳学习速度,(二) 将最佳学习速度转换到所有时间表,在培训124M型和210MLlama型模型方面取得了显著的改进。
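The "constant schedule with linear cooldown" discussed in the abstract above can be written as a short function. This is a minimal sketch: the cooldown fraction of 20% and the function name are assumptions for illustration, not values taken from the paper.

```python
def lr_constant_then_linear_cooldown(step, total_steps, base_lr, cooldown_frac=0.2):
    """Learning rate at `step` (0-indexed) for a constant schedule with linear cooldown.

    The rate stays at base_lr until the last `cooldown_frac` of training,
    then decays linearly so that it would reach zero at `total_steps`.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")
    cooldown_start = int(round(total_steps * (1.0 - cooldown_frac)))
    if step < cooldown_start:
        return base_lr
    remaining = total_steps - step
    return base_lr * remaining / (total_steps - cooldown_start)


# With 100 steps and a 20% cooldown, the rate is flat for steps 0..80
# and then shrinks by base_lr / 20 per step.
schedule = [lr_constant_then_linear_cooldown(s, 100, 1.0) for s in range(100)]
```

Extending training under this schedule only requires recomputing `cooldown_start` for the new `total_steps`, since the constant phase does not depend on the horizon.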
Article 251
Title@2025-07-23 (3): SRMambaV2: Biomimetic Attention for Sparse Point Cloud Upsampling in Autonomous Driving
Title: SRMambaV2: Biomimetic Attention for Sparse Point Cloud Upsampling in Autonomous Driving | SRMambaV2: Biomimetische Aufmerksamkeit für Sparse Point Cloud Upsampling im autonomen Fahren | SRMambaV2:在自主驾驶中抽取点云取样的生物模拟注意 2507.17479v1 |
Authors (4): Chuang Chen, Xiaolin Qin, Jing Hu, Wenyi Ge
Upsampling LiDAR point clouds in autonomous driving scenarios remains a significant challenge due to the inherent sparsity and complex 3D structures of the data. Recent studies have attempted to address this problem by converting the complex 3D spatial scenes into 2D image super-resolution tasks. However, due to the sparse and blurry feature representation of range images, accurately reconstructing detailed and complex spatial topologies remains a major difficulty. To tackle this, we propose a novel sparse point cloud upsampling method named SRMambaV2, which enhances the upsampling accuracy in long-range sparse regions while preserving the overall geometric reconstruction quality. Specifically, inspired by human driver visual perception, we design a biomimetic 2D selective scanning self-attention (2DSSA) mechanism to model the feature distribution in distant sparse areas. Meanwhile, we introduce a dual-branch network architecture to enhance the representation of sparse features. In addition, we introduce a progressive adaptive loss (PAL) function to further refine the reconstruction of fine-grained details during the upsampling process. Experimental results demonstrate that SRMambaV2 achieves superior performance in both qualitative and quantitative evaluations, highlighting its effectiveness and practical value in automotive sparse point cloud upsampling tasks.
由于数据固有的广度和复杂的三维数据结构,在自主驾驶情景中抽取LIDAR点云层仍是一项重大挑战。最近的研究试图通过将复杂的三维空间场景转换为2D图像超分辨率任务来解决这一问题。然而,由于分布图像呈现的特征稀少且模糊不清,准确重建详细和复杂的空间地形仍是一个重大困难。为了解决这一问题,我们提议了一种叫SRMambaV2的新颖的稀薄点云层取样方法,这种方法在保存总体几何重建质量的同时,提高了长距离稀少区域的精确度。具体地说,我们设计了一种生物模拟2D选择性扫描自我注意(DSSA)机制,以模拟遥远的稀少地区的地貌分布。与此同时,我们引入了一种双层网络结构,以加强稀薄地貌的代表性。此外,我们引入了一种渐进式适应性损失(PAL)功能,以进一步改进在升级过程中微缩裁量性细节的重建工作。实验结果表明,SRMMamba2在现实和定量评估中都取得了高水平。
Article 252
Title@2025-07-23 (3): BGM-HAN: A Hierarchical Attention Network for Accurate and Fair Decision Assessment on Semi-Structured Profiles
Title: BGM-HAN: A Hierarchical Attention Network for Accurate and Fair Decision Assessment on Semi-Structured Profiles | BGM-HAN: Hierarchisches Aufmerksamkeitsnetzwerk für eine genaue und faire Entscheidungsbeurteilung von semistrukturierten Profilen | BGM-HAN:关于半结构概况的准确和公平决定评估的等级关注网络 2507.17472v1 |
Authors (3): Junhua Liu, Roy Ka-Wei Lee, Kwan Hui Lim
Human decision-making in high-stakes domains often relies on expertise and heuristics, but is vulnerable to hard-to-detect cognitive biases that threaten fairness and long-term outcomes. This work presents a novel approach to enhancing complex decision-making workflows through the integration of hierarchical learning alongside various enhancements. Focusing on university admissions as a representative high-stakes domain, we propose BGM-HAN, an enhanced Byte-Pair Encoded, Gated Multi-head Hierarchical Attention Network, designed to effectively model semi-structured applicant data. BGM-HAN captures multi-level representations that are crucial for nuanced assessment, improving both interpretability and predictive performance. Experimental results on real admissions data demonstrate that our proposed model significantly outperforms state-of-the-art baselines ranging from traditional machine learning to large language models, offering a promising framework for augmenting decision-making in domains where structure, context, and fairness matter. Source code is available at: https://github.com/junhua/bgm-han.
人类在高入学率领域的决策往往依赖专门知识和超自然论,但容易受到难以察觉的认知偏见的威胁,这些认知偏见威胁到公平和长期结果。这项工作提出了通过将等级学习与各种强化相结合来加强复杂的决策工作流程的新颖办法。我们建议把大学录取作为具有代表性的高入学率领域,我们建议BGM-HAN,一个强化的Byte-Pair编码、Gated多头等级关注网络,目的是有效地模拟半结构化的申请人数据。BGM-HAN收集了对于细微评估、改进可解释性和预测性业绩至关重要的多层次代表。关于实际录取数据的实验结果表明,我们提议的模型大大超越了从传统机器学习到大语言模式的最新基线,为在结构、背景和公平性事务领域加强决策提供了有希望的框架。源代码见:https://github.com/junhua/bgm-han。
Article 253
Title@2025-07-23 (3): Demonstration of Efficient Predictive Surrogates for Large-scale Quantum Processors
Title: Demonstration of Efficient Predictive Surrogates for Large-scale Quantum Processors | Demonstration effizienter vorausschauender Surrogate für große Quantenprozessoren | 大型量子处理器高效预测加速器演示演示 2507.17470v1 |
Authors (8): Wei-You Liao, Yuxuan Du, Xinbiao Wang, Tian-Ci Tian, Yong Luo, Bo Du, Dacheng Tao, He-Liang Huang
The ongoing development of quantum processors is driving breakthroughs in scientific discovery. Despite this progress, the formidable cost of fabricating large-scale quantum processors means they will remain rare for the foreseeable future, limiting their widespread application. To address this bottleneck, we introduce the concept of predictive surrogates, which are classical learning models designed to emulate the mean-value behavior of a given quantum processor with provable computational efficiency. In particular, we propose two predictive surrogates that can substantially reduce the need for quantum processor access in diverse practical scenarios. To demonstrate their potential in advancing digital quantum simulation, we use these surrogates to emulate a quantum processor with up to 20 programmable superconducting qubits, enabling efficient pre-training of variational quantum eigensolvers for families of transverse-field Ising models and identification of non-equilibrium Floquet symmetry-protected topological phases. Experimental results reveal that the predictive surrogates not only reduce measurement overhead by orders of magnitude, but can also surpass the performance of conventional, quantum-resource-intensive approaches. Collectively, these findings establish predictive surrogates as a practical pathway to broadening the impact of advanced quantum processors.
量子处理器的持续发展正在推动科学发现的突破。尽管取得了这些进展,制造大规模量子处理器的高昂成本意味着它们在可预见的未来仍将十分稀缺,从而限制其广泛应用。为解决这一瓶颈,我们提出预测代理(predictive surrogates)的概念,即一类经典学习模型,旨在以可证明的计算效率模拟给定量子处理器的均值行为。具体而言,我们提出两种预测代理,可在多种实际场景中大幅减少对量子处理器访问的需求。为展示其在推进数字量子模拟方面的潜力,我们用这些代理模拟一台最多拥有20个可编程超导量子比特的量子处理器,从而实现横场伊辛模型族的变分量子本征求解器的高效预训练,以及非平衡Floquet对称性保护拓扑相的识别。实验结果表明,预测代理不仅能将测量开销降低若干数量级,还能超越传统的、量子资源密集型方法的性能。总体而言,这些发现确立了预测代理作为扩大先进量子处理器影响力的一条实用途径。
Article 254
Title@2025-07-23 (3): MIRA: Medical Time Series Foundation Model for Real-World Health Data
Title: MIRA: Medical Time Series Foundation Model for Real-World Health Data | MIRA: Medical Time Series Foundation Modell für real-World Gesundheitsdaten | 医疗时间系列基金会实际世界卫生数据模型 2506.07584v3 |
Authors (11): Hao Li, Bowen Deng, Chang Xu, Zhiyuan Feng, Viktor Schlegel, Yu-Hao Huang, Yizheng Sun, Jingyuan Sun, Kailai Yang, Yiyao Yu, Jiang Bian
A unified foundation model for medical time series – pretrained on open access and ethics board-approved medical corpora – offers the potential to reduce annotation burdens, minimize model customization, and enable robust transfer across clinical institutions, modalities, and tasks, particularly in data-scarce or privacy-constrained environments. However, existing generalist time series foundation models struggle to handle medical time series data due to their inherent challenges, including irregular intervals, heterogeneous sampling rates, and frequent missing values. To address these challenges, we introduce MIRA, a unified foundation model specifically designed for medical time series forecasting. MIRA incorporates a Continuous-Time Rotary Positional Encoding that enables fine-grained modeling of variable time intervals, a frequency-specific mixture-of-experts layer that routes computation across latent frequency regimes to further promote temporal specialization, and a Continuous Dynamics Extrapolation Block based on Neural ODE that models the continuous trajectory of latent states, enabling accurate forecasting at arbitrary target timestamps. Pretrained on a large-scale and diverse medical corpus comprising over 454 billion time points collected from publicly available datasets, MIRA reduces forecasting errors by an average of 10% and 7% in out-of-distribution and in-distribution scenarios, respectively, when compared to other zero-shot and fine-tuned baselines. We also introduce a comprehensive benchmark spanning multiple downstream clinical tasks, establishing a foundation for future research in medical time series modeling.
面向医疗时间序列的统一基础模型(在开放获取且经伦理委员会批准的医疗语料上预训练)有望减轻标注负担、减少模型定制,并实现跨临床机构、模态和任务的稳健迁移,尤其是在数据稀缺或隐私受限的环境中。然而,现有的通用时间序列基础模型难以处理医疗时间序列数据,因为这类数据存在固有挑战,包括不规则的时间间隔、异构的采样率和频繁的缺失值。为应对这些挑战,我们提出MIRA,一个专为医疗时间序列预测设计的统一基础模型。MIRA包含:连续时间旋转位置编码,可对可变时间间隔进行细粒度建模;频率特定的混合专家层,在潜在频率区间之间路由计算,以进一步促进时间专门化;以及基于神经常微分方程(Neural ODE)的连续动力学外推模块,用于建模潜在状态的连续轨迹,从而在任意目标时间戳上实现准确预测。MIRA在一个大规模、多样化的医疗语料库上预训练,该语料库包含从公开数据集收集的超过4540亿个时间点;与其他零样本和微调基线相比,MIRA在分布外和分布内场景中分别平均降低10%和7%的预测误差。我们还提出了一个涵盖多个下游临床任务的综合基准,为医疗时间序列建模的后续研究奠定基础。
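A minimal sketch of the continuous-time rotary idea named in the abstract, assuming the usual rotary positional encoding with the integer token index replaced by a real-valued timestamp. The function name `ct_rope`, the `base` constant, and the toy vectors are illustrative assumptions, not MIRA's actual implementation.

```python
import math


def ct_rope(vec, t, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to timestamp t.

    Assumption: standard rotary frequencies base**(-i/d); t may be any real
    number, so irregularly sampled observations need no resampling.
    """
    d = len(vec)
    assert d % 2 == 0, "rotary encoding needs an even dimension"
    out = []
    for i in range(0, d, 2):
        theta = t * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.7, 0.5]   # hypothetical query vector
k = [1.0, 0.4, -0.6, 0.9]   # hypothetical key vector

# Two pairs of irregular timestamps with the same gap (1.75).
s1 = dot(ct_rope(q, 2.5), ct_rope(k, 0.75))
s2 = dot(ct_rope(q, 11.75), ct_rope(k, 10.0))
```

Because every pair is rotated by an angle linear in `t`, the attention score between two observations depends only on the time gap between them, which is the property that makes this kind of encoding suitable for variable sampling intervals.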
Article 255
Title@2025-07-23 (3): Mapping of Weed Management Methods in Orchards using Sentinel-2 and PlanetScope Data
Title: Mapping of Weed Management Methods in Orchards using Sentinel-2 and PlanetScope Data | Kartierung von Unkraut-Management-Methoden in Obstgärten mit Sentinel-2 und PlanetScope-Daten | 利用哨兵-2和行星域数据绘制果园杂草管理方法图 2504.19991v2 |
Authors (8): Ioannis Kontogiorgakis, Iason Tsardanidis, Dimitrios Bormpoudakis, Ilias Tsoumas, Dimitra A. Loka, Christos Noulas, Alexandros Tsitouras, Charalampos Kontoes
Effective weed management is crucial for improving agricultural productivity, as weeds compete with crops for vital resources like nutrients and water. Accurate maps of weed management methods are essential for policymakers to assess farmer practices, evaluate impacts on vegetation health, biodiversity, and climate, as well as ensure compliance with policies and subsidies. However, monitoring weed management methods is challenging as they commonly rely on ground-based field surveys, which are often costly, time-consuming and subject to delays. In order to tackle this problem, we leverage earth observation data and Machine Learning (ML). Specifically, we developed separate ML models using Sentinel-2 and PlanetScope satellite time series data, respectively, to classify four distinct weed management methods (Mowing, Tillage, Chemical-spraying, and No practice) in orchards. The findings demonstrate the potential of ML-driven remote sensing to enhance the efficiency and accuracy of weed management mapping in orchards.
有效的杂草管理对于提高农业生产力至关重要,因为杂草与作物争夺养分和水等重要资源。准确的杂草管理方法地图对于决策者评估农民做法、评估对植被健康、生物多样性和气候的影响以及确保遵守政策和补贴至关重要。然而,监测杂草管理方法具有挑战性,因为它们通常依赖于地面实地调查,而地面调查往往成本高、耗时长且可能会延误。为了解决这一问题,我们利用地球观测数据和机器学习(ML)来利用地球观测数据和机器学习(ML)。具体地说,我们利用哨兵2和行星SlanetScope卫星时间序列数据分别开发了单独的杂草模型,对果园中的四种不同的杂草管理方法(Mowing、Tillage、化学喷洒和无实践)进行分类。研究结果表明,由ML驱动的遥感有可能提高果园中杂草管理绘图的效率和准确性。
Article 256
Title@2025-07-23 (3): C3RL: Rethinking the Combination of Channel-independence and Channel-mixing from Representation Learning
Title: C3RL: Rethinking the Combination of Channel-independence and Channel-mixing from Representation Learning | C3RL: Die Kombination von Kanal-Unabhängigkeit und Kanal-Mixing aus Repräsentationslernen neu denken | C3RL:重新思考将频道独立和频道混合与代表性学习相结合的问题 2507.17454v1 |
Authors (3): Shusen Ma, Yun-Bo Zhao, Yu Kang
Multivariate time series forecasting has drawn increasing attention due to its practical importance. Existing approaches typically adopt either channel-mixing (CM) or channel-independence (CI) strategies. The CM strategy can capture inter-variable dependencies but fails to discern variable-specific temporal patterns. The CI strategy improves on this aspect but does not exploit cross-variable dependencies as fully as CM. Hybrid strategies based on feature fusion offer limited generalization and interpretability. To address these issues, we propose C3RL, a novel representation learning framework that jointly models both CM and CI strategies. Motivated by contrastive learning in computer vision, C3RL treats the inputs of the two strategies as transposed views and builds a siamese network architecture: one strategy serves as the backbone, while the other complements it. By jointly optimizing contrastive and prediction losses with adaptive weighting, C3RL balances representation and forecasting performance. Extensive experiments on seven models show that C3RL boosts the best-case performance rate to 81.4% for models based on the CI strategy and to 76.3% for models based on the CM strategy, demonstrating strong generalization and effectiveness. The code will be available once the paper is accepted.
现有方法通常采用渠道混合(CM)或渠道独立(CI)战略。CM战略可以捕捉不同依赖性,但无法辨别具体的时间模式。CI战略改进了这一方面,但未能充分利用CM等不同依赖性。基于特征融合的混合战略提供了有限的概括性和可解释性。为了解决这些问题,我们提议C3RL,这是一个新的代表性学习框架,它是一个共同模拟CM和CI战略的新模式。在计算机愿景的对比学习的推动下,C3RL将两个战略的投入视为转换观点,并构建一个结构结构:一个战略作为主干线,而另一个则补充它。通过联合优化对比和预测损失,调整权重,C3RL平衡代表性和预测业绩。对七个模式的广泛实验显示,C3RL将基于CI战略的模式的最佳业绩率提高到81.4。CM战略模型的最佳业绩率和基于CM战略的模型76.3,一旦获得接受,将显示强有力的一般化和有效性。
Article 257
Title@2025-07-23 (3): Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees
Title: Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees | Effiziente Neuralnetzverifizierung durch Order Leading Exploration von Zweig-und-Bound-Bäumen | 通过分树和环形树的有序主要勘探进行高效神经网络核查 2507.17453v1 |
Authors (7): Guanqin Zhang, Kota Fukuda, Zhenya Zhang, H. M. N. Dilum Bandara, Shiping Chen, Jianjun Zhao, Yulei Sui
The vulnerability of neural networks to adversarial perturbations has necessitated formal verification techniques that can rigorously certify the quality of neural networks. The state-of-the-art approach, branch and bound (BaB), is a “divide-and-conquer” strategy that applies off-the-shelf verifiers to sub-problems on which they perform better. While BaB can identify the sub-problems that need to be split, it explores the space of these sub-problems in a naive “first-come-first-served” manner, and is therefore inefficient at reaching a verification conclusion. To bridge this gap, we introduce an order over the different sub-problems produced by BaB, based on their different likelihoods of containing counterexamples. Based on this order, we propose a novel verification framework, Oliva, that explores the sub-problem space by prioritizing the sub-problems that are more likely to contain counterexamples, in order to reach the verification conclusion efficiently. Even if no counterexample can be found in any sub-problem, this only changes the order in which sub-problems are visited and so does not lead to performance degradation. Specifically, Oliva has two variants: $Oliva^{GR}$, a greedy strategy that always prioritizes the sub-problems that are more likely to contain counterexamples, and $Oliva^{SA}$, a balanced strategy inspired by simulated annealing that gradually shifts from exploration to exploitation to locate the globally optimal sub-problems. We experimentally evaluate the performance of Oliva on 690 verification problems spanning 5 models with the datasets MNIST and CIFAR10. Compared to state-of-the-art approaches, Oliva achieves speedups of up to 25X on MNIST and up to 80X on CIFAR10.
神经网络易受对抗性扰动影响,这就需要能够严格认证神经网络质量的形式化验证技术。作为目前最先进的方法,分支定界(BaB)是一种“分而治之”策略,它将现成的验证器应用于其表现更好的子问题。虽然BaB能够识别需要拆分的子问题,但它以朴素的“先到先服务”方式探索这些子问题的空间,因而在得出验证结论时效率低下。为弥补这一不足,我们根据BaB产生的不同子问题包含反例的可能性,在这些子问题上引入一个序。基于这个序,我们提出一个新的验证框架Oliva,它优先探索更有可能找到反例的子问题,从而高效地得出验证结论。即使在任何子问题中都找不到反例,这也只是改变了访问各子问题的顺序,因此不会导致性能下降。具体而言,Oliva有两个变体:$Oliva^{GR}$,一种始终优先处理更可能找到反例的子问题的贪心策略;以及$Oliva^{SA}$,一种受模拟退火启发的平衡策略,逐步从探索转向利用,以定位全局最优的子问题。我们在涵盖MNIST和CIFAR10数据集上5个模型的690个验证问题上对Oliva的性能进行了实验评估。与最先进的方法相比,Oliva在MNIST上最高可加速25倍,在CIFAR10上最高可加速80倍。
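The greedy ordering described in the abstract can be illustrated with a toy best-first branch and bound. This is only a sketch under stated assumptions: the "network" is a one-dimensional function, the verifier is a Lipschitz lower bound, and the bound value is used as a stand-in for the likelihood of containing a counterexample. None of these names or choices come from Oliva itself.

```python
import heapq


def verify(f, lipschitz, lo, hi, max_nodes=10_000):
    """Best-first branch and bound for the property f(x) >= 0 on [lo, hi].

    Sub-problems with the lowest (most suspicious) bound are visited first,
    instead of in first-come-first-served order.
    """
    def lower_bound(a, b):
        # Sound because |f(x) - f(mid)| <= lipschitz * (b - a) / 2 on [a, b].
        return f(0.5 * (a + b)) - lipschitz * 0.5 * (b - a)

    heap = [(lower_bound(lo, hi), lo, hi)]
    visited = 0
    while heap and visited < max_nodes:
        bound, a, b = heapq.heappop(heap)   # most suspicious sub-problem
        visited += 1
        mid = 0.5 * (a + b)
        if f(mid) < 0:
            return "falsified", mid, visited  # concrete counterexample
        if bound >= 0:
            continue                          # sub-problem certified, prune
        heapq.heappush(heap, (lower_bound(a, mid), a, mid))
        heapq.heappush(heap, (lower_bound(mid, b), mid, b))
    return ("verified" if not heap else "unknown"), None, visited


def violated(x):
    return (x - 0.3) ** 2 - 0.01   # negative on (0.2, 0.4)


def safe(x):
    return (x - 0.3) ** 2 + 0.05   # positive everywhere


status_bad, witness, _ = verify(violated, 2.0, 0.0, 1.0)
status_ok, _, nodes_ok = verify(safe, 2.0, 0.0, 1.0)
```

When the property holds, every sub-problem still has to be certified, so the ordering changes only the visiting sequence and not the amount of work, which mirrors the abstract's remark that prioritization does not degrade performance in that case.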
Article 258
Title@2025-07-23 (3): JEDI: The Force of Jensen-Shannon Divergence in Disentangling Diffusion Models
Title: JEDI: The Force of Jensen-Shannon Divergence in Disentangling Diffusion Models | JEDI: Die Macht der Jensen-Shannon-Divergenz bei entwirrenden Diffusionsmodellen | JEDI: 詹森-夏农分解扩散模型的分解力量 2505.19166v2 |
Authors (3): Eric Tillmann Bill, Enis Simsar, Thomas Hofmann
We introduce JEDI, a test-time adaptation method that enhances subject separation and compositional alignment in diffusion models without requiring retraining or external supervision. JEDI operates by minimizing semantic entanglement in attention maps using a novel Jensen-Shannon divergence based objective. To improve efficiency, we leverage adversarial optimization, reducing the number of updating steps required. JEDI is model-agnostic and applicable to architectures such as Stable Diffusion 1.5 and 3.5, consistently improving prompt alignment and disentanglement in complex scenes. Additionally, JEDI provides a lightweight, CLIP-free disentanglement score derived from internal attention distributions, offering a principled benchmark for compositional alignment under test-time conditions. Code and results are available at https://ericbill21.github.io/JEDI/.
我们引入了JEDI, 这是一种测试-时间适应方法,可以加强传播模式的主体分离和构成一致性,而无需再培训或外部监督; JEDI通过使用基于差异的新型Jensen-Shannon目标,最大限度地减少注意力图中的语义纠缠; 为提高效率,我们利用对抗性优化,减少所需更新步骤的数量; JEDI是模型-不可知性,适用于稳定扩散1.5和3.5等建筑,不断改进快速匹配和复杂场景的分解; 此外, JEDI提供了从内部关注分布中得出的轻量、CLIP-无分解分数,为测试时条件下的构成一致性提供了原则性基准; 代码和结果可在https://ericbill21.githubio/JEDI/查阅。
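For reference, the Jensen-Shannon divergence that the objective is built on can be computed between two normalized attention maps as follows. This is a minimal sketch: the flattened four-cell "attention maps" are made-up values, and how JEDI combines such terms into its test-time objective is not specified in the abstract.

```python
import math


def normalize(weights):
    """Turn non-negative attention weights into a probability distribution."""
    total = sum(weights)
    return [w / total for w in weights]


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q), skipping zero-mass cells of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def jensen_shannon(p, q):
    """Symmetric, bounded divergence: 0 for identical maps, log(2) for disjoint ones."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical flattened attention maps for two subject tokens.
cat_map = normalize([8.0, 1.0, 1.0, 0.0])
dog_map = normalize([0.0, 1.0, 1.0, 8.0])
mixed_map = normalize([4.0, 1.0, 1.0, 4.0])

separated = jensen_shannon(cat_map, dog_map)
entangled = jensen_shannon(cat_map, mixed_map)
```

A larger value indicates that the two maps attend to different regions; here `separated` exceeds `entangled` because the mixed map shares much of its mass with the first subject.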
Article 259
Title@2025-07-23 (3): Persistent Patterns in Eye Movements: A Topological Approach to Emotion Recognition
Title: Persistent Patterns in Eye Movements: A Topological Approach to Emotion Recognition | Persistente Muster in Augenbewegungen: Ein topologischer Ansatz zur Emotionserkennung | 眼睛运动中的持久性模式:对情感认识的主观学方法 2507.17450v1 |
Authors (5): Arsha Niksa, Hooman Zare, Ali Shahrabi, Hanieh Hatami, Mohammadreza Razvan
We present a topological pipeline for automated multiclass emotion recognition from eye-tracking data. Delay embeddings of gaze trajectories are analyzed using persistent homology. From the resulting persistence diagrams, we extract shape-based features such as mean persistence, maximum persistence, and entropy. A random forest classifier trained on these features achieves up to 75.6% accuracy on four emotion classes, which are the quadrants of the Circumplex Model of Affect. The results demonstrate that persistence diagram geometry effectively encodes discriminative gaze dynamics, suggesting a promising topological approach for affective computing and human behavior analysis.
我们展示了一种从眼睛跟踪数据中自动识别多级情绪的地形管道。 使用持久性同质分析凝视轨迹的延迟嵌入。 从由此产生的持久性图表中,我们提取了基于形状的特征,如平均持久性、最大持久性和酶性。 接受过有关这些特征培训的随机森林分类师在四种情感类别上达到了75.6 $的准确度,这四种情感类别是北极效应模型的四分之一。 研究结果表明,持续性图表几何法有效地将歧视性视觉动态编码起来,这表明了有希望的情感计算和人类行为分析的地形学方法。
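The first and third stages of the pipeline can be sketched in a few lines, assuming a standard Takens-style delay embedding and the usual definition of persistent entropy. The persistence diagram itself would come from a persistent homology library, which is not called here; the `diagram` below is a made-up list of (birth, death) pairs, and the embedding parameters are illustrative.

```python
import math


def delay_embed(series, dim, tau):
    """Map a 1-D signal to points (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n_points = len(series) - (dim - 1) * tau
    return [tuple(series[i + j * tau] for j in range(dim)) for i in range(n_points)]


def persistence_features(diagram):
    """Summaries of a persistence diagram given as (birth, death) pairs."""
    lifetimes = [d - b for b, d in diagram if math.isfinite(d) and d > b]
    total = sum(lifetimes)
    probs = [life / total for life in lifetimes]
    return {
        "mean_persistence": total / len(lifetimes),
        "max_persistence": max(lifetimes),
        "entropy": -sum(p * math.log(p) for p in probs),
    }


gaze_x = [0.0, 1.0, 2.0, 3.0, 4.0]          # hypothetical gaze coordinate
cloud = delay_embed(gaze_x, dim=2, tau=2)    # point cloud fed to persistent homology

# Hypothetical diagram; the infinite bar is ignored by the summaries.
diagram = [(0.0, 1.0), (0.0, 1.0), (0.0, math.inf)]
features = persistence_features(diagram)
```

The resulting feature dictionary is the kind of fixed-length vector that a random forest or any other tabular classifier can consume.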
Article 260
Title@2025-07-23 (3): Doubly robust outlier resistant inference on causal treatment effect
Title: Doubly robust outlier resistant inference on causal treatment effect | Doppelt robuste aussergewöhnliche resistente Inferenz auf kausalen Behandlungseffekt | 关于因果处理效果的断断实有力的外部抗异物抗性推论 2507.17439v1 |
Authors (1): Joonsung Kang
Outliers can severely distort causal effect estimation in observational studies, yet this issue has received limited attention in the literature. Their influence is especially pronounced in small sample sizes, where detecting and removing outliers becomes increasingly difficult. Therefore, it is essential to estimate treatment effects robustly without excluding these influential data points. To address this, we propose a doubly robust point estimator for the average treatment effect under a contaminated model that includes outliers. Robustness in outcome regression is achieved through a robust estimating equation, while covariate balancing propensity scores (CBPS) ensure resilience in propensity score modeling. To prevent model overfitting due to the inclusion of numerous parameters, we incorporate variable selection. All these components are unified under a penalized empirical likelihood framework. For confidence interval estimation, most existing approaches rely on asymptotic properties, which may be unreliable in finite samples. We derive an optimal finite-sample confidence interval for the average treatment effect using our proposed estimating equation, ensuring that the interval bounds remain unaffected by outliers. Through simulations and a real-world application involving hypertension data with outliers, we demonstrate that our method consistently outperforms existing approaches in both accuracy and robustness.
在观察研究中,外部线可严重扭曲因果关系估计,但这一问题在文献中受到的关注有限。其影响在小样本规模中特别明显,发现和清除外部线越来越困难。因此,在不排除这些有影响的数据点的情况下,必须强有力地估计治疗效果,同时不排除这些有影响的数据点。为了解决这个问题,我们提议在包括外部线在内的受污染模型下,为平均治疗效果提供一个双强的点估计点。结果回归的强力通过稳健的估计方程式实现,而平衡性能分数(CBPS)则确保适应性分数模型的弹性。为防止模型因包含许多参数而过度适应模型,我们采用了变量选择。所有这些组成部分都统一在一个受惩罚的经验可能性框架之下。关于信任间隔估计,大多数现有方法依赖无保障特性,这些特性在有限的样本中可能不可靠。我们用我们提议的估计方程式为平均治疗效果得出一个最佳的有限范围信任间隔,确保间隔线不受外部线的影响。我们通过模拟和涉及外部线的实时数据的实际世界应用,我们证明我们的方法始终不差。
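As background for the term "doubly robust", the sketch below implements the standard augmented inverse-probability-weighted (AIPW) estimator of the average treatment effect. It is not the outlier-resistant, penalized empirical likelihood estimator proposed in the paper; the data and the fitted nuisance values (`e`, `m1`, `m0`) are made-up inputs.

```python
def aipw_ate(treated, outcome, propensity, mu1, mu0):
    """Standard AIPW estimate of the average treatment effect.

    treated[i]    : 1 if unit i received treatment, else 0
    propensity[i] : estimated P(treated = 1 | x_i)
    mu1[i], mu0[i]: estimated outcomes under treatment and control
    """
    n = len(outcome)
    total = 0.0
    for t, y, e, m1, m0 in zip(treated, outcome, propensity, mu1, mu0):
        total += (m1 - m0) + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e)
    return total / n


T = [1, 0, 1, 0]
Y = [3.0, 1.0, 5.0, 2.0]
e = [0.5, 0.5, 0.5, 0.5]

# Outcome model correct on the observed arm: the correction terms vanish.
ate_outcome_model = aipw_ate(T, Y, e, mu1=[3.0, 2.0, 5.0, 4.0], mu0=[1.0, 1.0, 2.0, 2.0])

# Outcome model useless (all zeros): the estimator falls back to pure IPW.
ate_ipw_fallback = aipw_ate(T, Y, e, mu1=[0.0] * 4, mu0=[0.0] * 4)
```

The estimator stays consistent if either the propensity model or the outcome model is correctly specified. A single extreme `Y` value still enters these sums linearly, which is the sensitivity to outliers that the abstract sets out to address.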
Article 261
Title@2025-07-23 (3): Gathering and Exploiting Higher-Order Information when Training Large Structured Models
Title: Gathering and Exploiting Higher-Order Information when Training Large Structured Models | Sammeln und Ausnutzen von Informationen höherer Ordnung beim Training großer strukturierter Modelle | 培训大型结构型模型时收集和利用高级命令信息 2312.03885v4 |
Authors (1): Pierre Wolinski
When training large models, such as neural networks, the full derivatives of order 2 and beyond are usually inaccessible, due to their computational cost. Therefore, among the second-order optimization methods, it is common to bypass the computation of the Hessian by using first-order information, such as the gradient of the parameters (e.g., quasi-Newton methods) or the activations (e.g., K-FAC). In this paper, we focus on the exact and explicit computation of projections of the Hessian and higher-order derivatives on well-chosen subspaces relevant for optimization. Namely, for a given partition of the set of parameters, we compute tensors that can be seen as “higher-order derivatives according to the partition”, at a reasonable cost as long as the number of subsets of the partition remains small. Then, we give some examples of how these tensors can be used. First, we show how to compute a learning rate per subset of parameters, which can be used for hyperparameter tuning. Second, we show how to use these tensors at order 2 to construct an optimization method that uses information contained in the Hessian. Third, we show how to use these tensors at order 3 (information contained in the third derivative of the loss) to regularize this optimization method. The resulting training step has several interesting properties, including: it takes into account long-range interactions between the layers of the trained neural network, which is usually not the case in similar methods (e.g., K-FAC); the trajectory of the optimization is invariant under affine layer-wise reparameterization.
在训练神经网络等大型模型时,由于计算成本高,二阶及更高阶的完整导数通常无法获得。因此,在二阶优化方法中,常见的做法是利用一阶信息来绕过Hessian矩阵的计算,例如参数的梯度(如拟牛顿法)或激活值(如K-FAC)。本文着重于精确且显式地计算Hessian矩阵和高阶导数在精心选取的、与优化相关的子空间上的投影。也就是说,对于参数集合的一个给定划分,我们计算可被视为“按该划分的高阶导数”的张量,只要划分的子集数量保持较小,其计算成本就是合理的。随后,我们给出这些张量的若干用途。第一,我们展示如何为每个参数子集计算一个学习率,这可用于超参数调优。第二,我们展示如何利用二阶张量构造一种使用Hessian矩阵所含信息的优化方法。第三,我们展示如何利用三阶张量(损失的三阶导数所含的信息)对该优化方法进行正则化。由此得到的训练步骤具有若干有趣的性质,包括:它考虑了所训练神经网络各层之间的长程相互作用,而类似方法(如K-FAC)通常做不到这一点;优化轨迹在逐层仿射重参数化下保持不变。
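One plausible reading of "a learning rate per subset of parameters" is sketched below on a small quadratic loss: the gradient is split according to a partition, the Hessian is projected onto the resulting directions, and the small projected system is solved for one step size per subset. This is an illustration under that assumption, not the paper's algorithm; the explicit matrix `A`, the two-subset partition, and all names are hypothetical, and the third-order regularization is omitted.

```python
def matvec(matrix, vec):
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Toy loss 0.5 * theta^T A theta - b^T theta with a positive definite Hessian A.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 0.5], [0.0, 0.5, 2.0]]
b_vec = [1.0, -2.0, 0.5]


def loss(theta):
    return 0.5 * dot(theta, matvec(A, theta)) - dot(b_vec, theta)


def grad(theta):
    return [h - bi for h, bi in zip(matvec(A, theta), b_vec)]


def per_subset_step(theta, subsets):
    """One step with a separate learning rate for each of two parameter subsets."""
    g = grad(theta)
    # Direction s: the gradient restricted to the parameters of subset s.
    dirs = [[g[i] if i in s else 0.0 for i in range(len(g))] for s in subsets]
    g_bar = [dot(g, u) for u in dirs]                              # projected gradient
    h_bar = [[dot(u, matvec(A, v)) for v in dirs] for u in dirs]   # projected Hessian
    (h00, h01), (h10, h11) = h_bar
    det = h00 * h11 - h01 * h10
    eta = [(h11 * g_bar[0] - h01 * g_bar[1]) / det,
           (h00 * g_bar[1] - h10 * g_bar[0]) / det]
    new_theta = [t - sum(e * u[i] for e, u in zip(eta, dirs))
                 for i, t in enumerate(theta)]
    return new_theta, eta, dirs


theta0 = [0.0, 0.0, 0.0]
theta1, learning_rates, directions = per_subset_step(theta0, [{0, 1}, {2}])
```

The projected Hessian is only S x S for S subsets, so the solve is cheap as long as the partition is coarse. In a real network the products `A @ v` would be Hessian-vector products obtained by automatic differentiation rather than an explicit matrix.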
Article 262
Title@2025-07-23 (3): Ctx2TrajGen: Traffic Context-Aware Microscale Vehicle Trajectories using Generative Adversarial Imitation Learning
Title: Ctx2TrajGen: Traffic Context-Aware Microscale Vehicle Trajectories using Generative Adversarial Imitation Learning | Ctx2TrajGen: Traffic Context-Aware Microscale Fahrzeug-Trajektorien mit Generative Adversarial Imitation Learning | Ctx2TrajGen: 利用产生反逆模拟学习的交通环境-软件微型车辆轨迹 2507.17418v1 |
Authors (5): Joobin Jin, Seokjun Hong, Gyeongseon Baek, Yeeun Kim, Byeongjoon Noh
Precise modeling of microscopic vehicle trajectories is critical for traffic behavior analysis and autonomous driving systems. We propose Ctx2TrajGen, a context-aware trajectory generation framework that synthesizes realistic urban driving behaviors using GAIL. Leveraging PPO and WGAN-GP, our model addresses nonlinear interdependencies and training instability inherent in microscopic settings. By explicitly conditioning on surrounding vehicles and road geometry, Ctx2TrajGen generates interaction-aware trajectories aligned with real-world context. Experiments on the drone-captured DRIFT dataset demonstrate superior performance over existing methods in terms of realism, behavioral diversity, and contextual fidelity, offering a robust solution to data scarcity and domain shift without simulation.
精确地模拟微型车辆轨迹对于交通行为分析和自主驾驶系统至关重要。 我们提议Ctx2TrajGen,这是利用GAIL综合现实的城市驾驶行为的一种环境觉悟轨迹生成框架。 利用 PPO 和WGAN-GP, 我们的模型处理微型环境中固有的非线性相互依存关系和培训不稳定性。 Ctx2TrajGen通过对周围车辆和道路几何进行明确调节,产生了与现实世界环境相一致的互动觉悟轨迹。 无人驾驶飞机捕获的DRIFT数据集实验显示,在现实主义、行为多样性和背景真实性方面,优于现有方法,为数据稀缺和未经模拟的域转移提供了强有力的解决方案。
Article 263
Title@2025-07-23 (3): A Comprehensive Evaluation on Quantization Techniques for Large Language Models
Title: A Comprehensive Evaluation on Quantization Techniques for Large Language Models | Eine umfassende Bewertung von Quantisierungstechniken für große Sprachmodelle | 对大语言模型量化技术的综合评价 2507.17417v1 |
Authors (3): Yutong Liu, Cairong Zhao, Guosheng Hu
For large language models (LLMs), post-training quantization (PTQ) can significantly reduce memory footprint and computational overhead. Model quantization is a rapidly evolving research field. Though many papers have reported breakthrough performance, they may not conduct experiments on the same ground since one quantization method usually contains multiple components. In addition, analyzing the theoretical connections among existing methods is crucial for in-depth understanding. To bridge these gaps, we conduct an extensive review of state-of-the-art methods and perform comprehensive evaluations on the same ground to ensure fair comparisons. To our knowledge, this fair and extensive investigation remains critically important yet underexplored. To better understand the theoretical connections, we decouple the published quantization methods into two steps: pre-quantization transformation and quantization error mitigation. We define the former as a preprocessing step applied before quantization to reduce the impact of outliers, making the data distribution flatter and more suitable for quantization. Quantization error mitigation involves techniques that offset the errors introduced during quantization, thereby enhancing model performance. We evaluate and analyze the impact of different components of quantization methods. Additionally, we analyze and evaluate the latest MXFP4 data format and its performance. Our experimental results demonstrate that optimized rotation and scaling yield the best performance for pre-quantization transformation, and combining low-rank compensation with GPTQ occasionally outperforms using GPTQ alone for quantization error mitigation. Furthermore, we explore the potential of the latest MXFP4 quantization and reveal that the optimal pre-quantization transformation strategy for INT4 does not generalize well to MXFP4, inspiring further investigation.
对于大型语言模型(LLMS)来说,培训后量化(PTQ)可以大大减少记忆足迹和计算间接费用。模型量化是一个迅速演变的研究领域。虽然许多论文都报告了突破性业绩,但可能不会在同一地面进行实验,因为一个量化方法通常包含多个组成部分。此外,分析现有方法之间的理论联系对于深入理解至关重要。为了缩小这些差距,我们广泛审查最先进的方法,并在同一地点进行全面评价,以确保公平比较。据我们了解,这一公平和广泛的调查仍然至关重要,但并未得到充分探讨。为了更好地了解理论联系,我们将公布的量化方法分解为两个步骤:量化前转换和量化误差减缓。此外,我们把前者定义为在量化前采用的处理步骤,以减少外差的影响,使数据分发更加赞美,更适合量化。量化错误的缓解方法包括抵消在量化过程中引入的错误,从而增强模型性能。我们评估并分析不同部分的量化误差影响, 量化4 量化4 量化和再量化的升级战略,我们用量化前的量化和再定性的最新业绩分析。我们用量化后的最佳业绩模型来评估。
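The effect of a pre-quantization transformation can be seen on a four-element example. The sketch assumes symmetric absmax INT4 quantization and a fixed normalized Hadamard rotation; the paper evaluates optimized rotations and scaling, which are not reproduced here, and the input vector is made up.

```python
def quantize_int4(values):
    """Symmetric absmax fake-quantization to the INT4 grid [-8, 7].

    Assumes at least one non-zero value, so the scale is positive.
    """
    scale = max(abs(v) for v in values) / 7.0
    return [max(-8, min(7, round(v / scale))) * scale for v in values]


# Normalized 4x4 Hadamard matrix: symmetric and orthogonal, hence its own inverse.
HADAMARD = [[0.5 * h for h in row] for row in (
    (1, 1, 1, 1),
    (1, -1, 1, -1),
    (1, 1, -1, -1),
    (1, -1, -1, 1),
)]


def rotate(values):
    return [sum(h * v for h, v in zip(row, values)) for row in HADAMARD]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


x = [7.0, 0.4, 0.4, 0.4]                        # one outlier dominates the scale

direct = quantize_int4(x)                       # small entries collapse to 0
via_rotation = rotate(quantize_int4(rotate(x))) # rotate, quantize, rotate back

err_direct = squared_error(x, direct)
err_rotated = squared_error(x, via_rotation)
```

The rotation spreads the outlier's energy across all coordinates, so the quantization grid is no longer wasted on a single large value and the reconstruction error drops for this input.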
Article 264
Title@2025-07-23 (3): How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Title: How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks | Wie gut versteht GPT-4o Vision? Bewertung multimodaler Basismodelle auf Standard Computer Vision Aufgaben | GPT-4o GPT-4o如何理解愿景?评估标准计算机愿景任务多模式基金会模式 2507.01955v2 |
Authors (6): Rahul Ramachandran, Ali Garjani, Roman Bachmann, Andrei Atanov, Oğuzhan Fatih Kar, Amir Zamir
Multimodal foundation models, such as GPT-4o, have recently made remarkable progress, but it is not clear where exactly these models stand in terms of understanding vision. In this paper, we benchmark the performance of popular multimodal foundation models (GPT-4o, o4-mini, Gemini 1.5 Pro and Gemini 2.0 Flash, Claude 3.5 Sonnet, Qwen2-VL, Llama 3.2) on standard computer vision tasks (semantic segmentation, object detection, image classification, depth and surface normal prediction) using established datasets (e.g., COCO, ImageNet and its variants). The main challenges in doing this are: 1) most models are trained to output text and cannot natively express versatile domains, such as segments or 3D geometry, and 2) many leading models are proprietary and accessible only at an API level, i.e., there is no weight access to adapt them. We address these challenges by translating standard vision tasks into equivalent text-promptable and API-compatible tasks via prompt chaining to create a standardized benchmarking framework. We observe that 1) the models are not close to the state-of-the-art specialist models at any task. However, 2) they are respectable generalists; this is remarkable as they are presumably trained on primarily image-text-based tasks. 3) They perform semantic tasks notably better than geometric ones. 4) While the prompt-chaining techniques affect performance, better models exhibit less sensitivity to prompt variations. 5) GPT-4o performs the best among non-reasoning models, securing the top position in 4 out of 6 tasks. 6) Reasoning models, e.g., o3, show improvements in geometric tasks. 7) A preliminary analysis of models with native image generation, like the latest GPT-4o, shows they exhibit quirks like hallucinations and spatial misalignments.
GPT-4o等多式基础模型最近取得了显著进展,但尚不清楚这些模型在理解愿景方面的确切位置。在本文中,我们用固定的数据集(例如,COCO、图像网及其变异器等)衡量了流行型多式联运基础模型(GPT-4o、o4-mini、Gemini1.5 Pro和Gemini 2.0 Flash、Claude 3.5 Sonnet、Qwen2-VL、Llama 3.2)在标准计算机愿景任务(静态空间分解、物体探测、图像分类、深度和表面正常预测)方面的性能。 使用固定的数据集(例如,COCOCO、图像网及其变异器等 ) 。 实现这一点的主要挑战是:(1) 多数模型都经过了输出文本文本培训,无法本地显示多功能域,例如区段或3D几度和2) 。 许多主要模型都是专有的, 只能在 APIPI 级别上进行调整。 我们通过将标准视觉任务转换成对应的文本-可探测和AP- 比较的模型, 通过快速的模型来建立标准化基准框架。我们观察的变换的模型, 3, 发现这些模型是不甚甚甚甚甚不甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚甚于专业的G- 任务。
Article 265
Title@2025-07-23 (3): Learning from Scratch: Structurally-masked Transformer for Next Generation Lib-free Simulation
Title: Learning from Scratch: Structurally-masked Transformer for Next Generation Lib-free Simulation | Von Scratch lernen: Strukturell maskierter Transformer für Lib-freie Simulation der nächsten Generation | 从 Scratch 中学习: 下一代自由模拟的结构性巨型变形器 2507.17396v1 |
Authors (3): Junlang Huang, Hao Chen, Zhong Guan
This paper proposes a neural framework for power and timing prediction of multi-stage data path, distinguishing itself from traditional lib-based analytical methods dependent on driver characterization and load simplifications. To the best of our knowledge, this is the first language-based, netlist-aware neural network designed explicitly for standard cells. Our approach employs two pre-trained neural models of waveform prediction and delay estimation that directly infer transient waveforms and propagation delays from SPICE netlists, conditioned on critical physical parameters such as load capacitance, input slew, and gate size. This method accurately captures both intrinsic and coupling-induced delay effects without requiring simplification or interpolation. For multi-stage timing prediction, we implement a recursive propagation strategy where predicted waveforms from each stage feed into subsequent stages, cumulatively capturing delays across the logic chain. This approach ensures precise timing alignment and complete waveform visibility throughout complex signal pathways. The waveform prediction utilizes a hybrid CNN-Transformer architecture with netlist-aware node-level encoding, addressing traditional Transformers’ fixed input dimensionality constraints. Additionally, specialized subnetworks separately handle primary delay estimation and crosstalk correction. Experimental results demonstrate SPICE-level accuracy, consistently achieving RMSE below 0.0098 across diverse industrial circuits. The proposed framework provides a scalable, structurally adaptable neural alternative to conventional power and timing engines, demonstrating high fidelity to physical circuit behaviors.
本文提出多阶段数据路径的能量和时间预测神经框架,将其与依赖驱动器特性和负荷简化的传统基于lib的分析方法区分开来,与基于驱动器特性和负荷简化的传统分析方法区分开来。 据我们所知,这是专门为标准单元格设计的首个基于语言的、网列清单的神经网络网络。我们的方法采用了两种预先训练的波形预测和延迟估计神经模型,这些神经模型直接推导中转波形波形和SPICE网络列表的传播延迟,其条件是,关键物理参数,如负载能力、输入读取和大门大小。这种方法准确地捕捉到内在和合并导致的延迟效应,而无需简化或内插。对于多阶段的时间预测而言,我们采用了一种循环传播战略,其中每个阶段预测的波形形态将输入后各阶段,累积地捕捉整个逻辑链的延迟。这个方法确保精确的时间吻合,并在整个复杂的信号路径路径中完全可见波形。波形预测利用一种混合的CNN-变形结构结构,有净列表-感知度的节点编码,解决传统变换式的固定输入维度限制。此外,专门的次网络的次网络将显示在空间结构结构结构结构结构结构结构结构结构结构结构结构结构结构结构结构结构结构结构结构结构结构结构结构结构框架之下,在连续地显示下进行连续地显示。
Article 266
Title@2025-07-23 (3): Causal Mechanism Estimation in Multi-Sensor Systems Across Multiple Domains
Title: Causal Mechanism Estimation in Multi-Sensor Systems Across Multiple Domains | Causal Mechanism Abschätzung in Multi-Sensor-Systemen über mehrere Domains | 跨多域多传感器系统中因果机制估算 2507.17792v1 |
Authors (3): Jingyi Yu, Tim Pychynski, Marco F. Huber
To gain deeper insights into a complex sensor system through the lens of causality, we present common and individual causal mechanism estimation (CICME), a novel three-step approach to inferring causal mechanisms from heterogeneous data collected across multiple domains. By leveraging the principle of Causal Transfer Learning (CTL), CICME is able to reliably detect domain-invariant causal mechanisms when provided with sufficient samples. The identified common causal mechanisms are further used to guide the estimation of the remaining causal mechanisms in each domain individually. The performance of CICME is evaluated on linear Gaussian models under scenarios inspired by a manufacturing process. Building upon existing continuous optimization-based causal discovery methods, we show that CICME leverages the benefits of applying causal discovery on the pooled data and repeatedly on data from individual domains, and it even outperforms both baseline methods under certain scenarios.
为了从因果关系的角度更深入地了解复杂的传感器系统,我们提出了共同和个别因果机制估计(CICME),这是从跨多个领域收集的不同数据中推断因果机制的新颖的三步方法。通过利用因果转移学习原则,CICME能够可靠地探测到有足够样本的因果机制。已查明的共同因果机制还被进一步用于指导对每个领域剩余因果机制的个别估计。CICME的性能在由制造过程启发的假设情景下,根据线性高斯模型进行评估。在现有基于优化的连续因果发现方法的基础上,我们表明,CICME利用了将因果发现应用于集合数据和对单个领域数据反复应用的好处,甚至在某些假设情景下超越了两种基线方法。
Article 267
Title@2025-07-23 (3): Helix 1.0: An Open-Source Framework for Reproducible and Interpretable Machine Learning on Tabular Scientific Data
Title: Helix 1.0: An Open-Source Framework for Reproducible and Interpretable Machine Learning on Tabular Scientific Data | Helix 1.0: Ein Open-Source-Framework für reproduzierbares und interpretierbares maschinelles Lernen auf tabellarischen wissenschaftlichen Daten | Helix 1.0:关于表格科学数据可复制和可解释的机器学习的开放源码框架 2507.17791v1 |
Authors (11): Eduardo Aguilar-Bejarano, Daniel Lea, Karthikeyan Sivakumar, Jimiama M. Mase, Reza Omidvar, Ruizhe Li, Troy Kettle, James Mitchell-White, Morgan R Alexander, David A Winkler, Grazziela Figueredo
Helix is an open-source, extensible, Python-based software framework to facilitate reproducible and interpretable machine learning workflows for tabular data. It addresses the growing need for transparent experimental data analytics provenance, ensuring that the entire analytical process – including decisions around data transformation and methodological choices – is documented, accessible, reproducible, and comprehensible to relevant stakeholders. The platform comprises modules for standardised data preprocessing, visualisation, machine learning model training, evaluation, interpretation, results inspection, and model prediction for unseen data. To further empower researchers without formal training in data science to derive meaningful and actionable insights, Helix features a user-friendly interface that enables the design of computational experiments and the inspection of outcomes, including a novel approach to interpreting machine learning decisions using linguistic terms, all within an integrated environment. Released under the MIT licence, Helix is accessible via GitHub and PyPI, supporting community-driven development and promoting adherence to the FAIR principles.
Hython的软件框架是一个开放源码、可扩展的、基于Python的软件框架,以便利对表格数据进行可复制和可解释的机器学习工作流程,解决对透明试验性数据分析出处的日益需要,确保整个分析过程 – – 包括关于数据转换和方法选择的决定 – – 都有文件记录、可获取、可复制和为相关利益攸关方所理解,该平台包括标准化的预处理数据、直观化、机器学习模型培训、评价、解释、结果检查和对无形数据进行模型预测的模块。为了进一步增强研究人员的能力,使其无须接受数据科学的正式培训,以获得有意义和可操作的洞见,Hylix具有一种方便用户的界面,使其能够设计计算实验、结果检查,包括对在综合环境中全部使用语言术语的机器学习决策采用新解释方法。根据麻省理学学院许可证,Helix可通过GitHub和PyPI进入,支持社区驱动的发展,并促进遵守FIR原则。
Article 268
Title@2025-07-23 (3): Confidence Calibration in Vision-Language-Action Models
Title: Confidence Calibration in Vision-Language-Action Models | Vertrauenskalibrierung in Vision-Language-Action-Modelle | 愿景-语言-行动模式中的信任调和 2507.17383v1 |
Authors (2): Thomas P Zollo, Richard Zemel
Trustworthy robot behavior requires not only high levels of task success but also that the robot can reliably quantify how likely it is to succeed. To this end, we present the first systematic study of confidence calibration in vision-language-action (VLA) foundation models, which map visual observations and natural-language instructions to low-level robot motor commands. We begin with extensive benchmarking to understand the critical relationship between task success and calibration error across multiple datasets and VLA variants, finding that task performance and calibration are not in tension. Next, we introduce prompt ensembles for VLAs, a lightweight, Bayesian-inspired algorithm that averages confidence across paraphrased instructions and consistently improves calibration. We further analyze calibration over the task time horizon, showing that confidence is often most reliable after making some progress, suggesting natural points for risk-aware intervention. Finally, we reveal differential miscalibration across action dimensions and propose action-wise Platt scaling, a method to recalibrate each action dimension independently to produce better confidence estimates. Our aim in this study is to begin to develop the tools and conceptual understanding necessary to render VLAs both highly performant and highly trustworthy via reliable uncertainty quantification.
值得信赖的机器人行为不仅要求高层次的任务成功,而且机器人可以可靠地量化它成功的可能性。 为此,我们首次对视觉语言动作基础模型(VLA)中的信任度校准进行了系统化研究,该模型将视觉观测和自然语言指示映射到低级别的机器人发动机指令中。我们首先进行广泛的基准,以了解任务成功与校准错误之间在多个数据集和VLA变量之间的关键关系,发现任务性能和校准没有处于紧张状态。接下来,我们引入了VLAs的快速组合,一个轻量级的、巴耶斯人启发的算法,该算法将信任度平均地置于参数指令中,并不断改进校准。我们进一步分析任务时间范围上的校准,表明在取得某些进展后,信心往往最为可靠,提出了风险觉悟干预的自然点。最后,我们揭示了不同行动层面之间的差异性差,并提出了适合行动的Plat的缩放,一种独立调整每个行动层面的方法,以产生更好的信心估计。我们这项研究的目的是开始开发工具,并通过高度可靠的量化来进行高度可靠的和概念上的量化。
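The two quantities this abstract relies on, calibration error and confidence averaged over paraphrased instructions, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the binned expected calibration error (ECE) is the standard textbook definition, and the function names, the bin count, and the use of a plain mean for the ensemble are assumptions.

```python
from statistics import mean


def prompt_ensemble_confidence(confidences):
    """Average the confidence obtained under each paraphrased instruction."""
    return mean(confidences)


def expected_calibration_error(confidences, successes, n_bins=10):
    """Binned ECE: sample-weighted mean of |accuracy - confidence| per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, successes):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 lands in the last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = mean(c for c, _ in members)
        accuracy = mean(1.0 if ok else 0.0 for _, ok in members)
        ece += len(members) / total * abs(accuracy - avg_conf)
    return ece
```

A policy that reports 0.9 confidence on four episodes and succeeds on three of them has an ECE of 0.15 under this definition.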
Article 269
Title@2025-07-23 (3): Continual Generalized Category Discovery: Learning and Forgetting from a Bayesian Perspective
Title: Continual Generalized Category Discovery: Learning and Forgetting from a Bayesian Perspective | Continual Generalized Category Discovery: Lernen und Vergessen aus einer bayesischen Perspektive | 发现:从巴伊西亚角度学习和遗忘 2507.17382v1 |
Authors (2): Hao Dai, Jagmohan Chauhan
Continual Generalized Category Discovery (C-GCD) faces a critical challenge: incrementally learning new classes from unlabeled data streams while preserving knowledge of old classes. Existing methods struggle with catastrophic forgetting, especially when unlabeled data mixes known and novel categories. We address this by analyzing C-GCD’s forgetting dynamics through a Bayesian lens, revealing that covariance misalignment between old and new classes drives performance degradation. Building on this insight, we propose Variational Bayes C-GCD (VB-CGCD), a novel framework that integrates variational inference with covariance-aware nearest-class-mean classification. VB-CGCD adaptively aligns class distributions while suppressing pseudo-label noise via stochastic variational updates. Experiments show that VB-CGCD surpasses prior art by 15.21% in overall final-session accuracy on standard benchmarks. We also introduce a new, challenging benchmark with only 10% labeled data and extended online phases, on which VB-CGCD achieves a final accuracy of 67.86%, significantly higher than the state of the art (38.55%), demonstrating its robust applicability across diverse scenarios. Code is available at: https://github.com/daihao42/VB-CGCD
持续通用分类发现(C-GCD)面临一个严峻的挑战:从未标注的数据流中逐步学习新类别,同时保留旧类别的知识。现有方法难以应对灾难性遗忘,特别是在未标注数据混杂已知类别和新类别的情况下。我们从贝叶斯视角分析C-GCD的遗忘动态,揭示旧类与新类之间的协方差不一致导致性能退化。基于这一洞察,我们提出了Variational Bayes C-GCD(VB-CGCD)这一新框架,将变分推断与协方差感知的最近类均值分类相结合。VB-CGCD自适应地对齐类分布,同时通过随机变分更新抑制伪标签噪声。实验显示,在标准基准上,VB-CGCD在最终阶段的总体准确率比先前方法高出15.21%。我们还提出了一个新的具有挑战性的基准,只有10%的标注数据和更长的在线阶段;在该基准上,VB-CGCD实现了67.86%的最终准确率,明显高于最先进方法(38.55%),展示了其在不同场景中的稳健适用性。代码见:https://github.com/daihao42/VB-CGCD
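To make "covariance-aware nearest-class-mean classification" concrete, the sketch below assigns a feature vector to the class with the smallest Mahalanobis distance rather than the smallest Euclidean distance. It is an illustration under simplifying assumptions, not the VB-CGCD code: covariances are diagonal, they are estimated by plain sample statistics instead of variational inference, and all function names are hypothetical.

```python
def fit_class_gaussians(features, labels):
    """Estimate a per-class mean and diagonal variance from labeled features."""
    stats = {}
    for label in set(labels):
        rows = [f for f, y in zip(features, labels) if y == label]
        dim = len(rows[0])
        mean = [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
        # Small constant keeps the variance strictly positive.
        var = [sum((r[d] - mean[d]) ** 2 for r in rows) / len(rows) + 1e-6
               for d in range(dim)]
        stats[label] = (mean, var)
    return stats


def mahalanobis_sq(x, mean, var):
    """Squared Mahalanobis distance under a diagonal covariance."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))


def classify(x, stats):
    """Return the class whose Gaussian is closest in Mahalanobis distance."""
    return min(stats, key=lambda label: mahalanobis_sq(x, *stats[label]))
```

With a tight class centred at (0, 0) and a broad class centred at (4, 0), the point (1.5, 0) is closer to the first mean in Euclidean terms but is assigned to the second class once the class spread is taken into account.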
Article 270
Title@2025-07-23 (3): Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Title: Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems | Auf dem Weg zu einem effizienten generativen großen Sprachmodell: Eine Umfrage von Algorithmen zu Systemen | 实现高效产生大型语文示范服务:从等级到系统的调查 2312.15234v2 |
Authors (7): Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia
In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However, the computational intensity and memory consumption of deploying these models present substantial challenges in terms of serving efficiency, particularly in scenarios demanding low latency and high throughput. This survey addresses the imperative need for efficient LLM serving methodologies from a machine learning system (MLSys) research perspective, standing at the crux of advanced AI innovations and practical system optimizations. We provide in-depth analysis, covering a spectrum of solutions, ranging from cutting-edge algorithmic modifications to groundbreaking changes in system designs. The survey aims to provide a comprehensive understanding of the current state and future directions in efficient LLM serving, offering valuable insights for researchers and practitioners in overcoming the barriers of effective LLM deployment, thereby reshaping the future of AI.
在迅速变化的人工智能(AI)环境中,基因化的大型语言模型(LLMs)处于最前沿,使我们与数据互动的方式发生了革命性的变化;然而,这些模型的部署的计算强度和记忆消耗在提高效率方面构成了重大挑战,特别是在要求低潜伏和高吞吐量的情景下;这项调查从机器学习系统(MLSys)研究角度探讨了高效的LLMM方法的迫切需要,处于先进的AI创新和实用系统优化的关键位置;我们提供深入分析,涉及一系列解决办法,从尖端的算法修改到系统设计的突破性变化;调查的目的是全面了解高效LLM服务的现状和未来方向,为研究人员和从业人员克服有效部署LLMM的障碍提供宝贵的见解,从而重新塑造AI的未来。
Article 271
Title@2025-07-23 (3): ViRN: Variational Inference and Distribution Trilateration for Long-Tailed Continual Representation Learning
Title: ViRN: Variational Inference and Distribution Trilateration for Long-Tailed Continual Representation Learning | ViRN: Variationale Schlussfolgerung und Verteilung Trilateration für langgestrecktes kontinuierliches Repräsentationslernen | VIRN: 长期旷课的连续代表制学习的变异推理和分布推推力 2507.17368v1 |
Authors (3): Hao Dai, Chong Tang, Jagmohan Chauhan
Continual learning (CL) with long-tailed data distributions remains a critical challenge for real-world AI systems, where models must sequentially adapt to new classes while retaining knowledge of old ones, despite severe class imbalance. Existing methods struggle to balance stability and plasticity, often collapsing under extreme sample scarcity. To address this, we propose ViRN, a novel CL framework that integrates variational inference (VI) with distributional trilateration for robust long-tailed learning. First, we model class-conditional distributions via a Variational Autoencoder to mitigate bias toward head classes. Second, we reconstruct tail-class distributions via Wasserstein distance-based neighborhood retrieval and geometric fusion, enabling sample-efficient alignment of tail-class representations. Evaluated on six long-tailed classification benchmarks, including speech (e.g., rare acoustic events, accents) and image tasks, ViRN achieves a 10.24% average accuracy gain over state-of-the-art methods.
为了解决这个问题,我们提议ViRN,这是一个新颖的CL框架,将差异推导(VI)与分布式三角相结合,以便进行强有力的长尾学习。首先,我们通过变式自动电解码模型模拟类级分布,以减少对头类的偏向。第二,我们通过瓦瑟斯坦远程社区检索和几何聚合重建尾类分布,使尾类代表结构能够进行抽样高效的组合。根据六个长长的分类基准,包括语音(例如稀有的声学事件、口音)和图像任务,ViRN实现了10.24%的平均准确率高于最先进的方法。
Article 272
Title@2025-07-23 (3): Leveraging RAG-LLMs for Urban Mobility Simulation and Analysis
Title: Leveraging RAG-LLMs for Urban Mobility Simulation and Analysis | Nutzung von RAG-LLMs für Simulation und Analyse der urbanen Mobilität | 为城市流动模拟和分析利用RAG-LLMs进行城市流动模拟和分析 2507.10382v2 |
Authors (4): Yue Ding, Conor McCarthy, Kevin O’Shea, Mingming Liu
With the rise of smart mobility and shared e-mobility services, numerous advanced technologies have been applied to this field. Cloud-based traffic simulation solutions have flourished, offering increasingly realistic representations of the evolving mobility landscape. LLMs have emerged as pioneering tools, providing robust support for various applications, including intelligent decision-making, user interaction, and real-time traffic analysis. As user demand for e-mobility continues to grow, delivering comprehensive end-to-end solutions has become crucial. In this paper, we present a cloud-based, LLM-powered shared e-mobility platform, integrated with a mobile application for personalized route recommendations. The optimization module is evaluated based on travel time and cost across different traffic scenarios. Additionally, the LLM-powered RAG framework is evaluated at the schema level for different users, using various evaluation methods. Schema-level RAG with XiYanSQL achieves an average execution accuracy of 0.81 on system operator queries and 0.98 on user queries.
随着智能流动和共享电子流动服务的兴起,在这一领域应用了许多先进技术;云基交通模拟解决方案蓬勃发展,对不断变化的流动格局提供了越来越现实的描述;LLM公司已成为开拓性工具,为各种应用提供了强有力的支持,包括智能决策、用户互动和实时交通分析;随着用户对电子流动的需求继续增长,提供全面的端对端解决方案变得至关重要;在本文件中,我们提出了一个基于云的、LLM公司驱动的共享电子流动平台,与个人化路线建议移动应用程序相结合;优化模块根据不同交通情况之间的旅行时间和成本进行评估;此外,LLM公司驱动的RAG框架在系统一级为不同用户提供评价,使用各种评价方法;Schema公司一级RAG和XYanSQL公司的平均执行准确度为:系统操作员查询0.81,用户查询0.98。
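The execution accuracy figures quoted above (0.81 and 0.98) measure how often a generated SQL query returns the same result as a reference query. The sketch below shows one common way to compute that metric with Python's built-in `sqlite3` module; the helper names are hypothetical, and the comparison rule (equal multisets of rows, failed queries count as misses) is an assumption rather than the paper's exact protocol.

```python
import sqlite3


def execution_match(conn, predicted_sql, reference_sql):
    """True when the predicted query runs and returns the same rows as the reference."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute counts as a miss
    reference = conn.execute(reference_sql).fetchall()
    return sorted(predicted) == sorted(reference)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, reference) query pairs whose results match."""
    hits = sum(execution_match(conn, pred, ref) for pred, ref in pairs)
    return hits / len(pairs)


def demo_connection():
    """Build a tiny in-memory table of hypothetical trips for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (id INTEGER, minutes REAL, cost REAL)")
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)",
                     [(1, 12.0, 3.5), (2, 30.0, 7.0), (3, 8.0, 2.0)])
    return conn
```

Comparing result sets rather than query strings means that two differently written queries count as equal whenever they retrieve the same rows.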
Article 273
Title@2025-07-23 (3): Artificial Intelligence for Green Hydrogen Yield Prediction and Site Suitability using SHAP-Based Composite Index: Focus on Oman
Title: Artificial Intelligence for Green Hydrogen Yield Prediction and Site Suitability using SHAP-Based Composite Index: Focus on Oman | Künstliche Intelligenz für Green Hydrogen Yield Prediction und Site Suitability mit SHAP-Based Composite Index: Fokus auf Oman | 利用以SHAP为基础的综合综合指数,对绿色氢氢氢氢、年产量预测和场地适用性进行人工智能:阿曼 2507.14219v2 |
Authors (2): Obumneme Zimuzor Nwafor, Mohammed Abdul Majeed Al Hooti
As nations seek sustainable alternatives to fossil fuels, green hydrogen has emerged as a promising strategic pathway toward decarbonisation, particularly in solar-rich arid regions. However, identifying optimal locations for hydrogen production requires the integration of complex environmental, atmospheric, and infrastructural factors, often compounded by limited availability of direct hydrogen yield data. This study presents a novel Artificial Intelligence (AI) framework for computing green hydrogen yield and a site suitability index using mean absolute SHAP (SHapley Additive exPlanations) values. The framework consists of a multi-stage pipeline of unsupervised multi-variable clustering, a supervised machine learning classifier, and the SHAP algorithm. The pipeline is trained on an integrated meteorological, topographic, and temporal dataset, and the results reveal distinct spatial patterns of suitability and the relative influence of the variables. With a model predictive accuracy of 98%, the results also show that water proximity, elevation, and seasonal variation are the most influential factors determining green hydrogen site suitability in Oman, with mean absolute SHAP values of 2.470891, 2.376296, and 1.273216 respectively. Given the limited availability or absence of ground-truth yield data in many countries that have green hydrogen prospects and ambitions, this study offers an objective and reproducible alternative to subjective expert weightings, thus allowing the data to speak for itself and potentially revealing novel latent groupings without pre-imposed assumptions. This study offers industry stakeholders and policymakers a replicable and scalable tool for green hydrogen infrastructure planning and other decision making in data-scarce regions.
随着各国寻求化石燃料的可持续替代物,绿色氢气已成为实现去碳化的一个有希望的战略途径,特别是在太阳能丰富的干旱地区;然而,确定氢生产的最佳地点需要综合复杂的环境、大气和基础设施因素,往往由于直接氢产出数据有限而加剧。本研究报告提出了一个新的人工智能(AI)框架,用于计算绿色氢产量和场地适合性指数,使用平均绝对SHAPLY Additive(SHapley Additives Exposation)值计算绿色氢的绿色氢产量和场地适合性指标。这一框架包括一个多阶段的管道,由不可监督的多变组、受监督的机器学习分类和SHAP算法组成。关于综合气象、地形和时间数据集的管道列车列列列列列列列列列列列列列,结果显示各种变量的适合性和相对影响的空间模式不同。模型准确性为98%,结果还表明,水的接近性、高度和季节性变化是决定阿曼绿色氢站点是否适合、绝对稀释值分别为2.470891、2.376296和1.23216。由于许多国家拥有绿色氢前景和时间段的地面收益的替代数据数据数据,因此,本项决策者和雄心研究为可能将提出一个目标和正值分析工具,因此,因此,本研究提供了一个目标和方向和正向更能分析工具,从而提出一个潜在的数据基础基础,从而有可能提出一个潜在数据分析工具。
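One simple way to turn mean absolute SHAP values into a composite index is to normalise them into weights and take a weighted sum of scaled features. The sketch below does this with the three values reported in the abstract. How the study actually assembles its index is not stated there, so the weighting rule, the feature names, and the assumption that each feature is already scaled to [0, 1] are all illustrative.

```python
# Mean absolute SHAP values quoted in the abstract; the key names are illustrative.
MEAN_ABS_SHAP = {
    "water_proximity": 2.470891,
    "elevation": 2.376296,
    "seasonal_variation": 1.273216,
}


def shap_weights(mean_abs_shap):
    """Normalise mean |SHAP| values so that the weights sum to 1."""
    total = sum(mean_abs_shap.values())
    return {name: value / total for name, value in mean_abs_shap.items()}


def suitability_index(site, weights):
    """Weighted sum of site features that are assumed to be scaled to [0, 1]."""
    return sum(weights[name] * site[name] for name in weights)
```

Under this rule water proximity carries roughly 40% of the weight, elevation about 39%, and seasonal variation about 21%.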
Article 274
Title@2025-07-23 (3): DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning
Title: DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning | DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Verstärkungslernen | DynaSearcher:通过多奖励强化学习增加搜索代理 2507.17365v1 |
Authors (4): Chuzhan Hao, Wenfeng Feng, Yuewei Zhang, Hao Wang
Multi-step agentic retrieval systems based on large language models (LLMs) have demonstrated remarkable performance in complex information search tasks. However, these systems still face significant challenges in practical applications, particularly in generating factually inconsistent intermediate queries and inefficient search trajectories, which can lead to reasoning deviations or redundant computations. To address these issues, we propose DynaSearcher, an innovative search agent enhanced by dynamic knowledge graphs and multi-reward reinforcement learning (RL). Specifically, our system leverages knowledge graphs as external structured knowledge to guide the search process by explicitly modeling entity relationships, thereby ensuring factual consistency in intermediate queries and mitigating biases from irrelevant information. Furthermore, we employ a multi-reward RL framework for fine-grained control over training objectives such as retrieval accuracy, efficiency, and response quality. This framework promotes the generation of high-quality intermediate queries and comprehensive final answers, while discouraging unnecessary exploration and minimizing information omissions or redundancy. Experimental results demonstrate that our approach achieves state-of-the-art answer accuracy on six multi-hop question answering datasets, matching frontier LLMs while using only small-scale models and limited computational resources. Furthermore, our approach demonstrates strong generalization and robustness across diverse retrieval environments and larger-scale models, highlighting its broad applicability.
以大型语言模型为基础的多步代理检索系统在复杂的信息搜索任务方面表现显著,然而,这些系统在实际应用方面仍面临重大挑战,特别是在产生事实上不一致的中间查询和低效率的搜索轨迹方面,这可能导致偏差或重复计算。为了解决这些问题,我们提议DynaSearcher,这是一个创新的搜索代理机构,由动态知识图和多回报强化学习所强化的动态知识图和多回报强化学习(RL)。具体地说,我们的系统利用知识图作为外部结构化知识来指导搜索进程,方法是明确建立实体关系模型,从而确保中间查询的实际一致性和减少不相关信息的偏差。此外,我们采用多奖励性RL框架,对检索准确性、效率和响应质量等培训目标进行精细的精细控制。这个框架促进产生高质量的中间查询和全面的最后答案,同时阻止不必要的探索和尽量减少信息遗漏或冗余。实验结果表明,我们的方法在六个多窗口解答数据集时达到了最新答案的准确性,同时匹配前沿LMSMS,同时仅使用小规模模型和范围较强的回收性强的模型。
Article 275
Title@2025-07-23 (3): Adaptive Repetition for Mitigating Position Bias in LLM-Based Ranking
Title: Adaptive Repetition for Mitigating Position Bias in LLM-Based Ranking | Adaptive Wiederholung für die Abmilderung von Positions-Bias im LLM-basierten Ranking | 以LLM为基础的排名中减轻职位偏见的适应性重复 2507.17788v1 |
Authors (4): Ali Vardasbi, Gustavo Penha, Claudia Hauff, Hugues Bouchard
When using LLMs to rank items based on given criteria, or to evaluate answers, the order of candidate items can influence the model’s final decision. This sensitivity to item positioning in an LLM’s prompt is known as position bias. Prior research shows that this bias exists even in large models, though its severity varies across models and tasks. In addition to position bias, LLMs also exhibit varying degrees of low repetition consistency, where repeating the LLM call with the same candidate ordering can lead to different rankings. To address both inconsistencies, a common approach is to prompt the model multiple times with different candidate orderings and aggregate the results via majority voting. However, this repetition strategy significantly increases computational costs. Extending prior findings, we observe that both the direction of position bias (favoring either the earlier or the later candidate in the prompt) and its magnitude vary substantially across instances, even within a single dataset. This observation highlights the need for a per-instance mitigation strategy. To this end, we introduce a dynamic early-stopping method that adaptively determines the number of repetitions required for each instance. Evaluating our approach across three LLMs of varying sizes and on two tasks, namely re-ranking and alignment, we demonstrate that transitioning to a dynamic repetition strategy reduces the number of LLM calls by an average of 81% while preserving accuracy. Furthermore, we propose a confidence-based adaptation of our early-stopping method, reducing LLM calls by an average of 87% compared to static repetition, with only a slight accuracy trade-off relative to our original early-stopping method.
当使用LLMS根据特定标准对项目进行排序或评价答案时,候选人项目顺序可以影响模型的最终决定。这种对项目定位在LLM的迅速度的敏感性被称为位置偏差。先前的研究显示,这种偏差存在于大型模型中,尽管其严重程度因模型和任务而异。除了偏差外,LLMS还表现出不同程度的静态一致性,重复LM的LM呼叫与同一位候选人的订单可能导致不同的排名。为了解决这两个不一致之处,一个共同的方法是促使模型多次重复,不同候选人订购,并通过多数投票汇总结果。然而,这种重复战略极大地增加了计算成本。扩大先前的调查结果,我们观察到这种偏差存在于大模型中,既有利于更早或更晚的候选人,也存在于不同的模型中。除了定位偏差外,LM还表现出不同程度的偏差,甚至在单一数据集中,这种偏差也表现出不同程度的重复性。为此,我们引入了一种动态的早期停止方法,即适应性地决定每个案例需要重复的次数。我们评价了三个LM的相对不同规模和两种不同程度的计算成本。我们评估了一种动态的比较的方法,也就是说的调整了我们要求的顺序,一个动态的顺序,而缩小了一种调整的方法是排序的顺序的顺序的顺序的顺序的顺序的顺序。
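The idea of stopping repetitions early on a per-instance basis can be sketched as follows. This is a hedged illustration, not the method from the paper: the stopping rule used here (stop once the leading answer can no longer be overtaken within the call budget) is an assumption, and `judge` is a hypothetical stand-in for an LLM call that returns the preferred candidate for a given ordering.

```python
from collections import Counter


def adaptive_majority_vote(judge, orderings, max_calls):
    """Query `judge` on successive orderings; stop once the leader cannot be overtaken.

    Assumes `orderings` is non-empty. Returns (winning answer, number of calls made).
    """
    votes = Counter()
    calls = 0
    for ordering in orderings[:max_calls]:
        votes[judge(ordering)] += 1
        calls += 1
        ranked = votes.most_common(2)
        lead = ranked[0][1]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        remaining = max_calls - calls
        if lead - runner_up > remaining:
            break  # no sequence of remaining votes could change the winner
    return votes.most_common(1)[0][0], calls
```

With a budget of five calls, a judge that always answers "A" is stopped after three calls, while a judge that simply prefers whichever candidate comes first uses the full budget when the orderings alternate.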
Article 276
Title@2025-07-23 (3): Monitoring digestate application on agricultural crops using Sentinel-2 Satellite imagery
Title: Monitoring digestate application on agricultural crops using Sentinel-2 Satellite imagery | Überwachung der Gärung auf landwirtschaftlichen Nutzpflanzen mit Sentinel-2 Satellitenbildern | 利用Sentinel-2卫星图像对农作物施用监测消化法 2504.19996v2 |
Authors (5): Andreas Kalogeras, Dimitrios Bormpoudakis, Iason Tsardanidis, Dimitra A. Loka, Charalampos Kontoes
The widespread use of Exogenous Organic Matter (EOM) in agriculture necessitates monitoring to assess its effects on soil and crop health. This study evaluates optical Sentinel-2 satellite imagery for detecting digestate application, a practice that enhances soil fertility but poses environmental risks like microplastic contamination and nitrogen losses. First, Sentinel-2 satellite image time series (SITS) analysis of specific indices (EOMI, NDVI, EVI) was used to characterize EOM’s spectral behavior after application on the soils of four different crop types in Thessaly, Greece. Furthermore, Machine Learning (ML) models (namely Random Forest, k-NN, Gradient Boosting and a Feed-Forward Neural Network) were used to investigate digestate presence detection, achieving F1-scores of up to 0.85. The findings highlight the potential of combining remote sensing and ML for scalable and cost-effective monitoring of EOM applications, supporting precision agriculture and sustainability.
由于农业广泛使用外生有机物质,因此有必要进行监测,以评估其对土壤和作物健康的影响。这项研究评估了光学哨点2卫星图像,以探测消化剂应用,这是一种提高土壤肥力的做法,但会给环境带来风险,如微塑料污染和氮流失。首先,Sentinel-2卫星图像时间序列(SITS)对具体指数(EOM、NDVI、EVI)的分析,用于在希腊Thesaly四个不同作物类型土壤应用后确定EMOM光谱行为的特点。此外,机器学习模型(即随机森林、k-NNN、梯子推动和进源向神经网络)被用来调查消化物存在探测,达到0.85的F1点。研究结果突出表明,遥感和ML合在一起,有可能对EOM应用进行可扩展和成本效益高的监测,支持精确的农业和可持续性。
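Of the indices named above, NDVI has the simplest closed form: (NIR - Red) / (NIR + Red), which for Sentinel-2 is usually computed from band B8 (near infrared) and band B4 (red). The sketch below applies it to a short time series; the function names and the tuple layout of the observations are assumptions made for illustration.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def ndvi_series(observations):
    """Map (date, nir, red) tuples to (date, ndvi) pairs."""
    return [(date, ndvi(nir, red)) for date, nir, red in observations]
```

A drop in the resulting series shortly after a known application date is the kind of spectral change a time-series analysis like the one described would look for.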
Article 277
Title@2025-07-23 (3): Hyperbolic Deep Learning for Foundation Models: A Survey
Title: Hyperbolic Deep Learning for Foundation Models: A Survey | Hyperbolisches Deep Learning für Gründungsmodelle: Eine Umfrage | 用于基础模型的超双曲深修:调查 2507.17787v1 |
Authors (5): Neil He, Hiren Madhu, Ngoc Bui, Menglin Yang, Rex Ying
Foundation models pre-trained on massive datasets, including large language models (LLMs), vision-language models (VLMs), and large multimodal models, have demonstrated remarkable success in diverse downstream tasks. However, recent studies have shown fundamental limitations of these models: (1) limited representational capacity, (2) lower adaptability, and (3) diminishing scalability. These shortcomings raise a critical question: is Euclidean geometry truly the optimal inductive bias for all foundation models, or could incorporating alternative geometric spaces enable models to better align with the intrinsic structure of real-world data and improve reasoning processes? Hyperbolic spaces, a class of non-Euclidean manifolds characterized by exponential volume growth with respect to distance, offer a mathematically grounded solution. These spaces enable low-distortion embeddings of hierarchical structures (e.g., trees, taxonomies) and power-law distributions with substantially fewer dimensions compared to Euclidean counterparts. Recent advances have leveraged these properties to enhance foundation models, including improving LLMs’ complex reasoning ability, VLMs’ zero-shot generalization, and cross-modal semantic alignment, while maintaining parameter efficiency. This paper provides a comprehensive review of hyperbolic neural networks and their recent development for foundation models. We further outline key challenges and research directions to advance the field.
在大规模数据集,包括大型语言模型(LLMS)、愿景语言模型(VLMS)和大型多式联运模型(VLMS)方面经过预先培训的基础模型,在大规模数据集方面,包括大型语言模型(LLMS)、视觉语言模型(VLMS)和大型多式联运模型,都显示在各种下游任务中取得了显著的成功;然而,最近的研究表明,这些模型存在根本性的局限性:(1) 代表性能力有限,(2) 适应能力较低,(3) 缩放能力降低,(3) 缩放能力降低。这些缺陷提出了一个关键问题:欧几何地测量真正是所有基础模型的最佳缩影偏差,或者可以纳入替代几何几何空间,使模型能够更好地与真实世界数据的内在结构保持一致,并改进推理过程? 超双曲空间,非欧clidea 的多元方块块块,其特点是在距离方面快速增长,提供了一种基于数学的解决方案的解决办法。这些空间使得等级结构(例如树木、分类等)和权力法系分布得以低扭曲,其尺寸比Ecloidean-law分布得到大大的分布。
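The exponential volume growth mentioned above is easiest to see through the distance function of the Poincaré ball, one standard model of hyperbolic space: d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The sketch below implements that formula for points strictly inside the unit ball; it is a generic illustration and is not tied to any specific model covered by the survey.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    def sq_norm(x):
        return sum(xi * xi for xi in x)

    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)
```

A point at Euclidean radius 0.5 lies at hyperbolic distance ln 3 ≈ 1.10 from the origin, while a point at radius 0.9 lies at ln 19 ≈ 2.94. Distances stretch rapidly toward the boundary, which leaves room for the many leaves of a tree.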
Article 278
Title@2025-07-23 (3): DeCo-SGD: Joint Optimization of Delay Staleness and Gradient Compression Ratio for Distributed SGD
Title: DeCo-SGD: Joint Optimization of Delay Staleness and Gradient Compression Ratio for Distributed SGD | DeCo-SGD: Gemeinsame Optimierung der Verzögerungsstabilität und des Gradienten-Kompressions-Verhältnisses für verteilte SGD | DeCo-SGD: 分配的SGD延迟滞缓和逐步压缩比率联合优化 2507.17346v1 |
Authors (7): Rongwei Lu, Jingyan Jiang, Chunyang Li, Haotian Dong, Xingguang Wei, Delin Cai, Zhi Wang
Distributed machine learning suffers severe throughput degradation in network environments with high end-to-end latency and low, varying bandwidth. Due to its low communication requirements, distributed SGD (D-SGD) remains the mainstream optimizer in such challenging networks, but it still suffers from significant throughput reduction. To mitigate these limitations, existing approaches typically employ gradient compression and delayed aggregation to alleviate low bandwidth and high latency, respectively. To address both challenges simultaneously, these strategies are often combined, introducing a complex three-way trade-off among compression ratio, staleness (delayed synchronization steps), and model convergence rate. To achieve the balance under varying bandwidth conditions, an adaptive policy is required to dynamically adjust these parameters. Unfortunately, existing works rely on static heuristic strategies due to the lack of theoretical guidance, which prevents them from achieving this goal. This study fills this theoretical gap by introducing a new theoretical tool that decomposes the joint optimization problem into a traditional convergence rate analysis with multiple analyzable noise terms. We are the first to reveal that staleness exponentially amplifies the negative impact of gradient compression on training performance, filling a critical gap in understanding how compressed and delayed gradients affect training. Furthermore, by integrating the convergence rate with a network-aware time minimization condition, we propose DeCo-SGD, which dynamically adjusts the compression ratio and staleness based on the real-time network condition and training task. DeCo-SGD achieves speed-ups of up to 5.07× and 1.37× over D-SGD and a static strategy, respectively, in high-latency and low, varying bandwidth networks.
在高端到端的延迟度和低低端的带宽网络环境中,分散的机器学习在高端到端的延迟度和低端的带宽网络环境中发生严重的吞化退化。由于通信需求低,分布式的SGD(D-SGD)仍然是这些具有挑战性的网络的主流优化者,但这种优化仍然受到显著的减少。为缓解这些限制,现有方法通常采用梯度压缩和延迟汇总,以缓解低带宽和高延缓度。为了同时应对这两个挑战,这些战略往往同时结合,在压缩比率、低延迟同步步骤(延迟同步步骤)和模式趋同率之间实行复杂的三向交易。为了在不同的带宽条件下实现平衡,需要制定适应性政策来动态调整这些参数。不幸的是,由于缺乏理论指导,现有工作依赖于静态的超速战略,因此无法实现这一目标。这一研究通过引入新的理论工具,将联合优化问题转化为传统的趋同率分析,同时采用多种可解的噪音,我们首先发现,渐变的加速度将梯度递增速度战略放大了对培训网络的负面影响。
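The two knobs that the abstract says must be tuned jointly, compression ratio and staleness, can be illustrated with a toy single-worker loop. This sketch is not DeCo-SGD: it uses plain top-k sparsification, a fixed delay queue, and fixed values for both parameters, whereas the paper adapts them to network conditions. All names are hypothetical.

```python
from collections import deque


def top_k_compress(grad, ratio):
    """Keep the largest-magnitude fraction `ratio` of entries; zero out the rest."""
    k = max(1, int(len(grad) * ratio))
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def delayed_sgd(grad_fn, w, lr, steps, staleness, ratio):
    """Apply each compressed gradient `staleness` steps after it was computed."""
    in_flight = deque()  # gradients that have been computed but not yet applied
    for _ in range(steps):
        in_flight.append(top_k_compress(grad_fn(w), ratio))
        if len(in_flight) > staleness:
            g = in_flight.popleft()
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w
```

Setting `staleness=0` and `ratio=1.0` recovers ordinary gradient descent, which gives a baseline against which the effect of either knob can be observed.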
Article 279
Title@2025-07-23 (3): Reinforcement Learning for Accelerated Aerodynamic Shape Optimisation
Title: Reinforcement Learning for Accelerated Aerodynamic Shape Optimisation | Verstärktes Lernen zur beschleunigten aerodynamischen Formoptimierung | 加速空气动力元件优化强化学习 2507.17786v1 |
Authors (7): Florian Sobieczky, Alfredo Lopez, Erika Dudkin, Christopher Lackner, Matthias Hochsteger, Bernhard Scheichl, Helmut Sobieczky
We introduce a reinforcement learning (RL) based adaptive optimization algorithm for aerodynamic shape optimization focused on dimensionality reduction. The form in which RL is applied here is that of a surrogate-based, actor-critic policy evaluation MCMC approach allowing for temporal ‘freezing’ of some of the parameters to be optimized. The goals are to minimize computational effort, and to use the observed optimization results for interpretation of the discovered extrema in terms of their role in achieving the desired flow-field. By a sequence of local optimized parameter changes around intermediate CFD simulations acting as ground truth, it is possible to speed up the global optimization if (a) the local neighbourhoods of the parameters in which the changed parameters must reside are sufficiently large to compete with the grid-sized steps and its large number of simulations, and (b) the estimates of the rewards and costs on these neighbourhoods necessary for a good step-wise parameter adaption are sufficiently accurate. We give an example of a simple fluid-dynamical problem on which the method allows interpretation in the sense of a feature importance scoring.
我们引入了基于强化学习(RL)的适应性优化空气动力元件优化适应性优化算法,该算法以降低维度为重点。这里应用RL的形式是替代基于行为体-批评政策评价的MCMC方法,允许对一些参数进行时间“冻结”以优化某些参数。目标是最大限度地减少计算努力,并使用观测到的优化结果来解释所发现的外形在实现理想流程领域中的作用。通过围绕中间 CFD模拟作为地面真理的一系列局部优化参数变化,如果(a) 改变参数所处参数的本地环境足以与网格大小步骤及其大量模拟进行竞争,以及(b) 对这些居民区的奖励和成本的估算是足够准确的,我们举了一个简单的流体动力问题为例,该方法允许对特征重要性评分进行解释。
Article 280
Title@2025-07-23 (3): Principled Multimodal Representation Learning
Title: Principled Multimodal Representation Learning | Grundsatz des multimodalen Repräsentationslernens | 注重原则的多模式代表制学习 2507.17343v1 |
Authors (4): Xiaohao Liu, Xiaobo Xia, See-Kiong Ng, Tat-Seng Chua
Multimodal representation learning seeks to create a unified representation space by integrating diverse data modalities to improve multimodal understanding. Traditional methods often depend on pairwise contrastive learning, which relies on a predefined anchor modality, restricting alignment across all modalities. Recent advances have investigated the simultaneous alignment of multiple modalities, yet several challenges remain, such as limitations imposed by fixed anchor points and instability arising from optimizing the product of singular values. To address the challenges, in this paper, we propose Principled Multimodal Representation Learning (PMRL), a novel framework that achieves simultaneous alignment of multiple modalities without anchor dependency in a more stable manner. Specifically, grounded in the theoretical insight that full alignment corresponds to a rank-1 Gram matrix, PMRL optimizes the dominant singular value of the representation matrix to align modalities along a shared leading direction. We propose a softmax-based loss function that treats singular values as logits to prioritize the largest singular value. Besides, instance-wise contrastive regularization on the leading eigenvectors maintains inter-instance separability and prevents representation collapse. Extensive experiments across diverse tasks demonstrate PMRL’s superiority compared to baseline methods. The source code will be publicly available.
多模式代表性学习力求通过整合多种数据模式来创造统一的代表性空间,以提高多式联运的理解。传统方法往往依赖于双向对比学习,这种学习依赖于预先确定的固定模式,限制所有模式的一致。最近的进展调查了多种模式同时一致的问题,但仍然存在若干挑战,例如固定固定固定固定固定固定点的限制和因优化单值产品而产生的不稳定。为了应对挑战,我们在本文件中提议了原则化的多模式代表性学习(PMRL),这是一个新颖的框架,它以更稳定的方式实现多种模式同步一致,而没有固定依赖性。具体地说,基于与一级Gram矩阵完全一致的理论洞察力,PMRL优化了代表矩阵的主要单一价值,以便按照共同的主导方向调整模式。我们提议了一个基于软式损失功能,将单值视为对最大单值的逻辑,以便优先考虑最大单值。此外,对主要非基因学家的对比性规范也保持了内部的分离性,防止代表的崩溃。在各种任务中进行广泛的实验,表明PMRL的优势与基线方法相比,源代码将公开提供。
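One plausible reading of "a softmax-based loss function that treats singular values as logits" can be written out explicitly. The form below is a sketch under assumptions, not the loss as defined in the paper: $\sigma_1 \ge \dots \ge \sigma_M$ are taken to be the singular values of the matrix that stacks the $M$ normalised modality representations of one instance, and the temperature $\tau$ is a hypothetical hyperparameter.

```latex
\mathcal{L}_{\text{align}}
  = -\log \frac{\exp(\sigma_1 / \tau)}{\sum_{i=1}^{M} \exp(\sigma_i / \tau)}
```

Minimising this term pushes $\sigma_1$ up relative to the remaining singular values, which moves the Gram matrix toward the rank-1 structure that the abstract identifies with full alignment.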
Article 281
Title@2025-07-23 (3): Self-similarity Analysis in Deep Neural Networks
Title: Self-similarity Analysis in Deep Neural Networks | Selbstähnlichkeitsanalyse in tiefen neuralen Netzwerken | 深神经网络中的自我差异分析 2507.17785v1 |
Authors (7): Jingyi Ding, Chengwen Qi, Hongfei Wang, Jianshe Wu, Licheng Jiao, Yuwei Guo, Jian Gao
Current research has found that some deep neural networks exhibit strong hierarchical self-similarity in feature representation or parameter distribution. However, aside from preliminary studies on how the power-law distribution of weights across different training stages affects model performance, there has been no quantitative analysis of how the self-similarity of hidden space geometry influences model weight optimization, nor is there a clear understanding of the dynamic behavior of internal neurons. Therefore, this paper proposes a complex network modeling method based on the output features of hidden-layer neurons to investigate the self-similarity of feature networks constructed at different hidden layers, and analyzes how adjusting the degree of self-similarity in feature networks can enhance the classification performance of deep neural networks. Validated on three types of networks (MLP architectures, convolutional networks, and attention architectures), this study reveals that the degree of self-similarity exhibited by feature networks varies across different model architectures. Furthermore, embedding constraints on the self-similarity of feature networks during the training process can improve the performance of self-similar deep neural networks (MLP architectures and attention architectures) by up to 6 percentage points.
目前的研究发现,一些深层神经网络在特征表现或参数分布方面表现出高度等级的自差性;然而,除了对不同培训阶段重量的分权法如何影响模型性能的初步研究之外,对于隐藏空间几何的自异性如何影响模型重力优化,没有进行定量分析,也没有对内部神经元的动态行为有明确的了解;因此,本文件建议采用一种复杂的网络建模方法,以隐藏层神经元的输出特征为基础,调查在不同隐藏层建造的特征网络的自异性,并分析调整特征网络的自异性能如何提高深层神经网络的分类性能。这项研究对三种类型的网络MLP结构、动态网络和关注结构进行了验证,表明不同模型结构中特征网络展示的自异性程度各不相同。 此外,在培训过程中对特征网络的自异性在深度神经网络(MLP结构和关注结构)的性能提升到6个百分点。
Article 282
Title@2025-07-23 (3): Optimizing Privacy-Utility Trade-off in Decentralized Learning with Generalized Correlated Noise
Title: Optimizing Privacy-Utility Trade-off in Decentralized Learning with Generalized Correlated Noise | Optimierung der Privatsphäre-Utility-Trade-off im dezentralisierten Lernen mit generalisierter korrelierter Geräuschentwicklung | 与普遍相关联的噪音优化分散化学习中的隐私-公用事业交易 2501.14644v2 |
Authors (3): Angelo Rodio, Zheng Chen, Erik G. Larsson
Decentralized learning enables distributed agents to collaboratively train a shared machine learning model without a central server, through local computation and peer-to-peer communication. Although each agent retains its dataset locally, sharing local models can still expose private information about the local training datasets to adversaries. To mitigate privacy attacks, a common strategy is to inject random artificial noise at each agent before exchanging local models between neighbors. However, this often leads to utility degradation due to the negative effects of cumulated artificial noise on the learning algorithm. In this work, we introduce CorN-DSGD, a novel covariance-based framework for generating correlated privacy noise across agents, which unifies several state-of-the-art methods as special cases. By leveraging network topology and mixing weights, CorN-DSGD optimizes the noise covariance to achieve network-wide noise cancellation. Experimental results show that CorN-DSGD cancels more noise than existing pairwise correlation schemes, improving model performance under formal privacy guarantees.
分散化学习使分布式代理商能够通过本地计算和同侪通信,合作培训一个没有中央服务器的共享机器学习模式。虽然每个代理商保留了本地数据集,但共享本地模型仍能向对手披露关于本地培训数据集的私人信息。为了减少隐私攻击,一个共同战略是在邻居之间交换本地模型之前,向每个代理商注入随机人为噪音。然而,这往往导致公用事业退化,因为累积式人工噪音对学习算法产生了负面影响。在这项工作中,我们引入了CORN-DSGD,这是一个基于新颖的共变换式框架,用于在代理商之间产生相关的隐私噪音,将几种最先进的方法统一为特例。通过利用网络表层学和混合重量,CORN-DSGD优化了噪音共变数,以实现全网络的噪音取消。实验结果表明,CORN-DSGD取消的噪音多于现有的双向关联计划,在正式的隐私保障下改进模型性能。
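The pairwise correlation schemes mentioned as a baseline rest on a simple property: if two neighbours add the same random value with opposite signs, the noise hides each local model but cancels in the network-wide sum. The sketch below shows only that baseline property, with hypothetical names; CorN-DSGD itself optimises a more general noise covariance, which is not reproduced here.

```python
import random


def pairwise_correlated_noise(edges, n_agents, sigma, seed=0):
    """Draw one Gaussian value per edge and add it with opposite signs at its endpoints."""
    rng = random.Random(seed)
    noise = [0.0] * n_agents
    for i, j in edges:
        z = rng.gauss(0.0, sigma)
        noise[i] += z
        noise[j] -= z
    return noise
```

On a ring of four agents every agent receives non-zero noise, yet the four values sum to zero up to floating-point rounding, so averaging over the network is left undisturbed.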
Article 283
Title@2025-07-23 (3): A Learning-based Domain Decomposition Method
Title: A Learning-based Domain Decomposition Method | Eine lernbasierte Methode der Domänenzersetzung | 以学习为基础的域分解方法 2507.17328v1 |
Authors (3): Rui Wu, Nikola Kovachki, Burigede Liu
Recent developments in mechanical, aerospace, and structural engineering have driven a growing need for efficient ways to model and analyse structures at much larger and more complex scales than before. While established numerical methods like the Finite Element Method remain reliable, they often struggle with computational cost and scalability when dealing with large and geometrically intricate problems. In recent years, neural network-based methods have shown promise because of their ability to efficiently approximate nonlinear mappings. However, most existing neural approaches are still largely limited to simple domains, which makes them difficult to apply to real-world PDEs involving complex geometries. In this paper, we propose a learning-based domain decomposition method (L-DDM) that addresses this gap. Our approach uses a single pre-trained neural operator, originally trained on simple domains, as a surrogate model within a domain decomposition scheme, allowing us to tackle large and complicated domains efficiently. We provide a general theoretical result on the existence of neural operator approximations in the context of domain decomposition solution of abstract PDEs. We then demonstrate our method by accurately approximating solutions to elliptic PDEs with discontinuous microstructures in complex geometries, using a physics-pretrained neural operator (PPNO). Our results show that this approach not only outperforms current state-of-the-art methods on these challenging problems, but also offers resolution-invariance and strong generalization to microstructural patterns unseen during training.
机械、航空航天和结构工程的近期发展促使人们日益需要高效的方法,以比以前更大规模和更加复杂的规模来模拟和分析结构。虽然像“精度元素法”这样的既定数字方法仍然可靠,但在处理大规模和几何复杂的问题时往往会与计算成本和可缩缩进性挣扎。近年来,以神经网络为基础的方法显示出了希望,因为它们能够有效地接近非线性绘图。然而,大多数现有的神经方法仍然基本上局限于简单的领域,因此难以适用于涉及复杂地貌的真实世界PDE。在本文中,我们提出了一种基于学习的域分解法(L-DDM),以弥补这一差距。我们的方法使用一个单一的、预先训练的神经操作员,在简单领域解析计划内,作为一种隐蔽模型,使我们能够有效地处理大而复杂的领域。我们现有的神经操作员近似于抽象PDES的域分解定位解决方案。我们随后展示了我们的方法,通过精确的稳健的域分解方法,而不是在复杂的地球物理结构中展示了我们目前的方法。
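The abstract above describes using one pre-trained local solver inside a domain decomposition loop. A minimal sketch of that idea, assuming a classical alternating Schwarz iteration on a 1D Laplace problem: `local_solve` is a hypothetical stand-in for the neural operator (here it simply returns the exact local solution), and the grid size and overlap are illustrative choices, not values from the paper.

```python
import numpy as np

def local_solve(x, left_value, right_value):
    """Hypothetical surrogate for the pre-trained operator.

    Returns the exact solution of u'' = 0 on the subdomain spanned by x
    with the given Dirichlet values at both ends (a straight line).
    """
    return left_value + (right_value - left_value) * (x - x[0]) / (x[-1] - x[0])

def schwarz_1d(n=101, overlap=(40, 60), iterations=50):
    """Alternating Schwarz iteration for u'' = 0, u(0) = 0, u(1) = 1."""
    x = np.linspace(0.0, 1.0, n)
    u = np.zeros(n)
    u[-1] = 1.0
    lo, hi = overlap  # subdomain A covers indices 0..hi, subdomain B covers lo..n-1
    for _ in range(iterations):
        # Solve on A using the current value at its interior boundary (index hi).
        u[: hi + 1] = local_solve(x[: hi + 1], u[0], u[hi])
        # Solve on B using the freshly updated value at its interior boundary (index lo).
        u[lo:] = local_solve(x[lo:], u[lo], u[-1])
    return x, u

x, u = schwarz_1d()
max_error = float(np.max(np.abs(u - x)))  # the global solution is u(x) = x
```

The same local routine is reused on every subdomain, which is the property that lets a single operator trained on simple domains be applied to a larger one; only the boundary values exchanged through the overlap change between iterations.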
Article 284
Title@2025-07-23 (3): RIS-aided Latent Space Alignment for Semantic Channel Equalization
Title: RIS-aided Latent Space Alignment for Semantic Channel Equalization | RIS-gestützte Latent Space Alignment für semantische Kanalausgleich | RIS援助的静语频道平准空间对齐 2507.16450v2 |
Authors (5): Tomás Hüttebräucker, Mario Edoardo Pandolfo, Simone Fiorellino, Emilio Calvanese Strinati, Paolo Di Lorenzo
Semantic communication systems introduce a new paradigm in wireless communications, focusing on transmitting the intended meaning rather than ensuring strict bit-level accuracy. These systems often rely on Deep Neural Networks (DNNs) to learn and encode meaning directly from data, enabling more efficient communication. However, in multi-user settings where interacting agents are trained independently, without shared context or joint optimization, divergent latent representations across AI-native devices can lead to semantic mismatches, impeding mutual understanding even in the absence of traditional transmission errors. In this work, we address semantic mismatch in Multiple-Input Multiple-Output (MIMO) channels by proposing a joint physical and semantic channel equalization framework that leverages the presence of Reconfigurable Intelligent Surfaces (RIS). The semantic equalization is implemented as a sequence of transformations: (i) a pre-equalization stage at the transmitter; (ii) propagation through the RIS-aided channel; and (iii) a post-equalization stage at the receiver. We formulate the problem as a constrained Minimum Mean Squared Error (MMSE) optimization and propose two solutions: (i) a linear semantic equalization chain, and (ii) a non-linear DNN-based semantic equalizer. Both methods are designed to operate under semantic compression in the latent space and adhere to transmit power constraints. Through extensive evaluations, we show that the proposed joint equalization strategies consistently outperform conventional, disjoint approaches to physical and semantic channel equalization across a broad range of scenarios and wireless channel conditions.
在无线通信中,语义通信系统引入了一个新的范式,重点是传达预想的含义,而不是确保严格的比位级准确性。这些系统往往依靠深神经网络直接学习和编码数据的含义,从而能够提高通信效率。然而,在多用户环境中,互动代理器在没有共享环境或全新设备之间联合优化和分散的潜在表现的情况下接受独立培训,可能导致语义错配,即使在没有传统传输错误的情况下,也妨碍相互理解。在这项工作中,我们通过提出一个联合物理和语义频道平准框架,利用重新配置的智能表面的存在,解决多输入多输出(MIIMO)渠道的语义错配问题。语义平准作为一系列变异过程加以实施:(一) 发射机的不平等前阶段;(二) 通过RIS辅助频道传播;和(三) 在接收器中,基于后平权化的阶段,我们将问题描述为受限制的最低平面平面平面频道(MMSE)平面频道,优化并提出两种解决方案:(i) 以平面平面平面平面平面平面平面的平面策略,在我们设计的平面平面平面平面平面平面平面平面上,在平面平面平面平面平面上展示一个平坦的平坦的平坦的平坦的平面结构,在平面上展示一个平坦的平坦的平坦的平面上,在平面上,在平坦的平坦的平坦的平坦的平坦的平坦的平坦的平坦的平坦的平坦的平直径。
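For readers unfamiliar with MMSE equalization, here is a minimal sketch of the textbook unconstrained linear MMSE receiver for a MIMO channel y = Hx + n with unit-variance symbols. It is not the paper's constrained semantic equalizer: there is no RIS, no latent-space compression and no power constraint, and the channel matrix below is an arbitrary illustrative value.

```python
import numpy as np

def mmse_equalizer(H, noise_var):
    """Textbook linear MMSE equalizer W = H^H (H H^H + noise_var * I)^{-1}.

    Assumes unit-variance, uncorrelated transmit symbols and white noise.
    """
    n_rx = H.shape[0]
    gram = H @ H.conj().T + noise_var * np.eye(n_rx)
    return H.conj().T @ np.linalg.inv(gram)

# Illustrative 2x2 complex channel (hypothetical values).
H = np.array([[1.0 + 0.5j, 0.3 + 0.0j],
              [0.0 + 0.2j, 0.8 - 0.1j]])
x = np.array([1.0 - 1.0j, -0.5 + 2.0j])

W = mmse_equalizer(H, noise_var=1e-9)
x_hat = W @ (H @ x)  # noiseless reception, so x_hat should be very close to x
```

With a vanishing noise variance the equalizer approaches the channel inverse; as the noise variance grows it trades residual interference against noise amplification.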
Article 285
Title@2025-07-23 (3): Towards Detecting Persuasion on Social Media: From Model Development to Insights on Persuasion Strategies
Title: Towards Detecting Persuasion on Social Media: From Model Development to Insights on Persuasion Strategies | Auf dem Weg zur Erkennbarkeit von Überzeugungen in sozialen Medien: Von der Modellentwicklung zu Erkenntnissen über Überzeugungsstrategien | 探索社会媒体的观察:从示范发展到观察社会媒体的观察 2503.13844v2 |
Authors (6): Elyas Meguellati, Stefano Civelli, Pietro Bernardelle, Shazia Sadiq, Irwin King, Gianluca Demartini
Political advertising plays a pivotal role in shaping public opinion and influencing electoral outcomes, often through subtle persuasive techniques embedded in broader propaganda strategies. Detecting these persuasive elements is crucial for enhancing voter awareness and ensuring transparency in democratic processes. This paper presents an integrated approach that bridges model development and real-world application through two interconnected studies. First, we introduce a lightweight model for persuasive text detection that achieves state-of-the-art performance in Subtask 3 of SemEval 2023 Task 3 while requiring significantly fewer computational resources and training data than existing methods. Second, we demonstrate the model’s practical utility by collecting the Australian Federal Election 2022 Facebook Ads (APA22) dataset, partially annotating a subset for persuasion, and fine-tuning the model to adapt from mainstream news to social media content. We then apply the fine-tuned model to label the remainder of the APA22 dataset, revealing distinct patterns in how political campaigns leverage persuasion through different funding strategies, word choices, demographic targeting, and temporal shifts in persuasion intensity as election day approaches. Our findings not only underscore the necessity of domain-specific modeling for analyzing persuasion on social media but also show how uncovering these strategies can enhance transparency, inform voters, and promote accountability in digital campaigns.
政治广告在形成公众舆论和影响选举结果方面起着关键作用,通常是通过在更广泛的宣传战略中嵌入的微妙的说服技巧。发现这些有说服力的因素对于提高选民意识和确保民主进程的透明度至关重要。本文件介绍了一种综合方法,通过两个相互关联的研究将模式发展和现实世界的应用连接起来。首先,我们引入了一种轻量化的说服性文本检测模型,在SemEval 2023任务3的Subtask 3中实现最先进的表现,同时需要比现有方法少得多的计算资源和培训数据。第二,我们通过收集2022年澳大利亚联邦选举的Facebook广告数据集(APA22),部分说明说服因素,并微调模型,从主流新闻到社会媒体内容的适应。然后我们采用微调模型,将APA22数据集的其余部分贴上标签,揭示政治运动如何通过不同的筹资战略、文字选择、人口目标以及选举日方法的瞬间转变说服力等不同模式。我们的结论不仅突出了分析社会媒体说服力的域模型的必要性,而且还展示了如何提高透明度。
Article 286
Title@2025-07-23 (3): Nearly Minimax Discrete Distribution Estimation in Kullback-Leibler Divergence with High Probability
Title: Nearly Minimax Discrete Distribution Estimation in Kullback-Leibler Divergence with High Probability | Fast Minimax Diskrete Distribution Schätzung in Kullback-Leibler Divergenz mit hoher Wahrscheinlichkeit | Kullback- Leibler 高概率差异中近微小马克分解分布估计值 2507.17316v1 |
Authors (3): Dirk van der Hoeven, Julia Olkhovskaia, Tim van Erven
We consider the problem of estimating a discrete distribution $p$ with support of size $K$ and provide both upper and lower bounds with high probability in KL divergence. We prove that in the worst case, for any estimator $\widehat{p}$, with probability at least $\delta$, $\text{KL}(p \| \widehat{p}) \geq C\max\{K, \ln(K)\ln(1/\delta)\}/n$, where $n$ is the sample size and $C > 0$ is a constant. We introduce a computationally efficient estimator $p^{\text{OTB}}$, based on Online to Batch conversion and suffix averaging, and show that with probability at least $1 - \delta$, $\text{KL}(p \| \widehat{p}) \leq C(K\log(\log(K)) + \ln(K)\ln(1/\delta))/n$. Furthermore, we also show that with sufficiently many observations relative to $\log(1/\delta)$, the maximum likelihood estimator $\bar{p}$ guarantees that with probability at least $1-\delta$ $$ \tfrac{1}{6}\chi^2(\bar{p} \| p) \leq \tfrac{1}{4}\chi^2(p \| \bar{p}) \leq \text{KL}(p \| \bar{p}) \leq C(K + \log(1/\delta))/n\,, $$ where $\chi^2$ denotes the $\chi^2$-divergence.
我们考虑在支持规模为2K美元的情况下估算离散分配 $p 的问题 。 我们证明, 在最坏的情况下, 对于任何估算值$\ 全域哈特{p} 美元, 概率至少为$delta$, $\ text{K} (p\\\ 全域哈特{p} $\\ maxk{K,\ ln( K)\ ln( br) { / delta 美元, 其中美元是样本规模, $C > 美元是一个不变的上限 。 我们根据在线到 Batch 转换和 ffix 平均, 引入一个计算效率高效的估算值 $ p> text{ OTB} 美元, 并显示, 概率至少为 1 -\ delta$\ k\ c} (p)\ 全域哈特}\ c (K\ \ rlog( k) + ln( K) 美元 (K) 和 美元 ( 美元= delta) 美元 美元( 美元) 。 此外, 我们还显示, 许多观测到最高的可能性是 $ $ 。
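To make the quantities in the bounds above concrete, the sketch below computes the KL and $\chi^2$ divergences for two simple estimators. The add-constant (Laplace) estimator is only an illustrative stand-in and is not the paper's $p^{\text{OTB}}$; the convention $\chi^2(p \| q) = \sum_k (p_k - q_k)^2 / q_k$ is assumed, and the counts and true distribution are made-up values.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) = sum_k p_k * log(p_k / q_k); infinite if q_k = 0 where p_k > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    if np.any(q[mask] == 0):
        return float("inf")
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def chi2_divergence(p, q):
    """chi^2(p || q) = sum_k (p_k - q_k)^2 / q_k (assumed convention)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = q > 0
    if np.any(p[~mask] > 0):
        return float("inf")
    return float(np.sum((p[mask] - q[mask]) ** 2 / q[mask]))

def mle(counts):
    """Maximum likelihood estimator: empirical frequencies."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def add_constant_estimator(counts, alpha=1.0):
    """Laplace-style smoothing (illustrative, not the paper's estimator)."""
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * counts.size)

p_true = np.array([0.5, 0.3, 0.2])
counts = np.array([3, 1, 0])  # the third symbol was never observed

kl_mle = kl_divergence(p_true, mle(counts))
kl_smoothed = kl_divergence(p_true, add_constant_estimator(counts))
```

The unobserved symbol gives the maximum likelihood estimate zero mass there, so its KL loss is infinite on this sample, which illustrates why the guarantee for $\bar{p}$ is stated only for sufficiently many observations.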
Article 287
Title@2025-07-23 (3): Confounded Causal Imitation Learning with Instrumental Variables
Title: Confounded Causal Imitation Learning with Instrumental Variables | Konfounded Causal Imitation Learning with Instrumental Variables | 带有乐器变量的因果模仿学习 2507.17309v1 |
Authors (6): Yan Zeng, Shenglan Nie, Feng Xie, Libo Huang, Peng Wu, Zhi Geng
Imitation learning from demonstrations usually suffers from the confounding effects of unmeasured variables (i.e., unmeasured confounders) on the states and actions. Ignoring them leads to a biased estimate of the policy. To close this gap, we exploit instrumental variables (IV) and propose a Confounded Causal Imitation Learning (C2L) model. This model accommodates confounders that influence actions across multiple timesteps, rather than being restricted to immediate temporal dependencies. We develop a two-stage imitation learning framework for valid IV identification and policy optimization. In the first stage, we construct a testing criterion based on a defined pseudo-variable, with which we identify a valid IV for the C2L model. This criterion provides sufficient and necessary identifiability conditions for IV validity. In the second stage, with the identified IV, we propose two candidate policy learning approaches: one is based on a simulator, while the other is offline. Extensive experiments verify the effectiveness of both the IV identification and the policy learning.
从演示中吸取的模拟学习通常会受到无法计量的变量(即无法计量的混乱者)对州和行动的影响。如果忽视这些变量(即无法计量的混乱者)对州和行动的影响,那么就必须对政策进行有偏差的估计。为了打破这一令人困惑的差距,我们在本文件中采用工具变量(四)的强大力量,并提出一个具有说服力的C2L模型。这一模型包括影响跨多个时间步骤行动的混混者,而不是局限于直接的时间依赖者。我们为有效的四类识别和政策优化开发了两阶段模拟学习框架。特别是,在第一阶段,我们根据界定的假变式模型设计了一个测试标准,我们据此为C2L模型确定一个有效的四类。这种标准要求为四类有效确定充分而必要的识别条件。在第二阶段,我们提出两个备选政策学习方法:一个基于模拟,另一个则脱线。我们进行了广泛的实验,以验证确定有效的四类验证政策的有效性,作为学习政策。
Article 288
Title@2025-07-23 (3): R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning
Title: R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning | R-Stitch: Dynamische Trajektorien-Stitching für effiziente Vernunft | R-Stitch: 高效理性的动态轨迹切换 2507.17307v1 |
Authors (6): Zhuokun Chen, Zeren Chen, Jiahao He, Mingkui Tan, Jianfei Cai, Bohan Zhuang
Chain-of-thought (CoT) reasoning enhances the problem-solving capabilities of large language models by encouraging step-by-step intermediate reasoning during inference. While effective, CoT introduces substantial computational overhead due to its reliance on autoregressive decoding over long token sequences. Existing acceleration strategies either reduce sequence length through early stopping or compressive reward designs, or improve decoding speed via speculative decoding with smaller models. However, speculative decoding suffers from limited speedup when the agreement between small and large models is low, and fails to exploit the potential advantages of small models in producing concise intermediate reasoning. In this paper, we present R-Stitch, a token-level, confidence-based hybrid decoding framework that accelerates CoT inference by switching between a small language model (SLM) and a large language model (LLM) along the reasoning trajectory. R-Stitch uses the SLM to generate tokens by default and delegates to the LLM only when the SLM’s confidence falls below a threshold. This design avoids full-sequence rollback and selectively invokes the LLM on uncertain steps, preserving both efficiency and answer quality. R-Stitch is model-agnostic, training-free, and compatible with standard decoding pipelines. Experiments on math reasoning benchmarks demonstrate that R-Stitch achieves up to 85% reduction in inference latency with negligible accuracy drop, highlighting its practical effectiveness in accelerating CoT reasoning.
思维链(CoT)推理提高了大型语言模型的解决问题能力,鼓励在推理过程中逐步地进行中间推理,从而增强了大语言模型的解决问题能力。虽然这一推理是有效的,但CoT由于依赖长期象征性序列的自动递进解码,引入了大量的计算间接费用。现有的加速战略要么通过早期停止或压缩奖励设计来缩短序列长度,要么通过与较小模型的投机性解码来提高解码速度。然而,投机解码工作由于小型和大型模型之间的协议低调而速度有限,未能利用小型模型的潜在优势来生成简明中间推理。在本文中,我们提出了一个象征性的、基于信任的混合解码框架,通过在小语言模型(SLM)和大型语言模型(LLLM)之间转换来加快 CoT的猜疑。R-Stitch使用可持续土地管理来生成默认和代表的代记号,只有当可持续土地管理的信心降到一个阈值,而该设计避免后退步和有选择性地引用LLM的精确性推理学,在不精确的阶中,既能、效率,又解算算,又解算。
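The switching rule described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: a "model" is any callable that maps the current token list to a (next token, confidence) pair, the two scripted models and the 0.7 threshold are hypothetical, and real systems would also have to manage KV caches when handing over.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: context tokens -> (next token, confidence in [0, 1]).
Model = Callable[[List[str]], Tuple[str, float]]

def stitch_decode(slm: Model, llm: Model, prompt: List[str],
                  threshold: float = 0.7, max_tokens: int = 16,
                  eos: str = "<eos>"):
    """Decode with the small model by default; defer to the large model
    for any step where the small model's confidence is below threshold."""
    tokens = list(prompt)
    sources = []
    for _ in range(max_tokens):
        token, confidence = slm(tokens)
        source = "slm"
        if confidence < threshold:
            token, _ = llm(tokens)  # only this step is re-decided, no rollback
            source = "llm"
        tokens.append(token)
        sources.append(source)
        if token == eos:
            break
    return tokens[len(prompt):], sources

# Scripted toy models: the small one is unsure (and wrong) about the answer token.
SLM_SCRIPT = [("2", 0.95), ("+", 0.90), ("2", 0.92), ("=", 0.88), ("5", 0.30), ("<eos>", 0.99)]
LLM_SCRIPT = ["2", "+", "2", "=", "4", "<eos>"]

def toy_slm(context):
    return SLM_SCRIPT[min(len(context) - 1, len(SLM_SCRIPT) - 1)]

def toy_llm(context):
    return LLM_SCRIPT[min(len(context) - 1, len(LLM_SCRIPT) - 1)], 0.99

generated, sources = stitch_decode(toy_slm, toy_llm, prompt=["Q:"])
```

In this toy run the large model is called once, for the single low-confidence step, while every other token comes from the small model.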
Article 289
Title@2025-07-23 (3): Universal Fourier Neural Operators for Micromechanics
Title: Universal Fourier Neural Operators for Micromechanics | Universal Fourier-Neural-Betreiber für Mikromechanik | 通用微型机械天体神经操作员 2507.12233v2 |
Authors (2): Binh Huy Nguyen, Matti Schneider
Solving cell problems in homogenization is hard, and available deep-learning frameworks fail to match the speed and generality of traditional computational frameworks. More to the point, it is generally unclear what to expect of machine-learning approaches, let alone single out which approaches are promising. In the work at hand, we advocate Fourier Neural Operators (FNOs) for micromechanics, empowering them by insights from computational micromechanics methods based on the fast Fourier transform (FFT). We construct an FNO surrogate mimicking the basic scheme foundational for FFT-based methods and show that the resulting operator predicts solutions to cell problems with arbitrary stiffness distribution only subject to a material-contrast constraint up to a desired accuracy. In particular, there are no restrictions on the material symmetry like isotropy, on the number of phases and on the geometry of the interfaces between materials. Also, the provided fidelity is sharp and uniform, providing explicit guarantees leveraging our physical empowerment of FNOs. To show the desired universal approximation property, we construct an FNO explicitly that requires no training to begin with. Still, the obtained neural operator complies with the same memory requirements as the basic scheme and comes with runtimes proportional to classical FFT solvers. In particular, large-scale problems with more than 100 million voxels are readily handled. The goal of this work is to underline the potential of FNOs for solving micromechanical problems, linking FFT-based methods to FNOs. This connection is expected to provide a fruitful exchange between both worlds.
解决同质化中的细胞问题非常困难, 现有的深层次学习框架无法与传统计算框架的速度和普遍性相匹配。 更重要的是, 一般来说还不清楚机器学习方法的预期是什么, 更不用说单挑哪些方法有希望。 在手头的工作上, 我们提倡微机械学的Fourier神经操作器(FNOs), 通过基于快速Fourier变换的计算微机械方法的洞察力增强它们的能力。 我们提供的准确性和统一性是明确的保证, 利用基于FFFFT方法的基本方案基础, 并显示由此产生的操作器预测, 以任意的僵硬性分布的细胞问题的解决方案, 仅受到预期的准确性物质- 调控限制。 特别是, 对材料的对材料的对称性、 阶段数量和对材料间界面的几何测度没有限制。 此外, 所提供的真实性和统一性是明确的保证, 利用 FNOs 的物理能力提供我们所期望的通用的连接性。 为了显示理想的通用近性属性, 我们所建立的FNO 明确预测一个对分质分布式的细胞分配的细胞问题的解决办法, 将不需要对FNO 的对FFFFrialalalalal 进行 的对硬质的常规和直径比常规的预期的预期的对等的对硬度要求的对硬度与FFFFMLLLLLLL 的预期的处理的对操作的对硬度与FM 的预期的周期的对等的周期进行更多的要求进行更多的修正。
Article 290
Title@2025-07-23 (3): Cautious Next Token Prediction
Title: Cautious Next Token Prediction | Vorsichtige nächste Zeichen Vorhersage | 谨慎的次下 Tok 预测 2507.03038v2 |
Authors (10): Yizhou Wang, Lingzhi Zhang, Yue Bai, Mang Tik Chiu, Zhengmian Hu, Mingyuan Zhang, Qihua Dong, Yu Yin, Sohrab Amirghodsi, Yun Fu
The next-token prediction paradigm has been prevailing for autoregressive models in the era of LLMs. The current default sampling choice for popular LLMs is temperature scaling together with nucleus sampling to balance diversity and coherence. Nevertheless, such an approach leads to inferior performance in various NLP tasks when the model is not certain about the test questions. To this end, we propose a brand new training-free decoding strategy, dubbed Cautious Next Token Prediction (CNTP). In the decoding process, if the model has comparatively high prediction entropy at a certain step, we independently sample multiple trials starting from that step and stop each one when it encounters any punctuation. We then select the trial with the lowest perplexity score, viewed as the most probable and reliable trial path given the model’s capacity. The trial number is negatively correlated with the prediction confidence, i.e., the less confident the model is, the more trials it should sample. This is consistent with human behaviour: when feeling uncertain or unconfident, one tends to think more creatively, exploring multiple thinking paths, to cautiously select the path one feels most confident about. Extensive experiments on both LLMs and MLLMs show that our proposed CNTP approach outperforms existing standard decoding strategies consistently by a clear margin. Moreover, the integration of CNTP with self consistency can further improve over vanilla self consistency. We believe our proposed CNTP has the potential to become one of the default choices for LLM decoding. Code is available at https://github.com/wyzjack/CNTP.
在LLMM时代,下一个象征性的预测范式已经流行于下一个LLM时代的自动递增模式。目前流行的LLM的默认抽样选择是温度与核心取样一起的温度缩放,以平衡多样性和一致性。然而,当模型不确定测试问题时,这种方法导致各种NLP任务表现低劣。为此,我们提议了一个全新的无培训解码战略,称为“高端下Token预测 ” (CNTP) 。在解码过程中,如果模型在某个步骤有相对较高的预测选择,我们从步骤独立开始的多重试验,并在遇到任何标点时停止。然后,我们选择试验时,以最低的迷惑评分作为模型能力中最有可能和最可靠的试验路径。试算数字与预测信心有负关系,即,该模型不太自信,它应该做更多的试算。这与人类的行为是一致的:当感觉不确定或不自信,人们倾向于更富有创造性地探索多重思维路径,然后在遇到任何标定点时,我们谨慎地选择一种最不易理解的路径。 CNTP最有可能、最可靠地试验路径来显示我们的CNTP的自我调整的自我定位。 和不断的自我调整的自我调整的自我调整的自我调整的自我定位,让我们的自我调整的自我调整的自我调整的自我调整的自我调整的自我调整的自我调整的自我调整的自我调整。
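The three ingredients of the decoding rule above (step entropy, an entropy-dependent number of trials, and lowest-perplexity selection) can be sketched without a model. The linear trial schedule and the cap of 10 trials are assumptions made for illustration; the abstract only states that less confidence should mean more trials.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def num_trials(step_entropy, max_entropy, max_trials=10):
    """Assumed schedule: 1 trial at zero entropy, max_trials at maximum entropy."""
    frac = min(max(step_entropy / max_entropy, 0.0), 1.0)
    return 1 + round(frac * (max_trials - 1))

def perplexity(token_logprobs):
    """exp of the negative mean log-probability of a sampled trial."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def select_trial(trials):
    """trials: list of (text, token_logprobs); keep the lowest-perplexity one."""
    return min(trials, key=lambda trial: perplexity(trial[1]))

# Toy usage: a 4-token vocabulary and two hypothetical trials ending at a comma.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [1.0, 0.0, 0.0, 0.0]
trials = [("so x equals 4,", [-0.1, -0.2, -0.1, -0.3]),
          ("so x might be 7,", [-1.0, -2.0, -1.5, -0.8])]
best_text, _ = select_trial(trials)
```

Stopping each trial at punctuation keeps the extra sampling local to one clause, so the added cost is paid only at uncertain steps.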
Article 291
Title@2025-07-23 (3): A Spatio-Temporal Machine Learning Model for Mortgage Credit Risk: Default Probabilities and Loan Portfolios
Title: A Spatio-Temporal Machine Learning Model for Mortgage Credit Risk: Default Probabilities and Loan Portfolios | Ein Spatio-Temporal Machine Learning Modell für Hypothekenkreditrisiko: Standardwahrscheinlichkeiten und Kreditportfolios | 抵押信贷风险:默认概率和贷款组合的Spadio-临时机械学习模式 2410.02846v2 |
Authors (2): Pascal Kündig, Fabio Sigrist
We introduce a novel machine learning model for credit risk by combining tree-boosting with a latent spatio-temporal Gaussian process model accounting for frailty correlation. This allows for modeling non-linearities and interactions among predictor variables in a flexible data-driven manner and for accounting for spatio-temporal variation that is not explained by observable predictor variables. We also show how estimation and prediction can be done in a computationally efficient manner. In an application to a large U.S. mortgage credit risk data set, we find that both predictive default probabilities for individual loans and predictive loan portfolio loss distributions obtained with our novel approach are more accurate compared to conventional independent linear hazard models and also linear spatio-temporal models. Using interpretability tools for machine learning models, we find that the likely reasons for this outperformance are strong interaction and non-linear effects in the predictor variables and the presence of spatio-temporal frailty effects.
我们引入了一种新的信用风险机器学习模式,将树速和潜在的patio-netior Gaussian过程模型结合到脆弱相关关系中,这样可以灵活地以数据驱动的方式模拟预测变量之间的非线性和相互作用,并用可观测的预测变量无法解释的线性时性变异进行核算。我们还展示了如何以计算效率高的方式进行估算和预测。在对大型美国按揭信用风险数据集的应用中,我们发现,与传统的独立线性危害模型和线性黑洞-时性模型相比,个人贷款预测违约概率和以我们新颖方法获得的预测性贷款组合损失分布都更为准确。我们发现,使用机器学习模型的可解释性工具,这种超常性表现的可能原因是预测变量中的强大互动和非线性效应以及时性虚效应的存在。
Article 292
Title@2025-07-23 (3): On Temporal Guidance and Iterative Refinement in Audio Source Separation
Title: On Temporal Guidance and Iterative Refinement in Audio Source Separation | Zur zeitlichen Führung und iterativen Verfeinerung in der Audioquelle Trennung | 关于音频源分离的时间指导和动态改进 2507.17297v1 |
Authors (5): Tobias Morocutti, Jonathan Greif, Paul Primus, Florian Schmid, Gerhard Widmer
Spatial semantic segmentation of sound scenes (S5) involves the accurate identification of active sound classes and the precise separation of their sources from complex acoustic mixtures. Conventional systems rely on a two-stage pipeline - audio tagging followed by label-conditioned source separation - but are often constrained by the absence of fine-grained temporal information critical for effective separation. In this work, we address this limitation by introducing a novel approach for S5 that enhances the synergy between the event detection and source separation stages. Our key contributions are threefold. First, we fine-tune a pre-trained Transformer to detect active sound classes. Second, we utilize a separate instance of this fine-tuned Transformer to perform sound event detection (SED), providing the separation module with detailed, time-varying guidance. Third, we implement an iterative refinement mechanism that progressively enhances separation quality by recursively reusing the separator’s output from previous iterations. These advancements lead to significant improvements in both audio tagging and source separation performance, as demonstrated by our system’s second-place finish in Task 4 of the DCASE Challenge 2025. Our implementation and model checkpoints are available in our GitHub repository: https://github.com/theMoro/dcase25task4 .
声场的空间静默区隔(S5)涉及准确识别活跃声频等级和精确区分其源源与复杂的声频混合物。常规系统依靠两阶段管道—-音频标记,然后贴标签,然后按标签对源进行分离—-但往往受到缺乏对有效分离至关重要的细微刻度时间信息的限制。在这项工作中,我们通过对S5采用一种新颖的方法来解决这一局限性,该方法将加强事件探测和源分离阶段之间的协同作用。我们的主要贡献是三重。首先,我们微调一个经过预先训练的变异器,以探测活跃声频等级。第二,我们利用这个经过微调的变异器进行音频事件探测(SED)的单独实例,为分离模块提供详细、时间变化的指导。第三,我们实施了一个迭代改进机制,通过反复使用先前的分隔器输出,逐步提高分离质量。这些进步导致音频标记和源分离性能的显著改进,正如我们系统在DCASE挑战2025任务4的第二位完成的音频标记。我们的执行和模型检查站可以在我们的GiH数据库中:http://Mbastrobor4。
Article 293
Title@2025-07-23 (3): VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback
Title: VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback | VLA-Touch: Erweiterung von Vision-Language-Action-Modellen mit Dual-Level Taktiles Feedback | VLA-Touch:加强具有双轨反馈的愿景-语言-行动模式 2507.17294v1 |
Authors (5): Jianxin Bi, Kevin Yuchen Ma, Ce Hao, Mike Zheng Shou, Harold Soh
Tactile feedback is generally recognized to be crucial for effective interaction with the physical world. However, state-of-the-art Vision-Language-Action (VLA) models lack the ability to interpret and use tactile signals, limiting their effectiveness in contact-rich tasks. Incorporating tactile feedback into these systems is challenging due to the absence of large multi-modal datasets. We present VLA-Touch, an approach that enhances generalist robot policies with tactile sensing without fine-tuning the base VLA. Our method introduces two key innovations: (1) a pipeline that leverages a pretrained tactile-language model that provides semantic tactile feedback for high-level task planning, and (2) a diffusion-based controller that refines VLA-generated actions with tactile signals for contact-rich manipulation. Through real-world experiments, we demonstrate that our dual-level integration of tactile feedback improves task planning efficiency while enhancing execution precision. Code is open-sourced at https://github.com/jxbi1010/VLA-Touch.
人们普遍认为,触觉反馈对于与物理界的有效互动至关重要。然而,最先进的视觉-语言-动作模型缺乏解释和使用触觉信号的能力,限制了其在接触丰富任务中的效力。将触觉反馈纳入这些系统具有挑战性,因为缺少大型多模式数据集。我们介绍了VLA-Touch,这个方法用触觉感应技术增强通用机器人政策,不作微调}基本VLA。我们的方法引入了两项关键创新:(1) 利用预先训练的触觉语言模型为高级任务规划提供语性触觉反馈的管道;(2) 基于扩散的控制器,用触觉信号改进VLA产生的行动,用于接触丰富的操纵。我们通过现实世界的实验,证明我们双层次对触觉反馈的整合提高了任务规划效率,同时提高了执行精准性。代码在href{https://giusub_Axlus10_Libxxxxxxxxxxcom.
Article 294
Title@2025-07-23 (3): Data Virtualization for Machine Learning
Title: Data Virtualization for Machine Learning | Datenvirtualisierung für maschinelles Lernen | 机器学习数据虚拟化 2507.17293v1 |
Authors (5): Saiful Khan, Joyraj Chakraborty, Philip Beaucamp, Niraj Bhujel, Min Chen
Nowadays, machine learning (ML) teams have multiple concurrent ML workflows for different applications. Each workflow typically involves many experiments, iterations, and collaborative activities and commonly takes months and sometimes years from initial data wrangling to model deployment. Organizationally, there is a large amount of intermediate data to be stored, processed, and maintained. Data virtualization becomes a critical technology in an infrastructure to serve ML workflows. In this paper, we present the design and implementation of a data virtualization service, focusing on its service architecture and service operations. The infrastructure currently supports six ML applications, each with more than one ML workflow. The data virtualization service allows the number of applications and workflows to grow in the coming years.
目前,机器学习(ML)团队有多种同时的ML工作流程,用于不同的应用。每个工作流程通常涉及许多实验、迭代和协作活动,通常需要几个月甚至几年的时间,从最初的数据相互交织到模型部署。在组织上,有大量的中间数据有待储存、处理和维护。\emph{Data虚拟化}成为为ML工作流程提供服务的基础设施中的一项关键技术。在本文件中,我们介绍了数据虚拟化服务的设计和实施,重点是其服务结构和服务业务。基础设施目前支持六个ML应用程序,每个应用程序都有一个以上的ML工作流程。数据虚拟化服务使应用程序和工作流程的数量在未来几年中得以增长。
Article 295
Title@2025-07-23 (3): Decentralized Federated Learning of Probabilistic Generative Classifiers
Title: Decentralized Federated Learning of Probabilistic Generative Classifiers | Dezentrales Föderiertes Lernen von probabilistischen Generativen Klassifikatoren | 风险生成分类法的联邦分权分权学习 2507.17285v1 |
Authors (3): Aritz Pérez, Carlos Echegoyen, Guzmán Santafé
Federated learning is a paradigm of increasing relevance in real-world applications, aimed at building a global model across a network of heterogeneous users without requiring the sharing of private data. We focus on model learning over decentralized architectures, where users collaborate directly to update the global model without relying on a central server. In this context, the current paper proposes a novel approach to collaboratively learn probabilistic generative classifiers with a parametric form. The framework is composed of a communication network over a set of local nodes, each having its own local data, and a local updating rule. The proposal involves sharing local statistics with neighboring nodes, where each node aggregates the neighbors’ information and iteratively learns its own local classifier, which progressively converges to a global model. Extensive experiments demonstrate that the algorithm consistently converges to a globally competitive model across a wide range of network topologies, network sizes, local dataset sizes, and extreme non-i.i.d. data distributions.
联邦学习是一种在现实世界应用中日益具有相关性的范例,目的是在不要求分享私人数据的情况下,在多样化用户的网络中建立一个全球模型。我们注重在分散结构中进行示范学习,用户直接合作更新全球模型,而不必依赖中央服务器。在这方面,本文提出一种新的方法,以参数形式合作学习概率性基因分类。框架由一组当地节点的通信网络组成,每个节点都有自己的本地数据,以及地方更新规则。建议涉及与邻接节点分享当地统计数据,每个节点汇集邻居的信息,并反复学习自己的本地分类器,这些分类器逐渐与全球模式趋同。广泛的实验表明,算法在一系列广泛的网络结构、网络规模、本地数据集大小和极端的非i.i.d数据分布方面,始终与全球竞争模式趋同。
Article 296
Title@2025-07-23 (3): Hardware-Efficient Photonic Tensor Core: Accelerating Deep Neural Networks with Structured Compression
Title: Hardware-Efficient Photonic Tensor Core: Accelerating Deep Neural Networks with Structured Compression | Hardware-Effizient Photonic Tensor Core: Beschleunigen von tiefen neuralen Netzwerken mit strukturierter Kompression | 硬件-高效光学光学时标核心:有结构压缩的加速深神经网络 2502.01670v2 |
Authors (6): Shupeng Ning, Hanqing Zhu, Chenghao Feng, Jiaqi Gu, David Z. Pan, Ray T. Chen
The rapid growth in computing demands, particularly driven by artificial intelligence applications, has begun to exceed the capabilities of traditional electronic hardware. Optical computing offers a promising alternative due to its parallelism, high computational speed, and low power consumption. However, existing photonic integrated circuits are constrained by large footprints, costly electro-optical interfaces, and complex control mechanisms, limiting the practical scalability of optical neural networks (ONNs). To address these limitations, we introduce a block-circulant photonic tensor core for a structure-compressed optical neural network (StrC-ONN) architecture. The structured compression technique substantially reduces both model complexity and hardware resources without sacrificing the versatility of neural networks, and achieves accuracy comparable to uncompressed models. Additionally, we propose a hardware-aware training framework to compensate for on-chip nonidealities to improve model robustness and accuracy. Experimental validation through image processing and classification tasks demonstrates that our StrC-ONN achieves a reduction in trainable parameters of up to 74.91%, while still maintaining competitive accuracy levels. Performance analyses further indicate that this hardware-software co-design approach is expected to yield a 3.56 times improvement in power efficiency. By reducing both hardware requirements and control complexity across multiple dimensions, this work explores a new pathway toward practical and scalable ONNs, highlighting a promising route to address future computational efficiency challenges.
计算机需求的快速增长,特别是由人工智能应用驱动的需求的快速增长,已开始超过传统电子硬件的能力。光化计算由于其平行性、高计算速度和低电耗,提供了一个有希望的替代方案。然而,现有的光电综合电路受到大脚印、昂贵的电光界面和复杂的控制机制的限制,限制了光导神经网络的实际缩放能力。为解决这些限制,我们为结构压抑的光电神经网络(StrC-ONN)架构引入了块-电环光电点核心。结构化压缩技术大大降低了模型复杂性和硬件资源,同时又不牺牲神经网络的多功能,并实现了与未压缩模型相当的准确性。此外,我们提议了一个硬件认知培训框架,以补偿光电网络上的不理想性,提高模型的稳健性和准确性。我们通过图像处理和分类任务的实验性验证表明,我们SstrC-N公司在可培训的参数上降低了74.91%,同时保持了竞争性的准确性水平。绩效分析还进一步表明,在提高成本方面,预期的方法是提高成本的路径上,同时要求。
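The block-circulant structure mentioned above is what makes the compression possible: each b-by-b block is defined by a single length-b vector, and its matrix-vector product is a circular convolution. The NumPy sketch below illustrates only this algebra in software, with arbitrary block sizes; it says nothing about the photonic hardware or the paper's layer configuration.

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix whose first column is c by x via FFTs."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(blocks, x):
    """blocks has shape (p, q, b); blocks[i, j] is the first column of block (i, j)."""
    p, q, b = blocks.shape
    x_blocks = x.reshape(q, b)
    y = np.zeros((p, b))
    for i in range(p):
        for j in range(q):
            y[i] += circulant_matvec(blocks[i, j], x_blocks[j])
    return y.reshape(p * b)

def dense_from_blocks(blocks):
    """Expand to the equivalent dense weight matrix (for checking only)."""
    p, q, b = blocks.shape
    W = np.zeros((p * b, q * b))
    for i in range(p):
        for j in range(q):
            c = blocks[i, j]
            # Column k of a circulant matrix is its first column rotated down by k.
            W[i * b:(i + 1) * b, j * b:(j + 1) * b] = np.stack(
                [np.roll(c, k) for k in range(b)], axis=1)
    return W

rng = np.random.default_rng(0)
blocks = rng.standard_normal((2, 3, 4))  # 2x3 grid of 4x4 circulant blocks
x = rng.standard_normal(12)

y_fast = block_circulant_matvec(blocks, x)
y_dense = dense_from_blocks(blocks) @ x
stored, dense = blocks.size, dense_from_blocks(blocks).size  # 24 vs 96 parameters
```

Each block stores b values instead of b squared, so a layer built entirely from such blocks needs a factor of b fewer trainable parameters than its dense counterpart.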
Article 297
Title@2025-07-23 (3): Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
Title: Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start | Multimodale Reasoning durch verstärktes Lernen mit kaltem Start fördern | 通过 “ 冷起 “ 的强化学习推进多模式理由 2505.22334v2 |
Authors (8): Lai Wei, Yuting Li, Kaipeng Zheng, Chen Wang, Yue Wang, Linghe Kong, Lichao Sun, Weiran Huang
Recent advancements in large language models (LLMs) have demonstrated impressive chain-of-thought reasoning capabilities, with reinforcement learning (RL) playing a crucial role in this progress. While “aha moment” patterns, where models exhibit self-correction through reflection, are often attributed to emergent properties from RL, we first demonstrate that these patterns exist in multimodal LLMs (MLLMs) prior to RL training but may not necessarily correlate with improved reasoning performance. Building on these insights, we present a comprehensive study on enhancing multimodal reasoning through a two-stage approach: (1) supervised fine-tuning (SFT) as a cold start with structured chain-of-thought reasoning patterns, followed by (2) reinforcement learning via GRPO to further refine these capabilities. Our extensive experiments show that this combined approach consistently outperforms both SFT-only and RL-only methods across challenging multimodal reasoning benchmarks. The resulting models achieve state-of-the-art performance among open-source MLLMs at both 3B and 7B scales, with our 7B model showing substantial improvements over base models (e.g., 66.3% $\rightarrow$ 73.4% on MathVista, 62.9% $\rightarrow$ 70.4% on We-Math) and our 3B model achieving performance competitive with several 7B models. Overall, this work provides practical guidance for building advanced multimodal reasoning models. Our code is available at https://github.com/waltonfuture/RL-with-Cold-Start.
大型语言模型(LLMS)最近的进展显示了令人印象深刻的思维推理能力,其中强化学习(RL)在这一进展中发挥着关键作用。 虽然“aha moment”模式 — — 模型通过倒影显示自我校正,常常归因于RL的突发特性,但我们首先表明,这些模式存在于RL培训之前的多式LMS(MLMS)中,但不一定与改进推理性能相关。基于这些见解,我们提交了一份关于通过两阶段方法加强多式联运推理能力的全面研究报告:(1) 监督微调(SFT),作为结构化思维推理模式的寒冷开端,随后通过GROPO进行强化学习,以进一步完善这些能力。我们的广泛实验表明,这种综合方法在挑战性多式推理基准中始终优于SFT(MLM)和RLMM(MLM)两种模式之间都达到了最先进的业绩。 由此得出的模型是3B和7B尺度,我们的7B模型显示基础模型(e.g.$66, $\right\ charrial_B hal destral deal deal deal deal sal sal ex sal deal sal deal sal ex exmal exmal exmal exmal exmal exmal exmal ex ex ex ex)。
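The second stage above relies on GRPO, whose core step is to score each sampled response relative to the other responses drawn for the same prompt. A minimal sketch of that group-relative normalization, as it is commonly described, follows; the rewards are made-up values and the paper's actual reward design is not given in the abstract.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so no
    separate value network is needed to form a baseline.
    """
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Hypothetical group of four sampled answers: two correct (1.0), two wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# If every answer in the group gets the same reward, the group carries no signal.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

The all-equal case is one reason a cold-start stage can matter: a policy that never produces a correct answer (or always does) yields zero advantages for that prompt.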
Article 298
Title@2025-07-23 (3): Prolonging Tool Life: Learning Skillful Use of General-purpose Tools through Lifespan-guided Reinforcement Learning
Title: Prolonging Tool Life: Learning Skillful Use of General-purpose Tools through Lifespan-guided Reinforcement Learning | Verlängerung des Werkzeuglebens: Erlernen eines kompetenzvollen Einsatzes von Allzweck-Werkzeugen durch lebenslanges Stärkungslernen | 延长工具寿命:通过终身指导强化学习学习如何熟练使用普通用途工具 2507.17275v1 |
Authors (4): Po-Yen Wu, Cheng-Yu Kuo, Yuki Kadokawa, Takamitsu Matsubara
In inaccessible environments with uncertain task demands, robots often rely on general-purpose tools that lack predefined usage strategies. These tools are not tailored for particular operations, making their longevity highly sensitive to how they are used. This creates a fundamental challenge: how can a robot learn a tool-use policy that both completes the task and prolongs the tool’s lifespan? In this work, we address this challenge by introducing a reinforcement learning (RL) framework that incorporates tool lifespan as a factor during policy optimization. Our framework leverages Finite Element Analysis (FEA) and Miner’s Rule to estimate Remaining Useful Life (RUL) based on accumulated stress, and integrates the RUL into the RL reward to guide policy learning toward lifespan-guided behavior. To handle the fact that RUL can only be estimated after task execution, we introduce an Adaptive Reward Normalization (ARN) mechanism that dynamically adjusts reward scaling based on estimated RULs, ensuring stable learning signals. We validate our method across simulated and real-world tool use tasks, including Object-Moving and Door-Opening with multiple general-purpose tools. The learned policies consistently prolong tool lifespan (up to 8.01x in simulation) and transfer effectively to real-world settings, demonstrating the practical value of learning lifespan-guided tool use strategies.
在任务要求不确定、无法满足的环境下,机器人往往依赖缺乏预先确定使用战略的通用工具。这些工具不是为特定操作量而定制的,因此其寿命对于如何使用这些工具非常敏感。这造成了一个根本性的挑战:机器人如何学习既完成任务又延长工具寿命的工具使用政策?在这项工作中,我们通过引入一个强化学习(RL)框架来应对这一挑战,该框架将工具寿命作为政策优化过程中的一个因素纳入其中。我们的框架利用了精度元素分析(FEA)和Miner规则来根据累积的压力来估计剩余使用寿命(RUL),并将RUL纳入RL奖励中,以指导关于寿命指导行为的政策学习。要处理RUL只能在任务执行后才能估计其寿命延长寿命的这一事实,我们引入了适应性再常态(ARN)机制,根据估计的RUL值来动态调整奖励规模,确保稳定的学习信号。我们验证了我们在模拟和现实世界工具中使用的方法,包括用多种通用的模具移动和开关工具,并用多种通用的模版模拟工具来持续延长学习,以学习到全球寿命周期工具。
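As a rough illustration of how Miner's Rule can turn accumulated stress into a Remaining Useful Life estimate, here is a minimal sketch. The Basquin-style S-N curve and its constants are illustrative assumptions, not values from the paper, and the stress amplitudes stand in for what an FEA solver would report per load cycle.

```python
def cycles_to_failure(stress_mpa, coeff=1.0e12, exponent=3.0):
    """Hypothetical S-N curve: N = coeff * stress^(-exponent).

    coeff and exponent are made-up material constants for illustration.
    """
    return coeff * stress_mpa ** (-exponent)


def miner_damage(stress_history):
    """Miner's Rule: damage = sum(n_i / N_i); each entry counts as one cycle."""
    return sum(1.0 / cycles_to_failure(s) for s in stress_history)


def remaining_useful_life(stress_history):
    """Fraction of life left; failure is predicted once damage reaches 1."""
    return max(0.0, 1.0 - miner_damage(stress_history))


# Two hypothetical tool-use styles over 1000 cycles each.
gentle = [100.0] * 1000   # low stress amplitude per cycle (MPa)
harsh = [200.0] * 1000    # doubled stress amplitude per cycle (MPa)

rul_gentle = remaining_useful_life(gentle)
rul_harsh = remaining_useful_life(harsh)
```

Under these assumed constants, doubling the stress amplitude multiplies the per-cycle damage by eight, which is the kind of sensitivity a lifespan-aware reward term would need to capture.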
Article 299
Title@2025-07-23 (3): Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance
Title: Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance | Nutzung von Wissensgraphen und LLM-Gründung zur Ermittlung operativer Engpässe für die Lagerplanung | 利用知识图和LLM解释,查明用于仓库规划援助的业务瓶颈 2507.17273v1 |
Authors (4): Rishi Parekh, Saisubramaniam Gopalakrishnan, Zishan Ahmad, Anirudh Deodhar
Analyzing large, complex output datasets from Discrete Event Simulations (DES) of warehouse operations to identify bottlenecks and inefficiencies is a critical yet challenging task, often demanding significant manual effort or specialized analytical tools. Our framework integrates Knowledge Graphs (KGs) and Large Language Model (LLM)-based agents to analyze complex Discrete Event Simulation (DES) output data from warehouse operations. It transforms raw DES data into a semantically rich KG, capturing relationships between simulation events and entities. An LLM-based agent uses iterative reasoning, generating interdependent sub-questions. For each sub-question, it creates Cypher queries for KG interaction, extracts information, and self-reflects to correct errors. This adaptive, iterative, and self-correcting process identifies operational issues mimicking human analysis. Our DES approach for warehouse bottleneck identification, tested with equipment breakdowns and process irregularities, outperforms baseline methods. For operational questions, it achieves near-perfect pass rates in pinpointing inefficiencies. For complex investigative questions, we demonstrate its superior diagnostic ability to uncover subtle, interconnected issues. This work bridges simulation modeling and AI (KG+LLM), offering a more intuitive method for actionable insights, reducing time-to-insight, and enabling automated warehouse inefficiency evaluation and diagnosis.
分析仓库操作分解事件模拟(DES)的大型、复杂的大型产出数据集,以分析仓库操作中的分解事件模拟(DES),找出瓶颈和低效率,这是一项关键但富有挑战性的任务,往往需要大量手工或专门的分析工具。我们的框架整合了知识图和大语言模型(LLM)的代理,以分析仓库操作中复杂的分解事件模拟(DES)输出数据。将原始数据转换成精密丰富的 KG,捕捉模拟事件和实体之间的关系。一个基于LLLM的代理使用迭接推法,产生相互依存的子问题。对于每一个子问题,它都为KG互动、提取信息和自我反射软件创建Cypher查询,以纠正错误。这个适应性、迭接和自我校正的过程确定了模拟操作问题,用设备解析和处理过程不规范测试了EDES数据,在确定低效率方面达到了近于效果的通过率。对于复杂的调查问题,我们展示了它的高级诊断能力,在发现微妙的、相互连锁的AI方面提供了一种精确的模拟和可分析方法。
Article 300
Title@2025-07-23 (3): Bayesian Optimization of Robustness Measures under Input Uncertainty: A Randomized Gaussian Process Upper Confidence Bound Approach
Title: Bayesian Optimization of Robustness Measures under Input Uncertainty: A Randomized Gaussian Process Upper Confidence Bound Approach | Bayesische Optimierung von Robustheitsmaßen unter Input Uncertainty: Ein Randomized Gaussian Prozess Oberer Vertrauensbund Ansatz | Bayesian 优化投入不确定性下的有力措施:随机化高斯进程最高信任度办法 2504.03172v2 |
Authors (1): Yu Inatsu
Bayesian optimization based on the Gaussian process upper confidence bound (GP-UCB) offers a theoretical guarantee for optimizing black-box functions. In practice, however, black-box functions often involve input uncertainty. To handle such cases, GP-UCB can be extended to optimize evaluation criteria known as robustness measures. However, GP-UCB-based methods for robustness measures require a trade-off parameter, $\beta$, which, as in the original GP-UCB, must be set sufficiently large to ensure theoretical validity. In this study, we propose randomized robustness measure GP-UCB (RRGP-UCB), a novel method that samples $\beta$ from a chi-squared-based probability distribution. This approach eliminates the need to explicitly specify $\beta$. Notably, the expected value of $\beta$ under this distribution is not excessively large. Furthermore, we show that RRGP-UCB provides tight bounds on the expected regret between the optimal and estimated solutions. Numerical experiments demonstrate the effectiveness of the proposed method.
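The idea of drawing the trade-off parameter at random can be sketched as follows. This is a plain UCB-style acquisition over a handful of candidates, not the robustness-measure variant, and the choice of a chi-squared distribution with 2 degrees of freedom is an assumption made for illustration; the abstract only says the distribution is chi-squared-based. The posterior means and standard deviations are made-up numbers.

```python
import math
import random


def sample_beta(rng, dof=2):
    """Draw beta from a chi-squared distribution (assumed form).

    A chi-squared variable with `dof` degrees of freedom is a Gamma variable
    with shape dof/2 and scale 2.
    """
    return rng.gammavariate(dof / 2.0, 2.0)


def ucb_scores(means, stds, beta):
    """Upper confidence bound: mean + sqrt(beta) * std for each candidate."""
    return [m + math.sqrt(beta) * s for m, s in zip(means, stds)]


def pick_next(means, stds, rng):
    """Sample a fresh beta, then pick the candidate with the highest UCB."""
    beta = sample_beta(rng)
    scores = ucb_scores(means, stds, beta)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, beta


rng = random.Random(0)
posterior_means = [0.2, 0.5, 0.4]   # hypothetical GP posterior means
posterior_stds = [0.9, 0.1, 0.3]    # hypothetical GP posterior std devs
chosen, beta = pick_next(posterior_means, posterior_stds, rng)
```

Because beta is resampled at every iteration, no single value has to be fixed in advance; small draws favor exploitation and large draws favor exploration.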
Article 301
Title@2025-07-23 (3): EXGnet: a single-lead explainable-AI guided multiresolution network with train-only quantitative features for trustworthy ECG arrhythmia classification
Title: EXGnet: a single-lead explainable-AI guided multiresolution network with train-only quantitative features for trustworthy ECG arrhythmia classification | EXGnet: ein einbleiiges, erklärbares, KI-geführtes Multiauflösungsnetzwerk mit nur zuggebundenen quantitativen Eigenschaften für eine vertrauenswürdige EKG-Arrhythmieklassifizierung | EXGnet:一个单一领导、可解释的、以AI为指南的多分辨率网络,在可信赖ECG心律失常分类方面,只有培训的量化特征 2506.12404v2 |
Authors (3): Tushar Talukder Showrav, Soyabul Islam Lincoln, Md. Kamrul Hasan
Deep learning has significantly propelled the performance of ECG arrhythmia classification, yet its clinical adoption remains hindered by challenges in interpretability and deployment on resource-constrained edge devices. To bridge this gap, we propose EXGnet, a novel and reliable ECG arrhythmia classification network tailored for single-lead signals, specifically designed to balance high accuracy, explainability, and edge compatibility. EXGnet integrates XAI supervision during training via a normalized cross-correlation based loss, directing the model’s attention to clinically relevant ECG regions, similar to a cardiologist’s focus. This supervision is driven by automatically generated ground truth, derived through an innovative heart rate variability-based approach, without the need for manual annotation. To enhance classification accuracy without compromising deployment simplicity, we incorporate quantitative ECG features during training. These enrich the model with multi-domain knowledge but are excluded during inference, keeping the model lightweight for edge deployment. Additionally, we introduce an innovative multiresolution block to efficiently capture both short and long-term signal features while maintaining computational efficiency. Rigorous evaluation on the Chapman and Ningbo benchmark datasets validates the supremacy of EXGnet, which achieves average five-fold accuracies of 98.762% and 96.932%, and F1-scores of 97.910% and 95.527%, respectively. Comprehensive ablation studies and both quantitative and qualitative interpretability assessment confirm that the XAI guidance is pivotal, demonstrably enhancing the model’s focus and trustworthiness. Overall, EXGnet sets a new benchmark by combining high-performance arrhythmia classification with interpretability, paving the way for more trustworthy and accessible portable ECG based health monitoring systems.
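A loss built on normalized cross-correlation can be sketched as below. The zero-mean form of the correlation, the `1 - NCC` loss shape, and the names `saliency` and `target_mask` are assumptions for illustration; the abstract does not spell out the exact formulation.

```python
import math


def normalized_cross_correlation(a, b):
    """Zero-mean normalized cross-correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat signal carries no shape information
    return sum(x * y for x, y in zip(da, db)) / denom


def xai_loss(saliency, target_mask):
    """Loss is 0 when the saliency matches the mask shape, 2 when inverted."""
    return 1.0 - normalized_cross_correlation(saliency, target_mask)


# Hypothetical 5-sample mask marking the clinically relevant region.
target_mask = [0.0, 0.0, 1.0, 1.0, 0.0]
aligned = xai_loss([0.1, 0.1, 0.9, 0.9, 0.1], target_mask)
inverted = xai_loss([1.0, 1.0, 0.0, 0.0, 1.0], target_mask)
```

Normalizing by both signal energies makes the loss insensitive to the overall scale of the saliency map, so only its shape relative to the mask is penalized.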
Article 302
Title@2025-07-23 (3): Knowledge Abstraction for Knowledge-based Semantic Communication: A Generative Causality Invariant Approach
Title: Knowledge Abstraction for Knowledge-based Semantic Communication: A Generative Causality Invariant Approach | Wissensabstraktion für wissensbasierte semantische Kommunikation: Eine generative Kausalität invarianter Ansatz | 基于知识的语义交流知识抽象学知识摘要:产生因果性易变方法 2507.17784v1 |
Authors (6): Minh-Duong Nguyen, Quoc-Viet Pham, Nguyen H. Tran, Hoang-Khoi Do, Duy T. Ngo, Won-Joo Hwang
In this study, we design a low-complexity and generalized AI model that can capture common knowledge to improve data reconstruction of the channel decoder for semantic communication. Specifically, we propose a generative adversarial network that leverages causality-invariant learning to extract causal and non-causal representations from the data. Causal representations are invariant and encompass crucial information to identify the data’s label. They can encapsulate semantic knowledge and facilitate effective data reconstruction at the receiver. Moreover, the causal mechanism ensures that learned representations remain consistent across different domains, making the system reliable even with users collecting data from diverse domains. As user-collected data evolves over time causing knowledge divergence among users, we design sparse update protocols to improve the invariant properties of the knowledge while minimizing communication overheads. Three key observations were drawn from our empirical evaluations. Firstly, causality-invariant knowledge ensures consistency across different devices despite the diverse training data. Secondly, invariant knowledge has promising performance in classification tasks, which is pivotal for goal-oriented semantic communications. Thirdly, our knowledge-based data reconstruction highlights the robustness of our decoder, which surpasses other state-of-the-art data reconstruction and semantic compression methods in terms of Peak Signal-to-Noise Ratio (PSNR).
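For reference, the PSNR metric used in the comparison follows the standard definition, sketched here for flat lists of pixel values. The 8-bit peak value of 255 is an assumption; the abstract does not state the data range.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


perfect = psnr([0.0, 128.0, 255.0], [0.0, 128.0, 255.0])
worst = psnr([0.0, 0.0], [255.0, 255.0])
```

Higher values indicate a reconstruction closer to the original, so a decoder that "surpasses" others in PSNR has lower mean squared error at the same peak value.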
Article 303
Title@2025-07-23 (3): Rethinking VAE: From Continuous to Discrete Representations Without Probabilistic Assumptions
Title: Rethinking VAE: From Continuous to Discrete Representations Without Probabilistic Assumptions | VAE neu denken: Von kontinuierlichen zu diskreten Repräsentationen ohne probabilistische Annahmen | 重新思考VAE:从连续到分解的表述,无概率假设 2507.17255v1 |
Authors (1): Songxuan Shi
This paper explores the generative capabilities of Autoencoders (AEs) and establishes connections between Variational Autoencoders (VAEs) and Vector Quantized-Variational Autoencoders (VQ-VAEs) through a reformulated training framework. We demonstrate that AEs exhibit generative potential via latent space interpolation and perturbation, albeit limited by undefined regions in the encoding space. To address this, we propose a new VAE-like training method that introduces clustering centers to enhance data compactness and ensure well-defined latent spaces without relying on traditional KL divergence or reparameterization techniques. Experimental results on MNIST, CelebA, and FashionMNIST datasets show smooth interpolative transitions, though blurriness persists. Extending this approach to multiple learnable vectors, we observe a natural progression toward a VQ-VAE-like model in continuous space. However, when the encoder outputs multiple vectors, the model degenerates into a discrete Autoencoder (VQ-AE), which combines image fragments without learning semantic representations. Our findings highlight the critical role of encoding space compactness and dispersion in generative modeling and provide insights into the intrinsic connections between VAEs and VQ-VAEs, offering a new perspective on their design and limitations.
Article 304
Title@2025-07-23 (3): DistrAttention: An Efficient and Flexible Self-Attention Mechanism on Modern GPUs
Title: DistrAttention: An Efficient and Flexible Self-Attention Mechanism on Modern GPUs | DistrAchtung: Ein effizienter und flexibler Selbstaufmerksamkeitsmechanismus für moderne GPUs | 危 难:关于现代全球公益物的高效和灵活自控机制 2507.17245v1 |
Authors (7): Haolin Jin, Mengbai Xiao, Yuan Yuan, Xiao Zhang, Dongxiao Yu, Guanghui Zhang, Haoliang Wang
The Transformer architecture has revolutionized deep learning, delivering state-of-the-art performance in areas such as natural language processing, computer vision, and time series prediction. However, its core component, self-attention, has quadratic time complexity relative to input sequence length, which hinders the scalability of Transformers. Existing approaches to optimizing self-attention either discard full-contextual information or lack flexibility. In this work, we design DistrAttention, an efficient and flexible self-attention mechanism with the full context. DistrAttention achieves this by grouping data on the embedding dimensionality, usually referred to as $d$. We realize DistrAttention with a lightweight sampling and fusion method that exploits locality-sensitive hashing to group similar data. A block-wise grouping framework is further designed to limit the errors introduced by locality-sensitive hashing. By optimizing the selection of block sizes, DistrAttention can be easily integrated with FlashAttention-2, gaining high performance on modern GPUs. We evaluate DistrAttention with extensive experiments. The results show that our method is 37% faster than FlashAttention-2 at calculating self-attention. In ViT inference, DistrAttention is the fastest and the most accurate among approximate self-attention mechanisms. In Llama3-1B, DistrAttention still achieves the lowest inference time with only 1% accuracy loss.
Article 305
Title@2025-07-23 (3): Eco-Friendly AI: Unleashing Data Power for Green Federated Learning
Title: Eco-Friendly AI: Unleashing Data Power for Green Federated Learning | Eco-friendly KI: Entleashing Data Power für Green Federated Learning | 生态友好型AI:绿色联邦学习的释放数据动力 2507.17241v1 |
Authors (2): Mattia Sabella, Monica Vitali
The widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) comes with a significant environmental impact, particularly in terms of energy consumption and carbon emissions. This pressing issue highlights the need for innovative solutions to mitigate AI’s ecological footprint. One of the key factors influencing the energy consumption of ML model training is the size of the training dataset. ML models are often trained on vast amounts of data continuously generated by sensors and devices distributed across multiple locations. To reduce data transmission costs and enhance privacy, Federated Learning (FL) enables model training without the need to move or share raw data. While FL offers these advantages, it also introduces challenges due to the heterogeneity of data sources (related to volume and quality), computational node capabilities, and environmental impact. This paper contributes to the advancement of Green AI by proposing a data-centric approach to Green Federated Learning. Specifically, we focus on reducing FL’s environmental impact by minimizing the volume of training data. Our methodology involves analyzing the characteristics of federated datasets, selecting an optimal subset of data based on quality metrics, and choosing the federated nodes with the lowest environmental impact. We develop a comprehensive methodology that examines the influence of data-centric factors, such as data quality and volume, on FL training performance and carbon emissions. Building on these insights, we introduce an interactive recommendation system that optimizes FL configurations through data reduction, minimizing environmental impact during training. Applying this methodology to time series classification has demonstrated promising results in reducing the environmental impact of FL tasks.
Article 306
Title@2025-07-23 (3): NeuroHD-RA: Neural-distilled Hyperdimensional Model with Rhythm Alignment
Title: NeuroHD-RA: Neural-distilled Hyperdimensional Model with Rhythm Alignment | NeuroHD-RA: Neural-destilliertes Hyperdimensionales Modell mit Rhythm Alignment | NeuroHD-RA:具有同步调整的神经蒸蒸多维模型 2507.14184v3 |
Authors (5): ZhengXiao He, Jinghao Wen, Huayu Li, Siyuan Tian, Ao Li
We present a novel and interpretable framework for electrocardiogram (ECG)-based disease detection that combines hyperdimensional computing (HDC) with learnable neural encoding. Unlike conventional HDC approaches that rely on static, random projections, our method introduces a rhythm-aware and trainable encoding pipeline based on RR intervals, a physiological signal segmentation strategy that aligns with cardiac cycles. The core of our design is a neural-distilled HDC architecture, featuring a learnable RR-block encoder and a BinaryLinear hyperdimensional projection layer, optimized jointly with cross-entropy and proxy-based metric loss. This hybrid framework preserves the symbolic interpretability of HDC while enabling task-adaptive representation learning. Experiments on Apnea-ECG and PTB-XL demonstrate that our model significantly outperforms traditional HDC and classical ML baselines, achieving 73.09% precision and an F1 score of 0.626 on Apnea-ECG, with comparable robustness on PTB-XL. Our framework offers an efficient and scalable solution for edge-compatible ECG classification, with strong potential for interpretable and personalized health monitoring.
Article 307
Title@2025-07-23 (3): P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices
Title: P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices | P3SL: Personalisiertes Datenschutz-Erhalten von Split-Lernen auf heterogenen Edge-Geräten | P3SL: 个人化隐私保护关于异异异异边缘装置的分离学习 2507.17228v1 |
Authors (5): Wei Fan, JinYi Yoon, Xiaochang Li, Huajie Shao, Bo Ji
Split Learning (SL) is an emerging privacy-preserving machine learning technique that enables resource constrained edge devices to participate in model training by partitioning a model into client-side and server-side sub-models. While SL reduces computational overhead on edge devices, it encounters significant challenges in heterogeneous environments where devices vary in computing resources, communication capabilities, environmental conditions, and privacy requirements. Although recent studies have explored heterogeneous SL frameworks that optimize split points for devices with varying resource constraints, they often neglect personalized privacy requirements and local model customization under varying environmental conditions. To address these limitations, we propose P3SL, a Personalized Privacy-Preserving Split Learning framework designed for heterogeneous, resource-constrained edge device systems. The key contributions of this work are twofold. First, we design a personalized sequential split learning pipeline that allows each client to achieve customized privacy protection and maintain personalized local models tailored to their computational resources, environmental conditions, and privacy needs. Second, we adopt a bi-level optimization technique that empowers clients to determine their own optimal personalized split points without sharing private sensitive information (i.e., computational resources, environmental conditions, privacy requirements) with the server. This approach balances energy consumption and privacy leakage risks while maintaining high model accuracy. We implement and evaluate P3SL on a testbed consisting of 7 devices including 4 Jetson Nano P3450 devices, 2 Raspberry Pis, and 1 laptop, using diverse model architectures and datasets under varying environmental conditions.
Article 308
Title@2025-07-23 (3): Dataset Distillation as Data Compression: A Rate-Utility Perspective
Title: Dataset Distillation as Data Compression: A Rate-Utility Perspective | Datensatzdestillation als Datenkompression: Eine Rate-Utility-Perspektive | 将数据集作为数据压缩进行蒸馏:率-功用视角 2507.17221v1 |
Authors (6): Youneng Bao, Yiping Liu, Zhuo Chen, Yongsheng Liang, Mu Li, Kede Ma
Driven by the “scale-is-everything” paradigm, modern machine learning increasingly demands ever-larger datasets and models, yielding prohibitive computational and storage requirements. Dataset distillation mitigates this by compressing an original dataset into a small set of synthetic samples, while preserving its full utility. Yet, existing methods either maximize performance under fixed storage budgets or pursue suitable synthetic data representations for redundancy removal, without jointly optimizing both objectives. In this work, we propose a joint rate-utility optimization method for dataset distillation. We parameterize synthetic samples as optimizable latent codes decoded by extremely lightweight networks. We estimate the Shannon entropy of quantized latents as the rate measure and plug in any existing distillation loss as the utility measure, trading them off via a Lagrange multiplier. To enable fair, cross-method comparisons, we introduce bits per class (bpc), a precise storage metric that accounts for sample, label, and decoder parameter costs. On CIFAR-10, CIFAR-100, and ImageNet-128, our method achieves up to $170\times$ greater compression than standard distillation at comparable accuracy. Across diverse bpc budgets, distillation losses, and backbone architectures, our approach consistently establishes better rate-utility trade-offs.
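The rate-utility trade-off described above can be sketched in a few lines. This is a simplified illustration: the rate is an empirical histogram entropy of rounded latents, which may differ from the entropy estimator the authors use, and `distill_loss`, `lam`, and `step` are hypothetical names and values.

```python
import math
from collections import Counter


def quantize(latents, step=1.0):
    """Uniform scalar quantization: map each latent to an integer bin index."""
    return [round(z / step) for z in latents]


def entropy_bits(symbols):
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def rate_utility_loss(latents, distill_loss, lam, step=1.0):
    """Lagrangian objective: utility loss + lam * estimated total bits."""
    total_bits = entropy_bits(quantize(latents, step)) * len(latents)
    return distill_loss + lam * total_bits


# Four hypothetical latent values that fall into two equally likely bins.
latents = [0.1, 0.2, 0.9, 1.1]
loss = rate_utility_loss(latents, distill_loss=2.0, lam=0.5)
```

Raising `lam` pushes the optimizer toward latents that occupy fewer distinct bins (fewer bits), while lowering it favors the distillation loss alone.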
Article 309
Title@2025-07-23 (3): A Low-Cost Machine Learning Approach for Timber Diameter Estimation
Title: A Low-Cost Machine Learning Approach for Timber Diameter Estimation | Ein Low-Cost Machine Learning Ansatz für die Schätzung des Holzdurchmessers | 木材直径估算的低成本机器学习方法 2507.17219v1 |
Authors (3): Fatemeh Hasanzadeh Fard, Sanaz Hasanzadeh Fard, Mehdi Jonoobi
The wood processing industry, particularly in facilities such as sawmills and MDF production lines, requires accurate and efficient identification of the species and thickness of the wood. Traditional methods rely heavily on expert human labor and are slow, inconsistent, and prone to error, especially when processing large volumes. This study focuses on practical and cost-effective machine learning frameworks that automate the estimation of timber log diameter using standard RGB images captured under real-world working conditions. We employ the YOLOv5 object detection algorithm, fine-tuned on a public dataset (TimberSeg 1.0), to detect individual timber logs and estimate thickness through bounding-box dimensions. Unlike previous methods that require expensive sensors or controlled environments, this model is trained on images taken in typical industrial sheds during timber delivery. Experimental results show that the model achieves a mean Average Precision (mAP@0.5) of 0.64, demonstrating reliable log detection even with modest computing resources. This lightweight, scalable solution holds promise for practical integration into existing workflows, including on-site inventory management and preliminary sorting, particularly in small and medium-sized operations.
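One simple way to turn a detected bounding box into a thickness estimate is sketched below. Taking the shorter box side as the diameter and using a fixed `mm_per_pixel` calibration are both assumptions for illustration; the abstract does not describe the exact conversion, and the box coordinates here are made up rather than produced by a detector.

```python
def estimate_diameter_mm(box, mm_per_pixel):
    """Estimate log diameter from an axis-aligned box (x_min, y_min, x_max, y_max).

    Assumes the log lies roughly along one image axis, so the shorter box
    side approximates its diameter.
    """
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min
    height = y_max - y_min
    return min(width, height) * mm_per_pixel


# Hypothetical detection: a 100 x 40 pixel box at 2 mm per pixel.
diameter = estimate_diameter_mm((10, 20, 110, 60), mm_per_pixel=2.0)
```

A rotated log would inflate both box sides, so a real pipeline would likely need oriented boxes or segmentation masks for logs that are not axis-aligned.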
Article 310
Title@2025-07-23 (3): APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation
Title: APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation | APTx Neuron: Unified Trainable Neuron Architecture Integrating Activation and Computation | APTx Neuron: 统一可训练的中子建筑综合激活和计算 2507.14270v2 |
Authors (1): Ravin Kumar
We propose the APTx Neuron, a novel, unified neural computation unit that integrates non-linear activation and linear transformation into a single trainable expression. The APTx Neuron is derived from the APTx activation function, thereby eliminating the need for separate activation layers and making the architecture both computationally efficient and elegant. The proposed neuron follows the functional form $y = \sum_{i=1}^{n} ((\alpha_i + \tanh(\beta_i x_i)) \cdot \gamma_i x_i) + \delta$, where all parameters $\alpha_i$, $\beta_i$, $\gamma_i$, and $\delta$ are trainable. We validate our APTx Neuron-based architecture on the MNIST dataset, achieving up to 96.69% test accuracy in just 20 epochs using approximately 332K trainable parameters. The results highlight the superior expressiveness and computational efficiency of the APTx Neuron compared to traditional neurons, pointing toward a new paradigm in unified neuron design and the architectures built upon it.
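The functional form given in the abstract can be written directly as a forward pass for a single neuron. The sketch below covers only that forward computation with hand-picked parameter values (all hypothetical); training the parameters is out of scope here.

```python
import math


def aptx_neuron(x, alpha, beta, gamma, delta):
    """y = sum_i (alpha_i + tanh(beta_i * x_i)) * gamma_i * x_i + delta."""
    return sum(
        (a + math.tanh(b * xi)) * g * xi
        for xi, a, b, g in zip(x, alpha, beta, gamma)
    ) + delta


# With beta = 0 the tanh term vanishes and the neuron reduces to a linear
# unit with weights alpha_i * gamma_i and bias delta.
linear_case = aptx_neuron(
    x=[1.0, 2.0], alpha=[1.0, 1.0], beta=[0.0, 0.0], gamma=[0.5, 0.25], delta=0.1
)
nonlinear_case = aptx_neuron(
    x=[1.0, 2.0], alpha=[1.0, 1.0], beta=[1.0, 1.0], gamma=[0.5, 0.25], delta=0.1
)
```

Each input contributes three trainable parameters (`alpha_i`, `beta_i`, `gamma_i`), compared with one weight per input in a conventional neuron.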
Article 311
Title@2025-07-23 (3): Blind Source Separation of Single-Channel Mixtures via Multi-Encoder Autoencoders
Title: Blind Source Separation of Single-Channel Mixtures via Multi-Encoder Autoencoders | Blindquelle Trennung von Single-Channel-Mischungen über Multi-Encoder-Autoencoder | 通过多 Encder 自动自动编码器将单一气道混合体的盲源分离 2309.07138v4 |
Authors (2): Matthew B. Webster, Joonnyong Lee
The task of blind source separation (BSS) involves separating sources from a mixture without prior knowledge of the sources or the mixing system. Single-channel mixtures and non-linear mixtures are a particularly challenging problem in BSS. In this paper, we propose a novel method for addressing BSS with single-channel non-linear mixtures by leveraging the natural feature subspace specialization ability of multi-encoder autoencoders. During the training phase, our method unmixes the input into the separate encoding spaces of the multi-encoder network and then remixes these representations within the decoder for a reconstruction of the input. Then to perform source inference, we introduce a novel encoding masking technique whereby masking out all but one of the encodings enables the decoder to estimate a source signal. To this end, we also introduce a sparse mixing loss that encourages sparse remixing of source encodings throughout the decoder and a so-called zero reconstruction loss on the decoder for coherent source estimations. To analyze and evaluate our method, we conduct experiments on a toy dataset, designed to demonstrate this property of feature subspace specialization, and with real-world biosignal recordings from a polysomnography sleep study for extracting respiration from electrocardiogram and photoplethysmography signals.
Article 312
Title@2025-07-23 (3): HypoChainer: A Collaborative System Combining LLMs and Knowledge Graphs for Hypothesis-Driven Scientific Discovery
Title: HypoChainer: A Collaborative System Combining LLMs and Knowledge Graphs for Hypothesis-Driven Scientific Discovery | HypoChainer: Ein kollaboratives System zur Kombination von LLMs und Wissensgraphen für hypothesisgetriebene wissenschaftliche Entdeckungen | HypoChainner:一个合作系统,将假设-驱动科学发现所利用的LLMs和知识图集结合起来 2507.17209v1 |
Authors (5): Haoran Jiang, Shaohan Shi, Yunjie Yao, Chang Jiang, Quan Li
Modern scientific discovery faces growing challenges in integrating vast and heterogeneous knowledge critical to breakthroughs in biomedicine and drug development. Traditional hypothesis-driven research, though effective, is constrained by human cognitive limits, the complexity of biological systems, and the high cost of trial-and-error experimentation. Deep learning models, especially graph neural networks (GNNs), have accelerated prediction generation, but the sheer volume of outputs makes manual selection for validation unscalable. Large language models (LLMs) offer promise in filtering and hypothesis generation, yet suffer from hallucinations and lack grounding in structured knowledge, limiting their reliability. To address these issues, we propose HypoChainer, a collaborative visualization framework that integrates human expertise, LLM-driven reasoning, and knowledge graphs (KGs) to enhance hypothesis generation and validation. HypoChainer operates in three stages: First, exploration and contextualization – experts use retrieval-augmented LLMs (RAGs) and dimensionality reduction to navigate large-scale GNN predictions, assisted by interactive explanations. Second, hypothesis chain formation – experts iteratively examine KG relationships around predictions and semantically linked entities, refining hypotheses with LLM and KG suggestions. Third, validation prioritization – refined hypotheses are filtered based on KG-supported evidence to identify high-priority candidates for experimentation, with visual analytics further strengthening weak links in reasoning. We demonstrate HypoChainer’s effectiveness through case studies in two domains and expert interviews, highlighting its potential to support interpretable, scalable, and knowledge-grounded scientific discovery.
Article 313
Title@2025-07-23 (3): AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation
Title: AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation | AlignDistil: Token-Level-Sprachmodell Alignment als Adaptive Policy Destillation | Aligndistil: 作为适应性政策蒸馏的调整级语言模式模型对齐 2503.02832v3 |
Authors (6): Songming Zhang, Xue Zhang, Tong Zhang, Bojie Hu, Yufeng Chen, Jinan Xu
In modern large language models (LLMs), LLM alignment is of crucial importance and is typically achieved through methods such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO). However, in most existing methods for LLM alignment, all tokens in the response are optimized using a sparse, response-level reward or preference annotation. The ignorance of token-level rewards may erroneously punish high-quality tokens or encourage low-quality tokens, resulting in suboptimal performance and slow convergence speed. To address this issue, we propose AlignDistil, an RLHF-equivalent distillation method for token-level reward optimization. Specifically, we introduce the reward learned by DPO into the RLHF objective and theoretically prove the equivalence between this objective and a token-level distillation process, where the teacher distribution linearly combines the logits from the DPO model and a reference model. On this basis, we further bridge the accuracy gap between the reward from the DPO model and the pure reward model, by building a contrastive DPO reward with a normal and a reverse DPO model. Moreover, to avoid under- and over-optimization on different tokens, we design a token adaptive logit extrapolation mechanism to construct an appropriate teacher distribution for each token. Experimental results demonstrate the superiority of our AlignDistil over existing methods and showcase fast convergence due to its token-level distributional reward optimization.
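A teacher distribution built from a linear combination of logits can be sketched as follows for a single token position. The specific combination `ref + weight * (dpo - ref)` and the forward KL objective are assumptions for illustration; the abstract states only that the logits are combined linearly and that the weight is adapted per token.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_distribution(dpo_logits, ref_logits, weight):
    """Mix logits linearly; weight > 1 extrapolates beyond the DPO model."""
    mixed = [r + weight * (d - r) for d, r in zip(dpo_logits, ref_logits)]
    return softmax(mixed)


def token_kl(teacher, student):
    """KL(teacher || student) for one token position."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0.0)


# Hypothetical 3-word vocabulary at one token position.
dpo_logits = [2.0, 0.5, -1.0]
ref_logits = [1.0, 1.0, 0.0]
teacher = teacher_distribution(dpo_logits, ref_logits, weight=1.5)
student = softmax(ref_logits)
distill_loss = token_kl(teacher, student)
```

Because the loss is computed per token, every position receives its own training signal instead of sharing one response-level reward.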
Article 314
Title@2025-07-23 (3): Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation
Title: Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation | Filter-And-Refine: Ein MLLM-basiertes Cascade-System für die Moderation von industriellen Videoinhalten | 筛选和注释:一个基于 MLLM 的工业规模视频内容调节系统 2507.17204v1 |
Authors (9): Zixuan Wang, Jinghao Shi, Hanzhong Liang, Xiang Shen, Vera Wen, Zhiqian Chen, Yifan Wu, Zhixin Zhang, Hongyu Xiong
Effective content moderation is essential for video platforms to safeguard user experience and uphold community standards. While traditional video classification models effectively handle well-defined moderation tasks, they struggle with complicated scenarios such as implicit harmful content and contextual ambiguity. Multimodal large language models (MLLMs) offer a promising solution to these limitations with their superior cross-modal reasoning and contextual understanding. However, two key challenges hinder their industrial adoption. First, the high computational cost of MLLMs makes full-scale deployment impractical. Second, adapting generative models for discriminative classification remains an open research problem. In this paper, we first introduce an efficient method to transform a generative MLLM into a multimodal classifier using minimal discriminative training data. To enable industry-scale deployment, we then propose a router-ranking cascade system that integrates MLLMs with a lightweight router model. Offline experiments demonstrate that our MLLM-based approach improves F1 score by 66.50% over traditional classifiers while requiring only 2% of the fine-tuning data. Online evaluations show that our system increases automatic content moderation volume by 41%, while the cascading deployment reduces computational cost to only 1.5% of direct full-scale deployment.
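The cost saving of a cascade comes from running the expensive model only on a small share of traffic. The sketch below shows that control flow with stand-in callables; `router_score`, `mllm_is_violation`, the threshold, and the toy data are all hypothetical, and the actual system's routing and ranking logic is not described in enough detail in the abstract to reproduce.

```python
def cascade_moderate(videos, router_score, mllm_is_violation, threshold=0.5):
    """Run a cheap router on every video and the costly model only on
    videos whose router score reaches the threshold."""
    flagged = []
    mllm_calls = 0
    for video in videos:
        if router_score(video) < threshold:
            continue  # cheap early exit: the router considers this benign
        mllm_calls += 1
        if mllm_is_violation(video):
            flagged.append(video)
    return flagged, mllm_calls


# Toy stand-ins for the two models, keyed on precomputed fields.
videos = [
    {"id": "a", "risk": 0.1, "harmful": False},
    {"id": "b", "risk": 0.8, "harmful": True},
    {"id": "c", "risk": 0.7, "harmful": False},
    {"id": "d", "risk": 0.2, "harmful": False},
]
flagged, mllm_calls = cascade_moderate(
    videos,
    router_score=lambda v: v["risk"],
    mllm_is_violation=lambda v: v["harmful"],
)
```

In this toy run the expensive model is invoked on two of four videos; the threshold controls the trade-off between compute cost and the chance of the router discarding a harmful video.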
Article 315
Title@2025-07-23 (3): Spintronic Bayesian Hardware Driven by Stochastic Magnetic Domain Wall Dynamics
Title: Spintronic Bayesian Hardware Driven by Stochastic Magnetic Domain Wall Dynamics | Spintronic Bayesian Hardware angetrieben von stochastischen magnetischen Domain Wall Dynamics | Spentronic Bayesian 硬器驱动器, 由实心磁域域外壁动态驱动 2507.17193v1 |
Authors (10): Tianyi Wang, Bingqian Dai, Kin Wong, Yaochen Li, Yang Cheng, Qingyuan Shu, Haoran He, Puyang Huang, Hanshen Huang, Kang L. Wang
As artificial intelligence (AI) advances into diverse applications, ensuring reliability of AI models is increasingly critical. Conventional neural networks offer strong predictive capabilities but produce deterministic outputs without inherent uncertainty estimation, limiting their reliability in safety-critical domains. Probabilistic neural networks (PNNs), which introduce randomness, have emerged as a powerful approach for enabling intrinsic uncertainty quantification. However, traditional CMOS architectures are inherently designed for deterministic operation and actively suppress intrinsic randomness. This poses a fundamental challenge for implementing PNNs, as probabilistic processing introduces significant computational overhead. To address this challenge, we introduce a Magnetic Probabilistic Computing (MPC) platform, an energy-efficient, scalable hardware accelerator that leverages intrinsic magnetic stochasticity for uncertainty-aware computing. This physics-driven strategy utilizes spintronic systems based on magnetic domain walls (DWs) and their dynamics to establish a new paradigm of physical probabilistic computing for AI. The MPC platform integrates three key mechanisms: thermally induced DW stochasticity, voltage-controlled magnetic anisotropy (VCMA), and tunneling magnetoresistance (TMR), enabling fully electrical and tunable probabilistic functionality at the device level. As a representative demonstration, we implement a Bayesian Neural Network (BNN) inference structure and validate its functionality on CIFAR-10 classification tasks. Compared to standard 28nm CMOS implementations, our approach achieves an improvement of seven orders of magnitude in the overall figure of merit, with substantial gains in area efficiency, energy consumption, and speed. These results underscore the MPC platform’s potential to enable reliable and trustworthy physical AI systems.
Article 316
Title@2025-07-23 (3): Met$^2$Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems
Title: Met$^2$Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems | Met$^2$Net: Ein entkoppeltes zweistufiges Spatio-Temporal-Prognosemodell für komplexe meteorologische Systeme | Met$^2$Net:一个分解的复杂气象系统双层空间-时空预报模型 2507.17189v1 |
Authors (4): Shaohan Li, Hao Yang, Min Chen, Xiaolin Qin
The increasing frequency of extreme weather events due to global climate change makes accurate weather prediction increasingly urgent. Recently, end-to-end methods have made great advances thanks to deep learning techniques, but they suffer from representation inconsistency in multivariable integration and struggle to capture the dependencies between variables that complex weather systems require. Treating different variables as distinct modalities and applying a two-stage training approach from multimodal models can partially alleviate this issue, but because the training tasks of the two stages are inconsistent, the results are often suboptimal. To address these challenges, we propose an implicit two-stage training method that configures separate encoders and decoders for each variable. In detail, in the first stage, the Translator is frozen while the Encoders and Decoders learn a shared latent space; in the second stage, the Encoders and Decoders are frozen, and the Translator captures inter-variable interactions for prediction. In addition, introducing a self-attention mechanism for multivariable fusion in the latent space further improves performance. Extensive experiments show that our method achieves state-of-the-art performance. Specifically, it reduces the MSE for near-surface air temperature and relative humidity predictions by 28.82% and 23.39%, respectively. The source code is available at https://github.com/ShremG/Met2Net.
Article 317
Title@2025-07-23 (3): GhostUMAP2: Measuring and Analyzing (r,d)-Stability of UMAP
Title: GhostUMAP2: Measuring and Analyzing (r,d)-Stability of UMAP | GhostUMAP2: Messung und Analyse (r,d)-Stabilität von UMAP | GhostUMAP2:测量和分析(r,d)-UMAP的稳定性 2507.17174v1 |
Authors (3): Myeongwon Jung, Takanori Fujiwara, Jaemin Jo
Despite the widespread use of Uniform Manifold Approximation and Projection (UMAP), the impact of its stochastic optimization process on the results remains underexplored. We observed that it often produces unstable results where the projections of data points are determined mostly by chance rather than reflecting neighboring structures. To address this limitation, we introduce (r,d)-stability to UMAP: a framework that analyzes the stochastic positioning of data points in the projection space. To assess how stochastic elements, specifically initial projection positions and negative sampling, impact UMAP results, we introduce “ghosts”, or duplicates of data points representing potential positional variations due to stochasticity. We define a data point’s projection as (r,d)-stable if its ghosts, perturbed within a circle of radius r in the initial projection, remain confined within a circle of radius d in their final positions. To efficiently compute the ghost projections, we develop an adaptive dropping scheme that reduces runtime by up to 60% compared to an unoptimized baseline while retaining approximately 90% of unstable points. We also present a visualization tool that supports the interactive exploration of the (r,d)-stability of data points. Finally, we demonstrate the effectiveness of our framework by examining the stability of projections of real-world datasets and present guidelines for its effective use.
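The (r,d)-stability definition above can be illustrated with a minimal sketch. It is not the GhostUMAP2 implementation: the function name `is_rd_stable` and its arguments are hypothetical, and centering both circles on the original point's initial and final positions is an assumption the abstract does not spell out.

```python
import math

def is_rd_stable(init_point, final_point, ghost_inits, ghost_finals, r, d):
    """Return True if every ghost that started within radius r of the
    point's initial position ends within radius d of its final position.

    Assumption: both circles are centered on the original point.
    """
    for g_init, g_final in zip(ghost_inits, ghost_finals):
        if math.dist(init_point, g_init) > r:
            continue  # ghost was not perturbed within radius r; ignore it
        if math.dist(final_point, g_final) > d:
            return False  # one ghost escaped the radius-d circle
    return True

# Hypothetical 2D positions for one data point and two ghosts.
stable = is_rd_stable(
    init_point=(0.0, 0.0), final_point=(5.0, 5.0),
    ghost_inits=[(0.1, 0.0), (0.0, -0.1)],
    ghost_finals=[(5.2, 5.0), (4.9, 5.1)],
    r=0.2, d=0.5,
)
unstable = is_rd_stable(
    init_point=(0.0, 0.0), final_point=(5.0, 5.0),
    ghost_inits=[(0.1, 0.0), (0.0, -0.1)],
    ghost_finals=[(5.2, 5.0), (9.0, 9.0)],
    r=0.2, d=0.5,
)
```

In this toy setup the first call reports a stable point, while the second does not because one ghost drifts far outside the radius-d circle.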
Article 318
Title@2025-07-23 (3): Privacy-Preserving Multimodal News Recommendation through Federated Learning
Title: Privacy-Preserving Multimodal News Recommendation through Federated Learning | Datenschutz-Erhaltung multimodaler Nachrichten Empfehlung durch Federated Learning | 通过联邦学习促进隐私保护多模式新闻建议 2507.15460v3 |
Authors (3): Mehdi Khalaj, Shahrzad Golestani Najafabadi, Julita Vassileva
Personalized News Recommendation systems (PNR) have emerged as a solution to information overload by predicting and suggesting news items tailored to individual user interests. However, traditional PNR systems face several challenges, including an overreliance on textual content, common neglect of short-term user interests, and significant privacy concerns due to centralized data storage. This paper addresses these issues by introducing a novel multimodal federated learning-based approach for news recommendation. First, it integrates both textual and visual features of news items using a multimodal model, enabling a more comprehensive representation of content. Second, it employs a time-aware model that balances users’ long-term and short-term interests through multi-head self-attention networks, improving recommendation accuracy. Finally, to enhance privacy, a federated learning framework is implemented, enabling collaborative model training without sharing user data. The framework divides the recommendation model into a large server-maintained news model and a lightweight user model shared between the server and clients. The client requests news representations (vectors) and a user model from the central server, then computes gradients on the user's local data, and finally sends these locally computed gradients to the server for aggregation. The central server aggregates gradients to update the global user model and news model. The server then uses the updated news model to infer news representations. To further safeguard user privacy, a secure aggregation algorithm based on Shamir’s secret sharing is employed. Experiments on a real-world news dataset demonstrate strong performance compared to existing systems, representing a significant advancement in privacy-preserving personalized news recommendation.
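The secure-aggregation idea can be sketched with textbook Shamir secret sharing, whose shares are additively homomorphic: summing the clients' shares point by point yields shares of the summed secret. This is an illustrative sketch only, not the paper's protocol. The names `make_shares`, `reconstruct`, and `aggregate`, the field modulus, and the assumption that gradients are quantized to non-negative integers are all introduced here for illustration.

```python
import random

PRIME = 2**61 - 1  # Mersenne prime used as the field modulus (illustrative choice)

def make_shares(secret, threshold, n_shares, rng):
    """Split `secret` into n_shares points of a random polynomial of
    degree threshold-1 whose constant term is the secret."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

def aggregate(client_values, threshold=2, n_shares=3, seed=0):
    """Sum integer client values without reconstructing any single value."""
    # random.Random keeps the demo reproducible; a real system would need
    # a cryptographically secure source such as the `secrets` module.
    rng = random.Random(seed)
    all_shares = [make_shares(v, threshold, n_shares, rng) for v in client_values]
    summed = [
        (x, sum(client[k][1] for client in all_shares) % PRIME)
        for k, (x, _) in enumerate(all_shares[0])
    ]
    return reconstruct(summed[:threshold])

print(aggregate([3, 10, 29]))  # 42
```

Only the sum is reconstructed here; each client's individual value stays hidden as long as fewer than `threshold` share holders collude.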
Article 319
Title@2025-07-23 (3): Unmasking Trees for Tabular Data
Title: Unmasking Trees for Tabular Data | Entlarvung von Bäumen für tabellarische Daten | 用于表格数据解压缩树 2407.05593v5 |
Authors (1): Calvin McCarter
Despite much work on advanced deep learning and generative modeling techniques for tabular data generation and imputation, traditional methods have continued to win on imputation benchmarks. We herein present UnmaskingTrees, a simple method for tabular imputation (and generation) employing gradient-boosted decision trees which are used to incrementally unmask individual features. On a benchmark for out-of-the-box performance on 27 small tabular datasets, UnmaskingTrees offers leading performance on imputation; state-of-the-art performance on generation given data with missingness; and competitive performance on vanilla generation given data without missingness. To solve the conditional generation subproblem, we propose a tabular probabilistic prediction method, BaltoBot, which fits a balanced tree of boosted tree classifiers. Unlike older methods, it requires no parametric assumption on the conditional distribution, accommodating features with multimodal distributions; unlike newer diffusion methods, it offers fast sampling, closed-form density estimation, and flexible handling of discrete variables. We finally consider our two approaches as meta-algorithms, demonstrating in-context learning-based generative modeling with TabPFN.
Article 320
Title@2025-07-23 (3): Attention-Based Multiscale Temporal Fusion Network for Uncertain-Mode Fault Diagnosis in Multimode Processes
Title: Attention-Based Multiscale Temporal Fusion Network for Uncertain-Mode Fault Diagnosis in Multimode Processes | Aufmerksamkeitsbasiertes Multiscale Temporal Fusion Network für unsichere Fehlerdiagnosen in Multimode-Prozessen | 多模式进程中不确定-Mode失密诊断多波段时空聚变网络 2504.05172v3 |
Authors (3): Guangqiang Li, M. Amine Atoui, Xiangshun Li
Fault diagnosis in multimode processes plays a critical role in ensuring the safe operation of industrial systems across multiple modes. It faces a major, still unresolved challenge: the significant distributional differences among monitoring data from multiple modes make it difficult for models to extract shared feature representations related to system health conditions. In response to this problem, this paper introduces a novel method called the attention-based multiscale temporal fusion network. Multiscale depthwise convolution and a gated recurrent unit are employed to extract multiscale contextual local features and long-short-term features. Instance normalization is applied to suppress mode-specific information. Furthermore, a temporal attention mechanism is designed to focus on critical time points with higher cross-mode shared information, thereby enhancing the accuracy of fault diagnosis. The proposed model is applied to the Tennessee Eastman process dataset and the three-phase flow facility dataset. The experiments demonstrate that the proposed model achieves superior diagnostic performance and maintains a small model size. The source code will be available on GitHub at https://github.com/GuangqiangLi/AMTFNet.
Article 321
Title@2025-07-23 (3): Tabular Diffusion based Actionable Counterfactual Explanations for Network Intrusion Detection
Title: Tabular Diffusion based Actionable Counterfactual Explanations for Network Intrusion Detection | Tabuläre Diffusion basierte, gegenfaktische Erklärungen zur Netzwerkintrusionserkennung | 用于网络入侵探测的基于传播表的可行动反事实解释 2507.17161v1 |
Authors (2): Vinura Galwaduge, Jagath Samarabandu
Modern network intrusion detection systems (NIDS) frequently utilize the predictive power of complex deep learning models. However, the “black-box” nature of such deep learning methods adds a layer of opaqueness that hinders the proper understanding of detection decisions, undermines trust in those decisions, and prevents timely countermeasures against such attacks. Explainable AI (XAI) methods address this problem by providing insights into the causes of the predictions. However, most existing XAI methods provide explanations that are not easy to convert into actionable countermeasures. In this work, we propose a novel diffusion-based counterfactual explanation framework that can provide actionable explanations for network intrusion attacks. We evaluated our proposed algorithm against several other publicly available counterfactual explanation algorithms on 3 modern network intrusion datasets. To the best of our knowledge, this work also presents the first comparative analysis of existing counterfactual explanation algorithms within the context of network intrusion detection systems. Among the tested counterfactual explanation algorithms, our proposed method provides minimal, diverse counterfactual explanations more efficiently, reducing the time needed to generate explanations. We also demonstrate how counterfactual explanations can provide actionable explanations by summarizing them to create a set of global rules. These rules are actionable not only at the instance level but also at the global level for intrusion attacks. These global counterfactual rules show the ability to effectively filter out incoming attack queries, which is crucial for efficient intrusion detection and defense mechanisms.
Article 322
Title@2025-07-23 (3): Flexible Coded Distributed Convolution Computing for Enhanced Straggler Resilience and Numerical Stability in Distributed CNNs
Title: Flexible Coded Distributed Convolution Computing for Enhanced Straggler Resilience and Numerical Stability in Distributed CNNs | Flexibles Coded Distributed Convolution Computing für verbesserte Straggler-Resilienz und numerische Stabilität in verteilten CNNs | 增强钢固者的抗力和数字稳定性的灵活代码化分布式分散式电动计算器在分布式有线电视上的分布式有线电视 2411.01579v2 |
Authors (7): Shuo Tan, Rui Liu, Xuesong Han, XianLei Long, Kai Wan, Linqi Song, Yong Li
Deploying Convolutional Neural Networks (CNNs) on resource-constrained devices necessitates efficient management of computational resources, often via distributed environments susceptible to latency from straggler nodes. This paper introduces the Flexible Coded Distributed Convolution Computing (FCDCC) framework to enhance straggler resilience and numerical stability in distributed CNNs. We extend Coded Distributed Computing (CDC) with Circulant and Rotation Matrix Embedding (CRME), which was originally proposed for matrix multiplication, to high-dimensional tensor convolution. For the proposed scheme, referred to as the Numerically Stable Coded Tensor Convolution (NSCTC) scheme, we also propose two new coded partitioning schemes: Adaptive-Padding Coded Partitioning (APCP) for the input tensor and Kernel-Channel Coded Partitioning (KCCP) for the filter tensor. These strategies enable the linear decomposition of tensor convolutions and their encoding into CDC subtasks, combining model parallelism with coded redundancy for robust and efficient execution. Theoretical analysis identifies an optimal trade-off between communication and storage costs. Empirical results validate the framework’s effectiveness in computational efficiency, straggler resilience, and scalability across various CNN architectures.
Article 323
Title@2025-07-23 (3): JAM: Keypoint-Guided Joint Prediction after Classification-Aware Marginal Proposal for Multi-Agent Interaction
Title: JAM: Keypoint-Guided Joint Prediction after Classification-Aware Marginal Proposal for Multi-Agent Interaction | JAM: Keypoint-Guided Joint Prediction nach Classification-Aware Marginal-Vorschlag für Multi-Agent-Interaktion | JAM:关于多机构互动的分类-软件边际建议之后的关键点指导联合预测 2507.17152v1 |
Authors (4): Fangze Lin, Ying He, Fei Yu, Hong Zhang
Predicting the future motion of road participants is a critical task in autonomous driving. In this work, we address the challenge of low-quality generation of low-probability modes in multi-agent joint prediction. To tackle this issue, we propose a two-stage multi-agent interactive prediction framework named keypoint-guided joint prediction after classification-aware marginal proposal (JAM). The first stage is modeled as a marginal prediction process, which classifies queries by trajectory type to encourage the model to learn all categories of trajectories, providing comprehensive mode information for the joint prediction module. The second stage is modeled as a joint prediction process, which takes the scene context and the marginal proposals from the first stage as inputs to learn the final joint distribution. We explicitly introduce key waypoints to guide the joint prediction module in better capturing and leveraging the critical information from the initial predicted trajectories. We conduct extensive experiments on the real-world Waymo Open Motion Dataset interactive prediction benchmark. The results show that our approach achieves competitive performance. In particular, in the framework comparison experiments, the proposed JAM outperforms other prediction frameworks and achieves state-of-the-art performance in interactive trajectory prediction. The code is available at https://github.com/LinFunster/JAM to facilitate future research.
Article 324
Title@2025-07-23 (3): PICore: Physics-Informed Unsupervised Coreset Selection for Data Efficient Neural Operator Training
Title: PICore: Physics-Informed Unsupervised Coreset Selection for Data Efficient Neural Operator Training | PICore: Physik-informierte, unüberwachte Coreset-Auswahl für dateneffiziente Neuraloperator-Schulungen | PICore: 数据高效神经操作员培训的物理-内建无监督核心集选择 2507.17151v1 |
Authors (4): Anirudh Satheesh, Anant Khandelwal, Mucong Ding, Radu Balan
Neural operators offer a powerful paradigm for solving partial differential equations (PDEs) that cannot be solved analytically by learning mappings between function spaces. However, there are two main bottlenecks in training neural operators: they require a significant amount of training data to learn these mappings, and this data needs to be labeled, which can only be accessed via expensive simulations with numerical solvers. To alleviate both of these issues simultaneously, we propose PICore, an unsupervised coreset selection framework that identifies the most informative training samples without requiring access to ground-truth PDE solutions. PICore leverages a physics-informed loss to select unlabeled inputs by their potential contribution to operator learning. After selecting a compact subset of inputs, only those samples are simulated using numerical solvers to generate labels, reducing annotation costs. We then train the neural operator on the reduced labeled dataset, significantly decreasing training time as well. Across four diverse PDE benchmarks and multiple coreset selection strategies, PICore achieves up to 78% average increase in training efficiency relative to supervised coreset selection methods with minimal changes in accuracy. We provide code at https://github.com/Asatheesh6561/PICore.
Article 325
Title@2025-07-23 (3): ScSAM: Debiasing Morphology and Distributional Variability in Subcellular Semantic Segmentation
Title: ScSAM: Debiasing Morphology and Distributional Variability in Subcellular Semantic Segmentation | ScSAM: Debiasing Morphology and Distributional Variability in subzellulärer semantischer Segmentierung | ScSAM: 子细胞间断分解中减少对分细胞分解的道德和分布变异性的影响 2507.17149v1 |
Authors (7): Bo Fang, Jianan Fan, Dongnan Liu, Hang Chang, Gerald J. Shami, Filip Braet, Weidong Cai
The significant morphological and distributional variability among subcellular components poses a long-standing challenge for learning-based organelle segmentation models, significantly increasing the risk of biased feature learning. Existing methods often rely on single mapping relationships, overlooking feature diversity and thereby inducing biased training. Although the Segment Anything Model (SAM) provides rich feature representations, its application to subcellular scenarios is hindered by two key challenges: (1) The variability in subcellular morphology and distribution creates gaps in the label space, leading the model to learn spurious or biased features. (2) SAM focuses on global contextual understanding and often ignores fine-grained spatial details, making it challenging to capture subtle structural alterations and cope with skewed data distributions. To address these challenges, we introduce ScSAM, a method that enhances feature robustness by fusing pre-trained SAM with Masked Autoencoder (MAE)-guided cellular prior knowledge to alleviate training bias from data imbalance. Specifically, we design a feature alignment and fusion module to align pre-trained embeddings to the same feature space and efficiently combine different representations. Moreover, we present a cosine similarity matrix-based class prompt encoder to activate class-specific features to recognize subcellular categories. Extensive experiments on diverse subcellular image datasets demonstrate that ScSAM outperforms state-of-the-art methods.
Article 326
Title@2025-07-23 (3): SADA: Stability-guided Adaptive Diffusion Acceleration
Title: SADA: Stability-guided Adaptive Diffusion Acceleration | SADA: Stabilitätsgeführte Adaptive Diffusions-Beschleunigung | SADA: 稳定导向的适应性扩散加速 2507.17135v1 |
Authors (10): Ting Jiang, Yixiao Wang, Hancheng Ye, Zishan Shao, Jingwei Sun, Jingyang Zhang, Zekai Chen, Jianyi Zhang, Yiran Chen, Hai Li
Diffusion models have achieved remarkable success in generative tasks but suffer from high computational costs due to their iterative sampling process and quadratic attention costs. Existing training-free acceleration strategies that reduce per-step computation cost, while effectively reducing sampling time, demonstrate low faithfulness compared to the original baseline. We hypothesize that this fidelity gap arises because (a) different prompts correspond to different denoising trajectories, and (b) such methods do not consider the underlying ODE formulation and its numerical solution. In this paper, we propose Stability-guided Adaptive Diffusion Acceleration (SADA), a novel paradigm that unifies step-wise and token-wise sparsity decisions via a single stability criterion to accelerate sampling of ODE-based generative models (Diffusion and Flow-matching). For (a), SADA adaptively allocates sparsity based on the sampling trajectory. For (b), SADA introduces principled approximation schemes that leverage the precise gradient information from the numerical ODE solver. Comprehensive evaluations on SD-2, SDXL, and Flux using both EDM and DPM++ solvers reveal consistent $\ge 1.8\times$ speedups with minimal fidelity degradation (LPIPS $\leq 0.10$ and FID $\leq 4.5$) compared to unmodified baselines, significantly outperforming prior methods. Moreover, SADA adapts seamlessly to other pipelines and modalities: it accelerates ControlNet without any modifications and speeds up MusicLDM by $1.8\times$ with $\sim 0.01$ spectrogram LPIPS.
Article 327
Title@2025-07-23 (3): Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance
Title: Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance | Selbstverbessernde Mittel zur Testzeit mit der Anleitung für Mensch-in-der-Loop lernen | 使自我改进代理机构能够在测试时以 “ 人工在网上 “ 指南学习 2507.17131v1 |
Authors (11): Yufei He, Ruoyu Li, Alex Chen, Yue Liu, Yulin Chen, Yuan Sui, Cheng Chen, Yi Zhu, Luca Luo, Frank Yang, Bryan Hooi
Large language model (LLM) agents often struggle in environments where rules and required domain knowledge frequently change, such as regulatory compliance and user risk screening. Current approaches, like offline fine-tuning and standard prompting, are insufficient because they cannot effectively adapt to new knowledge during actual operation. To address this limitation, we propose the Adaptive Reflective Interactive Agent (ARIA), an LLM agent framework designed specifically to continuously learn updated domain knowledge at test time. ARIA assesses its own uncertainty through structured self-dialogue, proactively identifying knowledge gaps and requesting targeted explanations or corrections from human experts. It then systematically updates an internal, timestamped knowledge repository with provided human guidance, detecting and resolving conflicting or outdated knowledge through comparisons and clarification queries. We evaluate ARIA on the realistic customer due diligence name screening task on TikTok Pay, alongside publicly available dynamic knowledge tasks. Results demonstrate significant improvements in adaptability and accuracy compared to baselines using standard offline fine-tuning and existing self-improving agents. ARIA is deployed within TikTok Pay serving over 150 million monthly active users, confirming its practicality and effectiveness for operational use in rapidly evolving environments.
Article 328
Title@2025-07-23 (3): OkadaTorch: A Differentiable Programming of Okada Model to Calculate Displacements and Strains from Fault Parameters
Title: OkadaTorch: A Differentiable Programming of Okada Model to Calculate Displacements and Strains from Fault Parameters | OkadaTorch: Eine differenzierte Programmierung des Okada-Modells zur Berechnung von Displacements und Strains aus Fehlerparametern | OkadaTorch: Okada 模型的不同编程,用以计算与故障参数有关的流离失所和 Strains 2507.17126v1 |
Authors (3): Masayoshi Someya, Taisuke Yamada, Tomohisa Okazaki
The Okada model is a widely used analytical solution for displacements and strains caused by a point or rectangular dislocation source in a 3D elastic half-space. We present OkadaTorch, a PyTorch implementation of the Okada model, where the entire code is differentiable; gradients with respect to input can be easily computed using automatic differentiation (AD). Our work consists of two components: a direct translation of the original Okada model into PyTorch, and a convenient wrapper interface for efficiently computing gradients and Hessians with respect to either observation station coordinates or fault parameters. This differentiable framework is well suited for fault parameter inversion, including gradient-based optimization, Bayesian inference, and integration with scientific machine learning (SciML) models. Our code is available here: https://github.com/msomeya1/OkadaTorch
Article 329
Title@2025-07-23 (3): Model Compression Engine for Wearable Devices Skin Cancer Diagnosis
Title: Model Compression Engine for Wearable Devices Skin Cancer Diagnosis | Modell-Kompressions-Engine für tragbare Geräte Hautkrebs-Diagnose | 穿戴设备皮肤癌症诊断模型压缩引擎 2507.17125v1 |
Authors (6): Jacob M. Delgado-López, Andrea P. Seda-Hernandez, Juan D. Guadalupe-Rosado, Luis E. Fernandez Ramirez, Miguel Giboyeaux-Camilo, Wilfredo E. Lugo-Beauchamp
Skin cancer is one of the most prevalent and preventable types of cancer, yet its early detection remains a challenge, particularly in resource-limited settings where access to specialized healthcare is scarce. This study proposes an AI-driven diagnostic tool optimized for embedded systems to address this gap. Using transfer learning with the MobileNetV2 architecture, the model was adapted for binary classification of skin lesions into “Skin Cancer” and “Other.” The TensorRT framework was employed to compress and optimize the model for deployment on the NVIDIA Jetson Orin Nano, balancing performance with energy efficiency. Comprehensive evaluations were conducted across multiple benchmarks, including model size, inference speed, throughput, and power consumption. The optimized models maintained their performance, achieving an F1-Score of 87.18% with a precision of 93.18% and recall of 81.91%. Post-compression results showed reductions in model size of up to 0.41, along with improvements in inference speed and throughput, and a decrease in energy consumption of up to 0.93 in INT8 precision. These findings validate the feasibility of deploying high-performing, energy-efficient diagnostic tools on resource-constrained edge devices. Beyond skin cancer detection, the methodologies applied in this research have broader applications in other medical diagnostics and domains requiring accessible, efficient AI solutions. This study underscores the potential of optimized AI systems to revolutionize healthcare diagnostics, thereby bridging the divide between advanced technology and underserved regions.
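The reported F1-score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below uses only the figures quoted in the abstract; the helper name `f1_score` is introduced here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract.
f1 = f1_score(0.9318, 0.8191)
print(f"{f1:.2%}")  # 87.18%
```

The result agrees with the stated F1-score of 87.18%, so the three reported metrics are mutually consistent.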
Article 330
Title@2025-07-23 (3): Computer Vision for Real-Time Monkeypox Diagnosis on Embedded Systems
Title: Computer Vision for Real-Time Monkeypox Diagnosis on Embedded Systems | Computer Vision für Echtzeit-Monkeypox-Diagnose auf Embedded-Systemen | 关于嵌入系统实时猴子天花诊断的计算机愿景 2507.17123v1 |
Authors (4): Jacob M. Delgado-López, Ricardo A. Morell-Rodriguez, Sebastián O. Espinosa-Del Rosario, Wilfredo E. Lugo-Beauchamp
The rapid diagnosis of infectious diseases, such as monkeypox, is crucial for effective containment and treatment, particularly in resource-constrained environments. This study presents an AI-driven diagnostic tool developed for deployment on the NVIDIA Jetson Orin Nano, leveraging the pre-trained MobileNetV2 architecture for binary classification. The model was trained on the open-source Monkeypox Skin Lesion Dataset, achieving a 93.07% F1-Score, which reflects a well-balanced performance in precision and recall. To optimize the model, the TensorRT framework was used to accelerate inference for FP32 and to perform post-training quantization for FP16 and INT8 formats. TensorRT’s mixed-precision capabilities enabled these optimizations, which reduced the model size, increased inference speed, and lowered power consumption by approximately a factor of two, all while maintaining the original accuracy. Power consumption analysis confirmed that the optimized models used significantly less energy during inference, reinforcing their suitability for deployment in resource-constrained environments. The system was deployed with a Wi-Fi Access Point (AP) hotspot and a web-based interface, enabling users to upload and analyze images directly through connected devices such as mobile phones. This setup ensures simple access and seamless connectivity, making the tool practical for real-world applications. These advancements position the diagnostic tool as an efficient, scalable, and energy-conscious solution to address diagnosis challenges in underserved regions, paving the way for broader adoption in low-resource healthcare settings.
Article 331
Title@2025-07-23 (3): Robust Five-Class and binary Diabetic Retinopathy Classification Using Transfer Learning and Data Augmentation
Title: Robust Five-Class and binary Diabetic Retinopathy Classification Using Transfer Learning and Data Augmentation | Robuste Fünf-Klasse und binäre diabetische Retinopathie Klassifizierung mittels Transfer Lernen und Datenvergrößerung | 五类强力细胞和二分体糖尿病病理病理学分类,利用转让学习和数据增强 2507.17121v1 |
Authors (2): Faisal Ahmed, Mohammad Alfrad Nobel Bhuiyan
Diabetic retinopathy (DR) is a leading cause of vision loss worldwide, and early diagnosis through automated retinal image analysis can significantly reduce the risk of blindness. This paper presents a robust deep learning framework for both binary and five-class DR classification, leveraging transfer learning and extensive data augmentation to address the challenges of class imbalance and limited training data. We evaluate a range of pretrained convolutional neural network architectures, including variants of ResNet and EfficientNet, on the APTOS 2019 dataset. For binary classification, our proposed model achieves a state-of-the-art accuracy of 98.9%, with a precision of 98.6%, recall of 99.3%, F1-score of 98.9%, and an AUC of 99.4%. In the more challenging five-class severity classification task, our model obtains a competitive accuracy of 84.6% and an AUC of 94.1%, outperforming several existing approaches. Our findings also demonstrate that EfficientNet-B0 and ResNet34 offer optimal trade-offs between accuracy and computational efficiency across both tasks. These results underscore the effectiveness of combining class-balanced augmentation with transfer learning for high-performance DR diagnosis. The proposed framework provides a scalable and accurate solution for DR screening, with potential for deployment in real-world clinical environments.
Article 332
Title@2025-07-23 (3): Probabilistic Graphical Models: A Concise Tutorial
Title: Probabilistic Graphical Models: A Concise Tutorial | Probabilistische Graphische Modelle: Ein kurzes Tutorial | 概率概率图形模型:简洁的教学 2507.17116v1 |
Authors (4): Jacqueline Maasch, Willie Neiswanger, Stefano Ermon, Volodymyr Kuleshov
Probabilistic graphical modeling is a branch of machine learning that uses probability distributions to describe the world, make predictions, and support decision-making under uncertainty. Underlying this modeling framework is an elegant body of theory that bridges two mathematical traditions: probability and graph theory. This framework provides compact yet expressive representations of joint probability distributions, yielding powerful generative models for probabilistic reasoning. This tutorial provides a concise introduction to the formalisms, methods, and applications of this modeling framework. After a review of basic probability and graph theory, we explore three dominant themes: (1) the representation of multivariate distributions in the intuitive visual language of graphs, (2) algorithms for learning model parameters and graphical structures from data, and (3) algorithms for inference, both exact and approximate.
Article 333
Title@2025-07-23 (3): EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles
Title: EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles | EnsemW2S: Verbesserung der Schwach-zu-Strong-Verallgemeinerung mit großsprachigen Modellensembles | EnsemW2S:用大语言模型组合加强弱至强的通用化 2410.04571v3 |
Authors (9): Aakriti Agrawal, Mucong Ding, Zora Che, Chenghao Deng, Anirudh Satheesh, Bang An, Bayan Bruss, John Langford, Furong Huang
With Large Language Models (LLMs) rapidly approaching and potentially surpassing human-level performance, it has become imperative to develop approaches capable of effectively supervising and enhancing these powerful models using smaller, human-level models exposed to only human-level data. We address this critical weak-to-strong (W2S) generalization challenge by proposing a novel method aimed at improving weak experts, by training on the same limited human-level data, enabling them to generalize to complex, super-human-level tasks. Our approach, called EnsemW2S, employs a token-level ensemble strategy that iteratively combines multiple weak experts, systematically addressing the shortcomings identified in preceding iterations. By continuously refining these weak models, we significantly enhance their collective ability to supervise stronger student models. We extensively evaluate the generalization performance of both the ensemble of weak experts and the subsequent strong student model across in-distribution (ID) and out-of-distribution (OOD) datasets. For OOD, we specifically introduce question difficulty as an additional dimension for defining distributional shifts. Our empirical results demonstrate notable improvements: 4% and 3.2% on ID datasets, and up to 6% and 2.28% on OOD datasets, for expert and student models respectively, underscoring the effectiveness of our proposed method in advancing W2S generalization.
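As a rough illustration of what a token-level ensemble can look like, the sketch below takes a weighted average of the next-token distributions of several weak experts. It is not the EnsemW2S procedure itself: the function name, the fixed weights, and the toy vocabulary are assumptions made for this example, and the iterative reweighting described in the abstract is not reproduced.

```python
def combine_token_distributions(distributions, weights):
    """Weighted average of next-token distributions (hypothetical helper).

    distributions: list of dicts mapping token -> probability, one per expert.
    weights: one non-negative weight per expert (assumed given, not learned here).
    """
    if not distributions or len(distributions) != len(weights):
        raise ValueError("need one weight per expert distribution")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    vocab = set().union(*distributions)  # all tokens seen by any expert
    return {
        tok: sum(w * d.get(tok, 0.0) for d, w in zip(distributions, weights)) / total
        for tok in vocab
    }


# Two toy experts voting on the next token; the first is trusted three times as much.
experts = [{"yes": 0.5, "no": 0.5}, {"yes": 0.25, "no": 0.75}]
combined = combine_token_distributions(experts, [3.0, 1.0])
```

In this toy setting the combined distribution stays normalized because each expert's distribution sums to one and the weights are divided by their total.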
Article 334
Title@2025-07-23 (3): Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models
Title: Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models | Verstärktes Lernen Fine-Tunes ein Sparse Subnetwork in großen Sprachmodellen | 以大语言模式建立粗略的子网络 2507.17107v1 |
Authors (1): Andrii Balashov
Reinforcement learning (RL) is a key post-pretraining step for aligning large language models (LLMs) with complex tasks and human preferences. While it is often assumed that RL fine-tuning requires updating most of a model’s parameters, we challenge this assumption with a surprising finding: RL fine-tuning consistently modifies only a small subnetwork (typically 5-30% of weights), leaving most parameters unchanged. We call this phenomenon RL-induced parameter update sparsity. It arises naturally, without any sparsity constraints or parameter-efficient tuning, and appears across multiple RL algorithms (e.g., PPO, DPO, SimPO, PRIME) and model families (e.g., OpenAI, Meta, and open-source LLMs). Moreover, the subnetworks updated by RL show substantial overlap across different seeds, datasets, and algorithms, far exceeding chance, which suggests a partially transferable structure in the pretrained model. We show that fine-tuning only this sparse subnetwork recovers full model performance and yields parameters nearly identical to the fully fine-tuned model. Our analysis suggests this sparsity emerges because RL operates near the model’s original distribution, requiring only targeted changes. KL penalties, gradient clipping, and on-policy dynamics have limited effect on the sparsity pattern. These findings shed new light on how RL adapts models: not by shifting all weights, but by focusing training on a small, consistently updated subnetwork. This insight enables more efficient RL methods and reframes sparsity through the lens of the lottery ticket hypothesis.
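To make the two quantities in this abstract concrete, the sketch below measures the fraction of weights that changed between a pretrained and a fine-tuned checkpoint, and the overlap between the updated sets of two runs. The flat weight lists, the tolerance parameter, and all function names are assumptions for illustration; the paper's own measurement protocol may differ.

```python
def updated_indices(before, after, tol=0.0):
    """Indices of weights whose absolute change exceeds tol."""
    if len(before) != len(after):
        raise ValueError("checkpoints must have the same length")
    return {i for i, (b, a) in enumerate(zip(before, after)) if abs(a - b) > tol}


def updated_fraction(before, after, tol=0.0):
    """Fraction of weights that changed during fine-tuning."""
    return len(updated_indices(before, after, tol)) / len(before)


def overlap(run_a, run_b):
    """Share of run A's updated weights that run B also updated."""
    if not run_a:
        return 0.0
    return len(run_a & run_b) / len(run_a)


# Toy checkpoints: ten weights, each run touches two of them.
pretrained = [0.10, -0.20, 0.30, 0.40, -0.50, 0.60, 0.70, -0.80, 0.90, 1.00]
run_1 = list(pretrained)
run_1[2] += 0.05
run_1[7] -= 0.01
run_2 = list(pretrained)
run_2[2] -= 0.02
run_2[4] += 0.03

fraction_1 = updated_fraction(pretrained, run_1)
shared = overlap(updated_indices(pretrained, run_1),
                 updated_indices(pretrained, run_2))
```

Here each toy run updates 2 of 10 weights, and the runs share one updated index out of two.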
Article 335
Title@2025-07-23 (3): ZORMS-LfD: Learning from Demonstrations with Zeroth-Order Random Matrix Search
Title: ZORMS-LfD: Learning from Demonstrations with Zeroth-Order Random Matrix Search | ZORMS-LfD: Aus Demonstrationen lernen mit der Zufallsmatrix-Suche der Nullten Ordnung | ZORMS-LfD: 学习用零极随机矩阵搜索从演示中学习 2507.17096v1 |
Authors (4): Olivia Dry, Timothy L. Molloy, Wanxin Jin, Iman Shames
We propose Zeroth-Order Random Matrix Search for Learning from Demonstrations (ZORMS-LfD). ZORMS-LfD enables the costs, constraints, and dynamics of constrained optimal control problems, in both continuous and discrete time, to be learned from expert demonstrations without requiring smoothness of the learning-loss landscape. In contrast, existing state-of-the-art first-order methods require the existence and computation of gradients of the costs, constraints, dynamics, and learning loss with respect to states, controls and/or parameters. Most existing methods are also tailored to discrete time, with constrained problems in continuous time receiving only cursory attention. We demonstrate that ZORMS-LfD matches or surpasses the performance of state-of-the-art methods in terms of both learning loss and compute time across a variety of benchmark problems. On unconstrained continuous-time benchmark problems, ZORMS-LfD achieves similar loss performance to state-of-the-art first-order methods with a reduction of over 80% in compute time. On constrained continuous-time benchmark problems where there is no specialized state-of-the-art method, ZORMS-LfD is shown to outperform the commonly used gradient-free Nelder-Mead optimization method.
Article 336
Title@2025-07-23 (3): Joint Pedestrian and Vehicle Traffic Optimization in Urban Environments using Reinforcement Learning
Title: Joint Pedestrian and Vehicle Traffic Optimization in Urban Environments using Reinforcement Learning | Gemeinsame Fußgänger- und Fahrzeugverkehrsoptimierung in städtischen Umgebungen mittels Verstärkungslernen | 利用强化学习在城市环境中联合优化步行和车辆交通 2504.05018v2 |
Authors (5): Bibek Poudel, Xuan Wang, Weizi Li, Lei Zhu, Kevin Heaslip
Reinforcement learning (RL) holds significant promise for adaptive traffic signal control. While existing RL-based methods demonstrate effectiveness in reducing vehicular congestion, their predominant focus on vehicle-centric optimization leaves pedestrian mobility needs and safety challenges unaddressed. In this paper, we present a deep RL framework for adaptive control of eight traffic signals along a real-world urban corridor, jointly optimizing both pedestrian and vehicular efficiency. Our single-agent policy is trained using real-world pedestrian and vehicle demand data derived from Wi-Fi logs and video analysis. The results demonstrate significant performance improvements over traditional fixed-time signals, reducing average wait times per pedestrian and per vehicle by up to 67% and 52% respectively, while simultaneously decreasing total wait times for both groups by up to 67% and 53%. Additionally, our results demonstrate generalization capabilities across varying traffic demands, including conditions entirely unseen during training, validating RL’s potential for developing transportation systems that serve all road users.
Article 337
Title@2025-07-22 (2): Deformable Cluster Manipulation via Whole-Arm Policy Learning
Title: Deformable Cluster Manipulation via Whole-Arm Policy Learning | Verformbare Clustermanipulation über Ganzarm Policy Learning | 通过全Arm政策学习进行变形集束操纵 2507.17085v1 |
Authors (6): Jayadeep Jacob, Wenzheng Zhang, Houston Warren, Paulo Borges, Tirthankar Bandyopadhyay, Fabio Ramos
Manipulating clusters of deformable objects presents a substantial challenge with widespread applicability, but requires contact-rich whole-arm interactions. A potential solution must address the limited capacity for realistic model synthesis, high uncertainty in perception, and the lack of efficient spatial abstractions, among others. We propose a novel framework for learning model-free policies integrating two modalities: 3D point clouds and proprioceptive touch indicators, emphasising manipulation with full body contact awareness, going beyond traditional end-effector modes. Our reinforcement learning framework leverages a distributional state representation, aided by kernel mean embeddings, to achieve improved training efficiency and real-time inference. Furthermore, we propose a novel context-agnostic occlusion heuristic to clear deformables from a target region for exposure tasks. We deploy the framework in a power line clearance scenario and observe that the agent generates creative strategies leveraging multiple arm links for de-occlusion. Finally, we perform zero-shot sim-to-real policy transfer, allowing the arm to clear real branches with unknown occlusion patterns, unseen topology, and uncertain dynamics.
Article 338
Title@2025-07-22 (2): A Parameter-Efficient Quantum Anomaly Detection Method on a Superconducting Quantum Processor
Title: A Parameter-Efficient Quantum Anomaly Detection Method on a Superconducting Quantum Processor | Eine Parameter-effiziente Quantenanomalie-Erkennungsmethode auf einem supraleitenden Quantenprozessor | 超导量子处理器超导量子处理器的参数有效量子异常探测方法 2412.16867v4 |
Authors (3): Maida Wang, Jinyang Jiang, Peter V. Coveney
Quantum machine learning has gained attention for its potential to address computational challenges. However, whether those algorithms can effectively solve practical problems and outperform their classical counterparts, especially on current quantum hardware, remains a critical question. In this work, we propose a novel quantum machine learning method, called Parameter-Efficient Quantum Anomaly Detection (PEQAD), for practical image anomaly detection, which aims to achieve both parameter efficiency and superior accuracy compared to classical models. Emulation results indicate that PEQAD demonstrates favourable recognition capabilities compared to classical baselines, achieving an average accuracy of over 90% on benchmarks with significantly fewer trainable parameters. Theoretical analysis confirms that PEQAD has a comparable expressivity to classical counterparts while requiring only a fraction of the parameters. Furthermore, we demonstrate the first implementation of a quantum anomaly detection method for general image datasets on a superconducting quantum processor. Specifically, we achieve an accuracy of over 80% with only 16 parameters on the device, providing initial evidence of PEQAD’s practical viability in the noisy intermediate-scale quantum era and highlighting its significant reduction in parameter requirements.
Article 339
Title@2025-07-22 (2): Advanced U-Net Architectures with CNN Backbones for Automated Lung Cancer Detection and Segmentation in Chest CT Images
Title: Advanced U-Net Architectures with CNN Backbones for Automated Lung Cancer Detection and Segmentation in Chest CT Images | Erweiterte U-Net-Architekturen mit CNN-Backbones für automatisierte Lungenkrebserkennung und Segmentierung in Brust CT-Bildern | 使用有线电视新闻网用于肺癌自动检测和切斯特CT图象分割的U-Net高级建筑 2507.09898v2 |
Authors (4): Alireza Golkarieh, Kiana Kiashemshaki, Sajjad Rezvani Boroujeni, Nasibeh Asadi Isakan
This study investigates the effectiveness of U-Net architectures integrated with various convolutional neural network (CNN) backbones for automated lung cancer detection and segmentation in chest CT images, addressing the critical need for accurate diagnostic tools in clinical settings. A balanced dataset of 832 chest CT images (416 cancerous and 416 non-cancerous) was preprocessed using Contrast Limited Adaptive Histogram Equalization (CLAHE) and resized to 128x128 pixels. U-Net models were developed with three CNN backbones: ResNet50, VGG16, and Xception, to segment lung regions. After segmentation, CNN-based classifiers and hybrid models combining CNN feature extraction with traditional machine learning classifiers (Support Vector Machine, Random Forest, and Gradient Boosting) were evaluated using 5-fold cross-validation. Metrics included accuracy, precision, recall, F1-score, Dice coefficient, and ROC-AUC. U-Net with ResNet50 achieved the best performance for cancerous lungs (Dice: 0.9495, Accuracy: 0.9735), while U-Net with VGG16 performed best for non-cancerous segmentation (Dice: 0.9532, Accuracy: 0.9513). For classification, the CNN model using U-Net with Xception achieved 99.1 percent accuracy, 99.74 percent recall, and 99.42 percent F1-score. The hybrid CNN-SVM-Xception model achieved 96.7 percent accuracy and 97.88 percent F1-score. Compared to prior methods, our framework consistently outperformed existing models. In conclusion, combining U-Net with advanced CNN backbones provides a powerful method for both segmentation and classification of lung cancer in CT scans, supporting early diagnosis and clinical decision-making.
Article 340
Title@2025-07-22 (2): Language model developers should report train-test overlap
Title: Language model developers should report train-test overlap | Entwickler von Sprachmodellen sollten Überlappungen von Zugversuchen melden | 语言模式开发者应报告培训测试重叠情况 2410.08385v2 |
Authors (7): Andy K Zhang, Kevin Klyman, Yifan Mai, Yoav Levine, Yian Zhang, Rishi Bommasani, Percy Liang
Language models are extensively evaluated, but correctly interpreting evaluation results requires knowledge of train-test overlap, which refers to the extent to which the language model is trained on the very data it is being tested on. The public currently lacks adequate information about train-test overlap: most models have no public train-test overlap statistics, and third parties cannot directly measure train-test overlap since they do not have access to the training data. To make this clear, we document the practices of 30 model developers, finding that just 9 developers report train-test overlap: 4 developers release training data under open-source licenses, enabling the community to directly measure train-test overlap, and 5 developers publish their train-test overlap methodology and statistics. By engaging with language model developers, we provide novel information about train-test overlap for three additional developers. Overall, we take the position that language model developers should publish train-test overlap statistics and/or training data whenever they report evaluation results on public test sets. We hope our work increases transparency into train-test overlap to increase the community-wide trust in model evaluations.
Article 341
Title@2025-07-22 (2): Sensor Drift Compensation in Electronic-Nose-Based Gas Recognition Using Knowledge Distillation
Title: Sensor Drift Compensation in Electronic-Nose-Based Gas Recognition Using Knowledge Distillation | Sensor-Drift-Kompensation in der elektronisch-nasebasierten Gaserkennung mittels Wissensdestillation | 利用知识蒸馏在基于电子喷气气体识别中 使用知识蒸馏 2507.17071v1 |
Authors (2): Juntao Lin, Xianghao Zhan
Due to environmental changes and sensor aging, sensor drift challenges the performance of electronic nose systems in gas classification during real-world deployment. Previous studies using the UCI Gas Sensor Array Drift Dataset reported promising drift compensation results but lacked robust statistical experimental validation and may overcompensate for sensor drift, losing class-related variance. To address these limitations and improve sensor drift compensation with statistical rigor, we first designed two domain adaptation tasks based on the same electronic nose dataset: using the first batch to predict the remaining batches, simulating a controlled laboratory setting; and predicting the next batch using all prior batches, simulating continuous training data updates for online training. We then systematically tested three methods: our proposed novel Knowledge Distillation (KD) method, the benchmark method Domain Regularized Component Analysis (DRCA), and a hybrid method KD-DRCA, across 30 random test set partitions on the UCI dataset. We showed that KD consistently outperformed both DRCA and KD-DRCA, achieving up to an 18% improvement in accuracy and 15% in F1-score, demonstrating KD’s superior effectiveness in drift compensation. This is the first application of KD for electronic nose drift mitigation, significantly outperforming the previous state-of-the-art DRCA method and enhancing the reliability of sensor drift compensation in real-world environments.
Article 342
Title@2025-07-22 (2): Advancing Robustness in Deep Reinforcement Learning with an Ensemble Defense Approach
Title: Advancing Robustness in Deep Reinforcement Learning with an Ensemble Defense Approach | Robustheit im Deep Reinforcement Learning mit einem Ensemble Defense Approach fördern | 以组合防御方法推进深强化学习的强力 2507.17070v1 |
Authors (4): Adithya Mohan, Dominik Rößle, Daniel Cremers, Torsten Schön
Recent advancements in Deep Reinforcement Learning (DRL) have demonstrated its applicability across various domains, including robotics, healthcare, energy optimization, and autonomous driving. However, a critical question remains: How robust are DRL models when exposed to adversarial attacks? While existing defense mechanisms such as adversarial training and distillation enhance the resilience of DRL models, there remains a significant research gap regarding the integration of multiple defenses in autonomous driving scenarios specifically. This paper addresses this gap by proposing a novel ensemble-based defense architecture to mitigate adversarial attacks in autonomous driving. Our evaluation demonstrates that the proposed architecture significantly enhances the robustness of DRL models. Compared to the baseline under FGSM attacks, our ensemble method improves the mean reward from 5.87 to 18.38 (over 213% increase) and reduces the mean collision rate from 0.50 to 0.09 (an 82% decrease) in the highway scenario and merge scenario, outperforming all standalone defense strategies.
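The percentages quoted in this abstract can be checked directly from the reported means. The short sketch below does only that arithmetic; the helper name is an assumption, and the four input numbers are the ones stated in the abstract.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent (positive means an increase)."""
    return (new - old) / old * 100.0


# Mean reward under FGSM attack: baseline 5.87, ensemble defense 18.38.
reward_gain = percent_change(5.87, 18.38)

# Mean collision rate under FGSM attack: baseline 0.50, ensemble defense 0.09.
collision_change = percent_change(0.50, 0.09)
```

The reward change comes out slightly above 213%, and the collision-rate change is a decrease of 82%, consistent with the figures in the abstract.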
Article 343
Title@2025-07-22 (2): The FIX Benchmark: Extracting Features Interpretable to eXperts
Title: The FIX Benchmark: Extracting Features Interpretable to eXperts | Der FIX-Benchmark: Merkmale extrahieren Interpretierbar auf eXperts | FIX基准:提取可解释为eXperts的地物 2409.13684v4 |
Authors (13): Helen Jin, Shreya Havaldar, Chaehyeon Kim, Anton Xue, Weiqiu You, Helen Qu, Marco Gatti, Daniel A Hashimoto, Bhuvnesh Jain, Amin Madani, Masao Sako, Lyle Ungar, Eric Wong
Feature-based methods are commonly used to explain model predictions, but these methods often implicitly assume that interpretable features are readily available. However, this is often not the case for high-dimensional data, and it can be hard even for domain experts to mathematically specify which features are important. Can we instead automatically extract collections or groups of features that are aligned with expert knowledge? To address this gap, we present FIX (Features Interpretable to eXperts), a benchmark for measuring how well a collection of features aligns with expert knowledge. In collaboration with domain experts, we propose FIXScore, a unified expert alignment measure applicable to diverse real-world settings across cosmology, psychology, and medicine domains in vision, language, and time series data modalities. With FIXScore, we find that popular feature-based explanation methods have poor alignment with expert-specified knowledge, highlighting the need for new methods that can better identify features interpretable to experts.
Article 344
Title@2025-07-22 (2): Risk In Context: Benchmarking Privacy Leakage of Foundation Models in Synthetic Tabular Data Generation
Title: Risk In Context: Benchmarking Privacy Leakage of Foundation Models in Synthetic Tabular Data Generation | Risk In Context: Benchmarking Privacy Leakage of Foundation Models in Synthetic Tabular Data Generation | 背景风险:根据基准确定合成图表数据生成基础模型的隐私渗漏 2507.17066v1 |
Authors (4): Jessup Byun, Xiaofeng Lin, Joshua Ward, Guang Cheng
Synthetic tabular data is essential for machine learning workflows, especially for expanding small or imbalanced datasets and enabling privacy-preserving data sharing. However, state-of-the-art generative models (GANs, VAEs, diffusion models) rely on large datasets with thousands of examples. In low-data settings, often the primary motivation for synthetic data, these models can overfit, leak sensitive records, and require frequent retraining. Recent work uses large pre-trained transformers to generate rows via in-context learning (ICL), which needs only a few seed examples and no parameter updates, avoiding retraining. But ICL repeats seed rows verbatim, introducing a new privacy risk that has only been studied in text. The severity of this risk in tabular synthesis, where a single row may identify a person, remains unclear. We address this gap with the first benchmark of three foundation models (GPT-4o-mini, LLaMA 3.3 70B, TabPFN v2) against four baselines on 35 real-world tables from health, finance, and policy. We evaluate statistical fidelity, downstream utility, and membership inference leakage. Results show foundation models consistently have the highest privacy risk. LLaMA 3.3 70B reaches up to 54 percentage points higher true-positive rate at 1% FPR than the safest baseline. GPT-4o-mini and TabPFN are also highly vulnerable. We plot the privacy-utility frontier and show that CTGAN and GPT-4o-mini offer better tradeoffs. A factorial study finds that three zero-cost prompt tweaks (small batch size, low temperature, and using summary statistics) can reduce worst-case AUC by 14 points and rare-class leakage by up to 39 points while maintaining over 90% fidelity. Our benchmark offers a practical guide for safer low-data synthesis with foundation models.
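The leakage figure above is a true-positive rate at a fixed false-positive rate for a membership inference attack. The sketch below shows one common way to compute that metric from attack scores, where a higher score means "more likely a training member". The function name, the score lists, and the thresholding rule are assumptions for illustration, not the benchmark's actual evaluation code.

```python
def tpr_at_fpr(member_scores, nonmember_scores, max_fpr=0.01):
    """Highest true-positive rate over thresholds whose FPR is at most max_fpr."""
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    # Candidate thresholds, strictest first; FPR can only grow as they loosen.
    thresholds = sorted(set(member_scores) | set(nonmember_scores), reverse=True)
    best = 0.0
    for t in thresholds:
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        if fpr > max_fpr:
            break
        tpr = sum(s >= t for s in member_scores) / len(member_scores)
        best = max(best, tpr)
    return best


# Toy attack scores: three of four members score above every non-member.
members = [0.9, 0.8, 0.7, 0.2]
nonmembers = [0.6, 0.3, 0.1, 0.05]
strict = tpr_at_fpr(members, nonmembers, max_fpr=0.0)
loose = tpr_at_fpr(members, nonmembers, max_fpr=0.5)
```

With only four non-member scores, a 1% FPR budget allows no false positives at all, so realistic evaluations need far more non-member rows than this toy example.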
Article 345
Title@2025-07-22 (2): A Coalition Game for On-demand Multi-modal 3D Automated Delivery System
Title: A Coalition Game for On-demand Multi-modal 3D Automated Delivery System | Ein Koalitionsspiel für Multimodales 3D-Automatisiertes Liefersystem auf Abruf | 供求多式3D自动交付系统联盟游戏 2412.17252v2 |
Authors (2): Farzan Moosavi, Bilal Farooq
We introduce a multi-modal autonomous delivery optimization framework as a coalition game for a fleet of UAVs and ADRs operating in two overlaying networks to address last-mile delivery in urban environments, including high-density areas and time-critical applications. The problem is defined as multiple depot pickup and delivery with time windows, constrained by operational restrictions such as vehicle battery limitation, precedence time window, and building obstruction. Using coalition game theory, we investigate cooperation structures among the modes to capture how strategic collaboration can improve overall routing efficiency. To do so, a generalized reinforcement learning model is designed to evaluate the cost-sharing and allocation to different modes to learn the cooperative behaviour with respect to various realistic scenarios. Our methodology leverages an end-to-end deep multi-agent policy gradient method augmented by a novel spatio-temporal adjacency neighbourhood graph attention network using a heterogeneous edge-enhanced attention model and transformer architecture. Numerical experiments on last-mile delivery applications, including a case study in the city of Mississauga, show that despite the incorporation of an extensive network in the graph for two modes and a complex training structure, the model addresses realistic operational constraints and achieves high-quality solutions compared with the existing transformer-based and classical methods. It performs well on non-homogeneous data distributions, generalizes well across different scales and configurations, and demonstrates robust cooperative performance under stochastic scenarios across various tasks, as reflected by the coalition analysis and cost allocation, which signify the advantage of cooperation.
Article 346
Title@2025-07-22 (2): Pragmatic Policy Development via Interpretable Behavior Cloning
Title: Pragmatic Policy Development via Interpretable Behavior Cloning | Pragmatische Politikentwicklung durch interpretierbares Verhalten Klonen | 通过可解释行为克隆制定实用政策 2507.17056v1 |
Authors (4): Anton Matsson, Yaochen Rao, Heather J. Litman, Fredrik D. Johansson
Offline reinforcement learning (RL) holds great promise for deriving optimal policies from observational data, but challenges related to interpretability and evaluation limit its practical use in safety-critical domains. Interpretability is hindered by the black-box nature of unconstrained RL policies, while evaluation – typically performed off-policy – is sensitive to large deviations from the data-collecting behavior policy, especially when using methods based on importance sampling. To address these challenges, we propose a simple yet practical alternative: deriving treatment policies from the most frequently chosen actions in each patient state, as estimated by an interpretable model of the behavior policy. By using a tree-based model, which is specifically designed to exploit patterns in the data, we obtain a natural grouping of states with respect to treatment. The tree structure ensures interpretability by design, while varying the number of actions considered controls the degree of overlap with the behavior policy, enabling reliable off-policy evaluation. This pragmatic approach to policy development standardizes frequent treatment patterns, capturing the collective clinical judgment embedded in the data. Using real-world examples in rheumatoid arthritis and sepsis care, we demonstrate that policies derived under this framework can outperform current practice, offering interpretable alternatives to those obtained via offline RL.
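The core idea, a policy built from the most frequently chosen actions in each group of patient states, can be sketched in a few lines. The example assumes the state groups have already been assigned (for instance, as leaves of a fitted tree); the group labels, action names, and function name are hypothetical, and the tree model itself is not shown.

```python
from collections import Counter, defaultdict


def frequent_action_policy(records, k=1):
    """Map each state group to its k most frequently chosen actions.

    records: iterable of (state_group, action) pairs from observational data.
    k: number of actions kept per group; larger k keeps more overlap with
       the behavior policy.
    """
    counts = defaultdict(Counter)
    for group, action in records:
        counts[group][action] += 1
    return {g: [a for a, _ in c.most_common(k)] for g, c in counts.items()}


# Hypothetical observations: (tree leaf, treatment chosen by the clinician).
observations = [
    ("leaf_1", "drug_A"), ("leaf_1", "drug_A"), ("leaf_1", "drug_B"),
    ("leaf_2", "drug_C"), ("leaf_2", "drug_B"), ("leaf_2", "drug_C"),
]
top_1 = frequent_action_policy(observations, k=1)
top_2 = frequent_action_policy(observations, k=2)
```

Raising `k` corresponds to the abstract's point that the number of actions considered controls how closely the derived policy overlaps with observed behavior.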
Article 347
Title@2025-07-22 (2): Shared Control of Holonomic Wheelchairs through Reinforcement Learning
Title: Shared Control of Holonomic Wheelchairs through Reinforcement Learning | Gemeinsame Kontrolle von Holonomic Rollstuhls durch Verstärkungslernen | 通过强化学习共同控制全神轮椅 2507.17055v1 |
Authors (3): Jannis Bähler, Diego Paez-Granados, Jorge Peña-Queralta
Smart electric wheelchairs can improve user experience by supporting the driver with shared control. State-of-the-art work showed the potential of shared control in improving safety in navigation for non-holonomic robots. However, for holonomic systems, current approaches often lead to unintuitive behavior for the user and fail to utilize the full potential of omnidirectional driving. Therefore, we propose a reinforcement learning-based method, which takes a 2D user input and outputs a 3D motion while ensuring user comfort and reducing cognitive load on the driver. Our approach is trained in Isaac Gym and tested in simulation in Gazebo. We compare different RL agent architectures and reward functions based on metrics considering cognitive load and user comfort. We show that our method ensures collision-free navigation while smartly orienting the wheelchair and showing better or competitive smoothness compared to a previous non-learning-based method. We further perform a sim-to-real transfer and demonstrate, to the best of our knowledge, the first real-world implementation of RL-based shared control for an omnidirectional mobility platform.
Article 348
Title@2025-07-22 (2): Beyond Single-Channel: Multichannel Signal Imaging for PPG-to-ECG Reconstruction with Vision Transformers
Title: Beyond Single-Channel: Multichannel Signal Imaging for PPG-to-ECG Reconstruction with Vision Transformers | Beyond Single-Channel: Multichannel Signal Imaging für PPG-zu-ECG-Rekonstruktion mit Vision Transformern | 超越单一通道:利用愿景变形器进行PPG到ECG重建的多通道信号成像 2505.21767v2 |
Authors (5): Xiaoyan Li, Shixin Xu, Faisal Habib, Arvind Gupta, Huaxiong Huang
Reconstructing ECG from PPG is a promising yet challenging task. While recent advancements in generative models have significantly improved ECG reconstruction, accurately capturing fine-grained waveform features remains a key challenge. To address this, we propose a novel PPG-to-ECG reconstruction method that leverages a Vision Transformer (ViT) as the core network. Unlike conventional approaches that rely on single-channel PPG, our method employs a four-channel signal image representation, incorporating the original PPG, its first-order difference, second-order difference, and area under the curve. This multi-channel design enriches feature extraction by preserving both temporal and physiological variations within the PPG. By leveraging the self-attention mechanism in ViT, our approach effectively captures both inter-beat and intra-beat dependencies, leading to more robust and accurate ECG reconstruction. Experimental results demonstrate that our method consistently outperforms existing 1D convolution-based approaches, achieving up to 29% reduction in PRD and 15% reduction in RMSE. The proposed approach also produces improvements in other evaluation metrics, highlighting its robustness and effectiveness in reconstructing ECG signals. Furthermore, to ensure a clinically relevant evaluation, we introduce new performance metrics, including QRS area error, PR interval error, RT interval error, and RT amplitude difference error. Our findings suggest that integrating a four-channel signal image representation with the self-attention mechanism of ViT enables more effective extraction of informative PPG features and improved modeling of beat-to-beat variations for PPG-to-ECG mapping. Beyond demonstrating the potential of PPG as a viable alternative for heart activity monitoring, our approach opens new avenues for cyclic signal analysis and prediction.
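The four-channel input described above can be sketched for a single PPG segment as follows. This is an assumption-laden illustration: differences are zero-padded at the front to keep every channel the same length, the area under the curve is taken as a running trapezoidal sum, and the function names are invented here. The paper's exact definitions and its conversion of the channels into an image are not specified in the abstract.

```python
def difference(x):
    """Successive differences x[i+1] - x[i]; one element shorter than x."""
    return [b - a for a, b in zip(x, x[1:])]


def pad_front(x, n):
    """Left-pad x with zeros up to length n."""
    return [0.0] * (n - len(x)) + x


def cumulative_area(x, dt=1.0):
    """Running trapezoidal area under the curve, same length as x."""
    if not x:
        return []
    area = [0.0]
    for a, b in zip(x, x[1:]):
        area.append(area[-1] + 0.5 * (a + b) * dt)
    return area


def four_channels(ppg, dt=1.0):
    """Stack PPG, first difference, second difference, and running area."""
    n = len(ppg)
    d1 = difference(ppg)
    d2 = difference(d1)
    return [list(ppg), pad_front(d1, n), pad_front(d2, n), cumulative_area(ppg, dt)]


channels = four_channels([0.0, 1.0, 4.0, 9.0])
```

Each of the four rows has the same length as the input segment, so they can be stacked into a multi-channel array before any further image construction.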
Article 349
Title@2025-07-22 (2): GenMol: A Drug Discovery Generalist with Discrete Diffusion
Title: GenMol: A Drug Discovery Generalist with Discrete Diffusion | GenMol: Ein Drug Discovery Generalist mit diskreter Diffusion | GenMol: 具有分辨扩散作用的药物发现通俗主义者 2501.06158v3 |
Authors (9): Seul Lee, Karsten Kreis, Srimukh Prasad Veccham, Meng Liu, Danny Reidenbach, Yuxing Peng, Saee Paliwal, Weili Nie, Arash Vahdat
Drug discovery is a complex process that involves multiple stages and tasks. However, existing molecular generative models can only tackle some of these tasks. We present Generalist Molecular generative model (GenMol), a versatile framework that uses only a single discrete diffusion model to handle diverse drug discovery scenarios. GenMol generates Sequential Attachment-based Fragment Embedding (SAFE) sequences through non-autoregressive bidirectional parallel decoding, thereby allowing the utilization of a molecular context that does not rely on the specific token ordering while having better sampling efficiency. GenMol uses fragments as basic building blocks for molecules and introduces fragment remasking, a strategy that optimizes molecules by regenerating masked fragments, enabling effective exploration of chemical space. We further propose molecular context guidance (MCG), a guidance method tailored for masked discrete diffusion of GenMol. GenMol significantly outperforms the previous GPT-based model in de novo generation and fragment-constrained generation, and achieves state-of-the-art performance in goal-directed hit generation and lead optimization. These results demonstrate that GenMol can tackle a wide range of drug discovery tasks, providing a unified and versatile approach for molecular design. Our code is available at https://github.com/NVIDIA-Digital-Bio/genmol.
Article 350
Title@2025-07-22 (2): CoLT: The conditional localization test for assessing the accuracy of neural posterior estimates
Title: CoLT: The conditional localization test for assessing the accuracy of neural posterior estimates | CoLT: Der bedingte Lokalisierungstest zur Beurteilung der Genauigkeit neuronaler posteriorer Schätzungen | COLT:评估神经后天估计值准确性的有条件本地化测试 2507.17030v1 |
Authors (3): Tianyu Chen, Vansh Bansal, James G. Scott
We consider the problem of validating whether a neural posterior estimate $q(\theta \mid x)$ is an accurate approximation to the true, unknown posterior $p(\theta \mid x)$. Existing methods for evaluating the quality of an NPE estimate are largely derived from classifier-based tests or divergence measures, but these suffer from several practical drawbacks. As an alternative, we introduce the Conditional Localization Test (CoLT), a principled method designed to detect discrepancies between $p(\theta \mid x)$ and $q(\theta \mid x)$ across the full range of conditioning inputs. Rather than relying on exhaustive comparisons or density estimation at every $x$, CoLT learns a localization function that adaptively selects points $\theta_l(x)$ where the neural posterior $q$ deviates most strongly from the true posterior $p$ for that $x$. This approach is particularly advantageous in typical simulation-based inference settings, where only a single draw $\theta \sim p(\theta \mid x)$ from the true posterior is observed for each conditioning input, but where the neural posterior $q(\theta \mid x)$ can be sampled an arbitrary number of times. Our theoretical results establish necessary and sufficient conditions for assessing distributional equality across all $x$, offering both rigorous guarantees and practical scalability. Empirically, we demonstrate that CoLT not only performs better than existing methods at comparing $p$ and $q$, but also pinpoints regions of significant divergence, providing actionable insights for model refinement. These properties position CoLT as a state-of-the-art solution for validating neural posterior estimates.
Article 351
Title@2025-07-22 (2): Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Title: Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance | Koel-TTS: Verbesserung der LLM-basierten Sprachgenerierung mit Präferenz-Ausrichtung und Klassifikator-freier Anleitung | Koel-TTS:加强基于LLM的语音生成,提供优先调整和分类免费指导 2502.05236v2 |
Authors (9): Shehzeen Hussain, Paarth Neekhara, Xuesong Yang, Edresson Casanova, Subhankar Ghosh, Mikyas T. Desta, Roy Fejgin, Rafael Valle, Jason Li
While autoregressive speech token generation models produce speech with remarkable variety and naturalness, their inherent lack of controllability often results in issues such as hallucinations and undesired vocalizations that do not conform to conditioning inputs. We introduce Koel-TTS, a suite of enhanced encoder-decoder Transformer TTS models that address these challenges by incorporating preference alignment techniques guided by automatic speech recognition and speaker verification models. Additionally, we incorporate classifier-free guidance to further improve synthesis adherence to the transcript and reference speaker audio. Our experiments demonstrate that these optimizations significantly enhance target speaker similarity, intelligibility, and naturalness of synthesized speech. Notably, Koel-TTS directly maps text and context audio to acoustic tokens, and on the aforementioned metrics, outperforms state-of-the-art TTS models, despite being trained on a significantly smaller dataset. Audio samples and demos are available on our website.
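Classifier-free guidance, as mentioned in the abstract above, is commonly implemented by extrapolating from an unconditional prediction towards a conditional one. The sketch below shows that generic rule on toy logits. The function name `cfg_logits`, the guidance scale, and the numbers are illustrative assumptions; the abstract does not say which variant Koel-TTS uses.

```python
def cfg_logits(cond_logits, uncond_logits, guidance_scale):
    """Generic classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]

# Toy next-token logits (hypothetical values, not taken from the paper).
cond = [2.0, 0.5, -1.0]    # conditioned on transcript and reference audio
uncond = [1.0, 1.0, 0.0]   # conditioning dropped
guided = cfg_logits(cond, uncond, guidance_scale=2.0)
print(guided)
```

A scale of 1 reproduces the conditional logits and a scale of 0 the unconditional ones; larger scales push the prediction further towards the conditioning signal.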
Article 352
Title@2025-07-22 (2): The surprising strength of weak classifiers for validating neural posterior estimates
Title: The surprising strength of weak classifiers for validating neural posterior estimates | Die überraschende Stärke schwacher Klassifikatoren zur Validierung neuraler posteriorer Schätzungen | 证实神经后天估计值的薄弱分类师的惊人力量 2507.17026v1 |
Authors (3): Vansh Bansal, Tianyu Chen, James G. Scott
Neural Posterior Estimation (NPE) has emerged as a powerful approach for amortized Bayesian inference when the true posterior $p(\theta \mid y)$ is intractable or difficult to sample. But evaluating the accuracy of neural posterior estimates remains challenging, with existing methods suffering from major limitations. One appealing and widely used method is the classifier two-sample test (C2ST), where a classifier is trained to distinguish samples from the true posterior $p(\theta \mid y)$ versus the learned NPE approximation $q(\theta \mid y)$. Yet despite the appealing simplicity of the C2ST, its theoretical and practical reliability depends upon having access to a near-Bayes-optimal classifier – a requirement that is rarely met and, at best, difficult to verify. Thus a major open question is: can a weak classifier still be useful for neural posterior validation? We show that the answer is yes. Building on the work of Hu and Lei, we present several key results for a conformal variant of the C2ST, which converts any trained classifier’s scores – even those of weak or over-fitted models – into exact finite-sample p-values. We establish two key theoretical properties of the conformal C2ST: (i) finite-sample Type-I error control, and (ii) non-trivial power that degrades gently in tandem with the error of the trained classifier. The upshot is that even weak, biased, or overfit classifiers can still yield powerful and reliable tests. Empirically, the conformal C2ST outperforms classical discriminative tests across a wide range of benchmarks. These results reveal the underappreciated strength of weak classifiers for validating neural posterior estimates, establishing the conformal C2ST as a practical, theoretically grounded diagnostic for modern simulation-based inference.
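To illustrate how classifier scores can be turned into finite-sample p-values, here is a minimal sketch of the generic split-conformal p-value: the rank of a test score among calibration scores drawn under the null. This is an assumption about the general idea only; the exact statistic used in the paper (building on Hu and Lei) may differ, and the function name and scores are hypothetical.

```python
def conformal_p_value(calibration_scores, test_score):
    """Rank-based p-value: valid under exchangeability, whatever the classifier quality."""
    n = len(calibration_scores)
    at_least_as_extreme = sum(1 for s in calibration_scores if s >= test_score)
    return (1 + at_least_as_extreme) / (n + 1)

# Hypothetical classifier scores on held-out samples from the null distribution.
calibration = [0.1, 0.4, 0.35, 0.8, 0.6, 0.2, 0.5, 0.3, 0.7]
print(conformal_p_value(calibration, 0.9))   # larger than every calibration score
print(conformal_p_value(calibration, 0.45))  # typical score
```

The p-value depends only on ranks, which is why a weak or overfit classifier affects power but not Type-I error control.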
Article 353
Title@2025-07-22 (2): CM-UNet: A Self-Supervised Learning-Based Model for Coronary Artery Segmentation in X-Ray Angiography
Title: CM-UNet: A Self-Supervised Learning-Based Model for Coronary Artery Segmentation in X-Ray Angiography | CM-UNet: Ein selbstüberwachtes lernbasiertes Modell für koronare Arteriensegmentierung in der Röntgenangiographie | CM-UNet:X射线血管成像的冠状动脉切除自上式学习模型 2507.17779v1 |
Authors (11): Camille Challier, Xiaowu Sun, Thabo Mahendiran, Ortal Senouf, Bernard De Bruyne, Denise Auberson, Olivier Müller, Stephane Fournier, Pascal Frossard, Emmanuel Abbé, Dorina Thanou
Accurate segmentation of coronary arteries remains a significant challenge in clinical practice, hindering the ability to effectively diagnose and manage coronary artery disease. The lack of large, annotated datasets for model training exacerbates this issue, limiting the development of automated tools that could assist radiologists. To address this, we introduce CM-UNet, which leverages self-supervised pre-training on unannotated datasets and transfer learning on limited annotated data, enabling accurate disease detection while minimizing the need for extensive manual annotations. Fine-tuning CM-UNet with only 18 annotated images instead of 500 resulted in a 15.2% decrease in Dice score, compared to a 46.5% drop in baseline models without pre-training. This demonstrates that self-supervised learning can enhance segmentation performance and reduce dependence on large datasets. This is one of the first studies to highlight the importance of self-supervised learning in improving coronary artery segmentation from X-ray angiography, with potential implications for advancing diagnostic accuracy in clinical practice. By enhancing segmentation accuracy in X-ray angiography images, the proposed approach aims to improve clinical workflows, reduce radiologists’ workload, and accelerate disease detection, ultimately contributing to better patient outcomes. The source code is publicly available at https://github.com/CamilleChallier/Contrastive-Masked-UNet.
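The Dice score quoted above is the standard overlap measure for segmentation masks. As a small worked example, the sketch below computes it for two flattened binary masks; the mask values are made up for illustration and the function name is hypothetical.

```python
def dice_score(pred_mask, true_mask):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for flat binary masks with 0/1 entries."""
    intersection = sum(p * t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * intersection / total

# Toy masks: 2 overlapping pixels, 3 predicted and 3 annotated in total.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
print(dice_score(pred, truth))
```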
Article 354
Title@2025-07-22 (2): BiLO: Bilevel Local Operator Learning for PDE Inverse Problems. Part II: Efficient Uncertainty Quantification with Low-Rank Adaptation
Title: BiLO: Bilevel Local Operator Learning for PDE Inverse Problems. Part II: Efficient Uncertainty Quantification with Low-Rank Adaptation | BiLO: Bilevel Local Operator Learning für PDE Inverse Probleme. Teil II: Effiziente Unsicherheitsquantifizierung mit Low-Rank-Anpassung | BILO: 双级地方操作员学习PDE反问题,第二部分:低Rank适应性高效率的不确定性量化 2507.17019v1 |
Authors (4): Ray Zirui Zhang, Christopher E. Miles, Xiaohui Xie, John S. Lowengrub
Uncertainty quantification and inverse problems governed by partial differential equations (PDEs) are central to a wide range of scientific and engineering applications. In this second part of a two-part series, we extend Bilevel Local Operator Learning (BiLO) for PDE-constrained optimization problems, developed in Part I, to the Bayesian inference framework. At the lower level, we train a network to approximate the local solution operator by minimizing the local operator loss with respect to the weights of the neural network. At the upper level, we sample the PDE parameters from the posterior distribution. We achieve efficient sampling through gradient-based Markov Chain Monte Carlo (MCMC) methods and low-rank adaptation (LoRA). Compared with existing methods based on Bayesian neural networks, our approach bypasses the challenge of sampling in the high-dimensional space of neural network weights and does not require specifying a prior distribution on the neural network solution. Instead, uncertainty propagates naturally from the data through the PDE constraints. By enforcing strong PDE constraints, the proposed method improves the accuracy of both parameter inference and uncertainty quantification. We analyze the dynamic error of the gradient in the MCMC sampler and the static error in the posterior distribution due to inexact minimization of the lower-level problem, and demonstrate a direct link between the tolerance for solving the lower-level problem and the accuracy of the resulting uncertainty quantification. Through numerical experiments across a variety of PDE models, we demonstrate that our method delivers accurate inference and quantification of uncertainties while maintaining high computational efficiency.
Article 355
Title@2025-07-22 (2): Causal Graph Fuzzy LLMs: A First Introduction and Applications in Time Series Forecasting
Title: Causal Graph Fuzzy LLMs: A First Introduction and Applications in Time Series Forecasting | Causal Graph Fuzzy LLMs: Eine erste Einführung und Anwendungen in der Zeitreihenprognose | Causal 图形模糊模糊LLMM:时间序列预测的第一介绍和应用 2507.17016v1 |
Authors (7): Omid Orang, Patricia O. Lucas, Gabriel I. F. Paiva, Petronio C. L. Silva, Felipe Augusto Rocha da Silva, Adriano Alonso Veloso, Frederico Gadelha Guimaraes
In recent years, the application of Large Language Models (LLMs) to time series forecasting (TSF) has garnered significant attention among researchers. This study presents CGF-LLM, a new LLM-based framework that combines GPT-2 with fuzzy time series (FTS) and causal graphs to predict multivariate time series, marking the first such architecture in the literature. The key objective is to convert numerical time series into interpretable forms through the parallel application of fuzzification and causal analysis, enabling both semantic understanding and structural insight as input for the pretrained GPT-2 model. The resulting textual representation offers a more interpretable view of the complex dynamics underlying the original time series. The reported results confirm the effectiveness of our proposed LLM-based time series forecasting model, as demonstrated across four different multivariate time series datasets. This initiative opens promising future directions for TSF using LLMs based on FTS.
Article 356
Title@2025-07-22 (2): laplax – Laplace Approximations with JAX
Title: laplax – Laplace Approximations with JAX | laplax – Laplace-Annäherungen mit JAX | 与 JAX 的拉位相近 2507.17013v1 |
Authors (7): Tobias Weber, Bálint Mucsányi, Lenard Rommel, Thomas Christie, Lars Kasüschke, Marvin Pförtner, Philipp Hennig
The Laplace approximation provides a scalable and efficient means of quantifying weight-space uncertainty in deep neural networks, enabling the application of Bayesian tools such as predictive uncertainty and model selection via Occam’s razor. In this work, we introduce laplax, a new open-source Python package for performing Laplace approximations with JAX. Designed with a modular and purely functional architecture and minimal external dependencies, laplax offers a flexible and researcher-friendly framework for rapid prototyping and experimentation. Its goal is to facilitate research on Bayesian neural networks, uncertainty quantification for deep learning, and the development of improved Laplace approximation techniques.
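For readers unfamiliar with the technique, the Laplace approximation fits a Gaussian at the posterior mode, with variance given by the inverse of the negative curvature of the log posterior there. The sketch below does this for a one-dimensional toy posterior in plain Python. It does not use or imitate the laplax API; the function names, the finite-difference step, and the Beta-shaped toy posterior are all assumptions chosen for illustration.

```python
import math

def laplace_1d(log_post, theta0, steps=25, h=1e-4):
    """Return (mode, variance) of a Gaussian fitted at the mode of a 1-D log posterior."""
    def curvature(t):
        # Central finite-difference estimate of the second derivative.
        return (log_post(t + h) - 2.0 * log_post(t) + log_post(t - h)) / (h * h)

    theta = theta0
    for _ in range(steps):
        slope = (log_post(theta + h) - log_post(theta - h)) / (2.0 * h)
        theta -= slope / curvature(theta)  # Newton step towards the mode
    return theta, -1.0 / curvature(theta)

def toy_log_post(theta):
    # Unnormalized log density of a Beta(7, 5) posterior (hypothetical example).
    return 6.0 * math.log(theta) + 4.0 * math.log(1.0 - theta)

mode, variance = laplace_1d(toy_log_post, theta0=0.5)
print(mode, variance)
```

For this toy posterior the mode is 6/10 = 0.6 and the curvature there is -(6/0.36 + 4/0.16), so the fitted variance is approximately 0.024. In deep networks the same idea applies to the weight vector, with the scalar curvature replaced by a (usually approximated) Hessian.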
Article 357
Title@2025-07-22 (2): Towards Trustworthy AI: Secure Deepfake Detection using CNNs and Zero-Knowledge Proofs
Title: Towards Trustworthy AI: Secure Deepfake Detection using CNNs and Zero-Knowledge Proofs | Auf dem Weg zu vertrauenswürdiger KI: Sichere Deepfake-Erkennung mit CNNs und Zero-Knowledge-Proofs | 利用有线电视新闻网和零知识证明确保深假探测 2507.17010v1 |
Authors (3): H M Mohaimanul Islam, Huynh Q. N. Vo, Aditya Rane
In the era of synthetic media, deepfake manipulations pose a significant threat to information integrity. To address this challenge, we propose TrustDefender, a two-stage framework comprising (i) a lightweight convolutional neural network (CNN) that detects deepfake imagery in real-time extended reality (XR) streams, and (ii) an integrated succinct zero-knowledge proof (ZKP) protocol that validates detection results without disclosing raw user data. Our design addresses the computational constraints of XR platforms while adhering to the stringent privacy requirements of sensitive settings. Experimental evaluations on multiple benchmark deepfake datasets demonstrate that TrustDefender achieves 95.3% detection accuracy, coupled with efficient proof generation underpinned by rigorous cryptography, ensuring seamless integration with high-performance artificial intelligence (AI) systems. By fusing advanced computer vision models with provable security mechanisms, our work establishes a foundation for reliable AI in immersive and privacy-sensitive applications.
Article 358
Title@2025-07-22 (2): ORANSight-2.0: Foundational LLMs for O-RAN
Title: ORANSight-2.0: Foundational LLMs for O-RAN | ORANSight-2.0: LLM-Grundlagen für O-RAN | ORANSight-2.0:O-RAN基础大语言模型 2503.05200v2 |
Authors (2): Pranshav Gajjar, Vijay K. Shah
Despite the transformative impact of Large Language Models (LLMs) across critical domains such as healthcare, customer service, and business marketing, their integration into Open Radio Access Networks (O-RAN) remains limited. This gap is primarily due to the absence of domain-specific foundational models, with existing solutions often relying on general-purpose LLMs that fail to address the unique challenges and technical intricacies of O-RAN. To bridge this gap, we introduce ORANSight-2.0 (O-RAN Insights), a pioneering initiative to develop specialized foundational LLMs tailored for O-RAN. Built on 18 models spanning five open-source LLM frameworks – Mistral, Qwen, Llama, Phi, and Gemma – ORANSight-2.0 fine-tunes models ranging from 1B to 70B parameters, significantly reducing reliance on proprietary, closed-source models while enhancing performance in O-RAN-specific tasks. At the core of ORANSight-2.0 is RANSTRUCT, a novel Retrieval-Augmented Generation (RAG)-based instruction-tuning framework that employs two LLM agents – a Mistral-based Question Generator and a Qwen-based Answer Generator – to create high-quality instruction-tuning datasets. The generated dataset is then used to fine-tune the 18 pre-trained open-source LLMs via QLoRA. To evaluate ORANSight-2.0, we introduce srsRANBench, a novel benchmark designed for code generation and codebase understanding in the context of srsRAN, a widely used 5G O-RAN stack.
Article 359
Title@2025-07-22 (2): Deep RL Dual Sourcing Inventory Management with Supply and Capacity Risk Awareness
Title: Deep RL Dual Sourcing Inventory Management with Supply and Capacity Risk Awareness | Deep RL Dual Sourcing Bestandsmanagement mit Versorgungs- und Kapazitätsrisiko-Bewusstsein | 具有供应和能力风险意识的深入RL 双重保值双重保值库存管理 2507.14446v2 |
Authors (3): Defeng Liu, Ying Liu, Carson Eisenach
In this work, we study how to efficiently apply reinforcement learning (RL) for solving large-scale stochastic optimization problems by leveraging intervention models. The key to the proposed methodology is to better explore the solution space by simulating and composing the stochastic processes using pre-trained deep learning (DL) models. We demonstrate our approach on a challenging real-world application, the multi-sourcing multi-period inventory management problem in supply chain optimization. In particular, we employ deep RL models for learning and forecasting the stochastic supply chain processes under a range of assumptions. Moreover, we introduce a constraint coordination mechanism, designed to forecast dual costs given the cross-product constraints in the inventory network. We highlight that instead of directly building the complex physical constraints into the RL optimization problem and solving the stochastic problem as a whole, our approach breaks down those supply chain processes into scalable and composable DL modules, leading to improved performance on large real-world datasets. We also outline open problems for future research to further investigate the efficacy of such models.
Article 360
Title@2025-07-22 (2): Revisiting Randomization in Greedy Model Search
Title: Revisiting Randomization in Greedy Model Search | Randomisierung in der Suche nach Greedy-Modellen erneut besuchen | 重新审视贪婪模式搜索中的随机化 2506.15643v2 |
Authors (4): Xin Chen, Jason M. Klusowski, Yan Shuo Tan, Chang Yu
Combining randomized estimators in an ensemble, such as via random forests, has become a fundamental technique in modern data science, but can be computationally expensive. Furthermore, the mechanism by which this improves predictive performance is poorly understood. We address these issues in the context of sparse linear regression by proposing and analyzing an ensemble of greedy forward selection estimators that are randomized by feature subsampling – at each iteration, the best feature is selected from within a random subset. We design a novel implementation based on dynamic programming that greatly improves its computational efficiency. Furthermore, we show via careful numerical experiments that our method can outperform popular methods such as lasso and elastic net across a wide range of settings. Next, contrary to prevailing belief that randomized ensembling is analogous to shrinkage, we show via numerical experiments that it can simultaneously reduce training error and degrees of freedom, thereby shifting the entire bias-variance trade-off curve of the base estimator. We prove this fact rigorously in the setting of orthogonal features, in which case, the ensemble estimator rescales the ordinary least squares coefficients with a two-parameter family of logistic weights, thereby enlarging the model search space. These results enhance our understanding of random forests and suggest that implicit regularization in general may have more complicated effects than explicit regularization.
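The randomization described above (at each iteration, pick the best feature from within a random subset) can be sketched in a few lines. The code below is a plain-Python illustration of one randomized greedy forward selection run, using Gram-Schmidt orthogonalization to score each candidate by its reduction in residual sum of squares. It is not the paper's dynamic-programming implementation; the function names, the toy data, and the `min_gain` stopping threshold are assumptions.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def randomized_forward_selection(columns, y, n_steps, subset_size, rng, min_gain=1e-12):
    """Select features greedily, choosing each one from a random subset of the rest."""
    residual = list(y)
    basis = []      # orthonormal basis for the span of the selected columns
    selected = []
    for _ in range(n_steps):
        remaining = [j for j in range(len(columns)) if j not in selected]
        if not remaining:
            break
        candidates = rng.sample(remaining, min(subset_size, len(remaining)))
        best = None  # (gain, index, orthogonalized column, squared norm)
        for j in candidates:
            v = list(columns[j])
            for q in basis:  # remove components along already selected features
                coef = dot(v, q)
                v = [vi - coef * qi for vi, qi in zip(v, q)]
            norm2 = dot(v, v)
            if norm2 < 1e-12:
                continue  # candidate is (numerically) in the span already
            gain = dot(residual, v) ** 2 / norm2  # drop in residual sum of squares
            if gain > min_gain and (best is None or gain > best[0]):
                best = (gain, j, v, norm2)
        if best is None:
            break
        _, j, v, norm2 = best
        q = [vi / math.sqrt(norm2) for vi in v]
        coef = dot(residual, q)
        residual = [ri - coef * qi for ri, qi in zip(residual, q)]
        basis.append(q)
        selected.append(j)
    return selected

# Toy design with 4 features (as columns) and y = 3 * x0 + 1 * x1.
columns = [[1.0, 0.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 0.0]]
y = [3.0, 1.0, 0.0, 0.0, 3.0]

# subset_size equal to the number of features recovers ordinary forward selection.
full = randomized_forward_selection(columns, y, 2, 4, random.Random(0))
# Smaller subsets give randomized runs; an ensemble would average their fitted models.
runs = [randomized_forward_selection(columns, y, 2, 2, random.Random(s)) for s in range(5)]
print(full, runs)
```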
Article 361
Title@2025-07-22 (2): Should Bias Always be Eliminated? A Principled Framework to Use Data Bias for OOD Generation
Title: Should Bias Always be Eliminated? A Principled Framework to Use Data Bias for OOD Generation | Sollten Bias immer beseitigt werden? Ein prinzipieller Rahmen für die Nutzung von Daten Bias für die OOD-Generierung | 是否应该永远消除偏见? 生成OOD时使用数据偏见的主要框架。 2507.17001v1 |
Authors (7): Yan Li, Guangyi Chen, Yunlong Deng, Zijian Li, Zeyu Tang, Anpeng Wu, Kun Zhang
Most existing methods for adapting models to out-of-distribution (OOD) domains rely on invariant representation learning to eliminate the influence of biased features. However, should bias always be eliminated – and if not, when should it be retained, and how can it be leveraged? To address these questions, we first present a theoretical analysis that explores the conditions under which biased features can be identified and effectively utilized. Building on this theoretical foundation, we introduce a novel framework that strategically leverages bias to complement invariant representations during inference. The framework comprises two key components that leverage bias in both direct and indirect ways: (1) using invariance as guidance to extract predictive ingredients from bias, and (2) exploiting identified bias to estimate the environmental condition and then using it to explore appropriate bias-aware predictors to alleviate environment gaps. We validate our approach through experiments on both synthetic datasets and standard domain generalization benchmarks. Results consistently demonstrate that our method outperforms existing approaches, underscoring its robustness and adaptability.
Article 362
Title@2025-07-22 (2): Fine-Grained Alignment and Noise Refinement for Compositional Text-to-Image Generation
Title: Fine-Grained Alignment and Noise Refinement for Compositional Text-to-Image Generation | Feinkörnige Ausrichtung und Geräuschverfeinerung für kompositorische Text-zu-Bild-Generierung | 精细调整和噪音改进,以形成成组文字到成象 2503.06506v2 |
Authors (6): Amir Mohammad Izadi, Seyed Mohammad Hadi Hosseini, Soroush Vafaie Tabar, Ali Abdollahi, Armin Saghafian, Mahdieh Soleymani Baghshah
Text-to-image generative models have made significant advancements in recent years; however, accurately capturing intricate details in textual prompts remains a formidable challenge, with failures such as missing entities, attribute binding errors, and incorrect relationships. In response, we present an innovative, training-free method that directly addresses these challenges by incorporating tailored objectives to account for textual constraints. Unlike layout-based approaches that enforce rigid structures and limit diversity, our proposed approach offers a more flexible arrangement of the scene by imposing just the extracted constraints from the text, without any unnecessary additions. These constraints are formulated as losses (entity missing, entity mixing, attribute binding, and spatial relationships) integrated into a unified loss that is applied in the first generation stage. Furthermore, we introduce a feedback-driven system for fine-grained initial noise refinement. This system integrates a verifier that evaluates the generated image, identifies inconsistencies, and provides corrective feedback. Leveraging this feedback, our refinement method first targets the unmet constraints by refining the faulty attention maps caused by initial noise, through the optimization of selective losses associated with these constraints. Subsequently, our unified loss function is reapplied to carry out the second generation phase. Experimental results demonstrate that our method, relying solely on our proposed objective functions, significantly enhances compositionality, achieving a 24% improvement in human evaluation and a 25% gain in spatial relationships. Furthermore, our fine-grained noise refinement proves effective, boosting performance by up to 5%. Code is available at https://github.com/hadi-hosseini/noise-refinement.
Article 363
Title@2025-07-22 (2): Divisive Decisions: Improving Salience-Based Training for Generalization in Binary Classification Tasks
Title: Divisive Decisions: Improving Salience-Based Training for Generalization in Binary Classification Tasks | Divisive Entscheidungen: Verbesserung der Salience-basierten Ausbildung für Generalisierung in Binary-Klassifikation Aufgaben | 不同决定:改进以素养为基础的培训,促进二元分类任务中的普遍化 2507.17000v1 |
Authors (3): Jacob Piland, Chris Sweet, Adam Czajka
Existing saliency-guided training approaches improve model generalization by incorporating a loss term that compares the model’s class activation map (CAM) for a sample’s true class (i.e., the correct-label class) against a human reference saliency map. However, prior work has ignored the false-class CAM(s), that is, the model’s saliency obtained for the incorrect-label class(es). We hypothesize that in binary tasks the true and false CAMs should diverge on the important classification features identified by humans (and reflected in human saliency maps). We use this hypothesis to motivate three new saliency-guided training methods that incorporate both the model’s true- and false-class CAMs into the training strategy, and a novel post-hoc tool for identifying important features. We evaluate all introduced methods on several diverse binary closed-set and open-set classification tasks, including synthetic face detection, biometric presentation attack detection, and classification of anomalies in chest X-ray scans, and find that the proposed methods improve generalization capabilities of deep learning models over traditional (true-class CAM only) saliency-guided training approaches. We offer source code and model weights to support reproducible research (GitHub repository link removed to preserve anonymity).
Article 364
Title@2025-07-22 (2): Bayesian preference elicitation for decision support in multiobjective optimization
Title: Bayesian preference elicitation for decision support in multiobjective optimization | Bayesische Präferenz-Elizitation für Entscheidungsunterstützung bei multiobjektiver Optimierung | 在多目标优化中争取决策支持的贝耶斯偏好 2507.16999v1 |
Authors (3): Felix Huber, Sebastian Rojas Gonzalez, Raul Astudillo
We present a novel approach to help decision-makers efficiently identify preferred solutions from the Pareto set of a multi-objective optimization problem. Our method uses a Bayesian model to estimate the decision-maker’s utility function based on pairwise comparisons. Aided by this model, a principled elicitation strategy selects queries interactively to balance exploration and exploitation, guiding the discovery of high-utility solutions. The approach is flexible: it can be used interactively or a posteriori after estimating the Pareto front through standard multi-objective optimization techniques. Additionally, at the end of the elicitation phase, it generates a reduced menu of high-quality solutions, simplifying the decision-making process. Through experiments on test problems with up to nine objectives, our method demonstrates superior performance in finding high-utility solutions with a small number of queries. We also provide an open-source implementation of our method to support its adoption by the broader community.
Article 365
Title@2025-07-22 (2): Unified Sparse-Matrix Representations for Diverse Neural Architectures
Title: Unified Sparse-Matrix Representations for Diverse Neural Architectures | Unified Sparse-Matrix-Darstellungen für unterschiedliche Neuralarchitekturen | 面向多样化神经架构的统一稀疏矩阵表示 2506.01966v3 |
Authors (1): Yuzhou Zhu
Deep neural networks employ specialized architectures for vision, sequential and language tasks, yet this proliferation obscures their underlying commonalities. We introduce a unified matrix-order framework that casts convolutional, recurrent and self-attention operations as sparse matrix multiplications. Convolution is realized via an upper-triangular weight matrix performing first-order transformations; recurrence emerges from a lower-triangular matrix encoding stepwise updates; attention arises naturally as a third-order tensor factorization. We prove algebraic isomorphism with standard CNN, RNN and Transformer layers under mild assumptions. Empirical evaluations on image classification (MNIST, CIFAR-10/100, Tiny ImageNet), time-series forecasting (ETTh1, Electricity Load Diagrams) and language modeling/classification (AG News, WikiText-2, Penn Treebank) confirm that sparse-matrix formulations match or exceed native model performance while converging in comparable or fewer epochs. By reducing architecture design to sparse pattern selection, our matrix perspective aligns with GPU parallelism and leverages mature algebraic optimization tools. This work establishes a mathematically rigorous substrate for diverse neural architectures and opens avenues for principled, hardware-aware network design.
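To make the claim that convolution can be cast as a sparse matrix multiplication concrete, the sketch below builds an upper-triangular banded matrix whose product with an input vector equals a 1-D sliding-window filter (zero-padded on the right). This is the textbook Toeplitz construction, offered as an assumption about the general idea; the abstract does not give the paper's exact matrix layout, and all names here are hypothetical.

```python
def conv_matrix(kernel, n):
    """Build an n x n upper-triangular banded matrix W with W[i][i + k] = kernel[k]."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for offset, weight in enumerate(kernel):
            if i + offset < n:
                W[i][i + offset] = weight
    return W

def matvec(W, x):
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def sliding_filter(kernel, x):
    """Direct sliding-window filter: y[i] = sum_k kernel[k] * x[i + k], zero beyond the end."""
    n = len(x)
    return [sum(w * x[i + k] for k, w in enumerate(kernel) if i + k < n) for i in range(n)]

kernel = [0.5, 0.25]
x = [1.0, 2.0, 3.0, 4.0]
W = conv_matrix(kernel, len(x))
print(matvec(W, x))
print(sliding_filter(kernel, x))
```

Only `len(kernel)` diagonals of `W` are non-zero, so the matrix is sparse and the same weights are shared along each diagonal.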
Article 366
Title@2025-07-22 (2): PyG 2.0: Scalable Learning on Real World Graphs
Title: PyG 2.0: Scalable Learning on Real World Graphs | PyG 2.0: Scalable Learning on Real World Graphs | PyG 2.0: 真实世界图表上的可缩放学习 2507.16991v1 |
Authors (13): Matthias Fey, Jinu Sunil, Akihiro Nitta, Rishi Puri, Manan Shah, Blaž Stojanovič, Ramona Bendias, Alexandria Barghi, Vid Kocijan, Zecheng Zhang, Xinwei He, Jan Eric Lenssen, Jure Leskovec
PyG (PyTorch Geometric) has evolved significantly since its initial release, establishing itself as a leading framework for Graph Neural Networks. In this paper, we present PyG 2.0 (and its subsequent minor versions), a comprehensive update that introduces substantial improvements in scalability and real-world application capabilities. We detail the framework’s enhanced architecture, including support for heterogeneous and temporal graphs, scalable feature/graph stores, and various optimizations, enabling researchers and practitioners to tackle large-scale graph learning problems efficiently. In recent years, PyG has supported graph learning in a large variety of application areas, which we summarize, while providing a deep dive into the important areas of relational deep learning and large language modeling.
Article 367
Title@2025-07-22 (2): Hierarchical Reinforcement Learning Framework for Adaptive Walking Control Using General Value Functions of Lower-Limb Sensor Signals
Title: Hierarchical Reinforcement Learning Framework for Adaptive Walking Control Using General Value Functions of Lower-Limb Sensor Signals | Hierarchisches Verstärkungs-Lern-Framework für adaptive Walking-Steuerung unter Verwendung von allgemeinen Wertfunktionen von Lower-Limb Sensor Signalen | 利用低Limb传感器信号的一般价值功能的适应性步行控制梯级强化学习框架 2507.16983v1 |
Authors (4): Sonny T. Jones, Grange M. Simpson, Patrick M. Pilarski, Ashley N. Dalrymple
Rehabilitation technology is a natural setting to study the shared learning and decision-making of human and machine agents. In this work, we explore the use of Hierarchical Reinforcement Learning (HRL) to develop adaptive control strategies for lower-limb exoskeletons, aiming to enhance mobility and autonomy for individuals with motor impairments. Inspired by prominent models of biological sensorimotor processing, our investigated HRL approach breaks down the complex task of exoskeleton control adaptation into a higher-level framework for terrain strategy adaptation and a lower-level framework for providing predictive information; this latter element is implemented via the continual learning of general value functions (GVFs). GVFs generated temporal abstractions of future signal values from multiple wearable lower-limb sensors, including electromyography, pressure insoles, and goniometers. We investigated two methods for incorporating actual and predicted sensor signals into a policy network with the intent to improve the decision-making capacity of the control system of a lower-limb exoskeleton during ambulation across varied terrains. As a key result, we found that the addition of predictions made from GVFs increased overall network accuracy. Terrain-specific performance increases were seen while walking on even ground, uneven ground, up and down ramps, and turns, terrains that are often misclassified without predictive information. This suggests that predictive information can aid decision-making during uncertainty, e.g., on terrains that have a high chance of being misclassified. This work, therefore, contributes new insights into the nuances of HRL and the future development of exoskeletons to facilitate safe transitioning and traversing across different walking environments.
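A general value function predicts the discounted sum of future values of some signal, and is typically learned with temporal-difference updates. The sketch below learns such a prediction with tabular TD(0) on a short repeating signal. It is a toy illustration only: the study works with streams from wearable sensors and presumably with function approximation, and the function name, step size, and signal here are assumptions.

```python
def learn_gvf(signal, gamma, alpha, sweeps):
    """Tabular TD(0): values[t] estimates sum_k gamma**k * signal[t + 1 + k] on a cycle."""
    n = len(signal)
    values = [0.0] * n
    for _ in range(sweeps):
        for t in range(n):
            nxt = (t + 1) % n
            cumulant = signal[nxt]  # signal value observed on the next time step
            td_error = cumulant + gamma * values[nxt] - values[t]
            values[t] += alpha * td_error
    return values

# Repeating two-step signal (hypothetical), e.g. a sensor that fires on alternate steps.
values = learn_gvf(signal=[1.0, 0.0], gamma=0.9, alpha=0.5, sweeps=500)
print(values)
```

On this cycle the exact answers solve v0 = 0.9 * v1 and v1 = 1 + 0.9 * v0, giving v1 = 1 / (1 - 0.81) and v0 = 0.9 / (1 - 0.81); the learned values converge to these.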
Article 368
Title@2025-07-22 (2): Fast and Scalable Gene Embedding Search: A Comparative Study of FAISS and ScaNN
Title: Fast and Scalable Gene Embedding Search: A Comparative Study of FAISS and ScaNN | Schnelle und skalierbare Gen-Einbettungssuche: Eine vergleichende Studie von FAISS und ScaNN | 快速和可缩放基因嵌入搜索:FAISS和ScaNN的比较研究 2507.16978v1 |
Authors (7): Mohammad Saleh Refahi, Gavin Hearne, Harrison Muller, Kieran Lynch, Bahrad A. Sokhansanj, James R. Brown, Gail Rosen
The exponential growth of DNA sequencing data has outpaced traditional heuristic-based methods, which struggle to scale effectively. Efficient computational approaches are urgently needed to support large-scale similarity search, a foundational task in bioinformatics for detecting homology, functional similarity, and novelty among genomic and proteomic sequences. Although tools like BLAST have been widely used and remain effective in many scenarios, they suffer from limitations such as high computational cost and poor performance on divergent sequences. In this work, we explore embedding-based similarity search methods that learn latent representations capturing deeper structural and functional patterns beyond raw sequence alignment. We systematically evaluate two state-of-the-art vector search libraries, FAISS and ScaNN, on biologically meaningful gene embeddings. Unlike prior studies, our analysis focuses on bioinformatics-specific embeddings and benchmarks their utility for detecting novel sequences, including those from uncharacterized taxa or genes lacking known homologs. Our results highlight both computational advantages (in memory and runtime efficiency) and improved retrieval quality, offering a promising alternative to traditional alignment-heavy tools.
Article 369
Title@2025-07-22 (2): Temporally Consistent Dynamic Scene Graphs: An End-to-End Approach for Action Tracklet Generation
Title: Temporally Consistent Dynamic Scene Graphs: An End-to-End Approach for Action Tracklet Generation | Zeitgleich konsistente dynamische Szenendiagramme: Ein End-to-End-Ansatz für die Tracklet-Generierung von Action Tracklets | 临时一致的动态场景图:行动轨迹生成的端至端办法 2412.02808v2 |
Authors (5): Raphael Ruschel, Md Awsafur Rahman, Hardik Prajapati, Suya You, B. S. Manjunath
Understanding video content is pivotal for advancing real-world applications like activity recognition, autonomous systems, and human-computer interaction. While scene graphs are adept at capturing spatial relationships between objects in individual frames, extending these representations to capture dynamic interactions across video sequences remains a significant challenge. To address this, we present TCDSG, Temporally Consistent Dynamic Scene Graphs, an innovative end-to-end framework that detects, tracks, and links subject-object relationships across time, generating action tracklets, temporally consistent sequences of entities and their interactions. Our approach leverages a novel bipartite matching mechanism, enhanced by adaptive decoder queries and feedback loops, ensuring temporal coherence and robust tracking over extended sequences. This method not only establishes a new benchmark by achieving over 60% improvement in temporal recall@k on the Action Genome, OpenPVSG, and MEVA datasets but also pioneers the augmentation of MEVA with persistent object ID annotations for comprehensive tracklet generation. By seamlessly integrating spatial and temporal dynamics, our work sets a new standard in multi-frame video analysis, opening new avenues for high-impact applications in surveillance, autonomous navigation, and beyond.
nan
Article 370
Title@2025-07-22 (2): MRI-CORE: A Foundation Model for Magnetic Resonance Imaging
Title: MRI-CORE: A Foundation Model for Magnetic Resonance Imaging | MRI-CORE: Ein Basismodell für Magnetresonanz-Imaging | MRI-CORE:磁共振成像基础模型 2506.12186v2 |
Authors (7): Haoyu Dong, Yuwen Chen, Hanxue Gu, Nicholas Konz, Yaqian Chen, Qihang Li, Maciej A. Mazurowski
The widespread use of Magnetic Resonance Imaging (MRI) in combination with deep learning shows promise for many high-impact automated diagnostic and prognostic tools. However, training new models requires large amounts of labeled data, a challenge due to the high cost of precise annotations and data privacy. To address this issue, we introduce MRI-CORE, a vision foundation model trained using more than 6 million slices from over 110 thousand MRI volumes across 18 body locations. Our experiments show notable improvements in performance over state-of-the-art methods in 13 data-restricted segmentation tasks, as well as in image classification and zero-shot segmentation, showing the strong potential of MRI-CORE to enable data-efficient development of artificial intelligence models. We also present data on which strategies yield the most useful foundation models and a novel analysis relating similarity between pre-training and downstream task data with transfer learning performance. Our model is publicly available with a permissive license.
nan
Article 371
Title@2025-07-22 (2): A Hybrid CNN-VSSM model for Multi-View, Multi-Task Mammography Analysis: Robust Diagnosis with Attention-Based Fusion
Title: A Hybrid CNN-VSSM model for Multi-View, Multi-Task Mammography Analysis: Robust Diagnosis with Attention-Based Fusion | Hybrides CNN-VSSM-Modell für Multi-View, Multi-Task Mammographie Analyse: Robuste Diagnose mit aufmerksamkeitsbasierter Fusion | CNN-VSSM混合多视、多任务乳房造影分析模式:以注意力为基础的结合的强力诊断 2507.16955v1 |
Authors (6): Yalda Zafari, Roaa Elalfy, Mohamed Mabrok, Somaya Al-Maadeed, Tamer Khattab, Essam A. Rashed
Early and accurate interpretation of screening mammograms is essential for effective breast cancer detection, yet it remains a complex challenge due to subtle imaging findings and diagnostic ambiguity. Many existing AI approaches fall short by focusing on single-view inputs or single-task outputs, limiting their clinical utility. To address these limitations, we propose a novel multi-view, multi-task hybrid deep learning framework that processes all four standard mammography views and jointly predicts diagnostic labels and BI-RADS scores for each breast. Our architecture integrates a hybrid CNN-VSSM backbone, combining convolutional encoders for rich local feature extraction with Visual State Space Models (VSSMs) to capture global contextual dependencies. To improve robustness and interpretability, we incorporate a gated attention-based fusion module that dynamically weights information across views, effectively handling cases with missing data. We conduct extensive experiments across diagnostic tasks of varying complexity, benchmarking our proposed hybrid models against baseline CNN architectures and VSSM models in both single-task and multi-task learning settings. Across all tasks, the hybrid models consistently outperform the baselines. In the binary BI-RADS 1 vs. 5 classification task, the shared hybrid model achieves an AUC of 0.9967 and an F1 score of 0.9830. For the more challenging ternary classification, it attains an F1 score of 0.7790, while in the five-class BI-RADS task, the best F1 score reaches 0.4904. These results highlight the effectiveness of the proposed hybrid framework and underscore both the potential and limitations of multi-task learning for improving diagnostic performance and enabling clinically meaningful mammography analysis.
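One way attention-based fusion can tolerate missing views is to normalize the attention weights over only the views that are present. The sketch below illustrates that idea in plain Python; it is an assumption about the general mechanism, not the paper's module. The function `fuse_views` is hypothetical, and the attention scores are passed in by hand, whereas a real model would produce them with a learned gating network.

```python
import math

def fuse_views(features, scores):
    """Attention-weighted fusion that skips missing views.

    features: list of feature vectors, or None for a missing view.
    scores:   one unnormalized attention score per view.
    """
    present = [i for i, f in enumerate(features) if f is not None]
    if not present:
        raise ValueError("at least one view is required")
    # Softmax restricted to the views that are actually present.
    peak = max(scores[i] for i in present)
    exps = {i: math.exp(scores[i] - peak) for i in present}
    total = sum(exps.values())
    weights = {i: e / total for i, e in exps.items()}
    dim = len(features[present[0]])
    fused = [sum(weights[i] * features[i][d] for i in present) for d in range(dim)]
    return fused, weights

# Four views with toy 2-dimensional features; the third view is missing.
views = [[1.0, 0.0], [0.0, 1.0], None, [1.0, 1.0]]
fused, weights = fuse_views(views, scores=[0.0, 0.0, 5.0, 0.0])
```

Because the missing view is excluded before normalization, its (here deliberately large) score has no influence, and the remaining weights still sum to one.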
nan
Article 372
Title@2025-07-22 (2): Fundamental limits of distributed covariance matrix estimation via a conditional strong data processing inequality
Title: Fundamental limits of distributed covariance matrix estimation via a conditional strong data processing inequality | Grundlegende Grenzen der verteilten Kovarianz-Matrix-Schätzung über eine bedingt starke Datenverarbeitungsungleichheit | 通过有条件的强有力的数据处理不平等状况进行分布式共变量矩阵估计的基本限制 2507.16953v1 |
Authors (3): Mohammad Reza Rahmani, Mohammad Hossein Yassaee, Mohammad Reza Aref
Estimating high-dimensional covariance matrices is a key task across many fields. This paper explores the theoretical limits of distributed covariance estimation in a feature-split setting, where communication between agents is constrained. Specifically, we study a scenario in which multiple agents each observe different components of i.i.d. samples drawn from a sub-Gaussian random vector. A central server seeks to estimate the complete covariance matrix using a limited number of bits communicated by each agent. We obtain a nearly tight minimax lower bound for covariance matrix estimation under operator norm and Frobenius norm. Our main technical tool is a novel generalization of the strong data processing inequality (SDPI), termed the Conditional Strong Data Processing Inequality (C-SDPI) coefficient, introduced in this work. The C-SDPI coefficient shares key properties such as tensorization with the conventional SDPI. Crucially, it quantifies the average contraction in a state-dependent channel and can be significantly lower than the worst-case SDPI coefficient over the state input. Utilizing the doubling trick of Geng-Nair and an operator Jensen inequality, we compute this coefficient for Gaussian mixture channels. We then employ it to establish minimax lower bounds on estimation error, capturing the trade-offs among sample size, communication cost, and data dimensionality. Building on this, we present a nearly optimal estimation protocol whose sample and communication requirements match the lower bounds up to logarithmic factors. Unlike much of the existing literature, our framework does not assume infinite samples or Gaussian distributions, making it broadly applicable. Finally, we extend our analysis to interactive protocols, showing interaction can significantly reduce communication requirements compared to non-interactive schemes.
nan
Article 373
Title@2025-07-22 (2): ResidualPlanner+: a scalable matrix mechanism for marginals and beyond
Title: ResidualPlanner+: a scalable matrix mechanism for marginals and beyond | ResidualPlanner+: ein skalierbarer Matrixmechanismus für Randbereiche und darüber hinaus | 剩余规划者+:边际和边际外的可缩放矩阵机制 2305.08175v3 |
Authors (6): Yingtai Xiao, Guanlin He, Levent Toksoz, Zeyu Ding, Danfeng Zhang, Daniel Kifer
Noisy marginals are a common form of confidentiality-protecting data release and are useful for many downstream tasks such as contingency table analysis, construction of Bayesian networks, and even synthetic data generation. Privacy mechanisms that provide unbiased noisy answers to linear queries (such as marginals) are known as matrix mechanisms. We propose ResidualPlanner and ResidualPlanner+, two highly scalable matrix mechanisms. ResidualPlanner is both optimal and scalable for answering marginal queries with Gaussian noise, while ResidualPlanner+ provides support for more general workloads, such as combinations of marginals and range queries or prefix-sum queries. ResidualPlanner can optimize for many loss functions that can be written as a convex function of marginal variances (prior work was restricted to just one predefined objective function). ResidualPlanner can optimize the accuracy of marginals in large-scale settings in seconds, even when the previous state of the art (HDMM) runs out of memory. It even runs on datasets with 100 attributes in a couple of minutes. Furthermore, ResidualPlanner can efficiently compute variance/covariance values for each marginal (prior methods quickly run out of memory, even for relatively small datasets). ResidualPlanner+ provides support for more complex workloads that combine marginal and range/prefix-sum queries (e.g., a marginal on race, a range query on age, and a combined race/age tabulation that answers age range queries for each race). It even supports custom user-defined workloads on different attributes. With this added flexibility, ResidualPlanner+ is not necessarily optimal; however, it is still extremely scalable and outperforms the prior state of the art (HDMM) on prefix-sum queries in terms of both accuracy and speed.
nan
Article 374
Title@2025-07-22 (2): AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation
Title: AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation | AURA: Multi-Modal Medical Agent für Verständnis, Vernunft und Annotation | AURA:一个多模式医疗代理,用于理解、说明理由和说明 2507.16940v1 |
Authors (3): Nima Fathi, Amar Kumar, Tal Arbel
Recent advancements in Large Language Models (LLMs) have catalyzed a paradigm shift from static prediction systems to agentic AI systems capable of reasoning, interacting with tools, and adapting to complex tasks. While LLM-based agentic systems have shown promise across many domains, their application to medical imaging remains in its infancy. In this work, we introduce AURA, the first visual linguistic explainability agent designed specifically for comprehensive analysis, explanation, and evaluation of medical images. By enabling dynamic interactions, contextual explanations, and hypothesis testing, AURA represents a significant advancement toward more transparent, adaptable, and clinically aligned AI systems. We highlight the promise of agentic AI in transforming medical image analysis from static predictions to interactive decision support. Leveraging Qwen-32B, an LLM-based architecture, AURA integrates a modular toolbox comprising: (i) a segmentation suite with phrase grounding, pathology segmentation, and anatomy segmentation to localize clinically meaningful regions; (ii) a counterfactual image-generation module that supports reasoning through image-level explanations; and (iii) a set of evaluation tools including pixel-wise difference-map analysis, classification, and advanced state-of-the-art components to assess diagnostic relevance and visual interpretability.
nan
Article 375
Title@2025-07-22 (2): Enhancing supply chain security with automated machine learning
Title: Enhancing supply chain security with automated machine learning | Verbesserung der Sicherheit der Lieferkette durch automatisiertes maschinelles Lernen | 通过自动机械学习加强供应链安全 2406.13166v3 |
Authors (3): Haibo Wang, Lutfu S. Sua, Bahram Alidaee
The increasing scale and complexity of global supply chains have led to new challenges spanning various fields, such as supply chain disruptions due to long waiting lines at the ports, material shortages, and inflation. Coupled with the size of supply chains and the availability of vast amounts of data, efforts towards tackling such challenges have led to an increasing interest in applying machine learning methods in many aspects of supply chains. Unlike other solutions, ML techniques, including Random Forest, XGBoost, LightGBM, and Neural Networks, make predictions and approximate optimal solutions faster. This paper presents an automated ML framework to enhance supply chain security by detecting fraudulent activities, predicting maintenance needs, and forecasting material backorders. Using datasets of varying sizes, results show that fraud detection achieves an 88% accuracy rate using sampling methods, machine failure prediction reaches 93.4% accuracy, and material backorder prediction achieves 89.3% accuracy. Hyperparameter tuning significantly improved the performance of these models, with certain supervised techniques like XGBoost and LightGBM reaching up to 100% precision. This research contributes to supply chain security by streamlining data preprocessing, feature selection, model optimization, and inference deployment, addressing critical challenges and boosting operational efficiency.
nan
Article 376
Title@2025-07-22 (2): SiLQ: Simple Large Language Model Quantization-Aware Training
Title: SiLQ: Simple Large Language Model Quantization-Aware Training | SiLQ: Einfaches großsprachiges Modell Quantization-Aware Training | SiLQ: 简单大语言模型量化软件培训 2507.16933v1 |
Authors (5): Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha
Large language models can be quantized to reduce inference-time latency, model size, and energy consumption, thereby delivering a better user experience at lower cost. A challenge exists to deliver quantized models with minimal loss of accuracy in reasonable time, and in particular to do so without requiring mechanisms incompatible with specialized inference accelerators. Here, we demonstrate a simple, end-to-end quantization-aware training approach that, with an increase in total model training budget of less than 0.1%, outperforms the leading published quantization methods by large margins on several modern benchmarks, with both base and instruct model variants. The approach easily generalizes across different model architectures, can be applied to activations, cache, and weights, and requires no additional operations in the model other than the quantization itself.
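Quantization-aware training commonly relies on a quantize-then-dequantize ("fake quantization") step in the forward pass. The sketch below shows that generic building block in plain Python; it is not SiLQ's specific recipe, and the function name `fake_quantize`, the symmetric signed grid, and the max-magnitude step size are all illustrative assumptions.

```python
def fake_quantize(values, num_bits=4):
    """Symmetric quantize-then-dequantize of a list of floats.

    A generic QAT building block shown for illustration only; the step
    size is derived from the largest magnitude (an assumption).
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 7 for signed 4-bit
    largest = max(abs(v) for v in values)
    scale = largest / qmax if largest else 1.0
    out = []
    for v in values:
        q = round(v / scale)                 # snap to the integer grid
        q = max(-qmax - 1, min(qmax, q))     # clamp to the signed range
        out.append(q * scale)                # map back to float
    return out

weights = [0.7, -0.33, 0.12, 0.0]
quantized = fake_quantize(weights, num_bits=4)
```

With these toy weights the step size is about 0.1, so the values land on roughly 0.7, -0.3, 0.1, and 0.0. In training frameworks the rounding step is typically paired with a straight-through estimator so that gradients can still reach the full-precision weights.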
nan
Article 377
Title@2025-07-22 (2): Avoiding spectral pollution for transfer operators using residuals
Title: Avoiding spectral pollution for transfer operators using residuals | Vermeidung von spektralen Verschmutzungen für Übertragungsbetreiber mit Reststoffen | 避免对使用残留物的转移经营者的光谱污染 2507.16915v1 |
Authors (5): April Herwig, Matthew J. Colbrook, Oliver Junge, Péter Koltai, Julia Slipantschuk
Koopman operator theory enables linear analysis of nonlinear dynamical systems by lifting their evolution to infinite-dimensional function spaces. However, finite-dimensional approximations of Koopman and transfer (Frobenius–Perron) operators are prone to spectral pollution, introducing spurious eigenvalues that can compromise spectral computations. While recent advances have yielded provably convergent methods for Koopman operators, analogous tools for general transfer operators remain limited. In this paper, we present algorithms for computing spectral properties of transfer operators without spectral pollution, including extensions to the Hardy-Hilbert space. Case studies, ranging from families of Blaschke maps with known spectrum to a molecular dynamics model of protein folding, demonstrate the accuracy and flexibility of our approach. Notably, we demonstrate that spectral features can arise even when the corresponding eigenfunctions lie outside the chosen space, highlighting the functional-analytic subtleties in defining the “true” Koopman spectrum. Our methods offer robust tools for spectral estimation across a broad range of applications.
nan
Article 378
Title@2025-07-22 (2): ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
Title: ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning | ThinkAct: Vision-Language-Action-Reasoning durch verstärkte visuelle Latent-Planung | 思考:通过强化视觉预备规划提出愿景-语言-行动理由 2507.16815v1 |
Authors (5): Chi-Pin Huang, Yueh-Hua Wu, Min-Hung Chen, Yu-Chiang Frank Wang, Fu-En Yang
Vision-language-action (VLA) reasoning tasks require agents to interpret multimodal instructions, perform long-horizon planning, and act adaptively in dynamic environments. Existing approaches typically train VLA models in an end-to-end fashion, directly mapping inputs to actions without explicit reasoning, which hinders their ability to plan over multiple steps or adapt to complex task variations. In this paper, we propose ThinkAct, a dual-system framework that bridges high-level reasoning with low-level action execution via reinforced visual latent planning. ThinkAct trains a multimodal LLM to generate embodied reasoning plans guided by reinforcing action-aligned visual rewards based on goal completion and trajectory consistency. These reasoning plans are compressed into a visual plan latent that conditions a downstream action model for robust action execution on target environments. Extensive experiments on embodied reasoning and robot manipulation benchmarks demonstrate that ThinkAct enables few-shot adaptation, long-horizon planning, and self-correction behaviors in complex embodied AI tasks.
nan
Article 379
Title@2025-07-22 (2): Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning
Title: Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning | Semi-off-Policy-Verstärkung Lernen für Vision-Sprache langsam denkende Vernunft | 愿景-语言-思维迟慢原因半非政策强化学习 2507.16814v1 |
Authors (10): Junhao Shen, Haiteng Zhao, Yuzhe Gu, Songyang Gao, Kuikun Liu, Haian Huang, Jianfei Gao, Dahua Lin, Wenwei Zhang, Kai Chen
Enhancing large vision-language models (LVLMs) with visual slow-thinking reasoning is crucial for solving complex multimodal tasks. However, since LVLMs are mainly trained with vision-language alignment, it is difficult to adopt on-policy reinforcement learning (RL) to develop the slow-thinking ability because the rollout space is restricted by the model's initial abilities. Off-policy RL offers a way to go beyond the current policy, but directly distilling trajectories from external models may cause visual hallucinations due to mismatched visual perception abilities across models. To address these issues, this paper proposes SOPHIA, a simple and scalable Semi-Off-Policy RL for vision-language slow-tHInking reAsoning. SOPHIA builds a semi-off-policy behavior model by combining on-policy visual understanding from a trainable LVLM with off-policy slow-thinking reasoning from a language model, assigns outcome-based rewards to reasoning, and propagates visual rewards backward. The LVLM then learns slow-thinking reasoning ability from the obtained reasoning trajectories using propagated rewards via off-policy RL algorithms. Extensive experiments with InternVL2.5 and InternVL3.0 with 8B and 38B sizes show the effectiveness of SOPHIA. Notably, SOPHIA improves InternVL3.0-38B by 8.50% on average, reaching state-of-the-art performance among open-source LVLMs on multiple multimodal reasoning benchmarks, and even outperforms some closed-source models (e.g., GPT-4.1) on the challenging MathVision and OlympiadBench, achieving 49.08% and 49.95% pass@1 accuracy, respectively. Analysis shows SOPHIA outperforms supervised fine-tuning and direct on-policy RL methods, offering a better policy initialization for further on-policy training.
nan
Article 380
Title@2025-07-22 (2): MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
Title: MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning | MegaScience: Die Grenzen von Post-Training-Datensätzen für wissenschaftliche Vernunft sprengen | 超科学:推进培训后数据集的前沿,促进科学理性 2507.16812v1 |
Authors (3): Run-Ze Fan, Zengzhi Wang, Pengfei Liu
Scientific reasoning is critical for developing AI scientists and supporting human researchers in advancing the frontiers of natural science discovery. However, the open-source community has primarily focused on mathematics and coding while neglecting the scientific domain, largely due to the absence of open, large-scale, high-quality, verifiable scientific reasoning datasets. To bridge this gap, we first present TextbookReasoning, an open dataset featuring truthful reference answers extracted from 12k university-level scientific textbooks, comprising 650k reasoning questions spanning 7 scientific disciplines. We further introduce MegaScience, a large-scale mixture of high-quality open-source datasets totaling 1.25 million instances, developed through systematic ablation studies that evaluate various data selection methodologies to identify the optimal subset for each publicly available scientific dataset. Meanwhile, we build a comprehensive evaluation system covering diverse subjects and question types across 15 benchmarks, incorporating comprehensive answer extraction strategies to ensure accurate evaluation metrics. Our experiments demonstrate that our datasets achieve superior performance and training efficiency with more concise response lengths compared to existing open-source scientific datasets. Furthermore, we train Llama3.1, Qwen2.5, and Qwen3 series base models on MegaScience, which significantly outperform the corresponding official instruct models in average performance. In addition, MegaScience exhibits greater effectiveness for larger and stronger models, suggesting a scaling benefit for scientific tuning. We release our data curation pipeline, evaluation system, datasets, and seven trained models to the community to advance scientific reasoning research.
nan
Article 381
Title@2025-07-22 (2): Revisiting Pre-trained Language Models for Vulnerability Detection
Title: Revisiting Pre-trained Language Models for Vulnerability Detection | Überprüfung vortrainierter Sprachmodelle für die Erkennung von Schwachstellen | 重新审查关于脆弱性检测的预培训语言模式 2507.16887v1 |
Authors (5): Youpeng Li, Weiliang Qi, Xuyu Wang, Fuxun Yu, Xinda Wang
The rapid advancement of pre-trained language models (PLMs) has demonstrated promising results for various code-related tasks. However, their effectiveness in detecting real-world vulnerabilities remains a critical challenge. While existing empirical studies evaluate PLMs for vulnerability detection (VD), their inadequate consideration of data preparation, evaluation setups, and experimental settings undermines the accuracy and comprehensiveness of evaluations. This paper introduces RevisitVD, an extensive evaluation of 17 PLMs spanning smaller code-specific PLMs and large-scale PLMs using newly constructed datasets. Specifically, we compare the performance of PLMs under both fine-tuning and prompt engineering, assess their effectiveness and generalizability across various training and testing settings, and analyze their robustness against code normalization, abstraction, and semantic-preserving transformations. Our findings reveal that, for VD tasks, PLMs incorporating pre-training tasks designed to capture the syntactic and semantic patterns of code outperform both general-purpose PLMs and those solely pre-trained or fine-tuned on large code corpora. However, these models face notable challenges in real-world scenarios, such as difficulties in detecting vulnerabilities with complex dependencies, handling perturbations introduced by code normalization and abstraction, and identifying semantic-preserving vulnerable code transformations. Also, the truncation caused by the limited context windows of PLMs can lead to a non-negligible amount of labeling errors. This study underscores the importance of thorough evaluations of model performance in practical scenarios and outlines future directions to help enhance the effectiveness of PLMs for realistic VD applications.
nan
Article 382
Title@2025-07-22 (2): Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
Title: Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty | Über Binäre Belohnungen hinaus: LMs zur Vernunft über ihre Ungewissheit ausbilden | 超越二元奖励:训练语言模型推理其不确定性 2507.16806v1 |
Authors (7): Mehul Damani, Isha Puri, Stewart Slocum, Idan Shenfeld, Leshem Choshen, Yoon Kim, Jacob Andreas
When language models (LMs) are trained via reinforcement learning (RL) to generate natural language “reasoning chains”, their performance improves on a variety of difficult question answering tasks. Today, almost all successful applications of RL for reasoning use binary reward functions that evaluate the correctness of LM outputs. Because such reward functions do not penalize guessing or low-confidence outputs, they often have the unintended side-effect of degrading calibration and increasing the rate at which LMs generate incorrect responses (or “hallucinate”) in other problem domains. This paper describes RLCR (Reinforcement Learning with Calibration Rewards), an approach to training reasoning models that jointly improves accuracy and calibrated confidence estimation. During RLCR, LMs generate both predictions and numerical confidence estimates after reasoning. They are trained to optimize a reward function that augments a binary correctness score with a Brier score – a scoring rule for confidence estimates that incentivizes calibrated prediction. We first prove that this reward function (or any analogous reward function that uses a bounded, proper scoring rule) yields models whose predictions are both accurate and well-calibrated. We next show that across diverse datasets, RLCR substantially improves calibration with no loss in accuracy, on both in-domain and out-of-domain evaluations – outperforming both ordinary RL training and classifiers trained to assign post-hoc confidence scores. While ordinary RL hurts calibration, RLCR improves it. Finally, we demonstrate that verbalized confidence can be leveraged at test time to improve accuracy and calibration via confidence-weighted scaling methods. Our results show that explicitly optimizing for calibration can produce more generally reliable reasoning models.
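The reward described above, a binary correctness score augmented with a Brier score, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `rlcr_reward`, the equal weighting of the two terms, and the sign convention (the squared error is subtracted as a penalty) are choices made here for clarity and are not confirmed details of the paper's training code.

```python
def rlcr_reward(is_correct, confidence):
    """Correctness score minus a Brier-style penalty on stated confidence.

    is_correct: whether the model's final answer was right.
    confidence: the model's verbalized probability of being right, in [0, 1].
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    y = 1.0 if is_correct else 0.0
    brier_penalty = (confidence - y) ** 2   # 0 when confidence matches outcome
    return y - brier_penalty

confident_right = rlcr_reward(True, 0.9)    # about 0.99
confident_wrong = rlcr_reward(False, 0.9)   # about -0.81
hedged_wrong = rlcr_reward(False, 0.1)      # about -0.01
```

A purely binary reward would score both wrong answers identically at zero; here the confidently wrong answer is penalized far more than the hedged one, which is the incentive toward calibration that the abstract describes.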
nan
Article 383
Title@2025-07-22 (2): Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning
Title: Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning | Steuerung der Out-of-Distribution-Verallgemeinerung mit Konzeptablation Fine-Tuning | 带有 “ 缩算概念 “ 定额概念的 “ 批发外普遍化 “ 指导指导 2507.16795v1 |
Authors (6): Helena Casademunt, Caden Juang, Adam Karvonen, Samuel Marks, Senthooran Rajamanoharan, Neel Nanda
Fine-tuning large language models (LLMs) can lead to unintended out-of-distribution generalization. Standard approaches to this problem rely on modifying training data, for example by adding data that better specify the intended generalization. However, this is not always practical. We introduce Concept Ablation Fine-Tuning (CAFT), a technique that leverages interpretability tools to control how LLMs generalize from fine-tuning, without needing to modify the training data or otherwise use data from the target distribution. Given a set of directions in an LLM’s latent space corresponding to undesired concepts, CAFT works by ablating these concepts with linear projections during fine-tuning, steering the model away from unintended generalizations. We successfully apply CAFT to three fine-tuning tasks, including emergent misalignment, a phenomenon where LLMs fine-tuned on a narrow task generalize to give egregiously misaligned responses to general questions. Without any changes to the fine-tuning data, CAFT reduces misaligned responses by 10x without degrading performance on the training distribution. Overall, CAFT represents a novel approach for steering LLM generalization without modifying training data.
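Ablating a set of concept directions with a linear projection amounts to removing the component of an activation that lies in the span of those directions. The sketch below shows that operation on a single toy activation vector; the helper names, the Gram-Schmidt step for non-orthogonal directions, and the example vectors are assumptions for illustration. Which layers are ablated and how the directions are found with interpretability tools are not specified here.

```python
import math

def orthonormalize(directions):
    """Gram-Schmidt: turn concept directions into an orthonormal basis."""
    basis = []
    for d in directions:
        v = list(d)
        for b in basis:
            coeff = sum(x * y for x, y in zip(v, b))
            v = [x - coeff * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:                     # skip linearly dependent directions
            basis.append([x / norm for x in v])
    return basis

def ablate(activation, directions):
    """Project the activation onto the orthogonal complement of the directions."""
    h = list(activation)
    for b in orthonormalize(directions):
        coeff = sum(x * y for x, y in zip(h, b))
        h = [x - coeff * y for x, y in zip(h, b)]
    return h

h = [3.0, 4.0, 5.0]
concepts = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]   # hypothetical undesired concepts
cleaned = ablate(h, concepts)
```

In this toy case the two directions span the first two coordinates, so only the third component of the activation survives. Applying the same projection during fine-tuning would keep gradient updates from relying on the ablated subspace.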
nan
Article 384
Title@2025-07-22 (2): Edge of Stochastic Stability: Revisiting the Edge of Stability for SGD
Title: Edge of Stochastic Stability: Revisiting the Edge of Stability for SGD | Rand der stochastischen Stabilität: Die Kante der Stabilität für SGD | 斯托卡稳定边缘:重新审视稳定边缘,促进稳定发展 2412.20553v4 |
Authors (2): Arseniy Andreyev, Pierfrancesco Beneventano
Recent findings by Cohen et al., 2021, demonstrate that when training neural networks with full-batch gradient descent with a step size of $\eta$, the largest eigenvalue $\lambda_{\max}$ of the full-batch Hessian consistently stabilizes at $\lambda_{\max} = 2/\eta$. These results have significant implications for convergence and generalization. This, however, is not the case for mini-batch stochastic gradient descent (SGD), limiting the broader applicability of its consequences. We show that SGD trains in a different regime we term Edge of Stochastic Stability (EoSS). In this regime, what stabilizes at $2/\eta$ is Batch Sharpness: the expected directional curvature of mini-batch Hessians along their corresponding stochastic gradients. As a consequence, $\lambda_{\max}$ – which is generally smaller than Batch Sharpness – is suppressed, aligning with the long-standing empirical observation that smaller batches and larger step sizes favor flatter minima. We further discuss implications for mathematical modeling of SGD trajectories.
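Reading Batch Sharpness as the average over mini-batches $B$ of $g_B^\top H_B\, g_B / \lVert g_B \rVert^2$, where $g_B$ and $H_B$ are the mini-batch gradient and Hessian, the quantity can be computed exactly for a toy least-squares problem. The sketch below does so in plain Python; the loss, the data, and the function name `batch_sharpness` are illustrative assumptions, and the formula is this digest's reading of the abstract's verbal definition rather than a quotation from the paper.

```python
def batch_sharpness(xs, ys, w, batches):
    """Average of g_B^T H_B g_B / ||g_B||^2 over the given mini-batches
    for the loss L_B(w) = 1/(2|B|) * sum_i (x_i . w - y_i)^2."""
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    values = []
    for batch in batches:
        n = len(batch)
        # Mini-batch gradient: mean of residual_i * x_i.
        g = [0.0] * len(w)
        for i in batch:
            r = dot(xs[i], w) - ys[i]
            g = [gj + r * xj / n for gj, xj in zip(g, xs[i])]
        g_norm_sq = dot(g, g)
        if g_norm_sq == 0.0:
            continue  # direction undefined for a zero gradient
        # H_B = mean of x_i x_i^T, so g^T H_B g = mean of (x_i . g)^2.
        curvature = sum(dot(xs[i], g) ** 2 for i in batch) / n
        values.append(curvature / g_norm_sq)
    if not values:
        raise ValueError("all mini-batch gradients are zero")
    return sum(values) / len(values)

xs = [[1.0, 0.0], [0.0, 2.0]]
ys = [1.0, 1.0]
sharpness = batch_sharpness(xs, ys, w=[0.0, 0.0], batches=[[0], [1]])
```

With batch size 1 this toy problem gives a Batch Sharpness of 2.5, while the full-batch Hessian is diag(0.5, 2.0) with largest eigenvalue 2.0, consistent with the abstract's remark that $\lambda_{\max}$ is generally the smaller of the two.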
nan
Article 385
Title@2025-07-22 (2): Graph Neural Networks Gone Hogwild
Title: Graph Neural Networks Gone Hogwild | Schaubild Neurale Netze vor Hogwild | 神经网络离开霍格维勒德 2407.00494v2 |
Authors (4): Olga Solodova, Nick Richardson, Deniz Oktay, Ryan P. Adams
Graph neural networks (GNNs) appear to be powerful tools to learn state representations for agents in distributed, decentralized multi-agent systems, but generate catastrophically incorrect predictions when nodes update asynchronously during inference. This failure under asynchrony effectively excludes these architectures from many potential applications where synchrony is difficult or impossible to enforce, e.g., robotic swarms or sensor networks. In this work we identify “implicitly-defined” GNNs as a class of architectures which is provably robust to asynchronous “hogwild” inference, adapting convergence guarantees from work in asynchronous and distributed optimization. We then propose a novel implicitly-defined GNN architecture, which we call an ‘energy GNN’. We show that this architecture outperforms other GNNs from this class on a variety of synthetic tasks inspired by multi-agent systems.
nan
Article 386
Title@2025-07-22 (2): A Partitioned Sparse Variational Gaussian Process for Fast, Distributed Spatial Modeling
Title: A Partitioned Sparse Variational Gaussian Process for Fast, Distributed Spatial Modeling | Ein geteilter Sparse Variational Gaussian Prozess für schnelle, verteilte räumliche Modellierung | 快速、分布空间建模的分散分布式平面平面变异高斯进程 2507.16771v1 |
Authors (4): Michael Grosskopf, Kellin Rumsey, Ayan Biswas, Earl Lawrence
The next generation of Department of Energy supercomputers will be capable of exascale computation. For these machines, far more computation will be possible than that which can be saved to disk. As a result, users will be unable to rely on post-hoc access to data for uncertainty quantification and other statistical analyses and there will be an urgent need for sophisticated machine learning algorithms which can be trained in situ. Algorithms deployed in this setting must be highly scalable, memory efficient and capable of handling data which is distributed across nodes as spatially contiguous partitions. One suitable approach involves fitting a sparse variational Gaussian process (SVGP) model independently and in parallel to each spatial partition. The resulting model is scalable, efficient and generally accurate, but produces the undesirable effect of constructing discontinuous response surfaces due to the disagreement between neighboring models at their shared boundary. In this paper, we extend this idea by allowing for a small amount of communication between neighboring spatial partitions which encourages better alignment of the local models, leading to smoother spatial predictions and a better fit in general. Due to our decentralized communication scheme, the proposed extension remains highly scalable and adds very little overhead in terms of computation (and none, in terms of memory). We demonstrate this Partitioned SVGP (PSVGP) approach for the Energy Exascale Earth System Model (E3SM) and compare the results to the independent SVGP case.
Article 387
Title@2025-07-22 (2): RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment
Title: RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment | RadAlign: Weiterentwicklung der Radiologie Report Generation mit Vision-Sprachkonzept Ausrichtung | 辐射:推进放射学报告的编制,并统一愿景-语言概念 2501.07525v2 |
Authors (5): Difei Gu, Yunhe Gao, Yang Zhou, Mu Zhou, Dimitris Metaxas
Automated chest radiograph interpretation requires both accurate disease classification and detailed radiology report generation, presenting a significant challenge in the clinical workflow. Current approaches either focus on classification accuracy at the expense of interpretability or generate detailed but potentially unreliable reports through image captioning techniques. In this study, we present RadAlign, a novel framework that combines the predictive accuracy of vision-language models (VLMs) with the reasoning capabilities of large language models (LLMs). Inspired by the radiologist’s workflow, RadAlign first employs a specialized VLM to align visual features with key medical concepts, achieving superior disease classification with an average AUC of 0.885 across multiple diseases. These recognized medical conditions, represented as text-based concepts in the aligned visual-language space, are then used to prompt LLM-based report generation. Enhanced by a retrieval-augmented generation mechanism that grounds outputs in similar historical cases, RadAlign delivers superior report quality with a GREEN score of 0.678, outperforming state-of-the-art methods’ 0.634. Our framework maintains strong clinical interpretability while reducing hallucinations, advancing automated medical imaging and report analysis through integrated predictive and generative AI. Code is available at https://github.com/difeigu/RadAlign.
Article 388
Title@2025-07-22 (2): Learning novel representations of variable sources from multi-modal $\textit{Gaia}$ data via autoencoders
Title: Learning novel representations of variable sources from multi-modal $\textit{Gaia}$ data via autoencoders | Erlernen neuer Darstellungen variabler Quellen aus multimodalen $\textit{Gaia}$ Daten über Autoencoder | 通过自动编码器学习多式 $\ textit{Gaia} $ 数据变量来源的新表达式 2505.16320v2 |
Authors (21): P. Huijse, J. De Ridder, L. Eyer, L. Rimoldini, B. Holl, N. Chornay, J. Roquette, K. Nienartowicz, G. Jevardat de Fombelle, D. J. Fritzewski, A. Kemp, V. Vanlaer, M. Vanrespaille, H. Wang, M. I. Carnerero, C. M. Raiteri, G. Marton, M. Madarász, G. Clementini, P. Gavras, C. Aerts
Gaia Data Release 3 (DR3) published for the first time epoch photometry, BP/RP (XP) low-resolution mean spectra, and supervised classification results for millions of variable sources. This extensive dataset offers a unique opportunity to study their variability by combining multiple Gaia data products. In preparation for DR4, we propose and evaluate a machine learning methodology capable of ingesting multiple Gaia data products to achieve an unsupervised classification of stellar and quasar variability. A dataset of 4 million Gaia DR3 sources is used to train three variational autoencoders (VAE), which are artificial neural networks (ANNs) designed for data compression and generation. One VAE is trained on Gaia XP low-resolution spectra, another on a novel approach based on the distribution of magnitude differences in the Gaia G band, and the third on folded Gaia G band light curves. Each Gaia source is compressed into 15 numbers, representing the coordinates in a 15-dimensional latent space generated by combining the outputs of these three models. The learned latent representation produced by the ANN effectively distinguishes between the main variability classes present in Gaia DR3, as demonstrated through both supervised and unsupervised classification analysis of the latent space. The results highlight a strong synergy between light curves and low-resolution spectral data, emphasising the benefits of combining the different Gaia data products. A two-dimensional projection of the latent variables reveals numerous overdensities, most of which strongly correlate with astrophysical properties, showing the potential of this latent space for astrophysical discovery. We show that the properties of our novel latent representation make it highly valuable for variability analysis tasks, including classification, clustering and outlier detection.
Article 389
Title@2025-07-22 (2): Assessing Adaptive World Models in Machines with Novel Games
Title: Assessing Adaptive World Models in Machines with Novel Games | Bewertung von adaptiven Weltmodellen in Maschinen mit neuen Spielen | 评估具有新运动会的机器中适应性世界模型 2507.12821v2 |
Authors (14): Lance Ying, Katherine M. Collins, Prafull Sharma, Cedric Colas, Kaiya Ivy Zhao, Adrian Weller, Zenna Tavares, Phillip Isola, Samuel J. Gershman, Jacob D. Andreas, Thomas L. Griffiths, Francois Chollet, Kelsey R. Allen, Joshua B. Tenenbaum
Human intelligence exhibits a remarkable capacity for rapid adaptation and effective problem-solving in novel and unfamiliar contexts. We argue that this profound adaptability is fundamentally linked to the efficient construction and refinement of internal representations of the environment, commonly referred to as world models, and we refer to this adaptation mechanism as world model induction. However, current understanding and evaluation of world models in artificial intelligence (AI) remain narrow, often focusing on static representations learned from training on massive corpora of data rather than on the efficiency and efficacy of learning these representations through interaction and exploration within a novel environment. In this Perspective, we provide a view of world model induction drawing on decades of research in cognitive science on how humans learn and adapt so efficiently; we then call for a new evaluation framework for assessing adaptive world models in AI. Concretely, we propose a new benchmarking paradigm based on suites of carefully designed games with genuine, deep and continually refreshing novelty in the underlying game structures – we refer to this class of games as novel games. We detail key desiderata for constructing these games and propose appropriate metrics to explicitly challenge and evaluate the agent’s ability for rapid world model induction. We hope that this new evaluation framework will inspire future evaluation efforts on world models in AI and provide a crucial step towards developing AI systems capable of human-like rapid adaptation and robust generalization – a critical component of artificial general intelligence.
Article 390
Title@2025-07-22 (2): Towards Robust Foundation Models for Digital Pathology
Title: Towards Robust Foundation Models for Digital Pathology | Auf dem Weg zu robusten Grundmodellen für die digitale Pathologie | 走向坚固基金会数字病理学模型 2507.17845v1 |
Authors (12): Jonah Kömen, Edwin D. de Jong, Julius Hense, Hannah Marienwald, Jonas Dippel, Philip Naumann, Eric Marcus, Lukas Ruff, Maximilian Alber, Jonas Teuwen, Frederick Klauschen, Klaus-Robert Müller
Biomedical Foundation Models (FMs) are rapidly transforming AI-enabled healthcare research and entering clinical validation. However, their susceptibility to learning non-biological technical features – including variations in surgical/endoscopic techniques, laboratory procedures, and scanner hardware – poses risks for clinical deployment. We present the first systematic investigation of pathology FM robustness to non-biological features. Our work (i) introduces measures to quantify FM robustness, (ii) demonstrates the consequences of limited robustness, and (iii) proposes a framework for FM robustification to mitigate these issues. Specifically, we developed PathoROB, a robustness benchmark with three novel metrics, including the robustness index, and four datasets covering 28 biological classes from 34 medical centers. Our experiments reveal robustness deficits across all 20 evaluated FMs, and substantial robustness differences between them. We found that non-robust FM representations can cause major diagnostic downstream errors and clinical blunders that prevent safe clinical adoption. Using more robust FMs and post-hoc robustification considerably reduced (but did not yet eliminate) the risk of such errors. This work establishes that robustness evaluation is essential for validating pathology FMs before clinical adoption and demonstrates that future FM development must integrate robustness as a core design principle. PathoROB provides a blueprint for assessing robustness across biomedical domains, guiding FM improvement efforts towards more robust, representative, and clinically deployable AI systems that prioritize biological information over technical artifacts.
Article 391
Title@2025-07-22 (2): GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding
Title: GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding | GUI-G$^2$: Gaussian Reward Modeling für GUI Grounding | GUI-G$$2美元:GUI地基的高斯奖赏模型 2507.15846v2 |
Authors (12): Fei Tang, Zhangxuan Gu, Zhengxi Lu, Xuyang Liu, Shuheng Shen, Changhua Meng, Wen Wang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Jun Xiao, Yueting Zhuang
Graphical User Interface (GUI) grounding maps natural language instructions to precise interface locations for autonomous interaction. Current reinforcement learning approaches use binary rewards that treat elements as hit-or-miss targets, creating sparse signals that ignore the continuous nature of spatial interactions. Motivated by human clicking behavior that naturally forms Gaussian distributions centered on target elements, we introduce GUI Gaussian Grounding Rewards (GUI-G$^2$), a principled reward framework that models GUI elements as continuous Gaussian distributions across the interface plane. GUI-G$^2$ incorporates two synergistic mechanisms: Gaussian point rewards model precise localization through exponentially decaying distributions centered on element centroids, while coverage rewards assess spatial alignment by measuring the overlap between predicted Gaussian distributions and target regions. To handle diverse element scales, we develop an adaptive variance mechanism that calibrates reward distributions based on element dimensions. This framework transforms GUI grounding from sparse binary classification to dense continuous optimization, where Gaussian distributions generate rich gradient signals that guide models toward optimal interaction positions. Extensive experiments across ScreenSpot, ScreenSpot-v2, and ScreenSpot-Pro benchmarks demonstrate that GUI-G$^2$ substantially outperforms the state-of-the-art method UI-TARS-72B, with the most significant improvement of 24.7% on ScreenSpot-Pro. Our analysis reveals that continuous modeling provides superior robustness to interface variations and enhanced generalization to unseen layouts, establishing a new paradigm for spatial reasoning in GUI interaction tasks.
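The contrast between a binary hit-or-miss reward and a Gaussian point reward can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `(x1, y1, x2, y2)` box format, and the scale factor `alpha` that ties the standard deviation to the element's width and height are assumptions made for illustration, and the coverage reward is omitted.

```python
import math


def binary_reward(pred_xy, box):
    """Hit-or-miss reward: 1.0 if the click lands inside the box, else 0.0."""
    x1, y1, x2, y2 = box
    px, py = pred_xy
    return 1.0 if (x1 <= px <= x2 and y1 <= py <= y2) else 0.0


def gaussian_point_reward(pred_xy, box, alpha=0.5):
    """Reward that decays smoothly with distance from the element centroid.

    alpha is a hypothetical scale factor: sigma is set proportional to the
    element's width and height, so larger elements tolerate larger offsets.
    """
    x1, y1, x2, y2 = box
    px, py = pred_xy
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    sigma_x = alpha * (x2 - x1)
    sigma_y = alpha * (y2 - y1)
    dx = (px - cx) / sigma_x
    dy = (py - cy) / sigma_y
    return math.exp(-0.5 * (dx * dx + dy * dy))
```

Under these assumptions, a click slightly outside the element earns zero binary reward but a positive Gaussian reward, which is the kind of dense signal the abstract describes.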
Article 392
Title@2025-07-22 (2): Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
Title: Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning | Zebra-CoT: Ein Datensatz für interleaved Vision Language Reasoning | Zebra-CoT:关于不同视力语言理由的数据集 2507.16746v1 |
Authors (14): Ang Li, Charles Wang, Kaiyu Yue, Zikui Cai, Ollie Liu, Deqing Fu, Peng Guo, Wang Bill Zhu, Vatsal Sharan, Robin Jia, Willie Neiswanger, Furong Huang, Tom Goldstein, Micah Goldblum
Humans often use visual aids, for example diagrams or sketches, when solving complex problems. Training multimodal models to do the same, known as Visual Chain of Thought (Visual CoT), is challenging due to: (1) poor off-the-shelf visual CoT performance, which hinders reinforcement learning, and (2) the lack of high-quality visual CoT training data. We introduce $\textbf{Zebra-CoT}$, a diverse large-scale dataset with 182,384 samples, containing logically coherent interleaved text-image reasoning traces. We focus on four categories of tasks where sketching or visual reasoning is especially natural, spanning scientific questions such as geometry, physics, and algorithms; 2D visual reasoning tasks like visual search and jigsaw puzzles; 3D reasoning tasks including 3D multi-hop inference, embodied and robot planning; visual logic problems and strategic games like chess. Fine-tuning the Anole-7B model on the Zebra-CoT training corpus results in an improvement of +12% in our test-set accuracy and yields up to +13% performance gain on standard VLM benchmark evaluations. Fine-tuning Bagel-7B yields a model that generates high-quality interleaved visual reasoning chains, underscoring Zebra-CoT’s effectiveness for developing multimodal reasoning abilities. We open-source our dataset and models to support development and evaluation of visual CoT.
Article 393
Title@2025-07-22 (2): SplitMeanFlow: Interval Splitting Consistency in Few-Step Generative Modeling
Title: SplitMeanFlow: Interval Splitting Consistency in Few-Step Generative Modeling | SplitMeanFlow: Intervall-Splitting-Konsistenz in wenigen Schritten generative Modellierung | SlipMeanFlow: 微小生成模型中的中间分割一致性 2507.16884v1 |
Authors (11): Yi Guo, Wei Wang, Zhihang Yuan, Rong Cao, Kuan Chen, Zhengyang Chen, Yuanyuan Huo, Yang Zhang, Yuping Wang, Shouda Liu, Yuxuan Wang
Generative models like Flow Matching have achieved state-of-the-art performance but are often hindered by a computationally expensive iterative sampling process. To address this, recent work has focused on few-step or one-step generation by learning the average velocity field, which directly maps noise to data. MeanFlow, a leading method in this area, learns this field by enforcing a differential identity that connects the average and instantaneous velocities. In this work, we argue that this differential formulation is a limiting special case of a more fundamental principle. We return to the first principles of average velocity and leverage the additivity property of definite integrals. This leads us to derive a novel, purely algebraic identity we term Interval Splitting Consistency. This identity establishes a self-referential relationship for the average velocity field across different time intervals without resorting to any differential operators. Based on this principle, we introduce SplitMeanFlow, a new training framework that enforces this algebraic consistency directly as a learning objective. We formally prove that the differential identity at the core of MeanFlow is recovered by taking the limit of our algebraic consistency as the interval split becomes infinitesimal. This establishes SplitMeanFlow as a direct and more general foundation for learning average velocity fields. From a practical standpoint, our algebraic approach is significantly more efficient, as it eliminates the need for JVP computations, resulting in simpler implementation, more stable training, and broader hardware compatibility. One-step and two-step SplitMeanFlow models have been successfully deployed in large-scale speech synthesis products (such as Doubao), achieving speedups of 20x.
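The additivity argument can be written out explicitly. The notation below is an assumption for illustration and is not taken from the abstract: $v$ is the instantaneous velocity, $z_\tau$ the state at time $\tau$, and $u(z_t, r, t)$ the average velocity over $[r, t]$, with $r \le s \le t$.

```latex
% Average velocity over the interval [r, t]
u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau

% Additivity of the definite integral at an intermediate time s
\int_r^t v(z_\tau, \tau)\, d\tau
  = \int_r^s v(z_\tau, \tau)\, d\tau + \int_s^t v(z_\tau, \tau)\, d\tau

% Resulting algebraic consistency relation (no derivatives involved)
(t - r)\, u(z_t, r, t) = (s - r)\, u(z_s, r, s) + (t - s)\, u(z_t, s, t)

% Dividing by (t - s) and letting s -> t gives a differential identity
u(z_t, r, t) + (t - r)\, \frac{d}{dt} u(z_t, r, t) = v(z_t, t)
```

The third line involves only evaluations of $u$ on three intervals, which is consistent with the abstract's claim that no JVP computation is needed; the last line is the limiting differential form.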
Article 394
Title@2025-07-22 (2): T-GRAB: A Synthetic Diagnostic Benchmark for Learning on Temporal Graphs
Title: T-GRAB: A Synthetic Diagnostic Benchmark for Learning on Temporal Graphs | T-GRAB: Ein synthetischer Diagnose-Benchmark für das Lernen auf zeitlichen Graphen | T-GRAB: 时间图学习的合成诊断基准 2507.10183v2 |
Authors (5): Alireza Dizaji, Benedict Aaron Tjandra, Mehrab Hamidi, Shenyang Huang, Guillaume Rabusseau
Dynamic graph learning methods have recently emerged as powerful tools for modelling relational data evolving through time. However, despite extensive benchmarking efforts, it remains unclear whether current Temporal Graph Neural Networks (TGNNs) effectively capture core temporal patterns such as periodicity, cause-and-effect, and long-range dependencies. In this work, we introduce the Temporal Graph Reasoning Benchmark (T-GRAB), a comprehensive set of synthetic tasks designed to systematically probe the capabilities of TGNNs to reason across time. T-GRAB provides controlled, interpretable tasks that isolate key temporal skills: counting/memorizing periodic repetitions, inferring delayed causal effects, and capturing long-range dependencies over both spatial and temporal dimensions. We evaluate 11 temporal graph learning methods on these tasks, revealing fundamental shortcomings in their ability to generalize temporal patterns. Our findings offer actionable insights into the limitations of current models, highlight challenges hidden by traditional real-world benchmarks, and motivate the development of architectures with stronger temporal reasoning abilities. The code for T-GRAB can be found at: https://github.com/alirezadizaji/T-GRAB.
Article 395
Title@2025-07-22 (2): Improving Model Classification by Optimizing the Training Dataset
Title: Improving Model Classification by Optimizing the Training Dataset | Verbesserung der Modellklassifikation durch Optimierung des Trainingsdatensatzes | 通过优化培训数据集改进示范分类 2507.16729v1 |
Authors (4): Morad Tukan, Loay Mualem, Eitan Netzer, Liran Sigalat
In the era of data-centric AI, the ability to curate high-quality training data is as crucial as model design. Coresets offer a principled approach to data reduction, enabling efficient learning on large datasets through importance sampling. However, conventional sensitivity-based coreset construction often falls short in optimizing for classification performance metrics, e.g., the $F1$ score, focusing instead on loss approximation. In this work, we present a systematic framework for tuning the coreset generation process to enhance downstream classification quality. Our method introduces new tunable parameters (including deterministic sampling, class-wise allocation, and refinement via active sampling) beyond traditional sensitivity scores. Through extensive experiments on diverse datasets and classifiers, we demonstrate that tuned coresets can significantly outperform both vanilla coresets and full dataset training on key classification metrics, offering an effective path towards better and more efficient model training.
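For context, the conventional sensitivity-based baseline that the abstract contrasts against can be sketched as plain importance sampling. The sketch assumes the per-point sensitivity scores are already given; `sample_coreset` and its parameters are hypothetical names, and none of the paper's tunable extensions (deterministic sampling, class-wise allocation, active refinement) are implemented here.

```python
import random


def sample_coreset(sensitivities, m, seed=0):
    """Draw m points with probability proportional to their sensitivity.

    Returns (index, weight) pairs. Each weight is 1 / (m * p_i), so the
    weighted sum of per-point losses over the coreset is an unbiased
    estimate of the sum over the full dataset.
    """
    total = sum(sensitivities)
    probs = [s / total for s in sensitivities]
    rng = random.Random(seed)  # fixed seed for reproducibility
    indices = rng.choices(range(len(probs)), weights=probs, k=m)
    return [(i, 1.0 / (m * probs[i])) for i in indices]
```

With uniform sensitivities this reduces to uniform sampling in which every selected point carries weight n / m.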
Article 396
Title@2025-07-22 (2): The Joys of Categorical Conformal Prediction
Title: The Joys of Categorical Conformal Prediction | Die Freuden der kategorischen konformen Vorhersage | 分类共变预言的欢乐 2507.04441v2 |
Authors (1): Michele Caprio
Conformal prediction (CP) is an Uncertainty Representation technique that delivers finite-sample calibrated prediction regions for any underlying Machine Learning model. Its status as an Uncertainty Quantification (UQ) tool, though, has remained conceptually opaque: While Conformal Prediction Regions (CPRs) give an ordinal representation of uncertainty (larger regions typically indicate higher uncertainty), they lack the capability to cardinally quantify it (twice as large regions do not imply twice the uncertainty). We adopt a category-theoretic approach to CP – framing it as a morphism, embedded in a commuting diagram, of two newly-defined categories – that brings us three joys. First, we show that – under minimal assumptions – CP is intrinsically a UQ mechanism, that is, its cardinal UQ capabilities are a structural feature of the method. Second, we demonstrate that CP bridges (and perhaps subsumes) the Bayesian, frequentist, and imprecise probabilistic approaches to predictive statistical reasoning. Finally, we show that a CPR is the image of a covariant functor. This observation is relevant to AI privacy: It implies that privacy noise added locally does not break the global coverage guarantee.
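As a concrete reference point for what a finite-sample calibrated prediction region looks like, here is a minimal sketch of split conformal prediction for regression. The abstract does not say which CP variant is considered, so this choice, the function names, and the use of absolute residuals as the score are assumptions.

```python
import math


def conformal_quantile(scores, alpha):
    """The ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    Returns infinity when the calibration set is too small for the
    requested coverage level.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def prediction_interval(point_prediction, calibration_residuals, alpha=0.1):
    """Interval around a point prediction using held-out residuals."""
    scores = [abs(r) for r in calibration_residuals]
    q = conformal_quantile(scores, alpha)
    return (point_prediction - q, point_prediction + q)
```

Under exchangeability of calibration and test points, an interval built this way covers the true response with probability at least 1 - alpha. The width grows as alpha shrinks, which is the ordinal reading of uncertainty the abstract mentions.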
Article 397
Title@2025-07-22 (2): Multi-objective Portfolio Optimization Via Gradient Descent
Title: Multi-objective Portfolio Optimization Via Gradient Descent | Multi-objektive Portfolio-Optimierung durch gradienten Abstieg | 多目标组合优化组合 2507.16717v1 |
Authors (3): Christian Oliva, Pedro R. Ventura, Luis F. Lago-Fernández
Traditional approaches to portfolio optimization, often rooted in Modern Portfolio Theory and solved via quadratic programming or evolutionary algorithms, struggle with scalability or flexibility, especially in scenarios involving complex constraints, large datasets and/or multiple conflicting objectives. To address these challenges, we introduce a benchmark framework for multi-objective portfolio optimization (MPO) using gradient descent with automatic differentiation. Our method supports any optimization objective, such as minimizing risk measures (e.g., CVaR) or maximizing Sharpe ratio, along with realistic constraints, such as tracking error limits, UCITS regulations, or asset group restrictions. We have evaluated our framework across six experimental scenarios, from single-objective setups to complex multi-objective cases, and have compared its performance against standard solvers like CVXPY and SKFOLIO. Our results show that our method achieves competitive performance while offering enhanced flexibility for modeling multiple objectives and constraints. We aim to provide a practical and extensible tool for researchers and practitioners exploring advanced portfolio optimization problems in real-world conditions.
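A toy version of gradient-based portfolio optimization can be sketched in pure Python. Several things here are assumptions rather than the paper's setup: the objective (variance minus a weighted expected return), the softmax parameterization that keeps weights long-only and summing to one, and all function names. The gradient is written out by hand, whereas the framework described above relies on automatic differentiation.

```python
import math


def softmax(theta):
    """Map unconstrained parameters to long-only weights that sum to 1."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def loss_gradient_wrt_weights(w, cov, mu, return_weight):
    """Gradient of  w' cov w - return_weight * mu' w  (cov symmetric)."""
    n = len(w)
    return [
        2.0 * sum(cov[i][j] * w[j] for j in range(n)) - return_weight * mu[i]
        for i in range(n)
    ]


def optimize_portfolio(cov, mu, return_weight=0.0, lr=0.5, steps=500):
    """Plain gradient descent on the softmax parameters."""
    theta = [0.0] * len(mu)
    for _ in range(steps):
        w = softmax(theta)
        g = loss_gradient_wrt_weights(w, cov, mu, return_weight)
        g_dot_w = sum(gi * wi for gi, wi in zip(g, w))
        # Chain rule through softmax: dL/dtheta_i = w_i * (g_i - g . w)
        theta = [t - lr * wi * (gi - g_dot_w)
                 for t, wi, gi in zip(theta, w, g)]
    return softmax(theta)
```

For two uncorrelated assets with variances 1 and 4 and `return_weight=0.0`, the minimum-variance solution allocates in inverse proportion to variance, so the weights should approach 0.8 and 0.2. Other objectives or constraints would change the loss and its gradient, which is where automatic differentiation becomes convenient.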
Article 398
Title@2025-07-22 (2): Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data
Title: Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data | Erlernen ursächlich vorhersehbarer Ergebnisse aus Psychiatrischen Langzeitdaten | 精神病纵向数据产生的可预期的学习结果 2506.16629v3 |
Authors (1): Eric V. Strobl
Causal inference in longitudinal biomedical data remains a central challenge, especially in psychiatry, where symptom heterogeneity and latent confounding frequently undermine classical estimators. Most existing methods for treatment effect estimation presuppose a fixed outcome variable and address confounding through observed covariate adjustment. However, the assumption of unconfoundedness may not hold for a fixed outcome in practice. To address this foundational limitation, we directly optimize the outcome definition to maximize causal identifiability. Our DEBIAS (Durable Effects with Backdoor-Invariant Aggregated Symptoms) algorithm learns non-negative, clinically interpretable weights for outcome aggregation, maximizing durable treatment effects and empirically minimizing both observed and latent confounding by leveraging the time-limited direct effects of prior treatments in psychiatric longitudinal data. The algorithm also furnishes an empirically verifiable test for outcome unconfoundedness. DEBIAS consistently outperforms state-of-the-art methods in recovering causal effects for clinically interpretable composite outcomes across comprehensive experiments in depression and schizophrenia.
Article 399
Title@2025-07-22 (2): Screen2AX: Vision-Based Approach for Automatic macOS Accessibility Generation
Title: Screen2AX: Vision-Based Approach for Automatic macOS Accessibility Generation | Screen2AX: Vision-basierter Ansatz für automatische macOS-Zugänglichkeitsgenerierung | Screen2AX:以愿景为基础的自动 MacOS无障碍生成方法 2507.16704v1 |
Authors (5): Viktor Muryn, Marta Sumyk, Mariya Hirna, Sofiya Garkot, Maksym Shamrai
Desktop accessibility metadata enables AI agents to interpret screens and supports users who depend on tools like screen readers. Yet many applications remain largely inaccessible due to incomplete or missing metadata provided by developers: our investigation shows that only 33% of applications on macOS offer full accessibility support. While recent work on structured screen representation has primarily addressed specific challenges, such as UI element detection or captioning, none has attempted to capture the full complexity of desktop interfaces by replicating their entire hierarchical structure. To bridge this gap, we introduce Screen2AX, the first framework to automatically create real-time, tree-structured accessibility metadata from a single screenshot. Our method uses vision-language and object detection models to detect, describe, and organize UI elements hierarchically, mirroring macOS’s system-level accessibility structure. To tackle the limited availability of data for macOS desktop applications, we compiled and publicly released three datasets encompassing 112 macOS applications, each annotated for UI element detection, grouping, and hierarchical accessibility metadata alongside corresponding screenshots. Screen2AX accurately infers hierarchy trees, achieving a 77% F1 score in reconstructing a complete accessibility tree. Crucially, these hierarchy trees improve the ability of autonomous agents to interpret and interact with complex desktop interfaces. We introduce Screen2AX-Task, a benchmark specifically designed for evaluating autonomous agent task execution in macOS desktop environments. Using this benchmark, we demonstrate that Screen2AX delivers a 2.2x performance improvement over native accessibility representations and surpasses the state-of-the-art OmniParser V2 system on the ScreenSpot benchmark.
Article 400
Title@2025-07-22 (2): Pixel-Resolved Long-Context Learning for Turbulence at Exascale: Resolving Small-scale Eddies Toward the Viscous Limit
Title: Pixel-Resolved Long-Context Learning for Turbulence at Exascale: Resolving Small-scale Eddies Toward the Viscous Limit | Pixel-Resolved Long-Context Learning for Turbulence at Exascale: Lösung kleiner Eddies auf dem Weg zur Viskosegrenze | 用像素解解析的超大型扰动远程学习长像学习:解决小型艾迪问题以达到微声限制 2507.16697v1 |
Authors (9): Junqi Yin, Mijanur Palash, M. Paul Laiu, Muralikrishnan Gopalakrishnan Meena, John Gounley, Stephen M. de Bruyn Kops, Feiyi Wang, Ramanan Sankaran, Pei Zhang
Turbulence plays a crucial role in multiphysics applications, including aerodynamics, fusion, and combustion. Accurately capturing turbulence’s multiscale characteristics is essential for reliable predictions of multiphysics interactions, but remains a grand challenge even for exascale supercomputers and advanced deep learning models. The extreme-resolution data required to represent turbulence, ranging from billions to trillions of grid points, pose prohibitive computational costs for models based on architectures like vision transformers. To address this challenge, we introduce a multiscale hierarchical Turbulence Transformer that reduces sequence length from billions to a few million and a novel RingX sequence parallelism approach that enables scalable long-context learning. We perform scaling and science runs on the Frontier supercomputer. Our approach demonstrates excellent performance up to 1.1 EFLOPS on 32,768 AMD GPUs, with a scaling efficiency of 94%. To our knowledge, this is the first AI model for turbulence that can capture small-scale eddies down to the dissipative range.
Article 401
Title@2025-07-22 (2): Confidence Optimization for Probabilistic Encoding
Title: Confidence Optimization for Probabilistic Encoding | Vertrauensoptimierung für die probabilistische Kodierung | 概率编码的可信度优化 2507.16881v1 |
Authors (4): Pengjiu Xia, Yidian Huang, Wenchao Wei, Yuwen Tan
Probabilistic encoding introduces Gaussian noise into neural networks, enabling a smooth transition from deterministic to uncertain states and enhancing generalization ability. However, the randomness of Gaussian noise distorts point-based distance measurements in classification tasks. To mitigate this issue, we propose a confidence optimization probabilistic encoding (CPE) method that improves distance reliability and enhances representation learning. Specifically, we refine probabilistic encoding with two key strategies. First, we introduce a confidence-aware mechanism to adjust distance calculations, ensuring consistency and reliability in probabilistic encoding classification tasks. Second, we replace the conventional KL divergence-based variance regularization, which relies on unreliable prior assumptions, with a simpler L2 regularization term to directly constrain variance. The proposed method is model-agnostic, and extensive experiments on natural language classification tasks demonstrate that it significantly improves performance and generalization on both the BERT and RoBERTa models.
Article 402
Title@2025-07-22 (2): FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation
Title: FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation | FISCHER: Ein Basismodell für die umfassende Vertretung multimodaler industrieller Signale | 多模式工业信号综合代表制基金会模式 2507.16696v1 |
Authors (13): Pingyi Fan, Anbai Jiang, Shuwei Zhang, Zhiqiang Lv, Bing Han, Xinhu Zheng, Wenrui Liang, Junjie Li, Wei-Qiang Zhang, Yanmin Qian, Xie Chen, Cheng Lu, Jia Liu
With the rapid deployment of SCADA systems, effectively analyzing industrial signals and detecting abnormal states has become an urgent need for the industry. Due to the significant heterogeneity of these signals, which we summarize as the M5 problem, previous works focus only on small sub-problems and employ specialized models, failing to utilize the synergies between modalities and the powerful scaling law. However, we argue that the M5 signals can be modeled in a unified manner owing to their intrinsic similarity. As a result, we propose FISHER, a Foundation model for multi-modal Industrial Signal compreHEnsive Representation. To support arbitrary sampling rates, FISHER considers the increment of sampling rate as the concatenation of sub-band information. Specifically, FISHER takes the STFT sub-band as the modeling unit and adopts a teacher-student SSL framework for pre-training. We also develop the RMIS benchmark, which evaluates the representations of M5 industrial signals on multiple health management tasks. Compared with top SSL models, FISHER showcases versatile and outstanding capabilities with a general performance gain of up to 5.03%, along with much more efficient scaling curves. We also investigate the scaling law on downstream tasks and derive potential avenues for future works. FISHER is now open-sourced at https://github.com/jianganbai/FISHER.
Article 403
Title@2025-07-22 (2): Interpretable Topic Extraction and Word Embedding Learning using row-stochastic DEDICOM
Title: Interpretable Topic Extraction and Word Embedding Learning using row-stochastic DEDICOM | Interpretierbare Themenextraktion und Wort-Embedding Lernen mit zeilenstochastischem DEDICOM | 利用行可查的DEDICOM进行可解释专题抽取和单词嵌入学习 2507.16695v1 |
Authors (4): Lars Hillebrand, David Biesner, Christian Bauckhage, Rafet Sifa
The DEDICOM algorithm provides a uniquely interpretable matrix factorization method for symmetric and asymmetric square matrices. We employ a new row-stochastic variation of DEDICOM on the pointwise mutual information matrices of text corpora to identify latent topic clusters within the vocabulary and simultaneously learn interpretable word embeddings. We introduce a method to efficiently train a constrained DEDICOM algorithm and a qualitative evaluation of its topic modeling and word embedding performance.
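The factorization can be stated compactly. The symbols below are assumed for illustration: $S$ is the $n \times n$ pointwise mutual information matrix over a vocabulary of $n$ words, $A$ is an $n \times k$ loading matrix, and $R$ is a $k \times k$ affinity matrix over $k$ latent topics. The row-stochastic constraint is read here as non-negative rows of $A$ that sum to one.

```latex
% DEDICOM-style factorization of a square matrix
S \approx A R A^{\top}, \qquad
A \in \mathbb{R}^{n \times k}, \quad R \in \mathbb{R}^{k \times k}

% Row-stochastic constraint on the loading matrix
A_{ij} \ge 0, \qquad \sum_{j=1}^{k} A_{ij} = 1 \quad \text{for all } i

% Reconstruction objective
\min_{A, R} \; \lVert S - A R A^{\top} \rVert_F^2
```

Under this reading, row $i$ of $A$ acts both as a distribution over topics for word $i$ and as its $k$-dimensional embedding, while $R$ describes how topics relate to each other.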
Article 404
Title@2025-07-22 (2): Universal Model Routing for Efficient LLM Inference
Title: Universal Model Routing for Efficient LLM Inference | Universelle Modellführung für effiziente LLM-Inferenz | 高效LLM 推导法通用通用模型规则 2502.08773v2 |
Authors (12): Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Jeevesh Juneja, Congchao Wang, Zifeng Wang, Alec Go, Chen-Yu Lee, Pradeep Shenoy, Rina Panigrahy, Aditya Krishna Menon, Sanjiv Kumar
Model routing is a simple technique for reducing the inference cost of large language models (LLMs), wherein one maintains a pool of candidate LLMs, and learns to route each prompt to the smallest feasible LLM. Existing works focus on learning a router for a fixed pool of LLMs. In this paper, we consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We propose UniRoute, a new approach to this problem that relies on representing each LLM as a feature vector, derived based on predictions on a set of representative prompts. Based on this, we detail two effective instantiations of UniRoute, relying on cluster-based routing and a learned cluster map respectively. We show that these are estimates of a theoretically optimal routing rule, and quantify their errors via an excess risk bound. Experiments on a range of public benchmarks show the effectiveness of UniRoute in routing amongst more than 30 unseen LLMs.
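The cluster-based idea can be sketched roughly as follows, under several assumptions not stated in the abstract: prompt embeddings and cluster centroids are given, each LLM's feature vector is its per-cluster error rate on the representative prompts, and routing trades estimated error against a per-model cost. All names and the `cost_weight` parameter are hypothetical.

```python
import math


def nearest_cluster(embedding, centroids):
    """Index of the centroid closest (Euclidean) to the prompt embedding."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(embedding, centroids[k]))


def llm_feature_vector(correct, cluster_ids, num_clusters):
    """Per-cluster error rate of one LLM on the representative prompts.

    correct[i] is 1 if the LLM answered representative prompt i correctly;
    cluster_ids[i] is that prompt's cluster. Empty clusters default to 1.0.
    """
    features = []
    for k in range(num_clusters):
        outcomes = [c for c, cid in zip(correct, cluster_ids) if cid == k]
        error = (1.0 - sum(outcomes) / len(outcomes)) if outcomes else 1.0
        features.append(error)
    return features


def route(embedding, centroids, llm_features, llm_costs, cost_weight=0.1):
    """Pick the LLM with the lowest estimated error plus weighted cost."""
    k = nearest_cluster(embedding, centroids)
    return min(llm_features,
               key=lambda name: llm_features[name][k]
               + cost_weight * llm_costs[name])
```

In this sketch, adding a previously unseen LLM only requires evaluating it on the representative prompts to obtain its feature vector; nothing about the router itself is retrained.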
Article 405
Title@2025-07-22 (2): Multilevel Picard approximations and deep neural networks with ReLU, leaky ReLU, and softplus activation overcome the curse of dimensionality when approximating semilinear parabolic partial differential equations in $L^p$-sense
Title: Multilevel Picard approximations and deep neural networks with ReLU, leaky ReLU, and softplus activation overcome the curse of dimensionality when approximating semilinear parabolic partial differential equations in $L^p$-sense | Mehrstufige Picard-Annäherungen und tiefe neuronale Netzwerke mit ReLU, undichter ReLU und Softplus-Aktivierung überwinden den Fluch der Dimensionalität, wenn sie semilineare parabolische partielle Differentialgleichungen in $L^p$-Sense annähern | 多级 Piccar 近似和深神经网络,与 ReLU、 泄漏 ReLU 和软附加激活 克服了维度的诅咒, 当半线性半线性抛抛物线部分偏差方程以 $Lp$- sense 等值接近一致时 2409.20431v4 |
Authors (2): Ariel Neufeld, Tuan Anh Nguyen
We prove that multilevel Picard approximations and deep neural networks with ReLU, leaky ReLU, and softplus activation are capable of approximating solutions of semilinear Kolmogorov PDEs in $L^\mathfrak{p}$-sense, $\mathfrak{p}\in [2,\infty)$, in the case of gradient-independent, Lipschitz-continuous nonlinearities, while the computational effort of the multilevel Picard approximations and the required number of parameters in the neural networks grow at most polynomially in both dimension $d\in \mathbb{N}$ and reciprocal of the prescribed accuracy $\epsilon$.
Article 406
Title@2025-07-22 (2): Structural Effect and Spectral Enhancement of High-Dimensional Regularized Linear Discriminant Analysis
Title: Structural Effect and Spectral Enhancement of High-Dimensional Regularized Linear Discriminant Analysis | Strukturelle Wirkung und spektrale Verbesserung der hochdimensionalen Regularisierten Linearen Diskriminanzanalyse | 结构效应和高分层常规线性分线差异分析的光谱增强 2507.16682v1 |
Authors (4): Yonghan Zhang, Zhangni Pu, Lu Yan, Jiang Hu
Regularized linear discriminant analysis (RLDA) is a widely used tool for classification and dimensionality reduction, but its performance in high-dimensional scenarios is inconsistent. Existing theoretical analyses of RLDA often lack clear insight into how data structure affects classification performance. To address this issue, we derive a non-asymptotic approximation of the misclassification rate and thus analyze the structural effect and structural adjustment strategies of RLDA. Based on this, we propose the Spectral Enhanced Discriminant Analysis (SEDA) algorithm, which optimizes the data structure by adjusting the spiked eigenvalues of the population covariance matrix. By developing a new theoretical result on eigenvectors in random matrix theory, we derive an asymptotic approximation on the misclassification rate of SEDA. The bias correction algorithm and parameter selection strategy are then obtained. Experiments on synthetic and real datasets show that SEDA achieves higher classification accuracy and dimensionality reduction compared to existing LDA methods.
Article 407
Title@2025-07-22 (2): Deep Unfolding Network for Nonlinear Multi-Frequency Electrical Impedance Tomography
Title: Deep Unfolding Network for Nonlinear Multi-Frequency Electrical Impedance Tomography | Deep Unfolding Netzwerk für nichtlineare Multi-Frequenz elektrische Impedanz Tomographie | 非线性多功能多功能电气阻力断层造影的深载网络 2507.16678v1 |
Authors (5): Giovanni S. Alberti, Damiana Lazzaro, Serena Morigi, Luca Ratti, Matteo Santacesaria
Multi-frequency Electrical Impedance Tomography (mfEIT) represents a promising biomedical imaging modality that enables the estimation of tissue conductivities across a range of frequencies. Addressing this challenge, we present a novel variational network, a model-based learning paradigm that strategically merges the advantages and interpretability of classical iterative reconstruction with the power of deep learning. This approach integrates graph neural networks (GNNs) within the iterative Proximal Regularized Gauss Newton (PRGN) framework. By unrolling the PRGN algorithm, where each iteration corresponds to a network layer, we leverage the physical insights of nonlinear model fitting alongside the GNN’s capacity to capture inter-frequency correlations. Notably, the GNN architecture preserves the irregular triangular mesh structure used in the solution of the nonlinear forward model, enabling accurate reconstruction of overlapping tissue fraction concentrations.
Article 408
Title@2025-07-22 (2): Custom Algorithm-based Fault Tolerance for Attention Layers in Transformers
Title: Custom Algorithm-based Fault Tolerance for Attention Layers in Transformers | Benutzerdefinierte Algorithmen-basierte Fehlertoleranz für Aufmerksamkeitsschichten in Transformatoren | 自定义基于 ALgorithm 的对变换器中注意层的不宽容 2507.16676v1 |
Authors (3): Vasileios Titopoulos, Kosmas Alexandridis, Giorgos Dimitrakopoulos
Transformers and large language models (LLMs), powered by the attention mechanism, have transformed numerous AI applications, driving the need for specialized hardware accelerators. A major challenge in these accelerators is efficiently detecting errors caused by random hardware faults. Traditional algorithm-based fault tolerance (ABFT) techniques verify individual matrix multiplications but fall short in handling the full attention mechanism, particularly due to intermediate softmax normalization. This work proposes Flash-ABFT, a novel method that computes an online checksum across the entire three-matrix product of the query, key, and value matrices of an attention layer, including the softmax operation, with a single check. This approach significantly reduces overhead by eliminating redundant checks while maintaining high fault-detection accuracy. Experimental results demonstrate that Flash-ABFT incurs only 5.3% hardware area overhead and less than 1.9% energy overhead, making it a cost-effective and robust solution for error detection in attention accelerators.
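For context, the traditional per-multiplication ABFT check mentioned above can be sketched as below. This is the classic checksum identity for a single product C = AB (the sum of all entries of C equals the column sums of A dotted with the row sums of B), not Flash-ABFT itself. The function name `checksum_ok` and the tolerance are illustrative assumptions.

```python
import numpy as np

def checksum_ok(a, b, c, rtol=1e-9):
    """Classic ABFT check for c = a @ b.

    Uses the identity sum(C) = (1^T A)(B 1): the predicted checksum needs
    only the column sums of A and the row sums of B.
    """
    predicted = a.sum(axis=0) @ b.sum(axis=1)
    actual = c.sum()
    return bool(np.isclose(predicted, actual, rtol=rtol))

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 3))
b = rng.normal(size=(3, 5))
c = a @ b
clean_ok = checksum_ok(a, b, c)

# Emulate a hardware fault by corrupting a single output element.
faulty = c.copy()
faulty[1, 2] += 1.0
fault_ok = checksum_ok(a, b, faulty)
```

This identity relies on linearity, so it does not carry through the softmax between the two attention products, which is the limitation the abstract attributes to traditional ABFT.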
Article 409
Title@2025-07-22 (2): GASPnet: Global Agreement to Synchronize Phases
Title: GASPnet: Global Agreement to Synchronize Phases | GASPnet: Globales Abkommen zur Synchronisierung von Phasen | GASPnet:同步阶段全球协定 2507.16674v1 |
Authors (4): Andrea Alamiaa, Sabine Muzellec, Thomas Serre, Rufin VanRullen
In recent years, Transformer architectures have revolutionized most fields of artificial intelligence, relying on an attentional mechanism based on the agreement between keys and queries to select and route information in the network. In previous work, we introduced a novel, brain-inspired architecture that leverages a similar implementation to achieve a global ‘routing by agreement’ mechanism. Such a system modulates the network’s activity by matching each neuron’s key with a single global query, pooled across the entire network. Acting as a global attentional system, this mechanism improves noise robustness over baseline levels but is insufficient for multi-classification tasks. Here, we improve on this work by proposing a novel mechanism that combines aspects of the Transformer attentional operations with a compelling neuroscience theory, namely, binding by synchrony. This theory proposes that the brain binds together features by synchronizing the temporal activity of neurons encoding those features. This allows the binding of features from the same object while efficiently disentangling those from distinct objects. We drew inspiration from this theory and incorporated angular phases into all layers of a convolutional network. After achieving phase alignment via Kuramoto dynamics, we use this approach to enhance operations between neurons with similar phases and suppress those between neurons with opposite phases. We test the benefits of this mechanism on two datasets: one composed of pairs of digits and one composed of an MNIST item superimposed on a CIFAR-10 image. Our results reveal better accuracy than CNNs, greater robustness to noise, and better generalization abilities. Overall, we propose a novel mechanism that addresses the visual binding problem in neural networks by leveraging the synergy between neuroscience and machine learning.
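To make the phase-alignment step more concrete, here is a minimal sketch of the standard all-to-all Kuramoto model, dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i), integrated with explicit Euler steps. It is a generic textbook version under assumed settings (identical natural frequencies, hypothetical coupling strength and step size), not the layer-wise phase mechanism of GASPnet.

```python
import numpy as np

def kuramoto_step(theta, omega, coupling, dt):
    """One Euler step of d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)."""
    diff = theta[None, :] - theta[:, None]  # diff[i, j] = theta_j - theta_i
    return theta + dt * (omega + coupling * np.sin(diff).mean(axis=1))

def order_parameter(theta):
    """Synchrony measure r in [0, 1]; r = 1 means all phases coincide."""
    return float(np.abs(np.exp(1j * theta).mean()))

rng = np.random.default_rng(0)
n = 50
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)  # random initial phases
omega = np.zeros(n)                            # assumption: identical natural frequencies
r_start = order_parameter(theta)

for _ in range(400):
    theta = kuramoto_step(theta, omega, coupling=2.0, dt=0.05)
r_end = order_parameter(theta)
```

The order parameter rises from a low initial value toward 1 as the oscillators lock, which is the kind of alignment the abstract relies on before comparing neurons by phase.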
Article 410
Title@2025-07-22 (2): Meta-Learning for Cold-Start Personalization in Prompt-Tuned LLMs
Title: Meta-Learning for Cold-Start Personalization in Prompt-Tuned LLMs | Meta-Learning für die Kaltstart-Personalisierung in LLMs | 以即时引导的LMM 实现低天起的个性化的元学习 2507.16672v1 |
Authors (6): Yushang Zhao, Huijie Shen, Dannier Li, Lu Chang, Chengrui Zhou, Yinuo Yang
Generative, explainable, and flexible recommender systems built on Large Language Models (LLMs) are promising but poorly adapted to the cold-start user situation, where there is little to no interaction history. Current solutions, i.e., supervised fine-tuning and collaborative filtering, focus on dense user-item data and are expensive to maintain and update. This paper introduces a meta-learning framework that performs parameter-efficient prompt-tuning to personalize LLM-based recommender systems quickly and effectively at cold start. The model learns soft prompt embeddings with first-order (Reptile) and second-order (MAML) optimization by treating each user as a task. As augmentations to the input tokens, these learnable vectors are differentiable control variables that represent user behavioral priors. The prompts are meta-optimized through episodic sampling, inner-loop adaptation, and outer-loop generalization. On MovieLens-1M, Amazon Reviews, and Recbole, our adaptive model outperforms strong baselines in NDCG@10, HR@10, and MRR, and it runs in real time (i.e., below 300 ms) on consumer GPUs. This scalable solution also supports zero-history personalization. Crucially, its 275 ms adaptation time can enable real-time risk profiling for financial institutions, significantly reducing the detection latency of systemic vulnerabilities compared with traditional compliance checks. By preventing contagion in payment networks (e.g., Fedwire), the framework strengthens the resilience of national financial infrastructure.
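The first-order (Reptile) variant of this inner-loop and outer-loop scheme can be sketched on a toy problem. In the sketch below the soft prompt is a small vector, each user's loss is an assumed quadratic distance to a hypothetical per-user target, and the learning rates and step counts are illustrative. No LLM is involved, and this is not the authors' training code.

```python
import numpy as np

def inner_adapt(prompt, target, lr=0.1, steps=5):
    """Inner loop: a few gradient steps on a toy per-user loss ||prompt - target||^2."""
    phi = prompt.copy()
    for _ in range(steps):
        grad = 2.0 * (phi - target)
        phi -= lr * grad
    return phi

def reptile_meta_step(prompt, user_targets, meta_lr=0.5):
    """Outer loop: move the shared prompt toward the average adapted prompt."""
    adapted = np.stack([inner_adapt(prompt, t) for t in user_targets])
    return prompt + meta_lr * (adapted.mean(axis=0) - prompt)

# Eight hypothetical users (tasks), each with a 4-dimensional target prompt.
rng = np.random.default_rng(0)
user_targets = rng.normal(loc=1.0, scale=0.3, size=(8, 4))
prompt = np.zeros(4)
for _ in range(100):
    prompt = reptile_meta_step(prompt, user_targets)
```

On this toy objective the meta-learned prompt settles at the mean of the user targets, an initialization from which a few inner steps reach any single user quickly. That is the intuition behind using it for cold-start users.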
Article 411
Title@2025-07-22 (2): Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed
Title: Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed | Dori finden: Erinnerung in Text-zu-Bild-Diffusions-Modellen ist weniger lokal als angenommen | 查找 Dori : 文本到图像传播模型的记忆比假设的要小 2507.16880v1 |
Authors (6): Antoni Kowalczuk, Dominik Hintersdorf, Lukas Struppek, Kristian Kersting, Adam Dziedzic, Franziska Boenisch
Text-to-image diffusion models (DMs) have achieved remarkable success in image generation. However, concerns about data privacy and intellectual property remain due to their potential to inadvertently memorize and replicate training data. Recent mitigation efforts have focused on identifying and pruning weights responsible for triggering replication, based on the assumption that memorization can be localized. Our research assesses the robustness of these pruning-based approaches. We demonstrate that even after pruning, minor adjustments to text embeddings of input prompts are sufficient to re-trigger data replication, highlighting the fragility of these defenses. Furthermore, we challenge the fundamental assumption of memorization locality, by showing that replication can be triggered from diverse locations within the text embedding space, and follows different paths in the model. Our findings indicate that existing mitigation strategies are insufficient and underscore the need for methods that truly remove memorized content, rather than attempting to suppress its retrieval. As a first step in this direction, we introduce a novel adversarial fine-tuning method that iteratively searches for replication triggers and updates the model to increase robustness. Through our research, we provide fresh insights into the nature of memorization in text-to-image DMs and a foundation for building more trustworthy and compliant generative AI.
Article 412
Title@2025-07-22 (2): FLAIN: Mitigating Backdoor Attacks in Federated Learning via Flipping Weight Updates of Low-Activation Input Neurons
Title: FLAIN: Mitigating Backdoor Attacks in Federated Learning via Flipping Weight Updates of Low-Activation Input Neurons | FLAIN: Hecktürangriffe im Federated Learning durch kippende Gewichtsaktualisierungen von Niedrig-Aktivierungs-Eingangs-Neuronen abmildern | FLAIN:通过降低低活性输入神经的重量更新,减少联邦学习中的后门攻击 2408.08655v2 |
Authors (3): Binbin Ding, Penghui Yang, Sheng-Jun Huang
Federated learning (FL) enables multiple clients to collaboratively train machine learning models under the coordination of a central server, while maintaining privacy. However, the server cannot directly monitor the local training processes, leaving room for malicious clients to introduce backdoors into the model. Research has shown that backdoor attacks exploit specific neurons that are activated only by malicious inputs, remaining dormant with clean data. Building on this insight, we propose a novel defense method called Flipping Weight Updates of Low-Activation Input Neurons (FLAIN) to counter backdoor attacks in FL. Specifically, upon the completion of global training, we use an auxiliary dataset to identify low-activation input neurons and iteratively flip their associated weight updates. This flipping process continues while progressively raising the threshold for low-activation neurons, until the model’s performance on the auxiliary data begins to degrade significantly. Extensive experiments demonstrate that FLAIN effectively reduces the success rate of backdoor attacks across a variety of scenarios, including Non-IID data distributions and high malicious client ratios (MCR), while maintaining minimal impact on the performance of clean data.
Article 413
Title@2025-07-22 (2): Recent Advances in Malware Detection: Graph Learning and Explainability
Title: Recent Advances in Malware Detection: Graph Learning and Explainability | Neueste Fortschritte bei der Malware-Erkennung: Graphisches Lernen und Erklärbarkeit | 错误软件探测:图表学习和可解释性方面的最新进展 2502.10556v2 |
Authors (7): Hossein Shokouhinejad, Roozbeh Razavi-Far, Hesamodin Mohammadian, Mahdi Rabbani, Samuel Ansong, Griffin Higgins, Ali A Ghorbani
The rapid evolution of malware has necessitated the development of sophisticated detection methods that go beyond traditional signature-based approaches. Graph learning techniques have emerged as powerful tools for modeling and analyzing the complex relationships inherent in malware behavior, leveraging advancements in Graph Neural Networks (GNNs) and related methods. This survey provides a comprehensive exploration of recent advances in malware detection, focusing on the interplay between graph learning and explainability. It begins by reviewing malware analysis techniques and datasets, emphasizing their foundational role in understanding malware behavior and supporting detection strategies. The survey then discusses feature engineering, graph reduction, and graph embedding methods, highlighting their significance in transforming raw data into actionable insights, while ensuring scalability and efficiency. Furthermore, this survey focuses on explainability techniques and their applications in malware detection, ensuring transparency and trustworthiness. By integrating these components, this survey demonstrates how graph learning and explainability contribute to building robust, interpretable, and scalable malware detection systems. Future research directions are outlined to address existing challenges and unlock new opportunities in this critical area of cybersecurity.
Article 414
Title@2025-07-22 (2): Quantum Cognition Machine Learning for Forecasting Chromosomal Instability
Title: Quantum Cognition Machine Learning for Forecasting Chromosomal Instability | Quantenkognition Maschinelles Lernen zur Prognose der Chromosomeninstabilität | 预测染色体不稳定状况的量子聚合机学习 2506.03199v2 |
Authors (14): Giuseppe Di Caro, Vahagn Kirakosyan, Alexander G. Abanov, Jerome R. Busemeyer, Luca Candelori, Nadine Hartmann, Ernest T. Lam, Kharen Musaelian, Ryan Samson, Harold Steinacker, Dario Villani, Martin T. Wells, Richard J. Wenstrup, Mengjia Xu
The accurate prediction of chromosomal instability from the morphology of circulating tumor cells (CTCs) enables real-time detection of CTCs with high metastatic potential in the context of liquid biopsy diagnostics. However, it presents a significant challenge due to the high dimensionality and complexity of single-cell digital pathology data. Here, we introduce the application of Quantum Cognition Machine Learning (QCML), a quantum-inspired computational framework, to estimate morphology-predicted chromosomal instability in CTCs from patients with metastatic breast cancer. QCML leverages quantum mechanical principles to represent data as state vectors in a Hilbert space, enabling context-aware feature modeling, dimensionality reduction, and enhanced generalization without requiring curated feature selection. QCML outperforms conventional machine learning methods when tested on out of sample verification CTCs, achieving higher accuracy in identifying predicted large-scale state transitions (pLST) status from CTC-derived morphology features. These preliminary findings support the application of QCML as a novel machine learning tool with superior performance in high-dimensional, low-sample-size biomedical contexts. QCML enables the simulation of cognition-like learning for the identification of biologically meaningful prediction of chromosomal instability from CTC morphology, offering a novel tool for CTC classification in liquid biopsy.
Article 415
Title@2025-07-22 (2): Soft Computing Approaches for Predicting Shade-Seeking Behaviour in Dairy Cattle under Heat Stress: A Comparative Study of Random Forests and Neural Networks
Title: Soft Computing Approaches for Predicting Shade-Seeking Behaviour in Dairy Cattle under Heat Stress: A Comparative Study of Random Forests and Neural Networks | Soft Computing Ansätze zur Vorhersage von Shade-Seeking Verhalten bei Milchvieh unter Hitzestress: Eine vergleichende Studie von Random Forests und Neuronalen Netzwerken | 预测受热压力的奶牛的变形寻找行为的软计算方法:随机森林和神经网络比较研究 2501.05494v2 |
Authors (6): S. Sanjuan, D. A. Méndez, R. Arnau, J. M. Calabuig, X. Díaz de Otálora Aguirre, F. Estellés
Heat stress is one of the main welfare and productivity problems faced by dairy cattle in Mediterranean climates. In this study, we approach the prediction of the daily shade-seeking count as a non-linear multivariate regression problem and evaluate two soft computing algorithms – Random Forests and Neural Networks – trained on high-resolution behavioral and micro-climatic data collected in a commercial farm in Titaguas (Valencia, Spain) during the 2023 summer season. The raw dataset (6907 daytime observations, 5-10 min resolution) includes the number of cows in the shade, ambient temperature and relative humidity. From these we derive three features: current Temperature–Humidity Index (THI), accumulated daytime THI, and mean night-time THI. Model performance is evaluated with 5-fold cross-validation. Results show that both soft computing models outperform a single Decision Tree baseline. The best Neural Network (3 hidden layers, 16 neurons each, learning rate = 10e-3) reaches an average RMSE of 14.78, while a Random Forest (10 trees, depth = 5) achieves 14.97 and offers the best interpretability. Daily error distributions reveal a median RMSE of 13.84 and confirm that predictions deviate less than one hour from observed shade-seeking peaks. These results demonstrate the suitability of soft computing, data-driven approaches embedded in an applied-mathematical feature framework for modeling noisy biological phenomena, showing their value as low-cost, real-time decision-support tools for precision livestock farming under heat-stress conditions.
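The three derived features could be computed along the following lines. This is a sketch under assumptions: the abstract does not state which THI formula is used, so the code adopts one commonly cited form, THI = (1.8T + 32) − (0.55 − 0.0055·RH)(1.8T − 26), with T in °C and RH in percent. It also assumes that "accumulated daytime THI" is a running sum over the day's observations, and the function names and sample readings are hypothetical.

```python
import numpy as np

def thi(temp_c, rel_humidity_pct):
    """Temperature-Humidity Index (one commonly used formula; an assumption here)."""
    t_f = 1.8 * np.asarray(temp_c, dtype=float) + 32.0   # Celsius to Fahrenheit
    rh = np.asarray(rel_humidity_pct, dtype=float)
    return t_f - (0.55 - 0.0055 * rh) * (t_f - 58.0)      # t_f - 58 equals 1.8*T - 26

def derive_features(day_temp, day_rh, night_temp, night_rh):
    """Return (current THI per observation, running daytime sum, mean night-time THI)."""
    current = thi(day_temp, day_rh)
    accumulated = np.cumsum(current)                      # assumed meaning of "accumulated"
    night_mean = float(thi(night_temp, night_rh).mean())
    return current, accumulated, night_mean

# Hypothetical readings for one day and the preceding night.
current, accumulated, night_mean = derive_features(
    day_temp=[28.0, 30.0, 32.0], day_rh=[60.0, 50.0, 40.0],
    night_temp=[20.0, 22.0], night_rh=[70.0, 65.0],
)
```

Each daytime observation then yields one feature row (current THI, accumulated THI so far, previous night's mean THI) for the regressors.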
Article 416
Title@2025-07-22 (2): Graph Neural Network-Based Distributed Optimal Control for Linear Networked Systems: An Online Distributed Training Approach
Title: Graph Neural Network-Based Distributed Optimal Control for Linear Networked Systems: An Online Distributed Training Approach | Graph Neural Network-based Distributed Optimal Control for Linear Networked Systems: Ein Online Distributed Training Approach | 线性网络系统分布式最佳最佳控制:在线分布式培训方法 2504.06439v2 |
Authors (4): Zihao Song, Shirantha Welikala, Panos J. Antsaklis, Hai Lin
In this paper, we consider the distributed optimal control problem for discrete-time linear networked systems. In particular, we are interested in learning distributed optimal controllers using graph recurrent neural networks (GRNNs). Most existing approaches yield centralized optimal controllers trained offline. However, with the increasing demand for network resilience, optimal controllers are further expected to be distributed and, desirably, trained in an online distributed fashion; achieving both is the main contribution of our work. To solve this problem, we first propose a GRNN-based distributed optimal control method and cast the problem as a self-supervised learning problem. Distributed online training is then achieved via distributed gradient computation, and, inspired by (consensus-based) distributed optimization, a distributed online training optimizer is designed. Furthermore, local closed-loop stability of the linear networked system under the proposed GRNN-based controller is established by assuming that the nonlinear activation function of the GRNN-based controller is both locally sector-bounded and slope-restricted. The effectiveness of the proposed method is illustrated by numerical simulations using a specifically developed simulator.
Article 417
Title@2025-07-22 (2): Towards Automated Regulatory Compliance Verification in Financial Auditing with Large Language Models
Title: Towards Automated Regulatory Compliance Verification in Financial Auditing with Large Language Models | Auf dem Weg zu einer automatisierten Überprüfung der regulatorischen Compliance bei der Finanzprüfung mit großen Sprachmodellen | 采用大语言模式进行财务审计自动监管合规核查 2507.16642v1 |
Authors (11): Armin Berger, Lars Hillebrand, David Leonhard, Tobias Deußer, Thiago Bell Felix de Oliveira, Tim Dilmaghani, Mohamed Khaled, Bernd Kliem, Rüdiger Loitz, Christian Bauckhage, Rafet Sifa
The auditing of financial documents, historically a labor-intensive process, stands on the precipice of transformation. AI-driven solutions have made inroads into streamlining this process by recommending pertinent text passages from financial reports to align with the legal requirements of accounting standards. However, a glaring limitation remains: these systems commonly fall short in verifying whether the recommended excerpts indeed comply with the specific legal mandates. Hence, in this paper, we probe the efficiency of publicly available Large Language Models (LLMs) in the realm of regulatory compliance across different model configurations. We place particular emphasis on comparing cutting-edge open-source LLMs, such as Llama-2, with their proprietary counterparts like OpenAI’s GPT models. This comparative analysis leverages two custom datasets provided by our partner PricewaterhouseCoopers (PwC) Germany. We find that the open-source Llama-2 70-billion-parameter model demonstrates outstanding performance in detecting non-compliance or true negative occurrences, beating all its proprietary counterparts. Nevertheless, proprietary models such as GPT-4 perform best in a broad variety of scenarios, particularly in non-English contexts.
Article 418
Title@2025-07-22 (2): Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis
Title: Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis | Hybrid-Reward-getriebenes Verstärkungslernen für effiziente Quantenschaltungssynthese | 高效量子电路合成增强学习 2507.16641v1 |
Authors (3): Sara Giordano, Kornikar Sen, Miguel A. Martin-Delgado
A reinforcement learning (RL) framework is introduced for the efficient synthesis of quantum circuits that generate specified target quantum states from a fixed initial state, addressing a central challenge in both the NISQ era and future fault-tolerant quantum computing. The approach utilizes tabular Q-learning, based on action sequences, within a discretized quantum state space, to effectively manage the exponential growth of the space dimension. The framework introduces a hybrid reward mechanism, combining a static, domain-informed reward that guides the agent toward the target state with customizable dynamic penalties that discourage inefficient circuit structures such as gate congestion and redundant state revisits. By leveraging sparse matrix representations and state-space discretization, the method enables scalable navigation of high-dimensional environments while minimizing computational overhead. Benchmarking on graph-state preparation tasks for up to seven qubits, we demonstrate that the algorithm consistently discovers minimal-depth circuits with optimized gate counts. Moreover, extending the framework to a universal gate set for arbitrary quantum states, it still produces minimal depth circuits, highlighting the algorithm’s robustness and adaptability. The results confirm that this RL-driven approach efficiently explores the complex quantum state space and synthesizes near-optimal quantum circuits, providing a resource-efficient foundation for quantum circuit optimization.
Article 419
Title@2025-07-22 (2): Risk and cross validation in ridge regression with correlated samples
Title: Risk and cross validation in ridge regression with correlated samples | Risiko- und Kreuzvalidierung bei der Regression des Grats mit korrelierten Proben | 具有相关样本的山脊回归风险和交叉验证 2408.04607v5 |
Authors (3): Alexander Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan
Recent years have seen substantial advances in our understanding of high-dimensional ridge regression, but existing theories assume that training examples are independent. By leveraging techniques from random matrix theory and free probability, we provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk. However, in the case where the noise residuals have the same correlations as the data points, one can modify the GCV to yield an efficiently-computable unbiased estimator that concentrates in the high-dimensional limit, which we dub CorrGCV. We further extend our asymptotic analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting. Assuming knowledge of the correlation structure of the time series, this again yields an extension of the GCV estimator, and sharply characterizes the degree to which such test points yield an overly optimistic prediction of long-time risk. We validate the predictions of our theory across a variety of high dimensional data.
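For reference, the standard GCV estimator that the abstract takes as its starting point can be sketched as follows: GCV(λ) = (1/n)‖y − Hy‖² / (1 − tr(H)/n)², with H the ridge hat matrix. The code below implements only this classical estimator on independent toy data. The `n * lam` scaling of the penalty is one common convention chosen here as an assumption, and the correlation-aware CorrGCV correction from the paper is not reproduced.

```python
import numpy as np

def ridge_gcv(x, y, lam):
    """Return (GCV estimate, training MSE) for ridge regression with penalty n * lam."""
    n, p = x.shape
    gram = x.T @ x + n * lam * np.eye(p)
    hat = x @ np.linalg.solve(gram, x.T)      # hat matrix H, so y_hat = H y
    resid = y - hat @ y
    train_mse = float(np.mean(resid ** 2))
    dof = np.trace(hat)                       # effective degrees of freedom
    gcv = train_mse / (1.0 - dof / n) ** 2
    return float(gcv), train_mse

# Toy data with independent samples (the setting where classical GCV is justified).
rng = np.random.default_rng(0)
n, p = 200, 20
x = rng.normal(size=(n, p))
w_true = rng.normal(size=p)
y = x @ w_true + 0.5 * rng.normal(size=n)
gcv, train_mse = ridge_gcv(x, y, lam=0.1)
```

The denominator inflates the training error to account for the degrees of freedom used in fitting. According to the abstract, this correction is what breaks down once the samples are correlated.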
Article 420
Title@2025-07-22 (2): Automatic Fine-grained Segmentation-assisted Report Generation
Title: Automatic Fine-grained Segmentation-assisted Report Generation | Automatische, feinkörnige Segmentierung unterstützte Berichtserstellung | 自动精精细分割辅助报告生成 2507.16623v1 |
Authors (9): Frederic Jonske, Constantin Seibold, Osman Alperen Koras, Fin Bahnsen, Marie Bauer, Amin Dada, Hamza Kalisch, Anton Schily, Jens Kleesiek
Reliable end-to-end clinical report generation has been a longstanding goal of medical ML research. The end goal for this process is to alleviate radiologists’ workloads and provide second opinions to clinicians or patients. Thus, a necessary prerequisite for report generation models is a strong general performance and some type of innate grounding capability, to convince clinicians or patients of the veracity of the generated reports. In this paper, we present ASaRG (Automatic Segmentation-assisted Report Generation), an extension of the popular LLaVA architecture that aims to tackle both of these problems. ASaRG proposes to fuse intermediate features and fine-grained segmentation maps created by specialist radiological models into LLaVA’s multi-modal projection layer via simple concatenation. With a small number of added parameters, our approach achieves a +0.89% performance gain ($p=0.012$) in CE F1 score compared to the LLaVA baseline when using only intermediate features, and +2.77% performance gain ($p<0.001$) when adding a combination of intermediate features and fine-grained segmentation maps. Compared with COMG and ORID, two other report generation methods that utilize segmentations, the performance gain amounts to 6.98% and 6.28% in F1 score, respectively. ASaRG is not mutually exclusive with other changes made to the LLaVA architecture, potentially allowing our method to be combined with other advances in the field. Finally, the use of an arbitrary number of segmentations as part of the input demonstrably allows tracing elements of the report to the corresponding segmentation maps and verifying the groundedness of assessments. Our code will be made publicly available at a later date.
Article 421
Title@2025-07-22 (2): Towards a deeper GCN: Alleviate over-smoothing with iterative training and fine-tuning
Title: Towards a deeper GCN: Alleviate over-smoothing with iterative training and fine-tuning | Auf dem Weg zu einer tieferen GCN: Überglätten mit iterativem Training und Feinabstimmung | 更深入的GCN:通过迭接培训和微调,减轻过度缓解 2506.17576v2 |
Authors (6): Furong Peng, Jinzhen Gao, Xuan Lu, Kang Liu, Yifan Huo, Sheng Wang
Graph Convolutional Networks (GCNs) suffer from severe performance degradation in deep architectures due to over-smoothing. While existing studies primarily attribute the over-smoothing to repeated applications of graph Laplacian operators, our empirical analysis reveals a critical yet overlooked factor: trainable linear transformations in GCNs significantly exacerbate feature collapse, even at moderate depths (e.g., 8 layers). In contrast, Simplified Graph Convolution (SGC), which removes these transformations, maintains stable feature diversity up to 32 layers, highlighting linear transformations’ dual role in facilitating expressive power and inducing over-smoothing. However, completely removing linear transformations weakens the model’s expressive capacity. To address this trade-off, we propose Layer-wise Gradual Training (LGT), a novel training strategy that progressively builds deep GCNs while preserving their expressiveness. LGT integrates three complementary components: (1) layer-wise training to stabilize optimization from shallow to deep layers, (2) low-rank adaptation to fine-tune shallow layers and accelerate training, and (3) identity initialization to ensure smooth integration of new layers and accelerate convergence. Extensive experiments on benchmark datasets demonstrate that LGT achieves state-of-the-art performance on vanilla GCN, significantly improving accuracy even in 32-layer settings. Moreover, as a training method, LGT can be seamlessly combined with existing methods such as PairNorm and ContraNorm, further enhancing their performance in deeper networks. LGT offers a general, architecture-agnostic training framework for scalable deep GCNs. The code is available at [https://github.com/jfklasdfj/LGT_GCN].
Article 422
Title@2025-07-22 (2): Stable and Accurate Orbital-Free DFT Powered by Machine Learning
Title: Stable and Accurate Orbital-Free DFT Powered by Machine Learning | Stabile und genaue Orbital-Free DFT Powered by Machine Learning | 借助机器学习的稳定和准确的无轨道无轨道DFT 2503.00443v2 |
Authors (13): Roman Remme, Tobias Kaczun, Tim Ebert, Christof A. Gehrig, Dominik Geng, Gerrit Gerhartz, Marc K. Ickler, Manuel V. Klockow, Peter Lippmann, Johannes S. Schmidt, Simon Wagner, Andreas Dreuw, Fred A. Hamprecht
Hohenberg and Kohn have proven that the electronic energy and the one-particle electron density can, in principle, be obtained by minimizing an energy functional with respect to the density. While decades of theoretical work have produced increasingly faithful approximations to this elusive exact energy functional, their accuracy is still insufficient for many applications, making it reasonable to try and learn it empirically. Using rotationally equivariant atomistic machine learning, we obtain for the first time a density functional that, when applied to the organic molecules in QM9, yields energies with chemical accuracy relative to the Kohn-Sham reference while also converging to meaningful electron densities. Augmenting the training data with densities obtained from perturbed potentials proved key to these advances. This work demonstrates that machine learning can play a crucial role in narrowing the gap between theory and the practical realization of Hohenberg and Kohn’s vision, paving the way for more efficient calculations in large molecular systems.
Article 423
Title@2025-07-22 (2): Rethinking Data Input for Point Cloud Upsampling
Title: Rethinking Data Input for Point Cloud Upsampling | Dateneingabe für Punkt-Cloud-Upsampling neu denken | 重新思考点云取样的数据输入 2407.04476v3 |
Authors (1): Tongxu Zhang
Point cloud upsampling is crucial for tasks like 3D reconstruction. Existing methods rely on patch-based inputs, but no research has discussed the differences and principles between full point cloud model input and patch-based input. We therefore propose a novel approach using whole-model inputs, i.e., Average Segment input. Our experiments on the PU1K and ABC datasets reveal that patch-based inputs consistently outperform whole-model inputs. To understand this, we delve into the factors in feature extraction and network architecture that influence upsampling results.
Article 424
Title@2025-07-22 (2): A computational transition for detecting correlated stochastic block models by low-degree polynomials
Title: A computational transition for detecting correlated stochastic block models by low-degree polynomials | Ein rechnerischer Übergang zur Erkennung korrelierter stochastischer Blockmodelle durch Low-Grad-Polynome | 用低度多元度探测相关随机区块模型的计算过渡 2409.00966v2 |
Authors (4): Guanyi Chen, Jian Ding, Shuyang Gong, Zhangsong Li
Detection of correlation in a pair of random graphs is a fundamental statistical and computational problem that has been extensively studied in recent years. In this work, we consider a pair of correlated (sparse) stochastic block models $\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon;s)$ that are subsampled from a common parent stochastic block model $\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon)$ with $k=O(1)$ symmetric communities, average degree $\lambda=O(1)$, divergence parameter $\epsilon$, and subsampling probability $s$. For the detection problem of distinguishing this model from a pair of independent Erd\H{o}s-R\'enyi graphs with the same edge density $\mathcal{G}(n,\tfrac{\lambda s}{n})$, we focus on tests based on \emph{low-degree polynomials} of the entries of the adjacency matrices, and we determine the threshold that separates the easy and hard regimes. More precisely, we show that this class of tests can distinguish these two models if and only if $s> \min \{ \sqrt{\alpha}, \frac{1}{\lambda \epsilon^2} \}$, where $\alpha\approx 0.338$ is Otter’s constant and $\frac{1}{\lambda \epsilon^2}$ is the Kesten-Stigum threshold. Combined with a reduction argument in \cite{Li25+}, our hardness result also implies low-degree hardness for partial recovery and detection (to independent block models) when $s< \min \{ \sqrt{\alpha}, \frac{1}{\lambda \epsilon^2} \}$. Finally, our proof of low-degree hardness is based on a conditional variant of the low-degree likelihood calculation.
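As a rough numerical illustration of the condition stated in the abstract, the sketch below evaluates $\min\{\sqrt{\alpha}, 1/(\lambda\epsilon^2)\}$ and compares it with a subsampling probability $s$. The function names and the example parameter values are hypothetical, and $\alpha$ is fixed to the approximate value 0.338 quoted in the abstract.

```python
import math

# Approximate value of Otter's constant as quoted in the abstract.
OTTER_ALPHA = 0.338


def low_degree_threshold(lam: float, eps: float, alpha: float = OTTER_ALPHA) -> float:
    """Return min(sqrt(alpha), 1 / (lam * eps**2)) for average degree lam and divergence eps."""
    if lam <= 0 or eps == 0:
        raise ValueError("lam must be positive and eps must be non-zero")
    return min(math.sqrt(alpha), 1.0 / (lam * eps ** 2))


def low_degree_detectable(s: float, lam: float, eps: float) -> bool:
    """True when s lies strictly above the threshold (the 'easy' side of the stated condition)."""
    return s > low_degree_threshold(lam, eps)
```

For a small `lam * eps**2` the Otter term `sqrt(alpha)` is the binding one, while for a large `lam * eps**2` the Kesten-Stigum term `1 / (lam * eps**2)` takes over.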
Article 425
Title@2025-07-22 (2): Adaptive Gaussian Mixture Models-based Anomaly Detection for under-constrained Cable-Driven Parallel Robots
Title: Adaptive Gaussian Mixture Models-based Anomaly Detection for under-constrained Cable-Driven Parallel Robots | Adaptive Gaussian Mixture Models-basierte Anomalieerkennung für unterbeschränkte kabelgetriebene Parallelroboter | 用于控制不足的有线驱动平行机器人的适应性高斯混合混合模型异常探测 2507.07714v2 |
Authors (6): Julio Garrido, Javier Vales, Diego Silva-Muñiz, Enrique Riveiro, Pablo López-Matencio, Josué Rivera-Andrade
Cable-Driven Parallel Robots (CDPRs) are increasingly used for load manipulation tasks involving predefined toolpaths with intermediate stops. At each stop, where the platform maintains a fixed pose and the motors keep the cables under tension, the system must evaluate whether it is safe to proceed by detecting anomalies that could compromise performance (e.g., wind gusts or cable impacts). This paper investigates whether anomalies can be detected using only motor torque data, without additional sensors. It introduces an adaptive, unsupervised outlier detection algorithm based on Gaussian Mixture Models (GMMs) to identify anomalies from torque signals. The method starts with a brief calibration period, just a few seconds, during which a GMM is fit on known anomaly-free data. Real-time torque measurements are then evaluated using Mahalanobis distance from the GMM, with statistically derived thresholds triggering anomaly flags. Model parameters are periodically updated using the latest segments identified as anomaly-free to adapt to changing conditions. Validation includes 14 long-duration test sessions simulating varied wind intensities. The proposed method achieves a 100% true positive rate and 95.4% average true negative rate, with 1-second detection latency. Comparative evaluation against power threshold and non-adaptive GMM methods indicates higher robustness to drift and environmental variation.
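The calibrate-then-threshold idea described in the abstract can be sketched as follows. This is a deliberately simplified stand-in, not the authors' algorithm: it fits a single Gaussian with diagonal covariance rather than a full GMM, uses a fixed hypothetical threshold of 3.0 instead of a statistically derived one, and omits the periodic model update. All function names are illustrative.

```python
import math
from statistics import mean, pvariance


def fit_diagonal_gaussian(samples):
    """Fit per-channel mean and variance on anomaly-free calibration torque vectors."""
    channels = list(zip(*samples))  # one tuple of readings per motor channel
    means = [mean(c) for c in channels]
    # Floor the variance so a constant channel does not cause division by zero.
    variances = [max(pvariance(c), 1e-12) for c in channels]
    return means, variances


def mahalanobis(x, means, variances):
    """Mahalanobis distance of x from the fitted Gaussian (diagonal covariance assumed)."""
    return math.sqrt(sum((xi - m) ** 2 / v for xi, m, v in zip(x, means, variances)))


def is_anomaly(x, model, threshold=3.0):
    """Flag a torque vector whose distance exceeds the (hypothetical) threshold."""
    means, variances = model
    return mahalanobis(x, means, variances) > threshold
```

A typical use would be to call `fit_diagonal_gaussian` once on the calibration window and then `is_anomaly` on each new torque reading.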
Article 426
Title@2025-07-22 (2): Spectral Algorithms under Covariate Shift
Title: Spectral Algorithms under Covariate Shift | Spektrale Algorithmen unter Kovariate Verschiebung | 共变量移动下的频谱值 2504.12625v2 |
Authors (3): Jun Fan, Zheng-Chu Guo, Lei Shi
Spectral algorithms leverage spectral regularization techniques to analyze and process data, providing a flexible framework for addressing supervised learning problems. To deepen our understanding of their performance in real-world scenarios where the distributions of training and test data may differ, we conduct a rigorous investigation into the convergence behavior of spectral algorithms under covariate shift. In this setting, the marginal distributions of the input data differ between the training and test datasets, while the conditional distribution of the output given the input remains unchanged. Within a non-parametric regression framework over a reproducing kernel Hilbert space, we analyze the convergence rates of spectral algorithms under covariate shift and show that they achieve minimax optimality when the density ratios between the training and test distributions are uniformly bounded. However, when these density ratios are unbounded, the spectral algorithms may become suboptimal. To address this issue, we propose a novel weighted spectral algorithm with normalized weights that incorporates density ratio information into the learning process. Our theoretical analysis shows that this normalized weighted approach achieves optimal capacity-independent convergence rates, but the rates will suffer from the saturation phenomenon. Furthermore, by introducing a weight clipping technique, we demonstrate that the convergence rates of the weighted spectral algorithm with clipped weights can approach the optimal capacity-dependent convergence rates arbitrarily closely. This improvement resolves the suboptimality issue in unbounded density ratio scenarios and advances the state-of-the-art by refining existing theoretical results.
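To make the unbounded-density-ratio issue and the clipping remedy concrete, here is a minimal sketch under assumed one-dimensional Gaussian training and test input distributions. The distributions, the clipping level, and the function names are all hypothetical, and the code only computes the weights; it does not implement the weighted spectral algorithm itself.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def density_ratio(x, train=(0.0, 1.0), test=(1.0, 1.0)):
    """Test density divided by training density at x; train/test are (mu, sigma) pairs."""
    return gaussian_pdf(x, *test) / gaussian_pdf(x, *train)


def clipped_weight(x, clip_level, train=(0.0, 1.0), test=(1.0, 1.0)):
    """Importance weight truncated at clip_level so that it stays bounded."""
    return min(density_ratio(x, train, test), clip_level)
```

With the default parameters the ratio equals `exp(x - 0.5)`, which grows without bound as `x` increases, so clipping is what keeps the weights bounded in this toy setting.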
Article 427
Title@2025-07-22 (2): Antithetic Sampling for Top-k Shapley Identification
Title: Antithetic Sampling for Top-k Shapley Identification | Antithetische Probenahme für Top-K Shapley-Identifikation | 顶部形状识别的抗抗异性取样 2504.02019v2 |
Authors (3): Patrick Kolpaczki, Tim Nielen, Eyke Hüllermeier
Additive feature explanations rely primarily on game-theoretic notions such as the Shapley value by viewing features as cooperating players. The Shapley value’s popularity in and outside of explainable AI stems from its axiomatic uniqueness. However, its computational complexity severely limits practicability. Most works investigate the uniform approximation of all features’ Shapley values, needlessly consuming samples for insignificant features. In contrast, identifying the $k$ most important features can already be sufficiently insightful and yields the potential to leverage algorithmic opportunities connected to the field of multi-armed bandits. We propose Comparable Marginal Contributions Sampling (CMCS), a method for the top-$k$ identification problem utilizing a new sampling scheme taking advantage of correlated observations. We conduct experiments to showcase the efficacy of our method compared to competitive baselines. Our empirical findings reveal that estimation quality for the approximate-all problem does not necessarily transfer to top-$k$ identification and vice versa.
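For orientation, the sketch below shows one common antithetic scheme for Shapley estimation, pairing each random permutation with its reverse, followed by a naive top-$k$ selection. It is not CMCS, whose sampling scheme the abstract does not specify; the function names and the `value` callback interface are assumptions made for illustration.

```python
import random


def shapley_antithetic(value, n_players, n_pairs, seed=0):
    """Estimate Shapley values from permutation pairs (a permutation and its reverse).

    `value` maps a frozenset of player indices to the coalition's worth.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_players
    for _ in range(n_pairs):
        perm = list(range(n_players))
        rng.shuffle(perm)
        # The reversed permutation is the antithetic partner of `perm`.
        for order in (perm, perm[::-1]):
            coalition = set()
            prev = value(frozenset(coalition))
            for player in order:
                coalition.add(player)
                cur = value(frozenset(coalition))
                totals[player] += cur - prev  # marginal contribution of `player`
                prev = cur
    return [t / (2 * n_pairs) for t in totals]


def top_k(estimates, k):
    """Indices of the k largest estimates, in descending order of estimate."""
    return sorted(range(len(estimates)), key=lambda i: estimates[i], reverse=True)[:k]
```

On an additive game every marginal contribution equals the player's own weight, so the estimate is exact regardless of the permutations drawn, which makes such games a convenient sanity check.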
Article 428
Title@2025-07-22 (2): Scaling Linear Attention with Sparse State Expansion
Title: Scaling Linear Attention with Sparse State Expansion | Scaling Lineare Aufmerksamkeit mit Sparse State Expansion | Sparassar 州扩展时的 缩放线性注意 2507.16577v1 |
Authors (9): Yuqi Pan, Yongqi An, Zheng Li, Yuhong Chou, Ruijie Zhu, Xiaohui Wang, Mingxuan Wang, Jinqiao Wang, Guoqi Li
The Transformer architecture, despite its widespread success, struggles with long-context scenarios due to quadratic computation and linear memory growth. While various linear attention variants mitigate these efficiency constraints by compressing context into fixed-size states, they often degrade performance in tasks such as in-context retrieval and reasoning. To address this limitation and achieve more effective context compression, we propose two key innovations. First, we introduce a row-sparse update formulation for linear attention by conceptualizing state updating as information classification. This enables sparse state updates via softmax-based top-$k$ hard classification, thereby extending receptive fields and reducing inter-class interference. Second, we present Sparse State Expansion (SSE) within the sparse framework, which expands the contextual state into multiple partitions, effectively decoupling parameter size from state capacity while maintaining the sparse classification paradigm. Our design, supported by efficient parallelized implementations, yields effective classification and discriminative state representations. We extensively validate SSE in both pure linear and hybrid (SSE-H) architectures across language modeling, in-context retrieval, and mathematical reasoning benchmarks. SSE demonstrates strong retrieval performance and scales favorably with state size. Moreover, after reinforcement learning (RL) training, our 2B SSE-H model achieves state-of-the-art mathematical reasoning performance among small reasoning models, scoring 64.7 on AIME24 and 51.3 on AIME25, significantly outperforming similarly sized open-source Transformers. These results highlight SSE as a promising and efficient architecture for long-context modeling.
Article 429
Title@2025-07-22 (2): Leveraging Distribution Matching to Make Approximate Machine Unlearning Faster
Title: Leveraging Distribution Matching to Make Approximate Machine Unlearning Faster | Leveraging Distribution Passend, um annähernde Maschine Unlearning schneller zu machen | 利用配配配配的配送让近似机器更快退出学习 2507.09786v2 |
Authors (1): Junaid Iqbal Khan
Approximate machine unlearning (AMU) enables models to 'forget' specific training data through specialized fine-tuning on a retained dataset subset. However, processing this retained subset still dominates computational runtime, and reducing the number of epochs also remains a challenge. We propose two complementary methods to accelerate classification-oriented AMU. First, \textbf{Blend}, a novel distribution-matching dataset condensation (DC) method, merges visually similar images with shared blend-weights to significantly reduce the retained set size. It operates with minimal pre-processing overhead and is orders of magnitude faster than state-of-the-art DC methods. Second, our loss-centric method, \textbf{Accelerated-AMU (A-AMU)}, augments the unlearning objective to quicken convergence. A-AMU achieves this by combining a steepened primary loss to expedite forgetting with a novel, differentiable regularizer that matches the loss distributions of forgotten and in-distribution unseen data. Our extensive experiments demonstrate that this dual approach of data-centric and loss-centric optimization dramatically reduces end-to-end unlearning latency across both single and multi-round scenarios, all while preserving model utility and privacy. To our knowledge, this is the first work to systematically tackle unlearning efficiency by jointly designing a specialized dataset condensation technique with a dedicated accelerated loss function. Code is available at https://github.com/algebraicdianuj/DC_Unlearning.
Article 430
Title@2025-07-22 (2): Supernova: Achieving More with Less in Transformer Architectures
Title: Supernova: Achieving More with Less in Transformer Architectures | Supernova: Mit weniger Transformer-Architekturen mehr erreichen | 超新星:在变形结构结构中以更少的变形结构实现更大的成就 2507.15773v2 |
Authors (2): Andrei-Valentin Tanase, Elena Pelican
We present Supernova, a 650M-parameter decoder-only transformer that demonstrates how careful architectural design and tokenization innovation can achieve the performance of larger models while maintaining computational efficiency. Our architecture combines Rotary Positional Embeddings (RoPE), Grouped Query Attention (GQA) with a 3:1 compression ratio, RMSNorm for computational efficiency, and SwiGLU activation functions. A critical innovation is our custom 128,000-vocabulary byte-level BPE tokenizer, which achieves state-of-the-art compression performance. Through detailed analysis, we show that Supernova achieves 90% of the performance of 1B-parameter models while using 35% fewer parameters and requiring only 100B training tokens, an order of magnitude fewer than competing models. Our findings challenge the prevailing scaling paradigm, demonstrating that architectural efficiency and tokenization quality can compensate for reduced parameter counts.
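The headline figures can be checked with simple arithmetic. The sketch below assumes that the "3:1 compression ratio" means three query heads share one key/value head, which the abstract does not state explicitly; the head count of 12 in the usage note is a hypothetical example, not a reported configuration.

```python
def fraction_fewer(small: float, large: float) -> float:
    """Fraction by which `small` falls below `large` (0.35 means 35% fewer)."""
    return 1.0 - small / large


def kv_heads(query_heads: int, ratio: int = 3) -> int:
    """Number of key/value heads when `ratio` query heads share one KV head (assumed reading)."""
    if query_heads % ratio != 0:
        raise ValueError("query_heads must be divisible by ratio")
    return query_heads // ratio
```

Here `fraction_fewer(650e6, 1e9)` gives 0.35 up to floating-point rounding, matching the stated 35% reduction relative to a 1B-parameter model, and `kv_heads(12)` gives 4 under the assumed interpretation.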
Article 431
Title@2025-07-22 (2): Families of Optimal Transport Kernels for Cell Complexes
Title: Families of Optimal Transport Kernels for Cell Complexes | Familien von optimalen Transport-Kerneln für Zellkomplexe | 细胞综合体最佳运输核心家庭 2507.16569v1 |
Authors (1): Rahul Khorana
Recent advances have discussed cell complexes as ideal learning representations. However, there is a lack of available machine learning methods suitable for learning on CW complexes. In this paper, we derive an explicit expression for the Wasserstein distance between cell complex signal distributions in terms of a Hodge-Laplacian matrix. This leads to a structurally meaningful measure to compare CW complexes and define the optimal transportation map. In order to simultaneously include both feature and structure information, we extend the Fused Gromov-Wasserstein distance to CW complexes. Finally, we introduce novel kernels over the space of probability measures on CW complexes based on the dual formulation of optimal transport.
Article 432
Title@2025-07-22 (2): Exploring Gender Bias in Large Language Models: An In-depth Dive into the German Language
Title: Exploring Gender Bias in Large Language Models: An In-depth Dive into the German Language | Gender Bias in großen Sprachmodellen erforschen: Ein tiefer Einblick in die deutsche Sprache | 在大语言模式中探索性别偏见:深入跳入德语 2507.16557v1 |
Authors (4): Kristin Gnadt, David Thulke, Simone Kopeinik, Ralf Schlüter
In recent years, various methods have been proposed to evaluate gender bias in large language models (LLMs). A key challenge lies in the transferability of bias measurement methods initially developed for the English language when applied to other languages. This work aims to contribute to this research strand by presenting five German datasets for gender bias evaluation in LLMs. The datasets are grounded in well-established concepts of gender bias and are accessible through multiple methodologies. Our findings, reported for eight multilingual LLM models, reveal unique challenges associated with gender bias in German, including the ambiguous interpretation of male occupational terms and the influence of seemingly neutral nouns on gender perception. This work contributes to the understanding of gender bias in LLMs across languages and underscores the necessity for tailored evaluation frameworks.
Article 433
Title@2025-07-22 (2): Optimization of DNN-based HSI Segmentation FPGA-based SoC for ADS: A Practical Approach
Title: Optimization of DNN-based HSI Segmentation FPGA-based SoC for ADS: A Practical Approach | Optimierung der DNN-basierten HSI-Segmentierung FPGA-basierten SoC für ADS: Ein praktischer Ansatz | 优化基于DNN 的基于DNNHSIHSI的ADS的基于FPGA的FPGA SoC分类:一种实用办法 2507.16556v1 |
Authors (3): Jon Gutiérrez-Zaballa, Koldo Basterretxea, Javier Echanobe
The use of HSI for autonomous navigation is a promising research field aimed at improving the accuracy and robustness of detection, tracking, and scene understanding systems based on vision sensors. Combining advanced computer algorithms, such as DNNs, with small-size snapshot HSI cameras enhances the reliability of these systems. HSI overcomes intrinsic limitations of greyscale and RGB imaging in depicting physical properties of targets, particularly regarding spectral reflectance and metamerism. Despite promising results in HSI-based vision developments, safety-critical systems like ADS demand strict constraints on latency, resource consumption, and security, motivating the shift of ML workloads to edge platforms. This involves a thorough software/hardware co-design scheme to distribute and optimize the tasks efficiently among the limited resources of computing platforms. With respect to inference, the over-parameterized nature of DNNs poses significant computational challenges for real-time on-the-edge deployment. In addition, the intensive data preprocessing required by HSI, which is frequently overlooked, must be carefully managed in terms of memory arrangement and inter-task communication to enable an efficient integrated pipeline design on an SoC. This work presents a set of optimization techniques for the practical co-design of a DNN-based HSI segmentation processor deployed on an FPGA-based SoC targeted at ADS, including key optimizations such as functional software/hardware task distribution, hardware-aware preprocessing, ML model compression, and a complete pipelined deployment. Applied compression techniques significantly reduce the complexity of the designed DNN to 24.34% of the original operations and to 1.02% of the original number of parameters, achieving a 2.86x speed-up in the inference task without noticeable degradation of the segmentation accuracy.
Article 434
Title@2025-07-22 (2): A Comprehensive Data-centric Overview of Federated Graph Learning
Title: A Comprehensive Data-centric Overview of Federated Graph Learning | Ein umfassender datenzentrierter Überblick über das Federated Graph Learning | 以数据为核心的联邦图表学习综合概览 2507.16541v1 |
Authors (11): Zhengyu Wu, Xunkai Li, Yinlin Zhu, Zekai Chen, Guochen Yan, Yanyu Yan, Hao Zhang, Yuming Ai, Xinmo Jin, Rong-Hua Li, Guoren Wang
In the era of big data applications, Federated Graph Learning (FGL) has emerged as a prominent solution that reconciles the trade-off between optimizing the collective intelligence of decentralized dataset holders and preserving sensitive information to the maximum extent. Existing FGL surveys have contributed meaningfully but largely focus on integrating Federated Learning (FL) and Graph Machine Learning (GML), resulting in early-stage taxonomies that emphasize methodology and simulated scenarios. Notably, a data-centric perspective, which systematically examines FGL methods through the lens of data properties and usage, has not yet been used to reorganize FGL research, yet it is critical for assessing how FGL studies tackle data-centric constraints to enhance model performance. This survey proposes a two-level data-centric taxonomy: Data Characteristics, which categorizes studies based on the structural and distributional properties of datasets used in FGL, and Data Utilization, which analyzes the training procedures and techniques employed to overcome key data-centric challenges. Each taxonomy level is defined by three orthogonal criteria, each representing a distinct data-centric configuration. Beyond taxonomy, this survey examines FGL integration with Pretrained Large Models, showcases realistic applications, and highlights future directions aligned with emerging trends in GML.
Article 435
Title@2025-07-22 (2): Symbolic Graph Intelligence: Hypervector Message Passing for Learning Graph-Level Patterns with Tsetlin Machines
Title: Symbolic Graph Intelligence: Hypervector Message Passing for Learning Graph-Level Patterns with Tsetlin Machines | Symbolische Graphenintelligenz: Hypervektor-Nachricht für das Lernen von Graph-Level-Mustern mit Tsetlin-Maschinen | 图示情报:用于学习的Tsetlin机器图层模式的超矢量信息传递 2507.16537v1 |
Authors (1): Christian D. Blakely
We propose a multilayered symbolic framework for general graph classification that leverages sparse binary hypervectors and Tsetlin Machines. Each graph is encoded through structured message passing, where node, edge, and attribute information are bound and bundled into a symbolic hypervector. This process preserves the hierarchical semantics of the graph through layered binding, from node attributes to edge relations to structural roles, resulting in a compact, discrete representation. We also formulate a local interpretability framework, which makes local interpretability a key advantage of our approach. We validate our method on TUDataset benchmarks, demonstrating competitive accuracy with strong symbolic transparency compared to neural graph models.
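The terms "bound" and "bundled" refer to standard hyperdimensional-computing operations. The sketch below illustrates them with dense binary vectors, XOR binding, and majority-vote bundling; these are textbook choices assumed here for illustration, and the paper's sparse binary hypervectors and its message-passing scheme may use different operators. All names are hypothetical.

```python
import random


def random_hv(dim, rng):
    """Draw a random dense binary hypervector of length dim."""
    return [rng.randint(0, 1) for _ in range(dim)]


def bind(a, b):
    """Bind two hypervectors with element-wise XOR (self-inverse: bind(bind(a, b), b) == a)."""
    return [x ^ y for x, y in zip(a, b)]


def bundle(vectors):
    """Bundle hypervectors by element-wise majority vote (ties resolve to 0)."""
    n = len(vectors)
    return [1 if 2 * sum(bits) > n else 0 for bits in zip(*vectors)]
```

Binding associates two pieces of information (for example an attribute with its node), while bundling superimposes several bound items into one vector that remains similar to each of them.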
Article 436
Title@2025-07-22 (2): Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report
Title: Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report | Frontier AI Risk Management Framework in der Praxis: Ein technischer Bericht zur Risikoanalyse | 《国际边界风险管理框架实际操作:风险分析技术报告》 2507.16534v1 |
Authors (37): Shanghai AI Lab, Xiaoyang Chen, Yunhao Chen, Zeren Chen, Zhiyun Chen, Hanyun Cui, Yawen Duan, Jiaxuan Guo, Qi Guo, Xuhao Hu, Hong Huang, Lige Huang, Chunxiao Li, Juncheng Li, Qihao Lin, Dongrui Liu, Xinmin Liu, Zicheng Liu, Chaochao Lu, Xiaoya Lu, Jingjing Qu, Qibing Ren, Jing Shao, Jingwei Shi, Jingwei Sun, Peng Wang, Weibing Wang, Jia Xu, Lewen Yan, Xiao Yu, Yi Yu, Boxuan Zhang, Jie Zhang, Weichen Zhang, Zhijie Zheng, Tianyi Zhou, Bowen Zhou
To understand and identify the unprecedented risks posed by rapidly advancing artificial intelligence (AI) models, this report presents a comprehensive assessment of their frontier risks. Drawing on the E-T-C analysis (deployment environment, threat source, enabling capability) from the Frontier AI Risk Management Framework (v1.0) (SafeWork-F1-Framework), we identify critical risks in seven areas: cyber offense, biological and chemical risks, persuasion and manipulation, uncontrolled autonomous AI R\&D, strategic deception and scheming, self-replication, and collusion. Guided by the “AI-$45^\circ$ Law,” we evaluate these risks using “red lines” (intolerable thresholds) and “yellow lines” (early warning indicators) to define risk zones: green (manageable risk for routine deployment and continuous monitoring), yellow (requiring strengthened mitigations and controlled deployment), and red (necessitating suspension of development and/or deployment). Experimental results show that all recent frontier AI models reside in green and yellow zones, without crossing red lines. Specifically, no evaluated models cross the yellow line for cyber offense or uncontrolled AI R\&D risks. For self-replication, and strategic deception and scheming, most models remain in the green zone, except for certain reasoning models in the yellow zone. In persuasion and manipulation, most models are in the yellow zone due to their effective influence on humans. For biological and chemical risks, we are unable to rule out the possibility of most models residing in the yellow zone, although detailed threat modeling and in-depth assessment are required to make further claims. This work reflects our current understanding of AI frontier risks and urges collective action to mitigate these challenges.
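The zone logic described in the abstract can be sketched as a two-threshold classification. This assumes each risk area is summarized by a single numeric score with numeric yellow and red lines, which is a simplification introduced here; the report may define its lines through qualitative criteria. The function name and the example values are hypothetical.

```python
def risk_zone(score: float, yellow_line: float, red_line: float) -> str:
    """Map a risk score to 'green', 'yellow', or 'red' given the two thresholds."""
    if not yellow_line < red_line:
        raise ValueError("yellow_line must lie below red_line")
    if score >= red_line:
        return "red"      # red line crossed: suspension of development and/or deployment
    if score >= yellow_line:
        return "yellow"   # yellow line crossed: strengthened mitigations, controlled deployment
    return "green"        # below both lines: routine deployment with continuous monitoring
```

For example, with a yellow line at 0.5 and a red line at 0.8, a score of 0.6 would fall in the yellow zone.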
Article 437
Title@2025-07-22 (2): confopt: A Library for Implementation and Evaluation of Gradient-based One-Shot NAS Methods
Title: confopt: A Library for Implementation and Evaluation of Gradient-based One-Shot NAS Methods | confopt: Eine Bibliothek zur Implementierung und Bewertung von gradient-basierten One-Shot-NAS-Methoden | 实施和评价基于梯度的单制热NAS方法图书馆 2507.16533v1 |
Authors (5): Abhash Kumar Jha, Shakiba Moradian, Arjun Krishnakumar, Martin Rapp, Frank Hutter
Gradient-based one-shot neural architecture search (NAS) has significantly reduced the cost of exploring architectural spaces with discrete design choices, such as selecting operations within a model. However, the field faces two major challenges. First, evaluations of gradient-based NAS methods heavily rely on the DARTS benchmark, despite the existence of other available benchmarks. This overreliance has led to saturation, with reported improvements often falling within the margin of noise. Second, implementations of gradient-based one-shot NAS methods are fragmented across disparate repositories, complicating fair and reproducible comparisons and further development. In this paper, we introduce Configurable Optimizer (confopt), an extensible library designed to streamline the development and evaluation of gradient-based one-shot NAS methods. Confopt provides a minimal API that makes it easy for users to integrate new search spaces, while also supporting the decomposition of NAS optimizers into their core components. We use this framework to create a suite of new DARTS-based benchmarks, and combine them with a novel evaluation protocol to reveal a critical flaw in how gradient-based one-shot NAS methods are currently assessed. The code can be found at https://github.com/automl/ConfigurableOptimizer.
Article 438
Title@2025-07-22 (2): Benchmarking machine learning models for predicting aerofoil performance
Title: Benchmarking machine learning models for predicting aerofoil performance | Benchmarking von Machine-Learning-Modellen zur Vorhersage der Leistungsfähigkeit des Öls | 确定用于预测油层性能的机器学习模型的基准基准 2504.15993v2 |
Authors (3): Oliver Summerell, Gerardo Aragon-Camarasa, Stephanie Ordonez Sanchez
This paper investigates the capability of Neural Networks (NNs) as alternatives to the traditional methods used to analyse the performance of aerofoils in the wind and tidal energy industry. The current methods used to assess the characteristic lift and drag coefficients include Computational Fluid Dynamics (CFD), thin aerofoil methods, and panel methods, all of which face trade-offs between computational speed and the accuracy of the results. NNs have therefore been investigated as an alternative, with the aim that they would perform both quickly and accurately. To that end, this paper provides a benchmark for the windAI_bench dataset published by the National Renewable Energy Laboratory (NREL) in the USA. To validate the benchmarking methodology, the AirfRANS dataset benchmark is used as both a starting point and a point of comparison. This study evaluates four neural networks (MLP, PointNet, GraphSAGE, GUNet) trained on a range of aerofoils at 25 angles of attack (4$^\circ$ to 20$^\circ$) to predict fluid flow and calculate lift coefficients ($C_L$) via the panel method. GraphSAGE and GUNet performed well during the training phase but underperformed during testing. Accordingly, this paper identifies PointNet and MLP as the two strongest models tested. However, whilst MLP more often correctly predicts the behaviour of the fluid, PointNet provides the more accurate results for calculating $C_L$.
Article 439
Title@2025-07-22 (2): Neural Approaches for Multi-Objective Routing on Multigraphs
Title: Neural Approaches for Multi-Objective Routing on Multigraphs | Neurale Ansätze für multi-objektives Routing auf Multigraphen | 多种计量多目的路由的神经方法 2506.22095v2 |
Authors (5): Filip Rydin, Attila Lischka, Jiaming Wu, Morteza Haghir Chehreghani, Balázs Kulcsár
Learning-based methods for routing have gained significant attention in recent years, both in single-objective and multi-objective contexts. Yet, existing methods are unsuitable for routing on multigraphs, which feature multiple edges with distinct attributes between node pairs, despite their strong relevance in real-world scenarios. In this paper, we propose two graph neural network-based methods to address multi-objective routing on multigraphs. Our first approach operates directly on the multigraph by autoregressively selecting edges until a tour is completed. The second model first simplifies the multigraph via a learned pruning strategy and then performs routing on the resulting simple graph. We evaluate both models empirically and demonstrate their strong performance across a range of problems and distributions.
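The second approach, reducing a multigraph to a simple graph before routing, can be illustrated with a fixed rule in place of the learned pruning strategy. The sketch below keeps, for each ordered node pair, the parallel edge with the smallest weighted sum of objectives. The weighted-sum rule, the directed-pair convention, and all names are assumptions made for illustration and are not the paper's model.

```python
def prune_multigraph(edges, weights):
    """Collapse parallel edges to one edge per ordered node pair.

    `edges` is a list of (u, v, costs) tuples, where costs holds one value per objective.
    `weights` gives the preference weight of each objective.
    Returns a dict mapping (u, v) to the costs of the retained edge.
    """
    best = {}
    for u, v, costs in edges:
        score = sum(w * c for w, c in zip(weights, costs))
        # Keep the first edge seen for a pair, or a strictly better one later.
        if (u, v) not in best or score < best[(u, v)][0]:
            best[(u, v)] = (score, costs)
    return {pair: costs for pair, (score, costs) in best.items()}
```

Different preference weights can retain different parallel edges, which is why a single static pruning is generally not enough in the multi-objective setting and a preference-aware strategy is needed.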
Article 440
Title@2025-07-22 (2): C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning
Title: C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning | C2-Evo: Co-Evolving multimodale Daten und Modell zur Selbstverbesserung | C2-Evo:共同演进的多模式数据和自我改进理由模型 2507.16518v1 |
Authors (12): Xiuwei Chen, Wentao Hu, Hanhui Li, Jun Zhou, Zisheng Chen, Meng Cao, Yihan Zeng, Kui Zhang, Yu-Jie Yuan, Jianhua Han, Hang Xu, Xiaodan Liang
Recent advances in multimodal large language models (MLLMs) have shown impressive reasoning capabilities. However, further enhancing existing MLLMs necessitates high-quality vision-language datasets with carefully curated task complexities, which are both costly and challenging to scale. Although recent self-improving models that iteratively refine themselves offer a feasible solution, they still suffer from two core challenges: (i) most existing methods augment visual or textual data separately, resulting in discrepancies in data complexity (e.g., over-simplified diagrams paired with redundant textual descriptions); and (ii) the evolution of data and models is also separated, leading to scenarios where models are exposed to tasks with mismatched difficulty levels. To address these issues, we propose C2-Evo, an automatic, closed-loop self-improving framework that jointly evolves both training data and model capabilities. Specifically, given a base dataset and a base model, C2-Evo enhances them by a cross-modal data evolution loop and a data-model evolution loop. The former loop expands the base dataset by generating complex multimodal problems that combine structured textual sub-problems with iteratively specified geometric diagrams, while the latter loop adaptively selects the generated problems based on the performance of the base model, to conduct supervised fine-tuning and reinforcement learning alternately. Consequently, our method continuously refines its model and training data, and consistently obtains considerable performance gains across multiple mathematical reasoning benchmarks. Our code, models, and datasets will be released.
Article 441
Title@2025-07-22 (2): Analogy making as amortised model construction
Title: Analogy making as amortised model construction | Analoge Herstellung als amortisierter Modellbau | 模拟作为摊还模型建造 2507.16511v1 |
Authors (5): David G. Nagy, Tingke Shen, Hanqi Zhou, Charley M. Wu, Peter Dayan
Humans flexibly construct internal models to navigate novel situations. To be useful, these internal models must be sufficiently faithful to the environment that resource-limited planning leads to adequate outcomes; equally, they must be tractable to construct in the first place. We argue that analogy plays a central role in these processes, enabling agents to reuse solution-relevant structure from past experiences and amortise the computational costs of both model construction (construal) and planning. Formalising analogies as partial homomorphisms between Markov decision processes, we sketch a framework in which abstract modules, derived from previous construals, serve as composable building blocks for new ones. This modular reuse allows for flexible adaptation of policies and representations across domains with shared structural essence.
Article 442
Title@2025-07-22 (2): Network Analytics for Anti-Money Laundering – A Systematic Literature Review and Experimental Evaluation
Title: Network Analytics for Anti-Money Laundering – A Systematic Literature Review and Experimental Evaluation | Network Analytics for Anti-Money Laundering – Eine systematische Literaturrecherche und experimentelle Auswertung | 反洗钱网络分析 – – 系统文献审查和实验评价 2405.19383v4 |
Authors (5): Bruno Deprez, Toon Vanderschueren, Bart Baesens, Tim Verdonck, Wouter Verbeke
Money laundering presents a pervasive challenge, burdening society by financing illegal activities. The use of network information is increasingly being explored to effectively combat money laundering, given that it involves connected parties. This led to a surge in research on network analytics for anti-money laundering (AML). The literature is, however, fragmented and a comprehensive overview of existing work is missing. This results in limited understanding of the methods to apply and their comparative detection power. This paper presents an extensive and unique literature review, based on 97 papers from Web of Science and Scopus, resulting in a taxonomy following a recently proposed fraud analytics framework. We conclude that most research relies on expert-based rules and manual features, while deep learning methods have been gaining traction. This paper also presents a comprehensive framework to evaluate and compare the performance of prominent methods in a standardized setup. We compare manual feature engineering, random walk-based, and deep learning methods on two publicly available data sets. We conclude that (1) network analytics increases the predictive power, but caution is needed when applying GNNs in the face of class imbalance and network topology, and that (2) care should be taken with synthetic data as this can give overly optimistic results. The open-source implementation enables researchers and practitioners to extend this work on proprietary data, promoting a standardised approach for the analysis and evaluation of network analytics for AML.
Article 443
Title@2025-07-22 (2): Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation
Title: Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation | Sparrow: Dateneffizientes Video-LLM mit Text-zu-Bild-Erweiterung | 麻雀:数据有效视频LLM,带有文本到图像放大功能 2411.19951v5 |
Authors (10): Shukang Yin, Chaoyou Fu, Sirui Zhao, Chunjiang Ge, Yan Yang, Yuhan Dai, Yongdong Luo, Tong Xu, Caifeng Shan, Enhong Chen
Recent years have seen the success of Multimodal Large Language Models (MLLMs) in the domain of vision understanding. The success of these models can largely be attributed to the dominant scaling law, which states that larger parameter sizes and data volumes contribute to better performance. Notably, data scaling has been primarily driven by automatic data pipelines, which focus on the self-instruction of LLMs. The paradigm has been taken for granted for quite some time, but the study of the effectiveness of scaling with these data has been neglected for a long time. In this context, this work revisits scaling with synthetic data and focuses on developing video-LLMs from a data-centric perspective. Our primary study approach involves fine-tuning pre-trained image-LLMs with video data and examining learning efficiency through data scaling. Results from our preliminary experiments reveal a low learning efficiency phenomenon when simply scaling up video data samples, which, through our probing, can be ascribed to a lack of instruction diversity. Aiming at this issue, we propose a data augmentation method called Sparrow, which synthesizes video-like samples from pure text instruction data. Mixing these synthetic samples with the video data enables a more efficient training scheme. Through comprehensive experiments, we demonstrate that our proposed method achieves performance comparable to or even superior to that of baselines trained with significantly more samples. Meanwhile, we find that incorporating these synthetic samples can enhance the performance of long video understanding without requiring training on long video data. The code and data examples are available at https://github.com/VITA-MLLM/Sparrow.
Article 444
Title@2025-07-22 (2): FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models
Title: FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models | FlowEdit: Inversionsfreies Text-basiertes Bearbeiten mit vortrainierten Flow-Modellen | 流程:使用预先培训的流程模型进行无逆向无文本编辑 2412.08629v2 |
Authors (4): Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas, Tomer Michaeli
Editing real images using a pre-trained text-to-image (T2I) diffusion/flow model often involves inverting the image into its corresponding noise map. However, inversion by itself is typically insufficient for obtaining satisfactory results, and therefore many methods additionally intervene in the sampling process. Such methods achieve improved results but are not seamlessly transferable between model architectures. Here, we introduce FlowEdit, a text-based editing method for pre-trained T2I flow models, which is inversion-free, optimization-free and model agnostic. Our method constructs an ODE that directly maps between the source and target distributions (corresponding to the source and target text prompts) and achieves a lower transport cost than the inversion approach. This leads to state-of-the-art results, as we illustrate with Stable Diffusion 3 and FLUX. Code and examples are available on the project’s webpage.
Article 445
Title@2025-07-22 (2): BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning
Title: BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning | BioMaze: Benchmarking und Verbesserung großer Sprachmodelle für biologische Pathway-Reasoning | Biomaze:为生物途径理由确定基准和加强大语言模式 2502.16660v5 |
Authors (5): Haiteng Zhao, Chang Ma, Fangzhi Xu, Lingpeng Kong, Zhi-Hong Deng
The applications of large language models (LLMs) in various biological domains have been explored recently, but their reasoning ability in complex biological systems, such as pathways, remains underexplored, even though it is crucial for predicting biological phenomena, formulating hypotheses, and designing experiments. This work explores the potential of LLMs in pathway reasoning. We introduce BioMaze, a dataset with 5.1K complex pathway problems derived from real research, covering various biological contexts including natural dynamic changes, disturbances, additional intervention conditions, and multi-scale research targets. Our evaluation of methods such as CoT and graph-augmented reasoning shows that LLMs struggle with pathway reasoning, especially in perturbed systems. To address this, we propose PathSeeker, an LLM agent that enhances reasoning through interactive subgraph-based navigation, enabling a more effective approach to handling the complexities of biological systems in a scientifically aligned manner. The dataset and code are available at https://github.com/zhao-ht/BioMaze.
Article 446
Title@2025-07-22 (2): Canonical Correlation Patterns for Validating Clustering of Multivariate Time Series
Title: Canonical Correlation Patterns for Validating Clustering of Multivariate Time Series | Canonical Correlation Patterns für die Validierung Clustering von Multivariate Time Series | 校验多变量时间序列群集的卡尼相对关系模式 2507.16497v1 |
Authors (4): Isabella Degen, Zahraa S Abdallah, Kate Robson Brown, Henry W J Reeve
Clustering of multivariate time series using correlation-based methods reveals regime changes in relationships between variables across health, finance, and industrial applications. However, validating whether discovered clusters represent distinct relationships rather than arbitrary groupings remains a fundamental challenge. Existing clustering validity indices were developed for Euclidean data, and their effectiveness for correlation patterns has not been systematically evaluated. Unlike Euclidean clustering, where geometric shapes provide discrete reference targets, correlations exist in continuous space without equivalent reference patterns. We address this validation gap by introducing canonical correlation patterns as mathematically defined validation targets that discretise the infinite correlation space into finite, interpretable reference patterns. Using synthetic datasets with perfect ground truth across controlled conditions, we demonstrate that canonical patterns provide reliable validation targets, with L1 norm for mapping and L5 norm for silhouette width criterion and Davies-Bouldin index showing superior performance. These methods are robust to distribution shifts and appropriately detect correlation structure degradation, enabling practical implementation guidelines. This work establishes a methodological foundation for rigorous correlation-based clustering validation in high-stakes domains.
Article 447
Title@2025-07-22 (2): Combining Language and Topic Models for Hierarchical Text Classification
Title: Combining Language and Topic Models for Hierarchical Text Classification | Kombination von Sprach- und Themenmodellen für die Hierarchische Textklassifikation | 将等级文字分类的语言和专题模式相结合 2507.16490v1 |
Authors (2): Jaco du Toit, Marcel Dunaiski
Hierarchical text classification (HTC) is a natural language processing task which has the objective of categorising text documents into a set of classes from a predefined structured class hierarchy. Recent HTC approaches use various techniques to incorporate the hierarchical class structure information with the natural language understanding capabilities of pre-trained language models (PLMs) to improve classification performance. Furthermore, using topic models along with PLMs to extract features from text documents has been shown to be an effective approach for multi-label text classification tasks. The rationale behind the combination of these feature extractor models is that the PLM captures the finer-grained contextual and semantic information while the topic model obtains high-level representations which consider the corpus of documents as a whole. In this paper, we use an HTC approach which uses a PLM and a topic model to extract features from text documents which are used to train a classification model. Our objective is to determine whether the combination of the features extracted from the two models is beneficial to HTC performance in general. In our approach, the extracted features are passed through separate convolutional layers whose outputs are combined and passed to a label-wise attention mechanism which obtains label-specific document representations by weighing the most important features for each class separately. We perform comprehensive experiments on three HTC benchmark datasets and show that using the features extracted from the topic model generally decreases classification performance compared to only using the features obtained by the PLM. In contrast to previous work, this shows that the incorporation of features extracted from topic models for text classification tasks should not be assumed beneficial.
Article 448
Title@2025-07-22 (2): Learning from Data Streams: An Overview and Update
Title: Learning from Data Streams: An Overview and Update | Lernen aus Datenströmen: Eine Übersicht und Aktualisierung | 从数据流中学习:概览和最新情况 2212.14720v3 |
Authors (2): Jesse Read, Indrė Žliobaitė
The literature on machine learning in the context of data streams is vast and growing. However, many of the defining assumptions regarding data-stream learning tasks are too strong to hold in practice, or are even contradictory such that they cannot be met in the contexts of supervised learning. Algorithms are chosen and designed based on criteria which are often not clearly stated, for problem settings not clearly defined, tested in unrealistic settings, and/or in isolation from related approaches in the wider literature. This puts into question the potential for real-world impact of many approaches conceived in such contexts, and risks propagating a misguided research focus. We propose to tackle these issues by reformulating the fundamental definitions and settings of supervised data-stream learning with regard to contemporary considerations of concept drift and temporal dependence; and we take a fresh look at what constitutes a supervised data-stream learning task, and a reconsideration of algorithms that may be applied to tackle such tasks. Through and in reflection of this formulation and overview, helped by an informal survey of industrial players dealing with real-world data streams, we provide recommendations. Our main emphasis is that learning from data streams does not impose a single-pass or online-learning approach, or any particular learning regime; and any constraints on memory and time are not specific to streaming. Meanwhile, there exist established techniques for dealing with temporal dependence and concept drift, in other areas of the literature. For the data streams community, we thus encourage a shift in research focus, from dealing with often-artificial constraints and assumptions on the learning mode, to issues such as robustness, privacy, and interpretability which are increasingly relevant to learning in data streams in academic and industrial settings.
Article 449
Title@2025-07-22 (2): Comparison of Optimised Geometric Deep Learning Architectures, over Varying Toxicological Assay Data Environments
Title: Comparison of Optimised Geometric Deep Learning Architectures, over Varying Toxicological Assay Data Environments | Vergleich von optimierten geometrischen Deep-Learning-Architekturen über unterschiedliche toxikologische Analyse-Datenumgebungen | 超过不同毒性分析数据环境的最佳几何深学习结构比较 2507.17775v1 |
Authors (9): Alexander D. Kalian, Lennart Otte, Jaewook Lee, Emilio Benfenati, Jean-Lou C. M. Dorne, Claire Potter, Olivia J. Osborne, Miao Guo, Christer Hogstrand
Geometric deep learning is an emerging technique in Artificial Intelligence (AI) driven cheminformatics; however, the unique implications of different Graph Neural Network (GNN) architectures are poorly explored in this space. This study compared performances of Graph Convolutional Networks (GCNs), Graph Attention Networks (GATs) and Graph Isomorphism Networks (GINs), applied to 7 different toxicological assay datasets of varying data abundance and endpoint, to perform binary classification of assay activation. Following pre-processing of molecular graphs, enforcement of class-balance and stratification of all datasets across 5 folds, Bayesian optimisations were carried out for each GNN applied to each assay dataset (resulting in 21 unique Bayesian optimisations). Optimised GNNs achieved Area Under the Curve (AUC) scores ranging from 0.728 to 0.849 (averaged across all folds), naturally varying between specific assays and GNNs. GINs were found to consistently outperform GCNs and GATs on the 5 most data-abundant of the 7 toxicological assays. GATs, however, significantly outperformed the other architectures on the remaining 2 most data-scarce assays. This indicates that GINs are a better-suited architecture for data-abundant environments, whereas GATs are better suited to data-scarce environments. Subsequent analysis of the explored higher-dimensional hyperparameter spaces, as well as optimised hyperparameter states, found that GCNs and GATs reached measurably closer optimised states with each other, compared to GINs, further indicating the unique nature of GINs as a GNN algorithm.
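The fold-averaged AUC reported in this abstract can be illustrated with a minimal sketch. It assumes the usual rank-based (Mann-Whitney) definition of ROC AUC and a plain arithmetic mean over folds; the function names `roc_auc` and `mean_auc_over_folds` are hypothetical and are not taken from the paper's code.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels : iterable of 0/1 class labels (1 = assay active)
    scores : iterable of classifier scores, higher = more likely active
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_auc_over_folds(folds):
    """Average AUC over a list of (labels, scores) pairs, one per fold."""
    return sum(roc_auc(y, s) for y, s in folds) / len(folds)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; a sort-based computation would be the usual choice on larger assay datasets.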
Article 450
Title@2025-07-22 (2): Adaptive Bayesian Single-Shot Quantum Sensing
Title: Adaptive Bayesian Single-Shot Quantum Sensing | Adaptive Bayesian Single-Shot-Quantum Sensing | Bayesian 单制热量量遥感 2507.16477v1 |
Authors (3): Ivana Nikoloska, Ruud Van Sloun, Osvaldo Simeone
Quantum sensing harnesses the unique properties of quantum systems to enable precision measurements of physical quantities such as time, magnetic and electric fields, acceleration, and gravitational gradients well beyond the limits of classical sensors. However, identifying suitable sensing probes and measurement schemes can be a classically intractable task, as it requires optimizing over Hilbert spaces of high dimension. In variational quantum sensing, a probe quantum system is generated via a parameterized quantum circuit (PQC), exposed to an unknown physical parameter through a quantum channel, and measured to collect classical data. PQCs and measurements are typically optimized using offline strategies based on frequentist learning criteria. This paper introduces an adaptive protocol that uses Bayesian inference to optimize the sensing policy via the maximization of the active information gain. The proposed variational methodology is tailored for non-asymptotic regimes where a single probe can be deployed in each time step, and is extended to support the fusion of estimates from multiple quantum sensing agents.
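The adaptive loop described here (pick a setting, measure once, update the belief) can be sketched classically. This is a toy under loud assumptions: the unknown parameter is a phase on a finite grid, each shot returns one bit with probability cos^2((theta - phi) / 2), and the setting is chosen by expected information gain (mutual information between outcome and parameter) as a stand-in for the paper's active information gain. No quantum circuit is simulated and all names are hypothetical.

```python
import math


def likelihood(outcome, theta, phi):
    """Assumed single-shot model: P(outcome=1 | theta, phi) = cos^2((theta - phi) / 2)."""
    p1 = math.cos((theta - phi) / 2.0) ** 2
    return p1 if outcome == 1 else 1.0 - p1


def binary_entropy(p):
    """Entropy in nats of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def expected_information_gain(prior, thetas, phi):
    """Mutual information between the next outcome and theta under setting phi."""
    p1_given_theta = [likelihood(1, t, phi) for t in thetas]
    p1 = sum(w * p for w, p in zip(prior, p1_given_theta))
    conditional = sum(w * binary_entropy(p) for w, p in zip(prior, p1_given_theta))
    return binary_entropy(p1) - conditional


def choose_setting(prior, thetas, candidates):
    """Pick the candidate setting with the largest expected information gain."""
    return max(candidates, key=lambda phi: expected_information_gain(prior, thetas, phi))


def bayes_update(prior, thetas, phi, outcome):
    """Posterior over the theta grid after observing one outcome under setting phi."""
    unnormalised = [w * likelihood(outcome, t, phi) for w, t in zip(prior, thetas)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]
```

A single step would call `choose_setting`, take one measurement at the returned setting, and pass the observed bit to `bayes_update`; repeating this gives one posterior update per probe, which matches the single-shot regime the abstract targets.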
Article 451
Title@2025-07-22 (2): Estimating Treatment Effects with Independent Component Analysis
Title: Estimating Treatment Effects with Independent Component Analysis | Abschätzung der Behandlungseffekte mit unabhängiger Komponentenanalyse | 利用独立组成部分分析估算治疗效果 2507.16467v1 |
Authors (4): Patrik Reizinger, Lester Mackey, Wieland Brendel, Rahul Krishnan
The field of causal inference has developed a variety of methods to accurately estimate treatment effects in the presence of nuisance. Meanwhile, the field of identifiability theory has developed methods like Independent Component Analysis (ICA) to identify latent sources and mixing weights from data. While these two research communities have developed largely independently, they aim to achieve similar goals: the accurate and sample-efficient estimation of model parameters. In the partially linear regression (PLR) setting, Mackey et al. (2018) recently found that estimation consistency can be improved with non-Gaussian treatment noise. Non-Gaussianity is also a crucial assumption for identifying latent factors in ICA. We provide the first theoretical and empirical insights into this connection, showing that ICA can be used for causal effect estimation in the PLR model. Surprisingly, we find that linear ICA can accurately estimate multiple treatment effects even in the presence of Gaussian confounders or nonlinear nuisance.
Article 452
Title@2025-07-22 (2): Machine learning-based multimodal prognostic models integrating pathology images and high-throughput omic data for overall survival prediction in cancer: a systematic review
Title: Machine learning-based multimodal prognostic models integrating pathology images and high-throughput omic data for overall survival prediction in cancer: a systematic review | Maschinelles Lernen-basierte multimodale prognostische Modelle zur Integration pathologischer Bilder und hochdurchsetzter omischer Daten für die Gesamtüberlebensvorhersage bei Krebs: eine systematische Überprüfung | 综合病理图象和高通量血压数据以全面预测癌症存活率的机器学习的多式联运预测模型:系统审查 2507.16876v1 |
Authors (6): Charlotte Jennings, Andrew Broad, Lucy Godson, Emily Clarke, David Westhead, Darren Treanor
Multimodal machine learning integrating histopathology and molecular data shows promise for cancer prognostication. We systematically reviewed studies combining whole slide images (WSIs) and high-throughput omics to predict overall survival. Searches of EMBASE, PubMed, and Cochrane CENTRAL (12/08/2024), plus citation screening, identified eligible studies. Data extraction used CHARMS; bias was assessed with PROBAST+AI; synthesis followed SWiM and PRISMA 2020. Protocol: PROSPERO (CRD42024594745). Forty-eight studies (all since 2017) across 19 cancer types met criteria; all used The Cancer Genome Atlas. Approaches included regularised Cox regression (n=4), classical ML (n=13), and deep learning (n=31). Reported c-indices ranged 0.550-0.857; multimodal models typically outperformed unimodal ones. However, all studies showed unclear/high bias, limited external validation, and little focus on clinical utility. Multimodal WSI-omics survival prediction is a fast-growing field with promising results but needs improved methodological rigor, broader datasets, and clinical evaluation. Funded by NPIC, Leeds Teaching Hospitals NHS Trust, UK (Project 104687), supported by UKRI Industrial Strategy Challenge Fund.
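The c-indices quoted in this review measure how often a model ranks patients in the correct order of survival. A minimal sketch of Harrell's concordance index for right-censored data follows; the function name and argument layout are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplification relative to production survival libraries.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's c-index for right-censored survival data.

    times  : observed follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    risks  : model risk score, higher = shorter predicted survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the times differ and the earlier time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported range of 0.550-0.857 in context.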
Article 453
Title@2025-07-22 (2): The Sweet Danger of Sugar: Debunking Representation Learning for Encrypted Traffic Classification
Title: The Sweet Danger of Sugar: Debunking Representation Learning for Encrypted Traffic Classification | Sweet Danger of Sugar: Debunking Representative Learning für verschlüsselte Verkehrsklassifikation | 糖的甜甜危险:加密交通分类的取消代表学习 2507.16438v1 |
Authors (5): Yuqi Zhao, Giovanni Dettori, Matteo Boffa, Luca Vassio, Marco Mellia
Recently we have witnessed an explosion of proposals that, inspired by Language Models like BERT, exploit Representation Learning models to create traffic representations. All of them promise astonishing performance in encrypted traffic classification (up to 98% accuracy). In this paper, with a networking expert mindset, we critically reassess their performance. Through extensive analysis, we demonstrate that the reported successes are heavily influenced by data preparation problems, which allow these models to find easy shortcuts - spurious correlations between features and labels - during fine-tuning that unrealistically boost their performance. When such shortcuts are not present - as in real scenarios - these models perform poorly. We also introduce Pcap-Encoder, an LM-based representation learning model that we specifically design to extract features from protocol headers. Pcap-Encoder appears to be the only model that provides an instrumental representation for traffic classification. Yet, its complexity calls its applicability in practical settings into question. Our findings reveal flaws in dataset preparation and model training, calling for a better and more conscious test design. We propose a correct evaluation methodology and stress the need for rigorous benchmarking.
Article 454
Title@2025-07-22 (2): Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models
Title: Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models | Hierarchische Sicherheits-Neuausrichtung: Leichte Wiederherstellung der Sicherheit in beschnittenen großen Vision-Sprachen-Modellen | 等级安全调整:谨慎大型视觉语言模型中轻度安全恢复 2505.16104v2 |
Authors (6): Yue Li, Xin Yi, Dongsheng Shi, Gerard de Melo, Xiaoling Wang, Linlin Wang
With the increasing size of Large Vision-Language Models (LVLMs), network pruning techniques aimed at compressing models for deployment in resource-constrained environments have garnered significant attention. However, we observe that pruning often leads to a degradation in safety performance. To address this issue, we present a novel and lightweight approach, termed Hierarchical Safety Realignment (HSR). HSR operates by first quantifying the contribution of each attention head to safety, identifying the most critical ones, and then selectively restoring neurons directly within these attention heads that play a pivotal role in maintaining safety. This process hierarchically realigns the safety of pruned LVLMs, progressing from the attention head level to the neuron level. We validate HSR across various models and pruning strategies, consistently achieving notable improvements in safety performance. To our knowledge, this is the first work explicitly focused on restoring safety in LVLMs post-pruning.
Article 455
Title@2025-07-22 (2): From model-based learning to model-free behaviour with Meta-Interpretive Learning
Title: From model-based learning to model-free behaviour with Meta-Interpretive Learning | Vom modellbasierten Lernen zum modellfreien Verhalten mit Meta-Interpretive Learning | 从基于模式的学习到无模式的行为,与 “ 元解释性学习 “ 合作 2507.16434v1 |
Authors (1): Stassa Patsantzis
A “model” is a theory that describes the state of an environment and the effects of an agent’s decisions on the environment. A model-based agent can use its model to predict the effects of its future actions and so plan ahead, but must know the state of the environment. A model-free agent cannot plan, but can act without a model and without completely observing the environment. An autonomous agent capable of acting independently in novel environments must combine both sets of capabilities. We show how to create such an agent with Meta-Interpretive Learning used to learn a model-based Solver used to train a model-free Controller that can solve the same planning problems as the Solver. We demonstrate the equivalence in problem-solving ability of the two agents on grid navigation problems in two kinds of environment: randomly generated mazes, and lake maps with wide open areas. We find that all navigation problems solved by the Solver are also solved by the Controller, indicating the two are equivalent.
Article 456
Title@2025-07-22 (2): Adaptive Multi-task Learning for Multi-sector Portfolio Optimization
Title: Adaptive Multi-task Learning for Multi-sector Portfolio Optimization | Adaptives Multi-Task-Lernen für Multi-Sector Portfolio-Optimierung | 促进多部门组合优化的适应性多任务学习 2507.16433v1 |
Authors (3): Qingliang Fan, Ruike Wu, Yanrong Yang
Accurate transfer of information across multiple sectors to enhance model estimation is both significant and challenging in multi-sector portfolio optimization involving a large number of assets in different classes. Within the framework of factor modeling, we propose a novel data-adaptive multi-task learning methodology that quantifies and learns the relatedness among the principal temporal subspaces (spanned by factors) across multiple sectors under study. This approach not only improves the simultaneous estimation of multiple factor models but also enhances multi-sector portfolio optimization, which heavily depends on the accurate recovery of these factor models. Additionally, a novel and easy-to-implement algorithm, termed projection-penalized principal component analysis, is developed to accomplish the multi-task learning procedure. Diverse simulation designs and a practical application to daily return data from the Russell 3000 index demonstrate the advantages of the multi-task learning methodology.
Article 457
Title@2025-07-22 (2): An effective physics-informed neural operator framework for predicting wavefields
Title: An effective physics-informed neural operator framework for predicting wavefields | Ein effektives physik-informiertes neuronales Bediener-Framework zur Vorhersage von Wellenfeldern | 有效的物理知情神经操作器框架,用于预测波地 2507.16431v1 |
Authors (2): Xiao Ma, Tariq Alkhalifah
Solving the wave equation is fundamental for geophysical applications. However, numerical solutions of the Helmholtz equation face significant computational and memory challenges. Therefore, we introduce a physics-informed convolutional neural operator (PICNO) to solve the Helmholtz equation efficiently. The PICNO takes both the background wavefield corresponding to a homogeneous medium and the velocity model as input function space, generating the scattered wavefield as the output function space. Our workflow integrates PDE constraints directly into the training process, enabling the neural operator to not only fit the available data but also capture the underlying physics governing wave phenomena. PICNO allows for high-resolution, reasonably accurate predictions even with limited training samples, and it demonstrates significant improvements over a purely data-driven convolutional neural operator (CNO), particularly in predicting high-frequency wavefields. These features and improvements are important for future waveform inversion applications.
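The abstract does not spell out the PDE constraint, so the following is a sketch of the standard scattered-field form of the Helmholtz equation that a background-plus-scattered setup of this kind typically relies on. The symbols are assumptions introduced here: $U$ is the full wavefield, $U_0$ the background wavefield in a homogeneous medium of velocity $v_0$, $\delta U$ the scattered wavefield, $\omega$ the angular frequency, $s$ the source, and $\mathcal{N}_\theta$ a hypothetical neural operator with parameters $\theta$.

```latex
\begin{align}
\left(\nabla^2 + \omega^2 m\right) U &= s, & m &= 1/v^2, \\
\left(\nabla^2 + \omega^2 m_0\right) U_0 &= s, & m_0 &= 1/v_0^2, \\
\left(\nabla^2 + \omega^2 m\right) \delta U &= -\omega^2 \left(m - m_0\right) U_0, & \delta U &= U - U_0 .
\end{align}
% The third line follows by subtracting the second from the first.
% A physics-informed residual loss for a predicted \delta U_\theta = \mathcal{N}_\theta(U_0, v)
% could then take the form
\begin{equation}
\mathcal{L}_{\mathrm{PDE}}(\theta) =
\frac{1}{N} \sum_{k=1}^{N}
\left| \left(\nabla^2 + \omega^2 m\right) \delta U_\theta(\mathbf{x}_k)
+ \omega^2 \left(m - m_0\right) U_0(\mathbf{x}_k) \right|^2 .
\end{equation}
```

Under this formulation the source term drops out of the scattered-field equation, which is one reason the background wavefield is a convenient input.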
Article 458
Title@2025-07-22 (2): Combined Image Data Augmentations diminish the benefits of Adaptive Label Smoothing
Title: Combined Image Data Augmentations diminish the benefits of Adaptive Label Smoothing | Kombinierte Bilddatenvergrößerungen mindern die Vorteile der adaptiven Labelglättung | 合并图像数据放大减少了调适标签平滑的好处 2507.16427v1 |
Authors (5): Georg Siedel, Ekagra Gupta, Weijia Shao, Silvia Vock, Andrey Morozov
Soft augmentation regularizes the supervised learning process of image classifiers by reducing label confidence of a training sample based on the magnitude of random-crop augmentation applied to it. This paper extends this adaptive label smoothing framework to other types of aggressive augmentations beyond random-crop. Specifically, we demonstrate the effectiveness of the method for random erasing and noise injection data augmentation. Adaptive label smoothing permits stronger regularization via higher-intensity Random Erasing. However, its benefits vanish when applied with a diverse range of image transformations as in the state-of-the-art TrivialAugment method, and excessive label smoothing harms robustness to common corruptions. Our findings suggest that adaptive label smoothing should only be applied when the training data distribution is dominated by a limited, homogeneous set of image transformation types.
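The core idea, lowering the target confidence as an augmentation becomes more aggressive, can be sketched as follows. The mapping from augmentation magnitude to confidence is an assumption made for illustration: it uses the fraction of the image left visible after cropping or erasing, a power-law decay with exponent `k`, and a floor `p_min`. The paper's exact schedule and parameter values may differ, and both function names are hypothetical.

```python
def target_confidence(visibility, p_min=0.1, k=2.0):
    """Confidence assigned to the true class for a given visible fraction.

    visibility : fraction of the original image still visible, in [0, 1]
    p_min      : confidence when nothing is visible (assumed floor)
    k          : curve exponent; larger k keeps confidence high for longer
    """
    if not 0.0 <= visibility <= 1.0:
        raise ValueError("visibility must lie in [0, 1]")
    return 1.0 - (1.0 - p_min) * (1.0 - visibility) ** k


def soft_label(true_class, num_classes, visibility, p_min=0.1, k=2.0):
    """Soft target vector: true class gets the adaptive confidence,
    and the remaining mass is spread uniformly over the other classes."""
    p = target_confidence(visibility, p_min, k)
    off_value = (1.0 - p) / (num_classes - 1)
    return [p if c == true_class else off_value for c in range(num_classes)]
```

With 10 classes and `p_min=0.1`, a fully erased image receives a uniform target, so the classifier is not pushed toward any class when no evidence remains.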
Article 459
Title@2025-07-22 (2): Practical Insights into Knowledge Distillation for Pre-Trained Models
Title: Practical Insights into Knowledge Distillation for Pre-Trained Models | Praktische Einblicke in die Wissensdestillation für vortrainierte Modelle | 预培训模式知识提炼的实用透视 2402.14922v2 |
Authors (3): Norah Alballa, Ahmed M. Abdelmoniem, Marco Canini
This research investigates the enhancement of knowledge distillation (KD) processes in pre-trained models, an emerging field in knowledge transfer with significant implications for distributed training and federated learning environments. These environments benefit from reduced communication demands and accommodate various model architectures. Despite the adoption of numerous KD approaches for transferring knowledge among pre-trained models, a comprehensive understanding of KD’s application in these scenarios is lacking. Our study conducts an extensive comparison of multiple KD techniques, including standard KD, tuned KD (via optimized temperature and weight parameters), deep mutual learning, and data partitioning KD. We assess these methods across various data distribution strategies to identify the most effective contexts for each. Through detailed examination of hyperparameter tuning, informed by extensive grid search evaluations, we pinpoint when adjustments are crucial to enhance model performance. This paper sheds light on optimal hyperparameter settings for distinct data partitioning scenarios and investigates KD’s role in improving federated learning by minimizing communication rounds and expediting the training process. By filling a notable void in current research, our findings serve as a practical framework for leveraging KD in pre-trained models within collaborative and federated learning frameworks.
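The temperature and weight parameters that the abstract refers to for tuned KD appear in the standard distillation objective, sketched below for a single example. This is the textbook formulation (hard-label cross-entropy plus a temperature-scaled KL term), not necessarily the exact loss used in the study; `kd_loss`, `alpha`, and `temperature` are names chosen here for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and teacher-student KL divergence.

    alpha       : weight on the hard-label term (1 - alpha goes to the soft term)
    temperature : softens both distributions; the T^2 factor keeps gradient
                  magnitudes comparable across temperatures
    """
    hard = -math.log(softmax(student_logits)[label])
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(
        t * math.log(t / s)
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft
```

A grid search of the kind the abstract describes would sweep `temperature` and `alpha` and compare validation accuracy of the resulting student.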
nan
Article 460
Title@2025-07-22 (2): PromptAL: Sample-Aware Dynamic Soft Prompts for Few-Shot Active Learning
Title: PromptAL: Sample-Aware Dynamic Soft Prompts for Few-Shot Active Learning | PromptAL: Sample-Aware Dynamische Soft-Prompts für wenig heißes aktives Lernen | 提示: 用于少点热积极学习的样本- 软件动态软提示 2507.16424v1 |
Authors (6): Hui Xiang, Jinqiao Shi, Ting Zhang, Xiaojie Zhao, Yong Liu, Yong Ma
Active learning (AL) aims to optimize model training and reduce annotation costs by selecting the most informative samples for labeling. Typically, AL methods rely on the empirical distribution of labeled data to define the decision boundary and perform uncertainty or diversity estimation, subsequently identifying potential high-quality samples. In few-shot scenarios, the empirical distribution often diverges significantly from the target distribution, causing the decision boundary to shift away from its optimal position. However, existing methods overlook the role of unlabeled samples in enhancing the empirical distribution to better align with the target distribution, resulting in a suboptimal decision boundary and the selection of samples that inadequately represent the target distribution. To address this, we propose a hybrid AL framework, termed PromptAL (Sample-Aware Dynamic Soft Prompts for Few-Shot Active Learning). This framework accounts for the contribution of each unlabeled data point in aligning the current empirical distribution with the target distribution, thereby optimizing the decision boundary. Specifically, PromptAL first leverages unlabeled data to construct sample-aware dynamic soft prompts that adjust the model’s predictive distribution and decision boundary. Subsequently, based on the adjusted decision boundary, it integrates uncertainty estimation with both global and local diversity to select high-quality samples that more accurately represent the target distribution. Experimental results on six in-domain and three out-of-domain datasets show that PromptAL achieves superior performance over nine baselines. Our codebase is openly accessible.
nan
Article 461
Title@2025-07-22 (2): Improving Predictions on Highly Unbalanced Data Using Open Source Synthetic Data Upsampling
Title: Improving Predictions on Highly Unbalanced Data Using Open Source Synthetic Data Upsampling | Verbesserung der Vorhersagen auf sehr unausgewogenen Daten mit Hilfe von Open Source Synthetic Data Upsampling | 利用开放源码合成数据抽样改进对高度不平衡数据的预测 2507.16419v1 |
Authors (3): Ivona Krchova, Michael Platzer, Paul Tiwald
Unbalanced tabular data sets present significant challenges for predictive modeling and data analysis across a wide range of applications. In many real-world scenarios, such as fraud detection, medical diagnosis, and rare event prediction, minority classes are vastly underrepresented, making it difficult for traditional machine learning algorithms to achieve high accuracy. These algorithms tend to favor the majority class, leading to biased models that struggle to accurately represent minority classes. Synthetic data holds promise for addressing the under-representation of minority classes by providing new, diverse, and highly realistic samples. This paper presents a benchmark study on the use of AI-generated synthetic data for upsampling highly unbalanced tabular data sets. We evaluate the effectiveness of an open-source solution, the Synthetic Data SDK by MOSTLY AI, which provides a flexible and user-friendly approach to synthetic upsampling for mixed-type data. We compare predictive models trained on data sets upsampled with synthetic records to those using standard methods, such as naive oversampling and SMOTE-NC. Our results demonstrate that synthetic data can improve predictive accuracy for minority groups by generating diverse data points that fill gaps in sparse regions of the feature space. We show that upsampled synthetic training data consistently results in top-performing predictive models, particularly for mixed-type data sets containing very few minority samples.
nan
Article 462
Title@2025-07-22 (2): GG-BBQ: German Gender Bias Benchmark for Question Answering
Title: GG-BBQ: German Gender Bias Benchmark for Question Answering | GG-BBQ: Deutscher Gender-Bias-Benchmark für Fragenbeantwortung | GGG-BBQ:德国回答问题性别比基准 2507.16410v1 |
Authors (6): Shalaka Satheesh, Katrin Klug, Katharina Beckh, Héctor Allende-Cid, Sebastian Houben, Teena Hassan
Within the context of Natural Language Processing (NLP), fairness evaluation is often associated with the assessment of bias and reduction of associated harm. In this regard, the evaluation is usually carried out by using a benchmark dataset, for a task such as Question Answering, created for the measurement of bias in the model’s predictions along various dimensions, including gender identity. In our work, we evaluate gender bias in German Large Language Models (LLMs) using the Bias Benchmark for Question Answering by Parrish et al. (2022) as a reference. Specifically, the templates in the gender identity subset of this English dataset were machine-translated into German. The errors in the machine-translated templates were then manually reviewed and corrected with the help of a language expert. We find that manual revision of the translation is crucial when creating datasets for gender bias evaluation because of the limitations of machine translation from English to a language such as German with grammatical gender. Our final dataset comprises two subsets: Subset-I, which consists of group terms related to gender identity, and Subset-II, where group terms are replaced with proper names. We evaluate several LLMs used for German NLP on this newly created dataset and report the accuracy and bias scores. The results show that all models exhibit bias, both along and against existing social stereotypes.
nan
Article 463
Title@2025-07-22 (2): MolPIF: A Parameter Interpolation Flow Model for Molecule Generation
Title: MolPIF: A Parameter Interpolation Flow Model for Molecule Generation | MolPIF: Ein Parameter Interpolationsflussmodell für die Molekülerzeugung | MoLPIF: 分子一代的参数内插流动模型 2507.13762v2 |
Authors (13): Yaowei Jin, Junjie Wang, Wenkai Xiang, Duanhua Cao, Dan Teng, Zhehuan Fan, Jiacheng Xiong, Xia Sheng, Chuanlong Zeng, Duo An, Mingyue Zheng, Shuangjia Zheng, Qian Shi
Advances in deep learning for molecular generation show promise in accelerating drug discovery. Bayesian Flow Networks (BFNs) have recently shown impressive performance across diverse chemical tasks, with their success often ascribed to the paradigm of modeling in a low-variance parameter space. However, the Bayesian inference-based strategy imposes limitations on designing more flexible distribution transformation pathways, making it challenging to adapt to diverse data distributions and varied task requirements. Furthermore, the potential for simpler, more efficient parameter-space-based models is unexplored. To address this, we propose a novel Parameter Interpolation Flow model (named PIF) with detailed theoretical foundation, training, and inference procedures. We then develop MolPIF for structure-based drug design, demonstrating its superior performance across diverse metrics compared to baselines. This work validates the effectiveness of parameter-space-based generative modeling paradigm for molecules and offers new perspectives for model design.
nan
Article 464
Title@2025-07-22 (2): Self-Supervised Inductive Logic Programming
Title: Self-Supervised Inductive Logic Programming | Selbstüberwachte induktive Logik-Programmierung | 自上自上自上引逻辑规划 2507.16405v1 |
Authors (1): Stassa Patsantzis
Inductive Logic Programming (ILP) approaches like Meta-Interpretive Learning (MIL) can learn, from few examples, recursive logic programs with invented predicates that generalise well to unseen instances. This ability relies on a background theory and negative examples, both carefully selected with expert knowledge of a learning problem and its solutions. But what if such a problem-specific background theory or negative examples are not available? We formalise this question as a new setting for Self-Supervised ILP and present a new MIL algorithm that learns in the new setting from some positive labelled, and zero or more unlabelled examples, and automatically generates, and labels, new positive and negative examples during learning. We implement this algorithm in Prolog in a new MIL system, called Poker. We compare Poker to state-of-the-art MIL system Louise on experiments learning grammars for Context-Free and L-System languages from labelled, positive example strings, no negative examples, and just the terminal vocabulary of a language, seen in examples, as a first-order background theory. We introduce a new approach for the principled selection of a second-order background theory as a Second Order Definite Normal Form (SONF), sufficiently general to learn all programs in a class, thus removing the need for a background theory tailored to a learning task. We find that Poker’s performance improves with increasing numbers of automatically generated examples while Louise, bereft of negative examples, over-generalises.
nan
Article 465
Title@2025-07-22 (2): Balancing Robustness and Efficiency in Embedded DNNs Through Activation Function Selection
Title: Balancing Robustness and Efficiency in Embedded DNNs Through Activation Function Selection | Ausbalancierung von Robustheit und Effizienz in eingebetteten DNNs durch Aktivierungsfunktionsauswahl | 通过启动职能选择,在嵌入的DNN 中平衡稳健和效率 2504.05119v2 |
Authors (3): Jon Gutiérrez-Zaballa, Koldo Basterretxea, Javier Echanobe
Machine learning-based embedded systems for safety-critical applications, such as aerospace and autonomous driving, must be robust to perturbations caused by soft errors. As transistor geometries shrink and voltages decrease, modern electronic devices become more susceptible to background radiation, increasing the concern about failures produced by soft errors. The resilience of deep neural networks (DNNs) to these errors depends not only on target device technology but also on model structure and the numerical representation and arithmetic precision of their parameters. Compression techniques like pruning and quantization, used to reduce memory footprint and computational complexity, alter both model structure and representation, affecting soft error robustness. In this regard, although often overlooked, the choice of activation functions (AFs) impacts not only accuracy and trainability but also compressibility and error resilience. This paper explores the use of bounded AFs to enhance robustness against parameter perturbations, while evaluating their effects on model accuracy, compressibility, and computational load with a technology-agnostic approach. We focus on encoder-decoder convolutional models developed for semantic segmentation of hyperspectral images with application to autonomous driving systems. Experiments are conducted on an AMD-Xilinx KV260 SoM.
nan
Article 466
Title@2025-07-22 (2): Technical report: Impact of Duration Prediction on Speaker-specific TTS for Indian Languages
Title: Technical report: Impact of Duration Prediction on Speaker-specific TTS for Indian Languages | Technischer Bericht: Auswirkungen der Dauervorhersage auf Speakerspezifische TTS für indische Sprachen | 技术报告:期限预测对印度语特定演讲人TTS的影响 2507.16875v1 |
Authors (4): Isha Pandey, Pranav Gaikwad, Amruta Parulekar, Ganesh Ramakrishnan
High-quality speech generation for low-resource languages, such as many Indian languages, remains a significant challenge due to limited data and diverse linguistic structures. Duration prediction is a critical component in many speech generation pipelines, playing a key role in modeling prosody and speech rhythm. While some recent generative approaches choose to omit explicit duration modeling, often at the cost of longer training times, we retain and explore this module to better understand its impact in the linguistically rich and data-scarce landscape of India. We train a non-autoregressive Continuous Normalizing Flow (CNF) based speech model using publicly available Indian language data and evaluate multiple duration prediction strategies for zero-shot, speaker-specific generation. Our comparative analysis on speech-infilling tasks reveals nuanced trade-offs: infilling-based predictors improve intelligibility in some languages, while speaker-prompted predictors better preserve speaker characteristics in others. These findings inform the design and selection of duration strategies tailored to specific languages and tasks, underscoring the continued value of interpretable components like duration prediction in adapting advanced generative architectures to low-resource, multilingual settings.
nan
Article 467
Title@2025-07-22 (2): Optimization and generalization analysis for two-layer physics-informed neural networks without over-parametrization
Title: Optimization and generalization analysis for two-layer physics-informed neural networks without over-parametrization | Optimierungs- und Generalisierungsanalyse für zweischichtige physik-informierte neuronale Netzwerke ohne Überparametrierung | 为两层物理学知情神经网络提供优化和概括化分析,不过分对称 2507.16380v1 |
Authors (2): Zhihan Zeng, Yiqi Gu
This work focuses on the behavior of stochastic gradient descent (SGD) in solving least-squares regression with physics-informed neural networks (PINNs). Past work on this topic has been based on the over-parameterization regime, whose convergence may require the network width to increase vastly with the number of training samples. So, the theory derived from over-parameterization may incur prohibitive computational costs and is far from practical experiments. We perform new optimization and generalization analysis for SGD in training two-layer PINNs, making certain assumptions about the target function to avoid over-parameterization. Given $\epsilon>0$, we show that if the network width exceeds a threshold that depends only on $\epsilon$ and the problem, then the training loss and expected loss will decrease below $O(\epsilon)$.
nan
Article 468
Title@2025-07-22 (2): Meta-learning of Gibbs states for many-body Hamiltonians with applications to Quantum Boltzmann Machines
Title: Meta-learning of Gibbs states for many-body Hamiltonians with applications to Quantum Boltzmann Machines | Meta-Lernen von Gibbs-Staaten für viele-Körper Hamiltonians mit Anwendungen für Quantum Boltzmann Maschinen | 利用Gibbbs各邦的Met 学习,让许多身体机体的汉密尔顿人学习,并使用量子波尔兹曼机器 2507.16373v1 |
Authors (4): Ruchira V Bhat, Rahul Bhowmick, Avinash Singh, Krishna Kumar Sabapathy
The preparation of quantum Gibbs states is a fundamental challenge in quantum computing, essential for applications ranging from modeling open quantum systems to quantum machine learning. Building on the Meta-Variational Quantum Eigensolver framework proposed by Cervera-Lierta et al. (2021) and a problem-driven ansatz design, we introduce two meta-learning algorithms: Meta-Variational Quantum Thermalizer (Meta-VQT) and Neural Network Meta-VQT (NN-Meta VQT) for efficient thermal state preparation of parametrized Hamiltonians on Noisy Intermediate-Scale Quantum (NISQ) devices. Meta-VQT utilizes a fully quantum ansatz, while NN-Meta VQT integrates a quantum-classical hybrid architecture. Both leverage collective optimization over training sets to generalize Gibbs state preparation to unseen parameters. We validate our methods on the Transverse Field Ising Model with up to 8 qubits and the 2-qubit Heisenberg model with all field terms, demonstrating efficient thermal state generation beyond training data. For larger systems, we show that our meta-learned parameters, when combined with an appropriately designed ansatz, serve as warm-start initializations, significantly outperforming random initializations in the optimization tasks. Furthermore, a 3-qubit Kitaev ring example showcases our algorithm’s effectiveness across finite-temperature crossover regimes. Finally, we apply our algorithms to train a Quantum Boltzmann Machine (QBM) on a 2-qubit Heisenberg model with all field terms, achieving enhanced training efficiency, improved Gibbs state accuracy, and a 30-fold runtime speedup over existing techniques such as variational quantum imaginary time (VarQITE)-based QBM, highlighting the scalability and practicality of meta-algorithm-based QBMs.
nan
Article 469
Title@2025-07-22 (2): Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts
Title: Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Autonome Datenauswahl mit Zero-shot Generative Klassifikatoren für mathematische Texte | 具有数学文本零光生成分类器的自动数据选择 2402.07625v7 |
Authors (4): Yifan Zhang, Yifan Luo, Yang Yuan, Andrew C Yao
We present Autonomous Data Selection (AutoDS), a method that leverages base language models themselves as zero-shot “generative classifiers” to automatically curate high-quality mathematical texts. Unlike prior approaches that require human annotations or training a dedicated data filter, AutoDS relies solely on a model’s logits to determine whether a given passage is mathematically informative and educational. By integrating AutoDS into a continual pretraining pipeline, we substantially boost downstream performance on challenging math benchmarks (MATH, GSM8K, and BBH) while using far fewer tokens than previous methods. Empirically, our approach achieves roughly a twofold improvement in pretraining token efficiency over strong baselines, underscoring the potential of self-directed data selection in enhancing mathematical reasoning. We release our curated AutoMathText dataset to facilitate future research in automated domain-specific data curation. The AutoMathText dataset is available at https://huggingface.co/datasets/math-ai/AutoMathText. The code is available at https://github.com/yifanzhang-pro/AutoMathText.
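The abstract says only that AutoDS relies on a model's logits. One plausible reading, sketched below, is a softmax restricted to two candidate answer tokens (for example "YES" and "NO") after a prompt asking whether the passage is mathematically informative. The two-token form, the function names, and the 0.6 threshold are assumptions for illustration; obtaining the logits from a language model is left out.

```python
import math

def lm_score(logit_yes, logit_no):
    """Softmax over two candidate answer tokens, giving a score in (0, 1)."""
    m = max(logit_yes, logit_no)  # shift for numerical stability
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)

def select_passages(scored, threshold=0.6):
    """Keep passages whose score reaches the (hypothetical) threshold.

    scored: iterable of (text, logit_yes, logit_no) triples, where the logits
    are assumed to come from a base model's next-token distribution.
    """
    return [text for text, ly, ln in scored if lm_score(ly, ln) >= threshold]
```

Because the score is continuous, the same logits can be used either for hard filtering, as here, or for ranking passages before a token budget is applied.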
nan
Article 470
Title@2025-07-22 (2): Physical models realizing the transformer architecture of large language models
Title: Physical models realizing the transformer architecture of large language models | Physikalische Modelle, die die Transformatorenarchitektur großer Sprachmodelle realisieren | 实现大型语言模型变压器结构的物理模型 2507.13354v2 |
Authors (1): Zeqian Chen
The introduction of the transformer architecture in 2017 marked the most striking advancement in natural language processing. The transformer is a model architecture relying entirely on an attention mechanism to draw global dependencies between input and output. However, we believe there is a gap in our theoretical understanding of what the transformer is, and how it works physically. From a physical perspective on modern chips, such as those under 28 nm, modern intelligent machines should be regarded as open quantum systems beyond conventional statistical systems. Therefore, in this paper, we construct physical models realizing large language models based on a transformer architecture as open quantum systems in the Fock space over the Hilbert space of tokens. Our physical models underlie the transformer architecture for large language models.
nan
Article 471
Title@2025-07-22 (2): Bipartite Patient-Modality Graph Learning with Event-Conditional Modelling of Censoring for Cancer Survival Prediction
Title: Bipartite Patient-Modality Graph Learning with Event-Conditional Modelling of Censoring for Cancer Survival Prediction | Bipartite Patienten-Modalität Graphenlernen mit Ereignis-Bedingte Modellierung der Zensur für Krebs-Überlebensvorhersage | 两边患者-模式图表学习,以及癌症生存预测审查的有条件事件模型 2507.16363v1 |
Authors (7): Hailin Yue, Hulin Kuang, Jin Liu, Junjian Li, Lanlan Wang, Mengshen He, Jianxin Wang
Accurately predicting the survival of cancer patients is crucial for personalized treatment. However, existing studies focus solely on the relationships between samples with known survival risks, without fully leveraging the value of censored samples. Furthermore, these studies may suffer performance degradation in modality-missing scenarios and even struggle during the inference process. In this study, we propose a bipartite patient-modality graph learning framework with event-conditional modelling of censoring for cancer survival prediction (CenSurv). Specifically, we first use a graph structure to model multimodal data and obtain representations. Then, to alleviate performance degradation in modality-missing scenarios, we design a bipartite graph to simulate the patient-modality relationship in various modality-missing scenarios and leverage a complete-incomplete alignment strategy to explore modality-agnostic features. Finally, we design a plug-and-play event-conditional modeling of censoring (ECMC) that selects reliable censored data using dynamic momentum accumulation confidences, assigns more accurate survival times to these censored data, and incorporates them as uncensored data into training. Comprehensive evaluations on 5 publicly available cancer datasets showcase the superiority of CenSurv over the best state-of-the-art method by 3.1% in terms of the mean C-index, while also exhibiting excellent robustness under various modality-missing scenarios. In addition, using the plug-and-play ECMC module, the mean C-index of 8 baselines increased by 1.3% across 5 datasets. Code of CenSurv is available at https://github.com/yuehailin/CenSurv.
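The results above are reported as a mean C-index. For readers unfamiliar with the metric, here is a minimal sketch of Harrell's concordance index in its common textbook form; it is not the paper's evaluation code, and the function name and tie handling (0.5 credit for tied risks) are assumptions.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times:  observed time for each subject (event or censoring time)
    events: 1 if the event was observed, 0 if the subject was censored
    risks:  predicted risk score; higher should mean earlier event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier event has the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

The sketch also shows why censored samples carry less signal under this metric: they only ever appear as the later member of a comparable pair.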
nan
Article 472
Title@2025-07-22 (2): Pre-Training LLMs on a budget: A comparison of three optimizers
Title: Pre-Training LLMs on a budget: A comparison of three optimizers | Pre-Training LLMs auf einem Budget: Ein Vergleich von drei Optimierern | 预算培训前LLMLM项目:三个优化器的比较 2507.08472v2 |
Authors (6): Joel Schlotthauer, Christian Kroos, Chris Hinze, Viktor Hangya, Luzian Hahn, Fabian Küch
Optimizers play a decisive role in reducing pre-training times for LLMs and achieving better-performing models. In this study, we compare three major variants: the de-facto standard AdamW, the simpler Lion, developed through an evolutionary search, and the second-order optimizer Sophia. For better generalization, we train with two different base architectures and use a single- and a multiple-epoch approach while keeping the number of tokens constant. Using the Maximal Update Parametrization and smaller proxy models, we tune relevant hyperparameters separately for each combination of base architecture and optimizer. We found that while the results from all three optimizers were in approximately the same range, Sophia exhibited the lowest training and validation loss, Lion was fastest in terms of training GPU hours, and AdamW led to the best downstream evaluation results.
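To make the "simpler Lion" remark concrete, the sketch below shows a single Lion step as the update rule is commonly described: the sign of an interpolation between momentum and gradient, plus decoupled weight decay. It operates on plain Python lists; the function names and default hyperparameters are illustrative assumptions and are not the settings tuned in the study.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion update over flat lists of floats.

    Only a single momentum buffer is kept (AdamW keeps two moment buffers),
    and every coordinate moves by exactly lr before weight decay.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Direction: sign of the beta1-interpolated momentum and gradient.
        update = sign(beta1 * m + (1 - beta1) * g)
        # Decoupled weight decay is added to the signed update.
        new_params.append(p - lr * (update + weight_decay * p))
        # The momentum buffer is refreshed with a different coefficient, beta2.
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_momentum
```

The single buffer and the absence of a division or square root per parameter are the usual explanations for Lion's lower memory and compute cost per step.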
nan
Article 473
Title@2025-07-22 (2): Tri-Learn Graph Fusion Network for Attributed Graph Clustering
Title: Tri-Learn Graph Fusion Network for Attributed Graph Clustering | Tri-Learn Graph Fusion Network für zugeschriebene Graph Clustering | Tri- Learn 属性图集集成的三光图融合网络 2507.13620v2 |
Authors (6): Binxiong Li, Xu Xiang, Xue Li, Binyu Zhao, Heyang Gao, Qinyu Zhao
In recent years, models based on Graph Convolutional Networks (GCN) have made significant strides in the field of graph data analysis. However, challenges such as over-smoothing and over-compression remain when handling large-scale and complex graph datasets, leading to a decline in clustering quality. Although the Graph Transformer architecture has mitigated some of these issues, its performance is still limited when processing heterogeneous graph data. To address these challenges, this study proposes a novel deep clustering framework comprising GCN, Autoencoder (AE), and Graph Transformer modules, termed the Tri-Learn Graph Fusion Network (Tri-GFN). This framework enhances the differentiation and consistency of global and local information through a unique tri-learning mechanism and feature fusion enhancement strategy. The three modules are fused by a triple-channel enhancement module, which maximizes the use of both node attributes and topological structures, ensuring robust clustering representation. The tri-learning mechanism allows mutual learning among these modules, while the feature fusion strategy enables the model to capture complex relationships, yielding highly discriminative representations for graph clustering. Tri-GFN surpasses many state-of-the-art methods, achieving an accuracy improvement of approximately 0.87% on the ACM dataset, 14.14% on the Reuters dataset, and 7.58% on the USPS dataset. Due to its outstanding performance on the Reuters dataset, Tri-GFN can be applied to automatic news classification, topic retrieval, and related fields.
nan
Article 474
Title@2025-07-22 (2): Streamlining Prediction in Bayesian Deep Learning
Title: Streamlining Prediction in Bayesian Deep Learning | Straffung der Vorhersagen in Bayesian Deep Learning | 精简贝耶斯深层学习的预测 2411.18425v4 |
Authors (4): Rui Li, Marcus Klasson, Arno Solin, Martin Trapp
The rising interest in Bayesian deep learning (BDL) has led to a plethora of methods for estimating the posterior distribution. However, efficient computation of inferences, such as predictions, has been largely overlooked, with Monte Carlo integration remaining the standard. In this work we examine streamlining prediction in BDL through a single forward pass without sampling. For this we use local linearisation on activation functions and local Gaussian approximations at linear layers, which allows us to analytically compute an approximation to the posterior predictive distribution. We showcase our approach for both MLPs and transformers, such as ViT and GPT-2, and assess its performance on regression and classification tasks. Open-source library: https://github.com/AaltoML/SUQ
nan
Article 475
Title@2025-07-22 (2): Multimodal Coordinated Online Behavior: Trade-offs and Strategies
Title: Multimodal Coordinated Online Behavior: Trade-offs and Strategies | Multimodal koordiniertes Online-Verhalten: Kompromisse und Strategien | 多式联运协调在线行为:取舍和战略 2507.12108v2 |
Authors (5): Lorenzo Mannocci, Stefano Cresci, Matteo Magnani, Anna Monreale, Maurizio Tesconi
Coordinated online behavior, which spans from beneficial collective actions to harmful manipulation such as disinformation campaigns, has become a key focus in digital ecosystem analysis. Traditional methods often rely on monomodal approaches, focusing on single types of interactions like co-retweets or co-hashtags, or consider multiple modalities independently of each other. However, these approaches may overlook the complex dynamics inherent in multimodal coordination. This study compares different ways of operationalizing the detection of multimodal coordinated behavior. It examines the trade-off between weakly and strongly integrated multimodal models, highlighting the balance between capturing broader coordination patterns and identifying tightly coordinated behavior. By comparing monomodal and multimodal approaches, we assess the unique contributions of different data modalities and explore how varying implementations of multimodality impact detection outcomes. Our findings reveal that not all the modalities provide distinct insights, but that with a multimodal approach we can get a more comprehensive understanding of coordination dynamics. This work enhances the ability to detect and analyze coordinated online behavior, offering new perspectives for safeguarding the integrity of digital platforms.
nan
Article 476
Title@2025-07-22 (2): InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers
Title: InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers | InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain für LLM mit optischen Schaltungsschalter Transceivern | 无限HBD:利用光电转换收发器为LLM 建立数据中心 – – 高度宽宽度高域域 2502.03885v4 |
Authors (14): Chenchen Shou, Guyue Liu, Hao Nie, Huaiyu Meng, Yu Zhou, Yimin Jiang, Wenqing Lv, Yelong Xu, Yuanwei Lu, Zhang Chen, Yanbo Yu, Yichen Shen, Yibo Zhu, Daxin Jiang
Scaling Large Language Model (LLM) training relies on multi-dimensional parallelism, where High-Bandwidth Domains (HBDs) are critical for communication-intensive parallelism like Tensor Parallelism (TP) and Expert Parallelism (EP). However, existing HBD architectures face fundamental limitations in scalability, cost, and fault resiliency: switch-centric HBDs (e.g., NVL-72) incur prohibitive scaling costs, while GPU-centric HBDs (e.g., TPUv3/Dojo) suffer from severe fault propagation. Switch-GPU hybrid HBDs such as TPUv4 take a middle-ground approach, but the fault explosion radius remains large at the cube level (e.g., 64 TPUs). We propose InfiniteHBD, a novel transceiver-centric HBD architecture that unifies connectivity and dynamic switching at the transceiver level using Optical Circuit Switching (OCS). By embedding OCS within each transceiver, InfiniteHBD achieves reconfigurable point-to-multipoint connectivity, allowing the topology to adapt to variable-size rings. This design provides: i) datacenter-wide scalability without cost explosion; ii) fault resilience by isolating failures to a single node; and iii) full bandwidth utilization for fault-free GPUs. Key innovations include a Silicon Photonic (SiPh)-based low-cost OCS transceiver (OCSTrx), a reconfigurable k-hop ring topology co-designed with intra-/inter-node communication, and an HBD-DCN orchestration algorithm maximizing GPU utilization while minimizing cross-ToR datacenter network traffic. The evaluation demonstrates that InfiniteHBD achieves 31% of the cost of NVL-72, near-zero GPU waste ratio (over one order of magnitude lower than NVL-72 and TPUv4), near-zero cross-ToR traffic when node fault ratios are under 7%, and improves Model FLOPs Utilization by 3.37x compared to NVIDIA DGX (8 GPUs per Node).
nan
Article 477
Title@2025-07-22 (2): Leveraging Personalized PageRank and Higher-Order Topological Structures for Heterophily Mitigation in Graph Neural Networks
Title: Leveraging Personalized PageRank and Higher-Order Topological Structures for Heterophily Mitigation in Graph Neural Networks | Leveraging Personalisiertes PageRank und höher geordnete Topologische Strukturen zur heterophilen Milderung in Graph Neural Networks | 在图形神经网络中利用个性化平板和高端地形结构进行热缓解 2507.16347v1 |
Authors (5): Yumeng Wang, Zengyi Wo, Wenjun Wang, Xingcheng Fu, Minglai Shao
Graph Neural Networks (GNNs) excel in node classification tasks but often assume homophily, where connected nodes share similar labels. This assumption does not hold in many real-world heterophilic graphs. Existing models for heterophilic graphs primarily rely on pairwise relationships, overlooking multi-scale information from higher-order structures. This leads to suboptimal performance, particularly under noise from conflicting class information across nodes. To address these challenges, we propose HPGNN, a novel model integrating Higher-order Personalized PageRank with Graph Neural Networks. HPGNN introduces an efficient high-order approximation of Personalized PageRank (PPR) to capture long-range and multi-scale node interactions. This approach reduces computational complexity and mitigates noise from surrounding information. By embedding higher-order structural information into convolutional networks, HPGNN effectively models key interactions across diverse graph dimensions. Extensive experiments on benchmark datasets demonstrate HPGNN’s effectiveness. The model achieves better performance than five out of seven state-of-the-art methods on heterophilic graphs in downstream tasks while maintaining competitive performance on homophilic graphs. HPGNN’s ability to balance multi-scale information and robustness to noise makes it a versatile solution for real-world graph learning challenges. Codes are available at https://github.com/streetcorner/HPGNN.
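As background for the PPR component, the sketch below computes ordinary first-order Personalized PageRank by power iteration on an adjacency-list graph. It is not HPGNN's higher-order approximation, which the abstract does not specify; the function name, the restart probability alpha, and the dangling-node rule are assumptions for illustration.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Power iteration for PPR with restart probability alpha.

    adj:  dict mapping every node to a list of its out-neighbours
          (each neighbour must also be a key of adj)
    seed: the node that receives all restart mass
    """
    nodes = list(adj)
    rank = {u: 1.0 if u == seed else 0.0 for u in nodes}
    for _ in range(iters):
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            nbrs = adj[u]
            if nbrs:
                # Spread the non-restart mass evenly over out-neighbours.
                share = (1 - alpha) * rank[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
            else:
                # Dangling node: send its mass back to the seed (an assumption).
                nxt[seed] += (1 - alpha) * rank[u]
        # Restart: total mass is 1, so exactly alpha returns to the seed.
        nxt[seed] += alpha
        rank = nxt
    return rank
```

Scores decay with distance from the seed but remain non-zero for nodes several hops away, which is the long-range signal that PPR-based propagation exploits.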
nan
Article 478
Title@2025-07-22 (2): The Cost of Compression: Tight Quadratic Black-Box Attacks on Sketches for $\ell_2$ Norm Estimation
Title: The Cost of Compression: Tight Quadratic Black-Box Attacks on Sketches for $\ell_2$ Norm Estimation | Die Kosten der Kompression: Enge quadratische Black-Box-Angriffe auf Skizzen für $\ell_2$ Normschätzung | 压缩成本: 以 $@ ell_ 2 $ Norm 估计对制片人进行密切的 Quadristic Black-Box 攻击 2507.16345v1 |
Authors (3): Sara Ahmadian, Edith Cohen, Uri Stemmer
Dimensionality reduction via linear sketching is a powerful and widely used technique, but it is known to be vulnerable to adversarial inputs. We study the black-box adversarial setting, where a fixed, hidden sketching matrix $A \in \mathbb{R}^{k \times n}$ maps high-dimensional vectors $v \in \mathbb{R}^n$ to lower-dimensional sketches $Av \in \mathbb{R}^k$, and an adversary can query the system to obtain approximate $\ell_2$-norm estimates that are computed from the sketch. We present a universal, nonadaptive attack that, using $\tilde{O}(k^2)$ queries, either causes a failure in norm estimation or constructs an adversarial input on which the optimal estimator for the query distribution (used by the attack) fails. The attack is completely agnostic to the sketching matrix and to the estimator: it applies to any linear sketch and any query responder, including those that are randomized, adaptive, or tailored to the query distribution. Our lower bound construction tightly matches the known upper bounds of $\tilde{\Omega}(k^2)$, achieved by specialized estimators for Johnson-Lindenstrauss transforms and AMS sketches. Beyond sketching, our results uncover structural parallels to adversarial attacks in image classification, highlighting fundamental vulnerabilities of compressed representations.
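The setting can be made concrete with a minimal sketch of the system under attack: a hidden Gaussian sketching matrix and a norm estimator that sees only the sketch. This shows the query interface only, not the attack from the paper; the Gaussian choice of matrix, the estimator, and all names and sizes below are assumptions for illustration.

```python
import math
import random


def gaussian_sketch_matrix(k, n, rng):
    """Hidden k x n matrix with i.i.d. N(0, 1) entries (one assumed sketch family)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]


def apply_sketch(A, v):
    """Compute the sketch A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def estimate_l2(sketch):
    """Estimate ||v||_2 from the sketch alone: sqrt(||A v||^2 / k)."""
    k = len(sketch)
    return math.sqrt(sum(s * s for s in sketch) / k)


rng = random.Random(0)
n, k = 1000, 200                      # illustrative sizes, k << n
A = gaussian_sketch_matrix(k, n, rng)
v = [rng.uniform(-1.0, 1.0) for _ in range(n)]

true_norm = math.sqrt(sum(x * x for x in v))
estimate = estimate_l2(apply_sketch(A, v))
```

For a query chosen independently of the matrix, the estimate concentrates around the true norm; the black-box adversary in the abstract only ever observes values like `estimate` and uses them to find inputs where that concentration breaks.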
Article 479
Title@2025-07-22 (2): Constructing material network representations for intelligent amorphous alloys design
Title: Constructing material network representations for intelligent amorphous alloys design | Konstruktion von Materialnetzwerkdarstellungen für intelligente amorphe Legierungen | 为智能无定形合金设计建立材料网络示意图 2507.16336v1 |
Authors (7): S. -Y. Zhang, J. Tian, S. -L. Liu, H. -M. Zhang, H. -Y. Bai, Y. -C. Hu, W. -H. Wang
High-performance amorphous alloys are in demand for various applications, but designing them relies heavily on empirical laws and extensive trial and error. The high cost and low efficiency of these traditional strategies prevent effective sampling of the enormous material space. Here, we propose material networks to accelerate the discovery of binary and ternary amorphous alloys. The network topologies reveal hidden material candidates that were obscured by traditional tabular data representations. By scrutinizing the amorphous alloys synthesized in different years, we construct dynamical material networks to track the history of alloy discovery. We find that some innovative materials designed in the past were encoded in the networks, demonstrating their predictive power in guiding new alloy design. These material networks show physical similarities with several real-world networks in our daily lives. Our findings pave a new way for intelligent materials design, especially for complex alloys.
Article 480
Title@2025-07-22 (2): Higher Gauge Flow Models
Title: Higher Gauge Flow Models | Modelle mit höherem Messfluss | 高压流动模型 2507.16334v1 |
Authors (2): Alexander Strunk, Roland Assam
This paper introduces Higher Gauge Flow Models, a novel class of Generative Flow Models. Building upon ordinary Gauge Flow Models (arXiv:2507.13414), these Higher Gauge Flow Models leverage an L$_{\infty}$-algebra, effectively extending the Lie Algebra. This expansion allows for the integration of the higher geometry and higher symmetries associated with higher groups into the framework of Generative Flow Models. Experimental evaluation on a Gaussian Mixture Model dataset revealed substantial performance improvements compared to traditional Flow Models.
Article 481
Title@2025-07-22 (2): Physics-Driven Neural Network for Solving Electromagnetic Inverse Scattering Problems
Title: Physics-Driven Neural Network for Solving Electromagnetic Inverse Scattering Problems | Physik-getriebenes Neuronales Netzwerk zur Lösung elektromagnetischer Inverse Streuprobleme | 解决电磁反向散射问题的物理动力神经网络 2507.16321v1 |
Authors (7): Yutong Du, Zicheng Liu, Bazargul Matkerim, Changyou Li, Yali Zong, Bo Qi, Jingwei Kou
In recent years, deep learning-based methods have been proposed for solving inverse scattering problems (ISPs), but most of them rely heavily on data and suffer from limited generalization capabilities. In this paper, a new solving scheme is proposed in which the solution is iteratively updated as the physics-driven neural network (PDNN) is updated. The hyperparameters of the PDNN are optimized by minimizing a loss function that incorporates constraints from the collected scattered fields and prior information about the scatterers. Unlike data-driven neural network solvers, PDNN training requires only the collected scattered fields as input and the computation of the scattered fields corresponding to predicted solutions, thus avoiding the generalization problem. Moreover, to improve imaging efficiency, the subregion enclosing the scatterers is identified. Numerical and experimental results demonstrate that the proposed scheme has high reconstruction accuracy and strong stability, even when dealing with composite lossy scatterers.
Article 482
Title@2025-07-22 (2): Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance
Title: Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance | Genaues und effizientes Feintuning von Quantisierten großen Sprachmodellen durch optimale Balance | 通过最佳平衡对量化大语言模型进行准确、高效的微调 2407.17029v2 |
Authors (5): Ao Shen, Qiang Wang, Zhiquan Lai, Xionglve Li, Dongsheng Li
Large Language Models (LLMs) have demonstrated impressive performance across various domains. However, the enormous number of model parameters makes fine-tuning challenging, significantly limiting their application and deployment. Existing solutions combine parameter quantization with Low-Rank Adaptation (LoRA), reducing memory usage but causing performance degradation. Additionally, converting fine-tuned models to low-precision representations further degrades performance. In this paper, we identify an imbalance in fine-tuning quantized LLMs with LoRA: overly complex adapter inputs and outputs versus low effective trainability of the adapter, leading to underfitting during fine-tuning. Thus, we propose Quantized LLMs fine-tuning with Balanced Low-Rank Adaptation (Q-BLoRA), which simplifies the adapter’s inputs and outputs while increasing the adapter’s rank to alleviate underfitting during fine-tuning. For low-precision deployment, we propose Quantization-Aware fine-tuning with Balanced Low-Rank Adaptation (QA-BLoRA), which aligns with the block-wise quantization and facilitates quantization-aware fine-tuning of low-rank adaptation based on the parameter merging of Q-BLoRA. Both Q-BLoRA and QA-BLoRA are easily implemented and offer the following optimizations: (i) Q-BLoRA consistently achieves state-of-the-art accuracy compared to baselines and other variants; (ii) QA-BLoRA enables the direct generation of low-precision inference models, which exhibit significant performance improvements over other low-precision models. We validate the effectiveness of Q-BLoRA and QA-BLoRA across various models and scenarios. Code will be made available at https://github.com/xiaocaigou/qbaraqahira
Article 483
Title@2025-07-22 (2): Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras
Title: Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras | Unüberwachtes gemeinsames Lernen von optischem Fluss und Intensität mit Ereigniskameras | 利用活动摄像机联合学习光流和强度 2503.17262v2 |
Authors (3): Shuang Guo, Friedhelm Hamann, Guillermo Gallego
Event cameras rely on motion to obtain information about scene appearance. This means that appearance and motion are inherently linked: either both are present and recorded in the event data, or neither is captured. Previous works treat the recovery of these two visual quantities as separate tasks, which does not fit with the above-mentioned nature of event cameras and overlooks the inherent relations between them. We propose an unsupervised learning framework that jointly estimates optical flow (motion) and image intensity (appearance) using a single network. From the data generation model, we newly derive the event-based photometric error as a function of optical flow and image intensity. This error is further combined with the contrast maximization framework to form a comprehensive loss function that provides proper constraints for both flow and intensity estimation. Exhaustive experiments show our method’s state-of-the-art performance: in optical flow estimation, it reduces EPE by 20% and AE by 25% compared to unsupervised approaches, while delivering competitive intensity estimation results, particularly in high dynamic range scenarios. Our method also achieves shorter inference time than all other optical flow methods and many of the image reconstruction methods, while they output only one quantity. Project page: https://github.com/tub-rip/E2FAI
Article 484
Title@2025-07-22 (2): Perovskite-R1: A Domain-Specialized LLM for Intelligent Discovery of Precursor Additives and Experimental Design
Title: Perovskite-R1: A Domain-Specialized LLM for Intelligent Discovery of Precursor Additives and Experimental Design | Perovskite-R1: Eine Domain-Spezialisierte LLM für die intelligente Entdeckung von Precursor-Additiven und experimentellen Design | Perovskite-R1: 用于智能发现前体添加剂和实验设计的一个域专用LLM 2507.16307v1 |
Authors (6): Xin-De Wang, Zhi-Rui Chen, Peng-Jie Guo, Ze-Feng Gao, Cheng Mu, Zhong-Yi Lu
Perovskite solar cells (PSCs) have rapidly emerged as a leading contender in next-generation photovoltaic technologies, owing to their exceptional power conversion efficiencies and advantageous material properties. Despite these advances, challenges such as long-term stability, environmental sustainability, and scalable manufacturing continue to hinder their commercialization. Precursor additive engineering has shown promise in addressing these issues by enhancing both the performance and durability of PSCs. However, the explosive growth of scientific literature and the complex interplay of materials, processes, and device architectures make it increasingly difficult for researchers to efficiently access, organize, and utilize domain knowledge in this rapidly evolving field. To address this gap, we introduce Perovskite-R1, a specialized large language model (LLM) with advanced reasoning capabilities tailored for the discovery and design of PSC precursor additives. By systematically mining and curating 1,232 high-quality scientific publications and integrating a comprehensive library of 33,269 candidate materials, we constructed a domain-specific instruction-tuning dataset using automated question-answer generation and chain-of-thought reasoning. Fine-tuning the QwQ-32B model on this dataset resulted in Perovskite-R1, which can intelligently synthesize literature insights and generate innovative and practical solutions for defect passivation and the selection of precursor additives. Experimental validation of several model-proposed strategies confirms their effectiveness in improving material stability and performance. Our work demonstrates the potential of domain-adapted LLMs in accelerating materials discovery and provides a closed-loop framework for intelligent, data-driven advancements in perovskite photovoltaic research.
Article 485
Title@2025-07-22 (2): Attention-Based Fusion of IQ and FFT Spectrograms with AoA Features for GNSS Jammer Localization
Title: Attention-Based Fusion of IQ and FFT Spectrograms with AoA Features for GNSS Jammer Localization | Aufmerksamkeitsbasierte Fusion von IQ- und FFT-Spektrogrammen mit AoA-Features für die GNSS-Jammerlokalisierung | 以注意力为基础的IQ和FFFT Spectrogragragrams与AoA地貌特征的聚合,用于全球导航卫星系统Jammer本地化 2507.14167v2 |
Authors (6): Lucas Heublein, Christian Wielenberg, Thorsten Nowak, Tobias Feigl, Christopher Mutschler, Felix Ott
Jamming devices disrupt signals from the global navigation satellite system (GNSS) and pose a significant threat by compromising the reliability of accurate positioning. Consequently, the detection and localization of these interference signals are essential to achieve situational awareness, mitigating their impact, and implementing effective counter-measures. Classical Angle of Arrival (AoA) methods exhibit reduced accuracy in multipath environments due to signal reflections and scattering, leading to localization errors. Additionally, AoA-based techniques demand substantial computational resources for array signal processing. In this paper, we propose a novel approach for detecting and classifying interference while estimating the distance, azimuth, and elevation of jamming sources. Our benchmark study evaluates 128 vision encoder and time-series models to identify the highest-performing methods for each task. We introduce an attention-based fusion framework that integrates in-phase and quadrature (IQ) samples with Fast Fourier Transform (FFT)-computed spectrograms while incorporating 22 AoA features to enhance localization accuracy. Furthermore, we present a novel dataset of moving jamming devices recorded in an indoor environment with dynamic multipath conditions and demonstrate superior performance compared to state-of-the-art methods.
Article 486
Title@2025-07-22 (2): CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
Title: CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning | CUDA-L1: Verbesserung der CUDA-Optimierung durch kontrastives Verstärkungslernen | CUDA-L1:通过反竞争强化学习改进CUDA优化 2507.14111v3 |
Authors (5): Xiaoya Li, Xiaofei Sun, Albert Wang, Jiwei Li, Chris Shum
The exponential growth in demand for GPU computing resources, driven by the rapid advancement of Large Language Models, has created an urgent need for automated CUDA optimization strategies. While recent advances in LLMs show promise for code generation, current SOTA models (e.g. R1, o1) achieve low success rates in improving CUDA speed. In this paper, we introduce CUDA-L1, an automated reinforcement learning framework for CUDA optimization. CUDA-L1 achieves performance improvements on the CUDA optimization task: trained on NVIDIA A100, it delivers an average speedup of 17.7x across all 250 CUDA kernels of KernelBench, with peak speedups reaching 449x. Furthermore, the model demonstrates excellent portability across GPU architectures, achieving average speedups of 17.8x on H100, 19.0x on RTX 3090, 16.5x on L40, 14.7x on H800, and 13.9x on H20 despite being optimized specifically for A100. Beyond these benchmark results, CUDA-L1 demonstrates several remarkable properties: 1) it discovers a variety of CUDA optimization techniques and learns to combine them strategically to achieve optimal performance; 2) it uncovers fundamental principles of CUDA optimization; 3) it identifies non-obvious performance bottlenecks and rejects seemingly beneficial optimizations that harm performance. The capabilities of CUDA-L1 demonstrate that reinforcement learning can transform an initially poorly performing LLM into an effective CUDA optimizer through speedup-based reward signals alone, without human expertise or domain knowledge. More importantly, the trained RL model extends the acquired reasoning abilities to new kernels. This paradigm opens possibilities for automated optimization of CUDA operations, and holds promise to substantially promote GPU efficiency and alleviate the rising pressure on GPU computing resources.
Article 487
Title@2025-07-22 (2): Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning
Title: Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning | Auf dem Weg zu einem resilienten sicherheitsorientierten Unlearning für Diffusionsmodelle gegen Downstream-Fine-Tuning | 面向适应性安全驱动的弹性安全驱动不学习如何利用下游微调传播模型 2507.16302v1 |
Authors (8): Boheng Li, Renjie Gu, Junjie Wang, Leyi Qi, Yiming Li, Run Wang, Zhan Qin, Tianwei Zhang
Text-to-image (T2I) diffusion models have achieved impressive image generation quality and are increasingly fine-tuned for personalized applications. However, these models often inherit unsafe behaviors from toxic pretraining data, raising growing safety concerns. While recent safety-driven unlearning methods have made promising progress in suppressing model toxicity, they are identified to be fragile to downstream fine-tuning, where we reveal that state-of-the-art methods largely fail to retain their effectiveness even when fine-tuned on entirely benign datasets. To mitigate this problem, in this paper, we propose ResAlign, a safety-driven unlearning framework with enhanced resilience against downstream fine-tuning. By modeling downstream fine-tuning as an implicit optimization problem with a Moreau Envelope-based reformulation, ResAlign enables efficient gradient estimation to minimize the recovery of harmful behaviors. Additionally, a meta-learning strategy is proposed to simulate a diverse distribution of fine-tuning scenarios to improve generalization. Extensive experiments across a wide range of datasets, fine-tuning methods, and configurations demonstrate that ResAlign consistently outperforms prior unlearning approaches in retaining safety after downstream fine-tuning while preserving benign generation capability well.
Article 488
Title@2025-07-22 (2): Navigation through Non-Compact Symmetric Spaces: a mathematical perspective on Cartan Neural Networks
Title: Navigation through Non-Compact Symmetric Spaces: a mathematical perspective on Cartan Neural Networks | Navigation durch nicht-kompakte Symmetrische Räume: eine mathematische Perspektive auf kartanische Neuralnetze | 通过非协议对称空间导航:关于Cartan神经网络的数学视角 2507.16871v1 |
Authors (4): Pietro Giuseppe Fré, Federico Milanesio, Guido Sanguinetti, Matteo Santoro
Recent work has identified non-compact symmetric spaces U/H as a promising class of homogeneous manifolds to develop a geometrically consistent theory of neural networks. An initial implementation of these concepts has been presented in a twin paper under the moniker of Cartan Neural Networks, showing both the feasibility and the performance of these geometric concepts in a machine learning context. The current paper expands on the mathematical structures underpinning Cartan Neural Networks, detailing the geometric properties of the layers and how the maps between layers interact with such structures to make Cartan Neural Networks covariant and geometrically interpretable. Together, these twin papers constitute a first step towards a fully geometrically interpretable theory of neural networks exploiting group-theoretic structures.
Article 489
Title@2025-07-22 (2): Progressive-Resolution Policy Distillation: Leveraging Coarse-Resolution Simulations for Time-Efficient Fine-Resolution Policy Learning
Title: Progressive-Resolution Policy Distillation: Leveraging Coarse-Resolution Simulations for Time-Efficient Fine-Resolution Policy Learning | Progressive-Resolution Policy Destillation: Leveraging Coarse-Resolution Simulationen für zeiteffizientes Fine-Resolution Policy Learning | 渐进式决议政策蒸馏:利用利用粗制粗制模拟器进行时间效率高的精细决议政策学习 2412.07477v3 |
Authors (3): Yuki Kadokawa, Hirotaka Tahara, Takamitsu Matsubara
In earthwork and construction, excavators often encounter large rocks mixed with various soil conditions, requiring skilled operators. This paper presents a framework for achieving autonomous excavation using reinforcement learning (RL) through a rock excavation simulator. In the simulation, resolution can be defined by the particle size/number in the whole soil space. Fine-resolution simulations closely mimic real-world behavior but demand significant calculation time and challenging sample collection, while coarse-resolution simulations enable faster sample collection but deviate from real-world behavior. To combine the advantages of both resolutions, we explore using policies developed in coarse-resolution simulations for pre-training in fine-resolution simulations. To this end, we propose a novel policy learning framework called Progressive-Resolution Policy Distillation (PRPD), which progressively transfers policies through some middle-resolution simulations with conservative policy transfer to avoid domain gaps that could lead to policy transfer failure. Validation in a rock excavation simulator and nine real-world rock environments demonstrated that PRPD reduced sampling time to less than 1/7 while maintaining task success rates comparable to those achieved through policy learning in a fine-resolution simulation.
Article 490
Title@2025-07-22 (2): Note on Follow-the-Perturbed-Leader in Combinatorial Semi-Bandit Problems
Title: Note on Follow-the-Perturbed-Leader in Combinatorial Semi-Bandit Problems | Hinweis zum Follow-the-Perturbed-Leader bei kombinatorischen Semi-Bandit-Problemen | 关于合并半银行问题后续行动说明 2506.12490v2 |
Authors (2): Botao Chen, Junya Honda
This paper studies the optimality and complexity of the Follow-the-Perturbed-Leader (FTPL) policy in size-invariant combinatorial semi-bandit problems. Recently, Honda et al. (2023) and Lee et al. (2024) showed that FTPL achieves Best-of-Both-Worlds (BOBW) optimality in standard multi-armed bandit problems with Fréchet-type distributions. However, the optimality of FTPL in combinatorial semi-bandit problems remains unclear. In this paper, we consider the regret bound of FTPL with geometric resampling (GR) in the size-invariant semi-bandit setting, showing that FTPL achieves $O\left(\sqrt{m^2 d^\frac{1}{\alpha}T}+\sqrt{mdT}\right)$ regret with Fréchet distributions and the best possible regret bound of $O\left(\sqrt{mdT}\right)$ with Pareto distributions in the adversarial setting. Furthermore, we extend conditional geometric resampling (CGR) to the size-invariant semi-bandit setting, which reduces the computational complexity from $O(d^2)$ for the original GR to $O\left(md\left(\log(d/m)+1\right)\right)$ without sacrificing the regret performance of FTPL.
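As a rough illustration of the two ingredients named above, the sketch below draws Fréchet perturbations, picks the m arms with the smallest perturbed estimated loss, and uses geometric resampling to estimate the inverse probability of an arm being selected (the number of fresh draws until the arm reappears). It is a toy version under assumed names and parameters (`eta`, `alpha`, `cap`), not the paper's algorithm or its conditional variant.

```python
import math
import random


def frechet(alpha, rng):
    """Sample from a Frechet distribution with shape alpha via inverse CDF."""
    u = rng.random()
    while u == 0.0:          # avoid log(0)
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / alpha)


def ftpl_action(est_losses, m, eta, alpha, rng):
    """Choose the m arms minimising estimated loss minus scaled perturbation."""
    scores = [loss - frechet(alpha, rng) / eta for loss in est_losses]
    order = sorted(range(len(scores)), key=scores.__getitem__)
    return set(order[:m])


def geometric_resampling(est_losses, arm, m, eta, alpha, rng, cap=1000):
    """Count fresh FTPL draws until `arm` is selected (capped at `cap`).

    The count is an estimate of 1 / P(arm is selected).
    """
    for k in range(1, cap + 1):
        if arm in ftpl_action(est_losses, m, eta, alpha, rng):
            return k
    return cap


rng = random.Random(1)
d, m = 4, 2
losses = [0.0] * d           # symmetric arms: each is selected with probability m/d
counts = [geometric_resampling(losses, 0, m, eta=1.0, alpha=2.0, rng=rng)
          for _ in range(2000)]
mean_count = sum(counts) / len(counts)
```

With symmetric arms the selection probability is m/d = 1/2, so the average count should sit near 2. Each resampling step redraws all d perturbations, which is the cost that a conditional variant aims to reduce.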
Article 491
Title@2025-07-22 (2): Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers
Title: Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers | Unisolver: PDE-Conditional Transformer sind universelle PDE-Lösemittel | 离子: PDE- 条件变换器为通用 PDE 解答器 2405.17527v4 |
Authors (5): Hang Zhou, Yuezhou Ma, Haixu Wu, Haowen Wang, Mingsheng Long
Deep models have recently emerged as promising tools to solve partial differential equations (PDEs), known as neural PDE solvers. While neural solvers trained from either simulation data or physics-informed loss can solve PDEs reasonably well, they are mainly restricted to a few instances of PDEs, e.g. a certain equation with a limited set of coefficients. This limits their generalization to diverse PDEs, preventing them from being practical surrogate models of numerical solvers. In this paper, we present Unisolver, a novel Transformer model trained on diverse data and conditioned on diverse PDEs, aiming towards a universal neural PDE solver capable of solving a wide scope of PDEs. Instead of purely scaling up data and parameters, Unisolver stems from the theoretical analysis of the PDE-solving process. Inspired by the mathematical structure of PDEs that a PDE solution is fundamentally governed by a series of PDE components such as equation symbols and boundary conditions, we define a complete set of PDE components and flexibly embed them as domain-wise and point-wise deep conditions for Transformer PDE solvers. Integrating physical insights with recent Transformer advances, Unisolver achieves consistent state-of-the-art on three challenging large-scale benchmarks, showing impressive performance and generalizability. Code is available at https://github.com/thuml/Unisolver.
Article 492
Title@2025-07-22 (2): Ownership Verification of DNN Models Using White-Box Adversarial Attacks with Specified Probability Manipulation
Title: Ownership Verification of DNN Models Using White-Box Adversarial Attacks with Specified Probability Manipulation | Eigentumsverifizierung von DNN-Modellen mit White-Box-Adversarial-Angriffen mit spezifizierter Wahrscheinlichkeitsmanipulation | DNN 使用白毒对反对反对性袭击模式进行指定概率操纵的DNN自有性核查 2505.17579v2 |
Authors (5): Teruki Sano, Minoru Kuribayashi, Masao Sakai, Shuji Ishobe, Eisuke Koizumi
In this paper, we propose a novel framework for ownership verification of deep neural network (DNN) models for image classification tasks. It allows both the rightful owner and a third party to verify model identity without presenting the original model. We assume a gray-box scenario in which an unauthorized user owns a model illegally copied from the original and provides services in a cloud environment, where a user submits images and receives the classification results as a probability distribution over output classes. The framework applies a white-box adversarial attack to align the output probability of a specific class to a designated value. Knowledge of the original model enables the owner to generate such adversarial examples. We propose a simple but effective adversarial attack method based on the iterative Fast Gradient Sign Method (FGSM) by introducing control parameters. Experimental results confirm the effectiveness of identifying DNN models using adversarial attacks.
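The idea of steering one class probability to a designated value with signed-gradient steps can be sketched on a toy model. Everything here is an assumption for illustration: the "model" is a small linear softmax classifier with made-up weights, and the step size `eps`, tolerance `tol`, and stopping rule stand in for the paper's control parameters, which the abstract does not specify.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def class_prob_and_grad(W, x, c):
    """Return p_c(x) and its gradient w.r.t. x for logits z = W x."""
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    p = softmax(z)
    # d p_c / d x = p_c * (W_c - sum_j p_j W_j)
    mean_row = [sum(p[j] * W[j][i] for j in range(len(W))) for i in range(len(x))]
    grad = [p[c] * (W[c][i] - mean_row[i]) for i in range(len(x))]
    return p[c], grad


def sign(v):
    return (v > 0) - (v < 0)


def steer_probability(W, x, c, target, eps=1e-3, tol=1e-2, max_steps=5000):
    """Iterative signed-gradient steps until p_c is within tol of target."""
    x = list(x)
    for _ in range(max_steps):
        p_c, grad = class_prob_and_grad(W, x, c)
        if abs(p_c - target) < tol:
            break
        direction = 1 if p_c < target else -1   # raise or lower p_c
        x = [xi + direction * eps * sign(g) for xi, g in zip(x, grad)]
    return x, class_prob_and_grad(W, x, c)[0]


# Hypothetical 3-class, 4-feature model and input.
W = [[1.0, -0.5, 0.3, 0.0],
     [-0.2, 0.8, -0.4, 0.1],
     [0.0, 0.1, 0.9, -0.7]]
x0 = [0.5, 0.5, 0.5, 0.5]
x_adv, p_final = steer_probability(W, x0, c=0, target=0.8)
```

The white-box requirement is visible in `class_prob_and_grad`, which needs the weights `W`; a party holding only query access to a copy could check the returned probability but could not easily construct `x_adv`.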
Article 493
Title@2025-07-22 (2): Time to Split: Exploring Data Splitting Strategies for Offline Evaluation of Sequential Recommenders
Title: Time to Split: Exploring Data Splitting Strategies for Offline Evaluation of Sequential Recommenders | Time to Split: Erforschung von Datenspaltungsstrategien für Offline-Evaluierung von Sequential Recommenders | 拆分时间:探索对序列建议者进行离线评价的数据分割战略 2507.16289v1 |
Authors (5): Danil Gusak, Anna Volodkevich, Anton Klenitskiy, Alexey Vasilev, Evgeny Frolov
Modern sequential recommender systems, ranging from lightweight transformer-based variants to large language models, have become increasingly prominent in academia and industry due to their strong performance in the next-item prediction task. Yet common evaluation protocols for sequential recommendations remain insufficiently developed: they often fail to reflect the corresponding recommendation task accurately, or are not aligned with real-world scenarios. Although the widely used leave-one-out split matches next-item prediction, it permits the overlap between training and test periods, which leads to temporal leakage and unrealistically long test horizon, limiting real-world relevance. Global temporal splitting addresses these issues by evaluating on distinct future periods. However, its applications to sequential recommendations remain loosely defined, particularly in terms of selecting target interactions and constructing a validation subset that provides necessary consistency between validation and test metrics. In this paper, we demonstrate that evaluation outcomes can vary significantly across splitting strategies, influencing model rankings and practical deployment decisions. To improve reproducibility in both academic and industrial settings, we systematically compare different splitting strategies for sequential recommendations across multiple datasets and established baselines. Our findings show that prevalent splits, such as leave-one-out, may be insufficiently aligned with more realistic evaluation strategies. Code: https://github.com/monkey0head/time-to-split
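The difference between the two splitting strategies discussed above can be seen on a five-row interaction log. The tuple layout `(user, item, timestamp)`, the function names, and the cutoff value are assumptions for illustration and are not taken from the linked repository.

```python
def leave_one_out(interactions):
    """Hold out each user's last interaction; everything else is training data."""
    by_user = {}
    for row in sorted(interactions, key=lambda r: r[2]):
        by_user.setdefault(row[0], []).append(row)
    train, test = [], []
    for seq in by_user.values():
        train.extend(seq[:-1])
        test.append(seq[-1])
    return train, test


def global_temporal(interactions, cutoff):
    """Train strictly before a single global cutoff time, test at or after it."""
    train = [r for r in interactions if r[2] < cutoff]
    test = [r for r in interactions if r[2] >= cutoff]
    return train, test


# Hypothetical log: (user, item, timestamp)
log = [("a", "i1", 1), ("a", "i2", 8), ("a", "i3", 10),
       ("b", "i4", 3), ("b", "i5", 4)]

loo_train, loo_test = leave_one_out(log)
gts_train, gts_test = global_temporal(log, cutoff=5)

# Leakage check: does any training interaction happen after a test interaction?
loo_leaks = max(r[2] for r in loo_train) > min(r[2] for r in loo_test)
gts_leaks = max(r[2] for r in gts_train) > min(r[2] for r in gts_test)
```

Under leave-one-out, user a's interaction at time 8 is in the training set while user b's test interaction is at time 4, which is the temporal overlap the abstract describes; the global cutoff rules that out by construction.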
Article 494
Title@2025-07-22 (2): FedMultiEmo: Real-Time Emotion Recognition via Multimodal Federated Learning
Title: FedMultiEmo: Real-Time Emotion Recognition via Multimodal Federated Learning | FedMultiEmo: Echtzeit-Emotionserkennung durch multimodales Federated Learning | Fed MultiEmo:通过多模式联邦学习来实时承认情感 2507.15470v2 |
Authors (5): Baran Can Gül, Suraksha Nadig, Stefanos Tziampazis, Nasser Jazdi, Michael Weyrich
In-vehicle emotion recognition underpins adaptive driver-assistance systems and, ultimately, occupant safety. However, practical deployment is hindered by (i) modality fragility - poor lighting and occlusions degrade vision-based methods; (ii) physiological variability - heart-rate and skin-conductance patterns differ across individuals; and (iii) privacy risk - centralized training requires transmission of sensitive data. To address these challenges, we present FedMultiEmo, a privacy-preserving framework that fuses two complementary modalities at the decision level: visual features extracted by a Convolutional Neural Network from facial images, and physiological cues (heart rate, electrodermal activity, and skin temperature) classified by a Random Forest. FedMultiEmo builds on three key elements: (1) a multimodal federated learning pipeline with majority-vote fusion, (2) an end-to-end edge-to-cloud prototype on Raspberry Pi clients and a Flower server, and (3) a personalized Federated Averaging scheme that weights client updates by local data volume. Evaluated on FER2013 and a custom physiological dataset, the federated Convolutional Neural Network attains 77% accuracy, the Random Forest 74%, and their fusion 87%, matching a centralized baseline while keeping all raw data local. The developed system converges in 18 rounds, with an average round time of 120 seconds and a per-client memory footprint below 200 MB. These results indicate that FedMultiEmo offers a practical approach to real-time, privacy-aware emotion recognition in automotive settings.
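The aggregation step that "weights client updates by local data volume" corresponds, in its simplest form, to sample-size-weighted averaging of client parameters. The sketch below shows only that plain weighted average on flat parameter lists; it is not the FedMultiEmo implementation and does not use the Flower API, and the function name and data layout are assumptions.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts.

    client_weights : list of equal-length lists of floats (one per client)
    client_sizes   : number of local training samples per client
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


# Two hypothetical clients: the second holds three times as much data.
global_weights = fedavg([[1.0, 0.0], [3.0, 4.0]], [1, 3])
```

A client with more local data pulls the global model further toward its own update, which is the behaviour the abstract attributes to its averaging scheme.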
Article 495
Title@2025-07-22 (2): Diffusion-Based Electrocardiography Noise Quantification via Anomaly Detection
Title: Diffusion-Based Electrocardiography Noise Quantification via Anomaly Detection | Diffusionsbasierte Elektrokardiographie Geräuschquantifizierung durch Anomalieerkennung | 通过非异常检测进行传播基电动心电心动噪音测量 2506.11815v2 |
Authors (8): Tae-Seong Han, Jae-Wook Heo, Hakseung Kim, Cheol-Hui Lee, Hyub Huh, Eue-Keun Choi, Hye Jin Kim, Dong-Joo Kim
Electrocardiography (ECG) signals are frequently degraded by noise, limiting their clinical reliability in both conventional and wearable settings. Existing methods for addressing ECG noise, relying on artifact classification or denoising, are constrained by annotation inconsistencies and poor generalizability. Here, we address these limitations by reframing ECG noise quantification as an anomaly detection task. We propose a diffusion-based framework trained to model the normative distribution of clean ECG signals, identifying deviations as noise without requiring explicit artifact labels. To robustly evaluate performance and mitigate label inconsistencies, we introduce a distribution-based metric using the Wasserstein-1 distance ($W_1$). Our model achieved a macro-average $W_1$ score of 1.308, outperforming the next-best method by over 48%. External validation confirmed strong generalizability, facilitating the exclusion of noisy segments to improve diagnostic accuracy and support timely clinical intervention. This approach enhances real-time ECG monitoring and broadens ECG applicability in digital health technologies.
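For one-dimensional samples of equal size, the Wasserstein-1 distance reduces to the mean absolute difference between the sorted values. The sketch below shows that special case only; how the paper applies $W_1$ to anomaly scores (which distributions are compared, and how the macro-average is formed) is not specified in the abstract, so the function and the example scores are illustrative assumptions.

```python
def wasserstein_1(a, b):
    """W_1 between two equal-size 1-D empirical samples.

    With equal sample sizes, the optimal coupling matches sorted values,
    so W_1 is the mean absolute difference of the order statistics.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


# Hypothetical anomaly scores for clean and noisy segments.
clean_scores = [0.0, 1.0, 2.0]
noisy_scores = [1.0, 2.0, 3.0]
separation = wasserstein_1(clean_scores, noisy_scores)
```

A larger distance between the score distributions of clean and noisy segments indicates better separation, without depending on a per-segment label threshold.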
Article 496
Title@2025-07-22 (2): Tagging fully hadronic exotic decays of the vectorlike $\mathbf{B}$ quark using a graph neural network
Title: Tagging fully hadronic exotic decays of the vectorlike $\mathbf{B}$ quark using a graph neural network | Tagging voll hadronische exotische Zerfalle des vektorartigen $\mathbf{B}$ Quark mit einem Graphen-Neural-Netzwerk | 使用一个图形神经网络,将 $\ mathbf{B} $quark 等矢量完全老化的异质衰变 2505.07769v2 |
Authors (5): Jai Bardhan, Tanumoy Mandal, Subhadip Mitra, Cyrin Neeraj, Mihir Rawat
Following up on our earlier study in [J. Bardhan et al., Machine learning-enhanced search for a vectorlike singlet B quark decaying to a singlet scalar or pseudoscalar, Phys. Rev. D 107 (2023) 115001; arXiv:2212.02442], we investigate the LHC prospects of pair-produced vectorlike $B$ quarks decaying exotically to a new gauge-singlet (pseudo)scalar field $\Phi$ and a $b$ quark. After the electroweak symmetry breaking, the $\Phi$ decays predominantly to $gg/bb$ final states, leading to a fully hadronic $2b+4j$ or $6b$ signature. Because of the large Standard Model background and the lack of leptonic handles, it is a difficult channel to probe. To overcome the challenge, we employ a hybrid deep learning model containing a graph neural network followed by a deep neural network. We estimate that such a state-of-the-art deep learning analysis pipeline can lead to a performance comparable to that in the semi-leptonic mode, taking the discovery (exclusion) reach up to about $M_B = 1.8\;(2.4)$ TeV at HL-LHC when $B$ decays fully exotically, i.e., BR$(B \to b\Phi) = 100\%$.
Article 497
Title@2025-07-22 (2): Hierarchical Reasoning Model
Title: Hierarchical Reasoning Model | Hierarchisches Modell der Vernunft | 等级推理模型 2506.21734v2 |
Authors (9): Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, Yasin Abbasi Yadkori
Reasoning, the process of devising and executing complex goal-oriented action sequences, remains a critical challenge in AI. Current large language models (LLMs) primarily employ Chain-of-Thought (CoT) techniques, which suffer from brittle task decomposition, extensive data requirements, and high latency. Inspired by the hierarchical and multi-timescale processing in the human brain, we propose the Hierarchical Reasoning Model (HRM), a novel recurrent architecture that attains significant computational depth while maintaining both training stability and efficiency. HRM executes sequential reasoning tasks in a single forward pass without explicit supervision of the intermediate process, through two interdependent recurrent modules: a high-level module responsible for slow, abstract planning, and a low-level module handling rapid, detailed computations. With only 27 million parameters, HRM achieves exceptional performance on complex reasoning tasks using only 1000 training samples. The model operates without pre-training or CoT data, yet achieves nearly perfect performance on challenging tasks including complex Sudoku puzzles and optimal path finding in large mazes. Furthermore, HRM outperforms much larger models with significantly longer context windows on the Abstraction and Reasoning Corpus (ARC), a key benchmark for measuring artificial general intelligence capabilities. These results underscore HRM’s potential as a transformative advancement toward universal computation and general-purpose reasoning systems.
Article 498
Title@2025-07-22 (2): Understanding Generalization, Robustness, and Interpretability in Low-Capacity Neural Networks
Title: Understanding Generalization, Robustness, and Interpretability in Low-Capacity Neural Networks | Verallgemeinerung, Robustheit und Dolmetschbarkeit in neuralen Netzwerken mit geringer Kapazität verstehen | 理解低能力神经网络的普遍化、强健和可解释性 2507.16278v1 |
Authors (1): Yash Kumar
Although modern deep learning often relies on massive over-parameterized models, the fundamental interplay between capacity, sparsity, and robustness in low-capacity networks remains a vital area of study. We introduce a controlled framework to investigate these properties by creating a suite of binary classification tasks from the MNIST dataset with increasing visual difficulty (e.g., 0 and 1 vs. 4 and 9). Our experiments reveal three core findings. First, the minimum model capacity required for successful generalization scales directly with task complexity. Second, these trained networks are robust to extreme magnitude pruning (up to 95% sparsity), revealing the existence of sparse, high-performing subnetworks. Third, we show that over-parameterization provides a significant advantage in robustness against input corruption. Interpretability analysis via saliency maps further confirms that these identified sparse subnetworks preserve the core reasoning process of the original dense models. This work provides a clear, empirical demonstration of the foundational trade-offs governing simple neural networks.
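The magnitude pruning referred to in the second finding can be sketched as follows: rank weights by absolute value and zero out the smallest fraction. This is a minimal, framework-free illustration. The function name, the flat weight list, and the 50% sparsity used here are assumptions for the example, and the paper's actual pruning procedure may differ (for instance, per-layer versus global thresholds).

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical weights; half of them are removed.
weights = [0.02, -1.5, 0.3, -0.01, 0.7, 0.05, -0.2, 0.9, 0.001, -0.4]
print(magnitude_prune(weights, 0.5))
```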
Article 499
Title@2025-07-22 (2): Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training
Title: Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training | Reduzierung der GPU-Speicherfragmentierung durch Spatio-Temporale Planung für effiziente großformatige Modellschulungen | 通过SPA-时间规划减少GPU内存碎片化,促进高效大型示范培训 2507.16274v1 |
Authors (12): Zixiao Huang, Junhao Hu, Hao Lin, Chunyang Zhu, Yueran Tang, Quanlu Zhang, Zhen Guo, Zhenhua Li, Shengen Yan, Zhenhua Zhu, Guohao Dai, Yu Wang
The rapid scaling of large language models (LLMs) has significantly increased GPU memory pressure, which is further aggravated by training optimization techniques such as virtual pipeline and recomputation that disrupt tensor lifespans and introduce considerable memory fragmentation. Default GPU memory allocators of popular deep learning frameworks like PyTorch use online strategies without knowledge of tensor lifespans, which can waste up to 43\% of memory and cause out-of-memory errors, rendering optimization techniques ineffective or even unusable. To address this, we introduce STWeaver, a GPU memory allocator for deep learning frameworks that reduces fragmentation by exploiting the spatial and temporal regularity in memory allocation behaviors of training workloads. STWeaver introduces a novel paradigm that combines offline planning with online allocation. The offline planning leverages spatio-temporal regularities to generate a near-optimal allocation plan, while the online allocation handles complex and dynamic models such as Mixture-of-Experts (MoE). Built as a pluggable PyTorch allocator, STWeaver reduces fragmentation ratio on average by 79.2\% (up to 100\%) across both dense and sparse models, with negligible overhead. This enables more efficient, high-throughput training configurations and improves performance by up to 32.5\%.
Article 500
Title@2025-07-22 (2): On exploration of an interior mirror descent flow for stochastic nonconvex constrained problem
Title: On exploration of an interior mirror descent flow for stochastic nonconvex constrained problem | Auf der Erforschung des inneren Spiegelabflusses für stochastisches nichtkonvexes beschränktes Problem | 探索内镜面下下下流的内孔反镜下流,以缓解杂乱的非电流制约问题 2507.15264v2 |
Authors (2): Kuangyu Ding, Kim-Chuan Toh
We study a nonsmooth nonconvex optimization problem defined over nonconvex constraints, where the feasible set is given by the intersection of the closure of an open set and a smooth manifold. By endowing the open set with a Riemannian metric induced by a barrier function, we obtain a Riemannian subgradient flow formulated as a differential inclusion, which remains strictly within the interior of the feasible set. This continuous dynamical system unifies two classes of iterative optimization methods, namely the Hessian barrier method and the mirror descent scheme, by revealing that these methods can be interpreted as discrete approximations of the continuous flow. We explore the long-term behavior of the trajectories generated by this dynamical system and show that the known deficient convergence properties of the Hessian barrier method and the mirror descent scheme can be interpreted in a unified and more insightful way through those of the continuous trajectory. For instance, the notorious spurious stationary points \cite{chen2024spurious} observed in the Hessian barrier method and the mirror descent scheme are interpreted as stable equilibria of the dynamical system that do not correspond to real stationary points of the original optimization problem. We provide two sufficient conditions such that these spurious stationary points can be avoided if the strict complementarity condition holds. In the absence of these regularity conditions, we propose a random perturbation strategy that ensures the trajectory converges (subsequentially) to an approximate stationary point. Building on these insights, we introduce two iterative Riemannian subgradient methods, a form of interior point methods, that generalize the existing Hessian barrier method and mirror descent scheme for solving nonsmooth nonconvex optimization problems.
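To make the "stays strictly within the interior" property concrete, here is a minimal sketch of classical mirror descent on the probability simplex with the entropy mirror map (the exponentiated-gradient update). It is not the paper's Riemannian subgradient method. The smooth convex toy objective, the step size, and all names are assumptions chosen only for illustration.

```python
import math


def entropic_mirror_descent(grad, x0, step, iters):
    """Mirror descent on the probability simplex with the entropy mirror map.

    Each update multiplies x by exp(-step * gradient) and renormalizes, so
    every iterate stays strictly inside the simplex (all entries > 0).
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        y = [xi * math.exp(-step * gi) for xi, gi in zip(x, g)]
        total = sum(y)
        x = [yi / total for yi in y]
    return x


# Toy objective (an assumption): f(x) = 0.5 * ||x - target||^2 with target in the simplex.
target = [0.7, 0.2, 0.1]


def grad(x):
    return [xi - ti for xi, ti in zip(x, target)]


x_final = entropic_mirror_descent(grad, [1 / 3, 1 / 3, 1 / 3], step=0.5, iters=500)
print(x_final)
```

Because the update is multiplicative, no projection onto the feasible set is needed. The same multiplicative structure means a coordinate that gets very close to zero moves very slowly afterwards, which is one informal way to picture how boundary points can trap such schemes.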
Article 501
Title@2025-07-22 (2): Probing Ranking LLMs: A Mechanistic Analysis for Information Retrieval
Title: Probing Ranking LLMs: A Mechanistic Analysis for Information Retrieval | Probing Ranking LLMs: Eine mechanistische Analyse für die Informationswiederherstellung | 检验排名LMS:信息检索的机械分析 2410.18527v3 |
Authors (3): Tanya Chowdhury, Atharva Nijasure, James Allan
Transformer networks, particularly those achieving performance comparable to GPT models, are well known for their robust feature extraction abilities. However, the nature of these extracted features and their alignment with human-engineered ones remain unexplored. In this work, we investigate the internal mechanisms of state-of-the-art, fine-tuned LLMs for passage reranking. We employ a probing-based analysis to examine neuron activations in ranking LLMs, identifying the presence of known human-engineered and semantic features. Our study spans a broad range of feature categories, including lexical signals, document structure, query-document interactions, and complex semantic representations, to uncover underlying patterns influencing ranking decisions. Through experiments on four different ranking LLMs, we identify statistical IR features that are prominently encoded in LLM activations, as well as others that are notably missing. Furthermore, we analyze how these models respond to out-of-distribution queries and documents, revealing distinct generalization behaviors. By dissecting the latent representations within LLM activations, we aim to improve both the interpretability and effectiveness of ranking models. Our findings offer crucial insights for developing more transparent and reliable retrieval systems, and we release all necessary scripts and code to support further exploration.
Article 502
Title@2025-07-22 (2): ToFe: Lagged Token Freezing and Reusing for Efficient Vision Transformer Inference
Title: ToFe: Lagged Token Freezing and Reusing for Efficient Vision Transformer Inference | ToFe: Gefälschtes Einfrieren und Wiederverwenden von Token für eine effiziente Bildverarbeitungstransformer-Inferenz | ToFe: “ 高效愿景变换引力 “ : “ 冷冻和再利用 “ 拖累的 ToFe: “ 冷冻和再利用 “ 2507.16260v1 |
Authors (3): Haoyue Zhang, Jie Zhang, Song Guo
Although vision transformers (ViTs) have shown remarkable success in various vision tasks, their computationally expensive self-attention hinders their deployment on resource-constrained devices. Token reduction, which discards less important tokens during forward propagation, has been proposed to enhance the efficiency of transformer models. However, existing methods handle unimportant tokens irreversibly, preventing their reuse in subsequent blocks. Considering that transformers focus on different information in different blocks, tokens reduced in early blocks might be useful later. Furthermore, to adapt transformer models for resource-constrained devices, it is crucial to strike a balance between model performance and computational overhead. To address these challenges, we introduce a novel Token Freezing and Reusing (ToFe) framework, in which we identify important tokens at each stage and temporarily freeze the unimportant ones, allowing their lagged reuse at a later stage. Specifically, we design a prediction module for token identification and an approximation module for recovery of the frozen tokens. Through joint optimization with the backbone via computation budget-aware end-to-end training, ToFe can adaptively process the necessary tokens at each block, thereby reducing computational cost while maintaining performance. Extensive experiments demonstrate that ToFe reduces the computational cost of the LV-ViT model by 50% with less than a 2% drop in Top-1 accuracy, achieving a better trade-off between performance and complexity than state-of-the-art methods.
Article 503
Title@2025-07-22 (2): Edge-case Synthesis for Fisheye Object Detection: A Data-centric Perspective
Title: Edge-case Synthesis for Fisheye Object Detection: A Data-centric Perspective | Edge-Case-Synthese zur Erkennung von Fisheye-Objekten: Eine datenzentrierte Perspektive | 鱼眼物体探测边缘综合情况:以数据为中心的视角 2507.16254v1 |
Authors (2): Seunghyeon Kim, Kyeongryeol Go
Fisheye cameras introduce significant distortion and pose unique challenges to object detection models trained on conventional datasets. In this work, we propose a data-centric pipeline that systematically improves detection performance by focusing on the key question of identifying the blind spots of the model. Through detailed error analysis, we identify critical edge-cases such as confusing class pairs, peripheral distortions, and underrepresented contexts. Then we directly address them through edge-case synthesis. We fine-tuned an image generative model and guided it with carefully crafted prompts to produce images that replicate real-world failure modes. These synthetic images are pseudo-labeled using a high-quality detector and integrated into training. Our approach results in consistent performance gains, highlighting how deeply understanding data and selectively fixing its weaknesses can be impactful in specialized domains like fisheye object detection.
Article 504
Title@2025-07-22 (2): Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping
Title: Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping | Multi-Agenten-Verstärkungs-Lernen für stichprobeneffiziente Tiefen-Neural-Netzwerk-Mapping | 用于抽样有效深神经网络绘图的多机构强化学习 2507.16249v1 |
Authors (7): Srivatsan Krishnan, Jason Jabbour, Dan Zhang, Natasha Jaques, Aleksandra Faust, Shayegan Omidshafiei, Vijay Janapa Reddi
Mapping deep neural networks (DNNs) to hardware is critical for optimizing latency, energy consumption, and resource utilization, making it a cornerstone of high-performance accelerator design. Due to the vast and complex mapping space, reinforcement learning (RL) has emerged as a promising approach, but its effectiveness is often limited by sample inefficiency. We present a decentralized multi-agent reinforcement learning (MARL) framework designed to overcome this challenge. By distributing the search across multiple agents, our framework accelerates exploration. To avoid inefficiencies from training multiple agents in parallel, we introduce an agent clustering algorithm that assigns similar mapping parameters to the same agents based on correlation analysis. This enables a decentralized, parallelized learning process that significantly improves sample efficiency. Experimental results show our MARL approach improves sample efficiency by 30-300x over standard single-agent RL, achieving up to 32.61x latency reduction and 16.45x energy-delay product (EDP) reduction under iso-sample conditions.
Article 505
Title@2025-07-22 (2): OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting
Title: OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting | OPC: One-Point-Contraction Unlearning Toward Deep Feature Vergessen | OPC: 一点-合同拆开学习深地地貌的遗忘 2507.07754v2 |
Authors (4): Jaeheun Jung, Bosung Jung, Suhyun Bae, Donghun Lee
Machine unlearning seeks to remove the influence of particular data or classes from trained models to meet privacy, legal, or ethical requirements. Existing unlearning methods tend to forget shallowly: an unlearned model pretends to forget by adjusting only its response, while its internal representations retain enough information to restore the forgotten data or behavior. We empirically confirm this widespread shallowness by reverting the forgetting effect of various unlearning methods via a training-free performance recovery attack and a gradient-inversion-based data reconstruction attack. To address this vulnerability fundamentally, we define a theoretical criterion of "deep forgetting" based on one-point contraction of the feature representations of the data to forget. We also propose an efficient approximation algorithm, and use it to construct a novel general-purpose unlearning algorithm: One-Point-Contraction (OPC). Empirical evaluations on image classification unlearning benchmarks show that OPC achieves not only effective unlearning performance but also superior resilience against both the performance recovery attack and the gradient-inversion attack. The distinctive unlearning performance of OPC arises from the deep feature forgetting enforced by its theoretical foundation, and reaffirms the need for improved robustness of machine unlearning methods.
Article 506
Title@2025-07-22 (2): IPPRO: Importance-based Pruning with PRojective Offset for Magnitude-indifferent Structural Pruning
Title: IPPRO: Importance-based Pruning with PRojective Offset for Magnitude-indifferent Structural Pruning | IPPRO: Wichtiges Pruning mit PRojective Offset für Magnitude-indifferent Structural Pruning | IPPRO: 以重力为根据的谨慎与磁度偏差结构谨慎的倾斜偏移 2507.14171v2 |
Authors (4): Jaeheun Jung, Jaehyuk Lee, Yeajin Lee, Donghun Lee
With growing demand for neural network compression methods, structured pruning methods, including importance-based approaches, are being actively studied. Magnitude importance and many correlated modern importance criteria often limit the capacity of pruning decisions, since a filter with a larger magnitude is unlikely to be pruned unless the smaller ones are, even if it is redundant. In this paper, we propose a novel pruning strategy to challenge this dominating effect of magnitude and give each filter a fair chance to be pruned, by placing it in projective space. We then observe whether gradient descent moves the filters toward the origin, to measure how likely each filter is to be pruned. This measurement is used to construct PROscore, a novel importance score for IPPRO, a novel importance-based structured pruning method with magnitude indifference. Our evaluation results show that the proposed importance criterion using the projective space achieves near-lossless pruning by reducing the performance drop from pruning, with promising performance after finetuning. Our work debunks the "size-matters" myth in pruning and expands the frontier of importance-based pruning both theoretically and empirically.
Article 507
Title@2025-07-22 (2): MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment
Title: MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment | MPO: Ein effizientes Post-Processing-Framework zum Mischen unterschiedlicher Präferenzen | MPO: 混合多种优惠协调的高效处理后框架 2502.18699v3 |
Authors (5): Tianze Wang, Dongnan Gui, Yifan Hu, Shuhang Lin, Linjun Zhang
Reinforcement Learning from Human Feedback (RLHF) has shown promise in aligning large language models (LLMs). Yet its reliance on a singular reward model often overlooks the diversity of human preferences. Recent approaches address this limitation by leveraging multi-dimensional feedback to fine-tune corresponding reward models and train LLMs using reinforcement learning. However, the process is costly and unstable, especially given the competing and heterogeneous nature of human preferences. In this paper, we propose Mixing Preference Optimization (MPO), a post-processing framework for aggregating single-objective policies as an alternative to both multi-objective RLHF (MORLHF) and MaxMin-RLHF. MPO avoids alignment from scratch. Instead, it log-linearly combines existing policies into a unified one with the weight of each policy computed via a batch stochastic mirror descent. Empirical results demonstrate that MPO achieves balanced performance across diverse preferences, outperforming or matching existing models with significantly reduced computational costs.
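The log-linear combination described above can be illustrated on a toy discrete action space: the mixed policy is proportional to the product of the individual policies raised to their weights. In the sketch below the two policies and the fixed weights are hypothetical. The paper computes the weights via batch stochastic mirror descent, which this example does not implement, and it assumes all probabilities are strictly positive.

```python
import math


def log_linear_mix(policies, weights):
    """Combine discrete distributions as pi(y) proportional to prod_i pi_i(y) ** w_i."""
    n_actions = len(policies[0])
    log_scores = [
        sum(w * math.log(p[y]) for p, w in zip(policies, weights))
        for y in range(n_actions)
    ]
    # Subtract the max before exponentiating for numerical stability.
    m = max(log_scores)
    unnorm = [math.exp(s - m) for s in log_scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Two hypothetical single-objective policies over three candidate responses.
helpful = [0.7, 0.2, 0.1]
harmless = [0.1, 0.3, 0.6]
print(log_linear_mix([helpful, harmless], [0.5, 0.5]))
```

Setting one weight to 1 and the others to 0 recovers the corresponding single-objective policy, so the weights interpolate between the existing policies without any retraining.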
Article 508
Title@2025-07-22 (2): LLM-Enhanced Reranking for Complementary Product Recommendation
Title: LLM-Enhanced Reranking for Complementary Product Recommendation | LLM-erweitertes Reranking für ergänzende Produktempfehlung | LLM-加强补充产品建议书的重新排名 2507.16237v1 |
Authors (2): Zekun Xu, Yudi Zhang
Complementary product recommendation, which aims to suggest items that are used together to enhance customer value, is a crucial yet challenging task in e-commerce. While existing graph neural network (GNN) approaches have made significant progress in capturing complex product relationships, they often struggle with the accuracy-diversity tradeoff, particularly for long-tail items. This paper introduces a model-agnostic approach that leverages Large Language Models (LLMs) to enhance the reranking of complementary product recommendations. Unlike previous works that use LLMs primarily for data preprocessing and graph augmentation, our method applies LLM-based prompting strategies directly to rerank candidate items retrieved from existing recommendation models, eliminating the need for model retraining. Through extensive experiments on public datasets, we demonstrate that our approach effectively balances accuracy and diversity in complementary product recommendations, with at least 50% lift in accuracy metrics and 2% lift in diversity metrics on average for the top recommended items across datasets.
Article 509
Title@2025-07-22 (2): PAC Off-Policy Prediction of Contextual Bandits
Title: PAC Off-Policy Prediction of Contextual Bandits | PAC Off-Policy Vorhersage von Kontext Banditen | PAC 非政策性背景强盗预测 2507.16236v1 |
Authors (3): Yilong Wan, Yuqiang Li, Xianyi Wu
This paper investigates off-policy evaluation in contextual bandits, aiming to quantify the performance of a target policy using data collected under a different and potentially unknown behavior policy. Recently, methods based on conformal prediction have been developed to construct reliable prediction intervals that guarantee marginal coverage in finite samples, making them particularly suited for safety-critical applications. To further achieve coverage conditional on a given offline data set, we propose a novel algorithm that constructs probably approximately correct prediction intervals. Our method builds upon a PAC-valid conformal prediction framework, and we strengthen its theoretical guarantees by establishing PAC-type bounds on coverage. We analyze both finite-sample and asymptotic properties of the proposed method, and compare its empirical performance with existing methods in simulations.
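For context, the marginal-coverage baseline that the abstract contrasts with can be sketched as standard split conformal prediction: take a finite-sample-corrected quantile of calibration residuals and use it as the interval radius. This is only the textbook construction. It ignores the shift between behavior and target policies and does not implement the paper's PAC-type guarantee. The residuals and the point prediction are made-up values.

```python
import math


def split_conformal_radius(residuals, alpha):
    """Radius q such that [pred - q, pred + q] has marginal coverage >= 1 - alpha."""
    n = len(residuals)
    # Finite-sample correction: use the ceil((n + 1)(1 - alpha))-th smallest residual.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(residuals)[k - 1]


# Hypothetical absolute residuals |y - prediction| on a calibration set.
calib_residuals = [0.3, 0.1, 0.5, 0.2, 0.4, 0.9, 0.6, 0.8, 0.7]
q = split_conformal_radius(calib_residuals, alpha=0.25)
prediction = 2.0
print((prediction - q, prediction + q))
```

The guarantee here holds on average over calibration sets. Coverage conditional on the particular offline data set, which the abstract targets, generally requires a more conservative quantile.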
Article 510
Title@2025-07-22 (2): Aligned Manifold Property and Topology Point Clouds for Learning Molecular Properties
Title: Aligned Manifold Property and Topology Point Clouds for Learning Molecular Properties | Aligned Manifold Property and Topology Point Clouds for Learning Molecular Properties | 用于学习分子特性的实心式地产和地形点云 2507.16223v1 |
Authors (1): Alexander Mihalcea
Machine learning models for molecular property prediction generally rely on representations – such as SMILES strings and molecular graphs – that overlook the surface-local phenomena driving intermolecular behavior. 3D-based approaches often reduce surface detail or require computationally expensive SE(3)-equivariant architectures to manage spatial variance. To overcome these limitations, this work introduces AMPTCR (Aligned Manifold Property and Topology Cloud Representation), a molecular surface representation that combines local quantum-derived scalar fields and custom topological descriptors within an aligned point cloud format. Each surface point includes a chemically meaningful scalar, geodesically derived topology vectors, and coordinates transformed into a canonical reference frame, enabling efficient learning with conventional SE(3)-sensitive architectures. AMPTCR is evaluated using a DGCNN framework on two tasks: molecular weight and bacterial growth inhibition. For molecular weight, results confirm that AMPTCR encodes physically meaningful data, with a validation R^2 of 0.87. In the bacterial inhibition task, AMPTCR enables both classification and direct regression of E. coli inhibition values using Dual Fukui functions as the electronic descriptor and Morgan Fingerprints as auxiliary data, achieving an ROC AUC of 0.912 on the classification task, and an R^2 of 0.54 on the regression task. These results help demonstrate that AMPTCR offers a compact, expressive, and architecture-agnostic representation for modeling surface-mediated molecular properties.
Article 511
Title@2025-07-22 (2): Toward Routine CSP of Pharmaceuticals: A Fully Automated Protocol Using Neural Network Potentials
Title: Toward Routine CSP of Pharmaceuticals: A Fully Automated Protocol Using Neural Network Potentials | Auf dem Weg zu einem routinemäßigen CSP of Pharmaceuticals: Ein vollautomatisiertes Protokoll zur Nutzung neuraler Netzwerkpotentiale | 迈向药物常规CSP:利用神经网络潜力的全自动协议 2507.16218v1 |
Authors (3): Zachary L. Glick, Derek P. Metcalf, Scott F. Swarthout
Crystal structure prediction (CSP) is a useful tool in pharmaceutical development for identifying and assessing risks associated with polymorphism, yet widespread adoption has been hindered by high computational costs and the need for both manual specification and expert knowledge to achieve useful results. Here, we introduce a fully automated, high-throughput CSP protocol designed to overcome these barriers. The protocol's efficiency is driven by Lavo-NN, a novel neural network potential (NNP) architected and trained specifically for pharmaceutical crystal structure generation and ranking. This NNP-driven crystal generation phase is integrated into a scalable cloud-based workflow. We validate this CSP protocol on an extensive retrospective benchmark of 49 unique molecules, almost all of which are drug-like, successfully generating structures that match all 110 $Z' = 1$ experimental polymorphs. The average CSP in this benchmark is performed with approximately 8.4k CPU hours, which is a significant reduction compared to other protocols. The practical utility of the protocol is further demonstrated through case studies that resolve ambiguities in experimental data and a semi-blinded challenge that successfully identifies and ranks polymorphs of three modern drugs from powder X-ray diffraction patterns alone. By significantly reducing the required time and cost, the protocol enables CSP to be routinely deployed earlier in the drug discovery pipeline, such as during lead optimization. Rapid turnaround times and high throughput also enable CSP that can be run in parallel with experimental screening, providing chemists with real-time insights to guide their work in the lab.
Article 512
Title@2025-07-22 (2): Towards Compute-Optimal Many-Shot In-Context Learning
Title: Towards Compute-Optimal Many-Shot In-Context Learning | Auf dem Weg zu einem rechnerisch-optimalen, viel scharfen In-Context-Lernen | 迈向计算最优化的多个热点内文体学习 2507.16217v1 |
Authors (10): Shahriar Golchin, Yanfei Chen, Rujun Han, Manan Gandhi, Tianli Yu, Swaroop Mishra, Mihai Surdeanu, Rishabh Agarwal, Chen-Yu Lee, Tomas Pfister
Long-context large language models (LLMs) are able to process inputs containing up to several million tokens. In the scope of in-context learning (ICL), this translates into using hundreds/thousands of demonstrations in the input prompt, enabling many-shot ICL. In practice, a fixed set of demonstrations is often selected at random in many-shot settings due to (1) high inference costs, (2) the benefits of caching and reusing computations, and (3) the similar performance offered by this strategy compared to others when scaled. In this work, we propose two straightforward strategies for demonstration selection in many-shot ICL that improve performance with minimal computational overhead. Our first method combines a small number of demonstrations, selected based on their similarity to each test sample, with a disproportionately larger set of random demonstrations that are cached. The second strategy improves the first by replacing random demonstrations with those selected using centroids derived from test sample representations via k-means clustering. Our experiments with Gemini Pro and Flash across several datasets indicate that our strategies consistently outperform random selection and surpass or match the most performant selection approach while supporting caching and reducing inference cost by up to an order of magnitude. We also show that adjusting the proportion of demonstrations selected based on different criteria can balance performance and inference cost in many-shot ICL.
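The first strategy can be sketched as follows: a fixed, randomly chosen block of demonstrations is shared across all queries, and a small number of demonstrations most similar to the current test sample is appended. Placing the shared block first so that its computation could be cached as a common prefix is an assumption of this sketch, as are the dot-product similarity, the random embeddings, and every name used here.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def build_prompt_demos(test_vec, demo_vecs, n_similar, cached_ids):
    """Return demo indices: the shared cached block first, then per-query similar demos."""
    cached = set(cached_ids)
    candidates = [i for i in range(len(demo_vecs)) if i not in cached]
    # Rank the remaining demonstrations by similarity to the test sample.
    candidates.sort(key=lambda i: dot(test_vec, demo_vecs[i]), reverse=True)
    return list(cached_ids) + candidates[:n_similar]


rng = random.Random(0)
# Hypothetical 4-dimensional embeddings for 50 demonstrations.
demo_vecs = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(50)]
cached_ids = rng.sample(range(50), 20)  # fixed random block, reused across queries
test_vec = [rng.gauss(0, 1) for _ in range(4)]
selected = build_prompt_demos(test_vec, demo_vecs, n_similar=3, cached_ids=cached_ids)
print(selected)
```

The second strategy in the abstract would replace the random block with demonstrations chosen near k-means centroids of test-sample representations. That step is omitted here.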
Article 513
Title@2025-07-22 (2): FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization
Title: FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization | FedWSQ: Effizientes Federated Learning mit Gewichtsstandardisierung und distributionssicherer, nicht-einheitlicher Quantisierung | FFWSQ: 节能的联邦学习,重标准化和发行软件非统一量化 2506.23516v3 |
Authors (5): Seung-Wook Kim, Seongyeol Kim, Jiah Kim, Seowon Ji, Se-Ho Lee
Federated learning (FL) often suffers from performance degradation due to key challenges such as data heterogeneity and communication constraints. To address these limitations, we present a novel FL framework called FedWSQ, which integrates weight standardization (WS) and the proposed distribution-aware non-uniform quantization (DANUQ). WS enhances FL performance by filtering out biased components in local updates during training, thereby improving the robustness of the model against data heterogeneity and unstable client participation. In addition, DANUQ minimizes quantization errors by leveraging the statistical properties of local model updates. As a result, FedWSQ significantly reduces communication overhead while maintaining superior model accuracy. Extensive experiments on FL benchmark datasets demonstrate that FedWSQ consistently outperforms existing FL methods across various challenging FL settings, including extreme data heterogeneity and ultra-low-bit communication scenarios.
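Weight standardization, in its commonly used form, rescales each output unit's weights to zero mean and unit variance before they are applied. The sketch below shows that operation on a single weight row. Whether FedWSQ uses exactly this form is an assumption, and the DANUQ quantizer is not shown. The `eps` value and function name are illustrative.

```python
import math


def standardize_weights(row, eps=1e-5):
    """Standardize one output unit's weights to zero mean and (near) unit variance."""
    mean = sum(row) / len(row)
    var = sum((w - mean) ** 2 for w in row) / len(row)
    # eps guards against division by zero for constant rows.
    return [(w - mean) / math.sqrt(var + eps) for w in row]


# Hypothetical weights of one output unit.
standardized = standardize_weights([1.0, 2.0, 3.0, 4.0])
print(standardized)
```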
Article 514
Title@2025-07-22 (2): METER: Multi-modal Evidence-based Thinking and Explainable Reasoning – Algorithm and Benchmark
Title: METER: Multi-modal Evidence-based Thinking and Explainable Reasoning – Algorithm and Benchmark | METER: Multimodales Evidenzbasiertes Denken und Erklärbare Begründung – Algorithmen und Benchmark | 多式联运循证思考和可解释的理由 – – 等级和基准 2507.16206v1 |
Authors (7): Xu Yang, Qi Zhang, Shuming Jiang, Yaowen Xu, Zhaofan Zou, Hao Sun, Xuelong Li
With the rapid advancement of generative AI, synthetic content across images, videos, and audio has become increasingly realistic, amplifying the risk of misinformation. Existing detection approaches predominantly focus on binary classification while lacking detailed and interpretable explanations of forgeries, which limits their applicability in safety-critical scenarios. Moreover, current methods often treat each modality separately, without a unified benchmark for cross-modal forgery detection and interpretation. To address these challenges, we introduce METER, a unified, multi-modal benchmark for interpretable forgery detection spanning images, videos, audio, and audio-visual content. Our dataset comprises four tracks, each requiring not only real-vs-fake classification but also evidence-chain-based explanations, including spatio-temporal localization, textual rationales, and forgery type tracing. Compared to prior benchmarks, METER offers broader modality coverage and richer interpretability metrics such as spatial/temporal IoU, multi-class tracing, and evidence consistency. We further propose a human-aligned, three-stage Chain-of-Thought (CoT) training strategy combining SFT, DPO, and a novel GRPO stage that integrates a human-aligned evaluator with CoT reasoning. We hope METER will serve as a standardized foundation for advancing generalizable and interpretable forgery detection in the era of generative media.
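Of the interpretability metrics listed above, temporal IoU is the simplest to state: the overlap between a predicted and a ground-truth time interval divided by the length of their union. The sketch below uses the standard definition. The interval format and example values are assumptions, and METER's exact scoring rules are not given in the abstract.

```python
def temporal_iou(pred, truth):
    """IoU of two time intervals given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical predicted and annotated forged segments of a clip.
print(temporal_iou((2.0, 6.0), (4.0, 8.0)))
```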
Article 515
Title@2025-07-22 (2): SVAgent: AI Agent for Hardware Security Verification Assertion
Title: SVAgent: AI Agent for Hardware Security Verification Assertion | SVAgent: KI-Agent für Hardware-Sicherheitsprüfung Assertion | AI 硬件安全核查认证代理商 2507.16203v1 |
Authors (6): Rui Guo, Avinash Ayalasomayajula, Henian Li, Jingbo Zhou, Sujan Kumar Saha, Farimah Farahmandi
Verification using SystemVerilog assertions (SVA) is one of the most popular methods for detecting circuit design vulnerabilities. However, with the globalization of integrated circuit design and the continuous upgrading of security requirements, the SVA development model has exposed major limitations. It is not only inefficient in development, but also unable to deal effectively with the increasing number of security vulnerabilities in modern complex integrated circuits. In response to these challenges, this paper proposes SVAgent, an innovative framework for automatic SVA generation. SVAgent introduces a requirement decomposition mechanism to transform the original complex requirements into a structured, gradually solvable chain of fine-grained problems. Experiments show that SVAgent can effectively suppress the influence of hallucinations and random answers, and that key evaluation indicators such as the accuracy and consistency of the generated SVA are significantly better than those of existing frameworks. More importantly, we successfully integrated SVAgent into the most mainstream integrated circuit vulnerability assessment framework and verified its practicality and reliability in a real engineering design environment.
Article 516
Title@2025-07-22 (2): RealBench: Benchmarking Verilog Generation Models with Real-World IP Designs
Title: RealBench: Benchmarking Verilog Generation Models with Real-World IP Designs | RealBench: Benchmarking von Verilog-Generationsmodellen mit Real-World-IP-Designs | ReealBeonch:以现实世界的IP设计为标准,将风险生成模型与现实世界的IP设计作为基准 2507.16200v1 |
Authors (13): Pengwei Jin, Di Huang, Chongxiao Li, Shuyao Cheng, Yang Zhao, Xinyao Zheng, Jiaguo Zhu, Shuyi Xing, Bohan Dou, Rui Zhang, Zidong Du, Qi Guo, Xing Hu
The automatic generation of Verilog code using Large Language Models (LLMs) has garnered significant interest in hardware design automation. However, existing benchmarks for evaluating LLMs in Verilog generation fall short in replicating real-world design workflows due to their designs’ simplicity, inadequate design specifications, and less rigorous verification environments. To address these limitations, we present RealBench, the first benchmark aiming at real-world IP-level Verilog generation tasks. RealBench features complex, structured, real-world open-source IP designs, multi-modal and formatted design specifications, and rigorous verification environments, including 100% line coverage testbenches and a formal checker. It supports both module-level and system-level tasks, enabling comprehensive assessments of LLM capabilities. Evaluations on various LLMs and agents reveal that even one of the best-performing LLMs, o1-preview, achieves only a 13.3% pass@1 on module-level tasks and 0% on system-level tasks, highlighting the need for stronger Verilog generation models in the future. The benchmark is open-sourced at https://github.com/IPRC-DIP/RealBench.
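The pass@1 figures above can be made concrete with the unbiased pass@k estimator that is common in code-generation benchmarks. The abstract does not say which estimator RealBench uses, so the formula and the function name `pass_at_k` below are assumptions for illustration: with `n` samples per task of which `c` pass verification, pass@k is estimated as 1 - C(n-c, k) / C(n, k).

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n generated samples, c of which passed.

    Assumption: the standard unbiased estimator 1 - C(n-c, k) / C(n, k);
    the RealBench abstract does not state its exact estimator.
    """
    if not 0 <= c <= n:
        raise ValueError("c must lie between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must lie between 1 and n")
    # comb(n - c, k) is 0 when fewer than k samples failed,
    # so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average pass@k over tasks; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to the fraction of passing samples, c / n, averaged over tasks.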
Article 517
Title@2025-07-22 (2): Diffusion-Modeled Reinforcement Learning for Carbon and Risk-Aware Microgrid Optimization
Title: Diffusion-Modeled Reinforcement Learning for Carbon and Risk-Aware Microgrid Optimization | Diffusionsmodelliertes Verstärkungslernen für die Optimierung von Kohlenstoff und risikobehafteten Mikrogrids | 促进碳和风险软件微型电磁优化的传播模式强化学习 2507.16867v1 |
Authors (6): Yunyi Zhao, Wei Zhang, Cheng Xiang, Hongyang Du, Dusit Niyato, Shuhua Gao
This paper introduces DiffCarl, a diffusion-modeled carbon- and risk-aware reinforcement learning algorithm for intelligent operation of multi-microgrid systems. With the growing integration of renewables and increasing system complexity, microgrid communities face significant challenges in real-time energy scheduling and optimization under uncertainty. DiffCarl integrates a diffusion model into a deep reinforcement learning (DRL) framework to enable adaptive energy scheduling under uncertainty and explicitly account for carbon emissions and operational risk. By learning action distributions through a denoising generation process, DiffCarl enhances DRL policy expressiveness and enables carbon- and risk-aware scheduling in dynamic and uncertain microgrid environments. Extensive experimental studies demonstrate that it outperforms classic algorithms and state-of-the-art DRL solutions, with 2.3-30.1% lower operational cost. It also achieves 28.7% lower carbon emissions than those of its carbon-unaware variant and reduces performance variability. These results highlight DiffCarl as a practical and forward-looking solution. Its flexible design allows efficient adaptation to different system configurations and objectives to support real-world deployment in evolving energy systems.
Article 518
Title@2025-07-22 (2): Learning to Bid in Non-Stationary Repeated First-Price Auctions
Title: Learning to Bid in Non-Stationary Repeated First-Price Auctions | Lernen, in nicht-stationären wiederholten Erstpreis-Auktionen Gebot | 学习在非标准重复第一次价格拍卖中投标 2501.13358v2 |
Authors (5): Zihao Hu, Xiaoyu Fan, Yuan Yao, Jiheng Zhang, Zhengyuan Zhou
First-price auctions have recently gained significant traction in digital advertising markets, exemplified by Google’s transition from second-price to first-price auctions. Unlike in second-price auctions, where bidding one’s private valuation is a dominant strategy, determining an optimal bidding strategy in first-price auctions is more complex. From a learning perspective, the learner (a specific bidder) can interact with the environment (other bidders, i.e., opponents) sequentially to infer their behaviors. Existing research often assumes specific environmental conditions and benchmarks performance against the best fixed policy (static benchmark). While this approach ensures strong learning guarantees, the static benchmark can deviate significantly from the optimal strategy in environments with even mild non-stationarity. To address such scenarios, a dynamic benchmark, defined as the sum of the highest achievable rewards at each time step, offers a more suitable objective. However, achieving no-regret learning with respect to the dynamic benchmark requires additional constraints. By inspecting reward functions in online first-price auctions, we introduce two metrics to quantify the regularity of the sequence of opponents’ highest bids, which serve as measures of non-stationarity. We provide a minimax-optimal characterization of the dynamic regret for the class of sequences of opponents’ highest bids that satisfy either of these regularity constraints. Our main technical tool is the Optimistic Mirror Descent (OMD) framework with a novel optimism configuration, which is well-suited for achieving minimax-optimal dynamic regret rates in this context. We then use synthetic datasets to validate our theoretical guarantees and demonstrate that our methods outperform existing ones.
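The gap between the static and the dynamic benchmark can be seen in a small sketch. It assumes the usual first-price reward (value minus bid when the bid wins, zero otherwise), a tie rule in the learner's favor, a finite bid grid, and a fixed bid standing in for the "best fixed policy"; all function names are hypothetical and none of this is taken from the paper's code.

```python
def reward(value, bid, highest_other):
    """First-price reward: pay your own bid if you win, else get nothing.

    Assumption: ties are resolved in the learner's favor.
    """
    return value - bid if bid >= highest_other else 0.0


def static_benchmark(values, highest_others, grid):
    """Total reward of the best single bid held fixed over all rounds."""
    return max(
        sum(reward(v, b, m) for v, m in zip(values, highest_others))
        for b in grid
    )


def dynamic_benchmark(values, highest_others, grid):
    """Sum over rounds of the best achievable reward in that round."""
    return sum(
        max(reward(v, b, m) for b in grid)
        for v, m in zip(values, highest_others)
    )


# Opponents' highest bid alternates, a mild form of non-stationarity.
values = [1.0, 1.0, 1.0, 1.0]
highest_others = [0.2, 0.8, 0.2, 0.8]
grid = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
static_total = static_benchmark(values, highest_others, grid)
dynamic_total = dynamic_benchmark(values, highest_others, grid)
```

In this toy sequence the best fixed bid (0.2) earns 1.6 in total, while tracking the opponents' highest bid round by round earns 2.0. Dynamic regret is measured against the larger quantity, which is never below the static one.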
Article 519
Title@2025-07-22 (2): EBaReT: Expert-guided Bag Reward Transformer for Auto Bidding
Title: EBaReT: Expert-guided Bag Reward Transformer for Auto Bidding | EBaReT: fachkundiger Taschen-Reward-Transformator für Auto-Bidding | EBARET: 自动投标专家指导的袋奖励变换器 2507.16186v1 |
Authors (9): Kaiyuan Li, Pengyu Wang, Yunshan Peng, Pengjia Yuan, Yanxiang Zeng, Rui Xiang, Yanhua Cheng, Xialong Liu, Peng Jiang
Reinforcement learning has been widely applied in automated bidding. Traditional approaches model bidding as a Markov Decision Process (MDP). Recently, some studies have explored using generative reinforcement learning methods to address long-term dependency issues in bidding environments. Although effective, these methods typically rely on supervised learning approaches, which are vulnerable to low data quality caused by the many sub-optimal bids and the low-probability rewards that result from low click and conversion rates. Unfortunately, few studies have addressed these challenges. In this paper, we formalize automated bidding as a sequential decision-making problem and propose a novel Expert-guided Bag Reward Transformer (EBaReT) to address concerns related to data quality and reward uncertainty. Specifically, to tackle data quality issues, we generate a set of expert trajectories to serve as supplementary data in the training process and employ a Positive-Unlabeled (PU) learning-based discriminator to identify expert transitions. To ensure the decision also meets the expert level, we further design a novel expert-guided inference strategy. Moreover, to mitigate the uncertainty of rewards, we consider the transitions within a certain period as a “bag” and carefully design a reward function that leads to a smoother acquisition of rewards. Extensive experiments demonstrate that our model achieves superior performance compared to state-of-the-art bidding methods.
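The "bag" idea can be illustrated with a minimal sketch. The abstract does not give the actual reward function, so the version below is only an assumption: transitions are grouped into fixed-length bags and every transition receives its bag's mean reward, which spreads rare click or conversion rewards over the period while preserving the trajectory total. The name `bag_rewards` is hypothetical.

```python
def bag_rewards(rewards, bag_size):
    """Replace each reward by the mean reward of its bag.

    Assumption: bags are consecutive, fixed-length windows; the last bag
    may be shorter. EBaReT's real reward design is not specified in the
    abstract and may differ.
    """
    if bag_size <= 0:
        raise ValueError("bag_size must be positive")
    smoothed = []
    for start in range(0, len(rewards), bag_size):
        bag = rewards[start:start + bag_size]
        mean = sum(bag) / len(bag)
        smoothed.extend([mean] * len(bag))
    return smoothed


# Sparse rewards: two conversions in eight steps.
sparse = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 6.0, 0.0]
dense = bag_rewards(sparse, bag_size=4)
```

With a bag size of 4, the eight sparse rewards become four steps at 0.75 followed by four steps at 1.5, and the sum stays at 9.0.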
Article 520
Title@2025-07-22 (2): Balanced Image Stylization with Style Matching Score
Title: Balanced Image Stylization with Style Matching Score | Ausgeglichene Bildstilisierung mit Style Matching Score | 带有样式匹配评分的平衡图像同步化 2503.07601v2 |
Authors (6): Yuxin Jiang, Liming Jiang, Shuai Yang, Jia-Wei Liu, Ivor Tsang, Mike Zheng Shou
We present Style Matching Score (SMS), a novel optimization method for image stylization with diffusion models. Balancing effective style transfer with content preservation is a long-standing challenge. Unlike existing efforts, our method reframes image stylization as a style distribution matching problem. The target style distribution is estimated from off-the-shelf style-dependent LoRAs via carefully designed score functions. To preserve content information adaptively, we propose Progressive Spectrum Regularization, which operates in the frequency domain to guide stylization progressively from low-frequency layouts to high-frequency details. In addition, we devise a Semantic-Aware Gradient Refinement technique that leverages relevance maps derived from diffusion semantic priors to selectively stylize semantically important regions. The proposed optimization formulation extends stylization from pixel space to parameter space, readily applicable to lightweight feedforward generators for efficient one-step stylization. SMS effectively balances style alignment and content preservation, outperforming state-of-the-art approaches, verified by extensive experiments.
Article 521
Title@2025-07-22 (2): Feature Construction Using Network Control Theory and Rank Encoding for Graph Machine Learning
Title: Feature Construction Using Network Control Theory and Rank Encoding for Graph Machine Learning | Feature Konstruktion mit Network Control Theorie und Rang Encoding für Graph Machine Learning | 图形机器学习使用网络控制理论和排名编码 2507.15195v2 |
Authors (6): Anwar Said, Yifan Wei, Obaid Ullah Ahmad, Mudassir Shabbir, Waseem Abbas, Xenofon Koutsoukos
In this article, we utilize the concept of average controllability in graphs, along with a novel rank encoding method, to enhance the performance of Graph Neural Networks (GNNs) in social network classification tasks. GNNs have proven highly effective in various network-based learning applications and require some form of node features to function. However, their performance is heavily influenced by the expressiveness of these features. In social networks, node features are often unavailable due to privacy constraints or the absence of inherent attributes, making it challenging for GNNs to achieve optimal performance. To address this limitation, we propose two strategies for constructing expressive node features. First, we introduce average controllability along with other centrality metrics (denoted as NCT-EFA) as node-level metrics that capture critical aspects of network topology. Building on this, we develop a rank encoding method that transforms average controllability or any other graph-theoretic metric into a fixed-dimensional feature space, thereby improving feature representation. We conduct extensive numerical evaluations using six benchmark GNN models across four social network datasets to compare different node feature construction methods. Our results demonstrate that incorporating average controllability into the feature space significantly improves GNN performance. Moreover, the proposed rank encoding method outperforms traditional one-hot degree encoding, improving the ROC AUC from 68.7% to 73.9% using GraphSAGE on the GitHub Stargazers dataset, underscoring its effectiveness in generating expressive and efficient node representations.
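One plausible reading of rank encoding is sketched below: nodes are sorted by a scalar metric (for example average controllability or degree), ranks are bucketed into a fixed number of bins, and each node gets a one-hot vector for its bin. The bucketing rule and the name `rank_encode` are assumptions; the abstract only states that the encoding maps a graph-theoretic metric into a fixed-dimensional feature space.

```python
def rank_encode(scores, dim):
    """Map per-node scalar scores to one-hot vectors over rank buckets.

    Assumption: equal-width rank buckets; ties are broken by node index
    because Python's sort is stable.
    """
    if dim <= 0:
        raise ValueError("dim must be positive")
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    features = [[0.0] * dim for _ in range(n)]
    for rank, node in enumerate(order):
        bucket = min(rank * dim // n, dim - 1)
        features[node][bucket] = 1.0
    return features


# Four nodes with a hypothetical centrality score each.
encoded = rank_encode([0.9, 0.1, 0.5, 0.3], dim=2)
```

Unlike one-hot degree encoding, whose width grows with the maximum degree in the graph, this representation has the same width `dim` for any graph and any metric.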
Article 522
Title@2025-07-22 (2): A Goal-Oriented Reinforcement Learning-Based Path Planning Algorithm for Modular Self-Reconfigurable Satellites
Title: A Goal-Oriented Reinforcement Learning-Based Path Planning Algorithm for Modular Self-Reconfigurable Satellites | Ein zielorientierter Verstärkungs-Lernpfadplanungs-Algorithmus für modulare selbstkonfigurierbare Satelliten | 面向目标的加强学习学习的模块自可自配置卫星的路线图规划算法 2505.01966v2 |
Authors (4): Bofei Liu, Dong Ye, Zunhao Yao, Zhaowei Sun
Modular self-reconfigurable satellites refer to satellite clusters composed of individual modular units capable of altering their configurations. The configuration changes enable the execution of diverse tasks and mission objectives. Existing path planning algorithms for reconfiguration often suffer from high computational complexity, poor generalization capability, and limited support for diverse target configurations. To address these challenges, this paper proposes a goal-oriented reinforcement learning-based path planning algorithm. This algorithm is the first to handle multiple target configurations, a challenge that previous reinforcement learning methods failed to overcome. Moreover, techniques such as Hindsight Experience Replay and Invalid Action Masking are incorporated to overcome the significant obstacles posed by sparse rewards and invalid actions. Based on these designs, our model achieves success rates of 95% and 73% in reaching arbitrary target configurations in modular satellite clusters composed of four and six units, respectively.
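Invalid Action Masking, one of the techniques named above, is commonly implemented by excluding invalid actions before the softmax so that they receive exactly zero probability. The sketch below shows that generic technique; the function name and the example mask are hypothetical and not taken from the paper.

```python
import math


def masked_softmax(logits, valid):
    """Softmax over valid actions only; invalid actions get probability 0."""
    if len(logits) != len(valid):
        raise ValueError("logits and valid must have the same length")
    if not any(valid):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    top = max(l for l, ok in zip(logits, valid) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, valid)]
    total = sum(exps)
    return [e / total for e in exps]


# The second action has the largest logit but is invalid in this state
# (for instance, a module move that would break the cluster apart).
probs = masked_softmax([1.0, 5.0, 1.0], [True, False, True])
```

Because the masked action has probability zero, the policy can never sample it, and it contributes no gradient through the sampled action's log-probability.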
Article 523
Title@2025-07-22 (2): LLM Data Selection and Utilization via Dynamic Bi-level Optimization
Title: LLM Data Selection and Utilization via Dynamic Bi-level Optimization | LLM-Datenauswahl und -Verwendung über dynamische Bi-Level-Optimierung | 通过动态双级优化优化选择和利用LLM数据 2507.16178v1 |
Authors (7): Yang Yu, Kai Han, Hang Zhou, Yehui Tang, Kaiqi Huang, Yunhe Wang, Dacheng Tao
While large-scale training data is fundamental for developing capable large language models (LLMs), strategically selecting high-quality data has emerged as a critical approach to enhance training efficiency and reduce computational costs. Current data selection methodologies predominantly rely on static, training-agnostic criteria, failing to account for dynamic interactions between model training and data. In this paper, we propose a new Data Weighting Model (DWM) that adjusts the weight of selected data within each batch to achieve dynamic data utilization during LLM training. Specifically, to better capture the dynamic data preference of the trained model, a bi-level optimization framework is implemented to update the weighting model. Our experiments demonstrate that DWM enhances the performance of models trained with randomly-selected data, and the learned weighting model can be transferred to enhance other data selection methods and models of different sizes. Moreover, we analyze how a model’s data preferences evolve throughout training, providing new insights into the data preference of the model during training.
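Per-batch data weighting can be sketched as a weighted sum of per-sample losses. The softmax normalization and the name `weighted_batch_loss` are assumptions for illustration; the abstract does not describe DWM's architecture, and the bi-level update of the weighting model is not shown here.

```python
import math


def weighted_batch_loss(losses, scores):
    """Combine per-sample losses using softmax-normalized weights.

    Assumption: `scores` are the raw outputs of some weighting model for
    the samples in the batch. Equal scores recover the plain batch mean.
    """
    if len(losses) != len(scores) or not losses:
        raise ValueError("losses and scores must be non-empty and aligned")
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    loss = sum(w * l for w, l in zip(weights, losses))
    return loss, weights


uniform_loss, uniform_weights = weighted_batch_loss([1.0, 3.0], [0.0, 0.0])
skewed_loss, skewed_weights = weighted_batch_loss([1.0, 3.0], [0.0, 2.0])
```

In a bi-level setup, the inner step would update the language model on this weighted loss, and the outer step would update the weighting model according to how the updated language model performs on held-out data.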
Article 524
Title@2025-07-22 (2): Energy-Efficient and Real-Time Sensing for Federated Continual Learning via Sample-Driven Control
Title: Energy-Efficient and Real-Time Sensing for Federated Continual Learning via Sample-Driven Control | Energieeffizientes und Echtzeit-Sensing für ein Federated Continual Learning via Sample-Driven Control | 通过抽样分散控制为联邦持续学习提供节能实时遥感 2310.07497v2 |
Authors (7): Minh Ngoc Luu, Minh-Duong Nguyen, Ebrahim Bedeer, Van Duc Nguyen, Dinh Thai Hoang, Diep N. Nguyen, Quoc-Viet Pham
An intelligent Real-Time Sensing (RTS) system must continuously acquire, update, integrate, and apply knowledge to adapt to real-world dynamics. Managing distributed intelligence in this context requires Federated Continual Learning (FCL). However, effectively capturing the diverse characteristics of RTS data in FCL systems poses significant challenges, including severely impacting computational and communication resources, escalating energy costs, and ultimately degrading overall system performance. To overcome these challenges, we investigate how the data distribution shift from ideal to practical RTS scenarios affects Artificial Intelligence (AI) model performance by leveraging the generalization gap concept. In this way, we can analyze how sampling time in RTS correlates with the decline in AI performance, computation cost, and communication efficiency. Based on this observation, we develop a novel Sample-driven Control for Federated Continual Learning (SCFL) technique, specifically designed for mobile edge networks with RTS capabilities. In particular, SCFL is an optimization problem that harnesses the sampling process to concurrently minimize the generalization gap and improve overall accuracy while upholding the energy efficiency of the FCL framework. To solve the highly complex and time-varying optimization problem, we introduce a new soft actor-critic algorithm with explicit and implicit constraints (A2C-EI). Our empirical experiments reveal that we can achieve higher efficiency compared to other DRL baselines. Notably, SCFL can reduce energy consumption by up to 85% while maintaining FL convergence and timely data transmission.
Article 525
Title@2025-07-22 (2): Curating Demonstrations using Online Experience
Title: Curating Demonstrations using Online Experience | Kuratierende Demonstrationen mit Online Experience | 利用在线经验治理示范活动 2503.03707v2 |
Authors (4): Annie S. Chen, Alec M. Lessing, Yuejiang Liu, Chelsea Finn
Many robot demonstration datasets contain heterogeneous demonstrations of varying quality. This heterogeneity may benefit policy pre-training, but can hinder robot performance when used with a final imitation learning objective. In particular, some strategies in the data may be less reliable than others or may be underrepresented in the data, leading to poor performance when such strategies are sampled at test time. Moreover, such unreliable or underrepresented strategies can be difficult even for people to discern, and sifting through demonstration datasets is time-consuming and costly. On the other hand, policy performance when trained on such demonstrations can reflect the reliability of different strategies. We thus propose that robots self-curate their demonstrations based on online robot experience (Demo-SCORE). More specifically, we train and cross-validate a classifier to discern successful policy roll-outs from unsuccessful ones and use the classifier to filter heterogeneous demonstration datasets. Our experiments in simulation and the real world show that Demo-SCORE can effectively identify suboptimal demonstrations without manual curation. Notably, Demo-SCORE achieves a 15-35% higher absolute success rate in the resulting policy compared to the base policy trained with all original demonstrations.
Article 526
Title@2025-07-22 (2): A Collaborative Framework Integrating Large Language Model and Chemical Fragment Space: Mutual Inspiration for Lead Design
Title: A Collaborative Framework Integrating Large Language Model and Chemical Fragment Space: Mutual Inspiration for Lead Design | Ein kollaborativer Rahmen für die Integration von Large Language Model und Chemical Fragment Space: Gegenseitige Inspiration für Lead Design | 整合大语言模型和化学碎片空间:铅设计相互促进 2507.13580v2 |
Authors (6): Hao Tuo, Yan Li, Xuanning Hu, Haishi Zhao, Xueyan Liu, Bo Yang
Combinatorial optimization algorithms are essential in computer-aided drug design, where they progressively explore chemical space to design lead compounds with high affinity for a target protein. However, current methods face inherent challenges in integrating domain knowledge, limiting their performance in identifying lead compounds with novel and valid binding modes. Here, we propose AutoLeadDesign, a lead compound design framework that draws on the extensive domain knowledge encoded in large language models, together with chemical fragments, to progressively and efficiently explore vast chemical space. Comprehensive experiments indicate that AutoLeadDesign outperforms baseline methods. Significantly, empirical lead design campaigns targeting two clinically relevant targets (PRMT5 and SARS-CoV-2 PLpro) demonstrate AutoLeadDesign’s competence in de novo generation of lead compounds, achieving expert-competitive design efficacy. Structural analysis further confirms their mechanism-validated inhibitory patterns. By tracing the design process, we find that AutoLeadDesign shares analogous mechanisms with fragment-based drug design, which traditionally relies on expert decision-making, further revealing why it works. Overall, AutoLeadDesign offers an efficient approach for lead compound design, suggesting its potential utility in drug design.
Article 527
Title@2025-07-22 (2): R-Bot: An LLM-based Query Rewrite System
Title: R-Bot: An LLM-based Query Rewrite System | R-Bot: Ein LLM-basiertes Abfrage-Rewrite-System | R-Bot:一个基于LLM的查询重写系统 2412.01661v2 |
Authors (6): Zhaoyan Sun, Xuanhe Zhou, Guoliang Li, Xiang Yu, Jianhua Feng, Yong Zhang
Query rewrite is essential for optimizing SQL queries to improve their execution efficiency without changing their results. Traditionally, this task has been tackled through heuristic and learning-based methods, each with its limitations in terms of inferior quality and low robustness. Recent advancements in LLMs offer a new paradigm by leveraging their superior natural language and code comprehension abilities. Despite their potential, directly applying LLMs like GPT-4 has faced challenges due to problems such as hallucinations, where the model might generate inaccurate or irrelevant results. To address this, we propose R-Bot, an LLM-based query rewrite system with a systematic approach. We first design a multi-source rewrite evidence preparation pipeline to generate query rewrite evidences for guiding LLMs to avoid hallucinations. We then propose a hybrid structure-semantics retrieval method that combines structural and semantic analysis to retrieve the most relevant rewrite evidences for effectively answering an online query. We next propose a step-by-step LLM rewrite method that iteratively leverages the retrieved evidences to select and arrange rewrite rules with self-reflection. We conduct comprehensive experiments on real-world datasets and widely used benchmarks, and demonstrate the superior performance of our system, R-Bot, surpassing state-of-the-art query rewrite methods. The R-Bot system has been deployed at Huawei and with real customers, and the results show that the proposed R-Bot system achieves lower query latency.
Article 528
Title@2025-07-22 (2): Attacking interpretable NLP systems
Title: Attacking interpretable NLP systems | Angriff auf interpretierbare NLP-Systeme | 攻击可解释的NLP系统 2507.16164v1 |
Authors (4): Eldor Abdukhamidov, Tamer Abuhmed, Joanna C. S. Santos, Mohammed Abuhamad
Studies have shown that machine learning systems are vulnerable to adversarial examples in theory and practice. While previous attacks have focused mainly on visual models, exploiting the difference between human and machine perception, text-based models have also fallen victim to these attacks. However, these attacks often fail to maintain the semantic meaning and similarity of the text. This paper introduces AdvChar, a black-box attack on Interpretable Natural Language Processing Systems, designed to mislead the classifier while keeping the interpretation similar to that of benign inputs, thus exploiting trust in system transparency. AdvChar achieves this by making less noticeable modifications to the text input, forcing the deep learning classifier to make incorrect predictions while preserving the original interpretation. We use an interpretation-focused scoring approach to determine the most critical tokens that, when changed, can cause the classifier to misclassify the input. We apply simple character-level modifications to measure the importance of tokens, minimizing the difference between the original and new text while generating adversarial interpretations similar to benign ones. We thoroughly evaluated AdvChar by testing it against seven NLP models and three interpretation models using benchmark datasets for the classification task. Our experiments show that AdvChar can significantly reduce the prediction accuracy of current deep learning models by altering just two characters on average in input samples.
Article 529
Title@2025-07-22 (2): Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment
Title: Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment | Adapt On-the-Go: Verhaltensmodulierung für Single-Life-Roboter-Einsatz | 即时适应:单生机器人部署行为改变 2311.01059v3 |
Authors (7): Annie S. Chen, Govind Chada, Laura Smith, Archit Sharma, Zipeng Fu, Sergey Levine, Chelsea Finn
To succeed in the real world, robots must cope with situations that differ from those seen during training. We study the problem of adapting on-the-fly to such novel scenarios during deployment, by drawing upon a diverse repertoire of previously learned behaviors. Our approach, RObust Autonomous Modulation (ROAM), introduces a mechanism based on the perceived value of pre-trained behaviors to select and adapt pre-trained behaviors to the situation at hand. Crucially, this adaptation process happens entirely within a single episode at test time, without any human supervision. We demonstrate that ROAM enables a robot to adapt rapidly to changes in dynamics both in simulation and on a real Go1 quadruped, even successfully moving forward with roller skates on its feet. Our approach adapts more than 2x as efficiently as existing methods when facing a variety of out-of-distribution situations during deployment by effectively choosing and adapting relevant behaviors on-the-fly.
Article 530
Title@2025-07-22 (2): Learning Patient-Specific Spatial Biomarker Dynamics via Operator Learning for Alzheimer’s Disease Progression
Title: Learning Patient-Specific Spatial Biomarker Dynamics via Operator Learning for Alzheimer’s Disease Progression | Lernen patientenspezifische räumliche Biomarker-Dynamik über den Bediener Lernen für Alzheimer-Krankheitsfortschritt | 通过操作员学习阿尔茨海默氏病发展趋势的学习者学习特定病人空间生物标志动力学 2507.16148v1 |
Authors (4): Jindong Wang, Yutong Mao, Xiao Liu, Wenrui Hao
Alzheimer’s disease (AD) is a complex, multifactorial neurodegenerative disorder with substantial heterogeneity in progression and treatment response. Despite recent therapeutic advances, predictive models capable of accurately forecasting individualized disease trajectories remain limited. Here, we present a machine learning-based operator learning framework for personalized modeling of AD progression, integrating longitudinal multimodal imaging, biomarker, and clinical data. Unlike conventional models with prespecified dynamics, our approach directly learns patient-specific disease operators governing the spatiotemporal evolution of amyloid, tau, and neurodegeneration biomarkers. Using Laplacian eigenfunction bases, we construct geometry-aware neural operators capable of capturing complex brain dynamics. Embedded within a digital twin paradigm, the framework enables individualized predictions, simulation of therapeutic interventions, and in silico clinical trials. Applied to AD clinical data, our method achieves high prediction accuracy exceeding 90% across multiple biomarkers, substantially outperforming existing approaches. This work offers a scalable, interpretable platform for precision modeling and personalized therapeutic optimization in neurodegenerative diseases.
Article 531
Title@2025-07-22 (2): Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization
Title: Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization | Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks unter $μ$P Parametrization | 全球融合和丰富地物学习,以美元-无线-网络神经网络计值,低于美元-美元-美元 2503.09565v2 |
Authors (4): Zixiang Chen, Greg Yang, Qingyue Zhao, Quanquan Gu
Despite deep neural networks’ powerful representation learning capabilities, theoretical understanding of how networks can simultaneously achieve meaningful feature learning and global convergence remains elusive. Existing approaches like the neural tangent kernel (NTK) are limited because features stay close to their initialization in this parametrization, leaving open questions about feature properties during substantial evolution. In this paper, we investigate the training dynamics of infinitely wide, $L$-layer neural networks using the tensor program (TP) framework. Specifically, we show that, when trained with stochastic gradient descent (SGD) under the Maximal Update parametrization ($\mu$P) and mild conditions on the activation function, SGD enables these networks to learn linearly independent features that substantially deviate from their initial values. This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum. Our analysis leverages both the interactions among features across layers and the properties of Gaussian random variables, providing new insights into deep representation learning. We further validate our theoretical findings through experiments on real-world datasets.
Article 532
Title@2025-07-22 (2): Equivariant Goal Conditioned Contrastive Reinforcement Learning
Title: Equivariant Goal Conditioned Contrastive Reinforcement Learning | Gleichwertiges Ziel Conditioned Kontrastive Verstärkungslernen | 有条件的违反规定强化学习 2507.16139v1 |
Authors (4): Arsh Tangri, Nichols Crawford Taylor, Haojie Huang, Robert Platt
Contrastive Reinforcement Learning (CRL) provides a promising framework for extracting useful structured representations from unlabeled interactions. By pulling together state-action pairs and their corresponding future states, while pushing apart negative pairs, CRL enables learning nontrivial policies without manually designed rewards. In this work, we propose Equivariant CRL (ECRL), which further structures the latent space using equivariant constraints. By leveraging inherent symmetries in goal-conditioned manipulation tasks, our method improves both sample efficiency and spatial generalization. Specifically, we formally define Goal-Conditioned Group-Invariant MDPs to characterize rotation-symmetric robotic manipulation tasks, and build on this by introducing a novel rotation-invariant critic representation paired with a rotation-equivariant actor for Contrastive RL. Our approach consistently outperforms strong baselines across a range of simulated tasks in both state-based and image-based settings. Finally, we extend our method to the offline RL setting, demonstrating its effectiveness across multiple tasks.
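The "pull together, push apart" objective described above is typically an InfoNCE-style loss over a batch. The sketch below assumes an inner-product critic between a state-action embedding and a future-state embedding, with the other items in the batch serving as negatives; it does not include the equivariant constraints that ECRL adds, and the function name is hypothetical.

```python
import math


def info_nce_loss(sa_embeddings, future_embeddings):
    """Mean InfoNCE loss; row i of each list forms the positive pair.

    Assumption: the critic is a plain dot product between embeddings.
    """
    n = len(sa_embeddings)
    if n == 0 or n != len(future_embeddings):
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for i in range(n):
        logits = [
            sum(x * y for x, y in zip(sa_embeddings[i], future))
            for future in future_embeddings
        ]
        top = max(logits)
        log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
        # Cross-entropy with the matching future state as the target class.
        total += log_norm - logits[i]
    return total / n


pairs = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce_loss(pairs, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce_loss(pairs, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is lower when each state-action embedding aligns with its own future state than when it aligns with another trajectory's future state, which is what drives the representation toward reachability structure without a hand-designed reward.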
Article 533
Title@2025-07-22 (2): Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations
Title: Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations | Aitomia: Ihr intelligenter Assistent für KI-getriebene Atomistische und Quantum Chemical Simulationen | Aitomia:您对AI-Driven原子学和量子化学模拟的智能助理 2505.08195v3 |
Authors (6): Jinming Hu, Hassan Nawaz, Yuting Rui, Lijie Chi, Arif Ullah, Pavlo O. Dral
We have developed Aitomia - a platform powered by AI to assist in performing AI-driven atomistic and quantum chemical (QC) simulations. This evolving intelligent assistant platform is equipped with chatbots and AI agents to help experts and guide non-experts in setting up and running atomistic simulations, monitoring their computational status, analyzing simulation results, and summarizing them for the user in both textual and graphical forms. We achieve these goals by exploiting large language models that leverage the versatility of our MLatom ecosystem, supporting AI-enhanced computational chemistry tasks ranging from ground-state to excited-state calculations, including geometry optimizations, thermochemistry, and spectral calculations. The multi-agent implementation enables autonomous executions of the complex computational workflows, such as the computation of the reaction enthalpies. Aitomia is the first intelligent assistant publicly accessible online on a cloud computing platform for atomistic simulations of broad scope (Aitomistic Hub at https://aitomistic.xyz). It may also be deployed locally as described at http://mlatom.com/aitomia. Aitomia is expected to lower the barrier to performing atomistic simulations, thereby democratizing simulations and accelerating research and development in relevant fields.
Article 534
Title@2025-07-22 (2): L4Q: Parameter Efficient Quantization-Aware Fine-Tuning on Large Language Models
Title: L4Q: Parameter Efficient Quantization-Aware Fine-Tuning on Large Language Models | L4Q: Parameter Effiziente Quantisierungsware Feinsteuerung bei großen Sprachmodellen | L4Q:大语言模型参数有效量化-软件精美推荐 2402.04902v6 |
Authors (3): Hyesung Jeon, Yulhwa Kim, Jae-joon Kim
Due to the high memory and computational costs associated with large language models (LLMs), model compression techniques such as quantization, which reduces inference costs, and parameter-efficient fine-tuning (PEFT) methods like Low-Rank Adaptation (LoRA), which reduce training costs, have gained significant popularity. This trend has spurred active research into quantization-aware PEFT techniques, aimed at maintaining model accuracy while minimizing memory overhead during both inference and training. Previous quantization-aware PEFT methods typically apply post-training quantization (PTQ) to pre-trained LLMs, followed by PEFT to recover the lost accuracy. However, this approach is limited in how much of the accuracy loss it can recover. In this paper, we propose L4Q, a method that integrates Quantization-Aware Training (QAT) with LoRA. By employing a memory-optimized layer design, L4Q significantly reduces QAT’s memory overhead, making its training cost comparable to LoRA, while preserving the advantage of QAT in producing fully quantized LLMs with high accuracy. Our experiments demonstrate that this combined approach to quantization and fine-tuning achieves superior accuracy compared to decoupled fine-tuning schemes, particularly in 4-bit and 3-bit quantization, positioning L4Q as an efficient QAT solution. Using the LLaMA and Mistral models with instructional datasets, we showcase L4Q’s capabilities in language tasks and few-shot learning.
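One way to picture combining QAT with LoRA is to add the low-rank update to the base weight first and quantize the merged result, so that the final model is fully quantized. The sketch below is only a toy reading of that idea: the function names, the symmetric per-tensor scheme, and the merge-then-quantize order are assumptions, and it does not reproduce the memory-optimized layer design of L4Q.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def fake_quantize(weights, bits):
    """Symmetric uniform quantization: snap to an integer grid, then rescale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [
        [max(-qmax, min(qmax, round(w / scale))) * scale for w in row]
        for row in weights
    ]

def merged_quantized_weight(w0, lora_b, lora_a, bits=4):
    """Quantize W0 + B @ A as a single tensor (hypothetical merge order)."""
    delta = matmul(lora_b, lora_a)
    merged = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(w0, delta)]
    return fake_quantize(merged, bits)
```

In actual training, the rounding step would need a gradient surrogate such as a straight-through estimator; that part is omitted here.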
Article 535
Title@2025-07-21 (1): Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization
Title: Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization | Expertengeführte LLM-Gründung für die Batterieentdeckung: Von der KI-getriebenen Hypothese zur Synthese und Charakterisierung | 电池发现原因:从AI-Driven假说到合成和特性 2507.16110v1 |
Authors (6): Shengchao Liu, Hannan Xu, Yan Ai, Huanxin Li, Yoshua Bengio, Harry Guo
Large language models (LLMs) leverage chain-of-thought (CoT) techniques to tackle complex problems, representing a transformative breakthrough in artificial intelligence (AI). However, their reasoning capabilities have primarily been demonstrated in solving math and coding problems, leaving their potential for domain-specific applications, such as battery discovery, largely unexplored. Inspired by the idea that reasoning mirrors a form of guided search, we introduce ChatBattery, a novel agentic framework that integrates domain knowledge to steer LLMs toward more effective reasoning in materials design. Using ChatBattery, we successfully identify, synthesize, and characterize three novel lithium-ion battery cathode materials, which achieve practical capacity improvements of 28.8%, 25.2%, and 18.5%, respectively, over the widely used cathode material, LiNi0.8Mn0.1Co0.1O2 (NMC811). Beyond this discovery, ChatBattery paves a new path by showing a successful LLM-driven and reasoning-based platform for battery materials invention. This complete AI-driven cycle, from design to synthesis to characterization, demonstrates the transformative potential of AI-driven reasoning in revolutionizing materials discovery.
Article 536
Title@2025-07-21 (1): Recursive Equations For Imputation Of Missing Not At Random Data With Sparse Pattern Support
Title: Recursive Equations For Imputation Of Missing Not At Random Data With Sparse Pattern Support | Rekursive Gleichungen für die Imputation von fehlenden nicht zufälligen Daten mit Sparse Pattern Support | 支持简化模式支持的非随机数据失踪的计算结果的递归等量 2507.16107v1 |
Authors (4): Trung Phung, Kyle Reese, Ilya Shpitser, Rohit Bhattacharya
A common approach for handling missing values in data analysis pipelines is multiple imputation via software packages such as MICE (Van Buuren and Groothuis-Oudshoorn, 2011) and Amelia (Honaker et al., 2011). These packages typically assume the data are missing at random (MAR), and impose parametric or smoothing assumptions upon the imputing distributions in a way that allows imputation to proceed even if not all missingness patterns have support in the data. Such assumptions are unrealistic in practice, and induce model misspecification bias on any analysis performed after such imputation. In this paper, we provide a principled alternative. Specifically, we develop a new characterization for the full data law in graphical models of missing data. This characterization is constructive, is easily adapted for the calculation of imputation distributions for both MAR and MNAR (missing not at random) mechanisms, and is able to handle lack of support for certain patterns of missingness. We use this characterization to develop a new imputation algorithm – Multivariate Imputation via Supported Pattern Recursion (MISPR) – which uses Gibbs sampling, by analogy with the Multivariate Imputation with Chained Equations (MICE) algorithm, but which is consistent under both MAR and MNAR settings, and is able to handle missing data patterns with no support without imposing additional assumptions beyond those already imposed by the missing data model itself. In simulations, we show MISPR obtains comparable results to MICE when data are MAR, and superior, less biased results when data are MNAR. Our characterization and imputation algorithm based on it are a step towards making principled missing data methods more practical in applied settings, where the data are likely both MNAR and sufficiently high dimensional to yield missing data patterns with no support at available sample sizes.
Article 537
Title@2025-07-21 (1): Analysis of the 2024 BraTS Meningioma Radiotherapy Planning Automated Segmentation Challenge
Title: Analysis of the 2024 BraTS Meningioma Radiotherapy Planning Automated Segmentation Challenge | Analyse der Strahlentherapie 2024 BraTS Meningiom Planung Automatisierte Segmentierung Herausforderung | 分析2024年BRATS Meningioma辐射治疗规划自动化分割挑战 2405.18383v3 |
Authors (105): Dominic LaBella, Valeriia Abramova, Mehdi Astaraki, Andre Ferreira, Zhifan Jiang, Mason C. Cleveland, Ramandeep Kang, Uma M. Lal-Trehan Estrada, Cansu Yalcin, Rachika E. Hamadache, Clara Lisazo, Adrià Casamitjana, Joaquim Salvi, Arnau Oliver, Xavier Lladó, Iuliana Toma-Dasu, Tiago Jesus, Behrus Puladi, Jens Kleesiek, Victor Alves, Jan Egger, Daniel Capellán-Martín, Abhijeet Parida, Austin Tapp, Xinyang Liu, Maria J. Ledesma-Carbayo, Jay B. Patel, Thomas N. McNeal, Maya Viera, Owen McCall, Albert E. Kim, Elizabeth R. Gerstner, Christopher P. Bridge, Katherine Schumacher, Michael Mix, Kevin Leu, Shan McBurney-Lin, Pierre Nedelec, Javier Villanueva-Meyer, David R. Raleigh, Jonathan Shapey, Tom Vercauteren, Kazumi Chia, Marina Ivory, Theodore Barfoot, Omar Al-Salihi, Justin Leu, Lia M. Halasz, Yuri S. Velichko, Chunhao Wang, John P. Kirkpatrick, Scott R. Floyd, Zachary J. Reitman, Trey C. Mullikin, Eugene J. Vaios, Christina Huang, Ulas Bagci, Sean Sachdev, Jona A. Hattangadi-Gluth, Tyler M. Seibert, Nikdokht Farid, Connor Puett, Matthew W. Pease, Kevin Shiue, Syed Muhammad Anwar, Shahriar Faghani, Peter Taylor, Pranav Warman, Jake Albrecht, András Jakab, Mana Moassefi, Verena Chung, Rong Chai, Alejandro Aristizabal, Alexandros Karargyris, Hasan Kassem, Sarthak Pati, Micah Sheller, Nazanin Maleki, Rachit Saluja, Florian Kofler, Christopher G. Schwarz, Philipp Lohmann, Phillipp Vollmuth, Louis Gagnon, Maruf Adewole, Hongwei Bran Li, Anahita Fathi Kazerooni, Nourel Hoda Tahon, Udunna Anazodo, Ahmed W. Moawad, Bjoern Menze, Marius George Linguraru, Mariam Aboian, Benedikt Wiestler, Ujjwal Baid, Gian-Marco Conte, Andreas M. Rauschecker, Ayman Nada, Aly H. Abayazeed, Raymond Huang, Maria Correia de Verdier, Jeffrey D. Rudie, Spyridon Bakas, Evan Calabrese
The 2024 Brain Tumor Segmentation Meningioma Radiotherapy (BraTS-MEN-RT) challenge aimed to advance automated segmentation algorithms using the largest known multi-institutional dataset of 750 radiotherapy planning brain MRIs with expert-annotated target labels for patients with intact or postoperative meningioma who underwent either conventional external beam radiotherapy or stereotactic radiosurgery. Each case included a defaced 3D post-contrast T1-weighted radiotherapy planning MRI in its native acquisition space, accompanied by a single-label “target volume” representing the gross tumor volume (GTV) and any at-risk post-operative site. Target volume annotations adhered to established radiotherapy planning protocols, ensuring consistency across cases and institutions, and were approved by expert neuroradiologists and radiation oncologists. Six participating teams developed, containerized, and evaluated automated segmentation models using this comprehensive dataset. Team rankings were assessed using a modified lesion-wise Dice Similarity Coefficient (DSC) and 95% Hausdorff Distance (95HD). The best reported average lesion-wise DSC and 95HD were 0.815 and 26.92 mm, respectively. BraTS-MEN-RT is expected to significantly advance automated radiotherapy planning by enabling precise tumor segmentation and facilitating tailored treatment, ultimately improving patient outcomes. We describe the design and results from the BraTS-MEN-RT challenge.
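For orientation, the plain Dice Similarity Coefficient between a predicted and a reference mask can be sketched as follows. This is the standard whole-mask definition, not the modified lesion-wise variant used for ranking, and the voxel-index-set representation is an assumption made for brevity.

```python
def dice_coefficient(pred_voxels, truth_voxels):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for two sets of voxel indices."""
    pred, truth = set(pred_voxels), set(truth_voxels)
    if not pred and not truth:
        # Two empty masks agree perfectly by convention.
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))
```

A score of 1.0 means the masks coincide exactly, and 0.0 means they share no voxel.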
Article 538
Title@2025-07-21 (1): TorchAO: PyTorch-Native Training-to-Serving Model Optimization
Title: TorchAO: PyTorch-Native Training-to-Serving Model Optimization | TorchAO: PyTorch-Native Training-to-Serving Modelloptimierung | TorchAO:PyTorch原生的训练到服务模型优化 2507.16099v1 |
Authors (13): Andrew Or, Apurva Jain, Daniel Vega-Myhre, Jesse Cai, Charles David Hernandez, Zhenrui Zheng, Driss Guessous, Vasiliy Kuznetsov, Christian Puhrsch, Mark Saroufim, Supriya Rao, Thien Tran, Aleksandar Samardžić
We present TorchAO, a PyTorch-native model optimization framework leveraging quantization and sparsity to provide an end-to-end, training-to-serving workflow for AI models. TorchAO supports a variety of popular model optimization techniques, including FP8 quantized training, quantization-aware training (QAT), post-training quantization (PTQ), and 2:4 sparsity, and leverages a novel tensor subclass abstraction to represent a variety of widely-used, backend agnostic low precision data types, including INT4, INT8, FP8, MXFP4, MXFP6, and MXFP8. TorchAO integrates closely with the broader ecosystem at each step of the model optimization pipeline, from pre-training (TorchTitan) to fine-tuning (TorchTune, Axolotl) to serving (HuggingFace, vLLM, SGLang, ExecuTorch), connecting an otherwise fragmented space in a single, unified workflow. TorchAO has enabled recent launches of the quantized Llama 3.2 1B/3B and LlamaGuard3-8B models and is open-source at https://github.com/pytorch/ao/.
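The 2:4 sparsity pattern mentioned above keeps at most two nonzero values in every group of four consecutive weights. The sketch below illustrates the pattern in plain Python; it does not use TorchAO's API, and selecting survivors by magnitude is an assumption about how the mask is chosen.

```python
def prune_2_of_4(row):
    """Zero the two smallest-magnitude values in each consecutive group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    out = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out
```

Because the pattern is fixed per group, hardware that supports it can skip the zeroed positions without storing a general sparse index.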
Article 539
Title@2025-07-21 (1): DP-TLDM: Differentially Private Tabular Latent Diffusion Model
Title: DP-TLDM: Differentially Private Tabular Latent Diffusion Model | DP-TLDM: Differential Private Tabular Latent Diffusion Model | DP-TLDM:有区别的私人制表式冷流传播模型 2403.07842v2 |
Authors (5): Chaoyi Zhu, Jiayi Tang, Juan F. Pérez, Marten van Dijk, Lydia Y. Chen
Synthetic data from generative models is emerging as a privacy-preserving data sharing solution. Such a synthetic data set should resemble the original data without revealing identifiable private information. To date, prior work has focused on limited types of tabular synthesizers and a small number of privacy attacks, particularly on Generative Adversarial Networks, and has overlooked membership inference attacks and defense strategies such as differential privacy. Motivated by the conundrum of keeping high data quality and low privacy risk of synthetic data tables, we propose DP-TLDM, Differentially Private Tabular Latent Diffusion Model, which is composed of an autoencoder network to encode the tabular data and a latent diffusion model to synthesize the latent tables. Following the emerging f-DP framework, we apply DP-SGD to train the autoencoder in combination with batch clipping and use the separation value as the privacy metric to better capture the privacy gain from DP algorithms. Our empirical evaluation demonstrates that DP-TLDM is capable of achieving a meaningful theoretical privacy guarantee while also significantly enhancing the utility of synthetic data. Specifically, compared to other DP-protected tabular generative models, DP-TLDM improves the synthetic quality by an average of 35% in data resemblance, 15% in the utility for downstream tasks, and 50% in data discriminability, all while preserving a comparable level of privacy risk.
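The core of a DP-SGD update is to bound each gradient's norm and then add Gaussian noise scaled to that bound. The sketch below shows the textbook per-example variant; the abstract describes batch clipping, which differs in where the clipping is applied, and all names and parameters here are illustrative assumptions.

```python
import math
import random

def clip_gradient(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]

def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each gradient, sum, add Gaussian noise, and average."""
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[d] for g in clipped) for d in range(dim)]
    # Noise standard deviation is proportional to the clipping bound.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [v / len(clipped) for v in noisy]
```

A call such as `dp_sgd_gradient(grads, 1.0, 1.1, random.Random(0))` returns the noisy averaged gradient; the privacy accounting that turns the noise multiplier into a formal guarantee is outside this sketch.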
Article 540
Title@2025-07-21 (1): Reinforcement Learning in hyperbolic space for multi-step reasoning
Title: Reinforcement Learning in hyperbolic space for multi-step reasoning | Verstärkung Lernen im hyperbolischen Raum für mehrstufiges Denken | 用于多步推理的双曲空间强化学习 2507.16864v1 |
Authors (3): Tao Xu, Dung-Yang Lee, Momiao Xiong
Multi-step reasoning is a fundamental challenge in artificial intelligence, with applications ranging from mathematical problem-solving to decision-making in dynamic environments. Reinforcement Learning (RL) has shown promise in enabling agents to perform multi-step reasoning by optimizing long-term rewards. However, conventional RL methods struggle with complex reasoning tasks due to issues such as credit assignment, high-dimensional state representations, and stability concerns. Recent advancements in Transformer architectures and hyperbolic geometry have provided novel solutions to these challenges. This paper introduces a new framework that integrates hyperbolic Transformers into RL for multi-step reasoning. The proposed approach leverages hyperbolic embeddings to model hierarchical structures effectively. We present theoretical insights, algorithmic details, and experimental results on FrontierMath and nonlinear optimal control problems. Compared to RL with a vanilla Transformer, hyperbolic RL improves accuracy by 32% to 44% on the FrontierMath benchmark and by 43% to 45% on the nonlinear optimal control benchmark, while reducing computational time by 16% to 32% on the FrontierMath benchmark and by 16% to 17% on the nonlinear optimal control benchmark. Our work demonstrates the potential of hyperbolic Transformers in reinforcement learning, particularly for multi-step reasoning tasks that involve hierarchical structures.
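Hyperbolic embeddings suit hierarchies because distances grow rapidly toward the boundary of the space, leaving room for exponentially many leaves. As an illustration, the sketch below computes the distance in the Poincaré ball model; the abstract does not say which model of hyperbolic space the paper uses, so this choice is an assumption.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_norm = lambda x: sum(c * c for c in x)
    nu, nv = sq_norm(u), sq_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))
```

The same Euclidean step costs more hyperbolic distance the closer the points are to the boundary, which is what lets tree-like structures embed with low distortion.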
Article 541
Title@2025-07-21 (1): Audio Geolocation: A Natural Sounds Benchmark
Title: Audio Geolocation: A Natural Sounds Benchmark | Audio Geolocation: Ein natürlicher Klang Benchmark | 音频地理定位:自然声音基准 2505.18726v2 |
Authors (4): Mustafa Chasmai, Wuao Liu, Subhransu Maji, Grant Van Horn
Can we determine someone’s geographic location purely from the sounds they hear? Are acoustic signals enough to localize within a country, state, or even city? We tackle the challenge of global-scale audio geolocation, formalize the problem, and conduct an in-depth analysis with wildlife audio from the iNatSounds dataset. Adopting a vision-inspired approach, we convert audio recordings to spectrograms and benchmark existing image geolocation techniques. We hypothesize that species vocalizations offer strong geolocation cues due to their defined geographic ranges and propose an approach that integrates species range prediction with retrieval-based geolocation. We further evaluate whether geolocation improves when analyzing species-rich recordings or when aggregating across spatiotemporal neighborhoods. Finally, we introduce case studies from movies to explore multimodal geolocation using both audio and visual content. Our work highlights the advantages of integrating audio and visual cues, and sets the stage for future research in audio geolocation.
Article 542
Title@2025-07-21 (1): Feature Selection and Junta Testing are Statistically Equivalent
Title: Feature Selection and Junta Testing are Statistically Equivalent | Feature Selection und Junta-Tests sind statistisch gleichwertig | 特征选择和 Junta 测试为统计等值 2505.04604v2 |
Authors (3): Lorenzo Beretta, Nathaniel Harms, Caleb Koch
For a function $f \colon \{0,1\}^n \to \{0,1\}$, the junta testing problem asks whether $f$ depends on only $k$ variables. If $f$ depends on only $k$ variables, the feature selection problem asks to find those variables. We prove that these two tasks are statistically equivalent. Specifically, we show that the “brute-force” algorithm, which checks for any set of $k$ variables consistent with the sample, is simultaneously sample-optimal for both problems, and the optimal sample size is \[ \Theta\left(\frac{1}{\varepsilon} \left( \sqrt{2^k \log \binom{n}{k}} + \log \binom{n}{k}\right)\right). \]
Article 543
Title@2025-07-21 (1): Efficient Compositional Multi-tasking for On-device Large Language Models
Title: Efficient Compositional Multi-tasking for On-device Large Language Models | Effizientes kompositorisches Multi-Tasking für On-Device große Sprachmodelle | 内部设计大型语言模型的高效组成多任务 2507.16083v1 |
Authors (6): Ondrej Bohdal, Mete Ozay, Jijoong Moon, Kyeng-Hun Lee, Hyeonmok Ko, Umberto Michieli
Adapter parameters provide a mechanism to modify the behavior of machine learning models and have gained significant popularity in the context of large language models (LLMs) and generative AI. These parameters can be merged to support multiple tasks via a process known as task merging. However, prior work on merging in LLMs, particularly in natural language processing, has been limited to scenarios where each test example addresses only a single task. In this paper, we focus on on-device settings and study the problem of text-based compositional multi-tasking, where each test example involves the simultaneous execution of multiple tasks. For instance, generating a translated summary of a long text requires solving both translation and summarization tasks concurrently. To facilitate research in this setting, we propose a benchmark comprising four practically relevant compositional tasks. We also present an efficient method (Learnable Calibration) tailored for on-device applications, where computational resources are limited, emphasizing the need for solutions that are both resource-efficient and high-performing. Our contributions lay the groundwork for advancing the capabilities of LLMs in real-world multi-tasking scenarios, expanding their applicability to complex, resource-constrained use cases.
Article 544
Title@2025-07-21 (1): Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests
Title: Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests | Randomisierung kann sowohl Bias als auch Varianz reduzieren: Eine Fallstudie in Random Forests | 随机性可减少偏见和差异:随机森林案例研究 2402.12668v4 |
Authors (2): Brian Liu, Rahul Mazumder
We study the often overlooked phenomenon, first noted by Breiman (2001), that random forests appear to reduce bias compared to bagging. Motivated by an interesting paper by Mentch and Zhou (2020), where the authors explain the success of random forests in low signal-to-noise ratio (SNR) settings through regularization, we explore how random forests can capture patterns in the data that bagging ensembles fail to capture. We empirically demonstrate that in the presence of such patterns, random forests reduce bias along with variance and can increasingly outperform bagging ensembles when SNR is high. Our observations offer insights into the real-world success of random forests across a range of SNRs and enhance our understanding of the difference between random forests and bagging ensembles. Our investigations also yield practical insights into the importance of tuning mtry in random forests.
Article 545
Title@2025-07-21 (1): A Lower Bound for the Number of Linear Regions of Ternary ReLU Regression Neural Networks
Title: A Lower Bound for the Number of Linear Regions of Ternary ReLU Regression Neural Networks | Eine niedrigere Grenze für die Anzahl der linearen Regionen der Ternary ReLU Regressions-Neural-Netzwerke | Ternary ReLU后退神经网络线性区域数目的下界宽度 2507.16079v1 |
Authors (3): Yuta Nakahara, Manabu Kobayashi, Toshiyasu Matsushima
With the advancement of deep learning, reducing computational complexity and memory consumption has become a critical challenge, and ternary neural networks (NNs) that restrict parameters to $\{-1, 0, +1\}$ have attracted attention as a promising approach. While ternary NNs demonstrate excellent performance in practical applications such as image recognition and natural language processing, their theoretical understanding remains insufficient. In this paper, we theoretically analyze the expressivity of ternary NNs from the perspective of the number of linear regions. Specifically, we evaluate the number of linear regions of ternary regression NNs with Rectified Linear Unit (ReLU) for activation functions and prove that the number of linear regions increases polynomially with respect to network width and exponentially with respect to depth, similar to standard NNs. Moreover, we show that it suffices to either square the width or double the depth of ternary NNs to achieve a lower bound on the maximum number of linear regions comparable to that of general ReLU regression NNs. This provides a theoretical explanation, in some sense, for the practical success of ternary NNs.
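To make the notion of linear regions concrete, consider a toy network with one scalar input and one hidden ReLU layer: each hidden unit with a nonzero weight contributes one breakpoint, and the output is linear between breakpoints. The sketch below ternarizes the hidden weights and lists the breakpoints. The threshold rule, the real-valued biases, and the function names are assumptions and are not taken from the paper.

```python
def ternarize(weights, threshold):
    """Map each weight to -1, 0, or +1 using a magnitude threshold."""
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]

def breakpoints_1d(hidden_weights, hidden_biases):
    """Distinct inputs x at which some unit relu(w * x + b) switches on or off."""
    return sorted({-b / w for w, b in zip(hidden_weights, hidden_biases) if w != 0})
```

In this toy case the number of linear pieces is at most the number of distinct breakpoints plus one, since kinks can cancel in the output layer. Units whose weight is ternarized to 0 contribute no breakpoint, which gives some intuition for why a ternary network may need extra width or depth to match a real-valued one.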
Article 546
Title@2025-07-21 (1): AI-driven Orchestration at Scale: Estimating Service Metrics on National-Wide Testbeds
Title: AI-driven Orchestration at Scale: Estimating Service Metrics on National-Wide Testbeds | KI-getriebene Orchestrierung im Maßstab: Bewertung von Service-Metriken auf national-breiten Testbeds | AI驱动的大规模编排:在全国范围测试平台上估计服务指标 2507.16077v1 |
Authors (5): Rodrigo Moreira, Rafael Pasquini, Joberto S. B. Martins, Tereza C. Carvalho, Flávio de Oliveira Silva
Network Slicing (NS) realization requires AI-native orchestration architectures to efficiently and intelligently handle heterogeneous user requirements. To achieve this, network slicing is evolving towards a more user-centric digital transformation, focusing on architectures that incorporate native intelligence to enable self-managed connectivity in an integrated and isolated manner. However, these initiatives face the challenge of validating their results in production environments, particularly those utilizing ML-enabled orchestration, as they are often tested in local networks or laboratory simulations. This paper proposes a large-scale validation method using a network slicing prediction model to forecast latency using Deep Neural Networks (DNNs) and basic ML algorithms embedded within an NS architecture, evaluated in real large-scale production testbeds. It measures and compares the performance of different DNNs and ML algorithms, considering a distributed database application deployed as a network slice over two large-scale production testbeds. The investigation highlights how AI-based prediction models can enhance network slicing orchestration architectures and presents a seamless, production-ready validation method as an alternative to fully controlled simulations or laboratory setups.
Article 547
Title@2025-07-21 (1): Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder
Title: Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder | Erforschen, wie Generative MLLMs mehr als CLIP mit dem gleichen Vision Encoder wahrnehmen | 使用相同的愿景编码器探索如何产生比 CLIP 更远的多见性大型LLMs 2411.05195v3 |
Authors (3): Siting Li, Pang Wei Koh, Simon Shaolei Du
Recent research has shown that CLIP models struggle with visual reasoning tasks that require grounding compositionality, understanding spatial relationships, or capturing fine-grained details. One natural hypothesis is that the CLIP vision encoder does not embed essential information for these tasks. However, we find that this is not always the case: The encoder gathers query-relevant visual information, while CLIP fails to extract it. In particular, we show that another branch of Vision-Language Models (VLMs), Generative Multimodal Large Language Models (MLLMs), achieve significantly higher accuracy than CLIP in many of these tasks using the same vision encoder and weights, indicating that these Generative MLLMs perceive more – as they extract and utilize visual information more effectively. We conduct a series of controlled experiments and reveal that their success is attributed to multiple key design choices, including patch tokens, position embeddings, and prompt-based weighting. On the other hand, enhancing the training data alone or applying a stronger text encoder does not suffice to solve the task, and additional text tokens offer little benefit. Interestingly, we find that fine-grained visual reasoning is not exclusive to generative models trained by an autoregressive loss: When converted into CLIP-like encoders by contrastive finetuning, these MLLMs still outperform CLIP under the same cosine similarity-based evaluation protocol. Our study highlights the importance of VLM architectural choices and suggests directions for improving the performance of CLIP-like contrastive VLMs.
Article 548
Title@2025-07-21 (1): Antibiotic Resistance Microbiology Dataset (ARMD): A Resource for Antimicrobial Resistance from EHRs
Title: Antibiotic Resistance Microbiology Dataset (ARMD): A Resource for Antimicrobial Resistance from EHRs | Antibiotikaresistenz Mikrobiologie Datensatz (ARMD): Eine Ressource für antimikrobielle Resistenz von EHRs | 抗生素抗药性微生物生物学数据集(ARMD):EHR的抗微生物抗药性资源 2503.07664v2 |
Authors (16): Fateme Nateghi Haredasht, Fatemeh Amrollahi, Manoj Maddali, Nicholas Marshall, Stephen P. Ma, Lauren N. Cooper, Andrew O. Johnson, Ziming Wei, Richard J. Medford, Sanjat Kanjilal, Niaz Banaei, Stanley Deresinski, Mary K. Goldstein, Steven M. Asch, Amy Chang, Jonathan H. Chen
The Antibiotic Resistance Microbiology Dataset (ARMD) is a de-identified resource derived from electronic health records (EHR) that facilitates research in antimicrobial resistance (AMR). ARMD encompasses data from adult patients collected over more than 15 years at two academic-affiliated hospitals, focusing on microbiological cultures, antibiotic susceptibilities, and associated clinical and demographic features. Key attributes include organism identification, susceptibility patterns for 55 antibiotics, implied susceptibility rules, and de-identified patient information. This dataset supports studies on antimicrobial stewardship, causal inference, and clinical decision-making. ARMD is designed to be reusable and interoperable, promoting collaboration and innovation in combating AMR. This paper describes the dataset’s acquisition, structure, and utility while detailing its de-identification process.
Article 549
Title@2025-07-21 (1): Manifold Learning with Normalizing Flows: Towards Regularity, Expressivity and Iso-Riemannian Geometry
Title: Manifold Learning with Normalizing Flows: Towards Regularity, Expressivity and Iso-Riemannian Geometry | Manifold Learning mit normalisierenden Strömungen: Auf dem Weg zu Regelmäßigkeit, Expressivität und iso-Riemannsche Geometrie | 以正常流动方式进行多重学习:走向规律、直观和Iso-Riemannian 几何 2505.08087v2 |
Authors (2): Willem Diepeveen, Deanna Needell
Modern machine learning increasingly leverages the insight that high-dimensional data often lie near low-dimensional, non-linear manifolds, an idea known as the manifold hypothesis. By explicitly modeling the geometric structure of data through learning Riemannian geometry, algorithms can achieve improved performance and interpretability in tasks like clustering, dimensionality reduction, and interpolation. In particular, learned pullback geometry has recently undergone transformative developments that now make it scalable to learn and scalable to evaluate, which further opens the door for principled non-linear data analysis and interpretable machine learning. However, there are still steps to be taken when considering real-world multi-modal data. This work focuses on addressing distortions and modeling errors that can arise in the multi-modal setting and proposes to alleviate both challenges through isometrizing the learned Riemannian structure and balancing regularity and expressivity of the diffeomorphism parametrization. We showcase the effectiveness of the synergy of the proposed approaches in several numerical experiments with both synthetic and real data.
Article 550
Title@2025-07-21 (1): Interpreting CFD Surrogates through Sparse Autoencoders
Title: Interpreting CFD Surrogates through Sparse Autoencoders | Verdolmetschen von CFD Surrogats durch Sparse Autoencoder | 通过Sparse Autoencolders解释 CFD 代理代理 2507.16069v1 |
Authors (2): Yeping Hu, Shusen Liu
Learning-based surrogate models have become a practical alternative to high-fidelity CFD solvers, but their latent representations remain opaque and hinder adoption in safety-critical or regulation-bound settings. This work introduces a posthoc interpretability framework for graph-based surrogate models used in computational fluid dynamics (CFD) by leveraging sparse autoencoders (SAEs). By obtaining an overcomplete basis in the node embedding space of a pretrained surrogate, the method extracts a dictionary of interpretable latent features. The approach enables the identification of monosemantic concepts aligned with physical phenomena such as vorticity or flow structures, offering a model-agnostic pathway to enhance explainability and trustworthiness in CFD applications.
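A sparse autoencoder of the kind described above encodes an embedding into a wider, mostly-zero code and decodes it back, trading reconstruction error against an L1 sparsity penalty. The sketch below shows one forward pass; the ReLU encoder, the bias-free linear decoder, and all names are assumptions about a typical SAE, not the authors' architecture.

```python
def relu(vec):
    return [max(0.0, x) for x in vec]

def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

def sae_forward(x, w_enc, b_enc, w_dec, l1_coeff):
    """Return (code, reconstruction, loss) for a single embedding x."""
    # Overcomplete code: w_enc has more rows than x has entries.
    code = relu([h + b for h, b in zip(matvec(w_enc, x), b_enc)])
    x_hat = matvec(w_dec, code)
    reconstruction_error = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity_penalty = l1_coeff * sum(abs(c) for c in code)
    return code, x_hat, reconstruction_error + sparsity_penalty
```

After training, each row of `w_enc` (equivalently, each column of `w_dec`) is a candidate dictionary feature that can be inspected for alignment with physical quantities such as vorticity.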
Article 551
Title@2025-07-21 (1): Erasing Conceptual Knowledge from Language Models
Title: Erasing Conceptual Knowledge from Language Models | Auslöschen von konzeptionellen Kenntnissen aus Sprachmodellen | 将概念知识从语言模式中除去 2410.02760v3 |
Authors (4): Rohit Gandikota, Sheridan Feucht, Samuel Marks, David Bau
In this work, we introduce Erasure of Language Memory (ELM), a principled approach to concept-level unlearning that operates by matching distributions defined by the model’s own introspective classification capabilities. Our key insight is that effective unlearning should leverage the model’s ability to evaluate its own knowledge, using the language model itself as a classifier to identify and reduce the likelihood of generating content related to undesired concepts. ELM applies this framework to create targeted low-rank updates that reduce generation probabilities for concept-specific content while preserving the model’s broader capabilities. We demonstrate ELM’s efficacy on biosecurity, cybersecurity, and literary domain erasure tasks. Comparative evaluation reveals that ELM-modified models achieve near-random performance on assessments targeting erased concepts, while simultaneously preserving generation coherence, maintaining benchmark performance on unrelated tasks, and exhibiting strong robustness to adversarial attacks. Our code, data, and trained models are available at https://elm.baulab.info
Article 552
Title@2025-07-21 (1): Is memory all you need? Data-driven Mori-Zwanzig modeling of Lagrangian particle dynamics in turbulent flows
Title: Is memory all you need? Data-driven Mori-Zwanzig modeling of Lagrangian particle dynamics in turbulent flows | Ist Gedächtnis alles, was Sie brauchen? Datengesteuerte Mori-Zwanzig Modellierung der lagrangischen Teilchendynamik in turbulenten Strömungen | 数据驱动的Mori- Zwanzig 模拟在动荡中流动的拉格朗江粒子动态。 2507.16058v1 |
Authors (6): Xander de Wit, Alessandro Gabbana, Michael Woodward, Yen Ting Lin, Federico Toschi, Daniel Livescu
The dynamics of Lagrangian particles in turbulence play a crucial role in mixing, transport, and dispersion processes in complex flows. Their trajectories exhibit highly non-trivial statistical behavior, motivating the development of surrogate models that can reproduce these trajectories without incurring the high computational cost of direct numerical simulations of the full Eulerian field. This task is particularly challenging because reduced-order models typically lack access to the full set of interactions with the underlying turbulent field. Novel data-driven machine learning techniques can be very powerful in capturing and reproducing complex statistics of the reduced-order/surrogate dynamics. In this work, we show how one can learn a surrogate dynamical system that is able to evolve a turbulent Lagrangian trajectory in a way that is point-wise accurate for short-time predictions (with respect to Kolmogorov time) and stable and statistically accurate at long times. This approach is based on the Mori–Zwanzig formalism, which prescribes a mathematical decomposition of the full dynamical system into resolved dynamics that depend on the current state and the past history of a reduced set of observables and the unresolved orthogonal dynamics due to unresolved degrees of freedom of the initial state. We show that by training this reduced-order model with a point-wise error metric on short-time predictions, we correctly learn the dynamics of Lagrangian turbulence, so that the long-time statistical behavior is also stably recovered at test time. This opens up a range of new applications, for example, for the control of active Lagrangian agents in turbulence.
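The decomposition described above is usually written as a generalized Langevin equation. The form below is the generic textbook statement; the symbols $g$, $M$, $K$, and $F$ are notation introduced here for illustration and may differ from the paper's.

```latex
\[
  \frac{\mathrm{d}}{\mathrm{d}t}\, g(t)
  \;=\;
  \underbrace{M\bigl(g(t)\bigr)}_{\text{Markov term (current state)}}
  \;+\;
  \underbrace{\int_{0}^{t} K\bigl(g(t-s),\, s\bigr)\,\mathrm{d}s}_{\text{memory term (past history)}}
  \;+\;
  \underbrace{F(t)}_{\text{orthogonal dynamics}}
\]
```

Here $g(t)$ denotes the reduced set of observables along a trajectory. A data-driven closure learns $M$ and $K$ from trajectories and either neglects or models the unresolved term $F$.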
Article 553
Title@2025-07-21 (1): Radiological and Biological Dictionary of Radiomics Features: Addressing Understandable AI Issues in Personalized Breast Cancer; Dictionary Version BM1.0
Title: Radiological and Biological Dictionary of Radiomics Features: Addressing Understandable AI Issues in Personalized Breast Cancer; Dictionary Version BM1.0 | Radiologisches und Biologisches Wörterbuch der Radiomik Features: Adressierung verständlicher KI-Probleme in Personalisierte Brustkrebs; Wörterbuch Version BM1.0 | 放射特征的辐射和生物词典:解决个人化乳腺癌中可理解的AI问题;字典版BM1.0。 2507.16041v1 |
Authors (7): Arman Gorji, Nima Sanati, Amir Hossein Pouria, Somayeh Sadat Mehrnia, Ilker Hacihaliloglu, Arman Rahmim, Mohammad R. Salmanpour
Radiomics-based AI models show promise for breast cancer diagnosis but often lack interpretability, limiting clinical adoption. This study addresses the gap between radiomic features (RF) and the standardized BI-RADS lexicon by proposing a dual-dictionary framework. First, a Clinically-Informed Feature Interpretation Dictionary (CIFID) was created by mapping 56 RFs to BI-RADS descriptors (shape, margin, internal enhancement) through literature and expert review. The framework was applied to classify triple-negative breast cancer (TNBC) versus non-TNBC using dynamic contrast-enhanced MRI from a multi-institutional cohort of 1,549 patients. We trained 27 machine learning classifiers with 27 feature selection methods. SHapley Additive exPlanations (SHAP) were used to interpret predictions and generate a complementary Data-Driven Feature Interpretation Dictionary (DDFID) for 52 additional RFs. The best model, combining Variance Inflation Factor (VIF) selection with Extra Trees Classifier, achieved an average cross-validation accuracy of 0.83. Key predictive RFs aligned with clinical knowledge: higher Sphericity (round/oval shape) and lower Busyness (more homogeneous enhancement) were associated with TNBC. The framework confirmed known imaging biomarkers and uncovered novel, interpretable associations. This dual-dictionary approach (BM1.0) enhances AI model transparency and supports the integration of RFs into routine breast cancer diagnosis and personalized care.
Article 554
Title@2025-07-21 (1): Reactivation: Empirical NTK Dynamics Under Task Shifts
Title: Reactivation: Empirical NTK Dynamics Under Task Shifts | Reaktivierung: Empirische NTK-Dynamik unter Aufgabenverschiebungen | 重新激活: 任务变换下的NTK实证动态 2507.16039v1 |
Authors (5): Yuzhi Liu, Zixuan Chen, Zirui Zhang, Yufei Liu, Giulia Lanzillotta
The Neural Tangent Kernel (NTK) offers a powerful tool to study the functional dynamics of neural networks. In the so-called lazy, or kernel, regime, the NTK remains static during training and the network function is linear in the static neural tangent feature space. The evolution of the NTK during training is necessary for feature learning, a key driver of deep learning success. The study of NTK dynamics has led to several critical discoveries in recent years concerning generalization and scaling behaviours. However, this body of work has been limited to the single-task setting, where the data distribution is assumed constant over time. In this work, we present a comprehensive empirical analysis of NTK dynamics in continual learning, where the data distribution shifts over time. Our findings highlight continual learning as a rich and underutilized testbed for probing the dynamics of neural training. At the same time, they challenge the validity of static-kernel approximations in theoretical treatments of continual learning, even at large scale.
Article 555
Title@2025-07-21 (1): Autocomp: LLM-Driven Code Optimization for Tensor Accelerators
Title: Autocomp: LLM-Driven Code Optimization for Tensor Accelerators | Autocomp: LLM-gesteuerte Code-Optimierung für Tensor-Beschleuniger | 自动comp: LLM- Driven 代码对 Tensor 加速器的优化 2505.18574v3 |
Authors (4): Charles Hong, Sahil Bhatia, Alvin Cheung, Yakun Sophia Shao
Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today’s computing landscape. However, even with significant efforts in building compilers, programming these tensor accelerators remains challenging, leaving much of their potential underutilized. Recently, large language models (LLMs), trained on large amounts of code, have shown significant promise in code generation and optimization tasks, but generating code in low-resource languages, such as specialized tensor accelerator code, still poses a significant challenge. We tackle this challenge with Autocomp, an approach that empowers accelerator programmers to leverage domain knowledge and hardware feedback to optimize code via an automated LLM-driven search. We accomplish this by: 1) formulating each optimization pass as a structured two-phase prompt, divided into planning and code generation phases, 2) inserting domain knowledge during planning via a concise and adaptable optimization menu, and 3) integrating correctness and performance metrics from hardware as feedback at each search iteration. Across three categories of representative workloads and two different accelerators, we demonstrate that Autocomp-optimized code runs 5.6x (GEMM) and 2.7x (convolution) faster than the vendor-provided library, and outperforms expert-level hand-tuned code by 1.4x (GEMM), 1.1x (convolution), and 1.3x (fine-grained linear algebra). Additionally, we demonstrate that optimization schedules generated from Autocomp can be reused across similar tensor operations, improving speedups by up to 24% under a fixed sample budget.
Article 556
Title@2025-07-21 (1): Beyond the ATE: Interpretable Modelling of Treatment Effects over Dose and Time
Title: Beyond the ATE: Interpretable Modelling of Treatment Effects over Dose and Time | Jenseits der ATE: Interpretierbare Modellierung von Behandlungseffekten über Dosis und Zeit | 超越ATE:可解释的剂量和时间处理效果模型 2507.07271v2 |
Authors (4): Julianna Piskorz, Krzysztof Kacprzyk, Harry Amad, Mihaela van der Schaar
The Average Treatment Effect (ATE) is a foundational metric in causal inference, widely used to assess intervention efficacy in randomized controlled trials (RCTs). However, in many applications – particularly in healthcare – this static summary fails to capture the nuanced dynamics of treatment effects that vary with both dose and time. We propose a framework for modelling treatment effect trajectories as smooth surfaces over dose and time, enabling the extraction of clinically actionable insights such as onset time, peak effect, and duration of benefit. To ensure interpretability, robustness, and verifiability – key requirements in high-stakes domains – we adapt SemanticODE, a recent framework for interpretable trajectory modelling, to the causal setting where treatment effects are never directly observed. Our approach decouples the estimation of trajectory shape from the specification of clinically relevant properties (e.g., maxima, inflection points), supporting domain-informed priors, post-hoc editing, and transparent analysis. We show that our method yields accurate, interpretable, and editable models of treatment dynamics, facilitating both rigorous causal analysis and practical decision-making.
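To make the summaries named above concrete (onset time, peak effect, duration of benefit), here is a minimal sketch that reads them off a sampled effect trajectory at a fixed dose. It is not SemanticODE and not the authors' estimator: the toy curve `dose * t * exp(-t)`, the function name `summarize_effect`, and the benefit threshold are all assumptions chosen for illustration.

```python
import math

def summarize_effect(times, effects, threshold):
    """Summarize a sampled treatment-effect trajectory (hypothetical helper).

    onset    : first sampled time at which the effect reaches the threshold
    peak     : time and value of the largest sampled effect
    duration : span between the first and last sampled times above the
               threshold (assumes one contiguous interval of benefit)
    """
    above = [t for t, e in zip(times, effects) if e >= threshold]
    peak_index = max(range(len(effects)), key=effects.__getitem__)
    return {
        "onset": above[0] if above else None,
        "peak_time": times[peak_index],
        "peak_effect": effects[peak_index],
        "duration": (above[-1] - above[0]) if above else 0.0,
    }

# Toy trajectory at one dose level; the functional form is an assumption.
dose = 2.0
times = [i * 0.1 for i in range(101)]               # t = 0.0, 0.1, ..., 10.0
effects = [dose * t * math.exp(-t) for t in times]  # rises, peaks near t = 1, decays

summary = summarize_effect(times, effects, threshold=0.2 * dose)
print(summary)
```

Repeating the call over a grid of doses would give one summary per dose, which is one simple way to inspect how these properties vary over the dose axis of the surface.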
Article 557
Title@2025-07-21 (1): Neural Probabilistic Shaping: Joint Distribution Learning for Optical Fiber Communications
Title: Neural Probabilistic Shaping: Joint Distribution Learning for Optical Fiber Communications | Neurale probabilistische Formgebung: Gemeinsames Vertriebslernen für die optische Faserkommunikation | 神经概率形状:光纤通信联合分发学习 2507.16012v1 |
Authors (3): Mohammad Taha Askari, Lutz Lampe, Amirhossein Ghazisaeidi
We present an autoregressive end-to-end learning approach for probabilistic shaping on nonlinear fiber channels. Our proposed scheme learns the joint symbol distribution and provides a 0.3-bits/2D achievable information rate gain over an optimized marginal distribution for dual-polarized 64-QAM transmission over a single-span 205 km link.
Article 558
Title@2025-07-21 (1): Enhancing Stability of Physics-Informed Neural Network Training Through Saddle-Point Reformulation
Title: Enhancing Stability of Physics-Informed Neural Network Training Through Saddle-Point Reformulation | Verbesserung der Stabilität der physikinformierten neuralen Netzwerkschulung durch Sättel-Punkt-Reformulation | 通过散装式点式调整加强物理内成形神经网络培训的稳定 2507.16008v1 |
Authors (4): Dmitry Bylinkin, Mikhail Aleksandrov, Savelii Chezhegov, Aleksandr Beznosikov
Physics-informed neural networks (PINNs) have gained prominence in recent years and are now effectively used in a number of applications. However, their performance remains unstable due to the complex landscape of the loss function. To address this issue, we reformulate PINN training as a nonconvex-strongly concave saddle-point problem. After establishing the theoretical foundation for this approach, we conduct an extensive experimental study, evaluating its effectiveness across various tasks and architectures. Our results demonstrate that the proposed method outperforms the current state-of-the-art techniques.
Article 559
Title@2025-07-21 (1): Risks of AI Scientists: Prioritizing Safeguarding Over Autonomy
Title: Risks of AI Scientists: Prioritizing Safeguarding Over Autonomy | Risiken von KI-Wissenschaftlern: Priorisierender Schutz vor Autonomie | AI 科学家的风险:将保障自治作为优先事项 2402.04247v5 |
Authors (13): Xiangru Tang, Qiao Jin, Kunlun Zhu, Tongxin Yuan, Yichi Zhang, Wangchunshu Zhou, Meng Qu, Yilun Zhao, Jian Tang, Zhuosheng Zhang, Arman Cohan, Zhiyong Lu, Mark Gerstein
AI scientists powered by large language models have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines. While their capabilities are promising, these agents also introduce novel vulnerabilities that require careful consideration for safety. However, there has been limited comprehensive exploration of these vulnerabilities. This perspective examines vulnerabilities in AI scientists, shedding light on potential risks associated with their misuse, and emphasizing the need for safety measures. We begin by providing an overview of the potential risks inherent to AI scientists, taking into account user intent, the specific scientific domain, and their potential impact on the external environment. Then, we explore the underlying causes of these vulnerabilities and provide a scoping review of the limited existing works. Based on our analysis, we propose a triadic framework involving human regulation, agent alignment, and an understanding of environmental feedback (agent regulation) to mitigate these identified risks. Furthermore, we highlight the limitations and challenges associated with safeguarding AI scientists and advocate for the development of improved models, robust benchmarks, and comprehensive regulations.
Article 560
Title@2025-07-21 (1): AutoMAT: A Hierarchical Framework for Autonomous Alloy Discovery
Title: AutoMAT: A Hierarchical Framework for Autonomous Alloy Discovery | AutoMAT: Hierarchischer Rahmen für die autonome Legierungsentdeckung | AutomAT: 自主合金发现等级框架 2507.16005v1 |
Authors (10): Penghui Yang, Chendong Zhao, Bijun Tang, Zhonghan Zhang, Xinrun Wang, Yanchen Deng, Yuhao Lu, Cuntai Guan, Zheng Liu, Bo An
Alloy discovery is central to advancing modern industry but remains hindered by the vastness of the compositional design space and by costly validation. Here, we present AutoMAT, a hierarchical and autonomous framework grounded in and validated by experiments, which integrates large language models, automated CALPHAD-based simulations, and AI-driven search to accelerate alloy design. Spanning the entire pipeline from ideation to validation, AutoMAT achieves high efficiency, accuracy, and interpretability without the need for manually curated large datasets. In a case study targeting a lightweight, high-strength alloy, AutoMAT identifies a titanium alloy with 8.1% lower density and comparable yield strength relative to the state-of-the-art reference, achieving the highest specific strength among all comparisons. In a second case targeting high-yield-strength high-entropy alloys, AutoMAT achieves a 28.2% improvement in yield strength over the base alloy. In both cases, AutoMAT reduces the discovery timeline from years to weeks, illustrating its potential as a scalable and versatile platform for next-generation alloy design.
Article 561
Title@2025-07-21 (1): Minor Embedding for Quantum Annealing with Reinforcement Learning
Title: Minor Embedding for Quantum Annealing with Reinforcement Learning | Geringfügige Einbettung für Quantum Annealing mit Verstärkungslernen | 以强化学习为量子安纳林进行小嵌入 2507.16004v1 |
Authors (3): Riccardo Nembrini, Maurizio Ferrari Dacrema, Paolo Cremonesi
Quantum Annealing (QA) is a quantum computing paradigm for solving combinatorial optimization problems formulated as Quadratic Unconstrained Binary Optimization (QUBO) problems. An essential step in QA is minor embedding, which maps the problem graph onto the sparse topology of the quantum processor. This process is computationally expensive and scales poorly with increasing problem size and hardware complexity. Existing heuristics are often developed for specific problem graphs or hardware topologies and are difficult to generalize. Reinforcement Learning (RL) offers a promising alternative by treating minor embedding as a sequential decision-making problem, where an agent learns to construct minor embeddings by iteratively mapping the problem variables to the hardware qubits. We propose an RL-based approach to minor embedding using a Proximal Policy Optimization agent, testing its ability to embed both fully connected and randomly generated problem graphs on two hardware topologies, Chimera and Zephyr. The results show that our agent consistently produces valid minor embeddings with a reasonably efficient number of qubits, in particular on the more modern Zephyr topology. Our proposed approach is also able to scale to moderate problem sizes and adapts well to different graph structures, highlighting RL’s potential as a flexible and general-purpose framework for minor embedding in QA.
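What makes a minor embedding "valid" can be stated as three checks: every problem variable maps to a non-empty, connected chain of hardware qubits; chains do not share qubits; and every problem edge is covered by at least one hardware coupler between the two chains. The sketch below implements these checks on plain edge lists. The function name, the data layout, and the four-qubit ring used as "hardware" are assumptions for illustration, not the authors' code or a real Chimera/Zephyr graph.

```python
from collections import deque

def is_valid_minor_embedding(problem_edges, hardware_edges, embedding):
    """Check the three minor-embedding conditions (hypothetical helper).

    problem_edges, hardware_edges: iterables of (u, v) pairs.
    embedding: dict mapping each problem variable to its chain (a set of
    hardware qubits). Only qubits that appear in some coupler are known.
    """
    adj = {}
    for u, v in hardware_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = set(adj)

    used = set()
    for chain in embedding.values():
        chain = set(chain)
        # Chains must be non-empty, use existing qubits, and be disjoint.
        if not chain or not chain <= nodes or used & chain:
            return False
        used |= chain
        # Chains must be connected: breadth-first search inside the chain.
        start = next(iter(chain))
        seen, queue = {start}, deque([start])
        while queue:
            q = queue.popleft()
            for nb in adj[q] & chain:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if seen != chain:
            return False

    # Every problem edge needs at least one coupler between its two chains.
    for a, b in problem_edges:
        if a not in embedding or b not in embedding:
            return False
        chain_b = set(embedding[b])
        if not any(adj[q] & chain_b for q in embedding[a]):
            return False
    return True

# Toy example: embed a triangle into a 4-qubit ring 0-1-2-3-0.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
good = {"a": {0}, "b": {1}, "c": {2, 3}}   # "c" uses a two-qubit chain
bad = {"a": {0}, "b": {1}, "c": {2}}       # no coupler between qubits 0 and 2

print(is_valid_minor_embedding(triangle, ring, good))
print(is_valid_minor_embedding(triangle, ring, bad))
```

The toy case also shows why chains are needed at all: a triangle cannot be placed one-to-one on a ring, so one variable has to occupy two coupled qubits.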
Article 562
Title@2025-07-21 (1): Learning without training: The implicit dynamics of in-context learning
Title: Learning without training: The implicit dynamics of in-context learning | Lernen ohne Ausbildung: Die implizite Dynamik des In-Context-Lernens | 缺乏培训的学习:内通性学习的隐含动态 2507.16003v1 |
Authors (5): Benoit Dherin, Michael Munn, Hanna Mazzawi, Michael Wunder, Javier Gonzalvo
One of the most striking features of Large Language Models (LLMs) is their ability to learn in context. Namely, at inference time, an LLM is able to learn new patterns without any additional weight update when these patterns are presented in the form of examples in the prompt, even if these patterns were not seen during training. The mechanisms through which this can happen are still largely unknown. In this work, we show that the stacking of a self-attention layer with an MLP allows the transformer block to implicitly modify the weights of the MLP layer according to the context. We argue through theory and experimentation that this simple mechanism may be the reason why LLMs can learn in context and not only during training. Specifically, we show under mild simplifying assumptions how a transformer block implicitly transforms a context into a low-rank weight-update of the MLP layer.
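The linear-algebra fact behind a statement of this kind can be checked in a few lines. For a single linear map W, shifting the input from a to a + delta gives the same output as keeping the input a and adding the rank-1 matrix dW = (W delta) a^T / ||a||^2 to the weights. The sketch below verifies that identity numerically. It is an illustration under a stated assumption (the context acts as an additive shift of the layer input), not the paper's full construction, and the vectors and helper names are hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rank_one_update(W, a, delta):
    """Return dW = (W @ delta) a^T / ||a||^2; requires a to be non-zero."""
    w_delta = matvec(W, delta)
    norm_sq = sum(ai * ai for ai in a)
    return [[wd * ai / norm_sq for ai in a] for wd in w_delta]

W = [[1.0, 2.0, 0.5],
     [0.0, -1.0, 3.0]]
a = [1.0, 2.0, -1.0]        # layer input without context (hypothetical values)
delta = [0.5, -0.25, 1.0]   # input shift attributed to the context (hypothetical)

dW = rank_one_update(W, a, delta)
W_new = [[w + d for w, d in zip(row_w, row_d)] for row_w, row_d in zip(W, dW)]

# Original weights on the shifted input ...
shifted_input_out = matvec(W, [ai + di for ai, di in zip(a, delta)])
# ... match the updated weights on the original input.
updated_weights_out = matvec(W_new, a)
print(shifted_input_out, updated_weights_out)
```

Every row of `dW` is a multiple of `a`, which is what makes the update rank 1.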
Article 563
Title@2025-07-21 (1): Automated Design of Structured Variational Quantum Circuits with Reinforcement Learning
Title: Automated Design of Structured Variational Quantum Circuits with Reinforcement Learning | Automatisiertes Design von strukturierten Variations-Quantum-Schaltungen mit Verstärkungs-Lernen | 结构变化量子电路的自动设计与强化学习 2507.16001v1 |
Authors (5): Gloria Turati, Simone Foderà, Riccardo Nembrini, Maurizio Ferrari Dacrema, Paolo Cremonesi
Variational Quantum Algorithms (VQAs) are among the most promising approaches for leveraging near-term quantum hardware, yet their effectiveness strongly depends on the design of the underlying circuit ansatz, which is typically constructed with heuristic methods. In this work, we represent the synthesis of variational quantum circuits as a sequential decision-making problem, where gates are added iteratively in order to optimize an objective function, and we introduce two reinforcement learning-based methods, RLVQC Global and RLVQC Block, tailored to combinatorial optimization problems. RLVQC Block creates ansatzes that generalize the Quantum Approximate Optimization Algorithm (QAOA) by discovering a two-qubit block that is applied to all the interacting qubit pairs. RLVQC Global, by contrast, further generalizes the ansatz and adds gates unconstrained by the structure of the interacting qubits. Both methods adopt the Proximal Policy Optimization (PPO) algorithm and use empirical measurement outcomes as state observations to guide the agent. We evaluate the proposed methods on a broad set of QUBO instances derived from classical graph-based optimization problems. Our results show that both RLVQC methods perform strongly, with RLVQC Block consistently outperforming QAOA and generally surpassing RLVQC Global. While RLVQC Block produces circuits with depth comparable to QAOA, the Global variant is instead able to find significantly shorter ones. These findings suggest that reinforcement learning methods can be an effective tool to discover new ansatz structures tailored for specific problems and that the most effective circuit design strategy lies between rigid predefined architectures and completely unconstrained ones, offering a favourable trade-off between structure and adaptability.
Article 564
Title@2025-07-21 (1): Learning Neural Differential Algebraic Equations via Operator Splitting
Title: Learning Neural Differential Algebraic Equations via Operator Splitting | Neurale Differentialalgebraische Gleichungen über Operator-Splitting lernen | 通过运算符分割进行学习神经差异 2403.12938v3 |
Authors (5): James Koch, Madelyn Shapiro, Himanshu Sharma, Draguna Vrabie, Jan Drgona
Differential algebraic equations (DAEs) describe the temporal evolution of systems that obey both differential and algebraic constraints. Of particular interest are systems that contain implicit relationships between their components, such as conservation laws. Here, we present an Operator Splitting (OS) numerical integration scheme for learning unknown components of DAEs from time-series data. In this work, we show that the proposed OS-based time-stepping scheme is suitable for relevant system-theoretic data-driven modeling tasks. Presented examples include (i) the inverse problem of tank-manifold dynamics and (ii) discrepancy modeling of a network of pumps, tanks, and pipes. Our experiments demonstrate the proposed method’s robustness to noise and its extrapolation ability, showing that it can (i) learn the behaviors of the system components and their interaction physics and (ii) disambiguate between data trends and mechanistic relationships contained in the system.
Article 565
Title@2025-07-21 (1): Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition
Title: Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition | Omni-Router: Routing-Entscheidungen in Sparse Mixture-of-Experts für die Spracherkennung teilen | Omni-Router: 分享语音识别专家的松散混集决定 2507.05724v2 |
Authors (3): Zijin Gu, Tatiana Likhomanenko, Navdeep Jaitly
Mixture-of-experts (MoE) architectures have expanded from language modeling to automatic speech recognition (ASR). Traditional MoE methods, such as the Switch Transformer, route experts independently within each layer. Our analysis reveals that routers in most layers make expert choices that are not strongly correlated with the choices of the routers in other layers. To increase the cooperation between experts in different layers and encourage greater specialization, we use a shared router across different MoE layers. We call this model Omni-router Transformer. Extensive experiments on a large-scale pseudo-labeled dataset and evaluations across 10 diverse, out-of-domain ASR benchmarks demonstrate that the Omni-router Transformer is able to achieve lower training loss and consistently outperform dense and Switch Transformer models, reducing average word error rates by 11.2% and 8.2%, respectively, while providing structured expert usage and improved robustness to diverse data.
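The difference between per-layer routers and a shared router can be sketched in plain Python. Below, every MoE layer has its own experts, but all layers consult the same router weights, so routing decisions in different layers come from one common projection. This is a toy top-1 routing sketch, not the Omni-router implementation: the dimensions, the scaling "experts", the residual form, and all names are assumptions.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_w, h):
    """Top-1 routing: return (expert index, gate probability) for hidden vector h."""
    logits = [sum(w * x for w, x in zip(row, h)) for row in router_w]
    probs = softmax(logits)
    k = max(range(len(probs)), key=probs.__getitem__)
    return k, probs[k]

def moe_layer(h, router_w, experts):
    """Apply the selected expert, scale by its gate, and add a residual connection."""
    k, gate = route(router_w, h)
    out = experts[k](h)
    return [x + gate * y for x, y in zip(h, out)], k

def scale(c):
    """Toy 'expert' that scales its input by c (placeholder for a feed-forward block)."""
    return lambda h: [c * x for x in h]

# One router shared by all layers: 2 experts, hidden size 2 (hypothetical sizes).
shared_router = [[1.0, 0.0],
                 [0.0, 1.0]]
# Each layer keeps its own expert parameters.
layers = [
    [scale(0.1), scale(-0.1)],
    [scale(0.2), scale(-0.2)],
]

h = [2.0, 1.0]
choices = []
for experts in layers:
    h, k = moe_layer(h, shared_router, experts)  # same router_w in every layer
    choices.append(k)
print(choices, h)
```

A conventional per-layer design would pass a different `router_w` to each `moe_layer` call; sharing it is the only change in this sketch.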
Article 566
Title@2025-07-21 (1): Semantic-Aware Gaussian Process Calibration with Structured Layerwise Kernels for Deep Neural Networks
Title: Semantic-Aware Gaussian Process Calibration with Structured Layerwise Kernels for Deep Neural Networks | Semantisch-bewusste Gaußische Prozesskalibrierung mit strukturierten schichtweisen Kernen für tiefe neurale Netzwerke | 深神经网络结构图层内心校准 2507.15987v1 |
Authors (2): Kyung-hwan Lee, Kyung-tae Kim
Calibrating the confidence of neural network classifiers is essential for quantifying the reliability of their predictions during inference. However, conventional Gaussian Process (GP) calibration methods often fail to capture the internal hierarchical structure of deep neural networks, limiting both interpretability and effectiveness for assessing predictive reliability. We propose a Semantic-Aware Layer-wise Gaussian Process (SAL-GP) framework that mirrors the layered architecture of the target neural network. Instead of applying a single global GP correction, SAL-GP employs a multi-layer GP model, where each layer’s feature representation is mapped to a local calibration correction. These layerwise GPs are coupled through a structured multi-layer kernel, enabling joint marginalization across all layers. This design allows SAL-GP to capture both local semantic dependencies and global calibration coherence, while consistently propagating predictive uncertainty through the network. The resulting framework enhances interpretability aligned with the network architecture and enables principled evaluation of confidence consistency and uncertainty quantification in deep models.
Article 567
Title@2025-07-21 (1): Investigation of unsupervised and supervised hyperspectral anomaly detection
Title: Investigation of unsupervised and supervised hyperspectral anomaly detection | Untersuchung des nicht überwachten und überwachten hyperspektralen Anomaliennachweises | 调查无人监督和监管的超光谱异常现象探测 2408.07114v2 |
Authors (4): Mazharul Hossain, Aaron Robinson, Lan Wang, Chrysanthe Preza
Hyperspectral sensing is a valuable tool for detecting anomalies and distinguishing between materials in a scene. Hyperspectral anomaly detection (HS-AD) helps characterize the captured scenes and separates them into anomaly and background classes. It is vital in agricultural, environmental, and military applications such as RSTA (reconnaissance, surveillance, and target acquisition) missions. We previously designed an equal voting ensemble of hyperspectral unmixing and three unsupervised HS-AD algorithms. We later utilized a supervised classifier to determine the weights of a voting ensemble, creating a hybrid of heterogeneous unsupervised HS-AD algorithms with a supervised classifier in a model-stacking arrangement, which improved detection accuracy. However, supervised classification methods usually fail to detect novel or unknown patterns that substantially deviate from those seen previously. In this work, we evaluate our technique and other supervised and unsupervised methods using general hyperspectral data to provide new insights.
Article 568
Title@2025-07-21 (1): On the transferability of Sparse Autoencoders for interpreting compressed models
Title: On the transferability of Sparse Autoencoders for interpreting compressed models | Über die Übertragbarkeit von Sparse Autoencodern zur Interpretation komprimierter Modelle | 用于解释压缩模型的 Sparse Autoencards 可转让性 2507.15977v1 |
Authors (3): Suchit Gupte, Vishnu Kabir Chhabra, Mohammad Mahdi Khalili
Modern LLMs face inference efficiency challenges due to their scale. To address this, many compression methods have been proposed, such as pruning and quantization. However, the effect of compression on a model’s interpretability remains elusive. While several model interpretation approaches exist, such as circuit discovery, Sparse Autoencoders (SAEs) have proven particularly effective in decomposing a model’s activation space into its feature basis. In this work, we explore the differences in SAEs for the original and compressed models. We find that SAEs trained on the original model can interpret the compressed model, albeit with slight performance degradation compared to an SAE trained on the compressed model. Furthermore, simply pruning the original SAE itself achieves performance comparable to training a new SAE on the pruned model. This finding enables us to mitigate the extensive training costs of SAEs.
Article 569
Title@2025-07-21 (1): Efficient dataset construction using active learning and uncertainty-aware neural networks for plasma turbulent transport surrogate models
Title: Efficient dataset construction using active learning and uncertainty-aware neural networks for plasma turbulent transport surrogate models | Effizienter Datensatzaufbau durch aktives Lernen und unsichere neuronale Netze für turbulente Transportsurrogatmodelle im Plasma | 利用积极学习和有不确定性的神经网络,为等离子体动荡运输替代模型建立高效的数据集构建 2507.15976v1 |
Authors (6): Aaron Ho, Lorenzo Zanisi, Bram de Leeuw, Vincent Galvan, Pablo Rodriguez-Fernandez, Nathaniel T. Howard
This work demonstrates a proof-of-principle for using uncertainty-aware architectures, in combination with active learning techniques and an in-the-loop physics simulation code as a data labeller, to construct efficient datasets for data-driven surrogate model generation. Building on a previous proof-of-principle that successfully demonstrated training set reduction on static pre-labelled datasets using the ADEPT framework, this strategy was applied again to the plasma turbulent transport problem within tokamak fusion plasmas, specifically the QuaLiKiz quasilinear electrostatic gyrokinetic turbulent transport code. While QuaLiKiz provides relatively fast evaluations, this study specifically targeted small datasets to serve as a proxy for more expensive codes, such as CGYRO or GENE. The newly implemented algorithm uses the SNGP architecture for the classification component of the problem and the BNN-NCP architecture for the regression component, training models for all turbulent modes (ITG, TEM, ETG) and all transport fluxes ($Q_e$, $Q_i$, $\Gamma_e$, $\Gamma_i$, and $\Pi_i$) described by the general QuaLiKiz output. With 45 active learning iterations, moving from a small initial training set of $10^{2}$ to a final set of $10^{4}$, the resulting models reached an $F_1$ classification performance of ~0.8 and an $R^2$ regression performance of ~0.75 on an independent test set across all outputs. This extrapolates to reaching the same performance and efficiency as the previous ADEPT pipeline, although on a problem with 1 extra input dimension. While the improvement rate achieved in this implementation diminishes faster than expected, the overall technique is formulated with components that can be upgraded and generalized to many surrogate modeling applications beyond plasma turbulent transport predictions.
Article 570
Title@2025-07-21 (1): The Impact of Language Mixing on Bilingual LLM Reasoning
Title: The Impact of Language Mixing on Bilingual LLM Reasoning | Die Auswirkungen des Sprachmixens auf die zweisprachige LLM-Reasoning | 语言混合对双语LLM理由解释的影响 2507.15849v1 |
Authors (5): Yihao Li, Jiayi Xin, Miranda Muqing Miao, Qi Long, Lyle Ungar
Proficient multilingual speakers often intentionally switch languages in the middle of a conversation. Similarly, recent reasoning-focused bilingual large language models (LLMs) with strong capabilities in both languages exhibit language mixing, that is, alternating languages within their chain of thought. Discouraging this behavior in DeepSeek-R1 was found to degrade accuracy, suggesting that language mixing may benefit reasoning. In this work, we study language switching in Chinese-English bilingual reasoning models. We identify reinforcement learning with verifiable rewards (RLVR) as the critical training stage that leads to language mixing. We demonstrate that language mixing can enhance reasoning: enforcing monolingual decoding reduces accuracy by 5.6 percentage points on math reasoning tasks. Additionally, a lightweight probe can be trained to predict whether a potential language switch would benefit or harm reasoning; when used to guide decoding, it increases accuracy by up to 6.25 percentage points. Our findings suggest that language mixing is not merely a byproduct of multilingual training, but is a strategic reasoning behavior.
Article 571
Title@2025-07-21 (1): Transparent Trade-offs between Properties of Explanations
Title: Transparent Trade-offs between Properties of Explanations | Transparente Kompromisse zwischen den Eigenschaften von Erklärungen | 解释属性之间的透明权衡取舍 2410.23880v2 |
Authors (5): Hiwot Belay Tadesse, Alihan Hüyük, Yaniv Yacoby, Weiwei Pan, Finale Doshi-Velez
When explaining black-box machine learning models, it’s often important for explanations to have certain desirable properties. Most existing methods ‘encourage’ desirable properties in their construction of explanations. In this work, we demonstrate that these forms of encouragement do not consistently create explanations with the properties that are supposedly being targeted. Moreover, they do not allow for any control over which properties are prioritized when different properties are at odds with each other. We propose to directly optimize explanations for desired properties. Our direct approach not only produces explanations with optimal properties more consistently but also empowers users to control trade-offs between different properties, allowing them to create explanations with exactly what is needed for a particular task.
Article 572
Title@2025-07-21 (1): FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs
Title: FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs | FASTGEN: Schnelle und kosteneffektive synthetische Tabellendatenerstellung mit LLMs | FASTGEN: 利用LLMs快速和成本-效益高的合成图表数据生成 2507.15839v1 |
Authors (4): Anh Nguyen, Sam Schafft, Nicholas Hale, John Alfaro
Synthetic data generation has emerged as an invaluable solution in scenarios where real-world data collection and usage are limited by cost and scarcity. Large language models (LLMs) have demonstrated remarkable capabilities in producing high-fidelity, domain-relevant samples across various fields. However, existing approaches that directly use LLMs to generate each record individually impose prohibitive time and cost burdens, particularly when large volumes of synthetic data are required. In this work, we propose a fast, cost-effective method for realistic tabular data synthesis that leverages LLMs to infer and encode each field’s distribution into a reusable sampling script. By automatically classifying fields into numerical, categorical, or free-text types, the LLM generates distribution-based scripts that can efficiently produce diverse, realistic datasets at scale without continuous model inference. Experimental results show that our approach outperforms traditional direct methods in both diversity and data realism, substantially reducing the burden of high-volume synthetic data generation. We plan to apply this methodology to accelerate testing in production pipelines, thereby shortening development cycles and improving overall system efficiency. We believe our insights and lessons learned will aid researchers and practitioners seeking scalable, cost-effective solutions for synthetic data generation.
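As a rough picture of what a reusable, distribution-based sampling script could look like, the sketch below hard-codes one specification per field type (numerical, categorical, free-text) and then samples records without any further model calls. The field names, distribution parameters, and text templates are invented for illustration; in the approach described above they would be inferred by the LLM, and the authors' generated scripts may look quite different.

```python
import random

# Hypothetical per-field specifications, standing in for what an LLM might infer.
FIELD_SPECS = {
    "age": {"type": "numerical", "mean": 41.0, "std": 12.0, "min": 18, "max": 90},
    "plan": {"type": "categorical",
             "values": ["free", "pro", "enterprise"],
             "weights": [0.6, 0.3, 0.1]},
    "note": {"type": "free_text",
             "templates": ["Customer asked about {topic}.",
                           "Follow-up on {topic} scheduled."],
             "topics": ["billing", "onboarding", "renewal"]},
}

def sample_field(spec, rng):
    """Draw one value according to the field's type-specific specification."""
    if spec["type"] == "numerical":
        # Gaussian draw clipped to the allowed range, then rounded to an integer.
        x = rng.gauss(spec["mean"], spec["std"])
        return round(min(max(x, spec["min"]), spec["max"]))
    if spec["type"] == "categorical":
        return rng.choices(spec["values"], weights=spec["weights"], k=1)[0]
    if spec["type"] == "free_text":
        template = rng.choice(spec["templates"])
        return template.format(topic=rng.choice(spec["topics"]))
    raise ValueError(f"unknown field type: {spec['type']}")

def generate(n, seed=0):
    """Generate n synthetic records; no model inference happens at this stage."""
    rng = random.Random(seed)
    return [{name: sample_field(spec, rng) for name, spec in FIELD_SPECS.items()}
            for _ in range(n)]

rows = generate(1000)
print(rows[0])
```

The cost argument follows from the structure: the model is needed once to produce the specification, after which generating a thousand or a million rows is ordinary local sampling.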
Article 573
Title@2025-07-21 (1): Optimizing Canaries for Privacy Auditing with Metagradient Descent
Title: Optimizing Canaries for Privacy Auditing with Metagradient Descent | Optimierung von Kanarien für die Datenschutzprüfung mit Metagradient Descent | 优化 “ 与代谢人后裔 “ 进行隐私审计的金库 2507.15836v1 |
Authors (4): Matteo Boglioni, Terrance Liu, Andrew Ilyas, Zhiwei Steven Wu
In this work we study black-box privacy auditing, where the goal is to lower bound the privacy parameter of a differentially private learning algorithm using only the algorithm’s outputs (i.e., the final trained model). For DP-SGD (the most successful method for training differentially private deep learning models), the canonical auditing approach uses membership inference: an auditor starts with a small set of special “canary” examples, inserts a random subset of them into the training set, and then tries to discern which of the canaries were included in the training set (typically via a membership inference attack). The auditor’s success rate then provides a lower bound on the privacy parameters of the learning algorithm. Our main contribution is a method for optimizing the auditor’s canary set to improve privacy auditing, leveraging recent work on metagradient optimization. Our empirical evaluation demonstrates that by using such optimized canaries, we can improve empirical lower bounds for differentially private image classification models by over 2x in certain instances. Furthermore, we demonstrate that our method is transferable and efficient: canaries optimized for non-private SGD with a small model architecture remain effective when auditing larger models trained with DP-SGD.
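How a membership-inference success rate turns into a privacy lower bound can be sketched with the standard hypothesis-testing view of differential privacy: for an (ε, δ)-DP algorithm, any membership test satisfies TPR ≤ e^ε · FPR + δ, and the symmetric inequality holds for the complementary rates. The function below is an illustrative sketch, not the paper's auditing procedure; the name `empirical_epsilon` is hypothetical, and it works from point estimates only, whereas a statistically valid audit would typically replace the rates with confidence bounds.

```python
import math


def empirical_epsilon(tpr, fpr, delta=0.0):
    """Point-estimate lower bound on epsilon implied by an attack's TPR and FPR.

    Uses TPR <= exp(eps) * FPR + delta and the symmetric inequality
    TNR <= exp(eps) * FNR + delta, and returns the larger implied bound.
    """
    candidates = [0.0]  # epsilon is never negative
    if fpr > 0 and tpr - delta > 0:
        candidates.append(math.log((tpr - delta) / fpr))
    fnr, tnr = 1.0 - tpr, 1.0 - fpr
    if fnr > 0 and tnr - delta > 0:
        candidates.append(math.log((tnr - delta) / fnr))
    return max(candidates)
```

Under this view, canaries that are easier to tell apart raise the attack's TPR at a fixed FPR, which raises the implied bound; an attack that guesses at random (TPR equal to FPR) implies nothing beyond ε ≥ 0.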
Article 574
Title@2025-07-21 (1): Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction
Title: Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction | Multi-Strategy Verbesserte Schlangenoptimierung beschleunigte CNN-LSTM-Achtung-Adaboost für Flugbahnvorhersage | CNN-LSTM-Tenned-Adabowst 跟踪预测多战略改进蛇优化加速器 2507.15832v1 |
Authors (1): Shiyang Li
To address the limitations of medium- and long-term four-dimensional (4D) trajectory prediction models, this paper proposes a hybrid CNN-LSTM-attention-adaboost neural network model incorporating a multi-strategy improved snake-herd optimization (SO) algorithm. The model applies the Adaboost algorithm to build multiple weak learners, and each submodel uses a CNN to extract spatial features, an LSTM to capture temporal features, and an attention mechanism to capture global features comprehensively. The strong learner, which combines the multiple submodels, then has its hyperparameters optimized through the natural selection behavior pattern simulated by SO. Comparison experiments and ablation studies with multiple optimizers are carried out on real ADS-B data from Xi’an to Tianjin, together with a comprehensive evaluation. The results show that SO-CLA-adaboost outperforms traditional optimizers such as particle swarm, whale, and gray wolf in handling large-scale high-dimensional trajectory data. In addition, introducing the full-strategy collaborative improvement SO algorithm improves the model’s prediction accuracy by 39.89%.
Article 575
Title@2025-07-21 (1): Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation
Title: Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation | Fragen Sie einfach nach Musik (JAM): Multimodale und personalisierte natürliche Sprache Musik Empfehlung | 仅询问音乐(JAM):多式和个性化自然语言音乐建议 2507.15826v1 |
Authors (7): Alessandro B. Melchiorre, Elena V. Epure, Shahed Masoudian, Gustavo Escobedo, Anna Hausberger, Manuel Moussallam, Markus Schedl
Natural language interfaces offer a compelling approach for music recommendation, enabling users to express complex preferences conversationally. While Large Language Models (LLMs) show promise in this direction, their scalability in recommender systems is limited by high costs and latency. Retrieval-based approaches using smaller language models mitigate these issues but often rely on single-modal item representations, overlook long-term user preferences, and require full model retraining, posing challenges for real-world deployment. In this paper, we present JAM (Just Ask for Music), a lightweight and intuitive framework for natural language music recommendation. JAM models user-query-item interactions as vector translations in a shared latent space, inspired by knowledge graph embedding methods like TransE. To capture the complexity of music and user intent, JAM aggregates multimodal item features via cross-attention and sparse mixture-of-experts. We also introduce JAMSessions, a new dataset of over 100k user-query-item triples with anonymized user/item embeddings, uniquely combining conversational queries and user long-term preferences. Our results show that JAM provides accurate recommendations, produces intuitive representations suitable for practical use cases, and can be easily integrated with existing music recommendation stacks.
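The translation idea borrowed from TransE can be sketched as scoring each item by how close it lies to the user embedding shifted by the query embedding. This is a minimal sketch under that assumption; the function name `rank_items`, the Euclidean distance, and the plain additive translation are illustrative choices, not details confirmed by the abstract, and the multimodal aggregation step is omitted.

```python
import numpy as np


def rank_items(user_vec, query_vec, item_matrix, top_k=3):
    """Rank items by distance to the translated point user + query.

    user_vec, query_vec: arrays of shape (dim,)
    item_matrix: array of shape (num_items, dim)
    Returns the indices of the top_k closest items, nearest first.
    """
    target = user_vec + query_vec  # TransE-style translation
    distances = np.linalg.norm(item_matrix - target, axis=1)
    return np.argsort(distances)[:top_k]
```

Because the user and item embeddings can be precomputed, only the query needs encoding at request time in a design like this, which is one plausible reading of why such a model would be lightweight to serve.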
Article 576
Title@2025-07-21 (1): ACS: An interactive framework for conformal selection
Title: ACS: An interactive framework for conformal selection | ACS: Ein interaktives Framework für die konforme Auswahl | ACS: 兼容性选择互动框架 2507.15825v1 |
Authors (4): Yu Gui, Ying Jin, Yash Nair, Zhimei Ren
This paper presents adaptive conformal selection (ACS), an interactive framework for model-free selection with guaranteed error control. Building on conformal selection (Jin and Candès, 2023b), ACS generalizes the approach to support human-in-the-loop adaptive data analysis. Under the ACS framework, we can partially reuse the data to boost the selection power, make decisions on the fly while exploring the data, and incorporate new information or preferences as they arise. The key to ACS is a carefully designed principle that controls the information available for decision making, allowing the data analyst to explore the data adaptively while maintaining rigorous control of the false discovery rate (FDR). Based on the ACS framework, we provide concrete selection algorithms for various goals, including model update/selection, diversified selection, and incorporating newly available labeled data. The effectiveness of ACS is demonstrated through extensive numerical simulations and real-data applications in large language model (LLM) deployment and drug discovery.
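For orientation, the non-adaptive baseline that ACS builds on can be sketched as conformal p-values followed by the Benjamini-Hochberg (BH) procedure. This is a rough sketch of plain conformal selection as commonly described, not of ACS itself; the function names are hypothetical, and the score convention (smaller test scores are stronger evidence for selection) is an assumption.

```python
import numpy as np


def conformal_pvalues(calib_scores, test_scores):
    """One p-value per test score: (1 + #{calibration scores below it}) / (n + 1)."""
    calib = np.asarray(calib_scores, dtype=float)
    test = np.asarray(test_scores, dtype=float)
    counts = (calib[None, :] < test[:, None]).sum(axis=1)
    return (1 + counts) / (calib.size + 1)


def benjamini_hochberg(pvalues, q=0.1):
    """Return sorted indices selected by BH at nominal FDR level q."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Find the largest k such that the k-th smallest p-value is <= q * k / m.
    below = p[order] <= q * np.arange(1, m + 1) / m
    if not below.any():
        return np.array([], dtype=int)
    k = np.nonzero(below)[0].max() + 1
    return np.sort(order[:k])
```

In this baseline the calibration data is used once and the selection is made in a single pass; the interactive reuse of data described above is what ACS adds on top.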
Article 577
Title@2025-07-21 (1): Efficient Multi-Camera Tokenization with Triplanes for End-to-End Driving
Title: Efficient Multi-Camera Tokenization with Triplanes for End-to-End Driving | Effiziente Multi-Kamera-Tokenisierung mit Triplanes für End-to-End-Fahren | 利用三边飞机进行端到端驱动 2506.12251v2 |
Authors (6): Boris Ivanovic, Cristiano Saltori, Yurong You, Yan Wang, Wenjie Luo, Marco Pavone
Autoregressive Transformers are increasingly being deployed as end-to-end robot and autonomous vehicle (AV) policy architectures, owing to their scalability and potential to leverage internet-scale pretraining for generalization. Accordingly, tokenizing sensor data efficiently is paramount to ensuring the real-time feasibility of such architectures on embedded hardware. To this end, we present an efficient triplane-based multi-camera tokenization strategy that leverages recent advances in 3D neural reconstruction and rendering to produce sensor tokens that are agnostic to the number of input cameras and their resolution, while explicitly accounting for their geometry around an AV. Experiments on a large-scale AV dataset and state-of-the-art neural simulator demonstrate that our approach yields significant savings over current image patch-based tokenization strategies, producing up to 72% fewer tokens, resulting in up to 50% faster policy inference while achieving the same open-loop motion planning accuracy and improved offroad rates in closed-loop driving simulations.
Article 578
Title@2025-07-21 (1): Federated Split Learning with Improved Communication and Storage Efficiency
Title: Federated Split Learning with Improved Communication and Storage Efficiency | Federated Split Learning mit verbesserter Kommunikation und Speichereffizienz | 改进通信和储存效率的联邦分化学习 2507.15816v1 |
Authors (2): Yujia Mu, Cong Shen
Federated learning (FL) is one of the popular distributed machine learning (ML) solutions but incurs significant communication and computation costs at edge devices. Federated split learning (FSL) can train sub-models in parallel and reduce the computational burden of edge devices by splitting the model architecture. However, it still requires a high communication overhead due to transmitting the smashed data and gradients between clients and the server in every global round. Furthermore, the server must maintain separate partial models for every client, leading to a significant storage requirement. To address these challenges, this paper proposes a novel communication and storage efficient federated split learning method, termed CSE-FSL, which utilizes an auxiliary network to locally update the weights of the clients while keeping a single model at the server, hence avoiding frequent transmissions of gradients from the server and greatly reducing the storage requirement of the server. Additionally, a new model update method of transmitting the smashed data in selected epochs can reduce the amount of smashed data sent from the clients. We provide a theoretical analysis of CSE-FSL, rigorously guaranteeing its convergence under non-convex loss functions. The extensive experimental results further indicate that CSE-FSL achieves a significant communication reduction over existing FSL solutions using real-world FL tasks.
Article 579
Title@2025-07-21 (1): LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra
Title: LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra | LLM Economist: Große Bevölkerungsmodelle und Mechanism Design in Multi-Agent Generative Simulacra | LLM 经济学家:多机构生成模拟中大型人口模型和机制设计 2507.15815v1 |
Authors (6): Seth Karten, Wenzhe Li, Zihan Ding, Samuel Kleiner, Yu Bai, Chi Jin
We present the LLM Economist, a novel framework that uses agent-based modeling to design and assess economic policies in strategic environments with hierarchical decision-making. At the lower level, bounded rational worker agents – instantiated as persona-conditioned prompts sampled from U.S. Census-calibrated income and demographic statistics – choose labor supply to maximize text-based utility functions learned in-context. At the upper level, a planner agent employs in-context reinforcement learning to propose piecewise-linear marginal tax schedules anchored to the current U.S. federal brackets. This construction endows economic simulacra with three capabilities requisite for credible fiscal experimentation: (i) optimization of heterogeneous utilities, (ii) principled generation of large, demographically realistic agent populations, and (iii) mechanism design – the ultimate nudging problem – expressed entirely in natural language. Experiments with populations of up to one hundred interacting agents show that the planner converges near Stackelberg equilibria that improve aggregate social welfare relative to Saez solutions, while a periodic, persona-level voting procedure furthers these gains under decentralized governance. These results demonstrate that large language model-based agents can jointly model, simulate, and govern complex economic systems, providing a tractable test bed for policy evaluation at the societal scale to help build better civilizations.
Article 580
Title@2025-07-21 (1): Splitting criteria for ordinal decision trees: an experimental study
Title: Splitting criteria for ordinal decision trees: an experimental study | Aufteilung der Kriterien für Ordinalentscheidungsbäume: eine experimentelle Studie | 例常决策树的分割标准:一项实验研究 2412.13697v3 |
Authors (5): Rafael Ayllón-Gavilán, Francisco José Martínez-Estudillo, David Guijo-Rubio, César Hervás-Martínez, Pedro Antonio Gutiérrez
Ordinal Classification (OC) addresses those classification tasks where the labels exhibit a natural order. Unlike nominal classification, which treats all classes as mutually exclusive and unordered, OC takes the ordinal relationship into account, producing more accurate and relevant results. This is particularly critical in applications where the magnitude of classification errors has significant consequences. Despite this, OC problems are often tackled using nominal methods, leading to suboptimal solutions. Although decision trees are among the most popular classification approaches, ordinal tree-based approaches have received less attention when compared to other classifiers. This work provides a comprehensive survey of ordinal splitting criteria, standardising the notations used in the literature to enhance clarity and consistency. Three ordinal splitting criteria, Ordinal Gini (OGini), Weighted Information Gain (WIG), and Ranking Impurity (RI), are compared to the nominal counterparts of the first two (Gini and information gain), by incorporating them into a decision tree classifier. An extensive repository considering $45$ publicly available OC datasets is presented, supporting the first experimental comparison of ordinal and nominal splitting criteria using well-known OC evaluation metrics. The results have been statistically analysed, highlighting that OGini stands out as the best ordinal splitting criterion to date, reducing the mean absolute error achieved by Gini by more than 3.02%. To promote reproducibility, all source code developed, a detailed guide for reproducing the results, the 45 OC datasets, and the individual results for all the evaluated methodologies are provided.
Article 581
Title@2025-07-21 (1): MSGM: A Multi-Scale Spatiotemporal Graph Mamba for EEG Emotion Recognition
Title: MSGM: A Multi-Scale Spatiotemporal Graph Mamba for EEG Emotion Recognition | MSGM: Multi-Scale Spatiotemporal Graph Mamba für EEG-Emotionserkennung | MSGM: 承认EEG情感的多空间反光图 Mamba 2507.15914v1 |
Authors (5): Hanwen Liu, Yifeng Gong, Zuwei Yan, Zeheng Zhuang, Jiaxuan Lu
EEG-based emotion recognition struggles with capturing multi-scale spatiotemporal dynamics and ensuring computational efficiency for real-time applications. Existing methods often oversimplify temporal granularity and spatial hierarchies, limiting accuracy. To overcome these challenges, we propose the Multi-Scale Spatiotemporal Graph Mamba (MSGM), a novel framework integrating multi-window temporal segmentation, bimodal spatial graph modeling, and efficient fusion via the Mamba architecture. By segmenting EEG signals across diverse temporal scales and constructing global-local graphs with neuroanatomical priors, MSGM effectively captures fine-grained emotional fluctuations and hierarchical brain connectivity. A multi-depth Graph Convolutional Network (GCN) and token embedding fusion module, paired with Mamba’s state-space modeling, enable dynamic spatiotemporal interaction at linear complexity. Notably, with just one MSST-Mamba layer, MSGM surpasses leading methods in the field on the SEED, THU-EP, and FACED datasets, outperforming baselines in subject-independent emotion classification while achieving robust accuracy and millisecond-level inference on the NVIDIA Jetson Xavier NX.
Article 582
Title@2025-07-21 (1): Rethinking Inductive Bias in Geographically Neural Network Weighted Regression
Title: Rethinking Inductive Bias in Geographically Neural Network Weighted Regression | Induktive Bias im geographisch neuralen Netzwerk neu denken Gewichtete Regression | 重新思考在地理神经网络中诱导的偏见 2507.09958v4 |
Authors (1): Zhenyuan Chen
Inductive bias is a key factor in spatial regression models, determining how well a model can learn from limited data and capture spatial patterns. This work revisits the inductive biases in Geographically Neural Network Weighted Regression (GNNWR) and identifies limitations in current approaches for modeling spatial non-stationarity. While GNNWR extends traditional Geographically Weighted Regression by using neural networks to learn spatial weighting functions, existing implementations are often restricted by fixed distance-based schemes and limited inductive bias. We propose to generalize GNNWR by incorporating concepts from convolutional neural networks, recurrent neural networks, and transformers, introducing local receptive fields, sequential context, and self-attention into spatial regression. Through extensive benchmarking on synthetic spatial datasets with varying heterogeneity, noise, and sample sizes, we show that GNNWR outperforms classic methods in capturing nonlinear and complex spatial relationships. Our results also reveal that model performance depends strongly on data characteristics, with local models excelling in highly heterogeneous or small-sample scenarios, and global models performing better with larger, more homogeneous data. These findings highlight the importance of inductive bias in spatial modeling and suggest future directions, including learnable spatial weighting functions, hybrid neural architectures, and improved interpretability for models handling non-stationary spatial data.
Article 583
Title@2025-07-21 (1): Automatic dimensionality reduction of Twin-in-the-Loop Observers
Title: Automatic dimensionality reduction of Twin-in-the-Loop Observers | Automatische Dimensionalitätsreduktion von Twin-in-the-Loop-Beobachtern | 双在洛op观察家的自动维度减少 2401.10945v2 |
Authors (5): Giacomo Delcaro, Riccardo Poli, Federico Dettù, Simone Formentin, Sergio Matteo Savaresi
Conventional vehicle dynamics estimation methods suffer from the drawback of employing independent, separately calibrated filtering modules for each variable. To address this limitation, a recent proposal introduces a unified Twin-in-the-Loop (TiL) Observer architecture. This architecture replaces the simplified control-oriented vehicle model with a full-fledged vehicle simulator (digital twin), and employs a real-time correction mechanism using a linear time-invariant output error law. Bayesian Optimization is utilized to tune the observer due to the simulator’s black-box nature, leading to a high-dimensional optimization problem. This paper focuses on developing a procedure to reduce the observer’s complexity by exploring both supervised and unsupervised learning approaches. The effectiveness of these strategies is validated for longitudinal and lateral vehicle dynamics using real-world data.
Article 584
Title@2025-07-21 (1): Diffusion models for multivariate subsurface generation and efficient probabilistic inversion
Title: Diffusion models for multivariate subsurface generation and efficient probabilistic inversion | Diffusionsmodelle für multivariate Untergrunderzeugung und effiziente probabilistische Inversion | 多变地表下产生和高效概率转换的多变地表下生成扩散模型 2507.15809v1 |
Authors (2): Roberto Miele, Niklas Linde
Diffusion models offer stable training and state-of-the-art performance for deep generative modeling tasks. Here, we consider their use in the context of multivariate subsurface modeling and probabilistic inversion. We first demonstrate that diffusion models enhance multivariate modeling capabilities compared to variational autoencoders and generative adversarial networks. In diffusion modeling, the generative process involves a comparatively large number of time steps with update rules that can be modified to account for conditioning data. We propose different corrections to the popular Diffusion Posterior Sampling approach by Chung et al. (2023). In particular, we introduce a likelihood approximation accounting for the noise-contamination that is inherent in diffusion modeling. We assess performance in a multivariate geological scenario involving facies and correlated acoustic impedance. Conditional modeling is demonstrated using both local hard data (well logs) and nonlinear geophysics (fullstack seismic data). Our tests show significantly improved statistical robustness, enhanced sampling of the posterior probability density function and reduced computational costs, compared to the original approach. The method can be used with both hard and indirect conditioning data, individually or simultaneously. As the inversion is included within the diffusion process, it is faster than other methods requiring an outer-loop around the generative model, such as Markov chain Monte Carlo.
Article 585
Title@2025-07-21 (1): ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction
Title: ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction | ConformalSAM: Das Potenzial von Basissegmentierungsmodellen in semi-überwachter semantischer Segmentierung mit konformer Vorhersage freisetzen | 非正式拼音系统:在半超超语义分解中释放基础分解模型的潜力,同时进行非正式预测 2507.15803v1 |
Authors (7): Danhui Chen, Ziquan Liu, Chuxi Yang, Dan Wang, Yan Yan, Yi Xu, Xiangyang Ji
Pixel-level vision tasks, such as semantic segmentation, require extensive and high-quality annotated data, which is costly to obtain. Semi-supervised semantic segmentation (SSSS) has emerged as a solution to alleviate the labeling burden by leveraging both labeled and unlabeled data through self-training techniques. Meanwhile, the advent of foundational segmentation models pre-trained on massive data has shown the potential to generalize across domains effectively. This work explores whether a foundational segmentation model can address label scarcity in the pixel-level vision task as an annotator for unlabeled images. Specifically, we investigate the efficacy of using SEEM, a Segment Anything Model (SAM) variant fine-tuned for textual input, to generate predictive masks for unlabeled data. To address the shortcomings of using SEEM-generated masks as supervision, we propose ConformalSAM, a novel SSSS framework which first calibrates the foundation model using the target domain’s labeled data and then filters out unreliable pixel labels of unlabeled data so that only high-confidence labels are used as supervision. By leveraging conformal prediction (CP) to adapt foundation models to target data through uncertainty calibration, ConformalSAM reliably exploits the strong capability of the foundational segmentation model, which benefits early-stage learning, while a subsequent self-reliance training strategy mitigates overfitting to SEEM-generated masks in the later training stage. Our experiments demonstrate that, on three standard benchmarks of SSSS, ConformalSAM achieves superior performance compared to recent SSSS methods and helps boost the performance of those methods as a plug-in.
Article 586
Title@2025-07-21 (1): Hypergraphs on high dimensional time series sets using signature transform
Title: Hypergraphs on high dimensional time series sets using signature transform | Hypergraphen auf hochdimensionalen Zeitreihen-Sets mit Signatur-Transformation | 使用签名变换的高维时间序列数 2507.15802v1 |
Authors (2): Rémi Vaucher, Paul Minchella
In recent decades, hypergraphs and their analysis through Topological Data Analysis (TDA) have emerged as powerful tools for understanding complex data structures. Various methods have been developed to construct hypergraphs – referred to as simplicial complexes in the TDA framework – over datasets, enabling the formation of edges between more than two vertices. This paper addresses the challenge of constructing hypergraphs from collections of multivariate time series. While prior work has focused on the case of a single multivariate time series, we extend this framework to handle collections of such time series. Our approach generalizes the method proposed in Chretien et al. by leveraging the properties of signature transforms to introduce controlled randomness, thereby enhancing the robustness of the construction process. We validate our method on synthetic datasets and present promising results.
Article 587
Title@2025-07-21 (1): In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting
Title: In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting | Detaillierte Analyse der Low-Rank-Matrix-Fabrizierung in einem Federated Setting | 深入分析联邦体系中低级别母体因数化 2409.08771v2 |
Authors (3): Constantin Philippenko, Kevin Scaman, Laurent Massoulié
We analyze a distributed algorithm to compute a low-rank matrix factorization on $N$ clients, each holding a local dataset $\mathbf{S}^i \in \mathbb{R}^{n_i \times d}$. Mathematically, we seek to solve $\min_{\mathbf{U}^i \in \mathbb{R}^{n_i\times r}, \mathbf{V}\in \mathbb{R}^{d \times r} } \frac{1}{2} \sum_{i=1}^N \|\mathbf{S}^i - \mathbf{U}^i \mathbf{V}^\top\|^2_{\text{F}}$. Considering a power initialization of $\mathbf{V}$, we rewrite the previous smooth non-convex problem into a smooth strongly-convex problem that we solve using a parallel Nesterov gradient descent potentially requiring a single step of communication at the initialization step. For any client $i$ in $\{1, \dots, N\}$, we obtain a global $\mathbf{V}$ in $\mathbb{R}^{d \times r}$ common to all clients and a local variable $\mathbf{U}^i$ in $\mathbb{R}^{n_i \times r}$. We provide a linear rate of convergence of the excess loss which depends on $\sigma_{\max} / \sigma_{r}$, where $\sigma_{r}$ is the $r^{\mathrm{th}}$ singular value of the concatenation $\mathbf{S}$ of the matrices $(\mathbf{S}^i)_{i=1}^N$. This result improves the rates of convergence given in the literature, which depend on $\sigma_{\max}^2 / \sigma_{\min}^2$. We provide an upper bound on the Frobenius-norm error of reconstruction under the power initialization strategy. We complete our analysis with experiments on both synthetic and real data.
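The structure of the approach, fixing a shared V from a power-style initialization and then solving an independent least-squares problem for each client's local factor, can be sketched in NumPy. This is an illustrative sketch under assumptions: the function names are hypothetical, the QR orthonormalization is added here for numerical convenience, and the local problems are solved in closed form with `np.linalg.lstsq` rather than with the parallel Nesterov gradient descent the abstract describes.

```python
import numpy as np


def power_init(client_data, rank, n_power_iters=1, seed=0):
    """Shared V from a randomized power step on sum_i S_i^T S_i.

    Each client only needs to send the (d x rank) product S_i^T (S_i V).
    """
    rng = np.random.default_rng(seed)
    d = client_data[0].shape[1]
    V = rng.standard_normal((d, rank))
    for _ in range(n_power_iters):
        # Aggregate the local products; this is the single communication step.
        V = sum(S.T @ (S @ V) for S in client_data)
        V, _ = np.linalg.qr(V)  # orthonormalize columns (an added assumption)
    return V


def local_factors(client_data, V):
    """With V fixed, each client solves min_U ||S_i - U V^T||_F^2 on its own."""
    # Least squares in X for V X = S_i^T, then U_i = X^T.
    return [np.linalg.lstsq(V, S.T, rcond=None)[0].T for S in client_data]
```

Once V is fixed, each client's objective is a least-squares problem in its own factor and is strongly convex whenever V has full column rank, which matches the reformulation into a strongly convex problem mentioned above.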
Article 588
Title@2025-07-21 (1): TensorSocket: Shared Data Loading for Deep Learning Training
Title: TensorSocket: Shared Data Loading for Deep Learning Training | TensorSocket: Shared Data Loading für Deep Learning Training | TensorSocket: 用于深学习培训的共享数据加载 2409.18749v3 |
Authors (3): Ties Robroek, Neil Kim Nielsen, Pınar Tözün
Training deep learning models is a repetitive and resource-intensive process. Data scientists often train several models before landing on the set of parameters (e.g., via hyper-parameter tuning) and model architecture (e.g., via neural architecture search), among other things, that yield the highest accuracy. The computational efficiency of these training tasks depends highly on how well the training data is supplied to the training process. The repetitive nature of these tasks results in the same data processing pipelines running over and over, exacerbating the need for and costs of computational resources. In this paper, we present TensorSocket to reduce the computational needs of deep learning training by enabling simultaneous training processes to share the same data loader. TensorSocket mitigates CPU-side bottlenecks in cases where the collocated training workloads have high throughput on GPU, but are held back by lower data-loading throughput on CPU. TensorSocket achieves this by reducing redundant computations and data duplication across collocated training processes and leveraging modern GPU-GPU interconnects. While doing so, TensorSocket is able to train and balance differently-sized models and serve multiple batch sizes simultaneously and is hardware- and pipeline-agnostic in nature. Our evaluation shows that TensorSocket enables scenarios that are infeasible without data sharing, increases training throughput by up to 100%, and when utilizing cloud instances, achieves cost savings of 50% by reducing the hardware resource needs on the CPU side. Furthermore, TensorSocket outperforms the state-of-the-art solutions for shared data loading such as CoorDL and Joader; it is easier to deploy and maintain, and it matches or exceeds their throughput while requiring fewer CPU resources.
Article 589
Title@2025-07-21 (1): Optimizer’s Information Criterion: Dissecting and Correcting Bias in Data-Driven Optimization
Title: Optimizer’s Information Criterion: Dissecting and Correcting Bias in Data-Driven Optimization | Optimizer’s Information Criterion: Dissektion und Korrektur von Bias in der datengesteuerten Optimierung | 优化信息标准:在数据驱动优化中解剖和纠正偏见 2306.10081v4 |
Authors (3): Garud Iyengar, Henry Lam, Tianyu Wang
In data-driven optimization, the sample performance of the obtained decision typically incurs an optimistic bias against the true performance, a phenomenon commonly known as the Optimizer’s Curse and intimately related to overfitting in machine learning. Common techniques to correct this bias, such as cross-validation, require repeatedly solving additional optimization problems and are therefore computationally expensive. We develop a general bias correction approach, building on what we call Optimizer’s Information Criterion (OIC), that directly approximates the first-order bias and does not require solving any additional optimization problems. Our OIC generalizes the celebrated Akaike Information Criterion to evaluate the objective performance in data-driven optimization, which crucially involves not only model fitting but also its interplay with the downstream optimization. As such it can be used for decision selection instead of only model selection. We apply our approach to a range of data-driven optimization formulations comprising empirical and parametric models, their regularized counterparts, and furthermore contextual optimization. Finally, we provide numerical validation on the superior performance of our approach under synthetic and real-world datasets.
Article 590
Title@2025-07-21 (1): Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning
Title: Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning | Kleine LLMs lernen keine verallgemeinerbare Theorie des Geistes durch Verstärkungslernen | 小型LLM无法通过强化学习习得可泛化的心智理论 2507.15788v1 |
Authors (2): Sneheel Sarangi, Hanan Salam
Recent advancements in large language models (LLMs) have demonstrated emergent capabilities in complex reasoning, largely spurred by rule-based Reinforcement Learning (RL) techniques applied during post-training. This has raised the question of whether similar methods can instill more nuanced, human-like social intelligence, such as a Theory of Mind (ToM), in LLMs. This paper investigates whether small-scale LLMs can acquire a robust and generalizable ToM capability through RL with verifiable rewards (RLVR). We conduct a systematic evaluation by training models on various combinations of prominent ToM datasets (HiToM, ExploreToM, FANToM) and testing for generalization on held-out datasets (e.g., OpenToM). Our findings indicate that small LLMs struggle to develop a generic ToM capability. While performance on in-distribution tasks improves, this capability fails to transfer to unseen ToM tasks with different characteristics. Furthermore, we demonstrate that prolonged RL training leads to models “hacking” the statistical patterns of the training datasets, resulting in significant performance gains on in-domain data but no change in, or even degradation of, performance on out-of-distribution tasks. This suggests the learned behavior is a form of narrow overfitting rather than the acquisition of a true, abstract ToM capability.
Article 591
Title@2025-07-21 (1): Graph Attention Specialized Expert Fusion Model for Node Classification: Based on Cora and Pubmed Datasets
Title: Graph Attention Specialized Expert Fusion Model for Node Classification: Based on Cora and Pubmed Datasets | Grafik-Achtung Spezialisiertes Experten-Fusionsmodell für Knotenklassifikation: Basierend auf Cora- und Pubmed-Datensätzen | 节点分类专门专家融合模型:以科拉和普米德数据集为基础 2507.15784v1 |
Authors (2): Zihang Ma, Qitian Yin
Graph node classification is a fundamental task in graph neural networks (GNNs), aiming to assign predefined class labels to nodes. On the PubMed citation network dataset, we observe significant classification difficulty disparities, with Category 2 achieving only 74.4% accuracy with a traditional GCN, 7.5% lower than Category 1. To address this, we propose a Wasserstein-Rubinstein (WR) distance enhanced Expert Fusion Model (WR-EFM), training specialized GNN models for Categories 0/1 (with layer normalization and residual connections) and Multi-hop Graph Attention Networks (GAT) for Category 2. The WR distance metric optimizes representation similarity between models, particularly focusing on improving Category 2 performance. Our adaptive fusion strategy dynamically weights models based on category-specific performance, with Category 2 assigned a GAT weight of 0.8. WR distance further guides the fusion process by measuring distributional differences between model representations, enabling more principled integration of complementary features. Experimental results show WR-EFM achieves balanced accuracy across categories: 77.8% (Category 0), 78.0% (Category 1), and 79.9% (Category 2), outperforming both single models and standard fusion approaches. The coefficient of variation (CV) of WR-EFM’s category accuracies is 0.013, 77.6% lower than GCN’s 0.058, demonstrating superior stability. Notably, WR-EFM improves Category 2 accuracy by 5.5% compared to GCN, verifying the effectiveness of WR-guided fusion in capturing complex structural patterns. This work provides a novel paradigm for handling class-imbalanced graph classification tasks. To support the research community, we release our project at https://github.com/s010m00n/GASEM4NC.
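The stability figures quoted in this abstract can be checked with a short calculation. The sketch below is illustrative only: it assumes the coefficient of variation is the population standard deviation divided by the mean (the abstract does not say which convention is used), and the helper names are hypothetical. Recomputing the CV from the rounded per-category accuracies gives a value close to, but not necessarily identical to, the reported 0.013.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed convention)."""
    return pstdev(values) / mean(values)


def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Per-category accuracies reported in the abstract (Categories 0, 1, 2).
wr_efm_acc = [0.778, 0.780, 0.799]
cv = coefficient_of_variation(wr_efm_acc)

# CVs as reported in the abstract: 0.058 for GCN and 0.013 for WR-EFM.
reduction_pct = 100 * relative_reduction(0.058, 0.013)

print(f"CV recomputed from rounded accuracies: {cv:.4f}")
print(f"Relative CV reduction from reported values: {reduction_pct:.1f}%")
```

The relative reduction computed from the two reported CVs agrees with the 77.6% figure stated in the abstract.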
Article 592
Title@2025-07-21 (1): Dissociating model architectures from inference computations
Title: Dissociating model architectures from inference computations | Trennen von Modellarchitekturen von Inferenzberechnungen | 将模型结构与推断计算分离 2507.15776v1 |
Authors (2): Noor Sajid, Johan Medrano
Parr et al. (2025) examine how auto-regressive and deep temporal models differ in their treatment of non-Markovian sequence modelling. Building on this, we highlight the need for dissociating model architectures, i.e., how the predictive distribution factorises, from the computations invoked at inference. We demonstrate that deep temporal computations are mimicked by autoregressive models by structuring context access during iterative inference. Using a transformer trained on next-token prediction, we show that inducing hierarchical temporal factorisation during iterative inference maintains predictive capacity while instantiating fewer computations. This emphasises that processes for constructing and refining predictions are not necessarily bound to their underlying model architectures.
Article 593
Title@2025-07-21 (1): Dynamics is what you need for time-series forecasting!
Title: Dynamics is what you need for time-series forecasting! | Dynamics ist das, was Sie für die Zeitreihenvorhersage brauchen! | 时间序列预测需要动力! 2507.15774v1 |
Authors (3): Alexis-Raja Brachet, Pierre-Yves Richard, Céline Hudelot
While boundaries between data modalities are vanishing, the usual successful deep models are still challenged by simple ones in the time-series forecasting task. Our hypothesis is that this task needs models that are able to learn the dynamics underlying the data. We propose to validate it through both systemic and empirical studies. We develop an original $\texttt{PRO-DYN}$ nomenclature to analyze existing models through the lens of dynamics. Two observations thus emerged: (1) under-performing architectures learn dynamics at most partially; (2) the location of the dynamics block at the model end is of prime importance. We conduct extensive experiments to confirm our observations on a set of performance-varying models with diverse backbones. Results support the need to incorporate a learnable dynamics block and its use as the final predictor.
Article 594
Title@2025-07-21 (1): Deep-Learning Investigation of Vibrational Raman Spectra for Plant-Stress Analysis
Title: Deep-Learning Investigation of Vibrational Raman Spectra for Plant-Stress Analysis | Deep-Learning-Untersuchung von Vibrations-Raman-Spektren für Pflanzen-Stress-Analysen | 用于植物压力分析的振动性拉曼-斯佩特拉深度学习调查 2507.15772v1 |
Authors (9): Anoop C. Patil, Benny Jian Rong Sng, Yu-Wei Chang, Joana B. Pereira, Chua Nam-Hai, Rajani Sarojam, Gajendra Pratap Singh, In-Cheol Jang, Giovanni Volpe
Detecting stress in plants is crucial for both open-farm and controlled-environment agriculture. Biomolecules within plants serve as key stress indicators, offering vital markers for continuous health monitoring and early disease detection. Raman spectroscopy provides a powerful, non-invasive means to quantify these biomolecules through their molecular vibrational signatures. However, traditional Raman analysis relies on customized data-processing workflows that require fluorescence background removal and prior identification of Raman peaks of interest, introducing potential biases and inconsistencies. Here, we introduce DIVA (Deep-learning-based Investigation of Vibrational Raman spectra for plant-stress Analysis), a fully automated workflow based on a variational autoencoder. Unlike conventional approaches, DIVA processes native Raman spectra, including fluorescence backgrounds, without manual preprocessing, identifying and quantifying significant spectral features in an unbiased manner. We applied DIVA to detect a range of plant stresses, including abiotic (shading, high light intensity, high temperature) and biotic stressors (bacterial infections). By integrating deep learning with vibrational spectroscopy, DIVA paves the way for AI-driven plant health assessment, fostering more resilient and sustainable agricultural practices.
Article 595
Title@2025-07-21 (1): Multi-Modal Sensor Fusion for Proactive Blockage Prediction in mmWave Vehicular Networks
Title: Multi-Modal Sensor Fusion for Proactive Blockage Prediction in mmWave Vehicular Networks | Multi-Modal Sensor Fusion für proaktive Blockierungsvorhersage in mmWave Vehicular Networks | 毫米波车载网络中用于主动阻塞预测的多模态传感器融合 2507.15769v1 |
Authors (6): Ahmad M. Nazar, Abdulkadir Celik, Mohamed Y. Selim, Asmaa Abdallah, Daji Qiao, Ahmed M. Eltawil
Vehicular communication systems operating in the millimeter wave (mmWave) band are highly susceptible to signal blockage from dynamic obstacles such as vehicles, pedestrians, and infrastructure. To address this challenge, we propose a proactive blockage prediction framework that utilizes multi-modal sensing, including camera, GPS, LiDAR, and radar inputs in an infrastructure-to-vehicle (I2V) setting. This approach uses modality-specific deep learning models to process each sensor stream independently and fuses their outputs using a softmax-weighted ensemble strategy based on validation performance. Our evaluations, for predictions up to 1.5 s in advance, show that the camera-only model achieves the best standalone trade-off with an F1-score of 97.1% and an inference time of 89.8 ms. A camera+radar configuration further improves accuracy to 97.2% F1 at 95.7 ms. Our results demonstrate the effectiveness and efficiency of multi-modal sensing for mmWave blockage prediction and provide a pathway for proactive wireless communication in dynamic environments.
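To make the fusion step concrete, here is a minimal sketch of a softmax-weighted ensemble in which each modality's weight is derived from its validation score. The validation F1 values, per-modality probabilities, temperature parameter, and decision threshold are all hypothetical placeholders; the abstract does not specify how the authors parameterize the softmax.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(probabilities, val_scores, temperature=1.0):
    """Weighted average of per-modality blockage probabilities.

    Weights come from a softmax over validation scores, so modalities
    that validated better contribute more to the fused prediction.
    """
    weights = softmax(val_scores, temperature)
    return sum(w * p for w, p in zip(weights, probabilities))


# Hypothetical validation F1-scores for camera, GPS, LiDAR, radar.
val_f1 = [0.97, 0.80, 0.90, 0.92]
# Hypothetical per-modality probabilities that a blockage will occur.
probs = [0.9, 0.4, 0.7, 0.8]

weights = softmax(val_f1)
fused = fuse(probs, val_f1)
is_blocked = fused >= 0.5  # hypothetical decision threshold
```

A lower temperature sharpens the weights toward the best-validating modality, while a higher one approaches a plain average; which regime works best is an empirical question this sketch does not answer.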
Article 596
Title@2025-07-21 (1): Quantum Learning Theory Beyond Batch Binary Classification
Title: Quantum Learning Theory Beyond Batch Binary Classification | Quanten-Lern-Theorie jenseits der Batch Binary Klassifikation | 超出批次二进制分类的量子学习理论 2302.07409v5 |
Authors (2): Preetham Mohan, Ambuj Tewari
Arunachalam and de Wolf (2018) showed that the sample complexity of quantum batch learning of boolean functions, in the realizable and agnostic settings, has the same form and order as the corresponding classical sample complexities. In this paper, we extend this, ostensibly surprising, message to batch multiclass learning, online boolean learning, and online multiclass learning. For our online learning results, we first consider an adaptive adversary variant of the classical model of Dawid and Tewari (2022). Then, we introduce the first (to the best of our knowledge) model of online learning with quantum examples.
Article 597
Title@2025-07-21 (1): Predictive Planner for Autonomous Driving with Consistency Models
Title: Predictive Planner for Autonomous Driving with Consistency Models | Predictive Planer für autonomes Fahren mit konsistenten Modellen | 与一致性模式一致自主驾驶的预测规划员 2502.08033v3 |
Authors (5): Anjian Li, Sangjae Bae, David Isele, Ryne Beeson, Faizan M. Tariq
Trajectory prediction and planning are essential for autonomous vehicles to navigate safely and efficiently in dynamic environments. Traditional approaches often treat them separately, limiting the ability for interactive planning. While recent diffusion-based generative models have shown promise in multi-agent trajectory generation, their slow sampling is less suitable for high-frequency planning tasks. In this paper, we leverage the consistency model to build a predictive planner that samples from a joint distribution of ego and surrounding agents, conditioned on the ego vehicle’s navigational goal. Trained on real-world human driving datasets, our consistency model generates higher-quality trajectories with fewer sampling steps than standard diffusion models, making it more suitable for real-time deployment. To enforce multiple planning constraints simultaneously on the ego trajectory, a novel online guided sampling approach inspired by the Alternating Direction Method of Multipliers (ADMM) is introduced. Evaluated on the Waymo Open Motion Dataset (WOMD), our method enables proactive behavior such as nudging and yielding, and also demonstrates smoother, safer, and more efficient trajectories and satisfaction of multiple constraints under a limited computational budget.
Article 598
Title@2025-07-21 (1): Reciprocity-Aware Convolutional Neural Networks for Map-Based Path Loss Prediction
Title: Reciprocity-Aware Convolutional Neural Networks for Map-Based Path Loss Prediction | Reziprocity-Aware Convolutional Neural Networks for Map-Based Path Loss Prediction | 地图路径损耗预测对等天体对流神经网络 2504.03625v2 |
Authors (3): Ryan G. Dempsey, Jonathan Ethier, Halim Yanikomeroglu
Path loss modeling is a widely used technique for estimating point-to-point losses along a communications link from transmitter (Tx) to receiver (Rx). Accurate path loss predictions can optimize use of the radio frequency spectrum and minimize unwanted interference. Modern path loss modeling often leverages data-driven approaches, using machine learning to train models on drive test measurement datasets. Drive tests primarily represent downlink scenarios, where the Tx is located on a building and the Rx is located on a moving vehicle. Consequently, trained models are frequently reserved for downlink coverage estimation, lacking representation of uplink scenarios. In this paper, we demonstrate that data augmentation can be used to train a path loss model that is generalized to uplink, downlink, and backhaul scenarios, using only downlink drive test measurements for training. By adding a small number of synthetic samples representing uplink scenarios to the training set, root mean squared error is reduced by more than 8 dB on uplink examples in the test set.
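One way to picture the augmentation described here is to create synthetic uplink samples by swapping the Tx and Rx roles of measured downlink samples while keeping the measured path loss, which is what channel reciprocity suggests. The sketch below is a toy illustration under that assumption: the `LinkSample` fields and the helper names are hypothetical, and the paper's model operates on map-based inputs rather than the scalar features used here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LinkSample:
    """Hypothetical scalar description of one measured link."""
    tx_height_m: float
    rx_height_m: float
    distance_m: float
    path_loss_db: float


def reciprocal(sample):
    """Swap Tx and Rx roles; path loss is assumed unchanged by reciprocity."""
    return replace(
        sample,
        tx_height_m=sample.rx_height_m,
        rx_height_m=sample.tx_height_m,
    )


def augment_with_uplink(samples, n_synthetic):
    """Append reciprocal copies of the first `n_synthetic` samples."""
    return list(samples) + [reciprocal(s) for s in samples[:n_synthetic]]


# Hypothetical downlink measurements: Tx on a building, Rx on a vehicle.
downlink = [
    LinkSample(30.0, 1.5, 200.0 * (i + 1), 100.0 + 5.0 * i) for i in range(10)
]
training_set = augment_with_uplink(downlink, n_synthetic=2)
```

Only a small number of synthetic samples is added, mirroring the abstract's description; how many are needed in practice is not something this sketch can establish.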
Article 599
Title@2025-07-21 (1): Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models
Title: Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models | Steuerung in neue Einbettungsräume: Analyse der Cross-Lingual Alignment Induziert durch Modellinterventionen in mehrsprachigen Sprachmodellen | 指导进入新嵌入空间:分析多语文模式示范干预措施所引出的不同语言之间的横向一致 2502.15639v2 |
Authors (6): Anirudh Sundar, Sinead Williamson, Katherine Metcalf, Barry-John Theobald, Skyler Seto, Masha Fedzechkina
Aligned representations across languages are a desired property in multilingual large language models (mLLMs), as alignment can improve performance in cross-lingual tasks. Typically, alignment requires fine-tuning a model, which is computationally expensive, and sizable language data, which often may not be available. A data-efficient alternative to fine-tuning is model interventions, a method for manipulating model activations to steer generation into the desired direction. We analyze the effect of a popular intervention (finding experts) on the alignment of cross-lingual representations in mLLMs. We identify the neurons to manipulate for a given language and introspect the embedding space of mLLMs pre- and post-manipulation. We show that modifying the mLLM’s activations changes its embedding space such that cross-lingual alignment is enhanced. Further, we show that the changes to the embedding space translate into improved downstream performance on retrieval tasks, with up to 2x improvements in top-1 accuracy on cross-lingual retrieval.
Article 600
Title@2025-07-21 (1): Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography
Title: Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography | Vergleichende Auswertung von Radiomik und Deep-Learning-Modellen zur Erkennung von Krankheiten in der Brustradiographie | 比较评价用于在胸针射电摄影中检测疾病辐射学和深学习模型的比较评价 2504.12249v3 |
Authors (2): Zhijin He, Alan B. McMillan
The application of artificial intelligence (AI) in medical imaging has revolutionized diagnostic practices, enabling advanced analysis and interpretation of radiological data. This study presents a comprehensive evaluation of radiomics-based and deep learning-based approaches for disease detection in chest radiography, focusing on COVID-19, lung opacity, and viral pneumonia. While deep learning models, particularly convolutional neural networks and vision transformers, learn directly from image data, radiomics-based models extract handcrafted features, offering potential advantages in data-limited scenarios. We systematically compared the diagnostic performance of various AI models, including Decision Trees, Gradient Boosting, Random Forests, Support Vector Machines, and Multi-Layer Perceptrons for radiomics, against state-of-the-art deep learning models such as InceptionV3, EfficientNetL, and ConvNeXtXLarge. Performance was evaluated across multiple sample sizes. At 24 samples, EfficientNetL achieved an AUC of 0.839, outperforming SVM (AUC = 0.762). At 4000 samples, InceptionV3 achieved the highest AUC of 0.996, compared to 0.885 for Random Forest. A Scheirer-Ray-Hare test confirmed significant main and interaction effects of model type and sample size on all metrics. Post hoc Mann-Whitney U tests with Bonferroni correction further revealed consistent performance advantages for deep learning models across most conditions. These findings provide statistically validated, data-driven recommendations for model selection in diagnostic AI. Deep learning models demonstrated higher performance and better scalability with increasing data availability, while radiomics-based models may remain useful in low-data contexts. This study addresses a critical gap in AI-based diagnostic research by offering practical guidance for deploying AI models across diverse clinical environments.
Article 601
Title@2025-07-21 (1): Towards physician-centered oversight of conversational diagnostic AI
Title: Towards physician-centered oversight of conversational diagnostic AI | Auf dem Weg zur ärztlichen Aufsicht über gesprächsdiagnostische KI | 致力于以医生为中心对谈话诊断进行监督 AI 2507.15743v1 |
Authors (35): Elahe Vedadi, David Barrett, Natalie Harris, Ellery Wulczyn, Shashir Reddy, Roma Ruparel, Mike Schaekermann, Tim Strother, Ryutaro Tanno, Yash Sharma, Jihyeon Lee, Cían Hughes, Dylan Slack, Anil Palepu, Jan Freyberg, Khaled Saab, Valentin Liévin, Wei-Hung Weng, Tao Tu, Yun Liu, Nenad Tomasev, Kavita Kulkarni, S. Sara Mahdavi, Kelvin Guu, Joëlle Barral, Dale R. Webster, James Manyika, Avinatan Hassidim, Katherine Chou, Yossi Matias, Pushmeet Kohli, Adam Rodman, Vivek Natarajan, Alan Karthikesalingam, David Stutz
Recent work has demonstrated the promise of conversational AI systems for diagnostic dialogue. However, real-world assurance of patient safety means that providing individual diagnoses and treatment plans is considered a regulated activity by licensed professionals. Furthermore, physicians commonly oversee other team members in such activities, including nurse practitioners (NPs) or physician assistants/associates (PAs). Inspired by this, we propose a framework for effective, asynchronous oversight of the Articulate Medical Intelligence Explorer (AMIE) AI system. We propose guardrailed-AMIE (g-AMIE), a multi-agent system that performs history taking within guardrails, abstaining from individualized medical advice. Afterwards, g-AMIE conveys assessments to an overseeing primary care physician (PCP) in a clinician cockpit interface. The PCP provides oversight and retains accountability of the clinical decision. This effectively decouples oversight from intake and can thus happen asynchronously. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) of text consultations with asynchronous oversight, we compared g-AMIE to NPs/PAs or a group of PCPs under the same guardrails. Across 60 scenarios, g-AMIE outperformed both groups in performing high-quality intake, summarizing cases, and proposing diagnoses and management plans for the overseeing PCP to review. This resulted in higher quality composite decisions. PCP oversight of g-AMIE was also more time-efficient than standalone PCP consultations in prior work. While our study does not replicate existing clinical practices and likely underestimates clinicians’ capabilities, our results demonstrate the promise of asynchronous oversight as a feasible paradigm for diagnostic AI systems to operate under expert human oversight for enhancing real-world care.
Article 602
Title@2025-07-21 (1): Conformal and kNN Predictive Uncertainty Quantification Algorithms in Metric Spaces
Title: Conformal and kNN Predictive Uncertainty Quantification Algorithms in Metric Spaces | Konforme und kNN Predictive Uncertainty Quantification Algorithmen in Metric Spaces | 计量空间中正规和kNN 预测不确定性的量化数值 2507.15741v1 |
Authors (2): Gábor Lugosi, Marcos Matabuena
This paper introduces a framework for uncertainty quantification in regression models defined in metric spaces. Leveraging a newly defined notion of homoscedasticity, we develop a conformal prediction algorithm that offers finite-sample coverage guarantees and fast convergence rates of the oracle estimator. In heteroscedastic settings, we forgo these non-asymptotic guarantees to gain statistical efficiency, proposing a local $k$-nearest-neighbor method without conformal calibration that is adaptive to the geometry of each particular nonlinear space. Both procedures work with any regression algorithm and are scalable to large data sets, allowing practitioners to plug in their preferred models and incorporate domain expertise. We prove consistency for the proposed estimators under minimal conditions. Finally, we demonstrate the practical utility of our approach in personalized-medicine applications involving random response objects such as probability distributions and graph Laplacians.
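As a rough illustration of conformal prediction with metric-space responses, the sketch below applies standard split conformal calibration with the distance between the observed response and the model's prediction as the nonconformity score, which yields a prediction ball around each new prediction. This is the textbook construction, assumed here for illustration and not necessarily the authors' exact algorithm; the circular metric, the toy predictor, and the calibration errors are all hypothetical.

```python
import math


def conformal_radius(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(cal_scores)[k - 1]


def in_prediction_ball(y, center, radius, dist):
    """Membership test for the ball {y : dist(y, center) <= radius}."""
    return dist(y, center) <= radius


def circle_dist(a, b):
    """Geodesic distance between two angles on the unit circle."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def predict(x):
    """Hypothetical fitted regression model with angular output."""
    return x % (2 * math.pi)


# Hypothetical calibration set: responses are predictions plus small errors.
errors = [0.05 * i for i in range(-9, 10)]
cal_x = [0.3 * i for i in range(len(errors))]
cal_y = [(x + e) % (2 * math.pi) for x, e in zip(cal_x, errors)]

scores = [circle_dist(y, predict(x)) for x, y in zip(cal_x, cal_y)]
radius = conformal_radius(scores, alpha=0.1)

# Prediction set for a new input: a ball of that radius around the prediction.
covered = in_prediction_ball(1.05, predict(1.0), radius, circle_dist)
```

Because only a distance function is required, the same calibration step applies to other response spaces, such as probability distributions under a Wasserstein metric, provided a suitable `dist` is supplied.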
Article 603
Title@2025-07-21 (1): Competitive Algorithms for Cooperative Multi-Agent Ski-Rental Problems
Title: Competitive Algorithms for Cooperative Multi-Agent Ski-Rental Problems | Wettbewerbsfähige Algorithmen für kooperative Multi-Agenten-Ski-Mietprobleme | 合作性多机构天空-天空问题的竞争价值 2507.15727v1 |
Authors (6): Xuchuang Wang, Bo Sun, Hedyeh Beyhaghi, John C. S. Lui, Mohammad Hajiesmaili, Adam Wierman
This paper introduces a novel multi-agent ski-rental problem that generalizes the classical ski-rental dilemma to a group setting where agents incur individual and shared costs. In our model, each agent can either rent at a fixed daily cost, or purchase a pass at an individual cost, with an additional third option of a discounted group pass available to all. We consider scenarios in which agents’ active days differ, leading to dynamic states as agents drop out of the decision process. To address this problem from different perspectives, we define three distinct competitive ratios: overall, state-dependent, and individual rational. For each objective, we design and analyze optimal deterministic and randomized policies. Our deterministic policies employ state-aware threshold functions that adapt to the dynamic states, while our randomized policies sample and resample thresholds from tailored state-aware distributions. The analysis reveals that symmetric policies, in which all agents use the same threshold, outperform asymmetric ones. Our results provide competitive ratio upper and lower bounds and extend classical ski-rental insights to multi-agent settings, highlighting both theoretical and practical implications for group decision-making under uncertainty.
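For context, the classical single-agent ski-rental problem that this work generalizes can be sketched in a few lines. The code below assumes a unit daily rental cost and evaluates deterministic threshold policies ("rent until day t, then buy"); it covers only the classical baseline, not the multi-agent group-pass setting or the state-aware thresholds of the paper, and the function names are hypothetical.

```python
def threshold_policy_cost(active_days, buy_cost, threshold):
    """Rent for the first `threshold - 1` days, buy on day `threshold` if still active."""
    if active_days < threshold:
        return active_days  # rented every day at unit cost
    return (threshold - 1) + buy_cost


def offline_optimum(active_days, buy_cost):
    """Best cost when the number of active days is known in advance."""
    return min(active_days, buy_cost)


def worst_case_ratio(buy_cost, threshold, horizon):
    """Largest online-to-offline cost ratio over all season lengths up to `horizon`."""
    return max(
        threshold_policy_cost(d, buy_cost, threshold) / offline_optimum(d, buy_cost)
        for d in range(1, horizon + 1)
    )


buy_cost = 10
# Break-even policy: buy on the day cumulative rent would reach the purchase price.
break_even_ratio = worst_case_ratio(buy_cost, threshold=buy_cost, horizon=100)
best_ratio = min(worst_case_ratio(buy_cost, t, 100) for t in range(1, 31))
```

With a purchase price of B rental days, the break-even threshold attains the well-known deterministic competitive ratio of 2 - 1/B, and no other threshold in the scanned range does better.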
Article 604
Title@2025-07-21 (1): A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation
Title: A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation | Eine Überprüfung der Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation | 对贝叶斯不确定因素在深概率图像分割中量化的回顾 2411.16370v6 |
Authors (5): M. M. A. Valiuddin, R. J. G. van Sloun, C. G. A. Viviers, P. H. N. de With, F. van der Sommen
Advances in architectural design, data availability, and compute have driven remarkable progress in semantic segmentation. Yet, these models often rely on relaxed Bayesian assumptions, omitting critical uncertainty information needed for robust decision-making. The resulting reliance on point estimates has fueled interest in probabilistic segmentation, but the literature remains fragmented. In response, this review consolidates and contextualizes foundational concepts in uncertainty modeling, including the non-trivial task of distinguishing between epistemic and aleatoric uncertainty and examining their roles across four key downstream segmentation tasks, highlighting Active Learning as particularly promising. By unifying theory, terminology, and applications, we provide a coherent foundation for researchers and identify critical challenges, such as strong assumptions in spatial aggregation, lack of standardized benchmarks, and pitfalls in current uncertainty quantification methods. We identify trends such as the adoption of contemporary generative models, driven by advances in the broader field of generative modeling, with segmentation-specific innovation primarily in the conditioning mechanisms. Moreover, we observe growing interest in distribution- and sampling-free approaches to uncertainty estimation. We further propose directions for advancing uncertainty-aware segmentation in deep learning, including pragmatic strategies for disentangling different sources of uncertainty, novel uncertainty modeling approaches and improved Transformer-based backbones. In this way, we aim to support the development of more reliable, efficient, and interpretable segmentation models that effectively incorporate uncertainty into real-world applications.
Article 605
Title@2025-07-21 (1): Explainable Anomaly Detection for Electric Vehicles Charging Stations
Title: Explainable Anomaly Detection for Electric Vehicles Charging Stations | Erklärbare Anomalieerkennung für Elektroautos Ladestationen | 电动车辆充电站可解释异常探测 2507.15718v1 |
Authors (7): Matteo Cederle, Andrea Mazzucco, Andrea Demartini, Eugenio Mazza, Eugenia Suriani, Federico Vitti, Gian Antonio Susto
Electric vehicle (EV) charging stations are one of the critical infrastructures needed to support the transition to renewable-energy-based mobility, but ensuring their reliability and efficiency requires effective anomaly detection to identify irregularities in charging behavior. However, in such a productive scenario, it is also crucial to determine the underlying cause behind the detected anomalies. To achieve this goal, this study investigates unsupervised anomaly detection techniques for EV charging infrastructure, integrating eXplainable Artificial Intelligence techniques to enhance interpretability and uncover root causes of anomalies. Using real-world sensor and charging session data, this work applies Isolation Forest to detect anomalies and employs the Depth-based Isolation Forest Feature Importance (DIFFI) method to identify the most important features contributing to such anomalies. The efficacy of the proposed approach is evaluated in a real industrial case.
Article 606
Title@2025-07-21 (1): Model-Based Exploration in Monitored Markov Decision Processes
Title: Model-Based Exploration in Monitored Markov Decision Processes | Modellbasierte Exploration in überwachten Markov-Entscheidungsprozessen | 在监测的Markov决策过程中进行基于模型的探索 2502.16772v6 |
Authors (4): Alireza Kazemipour, Simone Parisi, Matthew E. Taylor, Michael Bowling
A tenet of reinforcement learning is that the agent always observes rewards. However, this is not true in many realistic settings, e.g., a human observer may not always be available to provide rewards, sensors may be limited or malfunctioning, or rewards may be inaccessible during deployment. Monitored Markov decision processes (Mon-MDPs) have recently been proposed to model such settings. However, existing Mon-MDP algorithms have several limitations: they do not fully exploit the problem structure, cannot leverage a known monitor, lack worst-case guarantees for ‘unsolvable’ Mon-MDPs without specific initialization, and offer only asymptotic convergence proofs. This paper makes three contributions. First, we introduce a model-based algorithm for Mon-MDPs that addresses these shortcomings. The algorithm employs two instances of model-based interval estimation: one to ensure that observable rewards are reliably captured, and another to learn the minimax-optimal policy. Second, we empirically demonstrate the advantages. We show faster convergence than prior algorithms in over four dozen benchmarks, and even more dramatic improvement when the monitoring process is known. Third, we present the first finite-sample bound on performance. We show convergence to a minimax-optimal policy even when some rewards are never observable.
Article 607
Title@2025-07-21 (1): Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents
Title: Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents | Groß gewinnen mit kleinen Modellen: Wissensdestillation vs. Selbsttraining zur Reduktion der Halluzination in Produkt-QA-Agenten | 以小型模型赢得大奖:知识蒸馏与减少产品质量保证剂中幻觉的自我培训 2502.19545v2 |
Authors (6): Ashley Lewis, Michael White, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang
The deployment of Large Language Models (LLMs) in customer support is constrained by hallucination (generating false information) and the high cost of proprietary models. To address these challenges, we propose a retrieval-augmented question-answering (QA) pipeline and explore how to balance human input and automation. Using a dataset of questions about a Samsung Smart TV user manual, we demonstrate that synthetic data generated by LLMs outperforms crowdsourced data in reducing hallucination in finetuned models. We also compare self-training (fine-tuning models on their own outputs) and knowledge distillation (fine-tuning on stronger models’ outputs, e.g., GPT-4o), and find that self-training achieves comparable hallucination reduction. We conjecture that this surprising finding can be attributed to increased exposure bias issues in the knowledge distillation case and support this conjecture with post hoc analysis. We also improve robustness to unanswerable questions and retrieval failures with contextualized “I don’t know” responses. These findings show that scalable, cost-efficient QA systems can be built using synthetic data and self-training with open-source models, reducing reliance on proprietary tools or costly human annotations.
Article 608
Title@2025-07-21 (1): CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models
Title: CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models | CoLD: Counterfactually-Führungslängen-Debiasing für Prozess-Reward-Modelle | CoLD: 反事实引导进程奖励模型的长度偏差 2507.15698v1 |
Authors (7): Congmin Zheng, Jiachen Zhu, Jianghao Lin, Xinyi Dai, Yong Yu, Weinan Zhang, Mengyue Yang
Process Reward Models (PRMs) play a central role in evaluating and guiding multi-step reasoning in large language models (LLMs), especially for mathematical problem solving. However, we identify a pervasive length bias in existing PRMs: they tend to assign higher scores to longer reasoning steps, even when the semantic content and logical validity are unchanged. This bias undermines the reliability of reward predictions and leads to overly verbose outputs during inference. To address this issue, we propose CoLD (Counterfactually-Guided Length Debiasing), a unified framework that mitigates length bias through three components: an explicit length-penalty adjustment, a learned bias estimator trained to capture spurious length-related signals, and a joint training strategy that enforces length-invariance in reward predictions. Our approach is grounded in counterfactual reasoning and informed by causal graph analysis. Extensive experiments on MATH500 and GSM-Plus show that CoLD consistently reduces reward-length correlation, improves accuracy in step selection, and encourages more concise, logically valid reasoning. These results demonstrate the effectiveness and practicality of CoLD in improving the fidelity and robustness of PRMs.
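The first of the three components, an explicit length penalty, and the reward-length correlation used to diagnose the bias can be illustrated with a toy calculation. The sketch below assumes a simple linear penalty with a fixed coefficient `alpha`; the scores, step lengths, and coefficient are hypothetical, and the learned bias estimator and joint training strategy are not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def length_penalized(scores, lengths, alpha):
    """Subtract a linear length penalty from each step's reward score."""
    return [s - alpha * n for s, n in zip(scores, lengths)]


# Hypothetical step lengths (tokens) and raw PRM scores inflated by length.
lengths = [10, 20, 30, 40, 50]
raw_scores = [0.64, 0.58, 0.82, 0.71, 0.85]

alpha = 0.004  # hypothetical penalty coefficient
debiased = length_penalized(raw_scores, lengths, alpha)

corr_before = pearson(raw_scores, lengths)
corr_after = pearson(debiased, lengths)
```

In this toy data the penalty lowers the correlation between score and length without removing it entirely, which is one reason a fixed linear penalty alone may be insufficient.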
Article 609
Title@2025-07-21 (1): Gradient-Guided Annealing for Domain Generalization
Title: Gradient-Guided Annealing for Domain Generalization | Gradient-Guided Annealing für Domain Generalization | 域通用化的渐渐引导安纳林 2502.20162v7 |
Authors (2): Aristotelis Ballas, Christos Diou
Domain Generalization (DG) research has gained considerable traction as of late, since the ability to generalize to unseen data distributions is a requirement that eludes even state-of-the-art training algorithms. In this paper we observe that the initial iterations of model training play a key role in domain generalization effectiveness, since the loss landscape may be significantly different across the training and test distributions, contrary to the case of i.i.d. data. Conflicts between gradients of the loss components of each domain lead the optimization procedure to undesirable local minima that do not capture the domain-invariant features of the target classes. We propose alleviating domain conflicts in model optimization, by iteratively annealing the parameters of a model in the early stages of training and searching for points where gradients align between domains. By discovering a set of parameter values where gradients are updated towards the same direction for each data distribution present in the training set, the proposed Gradient-Guided Annealing (GGA) algorithm encourages models to seek out minima that exhibit improved robustness against domain shifts. The efficacy of GGA is evaluated on five widely accepted and challenging image classification domain generalization benchmarks, where its use alone is able to establish highly competitive or even state-of-the-art performance. Moreover, when combined with previously proposed domain-generalization algorithms it is able to consistently improve their effectiveness by significant margins.
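The notion of gradients "aligning between domains" can be made concrete with a pairwise cosine-similarity check on per-domain gradients. The sketch below is only an illustration of that agreement test: the gradient vectors and the zero threshold are hypothetical, and the annealing search that GGA performs around the current parameters is not reproduced.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gradients_agree(domain_grads, min_cosine=0.0):
    """True if every pair of per-domain gradients points in a similar direction."""
    return all(
        cosine(g, h) > min_cosine for g, h in combinations(domain_grads, 2)
    )


# Hypothetical per-domain loss gradients at two candidate parameter points.
aligned = [[1.0, 0.5], [0.8, 0.9], [0.6, 0.2]]
conflicting = [[1.0, 0.5], [-0.9, 0.1], [0.6, 0.2]]

print(gradients_agree(aligned))      # all pairwise cosines are positive
print(gradients_agree(conflicting))  # the second domain opposes the others
```

A search procedure could use such a test to prefer parameter points where a single update step reduces the loss on every training domain at once.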
Article 610
Title@2025-07-21 (1): Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport
Title: Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport | Verständnis der Ausbildung von unendlich tiefen und breiten ResNets mit Conditional Optimal Transport | 理解如何以有条件最佳运输方式培训无限深和宽的ResNet 2403.12887v2 |
Authors (3): Raphaël Barboni, Gabriel Peyré, François-Xavier Vialard
We study the convergence of gradient flow for the training of deep neural networks. Although Residual Neural Networks are a popular example of very deep architectures, their training constitutes a challenging optimization problem, notably due to the non-convexity and non-coercivity of the objective. Yet, in applications, those tasks are successfully solved by simple optimization algorithms such as gradient descent. To better understand this phenomenon, we focus here on a "mean-field" model of infinitely deep and arbitrarily wide ResNet, parameterized by probability measures over the product set of layers and parameters and with constant marginal on the set of layers. Indeed, in the case of shallow neural networks, mean-field models have proven to benefit from simplified loss landscapes and good theoretical guarantees when trained with gradient flow for the Wasserstein metric on the set of probability measures. Motivated by this approach, we propose to train our model with gradient flow w.r.t. the conditional Optimal Transport distance: a restriction of the classical Wasserstein distance which enforces our marginal condition. Relying on the theory of gradient flows in metric spaces, we first show the well-posedness of the gradient flow equation and its consistency with the training of ResNets at finite width. Performing a local Polyak-Łojasiewicz analysis, we then show convergence of the gradient flow for well-chosen initializations: if the number of features is finite but sufficiently large and the risk is sufficiently small at initialization, the gradient flow converges towards a global minimizer. This is the first result of this type for infinitely deep and arbitrarily wide ResNets.
Article 611
Title@2025-07-21 (1): Missing value imputation with adversarial random forests – MissARF
Title: Missing value imputation with adversarial random forests – MissARF | Fehlender Wert imputation mit konversarischen zufälligen Wäldern – MissARF | 对抗性随机随机森林缺失的估算值 – – MissARRF 2507.15681v1 |
Authors (4): Pegah Golchian, Jan Kapar, David S. Watson, Marvin N. Wright
Handling missing values is a common challenge in biostatistical analyses, typically addressed by imputation methods. We propose a novel, fast, and easy-to-use imputation method called missing value imputation with adversarial random forests (MissARF), based on generative machine learning, that provides both single and multiple imputation. MissARF employs adversarial random forest (ARF) for density estimation and data synthesis. To impute a missing value of an observation, we condition on the non-missing values and sample from the estimated conditional distribution generated by ARF. Our experiments demonstrate that MissARF performs comparably to state-of-the-art single and multiple imputation methods in terms of imputation quality and fast runtime with no additional costs for multiple imputation.
Article 612
Title@2025-07-21 (1): GeoHNNs: Geometric Hamiltonian Neural Networks
Title: GeoHNNs: Geometric Hamiltonian Neural Networks | GeoHNNs: Geometrische Hamiltonische Neuronale Netzwerke | GeoHNNs:几何汉密尔顿神经网络 2507.15678v1 |
Authors (2): Amine Mohamed Aboussalah, Abdessalam Ed-dib
The fundamental laws of physics are intrinsically geometric, dictating the evolution of systems through principles of symmetry and conservation. While modern machine learning offers powerful tools for modeling complex dynamics from data, common methods often ignore this underlying geometric fabric. Physics-informed neural networks, for instance, can violate fundamental physical principles, leading to predictions that are unstable over long periods, particularly for high-dimensional and chaotic systems. Here, we introduce Geometric Hamiltonian Neural Networks (GeoHNN), a framework that learns dynamics by explicitly encoding the geometric priors inherent to physical laws. Our approach enforces two fundamental structures: the Riemannian geometry of inertia, by parameterizing inertia matrices in their natural mathematical space of symmetric positive-definite matrices, and the symplectic geometry of phase space, using a constrained autoencoder to ensure the preservation of phase space volume in a reduced latent space. We demonstrate through experiments on systems ranging from coupled oscillators to high-dimensional deformable objects that GeoHNN significantly outperforms existing models. It achieves superior long-term stability, accuracy, and energy conservation, confirming that embedding the geometry of physics is not just a theoretical appeal but a practical necessity for creating robust and generalizable models of the physical world.
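One simple way to keep a learned inertia matrix symmetric positive-definite (SPD) is a Cholesky-style parameterization, sketched below for the 2x2 case. This is a swapped-in, generic technique chosen for brevity; the abstract does not say that GeoHNN uses it, and the function names are hypothetical.

```python
import math

def spd_from_params(a, b, c):
    """Build a 2x2 SPD matrix M = L L^T from unconstrained reals (a, b, c).

    L is lower triangular with a strictly positive diagonal (exp(a), exp(c)),
    so M is SPD for any real inputs.
    """
    l11, l21, l22 = math.exp(a), b, math.exp(c)
    return [
        [l11 * l11, l11 * l21],
        [l11 * l21, l21 * l21 + l22 * l22],
    ]

def is_spd_2x2(m):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    symmetric = m[0][1] == m[1][0]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return symmetric and m[0][0] > 0 and det > 0

# Any unconstrained parameter values yield a valid inertia matrix.
inertia = spd_from_params(a=0.3, b=-1.2, c=-0.5)
inertia_det = inertia[0][0] * inertia[1][1] - inertia[0][1] * inertia[1][0]
```

Because the constraint holds by construction, an optimizer can update `a`, `b`, and `c` freely without ever leaving the SPD set.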
Article 613
Title@2025-07-21 (1): Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems
Title: Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems | Ausführbare Funktionsabstractions: Ausleiten von Generativen Programmen für fortgeschrittene Math-Probleme | 可执行的功能性抽象:为高级数学问题推导产生方案 2504.09763v2 |
Authors (5): Zaid Khan, Elias Stengel-Eskin, Archiki Prasad, Jaemin Cho, Mohit Bansal
Scientists often infer abstract procedures from specific instances of problems and use the abstractions to generate new, related instances. For example, programs encoding the formal rules and properties of a system have been useful in fields ranging from reinforcement learning (procedural environments) to physics (simulation engines). These programs can be seen as functions which execute to different outputs based on their parameterizations (e.g., gridworld configuration or initial physical conditions). We introduce the term EFA (Executable Functional Abstraction) to denote such programs for math problems. EFA-like constructs have been shown to be useful for mathematical reasoning as problem generators for stress-testing models. However, prior work has been limited to automatically constructing abstractions for grade-school math (whose simple rules are easy to encode in programs), while generating EFAs for advanced math has thus far required human engineering. We explore the automatic construction of EFAs for advanced mathematics problems by developing EFAGen, which operationalizes the task of automatically inferring an EFA for a given seed problem and solution as a program synthesis task. We first formalize the properties of any valid EFA as executable unit tests. Using execution feedback from the unit tests, we search over candidate programs sampled from an LLM to find EFA programs that are faithful to the generalized problem and solution class underlying the seed problem. We then apply the tests as a reward signal, training LLMs to become better writers of EFAs. We show that EFAs inferred by EFAGen are faithful to the seed problems, produce learnable problem variations, and that EFAGen can infer EFAs across diverse sources of competition-level math problems. Finally, we show uses of model-written EFAs, such as finding harder/easier problem variants and generating data.
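To make the notion of an EFA concrete, here is a minimal sketch of a parameterized problem program together with an executable check of its validity. The class layout (`sample_params`, `render`, `solve`) and the specific checks are hypothetical, and the problem family is deliberately elementary; the paper targets advanced problems and defines its own unit tests.

```python
import random

class LinearEquationEFA:
    """Hypothetical EFA for the problem family 'solve a*x + b = c for x'."""

    def sample_params(self, rng):
        # Choose the answer first so every sampled instance has an integer solution.
        a = rng.choice([n for n in range(-9, 10) if n != 0])
        x = rng.randint(-20, 20)
        b = rng.randint(-50, 50)
        return {"a": a, "b": b, "c": a * x + b}

    def render(self, p):
        return f"Solve for x: {p['a']}*x + {p['b']} = {p['c']}"

    def solve(self, p):
        # c - b is an exact multiple of a by construction.
        return (p["c"] - p["b"]) // p["a"]

def passes_unit_tests(efa, seed=0, trials=50):
    """Check that sampled instances are solvable, consistent, and varied."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(trials):
        p = efa.sample_params(rng)
        x = efa.solve(p)
        if p["a"] * x + p["b"] != p["c"]:
            return False
        seen.add(efa.render(p))
    return len(seen) > 1  # the program must produce more than one distinct problem

efa = LinearEquationEFA()
efa_is_valid = passes_unit_tests(efa)
```

Checks of this kind can serve both as a filter over candidate programs and, as the abstract describes, as a reward signal.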
Article 614
Title@2025-07-21 (1): Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains
Title: Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains | Aufmerksamkeit bei Markov: Ein Rahmen für die grundsätzliche Analyse von Transformatoren über Markov Ketten | 注意Markov:通过Markov 链条对变形器进行原则分析的框架 2402.04161v2 |
Authors (7): Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi, Hyeji Kim, Michael Gastpar
Attention-based transformers have achieved tremendous success across a variety of disciplines including natural languages. To deepen our understanding of their sequential modeling capabilities, there is a growing interest in using Markov input processes to study them. A key finding is that when trained on first-order Markov chains, transformers with two or more layers consistently develop an induction head mechanism to estimate the in-context bigram conditional distribution. In contrast, single-layer transformers, unable to form an induction head, directly learn the Markov kernel but often face a surprising challenge: they become trapped in local minima representing the unigram distribution, whereas deeper models reliably converge to the ground-truth bigram. While single-layer transformers can theoretically model first-order Markov chains, their empirical failure to learn this simple kernel in practice remains a curious phenomenon. To explain this contrasting behavior of single-layer models, in this paper we introduce a new framework for a principled analysis of transformers via Markov chains. Leveraging our framework, we theoretically characterize the loss landscape of single-layer transformers and show the existence of global minima (bigram) and bad local minima (unigram) contingent on data properties and model architecture. We precisely delineate the regimes under which these local optima occur. Backed by experiments, we demonstrate that our theoretical findings are in congruence with the empirical results. Finally, we outline several open problems in this arena. Code is available at https://github.com/Bond1995/Markov .
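The difference between the unigram solution (the bad local minimum) and the in-context bigram solution (the global minimum) can be seen with plain counting estimators. This is a sketch on an invented binary sequence; it illustrates the two target distributions only, not the transformer or the loss-landscape analysis.

```python
from collections import Counter

def unigram_estimate(seq):
    """Estimate P(x) from symbol frequencies, ignoring context."""
    counts = Counter(seq)
    return {s: counts[s] / len(seq) for s in counts}

def bigram_estimate(seq):
    """Estimate P(next | prev) from in-context transition counts."""
    pair_counts = Counter(zip(seq, seq[1:]))
    prev_counts = Counter(seq[:-1])
    return {(a, b): n / prev_counts[a] for (a, b), n in pair_counts.items()}

# Toy first-order sequence in which a 1 is always followed by a 0.
seq = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
uni = unigram_estimate(seq)
bi = bigram_estimate(seq)

p_one_unigram = uni[1]        # 0.3, the same prediction in every context
p_zero_after_one = bi[(1, 0)]  # 1.0, available only to a context-aware estimator
```

A model stuck at the unigram solution predicts a 1 with probability 0.3 everywhere, even immediately after a 1, where the bigram estimate assigns it probability 0.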
Article 615
Title@2025-07-21 (1): Further exploration of binding energy residuals using machine learning and the development of a composite ensemble model
Title: Further exploration of binding energy residuals using machine learning and the development of a composite ensemble model | Weitere Erforschung von Bindungsenergieresten mittels maschinellem Lernen und der Entwicklung eines zusammengesetzten Ensemblemodells | 利用机器学习和开发复合组合组合模型,进一步利用机器学习和开发综合组合模型,探索具有约束力的能源残余物 2503.11066v3 |
Authors (4): I. Bentley, J. Tedder, M. Gebran, A. Paul
This paper describes the development of the Four Model Tree Ensemble (FMTE). The FMTE is a composite of machine learning models trained on experimental binding energies from the Atomic Mass Evaluation (AME) 2012. The FMTE predicts binding energy values for all nuclei with N > 7 and Z > 7 from AME 2020 with a standard deviation of 76 keV and a mean average deviation of 34 keV. The FMTE model was developed by combining three new models with one prior model. The new models presented here have been trained on binding energy residuals from mass models using four machine learning approaches. The models presented in this work leverage shape parameters along with other physical features. We have determined the preferred machine learning approach for binding energy residuals is the least-squares boosted ensemble of trees. This approach appears to have a superior ability to both interpolate and extrapolate binding energy residuals. A comparison with the masses of isotopes that were not measured previously and a discussion of extrapolations approaching the neutron drip line have been included.
Article 616
Title@2025-07-21 (1): Sparsification Under Siege: Defending Against Poisoning Attacks in Communication-Efficient Federated Learning
Title: Sparsification Under Siege: Defending Against Poisoning Attacks in Communication-Efficient Federated Learning | Sparsifikation unter Belagerung: Abwehr gegen vergiftende Angriffe in kommunikativ-effizientem Federated Learning | 隔离下的隔离:在通信-高效联邦学习中防范毒物攻击 2505.01454v4 |
Authors (6): Zhiyong Jin, Runhua Xu, Chao Li, Yizhong Liu, Jianxin Li, James Joshi
Federated Learning (FL) enables collaborative model training across distributed clients while preserving data privacy, yet it faces significant challenges in communication efficiency and vulnerability to poisoning attacks. While sparsification techniques mitigate communication overhead by transmitting only critical model parameters, they inadvertently amplify security risks: adversarial clients can exploit sparse updates to evade detection and degrade model performance. Existing defense mechanisms, designed for standard FL communication scenarios, are ineffective in addressing these vulnerabilities within sparsified FL. To bridge this gap, we propose FLARE, a novel federated learning framework that integrates sparse index mask inspection and model update sign similarity analysis to detect and mitigate poisoning attacks in sparsified FL. Extensive experiments across multiple datasets and adversarial scenarios demonstrate that FLARE significantly outperforms existing defense strategies, effectively securing sparsified FL against poisoning attacks while maintaining communication efficiency.
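The two signals named in the abstract, sparse index masks and update signs, can be illustrated as follows. The top-k sparsifier, the Jaccard overlap, and the sign-agreement score are assumptions chosen for the sketch, and the update values are invented; FLARE's actual inspection and similarity rules are not given in the abstract.

```python
def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries; return {index: value}."""
    idx = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)[:k]
    return {i: update[i] for i in idx}

def mask_overlap(a, b):
    """Jaccard overlap between the index masks of two sparse updates."""
    ia, ib = set(a), set(b)
    return len(ia & ib) / len(ia | ib)

def sign_agreement(a, b):
    """Fraction of shared indices on which both updates have the same sign."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    same = sum(1 for i in shared if (a[i] > 0) == (b[i] > 0))
    return same / len(shared)

# Two similar honest updates and one update that flips signs and shifts its mask.
honest_1 = top_k_sparsify([0.9, -0.8, 0.05, 0.7, -0.02, 0.01], k=3)
honest_2 = top_k_sparsify([0.8, -0.9, 0.02, 0.6, 0.03, -0.01], k=3)
suspect = top_k_sparsify([-0.9, 0.8, 0.01, 0.02, -0.7, 0.03], k=3)

honest_mask = mask_overlap(honest_1, honest_2)
suspect_mask = mask_overlap(honest_1, suspect)
honest_sign = sign_agreement(honest_1, honest_2)
suspect_sign = sign_agreement(honest_1, suspect)
```

A server could flag clients whose scores fall well below those of the majority; the thresholds would need to be tuned per deployment.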
Article 617
Title@2025-07-21 (1): Towards Explainable Anomaly Detection in Shared Mobility Systems
Title: Towards Explainable Anomaly Detection in Shared Mobility Systems | Auf dem Weg zu einer erklärbaren Anomalienerkennung in gemeinsamen Mobilitätssystemen | 共同流动系统中可解释的异常探测 2507.15643v1 |
Authors (4): Elnur Isgandarov, Matteo Cederle, Federico Chiariotti, Gian Antonio Susto
Shared mobility systems, such as bike-sharing networks, play a crucial role in urban transportation. Identifying anomalies in these systems is essential for optimizing operations, improving service reliability, and enhancing user experience. This paper presents an interpretable anomaly detection framework that integrates multi-source data, including bike-sharing trip records, weather conditions, and public transit availability. The Isolation Forest algorithm is employed for unsupervised anomaly detection, along with the Depth-based Isolation Forest Feature Importance (DIFFI) algorithm providing interpretability. Results show that station-level analysis offers a robust understanding of anomalies, highlighting the influence of external factors such as adverse weather and limited transit availability. Our findings contribute to improving decision-making in shared mobility operations.
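The principle behind Isolation Forest, that anomalies are separated from the rest of the data by fewer random splits, can be sketched in a few lines. This is a simplified pure-Python illustration with no subsampling and no score normalization, run on invented station features (trips per day, rainfall in mm); it is not the paper's pipeline and does not implement DIFFI.

```python
import random

def path_length(point, data, rng, depth=0, max_depth=10):
    """Number of random axis-aligned splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Follow only the side of the split that contains `point`.
    if point[dim] < split:
        side = [row for row in data if row[dim] < split]
    else:
        side = [row for row in data if row[dim] >= split]
    return path_length(point, side, rng, depth + 1, max_depth)

def mean_path_length(point, data, n_trees=200, seed=0):
    """Average isolation depth over many random trees; shorter means more anomalous."""
    rng = random.Random(seed)
    return sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees

# Invented records: a tight cluster of ordinary days plus one rainy, low-usage day.
inliers = [(100.0 + 5 * i, float(j)) for i in range(4) for j in range(4)]
outlier = (20.0, 30.0)
records = inliers + [outlier]

outlier_depth = mean_path_length(outlier, records)
inlier_depth = mean_path_length((105.0, 1.0), records)
```

Recording which feature each split used on short paths is, roughly, the kind of information a depth-based importance method can aggregate.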
Article 618
Title@2025-07-21 (1): Ultra-fast feature learning for the training of two-layer neural networks in the two-timescale regime
Title: Ultra-fast feature learning for the training of two-layer neural networks in the two-timescale regime | Ultraschnelles Feature-Lernen für das Training von zweischichtigen neuronalen Netzwerken im Zwei-Zeit-Regime | 用于培训两层神经网络的超快专题学习 2504.18208v2 |
Authors (3): Raphaël Barboni, Gabriel Peyré, François-Xavier Vialard
We study the convergence of gradient methods for the training of mean-field single-hidden-layer neural networks with square loss. For this high-dimensional and non-convex optimization problem, most known convergence results are either qualitative or rely on a neural tangent kernel analysis where nonlinear representations of the data are fixed. Using that this problem belongs to the class of separable nonlinear least squares problems, we consider here a Variable Projection (VarPro) or two-timescale learning algorithm, thereby eliminating the linear variables and reducing the learning problem to the training of nonlinear features. In a teacher-student scenario, we show such a strategy enables provable convergence rates for the sampling of a teacher feature distribution. Precisely, in the limit where the regularization strength vanishes, we show that the dynamic of the feature distribution corresponds to a weighted ultra-fast diffusion equation. Recent results on the asymptotic behavior of such PDEs then give quantitative guarantees for the convergence of the learned feature distribution.
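Variable projection can be illustrated on a one-feature separable least-squares problem: for each value of the nonlinear parameter, the linear coefficient is solved in closed form, leaving a reduced loss in the nonlinear parameter alone. The exponential feature, the unregularized loss, and the grid search are assumptions for this sketch; the paper studies the mean-field, regularized setting.

```python
import math

def features(w, xs):
    """Nonlinear feature phi_w(x) = exp(-w * x), evaluated on all inputs."""
    return [math.exp(-w * x) for x in xs]

def best_linear_coeff(w, xs, ys):
    """Closed-form least-squares solution for the linear variable a, given w."""
    phi = features(w, xs)
    return sum(p * y for p, y in zip(phi, ys)) / sum(p * p for p in phi)

def reduced_loss(w, xs, ys):
    """Squared loss after the linear variable has been projected out."""
    a = best_linear_coeff(w, xs, ys)
    phi = features(w, xs)
    return sum((a * p - y) ** 2 for p, y in zip(phi, ys))

# Noise-free "teacher" data generated by y = 2.0 * exp(-1.3 * x).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
true_a, true_w = 2.0, 1.3
ys = [true_a * math.exp(-true_w * x) for x in xs]

# Search over the nonlinear parameter only; a is always recomputed optimally.
candidates = [i / 10 for i in range(1, 31)]
w_hat = min(candidates, key=lambda w: reduced_loss(w, xs, ys))
a_hat = best_linear_coeff(w_hat, xs, ys)
```

The search never treats `a` as a free variable, which is the sense in which the learning problem reduces to training the nonlinear features.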
Article 619
Title@2025-07-21 (1): Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training
Title: Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training | Data Mixing Agent: Erlernen von Re-Gewicht Domains für kontinuierliches Pre-Training | 数据混合代理: 学习为连续培训前学习重新加权域域 2507.15640v1 |
Authors (7): Kailai Yang, Xiao Liu, Lei Ji, Hao Li, Yeyun Gong, Peng Cheng, Mao Yang
Continual pre-training on small-scale task-specific data is an effective method for improving large language models in new target fields, yet it risks catastrophic forgetting of their original capabilities. A common solution is to re-weight training data mixtures from source and target fields on a domain space to achieve balanced performance. Previous domain reweighting strategies rely on manual designation with certain heuristics based on human intuition or empirical results. In this work, we prove that more general heuristics can be parameterized by proposing Data Mixing Agent, the first model-based, end-to-end framework that learns to re-weight domains. The agent learns generalizable heuristics through reinforcement learning on large quantities of data mixing trajectories with corresponding feedback from an evaluation environment. Experiments in continual pre-training on math reasoning show that Data Mixing Agent outperforms strong baselines in achieving balanced performance across source and target field benchmarks. Furthermore, it generalizes well across unseen source fields, target models, and domain spaces without retraining. Direct application to the code generation field also indicates its adaptability across target domains. Further analysis showcases the agents’ well-aligned heuristics with human intuitions and their efficiency in achieving superior model performance with less source-field data.
Article 620
Title@2025-07-21 (1): Dual Turing Test: A Framework for Detecting and Mitigating Undetectable AI
Title: Dual Turing Test: A Framework for Detecting and Mitigating Undetectable AI | Dual Turing Test: Ein Framework zur Erkennung und Abmilderung nicht nachweisbarer KI | 双重图示试验:检测和减缓不可检测的AI值的框架 2507.15907v1 |
Authors (1): Alberto Messina
In this short note, we propose a unified framework that bridges three areas: (1) a flipped perspective on the Turing Test, the “dual Turing test”, in which a human judge’s goal is to identify an AI rather than reward a machine for deception; (2) a formal adversarial classification game with explicit quality constraints and worst-case guarantees; and (3) a reinforcement learning (RL) alignment pipeline that uses an undetectability detector and a set of quality related components in its reward model. We review historical precedents, from inverted and meta-Turing variants to modern supervised reverse-Turing classifiers, and highlight the novelty of combining quality thresholds, phased difficulty levels, and minimax bounds. We then formalize the dual test: define the judge’s task over N independent rounds with fresh prompts drawn from a prompt space Q, introduce a quality function Q and parameters tau and delta, and cast the interaction as a two-player zero-sum game over the adversary’s feasible strategy set M. Next, we map this minimax game onto an RL-HF style alignment loop, in which an undetectability detector D provides negative reward for stealthy outputs, balanced by a quality proxy that preserves fluency. Throughout, we include detailed explanations of each component notation, the meaning of inner minimization over sequences, phased tests, and iterative adversarial training and conclude with a suggestion for a couple of immediate actions.
Article 621
Title@2025-07-21 (1): Accelerating HEC-RAS: A Recurrent Neural Operator for Rapid River Forecasting
Title: Accelerating HEC-RAS: A Recurrent Neural Operator for Rapid River Forecasting | Beschleunigung von HEC-RAS: Ein wiederkehrender Neuraloperator für Rapid River Forecasting | 加速ECC-RAS:快速河流预报经常神经操作员 2507.15614v1 |
Authors (12): Edward Holmberg, Pujan Pokhrel, Maximilian Zoch, Elias Ioup, Ken Pathak, Steven Sloan, Kendall Niles, Jay Ratcliff, Maik Flanagin, Christian Guetl, Julian Simeonov, Mahdi Abdelguerfi
Physics-based solvers like HEC-RAS provide high-fidelity river forecasts but are too computationally intensive for on-the-fly decision-making during flood events. The central challenge is to accelerate these simulations without sacrificing accuracy. This paper introduces a deep learning surrogate that treats HEC-RAS not as a solver but as a data-generation engine. We propose a hybrid, auto-regressive architecture that combines a Gated Recurrent Unit (GRU) to capture short-term temporal dynamics with a Geometry-Aware Fourier Neural Operator (Geo-FNO) to model long-range spatial dependencies along a river reach. The model learns underlying physics implicitly from a minimal eight-channel feature vector encoding dynamic state, static geometry, and boundary forcings extracted directly from native HEC-RAS files. Trained on 67 reaches of the Mississippi River Basin, the surrogate was evaluated on a year-long, unseen hold-out simulation. Results show the model achieves a strong predictive accuracy, with a median absolute stage error of 0.31 feet. Critically, for a full 67-reach ensemble forecast, our surrogate reduces the required wall-clock time from 139 minutes to 40 minutes, a speedup of nearly 3.5 times over the traditional solver. The success of this data-driven approach demonstrates that robust feature engineering can produce a viable, high-speed replacement for conventional hydraulic models, improving the computational feasibility of large-scale ensemble flood forecasting.
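The reported speedup follows directly from the two wall-clock times in the abstract. The short calculation below uses only those two figures; the variable names are illustrative.

```python
baseline_minutes = 139.0   # HEC-RAS time for the 67-reach ensemble, from the abstract
surrogate_minutes = 40.0   # surrogate time for the same forecast, from the abstract

# Ratio of the two times: 139 / 40 = 3.475, which the abstract rounds to "nearly 3.5".
speedup = baseline_minutes / surrogate_minutes

# Share of the original wall-clock time that is saved (about 71%).
time_saved_fraction = 1.0 - surrogate_minutes / baseline_minutes
```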
Article 622
Title@2025-07-21 (1): Brain-Inspired Online Adaptation for Remote Sensing with Spiking Neural Network
Title: Brain-Inspired Online Adaptation for Remote Sensing with Spiking Neural Network | Gehirn-inspirierte Online-Anpassung zur Fernerkundung mit Spiking Neural Network | 利用Spiking神经网络进行有脑启发的遥感在线适应 2409.02146v2 |
Authors (4): Dexin Duan, Peilin Liu, Bingwei Hui, Fei Wen
On-device computing, or edge computing, is becoming increasingly important for remote sensing, particularly in applications like deep network-based perception on on-orbit satellites and unmanned aerial vehicles (UAVs). In these scenarios, two brain-like capabilities are crucial for remote sensing models: (1) high energy efficiency, allowing the model to operate on edge devices with limited computing resources, and (2) online adaptation, enabling the model to quickly adapt to environmental variations, weather changes, and sensor drift. This work addresses these needs by proposing an online adaptation framework based on spiking neural networks (SNNs) for remote sensing. Starting with a pretrained SNN model, we design an efficient, unsupervised online adaptation algorithm, which adopts an approximation of the BPTT algorithm and only involves forward-in-time computation that significantly reduces the computational complexity of SNN adaptation learning. In addition, we propose an adaptive activation scaling scheme to boost online SNN adaptation performance, particularly with few time steps. Furthermore, for the more challenging remote sensing detection task, we propose a confidence-based instance weighting scheme, which substantially improves adaptation performance in the detection task. To our knowledge, this work is the first to address the online adaptation of SNNs. Extensive experiments on seven benchmark datasets across classification, segmentation, and detection tasks demonstrate that our proposed method significantly outperforms existing domain adaptation and domain generalization approaches under varying weather conditions. The proposed method enables energy-efficient and fast online adaptation on edge devices, and has much potential in applications such as remote perception on on-orbit satellites and UAVs.
Article 623
Title@2025-07-21 (1): Deep Learning for Computing Convergence Rates of Markov Chains
Title: Deep Learning for Computing Convergence Rates of Markov Chains | Deep Learning for Computing Convergence Rates of Markov Ketten | Markov 链条计算聚合率深入学习 2405.20435v2 |
Authors (3): Yanlin Qu, Jose Blanchet, Peter Glynn
Convergence rate analysis for general state-space Markov chains is fundamentally important in areas such as Markov chain Monte Carlo and algorithmic analysis (for computing explicit convergence bounds). This problem, however, is notoriously difficult because traditional analytical methods often do not generate practically useful convergence bounds for realistic Markov chains. We propose the Deep Contractive Drift Calculator (DCDC), the first general-purpose sample-based algorithm for bounding the convergence of Markov chains to stationarity in Wasserstein distance. The DCDC has two components. First, inspired by the new convergence analysis framework in Qu, Blanchet and Glynn (2023), we introduce the Contractive Drift Equation (CDE), the solution of which leads to an explicit convergence bound. Second, we develop an efficient neural-network-based CDE solver. Equipped with these two components, DCDC solves the CDE and converts the solution into a convergence bound. We analyze the sample complexity of the algorithm and further demonstrate the effectiveness of the DCDC by generating convergence bounds for realistic Markov chains arising from stochastic processing networks as well as constant step-size stochastic optimization.
Article 624
Title@2025-07-21 (1): Optimal Batch-Size Control for Low-Latency Federated Learning with Device Heterogeneity
Title: Optimal Batch-Size Control for Low-Latency Federated Learning with Device Heterogeneity | Optimale Batch-Size-Steuerung für Low-Latency-Federated Learning mit Geräte Heterogenität | 具有不同设备差异的低长期联邦学习最佳批次和最佳程度控制 2507.15601v1 |
Authors (3): Huiling Yang, Zhanwei Wang, Kaibin Huang
Federated learning (FL) has emerged as a popular approach for collaborative machine learning in sixth-generation (6G) networks, primarily due to its privacy-preserving capabilities. The deployment of FL algorithms is expected to empower a wide range of Internet-of-Things (IoT) applications, e.g., autonomous driving, augmented reality, and healthcare. The mission-critical and time-sensitive nature of these applications necessitates the design of low-latency FL frameworks that guarantee high learning performance. In practice, achieving low-latency FL faces two challenges: the overhead of computing and transmitting high-dimensional model updates, and the heterogeneity in communication-and-computation (C$^2$) capabilities across devices. To address these challenges, we propose a novel C$^2$-aware framework for optimal batch-size control that minimizes end-to-end (E2E) learning latency while ensuring convergence. The framework is designed to balance a fundamental C$^2$ tradeoff as revealed through convergence analysis. Specifically, increasing batch sizes improves the accuracy of gradient estimation in FL and thus reduces the number of communication rounds required for convergence, but results in higher per-round latency, and vice versa. The associated problem of latency minimization is intractable; however, we solve it by designing an accurate and tractable surrogate for convergence speed, with parameters fitted to real data. This approach yields two batch-size control strategies tailored to scenarios with slow and fast fading, while also accommodating device heterogeneity. Extensive experiments using real datasets demonstrate that the proposed strategies outperform conventional batch-size adaptation schemes that do not consider the C$^2$ tradeoff or device heterogeneity.
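The tradeoff described above, where larger batches mean fewer rounds but slower rounds, can be made concrete with a toy latency model. The surrogate for the number of rounds, the latency constants, and the integer grid search are all hypothetical choices for this sketch; the paper fits its own surrogate to real data and handles fading and device heterogeneity.

```python
def rounds_to_converge(batch_size, a=50.0, b=1600.0):
    """Hypothetical surrogate: fewer rounds are needed as the batch size grows."""
    return a + b / batch_size

def per_round_latency(batch_size, compute_per_sample=0.01, comm=0.5):
    """Per-round latency in seconds: local computation plus a fixed upload cost."""
    return compute_per_sample * batch_size + comm

def e2e_latency(batch_size):
    """End-to-end latency = number of rounds x latency per round."""
    return rounds_to_converge(batch_size) * per_round_latency(batch_size)

# Integer search over candidate batch sizes.
best_batch = min(range(1, 257), key=e2e_latency)
best_latency = e2e_latency(best_batch)
```

With these constants the product expands to 0.5*B + 41 + 800/B for batch size B, which is minimized at B = sqrt(800 / 0.5) = 40, matching the grid search.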
Article 625
Title@2025-07-21 (1): Applying the Chinese Wall Reverse Engineering Technique to Large Language Model Code Editing
Title: Applying the Chinese Wall Reverse Engineering Technique to Large Language Model Code Editing | Anwendung der Technik der chinesischen Wandumkehrtechnik auf die Bearbeitung von großen Sprachmodellen | 将中国长墙反向工程技术应用到大语言模式编辑 2507.15599v1 |
Authors (1): Manatsawin Hanmongkolchai
Large language models for code (Code LLMs) are increasingly utilized in programming environments. Despite their utility, the training datasets for top LLMs remain undisclosed, raising concerns about potential copyright violations. Some models, such as Pleias and Comma, put emphasis on data curation and licenses; however, with limited training data, these models are not competitive and only serve as proofs of concept. To improve the utility of these models, we propose an application of the “Chinese Wall” technique, inspired by the reverse engineering technique of the same name: a high-quality model is used to generate detailed instructions for a weaker model. By doing so, a weaker but ethically aligned model may be used to perform complicated tasks that could otherwise only be completed by more powerful models. In our evaluation, we found that this technique improves Comma v0.1 1T’s performance on the CanItEdit benchmark by over 66%, and Starcoder2 Instruct by roughly 20% compared to when running the same model on the benchmark alone. The practical application of this technique today, however, may be limited due to the lack of models trained on public domain content without copyright restrictions.
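The two-stage flow can be sketched as a small function that takes the two models as plain callables. The prompt wording, the function name `chinese_wall_edit`, and the stub models are hypothetical; the abstract only states that a stronger model writes detailed instructions which a weaker model then follows.

```python
from typing import Callable

def chinese_wall_edit(
    source: str,
    request: str,
    strong_model: Callable[[str], str],
    weak_model: Callable[[str], str],
) -> str:
    """Two-stage edit: the strong model writes instructions, the weak model writes code."""
    plan_prompt = (
        "Describe, step by step and without writing code, how to change "
        f"the program below to satisfy this request: {request}\n\n{source}"
    )
    instructions = strong_model(plan_prompt)
    # Only the natural-language instructions cross the wall to the weaker model.
    edit_prompt = (
        "Follow these instructions to edit the program.\n\n"
        f"Instructions:\n{instructions}\n\nProgram:\n{source}"
    )
    return weak_model(edit_prompt)

# Stub models standing in for real LLM calls (assumptions for this demo).
def stub_strong(prompt: str) -> str:
    return "Rename the function `f` to `g`."

def stub_weak(prompt: str) -> str:
    program = prompt.split("Program:\n", 1)[1]
    if "Rename the function `f` to `g`" in prompt:
        return program.replace("def f(", "def g(")
    return program

edited = chinese_wall_edit("def f():\n    return 1\n", "rename f to g", stub_strong, stub_weak)
```

In a real setup the callables would wrap model inference, and the planning prompt would need to discourage the strong model from emitting code verbatim.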
Article 626
Title@2025-07-21 (1): Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
Title: Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos | Being-H0: Vision-Sprache-Aktion Vorschulung von großformatigen menschlichen Videos | 人与人:通过大型人类视频进行视觉-语言-行动预培训 2507.15597v1 |
Authors (10): Hao Luo, Yicheng Feng, Wanpeng Zhang, Sipeng Zheng, Ye Wang, Haoqi Yuan, Jiazheng Liu, Chaoyi Xu, Qin Jin, Zongqing Lu
We introduce Being-H0, a dexterous Vision-Language-Action model (VLA) trained on large-scale human videos. Existing VLAs struggle with complex manipulation tasks requiring high dexterity and generalize poorly to novel scenarios and tasks, primarily due to their reliance on synthetic data with significant sim-to-real gaps or teleoperated demonstrations lacking scale and diversity. To address this data bottleneck, we propose leveraging human hands as a foundation manipulator, capitalizing on the rich dexterity and scalability present in web data. Our approach centers on physical instruction tuning, a novel training paradigm that combines large-scale VLA pretraining from human videos, physical space alignment for 3D reasoning, and post-training adaptation for robotic tasks. Additionally, we introduce a part-level motion tokenization method which achieves millimeter-level reconstruction accuracy to model precise hand trajectories for action learning. To support our proposed paradigm, we further develop a comprehensive data curation pipeline that integrates heterogeneous sources – including motion capture, VR, and RGB-only videos – into a large-scale dataset with millions of motion-based instructional instances. We empirically show the excellence of Being-H0 in hand motion generation and instruction following, and it also scales well with model and data sizes. Importantly, we observe the expected gains of Being-H0 in real-world robotic manipulation as physical instruction tuning is applied. More details are available at https://beingbeyond.github.io/Being-H0.
Article 627
Title@2025-07-21 (1): Red-Team Multi-Agent Reinforcement Learning for Emergency Braking Scenario
Title: Red-Team Multi-Agent Reinforcement Learning for Emergency Braking Scenario | Red-Team Multi-Agent Verstärkungs-Lernen für Notfall-Brems-Szenario | 红队多机构强化学习,用于紧急制动设想方案 2507.15587v1 |
Authors (6): Yinsong Chen, Kaifeng Wang, Xiaoqiang Meng, Xueyuan Li, Zirui Li, Xin Gao
Current research on decision-making in safety-critical scenarios often relies on inefficient data-driven scenario generation or specific modeling approaches, which fail to capture corner cases in real-world contexts. To address this issue, we propose a Red-Team Multi-Agent Reinforcement Learning framework, where background vehicles with interference capabilities are treated as red-team agents. Through active interference and exploration, red-team vehicles can uncover corner cases outside the data distribution. The framework uses a Constraint Graph Representation Markov Decision Process, ensuring that red-team vehicles comply with safety rules while continuously disrupting the autonomous vehicles (AVs). A policy threat zone model is constructed to quantify the threat posed by red-team vehicles to AVs, inducing more extreme actions to increase the danger level of the scenario. Experimental results show that the proposed framework significantly impacts the decision-making safety of AVs and generates various corner cases. This method also offers a novel direction for research in safety-critical scenarios.
Article 628
Title@2025-07-21 (1): We Need to Rethink Benchmarking in Anomaly Detection
Title: We Need to Rethink Benchmarking in Anomaly Detection | Wir müssen Benchmarking bei Anomalienerkennung neu denken | 我们需要重新思考异常探测的基准 2507.15584v1 |
Authors (4): Philipp Röchner, Simon Klüttermann, Franz Rothlauf, Daniel Schlör
Despite the continuous proposal of new anomaly detection algorithms and extensive benchmarking efforts, progress seems to stagnate, with only minor performance differences between established baselines and new algorithms. In this position paper, we argue that this stagnation is due to limitations in how we evaluate anomaly detection algorithms. Current benchmarking does not, for example, sufficiently reflect the diversity of anomalies in applications ranging from predictive maintenance to scientific discovery. Consequently, we need to rethink benchmarking in anomaly detection. In our opinion, anomaly detection should be studied using scenarios that capture the relevant characteristics of different applications. We identify three key areas for improvement: First, we need to identify anomaly detection scenarios based on a common taxonomy. Second, anomaly detection pipelines should be analyzed end-to-end and by component. Third, evaluating anomaly detection algorithms should be meaningful regarding the scenario’s objectives.
Article 629
Title@2025-07-21 (1): Automated Classification of Volcanic Earthquakes Using Transformer Encoders: Insights into Data Quality and Model Interpretability
Title: Automated Classification of Volcanic Earthquakes Using Transformer Encoders: Insights into Data Quality and Model Interpretability | Automatisierte Klassifizierung von Vulkan-Erdbeben mit Transformer-Encodern: Einblicke in Datenqualität und Modellinterpretierbarkeit | 利用变换器计算器对火山地震进行自动分类:对数据质量和模型解释的透视 2507.01260v2 |
Authors (5): Y. Suzuki, Y. Yukutake, T. Ohminato, M. Yamasaki, Ahyi Kim
Precisely classifying earthquake types is crucial for elucidating the relationship between volcanic earthquakes and volcanic activity. However, traditional methods rely on subjective human judgment, which requires considerable time and effort. To address this issue, we developed a deep learning model using a transformer encoder for a more objective and efficient classification. Tested on Mount Asama’s diverse seismic activity, our model achieved high F1 scores (0.930 for volcano tectonic, 0.931 for low-frequency earthquakes, and 0.980 for noise), superior to a conventional CNN-based method. To enhance interpretability, attention weight visualizations were analyzed, revealing that the model focuses on key waveform features similarly to human experts. However, inconsistencies in training data, such as ambiguously labeled B-type events with S-waves, were found to influence classification accuracy and attention weight distributions. Experiments addressing data selection and augmentation demonstrated the importance of balancing data quality and diversity. In addition, stations within 3 km of the crater played an important role in improving model performance and interpretability. These findings highlight the potential of Transformer-based models for automated volcanic earthquake classification, particularly in improving efficiency and interpretability. By addressing challenges such as data imbalance and subjective labeling, our approach provides a robust framework for understanding seismic activity at Mount Asama. Moreover, this framework offers opportunities for transfer learning to other volcanic regions, paving the way for enhanced volcanic hazard assessments and disaster mitigation strategies.
Article 630
Title@2025-07-21 (1): GeMix: Conditional GAN-Based Mixup for Improved Medical Image Augmentation
Title: GeMix: Conditional GAN-Based Mixup for Improved Medical Image Augmentation | GeMix: Bedingtes GAN-basiertes Mixup für verbesserte medizinische Bildvergrößerung | GeMix:改进医学图像放大条件性 GAN 混合组合 2507.15577v1 |
Authors (5): Hugo Carlesso, Maria Eliza Patulea, Moncef Garouani, Radu Tudor Ionescu, Josiane Mothe
Mixup has become a popular augmentation strategy for image classification, yet its naive pixel-wise interpolation often produces unrealistic images that can hinder learning, particularly in high-stakes medical applications. We propose GeMix, a two-stage framework that replaces heuristic blending with a learned, label-aware interpolation powered by class-conditional GANs. First, a StyleGAN2-ADA generator is trained on the target dataset. During augmentation, we sample two label vectors from Dirichlet priors biased toward different classes and blend them via a Beta-distributed coefficient. Then, we condition the generator on this soft label to synthesize visually coherent images that lie along a continuous class manifold. We benchmark GeMix on the large-scale COVIDx-CT-3 dataset using three backbones (ResNet-50, ResNet-101, EfficientNet-B0). When combined with real data, our method increases macro-F1 over traditional mixup for all backbones, reducing the false negative rate for COVID-19 detection. GeMix is thus a drop-in replacement for pixel-space mixup, delivering stronger regularization and greater semantic fidelity, without disrupting existing training pipelines. We publicly release our code at https://github.com/hugocarlesso/GeMix to foster reproducibility and further research.
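The label-sampling step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the concentration values `strong` and `weak`, the helper names, and the Beta parameter are all assumptions, and the StyleGAN2-ADA generator that would consume the soft label is omitted.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def biased_alpha(num_classes, favored, strong=5.0, weak=0.5):
    """Hypothetical concentration vector that favors one class."""
    return [strong if k == favored else weak for k in range(num_classes)]

def sample_soft_label(num_classes, class_a, class_b, beta_param=1.0, rng=None):
    """Blend two class-biased Dirichlet draws with a Beta-distributed coefficient."""
    rng = rng or random.Random()
    y_a = sample_dirichlet(biased_alpha(num_classes, class_a), rng)
    y_b = sample_dirichlet(biased_alpha(num_classes, class_b), rng)
    lam = rng.betavariate(beta_param, beta_param)
    # A convex combination of two probability vectors is a probability vector.
    return [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]

soft_label = sample_soft_label(num_classes=3, class_a=0, class_b=2, rng=random.Random(0))
```

The resulting `soft_label` sums to one and would serve as the conditioning vector for a class-conditional generator.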
Article 631
Title@2025-07-21 (1): On the Role of AI in Managing Satellite Constellations: Insights from the ConstellAI Project
Title: On the Role of AI in Managing Satellite Constellations: Insights from the ConstellAI Project | Über die Rolle der KI bei der Verwaltung von Satellitenkonstellationen: Einblicke aus dem ConstellAI-Projekt | 关于AI在管理卫星星座方面的作用:ConstellAI项目透视 2507.15574v1 |
Authors (7): Gregory F. Stock, Juan A. Fraire, Holger Hermanns, Jędrzej Mosiężny, Yusra Al-Khazraji, Julio Ramírez Molina, Evridiki V. Ntagiou
The rapid expansion of satellite constellations in near-Earth orbits presents significant challenges in satellite network management, requiring innovative approaches for efficient, scalable, and resilient operations. This paper explores the role of Artificial Intelligence (AI) in optimizing the operation of satellite mega-constellations, drawing from the ConstellAI project funded by the European Space Agency (ESA). A consortium comprising GMV GmbH, Saarland University, and Thales Alenia Space collaborates to develop AI-driven algorithms and demonstrates their effectiveness over traditional methods for two crucial operational challenges: data routing and resource allocation. In the routing use case, Reinforcement Learning (RL) is used to improve the end-to-end latency by learning from historical queuing latency, outperforming classical shortest path algorithms. For resource allocation, RL optimizes the scheduling of tasks across constellations, focussing on efficiently using limited resources such as battery and memory. Both use cases were tested for multiple satellite constellation configurations and operational scenarios, resembling the real-life spacecraft operations of communications and Earth observation satellites. This research demonstrates that RL not only competes with classical approaches but also offers enhanced flexibility, scalability, and generalizability in decision-making processes, which is crucial for the autonomous and intelligent management of satellite fleets. The findings of this activity suggest that AI can fundamentally alter the landscape of satellite constellation management by providing more adaptive, robust, and cost-effective solutions.
Article 632
Title@2025-07-21 (1): Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI
Title: Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI | Derivativ-freie Diffusion Manifold-Contrained Gradient für Unified XAI | 用于统一 XAI 的衍生-无扩散扩散操纵受训练梯度 2411.15265v2 |
Authors (6): Won Jun Kim, Hyungjin Chung, Jaemin Kim, Sangmin Lee, Byeongsu Sim, Jong Chul Ye
Gradient-based methods are a prototypical family of explainability techniques, especially for image-based models. Nonetheless, they have several shortcomings in that they (1) require white-box access to models, (2) are vulnerable to adversarial attacks, and (3) produce attributions that lie off the image manifold, leading to explanations that are not actually faithful to the model and do not align well with human perception. To overcome these challenges, we introduce Derivative-Free Diffusion Manifold-Constrained Gradients (FreeMCG), a novel method that serves as a better basis for explaining a given neural network than the traditional gradient. Specifically, by leveraging ensemble Kalman filters and diffusion models, we derive a derivative-free approximation of the model’s gradient projected onto the data manifold, requiring access only to the model’s outputs. We demonstrate the effectiveness of FreeMCG by applying it to both counterfactual generation and feature attribution, which have traditionally been treated as distinct tasks. Through comprehensive evaluation on both tasks, we show that our method yields state-of-the-art results while preserving the essential properties expected of XAI tools.
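The derivative-free idea can be illustrated with a toy ensemble estimator. The sketch below is an assumption-laden simplification: it draws particles as Gaussian perturbations around a point (FreeMCG uses a diffusion model instead, which is omitted here) and estimates the gradient from the cross-covariance between particles and model outputs, in the spirit of ensemble Kalman methods. Only black-box evaluations of `f` are needed.

```python
import random

def ensemble_gradient(f, center, sigma=0.1, num_particles=5000, rng=None):
    """Estimate grad f(center) from black-box outputs of f.

    For particles x ~ N(center, sigma^2 I), Cov(x, f(x)) is approximately
    sigma^2 * grad f(center), so dividing by sigma^2 gives the estimate.
    """
    rng = rng or random.Random()
    dim = len(center)
    particles = [
        [c + sigma * rng.gauss(0.0, 1.0) for c in center]
        for _ in range(num_particles)
    ]
    outputs = [f(p) for p in particles]
    mean_x = [sum(p[d] for p in particles) / num_particles for d in range(dim)]
    mean_f = sum(outputs) / num_particles
    cross_cov = [
        sum((p[d] - mean_x[d]) * (o - mean_f) for p, o in zip(particles, outputs))
        / num_particles
        for d in range(dim)
    ]
    return [c / sigma ** 2 for c in cross_cov]

# Toy black-box model with known gradient (3, -2).
estimate = ensemble_gradient(
    lambda x: 3.0 * x[0] - 2.0 * x[1], [0.5, -1.0], rng=random.Random(0)
)
```

The estimate is noisy and tightens as `num_particles` grows; no derivative of `f` is ever computed.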
Article 633
Title@2025-07-21 (1): Generalized Consistency Trajectory Models for Image Manipulation
Title: Generalized Consistency Trajectory Models for Image Manipulation | Generalisierte Konsistenz-Trajektorien für die Bildmanipulation | 用于图像操纵的通用一致轨迹模型 2403.12510v4 |
Authors (4): Beomsu Kim, Jaemin Kim, Jeongsol Kim, Jong Chul Ye
Diffusion models (DMs) excel in unconditional generation, as well as on applications such as image editing and restoration. The success of DMs lies in the iterative nature of diffusion: diffusion breaks down the complex process of mapping noise to data into a sequence of simple denoising tasks. Moreover, we are able to exert fine-grained control over the generation process by injecting guidance terms into each denoising step. However, the iterative process is also computationally intensive, often taking from tens up to thousands of function evaluations. Although consistency trajectory models (CTMs) enable traversal between any time points along the probability flow ODE (PFODE) and score inference with a single function evaluation, CTMs only allow translation from Gaussian noise to data. This work aims to unlock the full potential of CTMs by proposing generalized CTMs (GCTMs), which translate between arbitrary distributions via ODEs. We discuss the design space of GCTMs and demonstrate their efficacy in various image manipulation tasks such as image-to-image translation, restoration, and editing.
Article 634
Title@2025-07-21 (1): Towards Reliable, Uncertainty-Aware Alignment
Title: Towards Reliable, Uncertainty-Aware Alignment | Zuverlässige, unsichere Ausrichtung | 实现可靠、不确定和不确定的软件统一 2507.15906v1 |
Authors (3): Debangshu Banerjee, Kintan Saha, Aditya Gopalan
Alignment of large language models (LLMs) typically involves training a reward model on preference data, followed by policy optimization with respect to the reward model. However, optimizing policies with respect to a single reward model estimate can render it vulnerable to inaccuracies in the reward model. We empirically study the variability of reward model training on open-source benchmarks. We observe that independently trained reward models on the same preference dataset can exhibit substantial disagreement, highlighting the instability of current alignment strategies. Employing a theoretical model, we demonstrate that variability in reward model estimation can cause overfitting, leading to the risk of performance degradation. To mitigate this risk, we propose a variance-aware policy optimization framework for preference-based alignment. The key ingredient of the framework is a new policy regularizer that incorporates reward model variance estimates. We show that variance-aware policy optimization provably reduces the risk of outputting a worse policy than the default. Experiments across diverse LLM and reward model configurations confirm that our approach yields more stable and robust alignment than the standard (variance-unaware) pipeline.
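The intuition behind penalizing reward-model disagreement can be shown with a toy scoring rule. This sketch is not the paper's regularizer: it assumes a hypothetical ensemble of independently trained reward models and scores each candidate by its mean reward minus a multiple of the ensemble variance. The function names, the candidate data, and the penalty value are illustrative assumptions.

```python
import statistics

def variance_penalized_score(rewards, penalty=1.0):
    """Mean reward across the ensemble minus a variance penalty."""
    return statistics.fmean(rewards) - penalty * statistics.pvariance(rewards)

def pick(candidates, penalty):
    """Return the candidate name with the highest penalized score."""
    return max(
        candidates,
        key=lambda name: variance_penalized_score(candidates[name], penalty),
    )

# Hypothetical rewards assigned by three reward models to two responses.
candidates = {
    "consistent": [1.0, 1.1, 0.9],   # models agree
    "disputed": [3.0, -0.5, 1.1],    # higher mean, large disagreement
}

naive_choice = pick(candidates, penalty=0.0)
cautious_choice = pick(candidates, penalty=1.0)
```

With no penalty the higher-mean but disputed response wins; with the penalty the response the models agree on is preferred.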
Article 635
Title@2025-07-21 (1): Trade-offs between elective surgery rescheduling and length-of-stay prediction accuracy
Title: Trade-offs between elective surgery rescheduling and length-of-stay prediction accuracy | Kompromisse zwischen der Neuplanung der Wahloperation und der Genauigkeit der Langzeitprognose | 选择性外科重新安排与停留期预测准确性之间的权衡取舍 2507.15566v1 |
Authors (4): Pieter Smet, Martina Doneda, Ettore Lanzarone, Giuliana Carello
The availability of downstream resources plays a critical role in planning the admission of patients undergoing elective surgery, with inpatient beds being one of the most crucial resources. When planning patient admissions, predictions on their length-of-stay (LOS) made by machine learning (ML) models are used to ensure bed availability. However, the actual LOS for each patient may differ considerably from the predicted value, potentially making the schedule infeasible. To address such infeasibilities, rescheduling strategies that take advantage of operational flexibility can be implemented. For example, adjustments may include postponing admission dates, relocating patients to different wards, or even transferring patients who are already admitted. The common assumption is that more accurate LOS predictions reduce the impact of rescheduling. However, training ML models that can make such accurate predictions can be costly. Building on previous work that proposed simulated ML for evaluating data-driven approaches, this paper explores the relationship between LOS prediction accuracy and rescheduling flexibility across various corrective policies. Specifically, we examine the most effective patient rescheduling strategies under LOS prediction errors to prevent bed overflows while optimizing resource utilization.
Article 636
Title@2025-07-21 (1): zkFL: Zero-Knowledge Proof-based Gradient Aggregation for Federated Learning
Title: zkFL: Zero-Knowledge Proof-based Gradient Aggregation for Federated Learning | zkFL: Null-Knowledge Proof-based Gradient Aggregation für Federated Learning | zkFL: 联邦学习零知识校验渐进汇总 2310.02554v5 |
Authors (5): Zhipeng Wang, Nanqing Dong, Jiahao Sun, William Knottenbelt, Yike Guo
Federated learning (FL) is a machine learning paradigm, which enables multiple and decentralized clients to collaboratively train a model under the orchestration of a central aggregator. FL can be a scalable machine learning solution in big data scenarios. Traditional FL relies on the trust assumption of the central aggregator, which forms cohorts of clients honestly. However, a malicious aggregator, in reality, could abandon and replace the client’s training models, or insert fake clients, to manipulate the final training results. In this work, we introduce zkFL, which leverages zero-knowledge proofs to tackle the issue of a malicious aggregator during the training model aggregation process. To guarantee the correct aggregation results, the aggregator provides a proof per round, demonstrating to the clients that the aggregator executes the intended behavior faithfully. To further reduce the verification cost of clients, we use blockchain to handle the proof in a zero-knowledge way, where miners (i.e., the participants validating and maintaining the blockchain data) can verify the proof without knowing the clients’ local and aggregated models. The theoretical analysis and empirical results show that zkFL achieves better security and privacy than traditional FL, without modifying the underlying FL network structure or heavily compromising the training speed.
Article 637
Title@2025-07-21 (1): PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors
Title: PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors | PhysGym: Benchmarking von LLMs in der interaktiven Physik-Discovery mit kontrollierten Prioren | PhysGym: 与受控前科互动物理发现中基准化LLMs 2507.15550v1 |
Authors (5): Yimeng Chen, Piotr Piȩkos, Mateusz Ostaszewski, Firas Laakom, Jürgen Schmidhuber
Evaluating the scientific discovery capabilities of large language model based agents, particularly how they cope with varying environmental complexity and utilize prior knowledge, requires specialized benchmarks currently lacking in the landscape. To address this gap, we introduce PhysGym, a novel benchmark suite and simulation platform for rigorously assessing LLM-based scientific reasoning in interactive physics environments. PhysGym’s primary contribution lies in its sophisticated control over the level of prior knowledge provided to the agent. This allows researchers to dissect agent performance along axes including the complexity of the problem and the prior knowledge levels. The benchmark comprises a suite of interactive simulations, where agents must actively probe environments, gather data sequentially under constraints and formulate hypotheses about underlying physical laws. PhysGym provides standardized evaluation protocols and metrics for assessing hypothesis accuracy and model fidelity. We demonstrate the benchmark’s utility by presenting results from baseline LLMs, showcasing its ability to differentiate capabilities based on varying priors and task complexity.
Article 638
Title@2025-07-21 (1): The added value for MRI radiomics and deep-learning for glioblastoma prognostication compared to clinical and molecular information
Title: The added value for MRI radiomics and deep-learning for glioblastoma prognostication compared to clinical and molecular information | Der Mehrwert für MRT-Radiomik und Deep-Learning für Glioblastom-Prognostik im Vergleich zu klinischen und molekularen Informationen | 与临床和分子信息相比,MRI放射性辐射学和深层学习对于遗传性血浆瘤预测的增加值 2507.15548v1 |
Authors (22): D. Abler, O. Pusterla, A. Joye-Kühnis, N. Andratschke, M. Bach, A. Bink, S. M. Christ, P. Hagmann, B. Pouymayou, E. Pravatà, P. Radojewski, M. Reyes, L. Ruinelli, R. Schaer, B. Stieltjes, G. Treglia, W. Valenzuela, R. Wiest, S. Zoergiebel, M. Guckenberger, S. Tanadini-Lang, A. Depeursinge
Background: Radiomics shows promise in characterizing glioblastoma, but its added value over clinical and molecular predictors has yet to be proven. This study assessed the added value of conventional radiomics (CR) and deep learning (DL) MRI radiomics for glioblastoma prognosis (≤ 6 vs. > 6 months survival) on a large multi-center dataset. Methods: After patient selection, our curated dataset gathers 1152 glioblastoma (WHO 2016) patients from five Swiss centers and one public source. It included clinical (age, gender), molecular (MGMT, IDH), and baseline MRI data (T1, T1 contrast, FLAIR, T2) with tumor regions. CR and DL models were developed using standard methods and evaluated on internal and external cohorts. Sub-analyses assessed models with different feature sets (imaging-only, clinical/molecular-only, combined-features) and patient subsets (S-1: all patients, S-2: with molecular data, S-3: IDH wildtype). Results: The best performance was observed in the full cohort (S-1). In external validation, the combined-feature CR model achieved an AUC of 0.75, slightly but significantly outperforming clinical-only (0.74) and imaging-only (0.68) models. DL models showed similar trends, though without statistical significance. In S-2 and S-3, combined models did not outperform clinical-only models. Exploratory analysis of CR models for overall survival prediction suggested greater relevance of imaging data: across all subsets, combined-feature models significantly outperformed clinical-only models, though with a modest advantage of 2-4 C-index points. Conclusions: While confirming the predictive value of anatomical MRI sequences for glioblastoma prognosis, this multi-center study found that standard CR and DL radiomics approaches offer minimal added value over demographic predictors such as age and gender.
Article 639
Title@2025-07-21 (1): Improving AEBS Validation Through Objective Intervention Classification Leveraging the Prediction Divergence Principle
Title: Improving AEBS Validation Through Objective Intervention Classification Leveraging the Prediction Divergence Principle | Verbesserung der AEBS-Validierung durch Ziel-Interventions-Klassifikation Begünstigung des Prinzips der Prognoseabweichung | 通过利用预测差异原则的客观干预分类,改进对AEBS的验证 2507.07872v2 |
Authors (2): Daniel Betschinske, Steven Peters
The safety validation of automatic emergency braking systems (AEBS) requires accurately distinguishing between false positive (FP) and true positive (TP) system activations. While simulations allow straightforward differentiation by comparing scenarios with and without interventions, analyzing activations from open-loop resimulations - such as those from field operational testing (FOT) - is more complex. This complexity arises from scenario parameter uncertainty and the influence of driver interventions in the recorded data. Human labeling is frequently used to address these challenges, relying on subjective assessments of intervention necessity or situational criticality, potentially introducing biases and limitations. This work proposes a rule-based classification approach leveraging the Prediction Divergence Principle (PDP) to address those issues. Applied to a simplified AEBS, the proposed method reveals key strengths, limitations, and system requirements for effective implementation. The findings suggest that combining this approach with human labeling may enhance the transparency and consistency of classification, thereby improving the overall validation process. While the rule set for classification derived in this work adopts a conservative approach, the paper outlines future directions for refinement and broader applicability. Finally, this work highlights the potential of such methods to complement existing practices, paving the way for more reliable and reproducible AEBS validation frameworks.
Article 640
Title@2025-07-21 (1): Data Aware Differentiable Neural Architecture Search for Tiny Keyword Spotting Applications
Title: Data Aware Differentiable Neural Architecture Search for Tiny Keyword Spotting Applications | Data Aware Differentiable Neural Architecture Suche nach winzigen Keyword-Spoting-Anwendungen | Data Ental Invecled 不同神经结构搜索微小关键词点点名应用 2507.15545v1 |
Authors (5): Yujia Shi, Emil Njor, Pablo Martínez-Nuevo, Sven Ewan Shepstone, Xenofon Fafoutis
The success of Machine Learning is increasingly tempered by its significant resource footprint, driving interest in efficient paradigms like TinyML. However, the inherent complexity of designing TinyML systems hampers their broad adoption. To reduce this complexity, we introduce “Data Aware Differentiable Neural Architecture Search”. Unlike conventional Differentiable Neural Architecture Search, our approach expands the search space to include data configuration parameters alongside architectural choices. This enables Data Aware Differentiable Neural Architecture Search to co-optimize model architecture and input data characteristics, effectively balancing resource usage and system performance for TinyML applications. Initial results on keyword spotting demonstrate that this novel approach to TinyML system design can generate lean but highly accurate systems.
Article 641
Title@2025-07-21 (1): Foundation Models and Transformers for Anomaly Detection: A Survey
Title: Foundation Models and Transformers for Anomaly Detection: A Survey | Grundlagenmodelle und Transformer zur Erkennung von Anomalien: Eine Umfrage | 异常探测的基础模型和变形模型:调查 2507.15905v1 |
Authors (5): Mouïn Ben Ammar, Arturo Mendoza, Nacim Belkhir, Antoine Manzanera, Gianni Franchi
In line with the development of deep learning, this survey examines the transformative role of Transformers and foundation models in advancing visual anomaly detection (VAD). We explore how these architectures, with their global receptive fields and adaptability, address challenges such as long-range dependency modeling, contextual modeling and data scarcity. The survey categorizes VAD methods into reconstruction-based, feature-based and zero/few-shot approaches, highlighting the paradigm shift brought about by foundation models. By integrating attention mechanisms and leveraging large-scale pre-training, Transformers and foundation models enable more robust, interpretable, and scalable anomaly detection solutions. This work provides a comprehensive review of state-of-the-art techniques, their strengths, limitations, and emerging trends in leveraging these architectures for VAD.
Article 642
Title@2025-07-21 (1): Controlled Model Debiasing through Minimal and Interpretable Updates
Title: Controlled Model Debiasing through Minimal and Interpretable Updates | Controlled Model Debiasing durch minimale und interpretierbare Updates | 通过最小和可解释的更新减少偏差 2502.21284v2 |
Authors (4): Federico Di Gennaro, Thibault Laugel, Vincent Grari, Marcin Detyniecki
Traditional approaches to learning fair machine learning models often require rebuilding models from scratch, typically without considering potentially existing models. In a context where models need to be retrained frequently, this can lead to inconsistent model updates, as well as redundant and costly validation testing. To address this limitation, we introduce the notion of controlled model debiasing, a novel supervised learning task relying on two desiderata: that the differences between the new fair model and the existing one should be (i) minimal and (ii) interpretable. After providing theoretical guarantees to this new problem, we introduce a novel algorithm for algorithmic fairness, COMMOD, that is both model-agnostic and does not require the sensitive attribute at test time. In addition, our algorithm is explicitly designed to enforce minimal and interpretable changes between biased and debiased predictions in a binary classification task, a property that, while highly desirable in high-stakes applications, is rarely prioritized as an explicit objective in fairness literature. Our approach combines a concept-based architecture and adversarial learning and we demonstrate through empirical results that it achieves comparable performance to state-of-the-art debiasing methods while performing minimal and interpretable prediction changes.
Article 643
Title@2025-07-21 (1): Closed-form Solutions: A New Perspective on Solving Differential Equations
Title: Closed-form Solutions: A New Perspective on Solving Differential Equations | Closed-form Lösungen: Eine neue Perspektive zur Lösung von Differentialgleichungen | 封闭式解决办法:解决差异等量的新视角 2405.14620v4 |
Authors (11): Shu Wei, Yanjie Li, Lina Yu, Weijun Li, Min Wu, Linjun Sun, Jingyi Liu, Hong Qin, Yusong Deng, Jufeng Han, Yan Pang
The quest for analytical solutions to differential equations has traditionally been constrained by the need for extensive mathematical expertise. Machine learning methods like genetic algorithms have shown promise in this domain, but are hindered by significant computational time and the complexity of their derived solutions. This paper introduces SSDE (Symbolic Solver for Differential Equations), a novel reinforcement learning-based approach that derives symbolic closed-form solutions for various differential equations. Evaluations across a diverse set of ordinary and partial differential equations demonstrate that SSDE outperforms existing machine learning methods, delivering superior accuracy and efficiency in obtaining analytical solutions.
Article 644
Title@2025-07-21 (1): Safe and High-Performance Learning of Model Predictive Control using Kernel-Based Interpolation
Title: Safe and High-Performance Learning of Model Predictive Control using Kernel-Based Interpolation | Sicheres und hochleistungsfähiges Lernen der Modellprädizitve-Steuerung mittels Kernel-basierter Interpolation | 利用以内核为基础的内流内插,安全而高绩效地学习示范先决控制模型 2410.06771v2 |
Authors (3): Alexander Rose, Philipp Schaub, Rolf Findeisen
We present a method that allows efficient and safe approximation of model predictive controllers using kernel interpolation. Since the computational complexity of the approximating function scales linearly with the number of data points, we propose to use a scoring function which chooses the most promising data. To further reduce the complexity of the approximation, we restrict our considerations to the set of closed-loop reachable states. That is, the approximating function only has to be accurate within this set. This makes our method especially suited for systems where the set of initial conditions is small. In order to guarantee safety and high performance of the designed approximated controller, we use reachability analysis based on Monte Carlo methods.
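Kernel interpolation of a control law can be sketched for a scalar state. The example below makes several assumptions that are not in the abstract: a Gaussian kernel with a fixed length scale, a saturated linear feedback law standing in for the expensive MPC solve, and a uniform grid of sample states (the scoring function and the reachability analysis are omitted). Evaluating the interpolant costs one kernel evaluation per stored data point, which is the linear scaling mentioned above.

```python
import math

def rbf(x, y, length_scale=0.3):
    """Gaussian kernel between two scalar states."""
    return math.exp(-((x - y) ** 2) / (2.0 * length_scale ** 2))

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def fit_interpolant(states, inputs):
    """Return u(x) = sum_i w_i k(x, x_i) that matches the data exactly."""
    gram = [[rbf(xi, xj) for xj in states] for xi in states]
    weights = solve_linear(gram, inputs)
    return lambda x: sum(w * rbf(x, xi) for w, xi in zip(weights, states))

def toy_controller(x):
    """Hypothetical stand-in for an MPC law: saturated linear feedback."""
    return max(-1.0, min(1.0, -2.0 * x))

states = [-1.0 + 0.25 * i for i in range(9)]
inputs = [toy_controller(x) for x in states]
approx = fit_interpolant(states, inputs)
```

Calling `approx(x)` then replaces the online optimization with a weighted sum over the stored samples; accuracy between samples depends on the kernel and on which data points are kept.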
Article 645
Title@2025-07-21 (1): An Investigation of Test-time Adaptation for Audio Classification under Background Noise
Title: An Investigation of Test-time Adaptation for Audio Classification under Background Noise | Eine Untersuchung der Testzeitanpassung für die Audioklassifikation unter Hintergrundgeräuschen | 关于背景噪音下音频分类的试验时间适应情况调查 2507.15523v1 |
Authors (4): Weichuang Shao, Iman Yi Liao, Tomas Henrique Bode Maul, Tissa Chandesa
Domain shift is a prominent problem in Deep Learning, causing a model pre-trained on a source dataset to suffer significant performance degradation on test datasets. This research aims to address the issue of audio classification under domain shift caused by background noise using Test-Time Adaptation (TTA), a technique that adapts a pre-trained model during testing using only unlabelled test data before making predictions. We adopt two common TTA methods, TTT and TENT, and a state-of-the-art method CoNMix, and investigate their respective performance on two popular audio classification datasets, AudioMNIST (AM) and SpeechCommands V1 (SC), against different types of background noise and noise severity levels. The experimental results reveal that our proposed modified version of CoNMix produced the highest classification accuracy under domain shift (5.31% error rate under 10 dB exercise bike background noise and 12.75% error rate under 3 dB running tap background noise for AM) compared to TTT and TENT. The literature search provided no evidence of similar works, thereby motivating the work reported here as the first study to leverage TTA techniques for audio classification under domain shift.
Article 646
Title@2025-07-21 (1): Dictionary-Learning-Based Data Pruning for System Identification
Title: Dictionary-Learning-Based Data Pruning for System Identification | Wörterbuch-Learning-basierte Datenprüfung für die Systemidentifikation | 用于系统识别的词典 – – 以学习为基础的数据保护 2502.11484v2 |
Authors (4): Tingna Wang, Sikai Zhang, Mingming Song, Limin Sun
System identification normally involves augmenting time series data by time shifting and nonlinearisation (e.g., a polynomial basis), both of which introduce redundancy in features and samples. Many research works focus on reducing feature-wise redundancy, while less attention is paid to sample-wise redundancy. This paper proposes a novel data pruning method, called mini-batch FastCan, to reduce sample-wise redundancy based on dictionary learning. Time series data is represented by a set of representative samples, called atoms, obtained via dictionary learning. The useful samples are selected based on their correlation with the atoms. The method is tested on one simulated dataset and two benchmark datasets. The R-squared between the coefficients of models trained on the full datasets and the coefficients of models trained on pruned datasets is adopted to evaluate the performance of data pruning methods. It is found that the proposed method significantly outperforms the random pruning method.
Article 647
Title@2025-07-21 (1): MDNF: Multi-Diffusion-Nets for Neural Fields on Meshes
Title: MDNF: Multi-Diffusion-Nets for Neural Fields on Meshes | MDNF: Multi-Diffusionsnetze für neurale Felder auf Maschen | MDNF:Mshes神经场多传播网络 2409.03034v2 |
Authors (3): Avigail Cohen Rimon, Tal Shnitzer, Mirela Ben Chen
We propose a novel framework for representing neural fields on triangle meshes that is multi-resolution across both spatial and frequency domains. Inspired by the Neural Fourier Filter Bank (NFFB), our architecture decomposes the spatial and frequency domains by associating finer spatial resolution levels with higher frequency bands, while coarser resolutions are mapped to lower frequencies. To achieve geometry-aware spatial decomposition we leverage multiple DiffusionNet components, each associated with a different spatial resolution level. Subsequently, we apply a Fourier feature mapping to encourage finer resolution levels to be associated with higher frequencies. The final signal is composed in a wavelet-inspired manner using a sine-activated MLP, aggregating higher-frequency signals on top of lower-frequency ones. Our architecture attains high accuracy in learning complex neural fields and is robust to discontinuities, exponential scale variations of the target field, and mesh modification. We demonstrate the effectiveness of our approach through its application to diverse neural fields, such as synthetic RGB functions, UV texture coordinates, and vertex normals, illustrating different challenges. To validate our method, we compare its performance against two alternatives, showcasing the advantages of our multi-resolution architecture.
nan
Article 648
Title@2025-07-21 (1): Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
Title: Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback | Off-Policy korrigierte Prämienmodellierung für verstärktes Lernen aus menschlichem Feedback | 利用人类反馈加强学习的非政策纠正奖励模型 2507.15507v1 |
Authors (3): Johannes Ackermann, Takashi Ishida, Masashi Sugiyama
Reinforcement Learning from Human Feedback (RLHF) allows us to train models, such as language models (LMs), to follow complex human preferences. In RLHF for LMs, we first train an LM using supervised fine-tuning, sample pairs of responses, obtain human feedback, and use the resulting data to train a reward model (RM). RL methods are then used to train the LM to maximize the reward given by the RM. As training progresses, the responses generated by the LM no longer resemble the responses seen by the RM during training, leading to the RM becoming inaccurate. The score given by the RM keeps increasing, but the learned behavior no longer matches the human preferences. This issue is known as overoptimization. We investigate overoptimization from the point of view of distribution shift and show that the shift results in an inconsistent estimate of the RM parameters, leading to an inconsistent estimate of the policy gradient. We propose Off-Policy Corrected Reward Modeling (OCRM), which iteratively off-policy corrects the RM using importance weighting, without requiring new labels or samples. This results in a more accurate RM, which empirically leads to an improved final policy. We validate our approach in experiments with summarization and chatbot datasets and show that it performs significantly better than standard RLHF methods and baselines. Our implementation is available at https://github.com/JohannesAck/OffPolicyCorrectedRewardModeling
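The importance-weighting idea in the abstract above can be illustrated with a minimal sketch. It assumes a pairwise (Bradley-Terry style) preference loss and self-normalised weights computed from log-probabilities of the current and the data-collecting policy; the function names and the exact form of the weights are hypothetical and are not taken from the OCRM implementation.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def importance_weights(logp_new, logp_old):
    # Assumption: one log-probability per preference pair under the current
    # policy (logp_new) and under the policy that generated the data (logp_old).
    raw = [math.exp(n - o) for n, o in zip(logp_new, logp_old)]
    total = sum(raw)
    return [r / total for r in raw]  # self-normalised, sums to 1

def weighted_preference_loss(r_chosen, r_rejected, weights):
    # Weighted negative log-likelihood that the chosen response beats the rejected one.
    return -sum(w * log_sigmoid(rc - rr)
                for w, rc, rr in zip(weights, r_chosen, r_rejected))

# Usage: pairs that the current policy is more likely to produce get more weight.
weights = importance_weights([-1.0, -2.0, -0.5], [-1.5, -1.5, -1.5])
loss = weighted_preference_loss([0.8, 0.1, 1.2], [0.2, 0.4, 0.3], weights)
```

When the two policies coincide, the weights are uniform and the loss reduces to the ordinary mean preference loss.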
nan
Article 649
Title@2025-07-21 (1): ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution
Title: ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution | ASPERA: Eine simulierte Umgebung, um Planung für komplexe Aktionen zu bewerten | ASPERA:评估复杂行动执行规划的模拟环境 2507.15501v1 |
Authors (9): Alexandru Coca, Mark Gaynor, Zhenxing Zhang, Jianpeng Cheng, Bo-Hsiang Tseng, Pete Boothroyd, Héctor Martinez Alonso, Diarmuid Ó Séaghdha, Anders Johannsen
This work evaluates the potential of large language models (LLMs) to power digital assistants capable of complex action execution. These assistants rely on pre-trained programming knowledge to execute multi-step goals by composing objects and functions defined in assistant libraries into action execution programs. To achieve this, we develop ASPERA, a framework comprising an assistant library simulation and a human-assisted LLM data generation engine. Our engine allows developers to guide LLM generation of high-quality tasks consisting of complex user queries, simulation state and corresponding validation programs, tackling data availability and evaluation robustness challenges. Alongside the framework we release Asper-Bench, an evaluation dataset of 250 challenging tasks generated using ASPERA, which we use to show that program generation grounded in custom assistant libraries is a significant challenge to LLMs compared to dependency-free code generation.
nan
Article 650
Title@2025-07-21 (1): Ranking-Based At-Risk Student Prediction Using Federated Learning and Differential Features
Title: Ranking-Based At-Risk Student Prediction Using Federated Learning and Differential Features | Rankingbasierte At-Risk-Prognose von Studenten mit Federated Learning und Differential Features | 利用联邦学习和不同特点,按等级排列的在风险时学生预测 2505.09287v2 |
Authors (5): Shunsuke Yoneda, Valdemar Švábenský, Gen Li, Daisuke Deguchi, Atsushi Shimada
Digital textbooks are widely used in various educational contexts, such as university courses and online lectures. Such textbooks yield learning log data that have been used in numerous educational data mining (EDM) studies for student behavior analysis and performance prediction. However, these studies have faced challenges in integrating confidential data, such as academic records and learning logs, across schools due to privacy concerns. Consequently, analyses are often conducted with data limited to a single school, which makes developing high-performing and generalizable models difficult. This study proposes a method that combines federated learning and differential features to address these issues. Federated learning enables model training without centralizing data, thereby preserving student privacy. Differential features, which utilize relative values instead of absolute values, enhance model performance and generalizability. To evaluate the proposed method, a model for predicting at-risk students was trained using data from 1,136 students across 12 courses conducted over 4 years, and validated on hold-out test data from 5 other courses. Experimental results demonstrated that the proposed method addresses privacy concerns while achieving performance comparable to that of models trained via centralized learning in terms of Top-n precision, nDCG, and PR-AUC. Furthermore, using differential features improved prediction performance across all evaluation datasets compared to non-differential approaches. The trained models were also applicable for early prediction, achieving high performance in detecting at-risk students in earlier stages of the semester within the validation datasets.
nan
Article 651
Title@2025-07-21 (1): Fast-VAT: Accelerating Cluster Tendency Visualization using Cython and Numba
Title: Fast-VAT: Accelerating Cluster Tendency Visualization using Cython and Numba | Schnell-MwSt: Beschleunigung der Cluster-Tendenzvisualisierung mit Cython und Numba | 快速VAT:使用Cython和Numba加速集束密度可视化 2507.15904v1 |
Authors (2): MSR Avinash, Ismael Lachheb
Visual Assessment of Cluster Tendency (VAT) is a widely used unsupervised technique to assess the presence of cluster structure in unlabeled datasets. However, its standard implementation suffers from significant performance limitations due to its O(n^2) time complexity and inefficient memory usage. In this work, we present Fast-VAT, a high-performance reimplementation of the VAT algorithm in Python, augmented with Numba’s Just-In-Time (JIT) compilation and Cython’s static typing and low-level memory optimizations. Our approach achieves up to 50x speedup over the baseline implementation, while preserving the output fidelity of the original method. We validate Fast-VAT on a suite of real and synthetic datasets – including Iris, Mall Customers, and Spotify subsets – and verify cluster tendency using Hopkins statistics, PCA, and t-SNE. Additionally, we compare VAT’s structural insights with clustering results from DBSCAN and K-Means to confirm its reliability.
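For readers unfamiliar with VAT, the reordering step it relies on can be sketched in plain Python as follows. This is an illustrative O(n^2) version of the standard Prim-like ordering, not the Fast-VAT code; the helper names are hypothetical, and a Numba or Cython variant would accelerate the same loops.

```python
import math

def pairwise_distances(points):
    # Full Euclidean dissimilarity matrix as a list of lists.
    return [[math.dist(p, q) for q in points] for p in points]

def vat_order(dist):
    n = len(dist)
    # Start from a point involved in the largest dissimilarity.
    start = max(range(n), key=lambda i: max(dist[i]))
    order = [start]
    remaining = [j for j in range(n) if j != start]
    # nearest[j]: smallest dissimilarity between j and any already-ordered point.
    nearest = {j: dist[start][j] for j in remaining}
    while remaining:
        nxt = min(remaining, key=nearest.__getitem__)
        order.append(nxt)
        remaining.remove(nxt)
        for j in remaining:
            nearest[j] = min(nearest[j], dist[nxt][j])
    return order

def reorder(dist, order):
    # Permute rows and columns; dark diagonal blocks in the image suggest clusters.
    return [[dist[a][b] for b in order] for a in order]

pts = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
order = vat_order(pairwise_distances(pts))
```

Rendering `reorder(pairwise_distances(pts), order)` as a grayscale image gives the VAT display; here the two interleaved groups of points end up contiguous in `order`.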
nan
Article 652
Title@2025-07-21 (1): Dense-depth map guided deep Lidar-Visual Odometry with Sparse Point Clouds and Images
Title: Dense-depth map guided deep Lidar-Visual Odometry with Sparse Point Clouds and Images | Tiefe Karte geführte tiefe Lidar-Visual-Odometrie mit Sparse Point Clouds und Bildern | 稠密深度图引导的基于稀疏点云和图像的深度激光雷达-视觉里程计 2507.15496v1 |
Authors (6): JunYing Huang, Ao Xu, DongSun Yong, KeRen Li, YuanFeng Wang, Qi Qin
Odometry is a critical task for autonomous systems for self-localization and navigation. We propose a novel LiDAR-Visual odometry framework that integrates LiDAR point clouds and images for accurate and robust pose estimation. Our method utilizes a dense-depth map estimated from point clouds and images through depth completion, and incorporates a multi-scale feature extraction network with attention mechanisms, enabling adaptive depth-aware representations. Furthermore, we leverage dense depth information to refine flow estimation and mitigate errors in occlusion-prone regions. Our hierarchical pose refinement module optimizes motion estimation progressively, ensuring robust predictions against dynamic environments and scale ambiguities. Comprehensive experiments on the KITTI odometry benchmark demonstrate that our approach achieves similar or superior accuracy and robustness compared to state-of-the-art visual and LiDAR odometry methods.
nan
Article 653
Title@2025-07-21 (1): Bayesian Optimization for Molecules Should Be Pareto-Aware
Title: Bayesian Optimization for Molecules Should Be Pareto-Aware | Bayesian Optimierung für Moleküle sollte Pareto-Bewusst sein | Bayesian Bayesian 分子优化应该是 Pareto- Aware 2507.13704v2 |
Authors (4): Anabel Yong, Austin Tripp, Layla Hosseini-Gerami, Brooks Paige
Multi-objective Bayesian optimization (MOBO) provides a principled framework for navigating trade-offs in molecular design. However, its empirical advantages over scalarized alternatives remain underexplored. We benchmark a simple Pareto-based MOBO strategy – Expected Hypervolume Improvement (EHVI) – against a simple fixed-weight scalarized baseline using Expected Improvement (EI), under a tightly controlled setup with identical Gaussian Process surrogates and molecular representations. Across three molecular optimization tasks, EHVI consistently outperforms scalarized EI in terms of Pareto front coverage, convergence speed, and chemical diversity. While scalarization encompasses flexible variants – including random or adaptive schemes – our results show that even strong deterministic instantiations can underperform in low-data regimes. These findings offer concrete evidence for the practical advantages of Pareto-aware acquisition in de novo molecular optimization, especially when evaluation budgets are limited and trade-offs are nontrivial.
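The difference between a hypervolume-based criterion and a fixed-weight scalarization can be made concrete with a small two-objective sketch. It assumes both objectives are maximised and uses made-up points; the expectation over a Gaussian Process posterior that EHVI takes is omitted, so this shows only the deterministic hypervolume improvement.

```python
def pareto_front(points):
    # Keep points not dominated by any other point (maximisation in both objectives).
    front = [p for p in points
             if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in points)]
    return sorted(set(front))

def hypervolume_2d(points, ref):
    # Area dominated by the front and bounded below by the reference point.
    front = [p for p in pareto_front(points) if p[0] > ref[0] and p[1] > ref[1]]
    front.sort(key=lambda p: p[0], reverse=True)  # second objective then increases
    hv, prev_y = 0.0, ref[1]
    for x, y in front:
        hv += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return hv

def scalarized(point, weights=(0.5, 0.5)):
    # Fixed-weight scalarization used as the comparison baseline.
    return weights[0] * point[0] + weights[1] * point[1]

observed = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0)]
candidate = (2.5, 2.5)
base = hypervolume_2d(observed, ref=(0.0, 0.0))
improvement = hypervolume_2d(observed + [candidate], ref=(0.0, 0.0)) - base
```

Note that the equal-weight scalarization scores the three non-dominated observed points identically, while the hypervolume rewards a candidate for extending the front anywhere along the trade-off curve.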
nan
Article 654
Title@2025-07-21 (1): OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning
Title: OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning | OMoE: Diversifizierende Mischung aus Low-Rank-Anpassung durch Orthogonal Finetuning | OMoE:通过矫形微调使低Rank适应混合体多样化 2501.10062v2 |
Authors (6): Jinyuan Feng, Zhiqiang Pu, Tianyi Hu, Dongmin Li, Xiaolin Ai, Huimu Wang
Building mixture-of-experts (MoE) architectures for low-rank adaptation (LoRA) is emerging as a potential direction in parameter-efficient fine-tuning (PEFT) for its modular design and remarkable performance. However, simply increasing the number of experts cannot guarantee significant improvement. In this work, we first conduct a qualitative analysis indicating that experts collapse to similar representations in vanilla MoE, limiting the capacity of the modular design and its computational efficiency. Furthermore, our analysis reveals that the performance of previous MoE variants may be limited by a lack of diversity among experts. Motivated by these findings, we propose Orthogonal Mixture-of-Experts (OMoE), a resource-efficient MoE variant that trains experts in an orthogonal manner to promote diversity. In OMoE, a Gram-Schmidt process is leveraged to enforce that the experts’ representations lie within the Stiefel manifold. By applying orthogonal constraints directly to the architecture, OMoE keeps the learning objective unchanged, without compromising optimality. Our method is simple and alleviates memory bottlenecks, as it requires fewer experts than vanilla MoE models. Experiments on diverse commonsense reasoning benchmarks demonstrate that OMoE can consistently achieve stable and efficient performance improvement when compared with the state-of-the-art methods while significantly reducing the number of required experts.
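As a reminder of what the Gram-Schmidt step does, the following sketch orthonormalises a list of vectors so that the stacked result has orthonormal rows, which is the defining property of a point on the Stiefel manifold. It is a generic modified Gram-Schmidt routine in plain Python, not the OMoE code; treating each vector as one expert's representation is an assumption made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, eps=1e-12):
    # Modified Gram-Schmidt: subtract projections from the running residual.
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = dot(w, q)
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm < eps:
            raise ValueError("vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis

# Hypothetical representations of three experts in a 4-dimensional space.
experts = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 1.0]]
q = gram_schmidt(experts)
```

After the call, every pair of rows in `q` has zero inner product and each row has unit norm, so no two experts can collapse onto the same direction.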
nan
Article 655
Title@2025-07-21 (1): Low-dimensional Functions are Efficiently Learnable under Randomly Biased Distributions
Title: Low-dimensional Functions are Efficiently Learnable under Randomly Biased Distributions | Low-dimensionale Funktionen sind unter Randomly Biased Distributions effizient erlernbar | 低维函数可在随机偏差分布下高效学习 2502.06443v2 |
Authors (3): Elisabetta Cornacchia, Dan Mikulincer, Elchanan Mossel
The problem of learning single index and multi index models has gained significant interest as a fundamental task in high-dimensional statistics. Many recent works have analysed gradient-based methods, particularly in the setting of isotropic data distributions, often in the context of neural network training. Such studies have uncovered precise characterisations of algorithmic sample complexity in terms of certain analytic properties of the target function, such as the leap, information, and generative exponents. These properties establish a quantitative separation between low and high complexity learning tasks. In this work, we show that high complexity cases are rare. Specifically, we prove that introducing a small random perturbation to the data distribution, via a random shift in the first moment, renders any Gaussian single index model as easy to learn as a linear function. We further extend this result to a class of multi index models, namely sparse Boolean functions, also known as Juntas.
nan
Article 656
Title@2025-07-21 (1): Information Preserving Line Search via Bayesian Optimization
Title: Information Preserving Line Search via Bayesian Optimization | Informationen Erhaltung der Liniensuche über Bayesian Optimization | 通过 Bayesian 最佳优化保存信息 2507.15485v1 |
Authors (3): Robin Labryga, Tomislav Prusina, Sören Laue
Line search is a fundamental part of iterative optimization methods for unconstrained and bound-constrained optimization problems to determine suitable step lengths that provide sufficient improvement in each iteration. Traditional line search methods are based on iterative interval refinement, where valuable information about function value and gradient is discarded in each iteration. We propose a line search method via Bayesian optimization, preserving and utilizing otherwise discarded information to improve step-length choices. Our approach is guaranteed to converge and shows superior performance compared to state-of-the-art methods based on empirical tests on the challenging unconstrained and bound-constrained optimization problems from the CUTEst test set.
nan
Article 657
Title@2025-07-21 (1): The Constitutional Controller: Doubt-Calibrated Steering of Compliant Agents
Title: The Constitutional Controller: Doubt-Calibrated Steering of Compliant Agents | Der Verfassungsverantwortliche: Zweifelsfrei gesteuerte Steuerung von konformen Agenten | 宪制主计长:经核查的反怀疑管制人员指导员 2507.15478v1 |
Authors (7): Simon Kohaut, Felix Divo, Navid Hamid, Benedict Flade, Julian Eggert, Devendra Singh Dhami, Kristian Kersting
Ensuring reliable and rule-compliant behavior of autonomous agents in uncertain environments remains a fundamental challenge in modern robotics. Our work shows how neuro-symbolic systems, which integrate probabilistic, symbolic white-box reasoning models with deep learning methods, offer a powerful solution to this challenge. This enables the simultaneous consideration of explicit rules and neural models trained on noisy data, combining the strength of structured reasoning with flexible representations. To this end, we introduce the Constitutional Controller (CoCo), a novel framework designed to enhance the safety and reliability of agents by reasoning over deep probabilistic logic programs representing constraints such as those found in shared traffic spaces. Furthermore, we propose the concept of self-doubt, implemented as a probability density conditioned on doubt features such as travel velocity, employed sensors, or health factors. In a real-world aerial mobility study, we demonstrate CoCo’s advantages for intelligent autonomous systems to learn appropriate doubts and navigate complex and uncertain environments safely and compliantly.
nan
Article 658
Title@2025-07-21 (1): How to Leverage Predictive Uncertainty Estimates for Reducing Catastrophic Forgetting in Online Continual Learning
Title: How to Leverage Predictive Uncertainty Estimates for Reducing Catastrophic Forgetting in Online Continual Learning | Wie man Predictive Uncertainty Schätzungen für die Verringerung der Katastrophenvergessenheit in Online-Kontinual Learning | 如何利用预测的不确定性估算来减少在线持续学习中的灾难性遗忘 2407.07668v3 |
Authors (3): Giuseppe Serra, Ben Werner, Florian Buettner
Many real-world applications require machine-learning models to be able to deal with non-stationary data distributions and thus learn autonomously over an extended period of time, often in an online setting. One of the main challenges in this scenario is so-called catastrophic forgetting (CF), in which the learning model tends to focus on the most recent tasks while experiencing predictive degradation on older ones. In the online setting, the most effective solutions employ a fixed-size memory buffer to store old samples used for replay when training on new tasks. Many approaches have been presented to tackle this problem. However, it is not clear how predictive uncertainty information for memory management can be leveraged in the most effective manner, and conflicting strategies have been proposed to populate the memory. Are the easiest-to-forget or the easiest-to-remember samples more effective in combating CF? Starting from the intuition that predictive uncertainty provides an idea of the samples’ location in the decision space, this work presents an in-depth analysis of different uncertainty estimates and strategies for populating the memory. The investigation provides a better understanding of the characteristics data points should have for alleviating CF. Then, we propose an alternative method for estimating predictive uncertainty via the generalised variance induced by the negative log-likelihood. Finally, we demonstrate that the use of predictive uncertainty measures helps in reducing CF in different settings.
nan
Article 659
Title@2025-07-21 (1): An Adaptive Random Fourier Features approach Applied to Learning Stochastic Differential Equations
Title: An Adaptive Random Fourier Features approach Applied to Learning Stochastic Differential Equations | Ein adaptives Random Fourier Features Ansatz angewandt, um stochastische Differentialgleichungen zu lernen | 用于学习斯托卡差异等量的适应性随机随机四变特性方法 2507.15442v1 |
Authors (4): Owen Douglas, Aku Kammonen, Anamika Pandey, Raúl Tempone
This work proposes a training algorithm based on adaptive random Fourier features (ARFF) with Metropolis sampling and resampling \cite{kammonen2024adaptiverandomfourierfeatures} for learning drift and diffusion components of stochastic differential equations from snapshot data. Specifically, this study considers Itô diffusion processes and a likelihood-based loss function derived from the Euler-Maruyama integration introduced in \cite{Dietrich2023} and \cite{dridi2021learningstochasticdynamicalsystems}. This work evaluates the proposed method against benchmark problems presented in \cite{Dietrich2023}, including polynomial examples, underdamped Langevin dynamics, a stochastic susceptible-infected-recovered model, and a stochastic wave equation. Across all cases, the ARFF-based approach matches or surpasses the performance of conventional Adam-based optimization in both loss minimization and convergence speed. These results highlight the potential of ARFF as a compelling alternative for data-driven modeling of stochastic dynamics.
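A likelihood-based loss derived from Euler-Maruyama integration can be sketched for the scalar case as follows. Under one Euler-Maruyama step of size h, the next state is Gaussian with mean x + f(x)h and variance sigma(x)^2 h, so the loss is the average Gaussian negative log-likelihood over snapshot pairs. The function and argument names are illustrative, and the exact loss used in the cited works may differ in detail (for example in the multivariate case).

```python
import math

def em_negative_log_likelihood(x0, x1, h, drift, diffusion):
    # x0, x1: lists of scalar states at time t and t + h (snapshot pairs).
    # drift, diffusion: candidate functions f(x) and sigma(x), sigma(x) != 0.
    total = 0.0
    for a, b in zip(x0, x1):
        mean = a + drift(a) * h          # Euler-Maruyama mean
        var = diffusion(a) ** 2 * h      # Euler-Maruyama variance
        total += 0.5 * math.log(2 * math.pi * var) + (b - mean) ** 2 / (2 * var)
    return total / len(x0)

# Usage: a drift that explains the observed step gets a lower loss.
good = em_negative_log_likelihood([1.0], [1.5], 0.5, lambda x: 1.0, lambda x: 1.0)
bad = em_negative_log_likelihood([1.0], [1.5], 0.5, lambda x: -1.0, lambda x: 1.0)
```

In a learning setup, `drift` and `diffusion` would be parametrised models (random Fourier feature expansions in the abstract above) and this quantity would be minimised over their parameters.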
nan
Article 660
Title@2025-07-21 (1): The calculus of variations of the Transformer on the hyperspherical tangent bundle
Title: The calculus of variations of the Transformer on the hyperspherical tangent bundle | Die Variationsrechnung des Transformers auf dem hypersphärischen Tangentenbündel | 超球正切捆绑上变形器变形的微积分 2507.15431v1 |
Authors (1): Andrew Gracyk
We offer a theoretical mathematical background to Transformers through Lagrangian optimization across the token space. The Transformer, as a flow map, exists in the tangent fiber for each token along the high-dimensional unit sphere. The circumstance of the hypersphere across the latent data is reasonable due to the trained diagonal matrix equal to the identity, which has various empirical justifications. Thus, under the continuum limit of the dynamics, the latent vectors flow among the tangent bundle. Using these facts, we devise a mathematical framework for the Transformer through calculus of variations. We develop a functional and show that the continuous flow map induced by the Transformer satisfies this functional, therefore the Transformer can be viewed as a natural solver of a calculus of variations problem. We invent new scenarios of when our methods are applicable based on loss optimization with respect to path optimality. We derive the Euler-Lagrange equation for the Transformer. The variant of the Euler-Lagrange equation we present has various appearances in literature, but, to our understanding, oftentimes not foundationally proven or under other specialized cases. Our overarching proof is new: our techniques are classical and the use of the flow map object is original. We provide several other relevant results, primarily ones specific to neural scenarios. In particular, much of our analysis will be attempting to quantify Transformer data in variational contexts under neural approximations. Calculus of variations on manifolds is a well-nourished research area, but for the Transformer specifically, it is uncharted: we lay the foundation for this area through an introduction to the Lagrangian for the Transformer.
nan
Article 661
Title@2025-07-21 (1): SynthCTI: LLM-Driven Synthetic CTI Generation to enhance MITRE Technique Mapping
Title: SynthCTI: LLM-Driven Synthetic CTI Generation to enhance MITRE Technique Mapping | SynthCTI: LLM-getriebene synthetische CTI-Generation zur Verbesserung der MITRE-Technikmapping | 合成技术:利用LLM-Driven 合成CTI新一代,加强MITRE技术绘图 2507.16852v1 |
Authors (6): Álvaro Ruiz-Ródenas, Jaime Pujante Sáez, Daniel García-Algora, Mario Rodríguez Béjar, Jorge Blasco, José Luis Hernández-Ramos
Cyber Threat Intelligence (CTI) mining involves extracting structured insights from unstructured threat data, enabling organizations to understand and respond to evolving adversarial behavior. A key task in CTI mining is mapping threat descriptions to MITRE ATT&CK techniques. However, this process is often performed manually, requiring expert knowledge and substantial effort. Automated approaches face two major challenges: the scarcity of high-quality labeled CTI data and class imbalance, where many techniques have very few examples. While domain-specific Large Language Models (LLMs) such as SecureBERT have shown improved performance, most recent work focuses on model architecture rather than addressing the data limitations. In this work, we present SynthCTI, a data augmentation framework designed to generate high-quality synthetic CTI sentences for underrepresented MITRE ATT&CK techniques. Our method uses a clustering-based strategy to extract semantic context from training data and guide an LLM in producing synthetic CTI sentences that are lexically diverse and semantically faithful. We evaluate SynthCTI on two publicly available CTI datasets, CTI-to-MITRE and TRAM, using LLMs with different capacity. Incorporating synthetic data leads to consistent macro-F1 improvements: for example, ALBERT improves from 0.35 to 0.52 (a relative gain of 48.6%), and SecureBERT reaches 0.6558 (up from 0.4412). Notably, smaller models augmented with SynthCTI outperform larger models trained without augmentation, demonstrating the value of data generation methods for building efficient and effective CTI classification systems.
nan
Article 662
Title@2025-07-21 (1): Attend or Perish: Benchmarking Attention in Algorithmic Reasoning
Title: Attend or Perish: Benchmarking Attention in Algorithmic Reasoning | Teilnahme oder Perish: Benchmarking-Achtung bei algorithmischer Vernunft | 出勤或风险:在算法理由中设定关注基准 2503.01909v2 |
Authors (4): Michal Spiegel, Michal Štefánik, Marek Kadlčík, Josef Kuchař
Can transformers learn to perform algorithmic tasks reliably across previously unseen input/output domains? While pre-trained language models show solid accuracy on benchmarks incorporating algorithmic reasoning, assessing the reliability of these results necessitates an ability to distinguish genuine algorithmic understanding from memorization. In this paper, we propose AttentionSpan, an algorithmic benchmark comprising five tasks with infinite input domains where we can disentangle and trace the correct, robust algorithm necessary for the task. This allows us to assess (i) models’ ability to extrapolate to unseen types of inputs, including new lengths, value ranges or input domains, and (ii) the robustness of their learned mechanisms. By analyzing attention maps and performing targeted interventions, we show that the attention mechanism directly causes failures in extrapolation. We make the implementation of all our tasks and interpretability methods publicly available at https://github.com/michalspiegel/AttentionSpan .
nan
Article 663
Title@2025-07-21 (1): STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
Title: STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning | STUN: Strukturierte und dann unstrukturierte Pruning für skalierbare MoE Pruning | STUN: 结构化的当时无结构化的为可缩缩的MoE Pruning提供结构化的当时无结构化的谨慎 2409.06211v2 |
Authors (6): Jaeseong Lee, seung-won hwang, Aurick Qiao, Daniel F Campos, Zhewei Yao, Yuxiong He
Mixture-of-experts (MoEs) have been adopted for reducing inference costs by sparsely activating experts in large language models (LLMs). Despite this reduction, the massive number of experts in MoEs still makes them expensive to serve. In this paper, we study how to address this by pruning MoEs. Among pruning methodologies, unstructured pruning has been known to achieve the highest performance for a given pruning ratio, compared to structured pruning, since the latter imposes constraints on the sparsification structure. This is intuitive, as the solution space of unstructured pruning subsumes that of structured pruning. However, our counterintuitive finding reveals that expert pruning, a form of structured pruning, can actually precede unstructured pruning to outperform unstructured-only pruning. As existing expert pruning, requiring $O(\frac{k^n}{\sqrt{n}})$ forward passes for $n$ experts, cannot scale for recent MoEs, we propose a scalable alternative with $O(1)$ complexity that nevertheless outperforms the more expensive methods. The key idea is leveraging a latent structure between experts, based on behavior similarity, such that the greedy decision of whether to prune closely captures the joint pruning effect. Our method is highly effective: for Snowflake Arctic, a 480B-sized MoE with 128 experts, it needs only one H100 and two hours to achieve nearly no loss in performance with 40% sparsity, even in generative tasks such as GSM8K, where state-of-the-art unstructured pruning fails to do so. The code will be made publicly available.
nan
Article 664
Title@2025-07-21 (1): Predictive Process Monitoring Using Object-centric Graph Embeddings
Title: Predictive Process Monitoring Using Object-centric Graph Embeddings | Predictive Process Monitoring mit objektzentrierten Graphen-Einbettungen | 利用以物体为中心的图示嵌入器进行预测过程监测 2507.15411v1 |
Authors (4): Wissam Gherissi, Mehdi Acheli, Joyce El Haddad, Daniela Grigori
Object-centric predictive process monitoring explores and utilizes object-centric event logs to enhance process predictions. The main challenge lies in extracting relevant information and building effective models. In this paper, we propose an end-to-end model that predicts future process behavior, focusing on two tasks: next activity prediction and next event time. The proposed model employs a graph attention network to encode activities and their relationships, combined with an LSTM network to handle temporal dependencies. Evaluated on one real-life and three synthetic event logs, the model demonstrates competitive performance compared to state-of-the-art methods.
nan
Article 665
Title@2025-07-21 (1): Towards Mitigation of Hallucination for LLM-empowered Agents: Progressive Generalization Bound Exploration and Watchdog Monitor
Title: Towards Mitigation of Hallucination for LLM-empowered Agents: Progressive Generalization Bound Exploration and Watchdog Monitor | Zur Milderung der Halluzination für LLM-fähige Agenten: Progressive Generalization Bound Exploration und Watchdog Monitor | 努力减少LLM-动力剂的幻觉:逐步普遍化探矿和监视监测仪表监测 2507.15903v1 |
Authors (6): Siyuan Liu, Wenjing Liu, Zhiwei Xu, Xin Wang, Bo Chen, Tao Li
Empowered by large language models (LLMs), intelligent agents have become a popular paradigm for interacting with open environments to facilitate AI deployment. However, hallucinations generated by LLMs, where outputs are inconsistent with facts, pose a significant challenge, undermining the credibility of intelligent agents. Only if hallucinations can be mitigated can intelligent agents be used in the real world without catastrophic risk. Therefore, effective detection and mitigation of hallucinations are crucial to ensure the dependability of agents. Unfortunately, the related approaches either depend on white-box access to LLMs or fail to accurately identify hallucinations. To address the challenge posed by hallucinations of intelligent agents, we present HalMit, a novel black-box watchdog framework that models the generalization bound of LLM-empowered agents and thus detects hallucinations without requiring internal knowledge of the LLM’s architecture. Specifically, a probabilistic fractal sampling technique is proposed to generate a sufficient number of queries to trigger incredible responses in parallel, efficiently identifying the generalization bound of the target agent. Experimental evaluations demonstrate that HalMit significantly outperforms existing approaches in hallucination monitoring. Its black-box nature and superior performance make HalMit a promising solution for enhancing the dependability of LLM-powered systems.
nan
Article 666
Title@2025-07-21 (1): MAP Estimation with Denoisers: Convergence Rates and Guarantees
Title: MAP Estimation with Denoisers: Convergence Rates and Guarantees | MAP-Schätzung mit Denoisern: Konvergenzraten und Garantien | MAP 与Denoisers的估算:趋同率和保障 2507.15397v1 |
Authors (4): Scott Pesme, Giacomo Meanti, Michael Arbel, Julien Mairal
Denoiser models have become powerful tools for inverse problems, enabling the use of pretrained networks to approximate the score of a smoothed prior distribution. These models are often used in heuristic iterative schemes aimed at solving Maximum a Posteriori (MAP) optimisation problems, where the proximal operator of the negative log-prior plays a central role. In practice, this operator is intractable, and practitioners plug in a pretrained denoiser as a surrogate, despite the lack of general theoretical justification for this substitution. In this work, we show that a simple algorithm, closely related to several used in practice, provably converges to the proximal operator under a log-concavity assumption on the prior $p$. We show that this algorithm can be interpreted as a gradient descent on smoothed proximal objectives. Our analysis thus provides a theoretical foundation for a class of empirically successful but previously heuristic methods.
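For reference, the two objects contrasted in the abstract above can be written out using standard definitions. The notation ($\lambda$, $\sigma$, $p_\sigma$, $D_\sigma$) is chosen here for illustration and is not taken from the paper.

```latex
% Proximal operator of the negative log-prior (what MAP schemes need):
\operatorname{prox}_{-\lambda \log p}(y)
  = \arg\min_{x} \Big\{ \tfrac{1}{2}\,\|x - y\|^{2} - \lambda \log p(x) \Big\}

% MAP denoising under y = x + \sigma\varepsilon, \varepsilon \sim \mathcal{N}(0, I):
\arg\max_{x} \, p(x \mid y)
  = \arg\min_{x} \Big\{ \tfrac{1}{2\sigma^{2}}\,\|x - y\|^{2} - \log p(x) \Big\}
  = \operatorname{prox}_{-\sigma^{2} \log p}(y)

% MMSE denoiser and the score of the smoothed prior (Tweedie's formula),
% with p_\sigma = p * \mathcal{N}(0, \sigma^{2} I):
D_\sigma(y) = \mathbb{E}[x \mid y] = y + \sigma^{2}\, \nabla \log p_\sigma(y)
```

The first two lines give a posterior mode and the third a posterior mean, which is why substituting a denoiser for the proximal operator needs justification. When $p$ is log-concave, the objective in the first line is strongly convex, so the proximal point is unique.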
Article 667
Title@2025-07-21 (1): Learning to Gridize: Segment Physical World by Wireless Communication Channel
Title: Learning to Gridize: Segment Physical World by Wireless Communication Channel | Gridize lernen: Segment Physical World per Wireless Communication Channel | 学习网络化:通过无线通信频道进行分形物理世界 2507.15386v1 |
Authors (6): Juntao Wang, Feng Yin, Tian Ding, Tsung-Hui Chang, Zhi-Quan Luo, Qi Yan
Gridization, the process of partitioning space into grids where users share similar channel characteristics, serves as a fundamental prerequisite for efficient large-scale network optimization. However, existing methods like Geographical or Beam Space Gridization (GSG or BSG) are limited by reliance on unavailable location data or the flawed assumption that similar signal strengths imply similar channel properties. We propose Channel Space Gridization (CSG), a pioneering framework that unifies channel estimation and gridization for the first time. Formulated as a joint optimization problem, CSG uses only beam-level reference signal received power (RSRP) to estimate Channel Angle Power Spectra (CAPS) and partition samples into grids with homogeneous channel characteristics. To perform CSG, we develop the CSG Autoencoder (CSG-AE), featuring a trainable RSRP-to-CAPS encoder, a learnable sparse codebook quantizer, and a physics-informed decoder based on the Localized Statistical Channel Model. Recognizing the limitations of a naive training scheme, we propose a novel Pretraining-Initialization-Detached-Asynchronous (PIDA) training scheme for CSG-AE, which ensures stable and effective training by systematically addressing the common pitfalls of the naive training paradigm. Evaluations reveal that CSG-AE excels in CAPS estimation accuracy and clustering quality on synthetic data. On real-world datasets, it reduces Active Mean Absolute Error (MAE) by 30% and Overall MAE by 65% in RSRP prediction compared to salient baselines using the same data, while improving channel consistency, cluster-size balance, and active ratio, advancing the development of gridization for large-scale network optimization.
Article 668
Title@2025-07-21 (1): To Label or Not to Label: PALM – A Predictive Model for Evaluating Sample Efficiency in Active Learning Models
Title: To Label or Not to Label: PALM – A Predictive Model for Evaluating Sample Efficiency in Active Learning Models | Beschriftung oder Nichtbeschriftung: PALM - ein vorausschauendes Modell zur Bewertung der Probeneffizienz in aktiven Lernmodellen | 标签或非标签标签:PALM – – 积极学习模式样本效率评价预测模型 2507.15381v1 |
Authors (3): Julia Machnio, Mads Nielsen, Mostafa Mehdipour Ghazi
Active learning (AL) seeks to reduce annotation costs by selecting the most informative samples for labeling, making it particularly valuable in resource-constrained settings. However, traditional evaluation methods, which focus solely on final accuracy, fail to capture the full dynamics of the learning process. To address this gap, we propose PALM (Performance Analysis of Active Learning Models), a unified and interpretable mathematical model that characterizes AL trajectories through four key parameters: achievable accuracy, coverage efficiency, early-stage performance, and scalability. PALM provides a predictive description of AL behavior from partial observations, enabling the estimation of future performance and facilitating principled comparisons across different strategies. We validate PALM through extensive experiments on CIFAR-10/100 and ImageNet-50/100/200, covering a wide range of AL methods and self-supervised embeddings. Our results demonstrate that PALM generalizes effectively across datasets, budgets, and strategies, accurately predicting full learning curves from limited labeled data. Importantly, PALM reveals crucial insights into learning efficiency, data space coverage, and the scalability of AL methods. By enabling the selection of cost-effective strategies and predicting performance under tight budget constraints, PALM lays the basis for more systematic, reproducible, and data-efficient evaluation of AL in both research and real-world applications. The code is available at: https://github.com/juliamachnio/PALM.
Article 669
Title@2025-07-21 (1): RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark
Title: RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark | RL4CO: ein umfangreiches Verstärkungslernen für kombinatorische Optimierungs-Benchmark | RL4CO:综合优化基准的广泛强化学习 2306.17100v6 |
Authors (33): Federico Berto, Chuanbo Hua, Junyoung Park, Laurin Luttmann, Yining Ma, Fanchen Bu, Jiarui Wang, Haoran Ye, Minsu Kim, Sanghyeok Choi, Nayeli Gast Zepeda, André Hottung, Jianan Zhou, Jieyi Bi, Yu Hu, Fei Liu, Hyeonah Kim, Jiwoo Son, Haeyeon Kim, Davide Angioni, Wouter Kool, Zhiguang Cao, Qingfu Zhang, Joungho Kim, Jie Zhang, Kijung Shin, Cathy Wu, Sungsoo Ahn, Guojie Song, Changhyun Kwon, Kevin Tierney, Lin Xie, Jinkyoo Park
Combinatorial optimization (CO) is fundamental to several real-world applications, from logistics and scheduling to hardware design and resource allocation. Deep reinforcement learning (RL) has recently shown significant benefits in solving CO problems, reducing reliance on domain expertise and improving computational efficiency. However, the absence of a unified benchmarking framework leads to inconsistent evaluations, limits reproducibility, and increases engineering overhead, raising barriers to adoption for new researchers. To address these challenges, we introduce RL4CO, a unified and extensive benchmark with in-depth library coverage of 27 CO problem environments and 23 state-of-the-art baselines. Built on efficient software libraries and best practices in implementation, RL4CO features modularized implementation and flexible configurations of diverse environments, policy architectures, RL algorithms, and utilities with extensive documentation. RL4CO helps researchers build on existing successes while exploring and developing their own designs, facilitating the entire research process by decoupling science from heavy engineering. We finally provide extensive benchmark studies to inspire new insights and future work. RL4CO has already attracted numerous researchers in the community and is open-sourced at https://github.com/ai4co/rl4co.
Article 670
Title@2025-07-21 (1): Proficient Graph Neural Network Design by Accumulating Knowledge on Large Language Models
Title: Proficient Graph Neural Network Design by Accumulating Knowledge on Large Language Models | Proficient Graph Neural Network Design durch Akkumulation von Wissen über große Sprachmodelle | 通过积累关于大语言模型的知识设计精巧的图形神经网络 2408.06717v2 |
Authors (7): Jialiang Wang, Hanmo Liu, Shimin Di, Zhili Wang, Jiachuan Wang, Lei Chen, Xiaofang Zhou
High-level automation is increasingly critical in AI, driven by rapid advances in large language models (LLMs) and AI agents. However, LLMs, despite their general reasoning power, struggle significantly in specialized, data-sensitive tasks such as designing Graph Neural Networks (GNNs). This difficulty arises from (1) the inherent knowledge gaps in modeling the intricate, varying relationships between graph properties and suitable architectures and (2) the external noise from misleading descriptive inputs, often resulting in generic or even misleading model suggestions. Achieving proficiency in designing data-aware models – defined as the meta-level capability to systematically accumulate, interpret, and apply data-specific design knowledge – remains challenging for existing automated approaches, due to their inefficient construction and application of meta-knowledge. To achieve the meta-level proficiency, we propose DesiGNN, a knowledge-centered framework that systematically converts past model design experiences into structured, fine-grained knowledge priors well fitted to meta-learning with LLMs. To account for the inherent variability and external noise, DesiGNN aligns empirical property filtering from extensive benchmarks with adaptive elicitation of literature insights via LLMs. By constructing a solid meta-knowledge between unseen graph understanding and known effective architecture patterns, DesiGNN can deliver top-5.77% initial model proposals for unseen datasets within seconds, and achieve consistently superior performance with minimal search costs against baselines.
Article 671
Title@2025-07-21 (1): EEG-based Epileptic Prediction via a Two-stage Channel-aware Set Transformer Network
Title: EEG-based Epileptic Prediction via a Two-stage Channel-aware Set Transformer Network | EEG-basierte epileptische Vorhersage über ein zweistufiges Channel-aware Set Transformer Network | 通过两阶段频道感应装置变形器网络进行基于EEG的月球预测 2507.15364v1 |
Authors (9): Ruifeng Zheng, Cong Chen, Shuang Wang, Yiming Liu, Lin You, Jindong Lu, Ruizhe Zhu, Guodao Zhang, Kejie Huang
Epilepsy is a chronic, noncommunicable brain disorder, and sudden seizure onsets can significantly impact patients’ quality of life and health. However, wearable seizure-predicting devices are still limited, partly due to the bulky size of EEG-collecting devices. To relieve the problem, we proposed a novel two-stage channel-aware Set Transformer Network that could perform seizure prediction with fewer EEG channel sensors. We also tested a seizure-independent division method which could prevent the adjacency of training and test data. Experiments were performed on the CHB-MIT dataset which includes 22 patients with 88 merged seizures. The mean sensitivity before channel selection was 76.4% with a false predicting rate (FPR) of 0.09/hour. After channel selection, dominant channels emerged in 20 out of 22 patients; the average number of channels was reduced to 2.8 from 18; and the mean sensitivity rose to 80.1% with an FPR of 0.11/hour. Furthermore, experimental results on the seizure-independent division supported our assertion that a more rigorous seizure-independent division should be used for patients with abundant EEG recordings.
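The two headline metrics above, sensitivity and false predicting rate per hour, can be computed as in the following sketch. The definitions used here (predicted seizures over total seizures, and false alarms over interictal recording hours) are the conventional ones and are an assumption about how the paper computes them; the counts in the example are made up and are not taken from CHB-MIT.

```python
def seizure_prediction_metrics(n_seizures, n_predicted, n_false_alarms, interictal_hours):
    """Return (sensitivity, false predictions per hour) from raw counts."""
    if n_seizures <= 0 or interictal_hours <= 0:
        raise ValueError("need at least one seizure and a positive recording duration")
    sensitivity = n_predicted / n_seizures            # fraction of seizures predicted
    fpr_per_hour = n_false_alarms / interictal_hours  # false alarms per interictal hour
    return sensitivity, fpr_per_hour


# Hypothetical patient: 4 of 5 seizures predicted, 2 false alarms in 20 interictal hours.
sens, fpr = seizure_prediction_metrics(
    n_seizures=5, n_predicted=4, n_false_alarms=2, interictal_hours=20.0
)
print(f"sensitivity={sens:.1%}, FPR={fpr:.2f}/hour")
```

Reported figures such as a mean sensitivity of 80.1% are averages of per-patient values like `sens`, so they need not correspond to a whole number of seizures.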
Article 672
Title@2025-07-21 (1): Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation
Title: Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation | Meta4XNLI: Ein Crosslingual Parallel Corpus für die Erkennung und Interpretation von Metaphoren | Meta4XNLI: 用于识别和解释代名词的跨语言平行体 2404.07053v3 |
Authors (2): Elisa Sanchez-Bayona, Rodrigo Agerri
Metaphors are a ubiquitous but often overlooked part of everyday language. As a complex cognitive-linguistic phenomenon, they provide a valuable means to evaluate whether language models can capture deeper aspects of meaning, including semantic, pragmatic, and cultural context. In this work, we present Meta4XNLI, the first parallel dataset for Natural Language Inference (NLI) newly annotated for metaphor detection and interpretation in both English and Spanish. Meta4XNLI facilitates the comparison of encoder- and decoder-based models in detecting and understanding metaphorical language in multilingual and cross-lingual settings. Our results show that fine-tuned encoders outperform decoder-only LLMs in metaphor detection. Metaphor interpretation is evaluated via the NLI framework, where masked and autoregressive models perform comparably and performance notably decreases when the inference is affected by metaphorical language. Our study also finds that translation plays an important role in the preservation or loss of metaphors across languages, introducing shifts that might impact metaphor occurrence and model performance. These findings underscore the importance of resources like Meta4XNLI for advancing the analysis of the capabilities of language models and improving our understanding of metaphor processing across languages. Furthermore, the dataset offers previously unavailable opportunities to investigate metaphor interpretation, cross-lingual metaphor transferability, and the impact of translation on the development of multilingual annotated resources.
Article 673
Title@2025-07-21 (1): Constrained Optimal Fuel Consumption of HEVs under Observational Noise
Title: Constrained Optimal Fuel Consumption of HEVs under Observational Noise | Eingeschränkter optimaler Kraftstoffverbrauch von HEV unter Beobachtungslärm | 在观测噪音下控制最佳燃料消耗 2410.20913v2 |
Authors (2): Shuchang Yan, Haoran Sun
In our prior work, we investigated the minimum fuel consumption of a hybrid electric vehicle (HEV) under a state-of-charge (SOC) balance constraint, assuming perfect SOC measurements and accurate reference speed profiles. The constrained optimal fuel consumption (COFC) problem was addressed using a constrained reinforcement learning (CRL) framework. However, in real-world scenarios, SOC readings are often corrupted by sensor noise, and reference speeds may deviate from actual driving conditions. To account for these imperfections, this study reformulates the COFC problem by explicitly incorporating observational noise in both SOC and reference speed. We adopt a robust CRL approach, where the noise is modeled as a uniform distribution, and employ a structured training procedure to ensure stability. The proposed method is evaluated through simulations on the Toyota Prius hybrid system (THS), using both the New European Driving Cycle (NEDC) and the Worldwide Harmonized Light Vehicles Test Cycle (WLTC). Results show that fuel consumption and SOC constraint satisfaction remain robust across varying noise levels. Furthermore, the analysis reveals that observational noise in SOC and speed can impact fuel consumption to different extents. To the best of our knowledge, this is the first study to explicitly examine how observational noise – commonly encountered in dynamometer testing and predictive energy control (PEC) applications – affects constrained optimal fuel consumption in HEVs.
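As a rough illustration of the uniform observational noise described above, the sketch below perturbs a true SOC value and a reference speed before they would be passed to a controller. The function name, the noise half-widths (`soc_noise`, `speed_noise`), and the clipping of SOC to [0, 1] are assumptions for illustration and are not taken from the paper.

```python
import random


def noisy_observation(soc, ref_speed, soc_noise=0.02, speed_noise=1.0, rng=random):
    """Return (observed SOC, observed reference speed) under uniform noise.

    soc_noise and speed_noise are hypothetical half-widths of the uniform
    distributions: SOC as a fraction in [0, 1], speed in km/h.
    """
    noisy_soc = soc + rng.uniform(-soc_noise, soc_noise)
    noisy_speed = ref_speed + rng.uniform(-speed_noise, speed_noise)
    # Keep the observed SOC inside its physical range.
    return min(1.0, max(0.0, noisy_soc)), noisy_speed


rng = random.Random(0)  # seeded generator so runs are reproducible
observations = [noisy_observation(0.6, 50.0, rng=rng) for _ in range(1000)]
print(observations[0])
```

Sweeping `soc_noise` and `speed_noise` independently is one simple way to probe the reported finding that noise in SOC and noise in speed affect fuel consumption to different extents.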
Article 674
Title@2025-07-21 (1): Efficient Visual Appearance Optimization by Learning from Prior Preferences
Title: Efficient Visual Appearance Optimization by Learning from Prior Preferences | Effiziente optische Erscheinungsbildsoptimierung durch Lernen aus vorherigen Präferenzen | 学习从先前优惠制获得最佳优化 2507.15355v1 |
Authors (3): Zhipeng Li, Yi-Chi Liao, Christian Holz
Adjusting visual parameters such as brightness and contrast is common in our everyday experiences. Finding the optimal parameter setting is challenging due to the large search space and the lack of an explicit objective function, leaving users to rely solely on their implicit preferences. Prior work has explored Preferential Bayesian Optimization (PBO) to address this challenge, asking users to iteratively select preferred designs from candidate sets. However, PBO often requires many rounds of preference comparisons, making it more suitable for designers than everyday end-users. We propose Meta-PO, a novel method that integrates PBO with meta-learning to improve sample efficiency. Specifically, Meta-PO infers prior users’ preferences and stores them as models, which are leveraged to intelligently suggest design candidates for new users, enabling faster convergence and more personalized results. An experimental evaluation of our method for appearance design tasks on 2D and 3D content showed that participants achieved satisfactory appearance in 5.86 iterations using Meta-PO when they shared similar goals with a population (e.g., tuning for a "warm" look), and that the method generalizes even across divergent goals, reaching satisfactory appearance in 8 iterations (e.g., from "vintage", "warm", to "holiday"). Meta-PO makes personalized visual optimization more applicable to end-users through a generalizable, more efficient optimization conditioned on preferences, with the potential to scale interface personalization more broadly.
Article 675
Title@2025-07-21 (1): ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events
Title: ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events | ChronoSense: Erforschen des zeitlichen Verständnisses in großen Sprachmodellen mit Zeitintervallen von Ereignissen | Chronossensensense:探索具有时际事件间隔的大型语言模型中的时间理解 2501.03040v2 |
Authors (2): Duygu Sezen Islakoglu, Jan-Christoph Kalo
Large Language Models (LLMs) have achieved remarkable success in various NLP tasks, yet they still face significant challenges in reasoning and arithmetic. Temporal reasoning, a critical component of natural language understanding, has attracted increasing research attention. However, comprehensive testing of Allen’s interval relations (e.g., before, after, during) – a fundamental framework for temporal relationships – remains underexplored. To fill this gap, we present ChronoSense, a new benchmark for evaluating LLMs’ temporal understanding. It includes 16 tasks, focusing on identifying the Allen relation between two temporal events and temporal arithmetic, using both abstract events and real-world data from Wikidata. We assess the performance of seven recent LLMs using this benchmark, and the results indicate that models handle Allen relations, even symmetrical ones, quite differently. Moreover, the findings suggest that the models may rely on memorization to answer time-related questions. Overall, the models’ low performance highlights the need for improved temporal understanding in LLMs, and ChronoSense offers a robust framework for future research in this area. Our dataset and the source code are available at https://github.com/duyguislakoglu/chronosense.
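For readers unfamiliar with Allen's interval relations, the sketch below classifies the relation between two intervals given as `(start, end)` pairs. The thirteen relation names follow Allen's interval algebra; the label strings and the function name are illustrative choices and may differ from those used in the ChronoSense dataset.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b.

    Each interval is a (start, end) pair with start < end.
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must satisfy start < end")
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Remaining cases: the intervals partially overlap with all endpoints distinct.
    return "overlaps" if a_start < b_start else "overlapped-by"


# Two events given as (start year, end year); the second pair is a made-up example.
print(allen_relation((1939, 1945), (1942, 1950)))
print(allen_relation((1942, 1950), (1939, 1945)))
```

Swapping the arguments always yields the inverse relation (for example "overlaps" and "overlapped-by"), which is the kind of symmetry the benchmark reports models handling inconsistently.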
Article 676
Title@2025-07-21 (1): Scaling Decentralized Learning with FLock
Title: Scaling Decentralized Learning with FLock | Skalierung dezentrales Lernen mit FLock | 与 FLock 的分散化学习 2507.15349v1 |
Authors (4): Zehua Cheng, Rui Sun, Jiahao Sun, Yike Guo
Fine-tuning large language models (LLMs) in decentralized settings is hindered by the lack of centralized control and by the massive computing and communication overhead of decentralized schemes. While standard federated learning (FL) supports data privacy, its central server requirement creates a single point of attack and vulnerability to poisoning attacks. Scaling results in this direction to 70B-parameter models in heterogeneous, trustless environments has remained a major unresolved bottleneck. This paper introduces FLock, a decentralized framework for secure and efficient collaborative LLM fine-tuning. Integrating a blockchain-based trust layer with economic incentives, FLock replaces the central aggregator with a secure, auditable protocol for cooperation among untrusted parties. We present the first empirical validation of fine-tuning a 70B LLM in a secure, multi-domain, decentralized setting. Our experiments show that the FLock framework defends against backdoor poisoning attacks that compromise standard FL optimizers and fosters synergistic knowledge transfer. The resulting models show a >68% reduction in adversarial attack success rates. The global model also demonstrates superior cross-domain generalization, outperforming models trained in isolation on their own specialized data.
Article 677
Title@2025-07-21 (1): Probing Information Distribution in Transformer Architectures through Entropy Analysis
Title: Probing Information Distribution in Transformer Architectures through Entropy Analysis | Probing Information Distribution in Transformer-Architekturen durch Entropie-Analyse | 通过 Entropy 分析在变形结构中进行测试信息发布 2507.15347v1 |
Authors (5): Amedeo Buonanno, Alessandro Rivetti, Francesco A. N. Palmieri, Giovanni Di Gennaro, Gianmarco Romano
This work explores entropy analysis as a tool for probing information distribution within Transformer-based architectures. By quantifying token-level uncertainty and examining entropy patterns across different stages of processing, we aim to investigate how information is managed and transformed within these models. As a case study, we apply the methodology to a GPT-based large language model, illustrating its potential to reveal insights into model behavior and internal representations. This approach may contribute to the development of interpretability and evaluation frameworks for Transformer-based models.
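Token-level uncertainty of the kind described above is commonly measured as the Shannon entropy of a predicted token distribution. The sketch below computes it from a vector of logits; the logits are made-up values, and applying this to the output of a particular layer of a real model is an assumption about how such an analysis would be wired up, not a description of the paper's pipeline.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy, in bits, of the distribution softmax(logits)."""
    probs = softmax(logits)
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


# Toy 4-token vocabulary: a maximally uncertain and a confident prediction.
uniform = token_entropy([0.0, 0.0, 0.0, 0.0])
peaked = token_entropy([10.0, 0.0, 0.0, 0.0])
print(uniform, peaked)
```

A uniform distribution over four tokens gives the maximum of 2 bits, while the peaked one is close to 0; tracking this quantity per token and per processing stage yields the entropy patterns the abstract refers to.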
Article 678
Title@2025-07-21 (1): LionGuard 2: Building Lightweight, Data-Efficient & Localised Multilingual Content Moderators
Title: LionGuard 2: Building Lightweight, Data-Efficient & Localised Multilingual Content Moderators | LionGuard 2: Leichte, dateneffiziente und lokalisierte Mehrsprachige Inhaltsmoderatoren bauen | 狮子座标2:轻量、数据效率和本地化多语种内容主持人 2507.15339v1 |
Authors (4): Leanne Tan, Gabriel Chua, Ziyu Ge, Roy Ka-Wei Lee
Modern moderation systems increasingly support multiple languages, but often fail to address localisation and low-resource variants, creating safety gaps in real-world deployments. Small models offer a potential alternative to large LLMs, yet still demand considerable data and compute. We present LionGuard 2, a lightweight, multilingual moderation classifier tailored to the Singapore context, supporting English, Chinese, Malay, and partial Tamil. Built on pre-trained OpenAI embeddings and a multi-head ordinal classifier, LionGuard 2 outperforms several commercial and open-source systems across 17 benchmarks, including both Singapore-specific and public English datasets. The system is actively deployed within the Singapore Government, demonstrating practical efficacy at scale. Our findings show that high-quality local data and robust multilingual embeddings can achieve strong moderation performance, without fine-tuning large models. We release our model weights and part of our training data to support future work on LLM safety.
Article 679
Title@2025-07-21 (1): Beyond Model Base Selection: Weaving Knowledge to Master Fine-grained Neural Network Design
Title: Beyond Model Base Selection: Weaving Knowledge to Master Fine-grained Neural Network Design | Jenseits der Modell-Basis-Auswahl: Wissen weben, um feinkörniges neurales Netzwerk-Design zu meistern | 超越示范基础选择:将知识编织到精巧神经网络设计硕士 2507.15336v1 |
Authors (7): Jialiang Wang, Hanmo Liu, Shimin Di, Zhili Wang, Jiachuan Wang, Lei Chen, Xiaofang Zhou
Database systems have recently advocated for embedding machine learning (ML) capabilities, offering declarative model queries over large, managed model repositories, thereby circumventing the huge computational overhead of traditional ML-based algorithms in automated neural network model selection. Pioneering database studies aim to organize existing benchmark repositories as model bases (MB), querying them for the model records with the highest performance estimation metrics for given tasks. However, this static model selection practice overlooks the fine-grained, evolving relational dependencies between diverse task queries and model architecture variations, resulting in suboptimal matches and failing to further refine the model effectively. To fill the model refinement gap in database research, we propose M-DESIGN, a curated model knowledge base (MKB) pipeline for mastering neural network refinement by adaptively weaving prior insights about model architecture modification. First, we propose a knowledge weaving engine that reframes model refinement as an adaptive query problem over task metadata. Given a user’s task query, M-DESIGN quickly matches and iteratively refines candidate models by leveraging a graph-relational knowledge schema that explicitly encodes data properties, architecture variations, and pairwise performance deltas as joinable relations. This schema supports fine-grained relational analytics over architecture tweaks and drives a predictive query planner that can detect and adapt to out-of-distribution (OOD) tasks. We instantiate M-DESIGN for graph analytics tasks, where our model knowledge base enriches existing benchmarks with structured metadata covering 3 graph tasks and 22 graph datasets, contributing data records of 67,760 graph models. Empirical results demonstrate that M-DESIGN delivers the optimal model in 26 of 33 data-task pairs within limited budgets.
Article 680
Title@2025-07-21 (1): Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Title: Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation | Mixture-of-Recursions: Dynamische Rekursive Tiefen für adaptive Token-Level-Computation lernen | 混合流流流:学习适应调控级计算法的动态回流深度 2507.10524v2 |
Authors (11): Sangmin Bae, Yujin Kim, Reza Bayat, Sungnyun Kim, Jiyoun Ha, Tal Schuster, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Aaron Courville, Se-Young Yun
Scaling language models unlocks impressive capabilities, but the accompanying computational and memory demands make both training and deployment expensive. Existing efficiency efforts typically target either parameter sharing or adaptive computation, leaving open the question of how to attain both simultaneously. We introduce Mixture-of-Recursions (MoR), a unified framework that combines the two axes of efficiency inside a single Recursive Transformer. MoR reuses a shared stack of layers across recursion steps to achieve parameter efficiency, while lightweight routers enable adaptive token-level thinking by dynamically assigning different recursion depths to individual tokens. This allows MoR to focus quadratic attention computation only among tokens still active at a given recursion depth, further improving memory access efficiency by selectively caching only their key-value pairs. Beyond these core mechanisms, we also propose a KV sharing variant that reuses KV pairs from the first recursion, specifically designed to decrease prefill latency and memory footprint. Across model scales ranging from 135M to 1.7B parameters, MoR forms a new Pareto frontier: at equal training FLOPs and smaller model sizes, it significantly lowers validation perplexity and improves few-shot accuracy, while delivering higher throughput compared with vanilla and existing recursive baselines. These gains demonstrate that MoR is an effective path towards large-model quality without incurring large-model cost.
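The interplay of parameter sharing and token-level recursion depth can be illustrated with a deliberately tiny sketch. It is a toy, not the MoR architecture: there is no attention, no learned router, and no KV cache. The per-token depths are supplied by hand, and `shared_block` is a hypothetical stand-in for the shared layer stack.

```python
def shared_block(h):
    """Stand-in for the shared layer stack reused at every recursion step."""
    return 0.5 * h + 1.0


def mixture_of_recursions(hidden, depths, max_depth):
    """Apply shared_block to each token as many times as its assigned depth.

    Returns the final per-token states and the number of tokens still
    active at each recursion step.
    """
    hidden = list(hidden)
    active_per_step = []
    for step in range(max_depth):
        # Only tokens whose assigned depth exceeds the current step keep computing.
        active = [i for i, depth in enumerate(depths) if depth > step]
        active_per_step.append(len(active))
        for i in active:
            hidden[i] = shared_block(hidden[i])
    return hidden, active_per_step


# Three tokens with hand-assigned recursion depths 1, 2, and 3.
states, active_counts = mixture_of_recursions([0.0, 0.0, 0.0], depths=[1, 2, 3], max_depth=3)
print(states, active_counts)
```

The shrinking `active_counts` list is where the savings would come from in a full model: later recursion steps attend over, and cache key-value pairs for, only the tokens that are still active.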
Article 681
Title@2025-07-21 (1): Language Generation in the Limit: Noise, Loss, and Feedback
Title: Language Generation in the Limit: Noise, Loss, and Feedback | Sprachgenerierung im Limit: Lärm, Verlust und Feedback | 限制范围内的语言生成:噪音、损失和反馈 2507.15319v1 |
Authors (3): Yannan Bai, Debmalya Panigrahi, Ian Zhang
Kleinberg and Mullainathan (2024) recently proposed a formal framework called language generation in the limit and showed that given a sequence of example strings from an unknown target language drawn from any countable collection, an algorithm can correctly generate unseen strings from the target language within finite time. This notion was further refined by Li, Raman, and Tewari (2024), who defined stricter categories of non-uniform and uniform generation. They showed that a finite union of uniformly generatable collections is generatable in the limit, and asked if the same is true for non-uniform generation. We begin by resolving the question in the negative: we give a uniformly generatable collection and a non-uniformly generatable collection whose union is not generatable in the limit. We then use facets of this construction to further our understanding of several variants of language generation. The first two, generation with noise and without samples, were introduced by Raman and Raman (2025) and Li, Raman, and Tewari (2024) respectively. We show the equivalence of these models for uniform and non-uniform generation, and provide a characterization of non-uniform noisy generation. The former paper asked if there is any separation between noisy and non-noisy generation in the limit – we show that such a separation exists even with a single noisy string. Finally, we study the framework of generation with feedback, introduced by Charikar and Pabbaraju (2025), where the algorithm is strengthened by allowing it to ask membership queries. We show finite queries add no power, but infinite queries yield a strictly more powerful model. In summary, the results in this paper resolve the union-closedness of language generation in the limit, and leverage those techniques (and others) to give precise characterizations for natural variants that incorporate noise, loss, and feedback.
Article 682
Title@2025-07-21 (1): Universal crystal material property prediction via multi-view geometric fusion in graph transformers
Title: Universal crystal material property prediction via multi-view geometric fusion in graph transformers | Universelle Kristallmaterial-Eigenschaftsvorhersage über Multi-View-Geometrische Fusion in Graphentransformatoren | 通过在图形变压器中多视图几几何聚合预测通用晶体物质特性 2507.15303v1 |
Authors (3): Liang Zhang, Kong Chen, Yuen Wu
Accurately and comprehensively representing crystal structures is critical for advancing machine learning in large-scale crystal materials simulations. However, effectively capturing and leveraging the intricate geometric and topological characteristics of crystal structures remains a core, long-standing challenge for most existing methods in crystal property prediction. Here, we propose MGT, a multi-view graph transformer framework that synergistically fuses SE3 invariant and SO3 equivariant graph representations, which respectively capture rotation-translation invariance and rotation equivariance in crystal geometries. To strategically incorporate these complementary geometric representations, we employ a lightweight mixture-of-experts router in MGT to adaptively adjust the weight assigned to SE3 and SO3 embeddings based on the specific target task. Compared with previous state-of-the-art models, MGT reduces the mean absolute error by up to 21% on crystal property prediction tasks through multi-task self-supervised pretraining. Ablation experiments and interpretability investigations confirm the effectiveness of each technique implemented in our framework. Additionally, in transfer learning scenarios including crystal catalyst adsorption energy and hybrid perovskite bandgap prediction, MGT achieves performance improvements of up to 58% over existing baselines, demonstrating domain-agnostic scalability across diverse application domains. As evidenced by the above series of studies, we believe that MGT can serve as a useful model for crystal material property prediction, providing a valuable tool for the discovery of novel materials.
Article 683
Title@2025-07-21 (1): JAMUN: Bridging Smoothed Molecular Dynamics and Score-Based Learning for Conformational Ensembles
Title: JAMUN: Bridging Smoothed Molecular Dynamics and Score-Based Learning for Conformational Ensembles | JAMUN: Überbrückung geglätteter molekularer Dynamik und Score-basiertes Lernen für konformationelle Ensembles | JAMUN:连接通融的分子动态和基于分数的学习,以便组成组合 2410.14621v2 |
Authors (6): Ameya Daigavane, Bodhi P. Vani, Darcy Davidson, Saeed Saremi, Joshua Rackers, Joseph Kleinhenz
Conformational ensembles of protein structures are immensely important both for understanding protein function and for drug discovery in novel modalities such as cryptic pockets. Current techniques for sampling ensembles such as molecular dynamics (MD) are computationally inefficient, while many recent machine learning methods do not transfer to systems outside their training data. We propose JAMUN, which performs MD in a smoothed, noised space of all-atom 3D conformations of molecules by utilizing the framework of walk-jump sampling. JAMUN enables ensemble generation for small peptides an order of magnitude faster than traditional molecular dynamics. The physical priors in JAMUN enable transferability to systems outside of its training data, even to peptides that are longer than those it was originally trained on. Our model, code and weights are available at https://github.com/prescient-design/jamun.
Article 684
Title@2025-07-21 (1): Variational Mode-Driven Graph Convolutional Network for Spatiotemporal Traffic Forecasting
Title: Variational Mode-Driven Graph Convolutional Network for Spatiotemporal Traffic Forecasting | Variationelles modegetriebenes Graphenkonvolutionales Netzwerk für die räumliche Verkehrsprognose | 瞬时交通流量预测变化模式驱动图集演变网络 2408.16191v3 |
Authors (4): Osama Ahmad, Lukas Wesemann, Fabian Waschkowski, Zubair Khalid
This paper focuses on spatiotemporal (ST) traffic prediction using graph neural networks (GNNs). Given that ST data comprises non-stationary and complex temporal patterns, interpreting and predicting such trends is inherently challenging. Representing ST data in decomposed modes helps infer underlying behavior and assess the impact of noise on predictive performance. We propose a framework that decomposes ST data into interpretable modes using variational mode decomposition (VMD) and processes them through a neural network for future state forecasting. Unlike existing graph-based traffic forecasters that operate directly on raw or aggregated time series, the proposed hybrid approach, termed the Variational Mode Graph Convolutional Network (VMGCN), first decomposes non-stationary signals into interpretable variational modes by determining the optimal mode count via reconstruction-loss minimization and then learns both intramode and cross-mode spatiotemporal dependencies through a novel attention-augmented GCN. Additionally, we analyze the significance of each mode and the effect of bandwidth constraints on multi-horizon traffic flow predictions. The proposed two-stage design yields significant accuracy gains while providing frequency-level interpretability with demonstrated superior performance on the LargeST dataset for both short-term and long-term forecasting tasks. The implementation is publicly available on https://github.com/OsamaAhmad369/VMGCN.
nan
Article 685
Title@2025-07-21 (1): Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown
Title: Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown | Feel-Good Thompson Sampling für Kontext Bandits: ein Markov-Kette Monte Carlo Showdown | 汤普森对背景强盗的抽样:马可夫链条蒙特卡洛秀 2507.15290v1 |
Authors (2): Emile Anand, Sarah Liaw
Thompson Sampling (TS) is widely used to address the exploration/exploitation tradeoff in contextual bandits, yet recent theory shows that it does not explore aggressively enough in high-dimensional problems. Feel-Good Thompson Sampling (FG-TS) addresses this by adding an optimism bonus that biases toward high-reward models, and it achieves the asymptotically minimax-optimal regret in the linear setting when posteriors are exact. However, its performance with approximate posteriors – common in large-scale or neural problems – has not been benchmarked. We provide the first systematic study of FG-TS and its smoothed variant (SFG-TS) across eleven real-world and synthetic benchmarks. To evaluate their robustness, we compare performance across settings with exact posteriors (linear and logistic bandits) to approximate regimes produced by fast but coarse stochastic-gradient samplers. Ablations over preconditioning, bonus scale, and prior strength reveal a trade-off: larger bonuses help when posterior samples are accurate, but hurt when sampling noise dominates. FG-TS generally outperforms vanilla TS in linear and logistic bandits, but tends to be weaker in neural bandits. Nevertheless, because FG-TS and its variants are competitive and easy-to-use, we recommend them as baselines in modern contextual-bandit benchmarks. Finally, we provide source code for all our experiments at https://github.com/SarahLiaw/ctx-bandits-mcmc-showdown.
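To make the optimism bonus concrete, here is a minimal sketch of an unnormalized log-posterior with a feel-good term for a linear reward model. It follows one common way of writing the FG-TS objective (squared-error likelihood plus a capped bonus on the best predicted reward); the function name and the hyperparameter names `eta`, `lam`, `cap`, and `prior_precision` are illustrative assumptions, not taken from the abstract or the linked repository.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def fg_log_posterior(theta, history, eta=1.0, lam=0.1, cap=1.0, prior_precision=1.0):
    """Unnormalized log-posterior with a feel-good bonus (linear model).

    history: list of (arm_features, chosen_arm, reward) tuples, where
    arm_features holds one feature vector per arm. Setting lam=0 recovers
    the vanilla Thompson Sampling posterior for this model.
    """
    log_p = -0.5 * prior_precision * dot(theta, theta)  # Gaussian prior
    for arm_features, chosen, reward in history:
        preds = [dot(theta, phi) for phi in arm_features]
        log_p -= eta * (preds[chosen] - reward) ** 2  # squared-error fit
        log_p += lam * min(cap, max(preds))           # capped optimism bonus
    return log_p

history = [([[1.0, 0.0], [0.0, 1.0]], 0, 1.0)]
print(fg_log_posterior([1.0, 0.0], history))
print(fg_log_posterior([1.0, 0.0], history, lam=0.0))
```

A sampler (exact or stochastic-gradient MCMC) would draw `theta` from this density; a larger `lam` tilts the draws toward models that predict high rewards.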
nan
Article 686
Title@2025-07-21 (1): Preferential subspace identification (PSID) with forward-backward smoothing
Title: Preferential subspace identification (PSID) with forward-backward smoothing | Präferenzielle Subraum-Identifikation (PSID) mit nach vorne gerichteter Glättung | 优先次空间识别(PSID),前向平滑 2507.15288v1 |
Authors (2): Omid G. Sani, Maryam M. Shanechi
System identification methods for multivariate time-series, such as neural and behavioral recordings, have been used to build models for predicting one from the other. For example, Preferential Subspace Identification (PSID) builds a state-space model of a primary time-series (e.g., neural activity) to optimally predict a secondary time-series (e.g., behavior). However, PSID focuses on optimal prediction using past primary data, even though in offline applications, better estimation can be achieved by incorporating concurrent data (filtering) or all available data (smoothing). Here, we extend PSID to enable optimal filtering and smoothing. First, we show that the presence of a secondary signal makes it possible to uniquely identify a model with an optimal Kalman update step (to enable filtering) from a family of otherwise equivalent state-space models. Our filtering solution augments PSID with a reduced-rank regression step that directly learns the optimal gain required for the update step from data. We refer to this extension of PSID as PSID with filtering. Second, inspired by two-filter Kalman smoother formulations, we develop a novel forward-backward PSID smoothing algorithm where we first apply PSID with filtering and then apply it again in the reverse time direction on the residuals of the filtered secondary signal. We validate our methods on simulated data, showing that our approach recovers the ground-truth model parameters for filtering, and achieves optimal filtering and smoothing decoding performance of the secondary signal that matches the ideal performance of the true underlying model. This work provides a principled framework for optimal linear filtering and smoothing in the two-signal setting, significantly expanding the toolkit for analyzing dynamic interactions in multivariate time-series.
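For readers unfamiliar with the distinction between prediction and filtering, the following sketch shows one predict/update cycle of a textbook scalar Kalman filter. It is a simplification under stated assumptions: the model is one-dimensional with known parameters `a`, `c`, `q`, `r` (all hypothetical names), whereas the abstract describes learning the gain for the update step directly from data.

```python
def kalman_step(x_est, p_est, y, a, c, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    Model: x[k+1] = a * x[k] + w,  y[k] = c * x[k] + v,
    with Var(w) = q and Var(v) = r.
    """
    # Predict: uses only past data.
    x_pred = a * x_est
    p_pred = a * p_est * a + q
    # Update: folds in the concurrent observation y (filtering).
    gain = p_pred * c / (c * p_pred * c + r)
    x_new = x_pred + gain * (y - c * x_pred)
    p_new = (1.0 - gain * c) * p_pred
    return x_new, p_new, gain

x_new, p_new, gain = kalman_step(x_est=0.0, p_est=1.0, y=2.0, a=1.0, c=1.0, q=0.0, r=1.0)
print(x_new, p_new, gain)
```

Stopping after the predict lines corresponds to prediction from past data only; the update lines are what filtering adds. Smoothing would further combine this forward pass with information from later observations.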
nan
Article 687
Title@2025-07-21 (1): Mixture of Autoencoder Experts Guidance using Unlabeled and Incomplete Data for Exploration in Reinforcement Learning
Title: Mixture of Autoencoder Experts Guidance using Unlabeled and Incomplete Data for Exploration in Reinforcement Learning | Mischung von Autoencoder Experten Anleitung mit unmarkierten und unvollständigen Daten für die Exploration in Verstärkungs-Lernen | 使用无标签和不完整数据进行强化学习探索的自动编码器混合专家指导 2507.15287v1 |
Authors (2): Elias Malomgré, Pieter Simoens
Recent trends in Reinforcement Learning (RL) highlight the need for agents to learn from reward-free interactions and alternative supervision signals, such as unlabeled or incomplete demonstrations, rather than relying solely on explicit reward maximization. Additionally, developing generalist agents that can adapt efficiently in real-world environments often requires leveraging these reward-free signals to guide learning and behavior. However, while intrinsic motivation techniques provide a means for agents to seek out novel or uncertain states in the absence of explicit rewards, they are often challenged by dense reward environments or the complexity of high-dimensional state and action spaces. Furthermore, most existing approaches rely directly on the unprocessed intrinsic reward signals, which can make it difficult to shape or control the agent’s exploration effectively. We propose a framework that can effectively utilize expert demonstrations, even when they are incomplete and imperfect. By applying a mapping function to transform the similarity between an agent’s state and expert data into a shaped intrinsic reward, our method allows for flexible and targeted exploration of expert-like behaviors. We employ a Mixture of Autoencoder Experts to capture a diverse range of behaviors and accommodate missing information in demonstrations. Experiments show our approach enables robust exploration and strong performance in both sparse and dense reward environments, even when demonstrations are sparse or incomplete. This provides a practical framework for RL in realistic settings where optimal data is unavailable and precise reward control is needed.
nan
Article 688
Title@2025-07-21 (1): Machine Unlearning for Streaming Forgetting
Title: Machine Unlearning for Streaming Forgetting | Maschine-Entlernen für Streaming Vergessen | 为流出遗忘而取消机器学习 2507.15280v1 |
Authors (6): Shaofei Shen, Chenhao Zhang, Yawen Zhao, Alina Bialkowski, Weitong Chen, Miao Xu
Machine unlearning aims to remove knowledge of specific training data from a well-trained model. Currently, machine unlearning methods typically handle all forgetting data in a single batch, removing the corresponding knowledge all at once upon request. However, in practical scenarios, requests for data removal often arise in a streaming manner rather than in a single batch, leading to reduced efficiency and effectiveness in existing methods. Such challenges of streaming forgetting have not been the focus of much research. In this paper, to address the challenges of performance maintenance, efficiency, and data access brought about by streaming unlearning requests, we introduce a streaming unlearning paradigm, formalizing the unlearning as a distribution shift problem. We then estimate the altered distribution and propose a novel streaming unlearning algorithm to achieve efficient streaming forgetting without requiring access to the original training data. Theoretical analyses confirm an $O(\sqrt{T} + V_T)$ error bound on the streaming unlearning regret, where $V_T$ represents the cumulative total variation in the optimal solution over $T$ learning rounds. This theoretical guarantee is achieved under mild conditions without the strong restriction of a convex loss function. Experiments across various models and datasets validate the performance of our proposed method.
nan
Article 689
Title@2025-07-21 (1): Temporal Basis Function Models for Closed-Loop Neural Stimulation
Title: Temporal Basis Function Models for Closed-Loop Neural Stimulation | Temporale Basis-Funktionsmodelle für die Closed-Loop-Neuralstimulation | 闭闭路神经刺激的时时基础功能模型 2507.15274v1 |
Authors (4): Matthew J. Bryan, Felix Schwock, Azadeh Yazdan-Shahmorad, Rajesh P N Rao
Closed-loop neural stimulation provides novel therapies for neurological diseases such as Parkinson’s disease (PD), but it is not yet clear whether artificial intelligence (AI) techniques can tailor closed-loop stimulation to individual patients or identify new therapies. Progress requires us to address a number of translational issues, including sample efficiency, training time, and loop latency, which must be minimized so that stimulation may be shaped in response to changing brain activity. We propose temporal basis function models (TBFMs) to address these difficulties, and explore this approach in the context of excitatory optogenetic stimulation. We demonstrate the ability of TBF models to provide a single-trial, spatiotemporal forward prediction of the effect of optogenetic stimulation on local field potentials (LFPs) measured in two non-human primates. We further use simulations to demonstrate the use of TBF models for closed-loop stimulation, driving neural activity towards target patterns. The simplicity of TBF models allows them to be sample efficient, rapid to train (2-4min), and low latency (0.2ms) on desktop CPUs. We demonstrate the model on 40 sessions of previously published excitatory optogenetic stimulation data. For each session, the model required 15-20min of data collection to successfully model the remainder of the session. It achieved a prediction accuracy comparable to a baseline nonlinear dynamical systems model that requires hours to train, and superior accuracy to a linear state-space model. In our simulations, it also successfully allowed a closed-loop stimulator to control a neural circuit. Our approach begins to bridge the translational gap between complex AI-based approaches to modeling dynamical systems and the vision of using such forward prediction models to develop novel, clinically useful closed-loop stimulation protocols.
nan
Article 690
Title@2025-07-21 (1): Developing Cryptocurrency Trading Strategy Based on Autoencoder-CNN-GANs Algorithms
Title: Developing Cryptocurrency Trading Strategy Based on Autoencoder-CNN-GANs Algorithms | Entwicklung einer Cryptowährungs-Handelsstrategie auf der Grundlage von Autoencoder-CNN-GAN-Algorithmen | 制定基于自动编码器-CNN-GANs算法的加密货币交易战略 2412.18202v6 |
Authors (6): Zhuohuan Hu, Richard Yu, Zizhou Zhang, Haoran Zheng, Qianying Liu, Yining Zhou
This paper leverages machine learning algorithms to forecast and analyze financial time series. The process begins with a denoising autoencoder to filter out random noise fluctuations from the main contract price data. Then, one-dimensional convolution reduces the dimensionality of the filtered data and extracts key information. The filtered and dimensionality-reduced price data is fed into a GAN, and its output serves as the input to a fully connected network. Through cross-validation, a model is trained to capture features that precede large price fluctuations. The model predicts the likelihood and direction of significant price changes in real-time price sequences, placing trades at moments of high prediction accuracy. Empirical results demonstrate that using autoencoders and convolution to filter and denoise financial data, combined with GANs, achieves a certain level of predictive performance, validating the capabilities of machine learning algorithms to discover underlying patterns in financial sequences. Keywords: CNN; GANs; Cryptocurrency; Prediction.
nan
Article 691
Title@2025-07-21 (1): Self-Tuning Self-Supervised Image Anomaly Detection
Title: Self-Tuning Self-Supervised Image Anomaly Detection | Selbst-Tuning Selbst-überwachte Bildanomalie-Erkennung | 自自上自上自上图像异常检测 2306.12033v3 |
Authors (3): Jaemin Yoo, Lingxiao Zhao, Leman Akoglu
Self-supervised learning (SSL) has emerged as a promising paradigm that presents supervisory signals to real-world problems, bypassing the extensive cost of manual labeling. Consequently, self-supervised anomaly detection (SSAD) has seen a recent surge of interest, since SSL is especially attractive for unsupervised tasks. However, recent works have reported that the choice of a data augmentation function has a significant impact on the accuracy of SSAD, posing augmentation search as an essential but nontrivial problem due to the lack of labeled validation data. In this paper, we introduce ST-SSAD, the first unsupervised approach to end-to-end augmentation tuning for SSAD. To this end, our work presents two key contributions. The first is a new unsupervised validation loss that quantifies the alignment between augmented training data and unlabeled validation data. The second is new differentiable augmentation functions, allowing data augmentation hyperparameter(s) to be tuned in an end-to-end manner. Experiments on two testbeds with semantic class anomalies and subtle industrial defects show that ST-SSAD gives significant performance gains over existing works. All our code and testbeds are available at https://github.com/jaeminyoo/ST-SSAD.
nan
Article 692
Title@2025-07-21 (1): Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning
Title: Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning | Token-Space-Gradient-Konflikte lösen: Token Space-Manipulation für transformerbasiertes Multi-Task-Learning | 解决 Token- Space 渐变冲突: 用于以变换器为基础的多任务学习的 Token 空间操纵 2507.07485v2 |
Authors (2): Wooseong Jeong, Kuk-Jin Yoon
Multi-Task Learning (MTL) enables multiple tasks to be learned within a shared network, but differences in objectives across tasks can cause negative transfer, where the learning of one task degrades another task’s performance. While pre-trained transformers significantly improve MTL performance, their fixed network capacity and rigid structure limit adaptability. Previous dynamic network architectures attempt to address this but are inefficient as they directly convert shared parameters into task-specific ones. We propose Dynamic Token Modulation and Expansion (DTME-MTL), a framework applicable to any transformer-based MTL architecture. DTME-MTL enhances adaptability and reduces overfitting by identifying gradient conflicts in token space and applying adaptive solutions based on conflict type. Unlike prior methods that mitigate negative transfer by duplicating network parameters, DTME-MTL operates entirely in token space, enabling efficient adaptation without excessive parameter growth. Extensive experiments demonstrate that DTME-MTL consistently improves multi-task performance with minimal computational overhead, offering a scalable and effective solution for enhancing transformer-based MTL models.
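The abstract does not state how gradient conflicts are detected. A common criterion in the multi-task literature is negative cosine similarity between per-task gradients, and the sketch below uses that as a stand-in. The function names and the toy gradients are hypothetical, and the vectors here are generic rather than token-space quantities.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def conflicting_pairs(task_grads):
    """Return task-name pairs whose gradients point in opposing directions."""
    names = sorted(task_grads)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(task_grads[a], task_grads[b]) < 0.0:
                pairs.append((a, b))
    return pairs

grads = {"depth": [1.0, 0.0], "seg": [-1.0, 0.5], "normals": [1.0, 3.0]}
print(conflicting_pairs(grads))
```

Pairs flagged this way are candidates for task-specific treatment, while non-conflicting tasks can keep sharing the same representation.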
nan
Article 693
Title@2025-07-21 (1): CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers
Title: CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers | CHORDS: Diffusions-Probenahmebeschleuniger mit multicore-hierarchischen ODE-Solvers | CHORDS: 多分级等级式极分解解码器扩散取样加速器 2507.15260v1 |
Authors (6): Jiaqi Han, Haotian Ye, Puheng Li, Minkai Xu, James Zou, Stefano Ermon
Diffusion-based generative models have become dominant generators of high-fidelity images and videos but remain limited by their computationally expensive inference procedures. Existing acceleration techniques either require extensive model retraining or compromise significantly on sample quality. This paper explores a general, training-free, and model-agnostic acceleration strategy via multi-core parallelism. Our framework views multi-core diffusion sampling as an ODE solver pipeline, where slower yet accurate solvers progressively rectify faster solvers through a theoretically justified inter-core communication mechanism. This motivates our multi-core training-free diffusion sampling accelerator, CHORDS, which is compatible with various diffusion samplers, model architectures, and modalities. Through extensive experiments, CHORDS significantly accelerates sampling across diverse large-scale image and video diffusion models, yielding up to 2.1x speedup with four cores, improving by 50% over baselines, and 2.9x speedup with eight cores, all without quality degradation. This advancement enables CHORDS to establish a solid foundation for real-time, high-fidelity diffusion generation.
nan
Article 694
Title@2025-07-21 (1): Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training
Title: Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training | Aufgabenverhalten synchronisieren: Mehrere Aufgaben während der Test-Time-Schulung ausrichten | 同步任务行为: 测试时训练中对齐多个任务 2507.07778v2 |
Authors (4): Wooseong Jeong, Jegyeong Cho, Youngho Yoon, Kuk-Jin Yoon
Generalizing neural networks to unseen target domains is a significant challenge in real-world deployments. Test-time training (TTT) addresses this by using an auxiliary self-supervised task to reduce the domain gap caused by distribution shifts between the source and target. However, we find that when models are required to perform multiple tasks under domain shifts, conventional TTT methods suffer from unsynchronized task behavior, where the adaptation steps needed for optimal performance in one task may not align with the requirements of other tasks. To address this, we propose a novel TTT approach called Synchronizing Tasks for Test-time Training (S4T), which enables the concurrent handling of multiple tasks. The core idea behind S4T is that predicting task relations across domain shifts is key to synchronizing tasks during test time. To validate our approach, we apply S4T to conventional multi-task benchmarks, integrating it with traditional TTT protocols. Our empirical results show that S4T outperforms state-of-the-art TTT methods across various benchmarks.
nan
Article 695
Title@2025-07-21 (1): Physics-Informed Learning of Proprietary Inverter Models for Grid Dynamic Studies
Title: Physics-Informed Learning of Proprietary Inverter Models for Grid Dynamic Studies | Physik-informiertes Lernen von proprietären Wechselrichtermodellen für Grid Dynamic Studies | 电网动态研究专有反转器模型物理学习 2507.15259v1 |
Authors (4): Kyung-Bin Kwon, Sayak Mukherjee, Ramij R. Hossain, Marcelo Elizondo
This letter develops a novel physics-informed neural ordinary differential equations-based framework to emulate the proprietary dynamics of the inverters – essential for improved accuracy in grid dynamic simulations. In current industry practice, the original equipment manufacturers (OEMs) often do not disclose the exact internal controls and parameters of the inverters, posing significant challenges in performing accurate dynamic simulations and other relevant studies, such as gain tuning for stability analysis and controls. To address this, we propose a Physics-Informed Latent Neural ODE Model (PI-LNM) that integrates system physics with neural learning layers to capture the unmodeled behaviors of proprietary units. The proposed method is validated using a grid-forming inverter (GFM) case study, demonstrating improved dynamic simulation accuracy over approaches that rely solely on data-driven learning without physics-based guidance.
nan
Article 696
Title@2025-07-21 (1): Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning
Title: Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning | Interaction-Merged Motion Planning: Diverse Motion-Datensätze für robuste Planung effektiv nutzen | 交互式组合式动态规划:有效利用多种移动式数据集进行强力规划 2507.04790v2 |
Authors (5): Giwon Lee, Wooseong Jeong, Daehee Park, Jaewoo Jeong, Kuk-Jin Yoon
Motion planning is a crucial component of autonomous robot driving. While various trajectory datasets exist, effectively utilizing them for a target domain remains challenging due to differences in agent interactions and environmental characteristics. Conventional approaches, such as domain adaptation or ensemble learning, leverage multiple source datasets but suffer from domain imbalance, catastrophic forgetting, and high computational costs. To address these challenges, we propose Interaction-Merged Motion Planning (IMMP), a novel approach that leverages parameter checkpoints trained on different domains during adaptation to the target domain. IMMP follows a two-step process: pre-merging to capture agent behaviors and interactions, sufficiently extracting diverse information from the source domain, followed by merging to construct an adaptable model that efficiently transfers diverse interactions to the target domain. Our method is evaluated on various planning benchmarks and models, demonstrating superior performance compared to conventional approaches.
nan
Article 697
Title@2025-07-21 (1): MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations
Title: MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations | MEETI: Ein multimodaler EKG-Datensatz von MIMIC-IV-ECG mit Signalen, Bildern, Features und Interpretationen | MIMIMIMI-IV-ECG的多式ECG数据集,带有信号、图像、特征和解释 2507.15255v1 |
Authors (7): Deyun Zhang, Xiang Lan, Shijia Geng, Qinghao Zhao, Sumei Fan, Mengling Feng, Shenda Hong
The electrocardiogram (ECG) plays a foundational role in modern cardiovascular care, enabling non-invasive diagnosis of arrhythmias, myocardial ischemia, and conduction disorders. While machine learning has achieved expert-level performance in ECG interpretation, the development of clinically deployable multimodal AI systems remains constrained, primarily due to the lack of publicly available datasets that simultaneously incorporate raw signals, diagnostic images, and interpretation text. Most existing ECG datasets provide only single-modality data or, at most, dual modalities, making it difficult to build models that can understand and integrate diverse ECG information in real-world settings. To address this gap, we introduce MEETI (MIMIC-IV-Ext ECG-Text-Image), the first large-scale ECG dataset that synchronizes raw waveform data, high-resolution plotted images, and detailed textual interpretations generated by large language models. In addition, MEETI includes beat-level quantitative ECG parameters extracted from each lead, offering structured parameters that support fine-grained analysis and model interpretability. Each MEETI record is aligned across four components: (1) the raw ECG waveform, (2) the corresponding plotted image, (3) extracted feature parameters, and (4) detailed interpretation text. This alignment is achieved using consistent, unique identifiers. This unified structure supports transformer-based multimodal learning and fine-grained, interpretable reasoning about cardiac health. By bridging the gap between traditional signal analysis, image-based interpretation, and language-driven understanding, MEETI establishes a robust foundation for the next generation of explainable, multimodal cardiovascular AI. It offers the research community a comprehensive benchmark for developing and evaluating ECG-based AI systems.
nan
Article 698
Title@2025-07-21 (1): Disentangling Homophily and Heterophily in Multimodal Graph Clustering
Title: Disentangling Homophily and Heterophily in Multimodal Graph Clustering | Entwirren von Homophilie und Heterophilie in multimodalen Graphenclustern | 在多式图表群集中分离同形和异形 2507.15253v1 |
Authors (5): Zhaochen Guo, Zhixiang Shen, Xuanting Xie, Liangjian Wen, Zhao Kang
Multimodal graphs, which integrate unstructured heterogeneous data with structured interconnections, offer substantial real-world utility but remain insufficiently explored in unsupervised learning. In this work, we initiate the study of multimodal graph clustering, aiming to bridge this critical gap. Through empirical analysis, we observe that real-world multimodal graphs often exhibit hybrid neighborhood patterns, combining both homophilic and heterophilic relationships. To address this challenge, we propose a novel framework – Disentangled Multimodal Graph Clustering (DMGC) – which decomposes the original hybrid graph into two complementary views: (1) a homophily-enhanced graph that captures cross-modal class consistency, and (2) heterophily-aware graphs that preserve modality-specific inter-class distinctions. We introduce a Multimodal Dual-frequency Fusion mechanism that jointly filters these disentangled graphs through a dual-pass strategy, enabling effective multimodal integration while mitigating category confusion. Our self-supervised alignment objectives further guide the learning process without requiring labels. Extensive experiments on both multimodal and multi-relational graph datasets demonstrate that DMGC achieves state-of-the-art performance, highlighting its effectiveness and generalizability across diverse settings. Our code is available at https://github.com/Uncnbb/DMGC.
nan
Article 699
Title@2025-07-21 (1): Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer
Title: Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer | Testen der Spin-Bad-Ansicht der Selbstachtung: Eine Hamiltonian Analyse von GPT-2 Transformer | 测试自觉自觉的自吹泡泡视图:汉密尔顿对GPT-2变形器的分析 2507.00683v4 |
Authors (2): Satadeep Bhattacharjee, Seung-Cheol Lee
The recently proposed physics-based framework by Huo and Johnson~\cite{huo2024capturing} models the attention mechanism of Large Language Models (LLMs) as an interacting two-body spin system, offering a first-principles explanation for phenomena like repetition and bias. Building on this hypothesis, we extract the complete Query-Key weight matrices from a production-grade GPT-2 model and derive the corresponding effective Hamiltonian for every attention head. From these Hamiltonians, we obtain analytic phase boundaries and logit gap criteria that predict which token should dominate the next-token distribution for a given context. A systematic evaluation on 144 heads across 20 factual-recall prompts reveals a strong negative correlation between the theoretical logit gaps and the model’s empirical token rankings ($r\approx-0.70$, $p<10^{-3}$). Targeted ablations further show that suppressing the heads most aligned with the spin-bath predictions induces the anticipated shifts in output probabilities, confirming a causal link rather than a coincidental association. Taken together, our findings provide the first strong empirical evidence for the spin-bath analogy in a production-grade model. In this work, we utilize the context-field lens, which provides physics-grounded interpretability and motivates the development of novel generative models bridging theoretical condensed matter physics and artificial intelligence.
nan
Article 700
Title@2025-07-21 (1): A Practical Guide for Evaluating LLMs and LLM-Reliant Systems
Title: A Practical Guide for Evaluating LLMs and LLM-Reliant Systems | Ein praktischer Leitfaden für die Bewertung von LLMs und LLM-Reliant Systemen | 评估LLMM和LLM-Resiont Systems的实用指南 2506.13023v2 |
Authors (3): Ethan M. Rudd, Christopher Andrews, Philip Tully
Recent advances in generative AI have led to remarkable interest in using systems that rely on large language models (LLMs) for practical applications. However, meaningful evaluation of these systems in real-world scenarios comes with a distinct set of challenges, which are not well-addressed by synthetic benchmarks and de-facto metrics that are often seen in the literature. We present a practical evaluation framework which outlines how to proactively curate representative datasets, select meaningful evaluation metrics, and employ meaningful evaluation methodologies that integrate well with practical development and deployment of LLM-reliant systems that must adhere to real-world requirements and meet user-facing needs.
nan
Article 701
Title@2025-07-21 (1): Improving the Generation of VAEs with High Dimensional Latent Spaces by the use of Hyperspherical Coordinates
Title: Improving the Generation of VAEs with High Dimensional Latent Spaces by the use of Hyperspherical Coordinates | Verbesserung der Generierung von VAEs mit hochdimensionalen Latentenräumen durch den Einsatz von Hypersphärischen Koordinaten | 通过使用超球坐标改进具有高维维度低空空间的VAE的生成 2507.15900v1 |
Authors (5): Alejandro Ascarate, Leo Lebrat, Rodrigo Santa Cruz, Clinton Fookes, Olivier Salvado
Variational autoencoders (VAEs) encode data into lower-dimensional latent vectors before decoding those vectors back to data. Once trained, decoding a random latent vector from the prior usually does not produce meaningful data, at least when the latent space has more than a dozen dimensions. In this paper, we investigate this issue by drawing insight from high dimensional statistics: in these regimes, the latent vectors of a standard VAE are by construction distributed uniformly on a hypersphere. We propose to formulate the latent variables of a VAE using hyperspherical coordinates, which allows compressing the latent vectors towards an island on the hypersphere, thereby reducing the latent sparsity, and we show that this improves the generation ability of the VAE. We propose a new parameterization of the latent space with limited computational overhead.
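The high-dimensional statistics fact behind this abstract can be checked numerically: samples from a standard Gaussian in d dimensions have norms that concentrate around sqrt(d), so they lie close to a hypersphere. The sketch below is a minimal check using only the standard library; the function name `norm_stats` and the chosen dimensions and sample counts are arbitrary illustrative choices.

```python
import math
import random

def norm_stats(dim, n_samples, rng):
    """Mean and standard deviation of the Euclidean norm of N(0, I_dim) samples."""
    norms = []
    for _ in range(n_samples):
        sq = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim))
        norms.append(math.sqrt(sq))
    mean = sum(norms) / n_samples
    std = math.sqrt(sum((v - mean) ** 2 for v in norms) / n_samples)
    return mean, std

rng = random.Random(0)
for dim in (2, 64, 512):
    mean, std = norm_stats(dim, 500, rng)
    # Columns: dimension, mean norm relative to sqrt(dim), relative spread.
    print(dim, round(mean / math.sqrt(dim), 3), round(std / mean, 3))
```

As the dimension grows, the relative spread of the norm shrinks, which is why a Gaussian prior in a high-dimensional latent space behaves like a thin shell rather than a filled ball.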
nan
Article 702
Title@2025-07-21 (1): Spatio-Temporal Demand Prediction for Food Delivery Using Attention-Driven Graph Neural Networks
Title: Spatio-Temporal Demand Prediction for Food Delivery Using Attention-Driven Graph Neural Networks | Spatio-Temporale Nachfragevorhersage für die Lebensmittellieferung mit aufmerksamkeitsgetriebenem Graphen-Neural-Netzwerk | 利用引人注意的图形神经网络对食品提供情况进行时需求预测 2507.15246v1 |
Authors (2): Rabia Latief Bhat, Iqra Altaf Gillani
Accurate demand forecasting is critical for enhancing the efficiency and responsiveness of food delivery platforms, where spatial heterogeneity and temporal fluctuations in order volumes directly influence operational decisions. This paper proposes an attention-based Graph Neural Network framework that captures spatial-temporal dependencies by modeling the food delivery environment as a graph. In this graph, nodes represent urban delivery zones, while edges reflect spatial proximity and inter-regional order flow patterns derived from historical data. The attention mechanism dynamically weighs the influence of neighboring zones, enabling the model to focus on the most contextually relevant areas during prediction. Temporal trends are jointly learned alongside spatial interactions, allowing the model to adapt to evolving demand patterns. Extensive experiments on real-world food delivery datasets demonstrate the superiority of the proposed model in forecasting future order volumes with high accuracy. The framework offers a scalable and adaptive solution to support proactive fleet positioning, resource allocation, and dispatch optimization in urban food delivery operations.
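The attention mechanism described here, weighting the influence of neighboring zones, can be sketched as a softmax over similarity scores followed by a weighted average of neighbor features. The abstract gives no architectural details, so the dot-product score, the function names, and the toy zone features below are assumptions for illustration only.

```python
import math

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def attention_aggregate(zone, neighbors, features):
    """Softmax-weighted average of neighbor features for one zone.

    Scores are plain dot products between the zone's features and each
    neighbor's features; a trained model would use learned projections.
    """
    scores = [dot(features[zone], features[n]) for n in neighbors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[zone])
    agg = [sum(w * features[n][k] for w, n in zip(weights, neighbors)) for k in range(dim)]
    return agg, weights

features = {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
agg, weights = attention_aggregate("A", ["B", "C"], features)
print(agg, weights)
```

Zone B resembles zone A more than zone C does, so it receives the larger weight in the aggregated representation.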
Article 703
Title@2025-07-21 (1): EVOLVE-X: Embedding Fusion and Language Prompting for User Evolution Forecasting on Social Media
Title: EVOLVE-X: Embedding Fusion and Language Prompting for User Evolution Forecasting on Social Media | EVOLVE-X: Integration von Fusionen und Sprachen für die Prognose der Nutzerentwicklung in sozialen Medien | EVOLVE-X:社会媒体用户演变预测的嵌入融合和语言提示 2507.16847v1 |
Authors (4): Ismail Hossain, Sai Puppala, Md Jahangir Alam, Sajedul Talukder
Social media platforms serve as a significant medium for sharing personal emotions, daily activities, and various life events, ensuring individuals stay informed about the latest developments. From the initiation of an account, users progressively expand their circle of friends or followers, engaging actively by posting, commenting, and sharing content. Over time, user behavior on these platforms evolves, influenced by demographic attributes and the networks they form. In this study, we present a novel approach that leverages the open-source models Llama-3-Instruct, Mistral-7B-Instruct, and Gemma-7B-IT through prompt engineering, combined with GPT-2, BERT, and RoBERTa using a joint embedding technique, to analyze and predict the evolution of user behavior on social media over their lifetime. Our experiments demonstrate the potential of these models to forecast future stages of a user’s social evolution, including network changes, future connections, and shifts in user activities. Experimental results highlight the effectiveness of our approach, with GPT-2 achieving the lowest perplexity (8.21) in a cross-modal configuration, outperforming RoBERTa (9.11) and BERT, and underscoring the importance of leveraging cross-modal configurations for superior performance. This approach addresses critical challenges in social media, such as friend recommendations and activity predictions, offering insights into the trajectory of user behavior. By anticipating future interactions and activities, this research aims to provide early warnings about potential negative outcomes, enabling users to make informed decisions and mitigate risks in the long term.
Article 704
Title@2025-07-21 (1): Cross-Domain Few-Shot Learning with Coalescent Projections and Latent Space Reservation
Title: Cross-Domain Few-Shot Learning with Coalescent Projections and Latent Space Reservation | Cross-Domain Wenig-heißes Lernen mit koaleszierenden Projektionen und Latent Space Reservation | 与煤白预测和暗地空间保留有关的零热学习 2507.15243v1 |
Authors (5): Naeem Paeedeh, Mahardhika Pratama, Wolfgang Mayer, Jimmy Cao, Ryszard Kowlczyk
Despite the progress in Cross-Domain Few-Shot Learning (CD-FSL), a model pre-trained with DINO combined with a prototypical classifier outperforms the latest SOTA methods. A crucial limitation that needs to be overcome is that updating too many parameters of the transformers leads to overfitting due to the scarcity of labeled samples. To address this challenge, we propose a new concept, Coalescent Projection (CP), as an effective successor to soft prompts. Additionally, we propose a novel pseudo-class generation method combined with Self-Supervised Transformations (SSTs) that relies solely on the base domain to prepare the network for encountering unseen samples from different domains. The proposed method exhibits its effectiveness in comprehensive experiments on the extreme domain shift scenario of the BSCD-FSL benchmark. Our code is published at https://github.com/Naeem-Paeedeh/CPLSR.
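The prototypical classifier used as the baseline in this abstract can be sketched in a few lines. This is a generic nearest-prototype classifier over precomputed embeddings, offered as an illustration; the function names are hypothetical and the embedding model (DINO in the abstract) is not included.

```python
import numpy as np

def fit_prototypes(support_x, support_y):
    """Compute one prototype per class as the mean support embedding."""
    classes = np.unique(support_y)
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    return classes, protos

def predict(query_x, classes, protos):
    """Assign each query embedding to the class of its nearest prototype."""
    sq_dist = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[sq_dist.argmin(axis=1)]
```

Because only class means are computed, no backbone parameters are updated, which is one way to sidestep the overfitting problem the abstract describes.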
Article 705
Title@2025-07-21 (1): Exact Reformulation and Optimization for Direct Metric Optimization in Binary Imbalanced Classification
Title: Exact Reformulation and Optimization for Direct Metric Optimization in Binary Imbalanced Classification | Exakte Neuformulierung und Optimierung für die direkte Metrische Optimierung in der binären Imbalanced Classification | 二元平衡分类中直接计量优化的精确调整和优化 2507.15240v1 |
Authors (5): Le Peng, Yash Travadi, Chuan He, Ying Cui, Ju Sun
For classification with imbalanced class frequencies, i.e., imbalanced classification (IC), standard accuracy is known to be misleading as a performance measure. While most existing methods for IC resort to optimizing balanced accuracy (i.e., the average of class-wise recalls), they fall short in scenarios where the significance of classes varies or certain metrics should reach prescribed levels. In this paper, we study two key classification metrics, precision and recall, under three practical binary IC settings: fix precision optimize recall (FPOR), fix recall optimize precision (FROP), and optimize $F_\beta$-score (OFBS). Unlike existing methods that rely on smooth approximations to deal with the indicator function involved, we introduce, for the first time, exact constrained reformulations for these direct metric optimization (DMO) problems, which can be effectively solved by exact penalty methods. Experimental results on multiple benchmark datasets demonstrate the practical superiority of our approach over the state-of-the-art methods for the three DMO problems. We also expect our exact reformulation and optimization (ERO) framework to be applicable to a wide range of DMO problems for binary IC and beyond. Our code is available at https://github.com/sun-umn/DMO.
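For reference, the three metrics named in this abstract can be computed from hard predictions as follows. This is a plain evaluation sketch under the usual definitions, not the paper's optimization method; the counts are sums of indicators, which is why these metrics are not differentiable in the model parameters. The function name is an assumption.

```python
import numpy as np

def precision_recall_fbeta(y_true, y_pred, beta=1.0):
    """Precision, recall, and F_beta for binary labels (1 = positive)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)    # true positives
    fp = np.sum(~y_true & y_pred)   # false positives
    fn = np.sum(y_true & ~y_pred)   # false negatives
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    denom = beta**2 * precision + recall
    fbeta = (1 + beta**2) * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, fbeta
```

On a 10-sample set with 3 positives, a classifier with 2 true positives, 1 false positive, and 1 false negative scores 80% accuracy but only 2/3 precision and recall, which illustrates why accuracy misleads under imbalance.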
Article 706
Title@2025-07-21 (1): HEPPO-GAE: Hardware-Efficient Proximal Policy Optimization with Generalized Advantage Estimation
Title: HEPPO-GAE: Hardware-Efficient Proximal Policy Optimization with Generalized Advantage Estimation | HEPPO-GAE: Hardwareeffiziente proximale Politikoptimierung mit generalisierter Vorteilsschätzung | HEPPO-GAE: 采用通用的先进估计法优化政策 2501.12703v2 |
Authors (2): Hazem Taha, Ameer M. S. Abdelhadi
This paper introduces HEPPO-GAE, an FPGA-based accelerator designed to optimize the Generalized Advantage Estimation (GAE) stage in Proximal Policy Optimization (PPO). Unlike previous approaches that focused on trajectory collection and actor-critic updates, HEPPO-GAE addresses GAE’s computational demands with a parallel, pipelined architecture implemented on a single System-on-Chip (SoC). This design allows for the adaptation of various hardware accelerators tailored for different PPO phases. A key innovation is our strategic standardization technique, which combines dynamic reward standardization and block standardization for values, followed by 8-bit uniform quantization. This method stabilizes learning, enhances performance, and manages memory bottlenecks, achieving a 4x reduction in memory usage and a 1.5x increase in cumulative rewards. We propose a solution on a single SoC device with programmable logic and embedded processors, delivering throughput orders of magnitude higher than traditional CPU-GPU systems. Our single-chip solution minimizes communication latency and throughput bottlenecks, significantly boosting PPO training efficiency. Experimental results show a 30% increase in PPO speed and a substantial reduction in memory access time, underscoring HEPPO-GAE’s potential for broad applicability in hardware-efficient reinforcement learning algorithms.
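The two computations this abstract centers on, GAE and 8-bit uniform quantization, can be sketched in software as below. This is a reference-style illustration, not the FPGA design: it ignores episode termination flags, and the function names and the fixed `lo`/`hi` quantization range are assumptions (the paper standardizes rewards and values before quantizing).

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory.

    values must have len(rewards) + 1 entries (last one is the bootstrap).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    adv = np.zeros_like(rewards)
    running = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

def quantize_uint8(x, lo, hi):
    """Uniformly quantize x in [lo, hi] to 8 bits; return codes and decoded values."""
    x = np.clip(np.asarray(x, dtype=np.float64), lo, hi)
    scale = (hi - lo) / 255.0
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, codes.astype(np.float64) * scale + lo
```

Storing 8-bit codes instead of 32-bit floats takes a quarter of the space, which is consistent with the 4x memory figure quoted in the abstract.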
Article 707
Title@2025-07-21 (1): SOI Matters: Analyzing Multi-Setting Training Dynamics in Pretrained Language Models via Subsets of Interest
Title: SOI Matters: Analyzing Multi-Setting Training Dynamics in Pretrained Language Models via Subsets of Interest | SOI Matters: Analyse von Multi-Setting-Trainingsdynamiken in vorgebildeten Sprachmodellen über Teilmengen von Interesse | SOI事项:分析通过利益子集分析培训前语言模式中多设置培训动态 2507.15236v1 |
Authors (4): Shayan Vassef, Amirhossein Dabiriaghdam, Mohammadreza Bakhtiari, Yadollah Yaghoobzadeh
This work investigates the impact of multi-task, multi-lingual, and multi-source learning approaches on the robustness and performance of pretrained language models. To enhance this analysis, we introduce Subsets of Interest (SOI), a novel categorization framework that identifies six distinct learning behavior patterns during training, including forgettable examples, unlearned examples, and always correct examples. Through SOI transition heatmaps and dataset cartography visualization, we analyze how examples shift between these categories when transitioning from single-setting to multi-setting configurations. We perform comprehensive experiments across three parallel comparisons: multi-task vs. single-task learning using English tasks (entailment, paraphrase, sentiment), multi-source vs. single-source learning using sentiment analysis datasets, and multi-lingual vs. single-lingual learning using intent classification in French, English, and Persian. Our results demonstrate that multi-source learning consistently improves out-of-distribution performance by up to 7%, while multi-task learning shows mixed results with notable gains in similar task combinations. We further introduce a two-stage fine-tuning approach where the second stage leverages SOI-based subset selection to achieve additional performance improvements. These findings provide new insights into training dynamics and offer practical approaches for optimizing multi-setting language model performance.
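A rough sketch of how examples might be bucketed by their per-epoch correctness and compared across settings is given below. Only the three categories named in the abstract are implemented; the exact rules, the catch-all "other" bucket, and the function names are assumptions for illustration and do not reproduce the paper's six-way definition.

```python
from collections import Counter

def categorize(correct_by_epoch):
    """Bucket one example from its per-epoch correctness (list of bools)."""
    if all(correct_by_epoch):
        return "always_correct"
    if not any(correct_by_epoch):
        return "unlearned"
    # Forgettable (assumed rule): correct at some epoch, wrong at the next.
    forgotten = any(prev and not cur
                    for prev, cur in zip(correct_by_epoch, correct_by_epoch[1:]))
    return "forgettable" if forgotten else "other"

def transition_counts(single_setting_runs, multi_setting_runs):
    """Count (category in single setting, category in multi setting) pairs."""
    return Counter((categorize(a), categorize(b))
                   for a, b in zip(single_setting_runs, multi_setting_runs))
```

The resulting counts are the raw material for a transition heatmap of the kind the abstract mentions.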
Article 708
Title@2025-07-21 (1): Accelerated Bayesian Optimal Experimental Design via Conditional Density Estimation and Informative Data
Title: Accelerated Bayesian Optimal Experimental Design via Conditional Density Estimation and Informative Data | Beschleunigte Bayesian Optimal Experimental Design über Conditional Density Abschätzung und Informative Data | 通过有条件密度估计和信息数据快速加速的巴伊西亚最佳实验设计 2507.15235v1 |
Authors (3): Miao Huang, Hongqiao Wang, Kunyu Wu
Design of Experiments (DOE) is a fundamental scientific methodology that provides researchers with systematic principles and techniques to enhance the validity, reliability, and efficiency of experimental outcomes. In this study, we explore optimal experimental design within a Bayesian framework, utilizing Bayes’ theorem to reformulate the utility expectation, originally expressed as a nested double integral, into an independent double integral form, significantly improving numerical efficiency. To further accelerate the computation of the proposed utility expectation, conditional density estimation is employed to approximate the ratio of two Gaussian random fields, while covariance serves as a selection criterion to identify informative datasets during model fitting and integral evaluation. In scenarios characterized by low simulation efficiency and high costs of raw data acquisition, key challenges such as surrogate modeling, failure probability estimation, and parameter inference are systematically restructured within the Bayesian experimental design framework. The effectiveness of the proposed methodology is validated through both theoretical analysis and practical applications, demonstrating its potential for enhancing experimental efficiency and decision-making under uncertainty.
Article 709
Title@2025-07-21 (1): Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning
Title: Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning | Plan und Budget: Effektive und effiziente Test-Zeit-Skalierung auf großsprachliche Modell-Reasoning | 计划和预算:关于大语言示范理由的高效率、高效益、高效率的测试时间 2505.16122v2 |
Authors (7): Junhong Lin, Xinyue Zeng, Jie Zhu, Song Wang, Julian Shun, Jun Wu, Dawei Zhou
Large Language Models (LLMs) have achieved remarkable success in complex reasoning tasks, but their inference remains computationally inefficient. We observe a common failure mode in many prevalent LLMs, overthinking, where models generate verbose and tangential reasoning traces even for simple queries. Recent works have tried to mitigate this by enforcing fixed token budgets; however, this can lead to underthinking, especially on harder problems. Through empirical analysis, we identify that this inefficiency often stems from unclear problem-solving strategies. To formalize this, we develop a theoretical model, BBAM (Bayesian Budget Allocation Model), which models reasoning as a sequence of sub-questions with varying uncertainty, and introduce the $E^3$ metric to capture the trade-off between correctness and computation efficiency. Building on theoretical results from BBAM, we propose Plan-and-Budget, a model-agnostic, test-time framework that decomposes complex queries into sub-questions and allocates token budgets based on estimated complexity using adaptive scheduling. Plan-and-Budget improves reasoning efficiency across a range of tasks and models, achieving up to +70% accuracy gains, -39% token reduction, and +187.5% improvement in $E^3$. Notably, it elevates a smaller model (DS-Qwen-32B) to match the efficiency of a larger model (DS-LLaMA-70B), demonstrating Plan-and-Budget’s ability to close performance gaps without retraining. Our code is available at https://github.com/junhongmit/P-and-B.
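The budgeting step can be pictured with a simple allocation rule. The sketch below splits a total token budget across sub-questions in proportion to an estimated complexity score, with a guaranteed floor per sub-question. The proportional rule, the `min_tokens` floor, and the function name are assumptions for illustration; the paper's adaptive scheduling may differ.

```python
def allocate_budgets(complexities, total_budget, min_tokens=16):
    """Split total_budget tokens across sub-questions by estimated complexity."""
    n = len(complexities)
    if n == 0:
        return []
    if total_budget < n * min_tokens:
        raise ValueError("total_budget too small for the per-question floor")
    spare = total_budget - n * min_tokens
    total_c = sum(complexities)
    # Fall back to an even split when no complexity signal is available.
    shares = [c / total_c for c in complexities] if total_c > 0 else [1.0 / n] * n
    budgets = [min_tokens + int(spare * s) for s in shares]
    # Give any rounding leftover to the hardest sub-question.
    hardest = max(range(n), key=lambda i: complexities[i])
    budgets[hardest] += total_budget - sum(budgets)
    return budgets
```

For example, complexities `[1, 1, 2]` with a 448-token budget yield `[116, 116, 216]`, so the hardest sub-question receives about half of the spare tokens.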
Article 710
Title@2025-07-21 (1): Multimodal Fine-grained Reasoning for Post Quality Evaluation
Title: Multimodal Fine-grained Reasoning for Post Quality Evaluation | Multimodale feinkörnige Begründung für die Bewertung der Post-Qualität | 高质量后评价的多式联运优化理由 2507.17934v1 |
Authors (6): Xiaoxu Guo, Siyan Liang, Yachao Cui, Juxiang Zhou, Lei Wang, Han Cao
Accurately assessing post quality requires complex relational reasoning to capture nuanced topic-post relationships. However, existing studies face three major limitations: (1) treating the task as unimodal categorization, which fails to leverage multimodal cues and fine-grained quality distinctions; (2) introducing noise during deep multimodal fusion, leading to misleading signals; and (3) lacking the ability to capture complex semantic relationships like relevance and comprehensiveness. To address these issues, we propose the Multimodal Fine-grained Topic-post Relational Reasoning (MFTRR) framework, which mimics human cognitive processes. MFTRR reframes post-quality assessment as a ranking task and incorporates multimodal data to better capture quality variations. It consists of two key modules: (1) the Local-Global Semantic Correlation Reasoning Module, which models fine-grained semantic interactions between posts and topics at both local and global levels, enhanced by a maximum information fusion mechanism to suppress noise; and (2) the Multi-Level Evidential Relational Reasoning Module, which explores macro- and micro-level relational cues to strengthen evidence-based reasoning. We evaluate MFTRR on three newly constructed multimodal topic-post datasets and the public Lazada-Home dataset. Experimental results demonstrate that MFTRR significantly outperforms state-of-the-art baselines, achieving up to 9.52% NDCG@3 improvement over the best unimodal method on the Art History dataset.
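Since the headline result is reported in NDCG@3, a short reference computation of that ranking metric may help. The sketch uses the linear-gain form of DCG; an exponential-gain variant (2^rel - 1) also exists, and which one the paper uses is not stated in the abstract. Function names are assumptions.

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items (linear gain)."""
    rels = np.asarray(relevances, dtype=np.float64)[:k]
    discounts = np.log2(np.arange(2, len(rels) + 2))  # log2(rank + 1)
    return float(np.sum(rels / discounts))

def ndcg_at_k(relevances_in_ranked_order, k=3):
    """NDCG@k: DCG of the predicted order divided by DCG of the ideal order."""
    ideal = dcg_at_k(sorted(relevances_in_ranked_order, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(relevances_in_ranked_order, k) / ideal
```

A perfectly ordered list scores 1.0, and any misordering within the top three lowers the score.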
Article 711
Title@2025-07-21 (1): Robust and Differentially Private PCA for non-Gaussian data
Title: Robust and Differentially Private PCA for non-Gaussian data | Robustes und differenziert privates PCA für nicht-gaussische Daten | 用于非高加索数据的强力和有区别的私人五氯苯甲醚 2507.15232v1 |
Authors (2): Minwoo Kim, Sungkyu Jung
Recent advances have sparked significant interest in the development of privacy-preserving Principal Component Analysis (PCA). However, many existing approaches rely on restrictive assumptions, such as sub-Gaussian data, or are vulnerable to data contamination. Additionally, some methods are computationally expensive or depend on unknown model parameters that must be estimated, limiting their accessibility for data analysts seeking privacy-preserving PCA. In this paper, we propose a differentially private PCA method applicable to heavy-tailed and potentially contaminated data. Our approach leverages the property that the covariance matrix of properly rescaled data preserves eigenvectors and their order under elliptical distributions, which include Gaussian and heavy-tailed distributions. By applying a bounded transformation, we enable straightforward computation of principal components in a differentially private manner. Additionally, boundedness guarantees robustness against data contamination. We conduct both theoretical analysis and empirical evaluations of the proposed method, focusing on its ability to recover the subspace spanned by the leading principal components. Extensive numerical experiments demonstrate that our method consistently outperforms existing approaches in terms of statistical utility, particularly in non-Gaussian or contaminated data settings.
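One well-known bounded rescaling with the eigenvector-preserving property is the spatial sign, which projects centered points onto the unit sphere. The sketch below computes principal directions from the spatial sign covariance. Whether the paper uses exactly this transformation is an assumption, and the differential privacy noise step is deliberately omitted; the function name and the coordinate-wise median centering are also illustrative choices.

```python
import numpy as np

def spatial_sign_pca(X, n_components):
    """Leading eigenvectors of the spatial sign covariance (non-private)."""
    X = np.asarray(X, dtype=np.float64)
    Z = X - np.median(X, axis=0)               # robust centering (assumed)
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    S = Z / np.maximum(norms, 1e-12)           # every row now has norm <= 1
    cov = S.T @ S / len(S)
    eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]
```

Because each rescaled row has norm at most one, a single outlier or a single changed record can move the covariance estimate only by a bounded amount, which is the property that both robustness and privacy mechanisms rely on.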
Article 712
Title@2025-07-21 (1): Temporal Conformal Prediction (TCP): A Distribution-Free Statistical and Machine Learning Framework for Adaptive Risk Forecasting
Title: Temporal Conformal Prediction (TCP): A Distribution-Free Statistical and Machine Learning Framework for Adaptive Risk Forecasting | Temporal Conformal Prediction (TCP): Ein verteilungsfreies statistisches und maschinelles Lernkonzept für adaptive Risikoprognosen | 时空危机预测:用于适应风险预测的不分发的统计和机器学习框架 2507.05470v2 |
Authors (3): Agnideep Aich, Ashit Baran Aich, Dipak C. Jain
We propose Temporal Conformal Prediction (TCP), a principled framework for constructing well-calibrated prediction intervals for non-stationary time series. TCP integrates a machine learning-based quantile forecaster with an online conformal calibration layer. This layer’s thresholds are updated via a modified Robbins-Monro scheme, allowing the model to dynamically adapt to volatility clustering and regime shifts without rigid parametric assumptions. We benchmark TCP against GARCH, Historical Simulation, and static Quantile Regression across diverse financial assets. Our empirical results reveal a critical flaw in static methods: while sharp, Quantile Regression is poorly calibrated, systematically over-covering the nominal 95% target. In contrast, TCP’s adaptive mechanism actively works to achieve the correct coverage level, successfully navigating the coverage-sharpness tradeoff. Visualizations during the 2020 market crash confirm TCP’s superior adaptive response, and a comprehensive sensitivity analysis demonstrates the framework’s robustness to hyperparameter choices. Overall, TCP offers a practical and theoretically-grounded solution to the central challenge of calibrated uncertainty quantification for time series under distribution shift, advancing the interface between statistical inference and machine learning.
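The online calibration layer can be illustrated with a plain Robbins-Monro quantile tracker. In this sketch the threshold rises after a miss and decays slowly otherwise, so the long-run miss rate is driven toward `alpha`. It is a simplified stand-in: the paper describes a modified scheme, and the function name, learning rate, and initial threshold here are assumptions.

```python
import numpy as np

def online_conformal(scores, alpha=0.05, lr=0.05, q0=1.0):
    """Track a (1 - alpha) threshold for a stream of nonconformity scores.

    scores: e.g. absolute forecast errors |y_t - yhat_t|, in time order.
    Returns the threshold used at each step and a 0/1 coverage indicator.
    """
    q = q0
    thresholds, covered = [], []
    for s in scores:
        thresholds.append(q)
        miss = float(s > q)          # 1 if the interval failed to cover
        covered.append(1.0 - miss)
        q += lr * (miss - alpha)     # Robbins-Monro step toward target rate
    return np.array(thresholds), np.array(covered)
```

Because the update only looks at realized misses, the threshold widens on its own when volatility jumps, with no parametric model of the shift.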
Article 713
Title@2025-07-21 (1): Structural DID with ML: Theory, Simulation, and a Roadmap for Applied Research
Title: Structural DID with ML: Theory, Simulation, and a Roadmap for Applied Research | Strukturelle DID mit ML: Theorie, Simulation und Fahrplan für angewandte Forschung | ML结构:理论、模拟和应用研究路线图 2507.15899v1 |
Authors (3): Yile Yu, Anzhi Xu, Yi Wang
Causal inference in observational panel data has become a central concern in economics, policy analysis, and the broader social sciences. To address the core contradiction where traditional difference-in-differences (DID) struggles with high-dimensional confounding variables in observational panel data, while machine learning (ML) lacks causal structure interpretability, this paper proposes an innovative framework called S-DIDML that integrates structural identification with high-dimensional estimation. Building upon the structure of traditional DID methods, S-DIDML employs structured residual orthogonalization techniques (Neyman orthogonality + cross-fitting) to retain the group-time treatment effect (ATT) identification structure while resolving high-dimensional covariate interference issues. It designs a dynamic heterogeneity estimation module combining causal forests and semi-parametric models to capture spatiotemporal heterogeneity effects. The framework establishes a complete modular application process with standardized Stata implementation paths. The introduction of S-DIDML enriches methodological research on DID and DDML innovations, shifting causal inference from method stacking to architecture integration. This advancement enables social sciences to precisely identify policy-sensitive groups and optimize resource allocation. The framework provides replicable evaluation tools, decision optimization references, and methodological paradigms for complex intervention scenarios such as digital transformation policies and environmental regulations.
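Residual orthogonalization with cross-fitting can be sketched for the simplest case, a partially linear model. Both the outcome and the treatment are residualized on the covariates using models fitted on held-out folds, and the effect is the slope between the two residuals. This is a generic double machine learning illustration, not S-DIDML: ordinary least squares stands in for the ML nuisance learners, there is no DID group-time structure, and the function names are assumptions.

```python
import numpy as np

def cross_fit_residuals(X, target, n_folds=2, seed=0):
    """Out-of-fold residuals of target regressed on X (OLS as placeholder learner)."""
    n = len(target)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    Xb = np.column_stack([np.ones(n), X])      # add intercept
    resid = np.empty(n)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        coef, *_ = np.linalg.lstsq(Xb[train], target[train], rcond=None)
        resid[test] = target[test] - Xb[test] @ coef
    return resid

def orthogonal_effect(y, d, X):
    """Partialling-out estimate of the effect of d on y given covariates X."""
    ry = cross_fit_residuals(X, y)
    rd = cross_fit_residuals(X, d)
    return float(rd @ ry / (rd @ rd))
```

Fitting nuisance models on one fold and predicting on another keeps overfitting in those models from leaking into the effect estimate.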
Article 714
Title@2025-07-21 (1): Solving Formal Math Problems by Decomposition and Iterative Reflection
Title: Solving Formal Math Problems by Decomposition and Iterative Reflection | Formale Math-Probleme durch Zersetzung und iterative Reflexion lösen | 通过分解和迭代反射解决正规数学问题 2507.15225v1 |
Authors (17): Yichi Zhou, Jianqiu Zhao, Yongxin Zhang, Bohan Wang, Siran Wang, Luoxin Chen, Jiahui Wang, Haowei Chen, Allan Jie, Xinbo Zhang, Haocheng Wang, Luong Trung, Rong Ye, Phan Nhat Hoang, Huishuai Zhang, Peng Sun, Hang Li
General-purpose Large Language Models (LLMs) have achieved remarkable success in intelligence, performing comparably to human experts on complex reasoning tasks such as coding and mathematical reasoning. However, generating formal proofs in specialized languages like Lean 4 remains a significant challenge for these models, limiting their application in complex theorem proving and automated verification. Current approaches typically require specializing models through fine-tuning on dedicated formal corpora, incurring high costs for data collection and training. In this work, we introduce Delta Prover, an agent-based framework that orchestrates the interaction between a general-purpose LLM and the Lean 4 proof environment. Delta Prover leverages the reflection and reasoning capabilities of general-purpose LLMs to interactively construct formal proofs in Lean 4, circumventing the need for model specialization. At its core, the agent integrates two novel, interdependent components: an algorithmic framework for reflective decomposition and iterative proof repair, and a custom Domain-Specific Language (DSL) built upon Lean 4 for streamlined subproblem management. Delta Prover achieves a state-of-the-art 95.9% success rate on the miniF2F-test benchmark, surpassing all existing approaches, including those requiring model specialization. Furthermore, Delta Prover exhibits a significantly stronger test-time scaling law compared to standard Best-of-N proof strategies. Crucially, our findings demonstrate that general-purpose LLMs, when guided by an effective agentic structure, possess substantial untapped theorem-proving capabilities. This presents a computationally efficient alternative to specialized models for robust automated reasoning in formal environments.
Article 715
Title@2025-07-21 (1): Return Capping: Sample-Efficient CVaR Policy Gradient Optimisation
Title: Return Capping: Sample-Efficient CVaR Policy Gradient Optimisation | Return Capping: Sample-Efficient CVaR Policy Gradient Optimierung | 返回标记: 样本有效 CVaR 政策级政策优化优化 2504.20887v2 |
Authors (4): Harry Mead, Clarissa Costen, Bruno Lacerda, Nick Hawes
When optimising for conditional value at risk (CVaR) using policy gradients (PG), current methods rely on discarding a large proportion of trajectories, resulting in poor sample efficiency. We propose a reformulation of the CVaR optimisation problem by capping the total return of trajectories used in training, rather than simply discarding them, and show that this is equivalent to the original problem if the cap is set appropriately. We show, with empirical results in a number of environments, that this reformulation of the problem results in consistently improved performance compared to baselines. We have made all our code available here: https://github.com/HarryMJMead/cvar-return-capping.
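The equivalence between capping and discarding can be checked numerically on a sample of returns. The sketch below relies on the standard Rockafellar-Uryasev identity: when the cap equals the alpha-quantile (value at risk), the mean of capped returns, rescaled and shifted, equals the mean of the worst alpha fraction. Whether this matches the paper's exact formulation is an assumption, as are the function names; `alpha * len(returns)` is assumed to be a positive integer.

```python
import numpy as np

def cvar_lower_tail(returns, alpha):
    """Mean of the worst alpha fraction of returns (discarding the rest)."""
    ordered = np.sort(np.asarray(returns, dtype=np.float64))
    k = int(round(alpha * len(ordered)))
    return ordered[:k].mean()

def capped_objective(returns, alpha, cap):
    """Objective built from all returns, each capped at `cap`."""
    returns = np.asarray(returns, dtype=np.float64)
    capped = np.minimum(returns, cap)
    return capped.mean() / alpha - cap * (1 - alpha) / alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R = rng.normal(size=1000)
    alpha = 0.1
    var_cap = np.sort(R)[int(round(alpha * len(R))) - 1]  # empirical alpha-quantile
    print(cvar_lower_tail(R, alpha), capped_objective(R, alpha, var_cap))
```

The capped form uses every trajectory, which is the source of the sample-efficiency gain the abstract describes; any other cap gives a value no larger than the CVaR.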
Article 716
Title@2025-07-21 (1): Misspecifying non-compensatory as compensatory IRT: analysis of estimated skills and variance
Title: Misspecifying non-compensatory as compensatory IRT: analysis of estimated skills and variance | Unbestimmtes Nicht-Kompensatorisches als kompensatorisches IRT: Analyse der geschätzten Fähigkeiten und Varianz | 排除作为补偿性IRT的非补偿性补偿:估计技能和差异分析 2507.15222v1 |
Authors (3): Hiroshi Tamano, Hideitsu Hino, Daichi Mochihashi
Multidimensional item response theory is a statistical test theory used to estimate the latent skills of learners and the difficulty levels of problems based on test results. Both compensatory and non-compensatory models have been proposed in the literature. Previous studies have revealed the substantial underestimation of higher skills when the non-compensatory model is misspecified as the compensatory model. However, the underlying mechanism behind this phenomenon has not been fully elucidated. It remains unclear whether overestimation also occurs and whether issues arise regarding the variance of the estimated parameters. In this paper, we aim to provide a comprehensive understanding of both underestimation and overestimation through a theoretical approach. In addition to the previously identified underestimation of the skills, we newly discover that the overestimation of skills occurs around the origin. Furthermore, we investigate the extent to which the asymptotic variance of the estimated parameters differs when considering model misspecification compared to when it is not taken into account.
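The difference between the two model families can be seen from their response functions. The sketch below uses common textbook forms: the compensatory model applies one logistic function to a weighted sum of skills, while the non-compensatory model multiplies one logistic term per skill. The exact parameterization in the paper may differ, and the function names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def compensatory(theta, a, b):
    """P(correct) = sigmoid(a . theta - b): a high skill can offset a low one."""
    return float(sigmoid(np.dot(a, theta) - b))

def non_compensatory(theta, a, b):
    """P(correct) = prod_k sigmoid(a_k * (theta_k - b_k)): every skill is required."""
    theta, a, b = (np.asarray(v, dtype=np.float64) for v in (theta, a, b))
    return float(np.prod(sigmoid(a * (theta - b))))
```

For a learner with skills (3, -3) and unit discriminations, the compensatory model gives a success probability of 0.5, while the non-compensatory model gives under 0.05, because the weak skill cannot be made up for by the strong one.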
Article 717
Title@2025-07-21 (1): Detecting PTSD in Clinical Interviews: A Comparative Analysis of NLP Methods and Large Language Models
Title: Detecting PTSD in Clinical Interviews: A Comparative Analysis of NLP Methods and Large Language Models | PTSD in klinischen Interviews erkennen: Eine vergleichende Analyse von NLP-Methoden und großen Sprachmodellen | 临床访谈中检测创伤后创伤后精神紧张症:国家语言规划方法和大语言模式的比较分析 2504.01216v2 |
Authors (5): Feng Chen, Dror Ben-Zeev, Gillian Sparks, Arya Kadakia, Trevor Cohen
Post-Traumatic Stress Disorder (PTSD) remains underdiagnosed in clinical settings, presenting opportunities for automated detection to identify patients. This study evaluates natural language processing approaches for detecting PTSD from clinical interview transcripts. We compared general and mental health-specific transformer models (BERT/RoBERTa), embedding-based methods (SentenceBERT/LLaMA), and large language model prompting strategies (zero-shot/few-shot/chain-of-thought) using the DAIC-WOZ dataset. Domain-specific end-to-end models significantly outperformed general models (Mental-RoBERTa AUPRC=0.675+/-0.084 vs. RoBERTa-base 0.599+/-0.145). SentenceBERT embeddings with neural networks achieved the highest overall performance (AUPRC=0.758+/-0.128). Few-shot prompting using DSM-5 criteria yielded competitive results with two examples (AUPRC=0.737). Performance varied significantly across symptom severity and comorbidity status with depression, with higher accuracy for severe PTSD cases and patients with comorbid depression. Our findings highlight the potential of domain-adapted embeddings and LLMs for scalable screening while underscoring the need for improved detection of nuanced presentations and offering insights for developing clinically viable AI tools for PTSD assessment.
Article 718
Title@2025-07-21 (1): Benchmarking Mobile Device Control Agents across Diverse Configurations
Title: Benchmarking Mobile Device Control Agents across Diverse Configurations | Benchmarking Mobile Device Control Agents über verschiedene Konfigurationen hinweg | 制定跨不同配置的移动设备控制工具基准 2404.16660v3 |
Authors (7): Juyong Lee, Taywon Min, Minyong An, Dongyoon Hahm, Haeone Lee, Changyeon Kim, Kimin Lee
Mobile device control agents can largely enhance user interactions and productivity by automating daily tasks. However, despite growing interest in developing practical agents, the absence of a commonly adopted benchmark in this area makes it challenging to quantify scientific progress. In this work, we introduce B-MoCA: a novel benchmark with interactive environments for evaluating and developing mobile device control agents. To create a realistic benchmark, we develop B-MoCA based on the Android operating system and define 131 common daily tasks. Importantly, we incorporate a randomization feature that changes the configurations of mobile devices, including user interface layouts and language settings, to assess generalization performance. We benchmark diverse agents, including agents employing large language models (LLMs) or multi-modal LLMs as well as agents trained with imitation learning using human expert demonstrations. While these agents demonstrate proficiency in executing straightforward tasks, their poor performance on complex tasks highlights significant opportunities for future research to improve effectiveness. Our source code is publicly available at https://b-moca.github.io.
Article 719
Title@2025-07-21 (1): A Large Language Model-Enhanced Q-learning for Capacitated Vehicle Routing Problem with Time Windows
Title: A Large Language Model-Enhanced Q-learning for Capacitated Vehicle Routing Problem with Time Windows | Ein großes Sprachmodell-erweitertes Q-Lernen für kapazitierte Fahrzeugrouting-Probleme mit Zeitfenstern | 用时间窗口解决机动车辆停放问题大型语文强化快速学习模型 2505.06178v2 |
Authors (3): Linjiang Cao, Maonan Wang, Xi Xiong
The Capacitated Vehicle Routing Problem with Time Windows (CVRPTW) is a classic NP-hard combinatorial optimization problem widely applied in logistics distribution and transportation management. Its complexity stems from the constraints of vehicle capacity and time windows, which pose significant challenges to traditional approaches. Advances in Large Language Models (LLMs) provide new possibilities for finding approximate solutions to CVRPTW. This paper proposes a novel LLM-enhanced Q-learning framework to address the CVRPTW with real-time emergency constraints. Our solution introduces an adaptive two-phase training mechanism that transitions from the LLM-guided exploration phase to the autonomous optimization phase of the Q-network. To ensure reliability, we design a three-tier self-correction mechanism based on the Chain-of-Thought (CoT) for LLMs: syntactic validation, semantic verification, and physical constraint enforcement. In addition, we prioritize the replay of experience generated by LLMs to amplify the regulatory role of LLMs in the architecture. Experimental results demonstrate that our framework achieves a 7.3% average reduction in cost compared to traditional Q-learning, with fewer training steps required for convergence.
Article 720
Title@2025-07-21 (1): Federated Continual Instruction Tuning
Title: Federated Continual Instruction Tuning | Föderated Continual Instruction Tuning | 联邦连续教学 2503.12897v2 |
Authors (8): Haiyang Guo, Fanhu Zeng, Fei Zhu, Wenzhuo Liu, Da-Han Wang, Jian Xu, Xu-Yao Zhang, Cheng-Lin Liu
A vast amount of instruction tuning data is crucial for the impressive performance of Large Multimodal Models (LMMs), but the associated computational costs and data collection demands during supervised fine-tuning make it impractical for most researchers. Federated learning (FL) has the potential to leverage all distributed data and training resources to reduce the overhead of joint training. However, most existing methods assume a fixed number of tasks, while in real-world scenarios, clients continuously encounter new knowledge and often struggle to retain old tasks due to memory constraints. In this work, we introduce the Federated Continual Instruction Tuning (FCIT) benchmark to model this real-world challenge. Our benchmark includes two realistic scenarios, encompassing four different settings and twelve carefully curated instruction tuning datasets. To address the challenges posed by FCIT, we propose dynamic knowledge organization to effectively integrate updates from different tasks during training and subspace selective activation to allocate task-specific output during inference. Extensive experimental results demonstrate that our proposed method significantly enhances model performance across varying levels of data heterogeneity and catastrophic forgetting. Code and dataset are released at https://github.com/Ghy0501/FCIT.
Article 721
Title@2025-07-21 (1): RetroDiff: Retrosynthesis as Multi-stage Distribution Interpolation
Title: RetroDiff: Retrosynthesis as Multi-stage Distribution Interpolation | RetroDiff: Retrosynthese als mehrstufige Verteilungsinterpolation | RetroDiff: 作为多阶段分销的回溯合成 2311.14077v2 |
Authors (7): Yiming Wang, Yuxuan Song, Yiqun Wang, Minkai Xu, Rui Wang, Hao Zhou, Wei-Ying Ma
Retrosynthesis poses a key challenge in biopharmaceuticals, aiding chemists in finding appropriate reactant molecules for given product molecules. With reactants and products represented as 2D graphs, retrosynthesis constitutes a conditional graph-to-graph (G2G) generative task. Inspired by advancements in discrete diffusion models for graph generation, we aim to design a diffusion-based method to address this problem. However, integrating a diffusion-based G2G framework while retaining essential chemical reaction template information presents a notable challenge. Our key innovation involves a multi-stage diffusion process. We decompose the retrosynthesis procedure to first sample external groups from the dummy distribution given products, then generate external bonds to connect products and generated groups. Interestingly, this generation process mirrors the reverse of the widely adopted semi-template retrosynthesis workflow, i.e., from reaction center identification to synthon completion. Based on these designs, we introduce Retrosynthesis Diffusion (RetroDiff), a novel diffusion-based method for the retrosynthesis task. Experimental results demonstrate that RetroDiff surpasses all semi-template methods in accuracy, and outperforms template-based and template-free methods in large-scale scenarios and molecular validity, respectively. Code: https://github.com/Alsace08/RetroDiff.
Article 722
Title@2025-07-21 (1): Joint-Local Grounded Action Transformation for Sim-to-Real Transfer in Multi-Agent Traffic Control
Title: Joint-Local Grounded Action Transformation for Sim-to-Real Transfer in Multi-Agent Traffic Control | Gemeinsam-Lokale Erdungstransformation für Sim-to-Real-Transfer in Multi-Agent Traffic Control | 在多机构交通管制中进行即时到实物转移的联合-当地行动转变 2507.15174v1 |
Authors (7): Justin Turnau, Longchao Da, Khoa Vo, Ferdous Al Rafi, Shreyas Bachiraju, Tiejin Chen, Hua Wei
Traffic Signal Control (TSC) is essential for managing urban traffic flow and reducing congestion. Reinforcement Learning (RL) offers an adaptive method for TSC by responding to dynamic traffic patterns, with multi-agent RL (MARL) gaining traction as intersections naturally function as coordinated agents. However, due to shifts in environmental dynamics, implementing MARL-based TSC policies in the real world often leads to a significant performance drop, known as the sim-to-real gap. Grounded Action Transformation (GAT) has successfully mitigated this gap in single-agent RL for TSC, but real-world traffic networks, which involve numerous interacting intersections, are better suited to a MARL framework. In this work, we introduce JL-GAT, an application of GAT to MARL-based TSC that balances scalability with enhanced grounding capability by incorporating information from neighboring agents. JL-GAT adopts a decentralized approach to GAT, allowing for the scalability often required in real-world traffic networks while still capturing key interactions between agents. Comprehensive experiments on various road networks under simulated adverse weather conditions, along with ablation studies, demonstrate the effectiveness of JL-GAT. The code is publicly available at https://github.com/DaRL-LibSignal/JL-GAT/.
Article 723
Title@2025-07-21 (1): Better Models and Algorithms for Learning Ising Models from Dynamics
Title: Better Models and Algorithms for Learning Ising Models from Dynamics | Bessere Modelle und Algorithmen zum Lernen von Modellen aus der Dynamik | 从动态中学习型号模型的更好的模型和算法 2507.15173v1 |
Authors (3): Jason Gaitonde, Ankur Moitra, Elchanan Mossel
We study the problem of learning the structure and parameters of the Ising model, a fundamental model of high-dimensional data, when observing the evolution of an associated Markov chain. A recent line of work has studied the natural problem of learning when observing an evolution of the well-known Glauber dynamics [Bresler, Gamarnik, Shah, IEEE Trans. Inf. Theory 2018, Gaitonde, Mossel STOC 2024], which provides an arguably more realistic generative model than the classical i.i.d. setting. However, this prior work crucially assumes that all site update attempts are observed, \emph{even when this attempt does not change the configuration}: this strong observation model is seemingly essential for these approaches. While perhaps possible in restrictive contexts, this precludes applicability to most realistic settings where we can observe \emph{only} the stochastic evolution itself, a minimal and natural assumption for any process we might hope to learn from. However, designing algorithms that succeed in this more realistic setting has remained an open problem [Bresler, Gamarnik, Shah, IEEE Trans. Inf. Theory 2018, Gaitonde, Moitra, Mossel, STOC 2025]. In this work, we give the first algorithms that efficiently learn the Ising model in this much more natural observation model that only observes when the configuration changes. For Ising models with maximum degree $d$, our algorithm recovers the underlying dependency graph in time $\mathsf{poly}(d)\cdot n^2\log n$ and then the actual parameters in additional $\widetilde{O}(2^d n)$ time, which qualitatively matches the state-of-the-art even in the i.i.d. setting in a much weaker observation model. Our analysis holds more generally for a broader class of reversible, single-site Markov chains that also includes the popular Metropolis chain by leveraging more robust properties of reversible Markov chains.
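The weaker observation model described in this abstract can be made concrete with a short simulation. This is a minimal sketch under stated assumptions: a zero-field Ising model with a symmetric coupling matrix `J`, uniform site selection, and a hypothetical function name; it illustrates only the data-generating process, not the learning algorithm of the paper.

```python
import math
import random


def glauber_changes(J, steps, seed=0):
    """Run Glauber dynamics on an Ising model with symmetric couplings J
    (no external field) and return only the updates that changed the state."""
    rng = random.Random(seed)
    n = len(J)
    x = [rng.choice((-1, 1)) for _ in range(n)]
    observed = []  # (step, site, new_spin), recorded for actual changes only
    for t in range(steps):
        i = rng.randrange(n)  # site chosen uniformly at random
        field = sum(J[i][j] * x[j] for j in range(n) if j != i)
        p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))  # P(x_i = +1 | rest)
        new = 1 if rng.random() < p_plus else -1
        if new != x[i]:  # update attempts that leave x unchanged stay hidden
            x[i] = new
            observed.append((t, i, new))
    return observed
```

The stronger model assumed by prior work would correspond to logging every `(t, i, new)` triple, including the attempts where `new == x[i]`.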
Article 724
Title@2025-07-21 (1): ReDi: Rectified Discrete Flow
Title: ReDi: Rectified Discrete Flow | ReDi: Rektifizierter diskreter Fluss | Redi: 纠正的分异流 2507.15897v1 |
Authors (3): Jaehoon Yoo, Wonjung Kim, Seunghoon Hong
Discrete Flow-based Models (DFMs) are powerful generative models for high-quality discrete data but typically suffer from slow sampling speeds due to their reliance on iterative decoding processes. This reliance on a multi-step process originates from the factorization approximation of DFMs, which is necessary for handling high-dimensional data. In this paper, we rigorously characterize the approximation error from factorization using Conditional Total Correlation (TC), which depends on the coupling. To reduce the Conditional TC and enable efficient few-step generation, we propose Rectified Discrete Flow (ReDi), a novel iterative method that reduces factorization error by rectifying the coupling between source and target distributions. We theoretically prove that each ReDi step guarantees a monotonically decreasing Conditional TC, ensuring its convergence. Empirically, ReDi significantly reduces Conditional TC and enables few-step generation. Moreover, we demonstrate that the rectified couplings are well-suited for training efficient one-step models on image generation. ReDi offers a simple and theoretically grounded approach for tackling the few-step challenge, providing a new perspective on efficient discrete data synthesis. Code is available at https://github.com/Ugness/ReDi_discrete
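To give a feel for the quantity this abstract measures, the sketch below computes the plain (unconditional) total correlation of a small discrete joint distribution, TC = sum_i H(X_i) - H(X_1, ..., X_d). It is an illustration only: the paper works with a conditional variant that depends on the coupling, and the function names here are hypothetical.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy (in nats) of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def total_correlation(joint):
    """TC = sum_i H(X_i) - H(X_1, ..., X_d) for a joint over outcome tuples."""
    d = len(next(iter(joint)))
    marginals = [defaultdict(float) for _ in range(d)]
    for outcome, p in joint.items():
        for i, value in enumerate(outcome):
            marginals[i][value] += p
    return sum(entropy(m) for m in marginals) - entropy(joint)


# A fully factorized joint has zero TC; a perfectly correlated one does not.
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
coupled = {(0, 0): 0.5, (1, 1): 0.5}
```

A TC of zero means sampling each dimension independently loses nothing, which is why a lower value is associated with a smaller factorization error.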
Article 725
Title@2025-07-21 (1): LaViPlan : Language-Guided Visual Path Planning with RLVR
Title: LaViPlan : Language-Guided Visual Path Planning with RLVR | LaViPlan : Sprachgeführte visuelle Pfadplanung mit RLVR | Laviplan: RLVR 语言引导视觉路径规划 2507.12911v2 |
Authors (1): Hayeon Oh
Out-of-distribution (OOD) scenarios in autonomous driving refer to situations that deviate from the training domain, often leading to unexpected and potentially hazardous behavior from planners that lack prior exposure to such cases. Recently, Vision-Language Models (VLMs) have been introduced into autonomous driving research for their promising generalization capabilities in OOD settings. Early studies demonstrated that VLMs could recognize OOD scenarios and generate user-level decisions such as “go straight” or “turn right.” However, a new challenge has emerged due to the misalignment between the VLM’s high-level decisions or visual reasoning expressed in language, and the low-level predicted trajectories interpreted as actions. In this paper, we propose LaViPlan, a framework that leverages Reinforcement Learning with Verifiable Rewards (RLVR) to optimize VLMs using planning-oriented metrics. This approach addresses the vision-language-action misalignment observed in existing VLMs fine-tuned via supervised learning, which can recognize driving scenarios but often produce context-unaware decisions. Experimental results demonstrate that our method improves situational awareness and decision-making under OOD conditions, highlighting its potential to mitigate the misalignment issue. This work introduces a promising post-training paradigm for VLM agents in the context of autonomous driving.
Article 726
Title@2025-07-20 (7): Designing User-Centric Metrics for Evaluation of Counterfactual Explanations
Title: Designing User-Centric Metrics for Evaluation of Counterfactual Explanations | Designing User-Centric Metrics für die Auswertung von gegenfaktischen Erklärungen | 设计用于评价反事实解释的用户中心计量器 2507.15162v1 |
Authors (5): Firdaus Ahmed Choudhury, Ethan Leicht, Jude Ethan Bislig, Hangzhi Guo, Amulya Yadav
Machine learning-based decision models are increasingly being used to make decisions that significantly impact people’s lives, but their opaque nature leaves end users without a clear understanding of why a decision was made. Counterfactual Explanations (CFEs) have grown in popularity as a means of offering actionable guidance by identifying the minimum changes in feature values required to flip a model’s prediction to something more desirable. Unfortunately, most prior research in CFEs relies on artificial evaluation metrics, such as proximity, which may overlook end-user preferences and constraints, e.g., the user’s perception of effort needed to make certain feature changes may differ from that of the model designer. To address this research gap, this paper makes three novel contributions. First, we conduct a pilot study with 20 crowd-workers on Amazon MTurk to experimentally validate the alignment of existing CF evaluation metrics with real-world user preferences. Results show that user-preferred CFEs matched those based on proximity in only 63.81% of cases, highlighting the limited applicability of these metrics in real-world settings. Second, inspired by the need to design a user-informed evaluation metric for CFEs, we conduct a more detailed two-day user study with 41 participants facing realistic credit application scenarios to find experimental support for or against three intuitive hypotheses that may explain how end users evaluate CFEs. Third, based on the findings of this second study, we propose the AWP model, a novel user-centric, two-stage model that describes one possible mechanism by which users evaluate and select CFEs. Our results show that AWP predicts user-preferred CFEs with 84.37% accuracy. Our study provides the first human-centered validation for personalized cost models in CFE generation and highlights the need for adaptive, user-centered evaluation metrics.
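The gap between a proximity metric and user preference can be shown with a toy example. This is a sketch under assumptions: the two features, the candidate counterfactuals, and the per-feature effort weights are invented for illustration, and the weighted cost is a generic stand-in, not the AWP model proposed in the paper.

```python
def l1_proximity(x, cf):
    """Proximity metric: total absolute change across features."""
    return sum(abs(a - b) for a, b in zip(x, cf))


def weighted_cost(x, cf, effort):
    """User-specific cost: each feature change is scaled by perceived effort."""
    return sum(w * abs(a - b) for a, b, w in zip(x, cf, effort))


# Hypothetical applicant: [income in k$, number of open credit lines].
applicant = [30.0, 2.0]
candidates = [[33.0, 2.0], [30.0, 6.0]]
effort = [5.0, 1.0]  # assumed: this user finds raising income much harder

closest = min(candidates, key=lambda cf: l1_proximity(applicant, cf))
preferred = min(candidates, key=lambda cf: weighted_cost(applicant, cf, effort))
```

Here proximity selects the income change (distance 3 versus 4), while the effort-weighted cost selects the credit-line change (cost 4 versus 15), the kind of mismatch the pilot study reports.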
Article 727
Title@2025-07-20 (7): Resonant-Tunnelling Diode Reservoir Computing System for Image Recognition
Title: Resonant-Tunnelling Diode Reservoir Computing System for Image Recognition | Resonant-Tunnelling Diode Reservoir Computing System für die Bilderkennung | 图像识别共振二氧化二氮储量计算系统 2507.15158v1 |
Authors (3): A. H. Abbas, Hend Abdel-Ghani, Ivan S. Maksymov
As artificial intelligence continues to push into real-time, edge-based and resource-constrained environments, there is an urgent need for novel, hardware-efficient computational models. In this study, we present and validate a neuromorphic computing architecture based on resonant-tunnelling diodes (RTDs), which exhibit the nonlinear characteristics ideal for physical reservoir computing (RC). We theoretically formulate and numerically implement an RTD-based RC system and demonstrate its effectiveness on two image recognition benchmarks: handwritten digit classification and object recognition using the Fruit 360 dataset. Our results show that this circuit-level architecture delivers promising performance while adhering to the principles of next-generation RC – eliminating random connectivity in favour of a deterministic nonlinear transformation of input signals.
Article 728
Title@2025-07-20 (7): CBAGAN-RRT: Convolutional Block Attention Generative Adversarial Network for Sampling-Based Path Planning
Title: CBAGAN-RRT: Convolutional Block Attention Generative Adversarial Network for Sampling-Based Path Planning | CBAGAN-RRT: Convolutional Block Attention Generatives Adversarial Network für die stichprobengestützte Pfadplanung | CBAGAN-RRT: 以抽样为基础的路径规划革命性阻力引引引反向网络 2305.10442v3 |
Authors (2): Abhinav Sagar, Sai Teja Gilukara
Sampling-based path planning algorithms play an important role in autonomous robotics. However, a common problem among these algorithms is that the initial path generated is not optimal, and the convergence is too slow for real-world applications. In this paper, we propose a novel image-based learning algorithm using a Convolutional Block Attention Generative Adversarial Network (CBAGAN-RRT) with a combination of spatial and channel attention and a novel loss function to design the heuristics, find a better path, and improve the convergence of the algorithm in terms of both time and speed. The probability distribution of the paths generated from our GAN model is used to guide the sampling process for the RRT algorithm. We demonstrate that our algorithm outperforms the previous state-of-the-art algorithms on both image generation quality metrics, such as IOU score, Dice score, and FID score, and path planning metrics, such as time cost and the number of nodes. Ablation studies show the effectiveness of various components in our network architecture. The advantages of our approach are that we can avoid complicated preprocessing in the state space, our model generalizes to complex environments, such as those containing turns and narrow passages, without loss of accuracy, and our model can be easily integrated with other sampling-based path planning algorithms.
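The idea of letting a predicted path distribution guide RRT sampling can be sketched as a biased cell sampler. This is an assumption-laden illustration: the heatmap is taken to be a grid of non-negative weights (for example a GAN output), and `sample_cell` and the `bias` parameter are hypothetical names, not part of the paper.

```python
import random


def sample_cell(heatmap, bias, rng):
    """Pick a grid cell (row, col): with probability `bias` draw from the
    heatmap weights, otherwise draw uniformly as plain RRT would."""
    cells = [(r, c) for r, row in enumerate(heatmap) for c in range(len(row))]
    if rng.random() < bias:
        weights = [heatmap[r][c] for r, c in cells]
        return rng.choices(cells, weights=weights, k=1)[0]
    return rng.choice(cells)


# Toy 2x2 heatmap in which only the bottom-right cell is promising.
heatmap = [[0.0, 0.0], [0.0, 1.0]]
rng = random.Random(0)
guided = [sample_cell(heatmap, 1.0, rng) for _ in range(50)]
```

Keeping `bias` below 1 retains some uniform samples, a common way to keep a guided sampler able to reach regions the heatmap missed.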
Article 729
Title@2025-07-20 (7): Constraint-aware Learning of Probabilistic Sequential Models for Multi-Label Classification
Title: Constraint-aware Learning of Probabilistic Sequential Models for Multi-Label Classification | Constraint-aware Learning of Probabilistic Sequential Models for Multi-Label Classification | 严格了解多标签分类概率序列模型 2507.15156v1 |
Authors (4): Mykhailo Buleshnyi, Anna Polova, Zsolt Zombori, Michael Benedikt
We investigate multi-label classification involving large sets of labels, where the output labels may be known to satisfy some logical constraints. We look at an architecture in which classifiers for individual labels are fed into an expressive sequential model, which produces a joint distribution. One of the potential advantages of such an expressive model is its ability to model correlations, as can arise from constraints. We empirically demonstrate the ability of the architecture both to exploit constraints in training and to enforce constraints at inference time.
Article 730
Title@2025-07-20 (7): Studying Classifier(-Free) Guidance From a Classifier-Centric Perspective
Title: Studying Classifier(-Free) Guidance From a Classifier-Centric Perspective | Studieren Klassifikator(-frei) Anleitung aus einer klassifikator-zentralen Perspektive | 从分类分类中心角度研究分类(无)指导 2503.10638v2 |
Authors (2): Xiaoming Zhao, Alexander G. Schwing
Classifier-free guidance has become a staple for conditional generation with denoising diffusion models. However, a comprehensive understanding of classifier-free guidance is still missing. In this work, we carry out an empirical study to provide a fresh perspective on classifier-free guidance. Concretely, instead of solely focusing on classifier-free guidance, we trace back to the root, i.e., classifier guidance, pinpoint the key assumption for the derivation, and conduct a systematic study to understand the role of the classifier. We find that both classifier guidance and classifier-free guidance achieve conditional generation by pushing the denoising diffusion trajectories away from decision boundaries, i.e., areas where conditional information is usually entangled and is hard to learn. Based on this classifier-centric understanding, we propose a generic postprocessing step built upon flow-matching to shrink the gap between the learned distribution for a pre-trained denoising diffusion model and the real data distribution, mainly around the decision boundaries. Experiments on various datasets verify the effectiveness of the proposed approach.
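For readers unfamiliar with the mechanism under study, classifier-free guidance is commonly written as an extrapolation from the unconditional noise prediction toward the conditional one. The sketch below shows that standard combination on plain Python lists; the function name and the toy vectors are assumptions, and it does not reproduce the postprocessing step proposed in the paper.

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    """Guided prediction: eps_uncond + scale * (eps_cond - eps_uncond).

    scale = 0 gives the unconditional prediction, scale = 1 the conditional
    one, and scale > 1 extrapolates past the conditional prediction.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy two-dimensional noise predictions.
eps_u = [0.0, 1.0]
eps_c = [1.0, 1.0]
guided = cfg_combine(eps_u, eps_c, 2.0)
```

With a scale of 2, the first coordinate moves twice as far from the unconditional prediction as the conditional model alone would move it, while the coordinate on which both predictions agree is unchanged.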
Article 731
Title@2025-07-20 (7): What Level of Automation is “Good Enough”? A Benchmark of Large Language Models for Meta-Analysis Data Extraction
Title: What Level of Automation is “Good Enough”? A Benchmark of Large Language Models for Meta-Analysis Data Extraction | Welche Stufe der Automatisierung ist “Gut genug”? Ein Benchmark für große Sprachmodelle für die Meta-Analyse-Datenextraktion | 自动化的等级是“好到好”? 元分析数据提取大语言模式的基准 2507.15152v1 |
Authors (3): Lingbo Li, Anuradha Mathrani, Teo Susnjak
Automating data extraction from full-text randomised controlled trials (RCTs) for meta-analysis remains a significant challenge. This study evaluates the practical performance of three LLMs (Gemini-2.0-flash, Grok-3, GPT-4o-mini) across tasks involving statistical results, risk-of-bias assessments, and study-level characteristics in three medical domains: hypertension, diabetes, and orthopaedics. We tested four distinct prompting strategies (basic prompting, self-reflective prompting, model ensemble, and customised prompts) to determine how to improve extraction quality. All models demonstrate high precision but consistently suffer from poor recall by omitting key information. We found that customised prompts were the most effective, boosting recall by up to 15%. Based on this analysis, we propose a three-tiered set of guidelines for using LLMs in data extraction, matching data types to appropriate levels of automation based on task complexity and risk. Our study offers practical advice for automating data extraction in real-world meta-analyses, balancing LLM efficiency with expert oversight through targeted, task-specific automation.
Article 732
Title@2025-07-20 (7): Design of an Edge-based Portable EHR System for Anemia Screening in Remote Health Applications
Title: Design of an Edge-based Portable EHR System for Anemia Screening in Remote Health Applications | Design eines Edge-basierten tragbaren EHR-Systems für die Anämie-Screening in Remote Health-Anwendungen | 设计一个以边缘为基础的远程保健应用中贫血筛查的便携EHR系统 2507.15146v1 |
Authors (5): Sebastian A. Cruz Romero, Misael J. Mercado Hernandez, Samir Y. Ali Rivera, Jorge A. Santiago Fernandez, Wilfredo E. Lugo Beauchamp
The design of medical systems for remote, resource-limited environments faces persistent challenges due to poor interoperability, lack of offline support, and dependency on costly infrastructure. Many existing digital health solutions neglect these constraints, limiting their effectiveness for frontline health workers in underserved regions. This paper presents a portable, edge-enabled Electronic Health Record platform optimized for offline-first operation, secure patient data management, and modular diagnostic integration. Running on small-form-factor embedded devices, it provides AES-256 encrypted local storage with optional cloud synchronization for interoperability. As a use case, we integrated a non-invasive anemia screening module leveraging fingernail pallor analysis. Trained on 250 patient cases (27% anemia prevalence) with KDE-balanced data, the Random Forest model achieved a test RMSE of 1.969 g/dL and MAE of 1.490 g/dL. A severity-based model reached 79.2% sensitivity. To optimize performance, a YOLOv8n-based nail bed detector was quantized to INT8, reducing inference latency from 46.96 ms to 21.50 ms while maintaining mAP@0.5 at 0.995. The system emphasizes low-cost deployment, modularity, and data privacy compliance (HIPAA/GDPR), addressing critical barriers to digital health adoption in disconnected settings. Our work demonstrates a scalable approach to enhance portable health information systems and support frontline healthcare in underserved regions.
Article 733
Title@2025-07-20 (7): Quantum Machine Learning for Secure Cooperative Multi-Layer Edge AI with Proportional Fairness
Title: Quantum Machine Learning for Secure Cooperative Multi-Layer Edge AI with Proportional Fairness | Quantum Machine Learning für sichere kooperative Multi-Layer Edge KI mit proportionaler Fairness | 以比例公平方式进行量子学习,确保多层合作和多层边缘安全合作 2507.15145v1 |
Authors (2): Thai T. Vu, John Le
This paper proposes a communication-efficient, event-triggered inference framework for cooperative edge AI systems comprising multiple user devices and edge servers. Building upon dual-threshold early-exit strategies for rare-event detection, the proposed approach extends classical single-device inference to a distributed, multi-device setting while incorporating proportional fairness constraints across users. A joint optimization framework is formulated to maximize classification utility under communication, energy, and fairness constraints. To solve the resulting problem efficiently, we exploit the monotonicity of the utility function with respect to the confidence thresholds and apply alternating optimization with Benders decomposition. Experimental results show that the proposed framework significantly enhances system-wide performance and fairness in resource allocation compared to single-device baselines.
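A dual-threshold early-exit rule of the kind this abstract builds on can be sketched as a three-way decision on a local confidence score. The thresholds, labels, and function name below are hypothetical, and the sketch omits the fairness-constrained optimization that the paper uses to choose the thresholds.

```python
def route_sample(confidence, low, high):
    """Decide locally when the on-device score is decisive, else offload.

    confidence: local model's estimated probability that the rare event occurred.
    low, high:  assumed thresholds with 0 <= low <= high <= 1.
    """
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low <= high <= 1")
    if confidence >= high:
        return "local_positive"   # confident detection, no transmission
    if confidence <= low:
        return "local_negative"   # confident rejection, no transmission
    return "offload"              # ambiguous: send to the edge server


scores = [0.02, 0.40, 0.97, 0.55, 0.08]
decisions = [route_sample(s, low=0.1, high=0.9) for s in scores]
offload_fraction = decisions.count("offload") / len(decisions)
```

Widening the gap between `low` and `high` sends more samples to the server, which is the communication side of the trade-off that the joint optimization balances against classification utility.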
Article 734
Title@2025-07-20 (7): Transforming Datasets to Requested Complexity with Projection-based Many-Objective Genetic Algorithm
Title: Transforming Datasets to Requested Complexity with Projection-based Many-Objective Genetic Algorithm | Transformation von Datensätzen auf geforderte Komplexität mit Projektions-basiertem Viel-Objektive-Genetischen Algorithmus | 将数据集转换为具有基于投影的多目标遗传算法的复杂度 2507.15132v1 |
Authors (1): Joanna Komorniczak
The research community continues to seek increasingly advanced synthetic data generators to reliably evaluate the strengths and limitations of machine learning methods. This work aims to increase the availability of datasets encompassing a diverse range of problem complexities by proposing a genetic algorithm that optimizes a set of problem complexity measures for classification and regression tasks towards specific targets. For classification, a set of 10 complexity measures was used, while for regression tasks, 4 measures demonstrating promising optimization capabilities were selected. Experiments confirmed that the proposed genetic algorithm can generate datasets with varying levels of difficulty by transforming synthetically created datasets to achieve target complexity values through linear feature projections. Evaluations involving state-of-the-art classifiers and regressors revealed a correlation between the complexity of the generated data and the recognition quality.
Article 735
Title@2025-07-20 (7): A Semantic-based Optimization Approach for Repairing LLMs: Case Study on Code Generation
Title: A Semantic-based Optimization Approach for Repairing LLMs: Case Study on Code Generation | Ein semantisch-basierter Optimierungsansatz zur Reparatur von LLMs: Fallstudie zur Codegenerierung | 修复LLMLM 的基于语义的优化优化方法:关于代码生成的案例研究 2503.12899v3 |
Authors (4): Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang
Language Models (LMs) are widely used in software engineering for code generation, but they may produce code with errors. Rather than repairing the generated code, an alternative way is to address the underlying failures of models. LM repair offers a lightweight solution to this challenge: it requires minimal data, reduces computational costs, and reduces the side effects. Unlike retraining, LM repair focuses on applying tailored updates to targeted neurons, making it ideal for scenarios with limited resources, high-performance demands, or strict safety requirements. In this paper, we propose Semantic Targeting for Analytical Repair (STAR), a pioneering and novel semantic-based optimization approach for repairing LLMs. STAR realizes the main operations of repairing LMs in an optimization process, including locating “buggy neurons”, solving “neuron patches”, and patching “buggy neurons”. Correspondingly, it computes the deltas of weight matrix as the prior information to guide optimization; and attributes the targeted layers and neurons leveraging statistical insights. The neuron patches are computed with a solid semantic-based analytical formula, which directly bridges the changes to logits with the deltas of neurons, by steering latent representations. Compared to the prior work of LM repair (MINT) and optimization methods (SGD), STAR integrates their strengths while mitigating their limitations. STAR supports solving multiple failures together, significantly improving the usefulness. Evaluated on coding tasks using popular code LMs, STAR exhibits superior effectiveness (10.5%-19.9% improvements) and efficiency (2.4-7.0 times speedup). In terms of side effects, namely the balance between generalization and specificity, STAR outperforms prior work by a significant margin. Additionally, we conducted assessments on the overfitting risk of LM repair as well as the cumulative impact.
Article 736
Title@2025-07-20 (7): Restrictions on Physical Stochastic Reservoir Computers
Title: Restrictions on Physical Stochastic Reservoir Computers | Beschränkungen der physikalischen stochastischen Speicherrechner | 限制物理储藏电脑 2307.14474v5 |
Authors (1): Anthony M. Polloreno
Reservoir computation is a recurrent framework for learning and predicting time series data that benefits from extremely simple training and interpretability, often as the dynamics of a physical system. In this paper, we study the impact of noise on the learning capabilities of analog reservoir computers. Recent work on reservoir computation has shown that the information processing capacity (IPC) is a useful metric for quantifying the degradation of the performance due to noise. We further this analysis and demonstrate that this degradation of the IPC limits the possible features that can be meaningfully constructed in an analog reservoir computing setting. We borrow a result from quantum complexity theory that relates the circuit model of computation to a continuous time model, and demonstrate an exponential reduction in the accessible volume of reservoir configurations. We conclude by relating this degradation in the IPC to the fat-shattering dimension of a family of functions describing the reservoir dynamics, which allows us to express our result in terms of a classification task. We conclude that any physical, analog reservoir computer that is exposed to noise can only be used to perform a polynomial amount of learning, despite the exponentially large latent space, even with an exponential amount of post-processing.
Article 737
Title@2025-07-20 (7): Are We Overlooking the Dimensions? Learning Latent Hierarchical Channel Structure for High-Dimensional Time Series Forecasting
Title: Are We Overlooking the Dimensions? Learning Latent Hierarchical Channel Structure for High-Dimensional Time Series Forecasting | Sind wir über die Dimensionen? Lernen Latent Hierarchical Channel Struktur für High-Dimensional Time Series Forecasting | 我们是不是忽略了维度?学习高级时代系列预测的 旧高阶通道结构 2507.15119v1 |
Authors (7): Juntong Ni, Shiyu Wang, Zewen Liu, Xiaoming Shi, Xinyue Zhong, Zhou Ye, Wei Jin
Time series forecasting (TSF) is a central problem in time series analysis. However, as the number of channels in time series datasets scales to the thousands or more, a scenario we define as High-Dimensional Time Series Forecasting (HDTSF), it introduces significant new modeling challenges that are often not the primary focus of traditional TSF research. HDTSF is challenging because the channel correlation often forms complex and hierarchical patterns. Existing TSF models either ignore these interactions or fail to scale as dimensionality grows. To address this issue, we propose U-Cast, a channel-dependent forecasting architecture that learns latent hierarchical channel structures with an innovative query-based attention. To disentangle highly correlated channel representation, U-Cast adds a full-rank regularization during training. We also release Time-HD, a benchmark of large, diverse, high-dimensional datasets. Our theory shows that exploiting cross-channel information lowers forecasting risk, and experiments on Time-HD demonstrate that U-Cast surpasses strong baselines in both accuracy and efficiency. Together, U-Cast and Time-HD provide a solid basis for future HDTSF research.
Article 738
Title@2025-07-20 (7): Graph Attention Networks for Detecting Epilepsy from EEG Signals Using Accessible Hardware in Low-Resource Settings
Title: Graph Attention Networks for Detecting Epilepsy from EEG Signals Using Accessible Hardware in Low-Resource Settings | Graph Aufmerksamkeit Netzwerke zur Erkennung von Epilepsie von EEG-Signalen mit zugänglicher Hardware in Low-Resource-Einstellungen | 低资源设置设置中使用无障碍硬件从EEG信号中检测出癫痫的图示关注网络 2507.15118v1 |
Authors (3): Szymon Mazurek, Stephen Moore, Alessandro Crimi
Goal: Epilepsy remains under-diagnosed in low-income countries due to scarce neurologists and costly diagnostic tools. We propose a graph-based deep learning framework to detect epilepsy from low-cost Electroencephalography (EEG) hardware, tested on recordings from Nigeria and Guinea-Bissau. Our focus is on fair, accessible automatic assessment and explainability to shed light on epilepsy biomarkers. Methods: We model EEG signals as spatio-temporal graphs, classify them, and identify interchannel relationships and temporal dynamics using graph attention networks (GAT). To emphasize connectivity biomarkers, we adapt the inherently node-focused GAT to analyze edges. We also designed signal preprocessing for low-fidelity recordings and a lightweight GAT architecture trained on Google Colab and deployed on RaspberryPi devices. Results: The approach achieves promising classification performance, outperforming a standard classifier based on random forest and graph convolutional networks in terms of accuracy and robustness over multiple sessions, while also highlighting specific connections in the fronto-temporal region. Conclusions: The results highlight the potential of GATs to provide insightful and scalable diagnostic support for epilepsy in underserved regions, paving the way for affordable and accessible neurodiagnostic tools.
Article 739
Title@2025-07-20 (7): Distributional Unlearning: Forgetting Distributions, Not Just Samples
Title: Distributional Unlearning: Forgetting Distributions, Not Just Samples | Verteilungsloses Lernen: Verteilungen vergessen, nicht nur Proben | 分发的不学习:忘记分发,而不仅仅是抽样 2507.15112v1 |
Authors (3): Youssef Allouah, Rachid Guerraoui, Sanmi Koyejo
Machine unlearning seeks to remove unwanted information from trained models, initially at the individual-sample level, but increasingly at the level of entire sub-populations. In many deployments, models must delete whole topical domains to satisfy privacy, legal, or quality requirements, e.g., removing several users’ posts under GDPR or copyrighted web content. Existing unlearning tools remain largely sample-oriented, and straightforward point deletion often leaves enough residual signal for downstream learners to recover the unwanted domain. We introduce distributional unlearning, a data-centric, model-agnostic framework that asks: Given examples from an unwanted distribution and a retained distribution, what is the smallest set of points whose removal makes the edited dataset far from the unwanted domain yet close to the retained one? Using Kullback-Leibler divergence to quantify removal and preservation, we derive the exact Pareto frontier in the Gaussian case and prove that any model retrained on the edited data incurs log-loss shifts bounded by the divergence thresholds. We propose a simple distance-based selection rule satisfying these constraints with a quadratic reduction in deletion budget compared to random removal. Experiments on synthetic Gaussians, Jigsaw Toxic Comments, SMS spam, and CIFAR-10 show 15-72% fewer deletions than random, with negligible impact on retained performance.
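Two ingredients mentioned in this abstract can be illustrated in one dimension: the closed-form Kullback-Leibler divergence between Gaussians, and a distance-based choice of which unwanted points to delete. The selection rule below (delete the unwanted points farthest from the retained mean) is an assumed, simplified stand-in for the rule in the paper, and all names are hypothetical.

```python
import math
from statistics import fmean


def kl_gauss(mu_p, sigma_p, mu_q, sigma_q):
    """KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)) for univariate Gaussians."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
            - 0.5)


def select_for_deletion(unwanted, retained, budget):
    """Assumed rule: delete the `budget` unwanted points that lie farthest
    from the mean of the retained data (the most distinctive ones)."""
    center = fmean(retained)
    ranked = sorted(unwanted, key=lambda x: abs(x - center), reverse=True)
    return ranked[:budget]


unwanted = [3.0, 5.0, 4.0, 9.0]
retained = [0.0, 1.0, -1.0]
to_delete = select_for_deletion(unwanted, retained, budget=2)
```

In this picture, the divergence of the edited data from the unwanted distribution measures removal, and its divergence from the retained distribution measures preservation.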
nan
Article 740
Title@2025-07-20 (7): LoopNet: A Multitasking Few-Shot Learning Approach for Loop Closure in Large Scale SLAM
Title: LoopNet: A Multitasking Few-Shot Learning Approach for Loop Closure in Large Scale SLAM | LoopNet: Ein multitasking weniger heißer Lernansatz für Loop Closure in Large Scale SLAM | 环网:大规模SLAMM环圈封闭的多任务、很少热的多学习方法 2507.15109v1 |
Authors (3): Mohammad-Maher Nakshbandi, Ziad Sharawy, Sorin Grigorescu
One of the main challenges in the Simultaneous Localization and Mapping (SLAM) loop closure problem is the recognition of previously visited places. In this work, we tackle the two main problems of real-time SLAM systems: 1) loop closure detection accuracy and 2) real-time computation constraints on the embedded hardware. Our LoopNet method is based on a multitasking variant of the classical ResNet architecture, adapted for online retraining on a dynamic visual dataset and optimized for embedded devices. The online retraining is designed using a few-shot learning approach. The architecture provides both an index into the queried visual dataset and a measurement of the prediction quality. Moreover, by leveraging DISK (DIStinctive Keypoints) descriptors, LoopNet surpasses the limitations of handcrafted features and traditional deep learning methods, offering better performance under varying conditions. Code is available at https://github.com/RovisLab/LoopNet. Additionally, we introduce a new loop closure benchmarking dataset, coined LoopDB, which is available at https://github.com/RovisLab/LoopDB.
nan
Article 741
Title@2025-07-20 (7): Beyond Sin-Squared Error: Linear-Time Entrywise Uncertainty Quantification for Streaming PCA
Title: Beyond Sin-Squared Error: Linear-Time Entrywise Uncertainty Quantification for Streaming PCA | Über Sin-Squared-Fehler hinaus: Linear-Time Entrywise Uncertainty Quantification for Streaming PCA | Sin-Squred 错误:流动五氯苯甲醚的线性时序入门不确定性的量化 2506.12655v2 |
Authors (3): Syamantak Kumar, Shourya Pandey, Purnamrita Sarkar
We propose a novel statistical inference framework for streaming principal component analysis (PCA) using Oja’s algorithm, enabling the construction of confidence intervals for individual entries of the estimated eigenvector. Most existing works on streaming PCA focus on providing sharp sin-squared error guarantees. Recently, there has been some interest in uncertainty quantification for the sin-squared error. However, uncertainty quantification or sharp error guarantees for entries of the estimated eigenvector in the streaming setting remains largely unexplored. We derive a sharp Bernstein-type concentration bound for elements of the estimated vector matching the optimal error rate up to logarithmic factors. We also establish a Central Limit Theorem for a suitably centered and scaled subset of the entries. To efficiently estimate the coordinate-wise variance, we introduce a provably consistent subsampling algorithm that leverages the median-of-means approach, empirically achieving similar accuracy to multiplier bootstrap methods while being significantly more computationally efficient. Numerical experiments demonstrate its effectiveness in providing reliable uncertainty estimates with a fraction of the computational cost of existing methods.
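The abstract above builds on Oja's algorithm for streaming PCA. The sketch below shows only the basic Oja update (a rank-one step followed by renormalization) on a hypothetical two-dimensional stream. The step size, seeds, and the helper `make_stream` are assumptions for illustration, and the paper's confidence intervals and subsampling variance estimator are not implemented.

```python
import math
import random


def oja_top_eigenvector(stream, dim, lr=0.01, seed=0):
    """Estimate the leading eigenvector of the data covariance in one pass."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for x in stream:
        # Oja update: w <- w + lr * (w . x) * x, then renormalize to unit length.
        proj = sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + lr * proj * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


def make_stream(n, seed=1):
    """Hypothetical data: variance 9 along axis 0 and 0.09 along axis 1."""
    rng = random.Random(seed)
    for _ in range(n):
        yield (3.0 * rng.gauss(0.0, 1.0), 0.3 * rng.gauss(0.0, 1.0))


w = oja_top_eigenvector(make_stream(5000), dim=2)
```

Each sample is touched once and only the current vector is stored, which is why the method suits the streaming setting; the estimate is defined only up to sign.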
nan
Article 742
Title@2025-07-20 (7): AnalogFed: Federated Discovery of Analog Circuit Topologies with Generative AI
Title: AnalogFed: Federated Discovery of Analog Circuit Topologies with Generative AI | AnalogFed: Federated Discovery of Analog Circuit Topologies with Generative AI | 模拟: 具有生成性人工智能的模拟电路地形的联邦发现 2507.15104v1 |
Authors (6): Qiufeng Li, Shu Hong, Jian Gao, Xuan Zhang, Tian Lan, Weidong Cao
Recent breakthroughs in AI/ML offer exciting opportunities to revolutionize analog design automation through data-driven approaches. In particular, researchers are increasingly fascinated by harnessing the power of generative AI to automate the discovery of novel analog circuit topologies. Unlocking the full potential of generative AI in these data-driven discoveries requires access to large and diverse datasets. Yet there is a significant barrier in the analog domain: analog circuit design is inherently proprietary, involving not only confidential circuit structures but also the underlying commercial semiconductor processes. As a result, current generative AI research is largely confined to individual researchers who construct small, narrowly focused private datasets. This fragmentation severely limits collaborative innovation and impedes progress across the research community. To address these challenges, we propose AnalogFed. AnalogFed enables collaborative topology discovery across decentralized clients (e.g., individual researchers or institutions) without requiring the sharing of raw private data. To make this vision practical, we introduce a suite of techniques tailored to the unique challenges of applying federated learning (FedL) in analog design, from generative model development and data heterogeneity handling to privacy-preserving strategies that ensure both flexibility and security for circuit designers and semiconductor manufacturers. Extensive experiments across varying client counts and dataset sizes demonstrate that AnalogFed achieves performance comparable to centralized baselines while maintaining strict data privacy. Specifically, the generative AI model within AnalogFed achieves state-of-the-art efficiency and scalability in the design of analog circuit topologies.
nan
Article 743
Title@2025-07-20 (7): PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation
Title: PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation | PhysioWave: Multi-Scale Wavelet-Transformer für Physiologische Signaldarstellung | PhysioWave: 生理信号代表的多阶段波盘转换器 2506.10351v2 |
Authors (6): Yanlong Chen, Mattia Orlandi, Pierangelo Maria Rapa, Simone Benatti, Luca Benini, Yawei Li
Physiological signals are often corrupted by motion artifacts, baseline drift, and other low-SNR disturbances, which pose significant challenges for analysis. Additionally, these signals exhibit strong non-stationarity, with sharp peaks and abrupt changes that evolve continuously, making them difficult to represent using traditional time-domain or filtering methods. To address these issues, a novel wavelet-based approach for physiological signal analysis is presented, aiming to capture multi-scale time-frequency features in various physiological signals. Leveraging this technique, two large-scale pretrained models specific to EMG and ECG are introduced for the first time, achieving superior performance and setting new baselines in downstream tasks. Additionally, a unified multi-modal framework is constructed by integrating a pretrained EEG model, where each modality is guided through its dedicated branch and fused via learnable weighted fusion. This design effectively addresses challenges such as low signal-to-noise ratio, high inter-subject variability, and device mismatch, outperforming existing methods on multi-modal tasks. The proposed wavelet-based architecture lays a solid foundation for analysis of diverse physiological signals, while the multi-modal design points to next-generation physiological signal processing with potential impact on wearable health monitoring, clinical diagnostics, and broader biomedical applications.
nan
Article 744
Title@2025-07-20 (7): Learning under Latent Group Sparsity via Diffusion on Networks
Title: Learning under Latent Group Sparsity via Diffusion on Networks | Lernen unter Latent Group Sparsity über Diffusion in Netzwerken | 通过网络传播在中端群体平等下学习 2507.15097v1 |
Authors (2): Subhroshekhar Ghosh, Soumendu Sundar Mukherjee
Group or cluster structure on explanatory variables in machine learning problems is a very general phenomenon, which has attracted broad interest from practitioners and theoreticians alike. In this work we contribute an approach to sparse learning under such group structure, that does not require prior information on the group identities. Our paradigm is motivated by the Laplacian geometry of an underlying network with a related community structure, and proceeds by directly incorporating this into a penalty that is effectively computed via a heat-flow-based local network dynamics. The proposed penalty interpolates between the lasso and the group lasso penalties, the runtime of the heat-flow dynamics being the interpolating parameter. As such it can automatically default to lasso when the group structure reflected in the Laplacian is weak. In fact, we demonstrate a data-driven procedure to construct such a network based on the available data. Notably, we dispense with computationally intensive pre-processing involving clustering of variables, spectral or otherwise. Our technique is underpinned by rigorous theorems that guarantee its effective performance and provide bounds on its sample complexity. In particular, in a wide range of settings, it provably suffices to run the diffusion for time that is only logarithmic in the problem dimensions. We explore in detail the interfaces of our approach with key statistical physics models in network science, such as the Gaussian Free Field and the Stochastic Block Model. Our work raises the possibility of applying similar diffusion-based techniques to classical learning tasks, exploiting the interplay between geometric, dynamical and stochastic structures underlying the data.
nan
Article 745
Title@2025-07-20 (7): Enhancing Lung Disease Diagnosis via Semi-Supervised Machine Learning
Title: Enhancing Lung Disease Diagnosis via Semi-Supervised Machine Learning | Verbesserung der Diagnose von Lungenerkrankungen durch semi-überwachtes maschinelles Lernen | 通过半监督机器学习加强肺病诊断 2507.16845v1 |
Authors (3): Xiaoran Xu, In-Ho Ra, Ravi Sankar
Lung diseases, including lung cancer and COPD, are significant health concerns globally. Traditional diagnostic methods can be costly, time-consuming, and invasive. This study investigates the use of semi-supervised learning methods for lung sound signal detection using a combined MFCC+CNN model. By introducing semi-supervised learning modules such as MixMatch, Co-Refinement, and Co-Refurbishing, we aim to enhance detection performance while reducing dependence on manual annotations. With the add-on semi-supervised modules, the accuracy of the MFCC+CNN model reaches 92.9%, an increase of 3.8% over the baseline model. The research contributes to the field of lung disease sound detection by addressing challenges such as individual differences and insufficient labeled data.
nan
Article 746
Title@2025-07-20 (7): ModelVerification.jl: a Comprehensive Toolbox for Formally Verifying Deep Neural Networks
Title: ModelVerification.jl: a Comprehensive Toolbox for Formally Verifying Deep Neural Networks | ModelVerification.jl: eine umfassende Toolbox zur formalen Überprüfung tiefer neuraler Netzwerke | 模型核查.jl:用于正式核查深神经网络的综合工具箱 2407.01639v2 |
Authors (7): Tianhao Wei, Hanjiang Hu, Luca Marzari, Kai S. Yun, Peizhi Niu, Xusheng Luo, Changliu Liu
Deep Neural Networks (DNN) are crucial in approximating nonlinear functions across diverse applications, ranging from image classification to control. Verifying specific input-output properties can be a highly challenging task due to the lack of a single, self-contained framework that allows a complete range of verification types. To this end, we present ModelVerification.jl (MV), the first comprehensive, cutting-edge toolbox that contains a suite of state-of-the-art methods for verifying different types of DNNs and safety specifications. This versatile toolbox is designed to empower developers and machine learning practitioners with robust tools for verifying and ensuring the trustworthiness of their DNN models.
nan
Article 747
Title@2025-07-20 (7): Simulation-Prior Independent Neural Unfolding Procedure
Title: Simulation-Prior Independent Neural Unfolding Procedure | Simulation-Prior Unabhängiges Neural-Entfaltungsverfahren | 模拟 - 模拟 - 模拟 - 原始 - 独立神经元集载程序 2507.15084v1 |
Authors (5): Anja Butter, Theo Heimel, Nathan Huetsch, Michael Kagan, Tilman Plehn
Machine learning allows unfolding high-dimensional spaces without binning at the LHC. The new SPINUP method extracts the unfolded distribution based on a neural network encoding the forward mapping, making it independent of the prior from the simulated training data. It is made efficient through neural importance sampling, and ensembling can be used to estimate the effect of information loss in the forward process. We showcase SPINUP for unfolding detector effects on jet substructure observables and for unfolding to parton level of associated Higgs and single-top production.
nan
Article 748
Title@2025-07-20 (7): Beyond Win Rates: A Clustering-Based Approach to Character Balance Analysis in Team-Based Games
Title: Beyond Win Rates: A Clustering-Based Approach to Character Balance Analysis in Team-Based Games | Beyond Win Rates: Ein Clustering-basierter Ansatz zur Charakter-Balance-Analyse in Team-Based Games | 超越赢率:在团队运动会中采用基于集群办法进行性平衡分析 2502.01250v2 |
Authors (1): Haokun Zhou
Character diversity in competitive games, while enriching gameplay, often introduces balance challenges that can negatively impact player experience and strategic depth. Traditional balance assessments rely on aggregate metrics like win rates and pick rates, which offer limited insight into the intricate dynamics of team-based games and nuanced character roles. This paper proposes a novel clustering-based methodology to analyze character balance, leveraging in-game data from Valorant to account for team composition influences and reveal latent character roles. By applying hierarchical agglomerative clustering with Jensen-Shannon Divergence to professional match data from the Valorant Champions Tour 2022, our approach identifies distinct clusters of agents exhibiting similar co-occurrence patterns within team compositions. This method not only complements existing quantitative metrics but also provides a more holistic and interpretable perspective on character synergies and potential imbalances, offering game developers a valuable tool for informed and context-aware balance adjustments.
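The abstract above applies hierarchical agglomerative clustering with Jensen-Shannon divergence to co-occurrence patterns. The sketch below illustrates that combination on hypothetical toy profiles. Average linkage, base-2 logarithms, and the names `js_divergence`, `agglomerate`, and `profiles` are assumptions, since the abstract does not specify them, and no Valorant data is used.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def agglomerate(dists, n_clusters):
    """Bottom-up clustering with average linkage over JS divergence."""
    clusters = [[i] for i in range(len(dists))]

    def linkage(a, b):
        total = sum(js_divergence(dists[i], dists[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        # Merge the two clusters whose members are closest on average.
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical co-occurrence profiles for four characters (each row sums to 1).
profiles = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.3, 0.6],
]
groups = agglomerate(profiles, n_clusters=2)
```

With base-2 logarithms the divergence is bounded between 0 and 1, which makes linkage values comparable across pairs of characters.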
nan
Article 749
Title@2025-07-20 (7): PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training
Title: PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training | PGT-I: Scaling Spatiotemporal GNNs mit speichereffizienter verteilter Ausbildung | PGT-I: 具有记忆有效分配培训的Splap Spatotomotial GNNs 2507.11683v2 |
Authors (7): Seth Ockerman, Amal Gueroudji, Tanwi Mallick, Yixuan He, Line Pouchard, Robert Ross, Shivaram Venkataraman
Spatiotemporal graph neural networks (ST-GNNs) are powerful tools for modeling spatial and temporal data dependencies. However, their applications have been limited primarily to small-scale datasets because of memory constraints. While distributed training offers a solution, current frameworks lack support for spatiotemporal models and overlook the properties of spatiotemporal data. Informed by a scaling study on a large-scale workload, we present PyTorch Geometric Temporal Index (PGT-I), an extension to PyTorch Geometric Temporal that integrates distributed data parallel training and two novel strategies: index-batching and distributed-index-batching. Our index techniques exploit spatiotemporal structure to construct snapshots dynamically at runtime, significantly reducing memory overhead, while distributed-index-batching extends this approach by enabling scalable processing across multiple GPUs. Our techniques enable the first-ever training of an ST-GNN on the entire PeMS dataset without graph partitioning, reducing peak memory usage by up to 89% and achieving up to an 11.78x speedup over standard DDP with 128 GPUs.
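The abstract above describes constructing snapshots dynamically at runtime instead of storing them. The sketch below illustrates that general idea in plain Python: a baseline that copies every sliding window up front, and an index-based batcher that keeps the raw series once and slices on demand. The names `materialized_windows` and `IndexBatcher` are hypothetical and this is not the PGT-I API.

```python
def materialized_windows(series, horizon):
    """Baseline: copy every (input, target) window up front."""
    last_start = len(series) - 2 * horizon
    return [
        (series[t:t + horizon], series[t + horizon:t + 2 * horizon])
        for t in range(last_start + 1)
    ]


class IndexBatcher:
    """Keep the raw series once; build each snapshot from its start index."""

    def __init__(self, series, horizon):
        self.series = series
        self.horizon = horizon
        self.starts = range(len(series) - 2 * horizon + 1)

    def __len__(self):
        return len(self.starts)

    def snapshot(self, i):
        t, h = self.starts[i], self.horizon
        return self.series[t:t + h], self.series[t + h:t + 2 * h]

    def batches(self, batch_size):
        for b in range(0, len(self), batch_size):
            stop = min(b + batch_size, len(self))
            yield [self.snapshot(i) for i in range(b, stop)]


series = list(range(10))
baseline = materialized_windows(series, horizon=2)
batcher = IndexBatcher(series, horizon=2)
```

In this toy case the baseline holds 7 windows of 4 values each (28 stored values), while the index-based variant holds the 10 original values plus a range of start indices; overlapping windows are what make the difference grow with the horizon.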
nan
Article 750
Title@2025-07-20 (7): Robust Control with Gradient Uncertainty
Title: Robust Control with Gradient Uncertainty | Robuste Steuerung mit gradienter Unsicherheit | 带渐变不确定性的强力控制 2507.15082v1 |
Authors (1): Qian Qi
We introduce a novel extension to robust control theory that explicitly addresses uncertainty in the value function’s gradient, a form of uncertainty endemic to applications like reinforcement learning where value functions are approximated. We formulate a zero-sum dynamic game where an adversary perturbs both system dynamics and the value function gradient, leading to a new, highly nonlinear partial differential equation: the Hamilton-Jacobi-Bellman-Isaacs Equation with Gradient Uncertainty (GU-HJBI). We establish its well-posedness by proving a comparison principle for its viscosity solutions under a uniform ellipticity condition. Our analysis of the linear-quadratic (LQ) case yields a key insight: we prove that the classical quadratic value function assumption fails for any non-zero gradient uncertainty, fundamentally altering the problem structure. A formal perturbation analysis characterizes the non-polynomial correction to the value function and the resulting nonlinearity of the optimal control law, which we validate with numerical studies. Finally, we bridge theory to practice by proposing a novel Gradient-Uncertainty-Robust Actor-Critic (GURAC) algorithm, accompanied by an empirical study demonstrating its effectiveness in stabilizing training. This work provides a new direction for robust control, holding significant implications for fields where function approximation is common, including reinforcement learning and computational finance.
nan
Article 751
Title@2025-07-20 (7): Knowing When to Quit: Probabilistic Early Exits for Speech Separation
Title: Knowing When to Quit: Probabilistic Early Exits for Speech Separation | Zu wissen, wann man aufhören soll: probabilistische frühe Ausgänge für Sprachtrennung | 了解何时退出:语言分离的概率早期出场 2507.09768v2 |
Authors (7): Kenny Falkær Olsen, Mads Østergaard, Karl Ulbæk, Søren Føns Nielsen, Rasmus Malik Høegh Lindrup, Bjørn Sand Jensen, Morten Mørup
In recent years, deep learning-based single-channel speech separation has improved considerably, in large part driven by increasingly compute- and parameter-efficient neural network architectures. Most such architectures are, however, designed with a fixed compute and parameter budget, and consequently cannot scale to varying compute demands or resources, which limits their use in embedded and heterogeneous devices such as mobile phones and hearables. To enable such use-cases we design a neural network architecture for speech separation capable of early-exit, and we propose an uncertainty-aware probabilistic framework to jointly model the clean speech signal and error variance which we use to derive probabilistic early-exit conditions in terms of desired signal-to-noise ratios. We evaluate our methods on both speech separation and enhancement tasks, and we show that a single early-exit model can be competitive with state-of-the-art models trained at many compute and parameter budgets. Our framework enables fine-grained dynamic compute-scaling of speech separation networks while achieving state-of-the-art performance and interpretable exit conditions.
nan
Article 752
Title@2025-07-20 (7): Isotonic Quantile Regression Averaging for uncertainty quantification of electricity price forecasts
Title: Isotonic Quantile Regression Averaging for uncertainty quantification of electricity price forecasts | Isotonische Quantile Regression Mittelung der Unsicherheit Quantifizierung der Strompreisprognosen | 电价预测量化不确定性的误差 2507.15079v1 |
Authors (2): Arkadiusz Lipiecki, Bartosz Uniejewski
Quantifying the uncertainty of forecasting models is essential to assess and mitigate the risks associated with data-driven decisions, especially in volatile domains such as electricity markets. Machine learning methods can provide highly accurate electricity price forecasts, critical for informing the decisions of market participants. However, these models often lack uncertainty estimates, which limits the ability of decision makers to avoid unnecessary risks. In this paper, we propose a novel method for generating probabilistic forecasts from ensembles of point forecasts, called Isotonic Quantile Regression Averaging (iQRA). Building on the established framework of Quantile Regression Averaging (QRA), we introduce stochastic order constraints to improve forecast accuracy and reliability and to reduce computational costs. In an extensive forecasting study of the German day-ahead electricity market, we show that iQRA consistently outperforms state-of-the-art postprocessing methods in terms of both reliability and sharpness. It produces well-calibrated prediction intervals across multiple confidence levels, providing superior reliability to all benchmark methods, particularly coverage-based conformal prediction. In addition, isotonic regularization decreases the complexity of the quantile regression problem and offers a hyperparameter-free approach to variable selection.
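The abstract above builds on Quantile Regression Averaging, which expresses a quantile of the target as a linear combination of point forecasts fitted under the pinball (quantile) loss. The sketch below shows only that loss and the linear combination. The function names and the example numbers are hypothetical, and the stochastic order constraints that define iQRA are not implemented.

```python
def pinball_loss(y, q, tau):
    """Pinball loss of predicting quantile value q for outcome y at level tau."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def qra_quantile(point_forecasts, weights, intercept=0.0):
    """QRA-style quantile forecast: a weighted sum of point forecasts."""
    return intercept + sum(w * f for w, f in zip(weights, point_forecasts))


def mean_pinball(outcomes, forecast_rows, weights, tau):
    """Average pinball loss of a fixed weight vector over a history."""
    losses = [
        pinball_loss(y, qra_quantile(row, weights), tau)
        for y, row in zip(outcomes, forecast_rows)
    ]
    return sum(losses) / len(losses)
```

At a high level such as tau = 0.9, under-prediction costs nine times more than over-prediction of the same size, which is what pushes the fitted combination toward the upper quantile.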
nan
Article 753
Title@2025-07-20 (7): Reinforcement Learning for Flow-Matching Policies
Title: Reinforcement Learning for Flow-Matching Policies | Verstärktes Lernen für Flow-Matching-Politiken | 流动派接政策强化学习 2507.15073v1 |
Authors (3): Samuel Pfrommer, Yixiao Huang, Somayeh Sojoudi
Flow-matching policies have emerged as a powerful paradigm for generalist robotics. These models are trained to imitate an action chunk, conditioned on sensor observations and textual instructions. Often, training demonstrations are generated by a suboptimal policy, such as a human operator. This work explores training flow-matching policies via reinforcement learning to surpass the original demonstration policy performance. We particularly note minimum-time control as a key application and present a simple scheme for variable-horizon flow-matching planning. We then introduce two families of approaches: a simple Reward-Weighted Flow Matching (RWFM) scheme and a Group Relative Policy Optimization (GRPO) approach with a learned reward surrogate. Our policies are trained on an illustrative suite of simulated unicycle dynamics tasks, and we show that both approaches dramatically improve upon the suboptimal demonstrator performance, with the GRPO approach in particular generally incurring between 50% and 85% less cost than a naive Imitation Learning Flow Matching (ILFM) approach.
nan
Article 754
Title@2025-07-20 (7): ROBAD: Robust Adversary-aware Local-Global Attended Bad Actor Detection Sequential Model
Title: ROBAD: Robust Adversary-aware Local-Global Attended Bad Actor Detection Sequential Model | ROBAD: Robustes Adversary-aware Local-Global besucht Bad Actor Detection Sequential Model | ROBAD: 强力反逆觉觉悟当地-全球 出席的不良行为器检测序列模型 2507.15067v1 |
Authors (3): Bing He, Mustaque Ahamad, Srijan Kumar
Detecting bad actors is critical to ensure the safety and integrity of internet platforms. Several deep learning-based models have been developed to identify such users. These models should not only accurately detect bad actors, but also be robust against adversarial attacks that aim to evade detection. However, past deep learning-based detection models do not meet the robustness requirement because they are sensitive to even minor changes in the input sequence. To address this issue, we focus on (1) improving the model understanding capability and (2) enhancing the model knowledge such that the model can recognize potential input modifications when making predictions. To achieve these goals, we create a novel transformer-based classification model, called ROBAD (RObust adversary-aware local-global attended Bad Actor Detection model), which uses the sequence of user posts to generate a user embedding to detect bad actors. Particularly, ROBAD first leverages the transformer encoder block to encode each post bidirectionally, thus building a post embedding to capture the local information at the post level. Next, it adopts the transformer decoder block to model the sequential pattern in the post embeddings by using the attention mechanism, which generates the sequence embedding to obtain the global information at the sequence level. Finally, to enrich the knowledge of the model, embeddings of sequences modified by mimicked attackers are fed into a contrastive-learning-enhanced classification layer for sequence prediction. In essence, by capturing the local and global information (i.e., the post and sequence information) and leveraging the mimicked behaviors of bad actors in training, ROBAD can be robust to adversarial attacks. Extensive experiments on Yelp and Wikipedia datasets show that ROBAD can effectively detect bad actors even under state-of-the-art adversarial attacks.
nan
Article 755
Title@2025-07-20 (7): Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback
Title: Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback | Time-RA: Auf dem Weg zu einer Zeitreihe, die mit LLM Feedback zu Anomalie führt | 时间-RA:采用LLM反馈办法为异常情况寻找时间序列理由 2507.15066v1 |
Authors (9): Yiyuan Yang, Zichuan Liu, Lei Song, Kai Ying, Zhiguang Wang, Tom Bamford, Svitlana Vyetrenko, Jiang Bian, Qingsong Wen
Time series anomaly detection is critical across various domains, yet current approaches often limit analysis to mere binary anomaly classification without detailed categorization or further explanatory reasoning. To address these limitations, we propose a novel task, Time-series Reasoning for Anomaly (Time-RA) that transforms classical time series anomaly detection from a discriminative into a generative, reasoning-intensive task leveraging Large Language Models (LLMs). Also, we introduce the first real-world multimodal benchmark dataset, RATs40K, explicitly annotated for anomaly reasoning, comprising approximately 40,000 samples across 10 real-world domains. Each sample includes numeric time series data, contextual text information, and visual representations, each annotated with fine-grained categories (14 types for univariate anomalies and 6 for multivariate anomalies) and structured explanatory reasoning. We develop a sophisticated annotation framework utilizing ensemble-generated labels refined through GPT-4-driven feedback, ensuring accuracy and interpretability. Extensive benchmarking of LLMs and multimodal LLMs demonstrates the capabilities and limitations of current models, highlighting the critical role of supervised fine-tuning. Our dataset and task pave the way for significant advancements in interpretable time series anomaly detection and reasoning.
nan
Article 756
Title@2025-07-20 (7): Quantum Annealing for Machine Learning: Applications in Feature Selection, Instance Selection, and Clustering
Title: Quantum Annealing for Machine Learning: Applications in Feature Selection, Instance Selection, and Clustering | Quantenanaling für maschinelles Lernen: Anwendungen in der Feature Selection, Instance Selection und Clustering | 机器学习的保密:特写选择、选案和集群方面的应用 2507.15063v1 |
Authors (4): Chloe Pomeroy, Aleksandar Pramov, Karishma Thakrar, Lakshmi Yendapalli
This paper explores the applications of quantum annealing (QA) and classical simulated annealing (SA) to a suite of combinatorial optimization problems in machine learning, namely feature selection, instance selection, and clustering. We formulate each task as a Quadratic Unconstrained Binary Optimization (QUBO) problem and implement both quantum and classical solvers to compare their effectiveness. For feature selection, we propose several QUBO configurations that balance feature importance and redundancy, showing that quantum annealing (QA) produces solutions that are computationally more efficient. In instance selection, we propose a few novel heuristics for instance-level importance measures that extend existing methods. For clustering, we embed a classical-to-quantum pipeline, using classical clustering followed by QUBO-based medoid refinement, and demonstrate consistent improvements in cluster compactness and retrieval metrics. Our results suggest that QA can be a competitive and efficient tool for discrete machine learning optimization, even within the constraints of current quantum hardware.
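The abstract above formulates feature selection as a QUBO that balances feature importance against redundancy. The sketch below shows one common way to write such a problem: negative importance on the diagonal and a scaled redundancy penalty off the diagonal. This specific form, the weight `alpha`, the toy numbers, and the exhaustive solver standing in for an annealer are all assumptions, not the paper's configurations.

```python
from itertools import product


def build_qubo(importance, redundancy, alpha=1.0):
    """Upper-triangular QUBO: reward importance, penalize pairwise redundancy."""
    n = len(importance)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = -importance[i]
        for j in range(i + 1, n):
            Q[i][j] = alpha * redundancy[i][j]
    return Q


def energy(Q, x):
    """Energy x^T Q x for a binary vector x (x_i * x_i equals x_i)."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def brute_force_minimum(Q):
    """Exhaustive search over all bit strings; feasible only for tiny problems."""
    return min(product((0, 1), repeat=len(Q)), key=lambda x: energy(Q, x))


# Hypothetical toy problem: features 0 and 1 are both useful but nearly redundant.
importance = [0.9, 0.8, 0.1]
redundancy = [
    [0.0, 0.95, 0.05],
    [0.95, 0.0, 0.05],
    [0.05, 0.05, 0.0],
]
Q = build_qubo(importance, redundancy)
best = brute_force_minimum(Q)
```

Here the minimum selects features 0 and 2: adding feature 1 would gain 0.8 in importance but pay 0.95 in redundancy with feature 0. A quantum or simulated annealer would replace `brute_force_minimum` for problems too large to enumerate.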
nan
Article 757
Title@2025-07-20 (7): Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper
Title: Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper | Touch in the Wild: Fine-Grained Manipulation mit einem tragbaren Visuo-Taktilen Greifer lernen | 野生触摸:学习用便携活性手动Grippper 进行精密的操纵 2507.15062v1 |
Authors (3): Xinyue Zhu, Binghao Huang, Yunzhu Li
Handheld grippers are increasingly used to collect human demonstrations due to their ease of deployment and versatility. However, most existing designs lack tactile sensing, despite the critical role of tactile feedback in precise manipulation. We present a portable, lightweight gripper with integrated tactile sensors that enables synchronized collection of visual and tactile data in diverse, real-world, and in-the-wild settings. Building on this hardware, we propose a cross-modal representation learning framework that integrates visual and tactile signals while preserving their distinct characteristics. The learning procedure allows the emergence of interpretable representations that consistently focus on contacting regions relevant for physical interactions. When used for downstream manipulation tasks, these representations enable more efficient and effective policy learning, supporting precise robotic manipulation based on multimodal feedback. We validate our approach on fine-grained tasks such as test tube insertion and pipette-based fluid transfer, demonstrating improved accuracy and robustness under external disturbances. Our project page is available at https://binghao-huang.github.io/touch_in_the_wild/ .
nan
Article 758
Title@2025-07-20 (7): LibLMFuzz: LLM-Augmented Fuzz Target Generation for Black-box Libraries
Title: LibLMFuzz: LLM-Augmented Fuzz Target Generation for Black-box Libraries | LibLMFuzz: LLM-Augmented Fuzz Target Generation für Black-Box-Bibliotheken | LibLMFuzz: 黑盒图书馆LLM- 推荐的模糊目标生成 2507.15058v1 |
Authors (2): Ian Hardgrove, John D. Hastings
A fundamental problem in cybersecurity and computer science is determining whether a program is free of bugs and vulnerabilities. Fuzzing, a popular approach to discovering vulnerabilities in programs, has several advantages over alternative strategies, although it has investment costs in the form of initial setup and continuous maintenance. The choice of fuzzing is further complicated when only a binary library is available, such as the case of closed-source and proprietary software. In response, we introduce LibLMFuzz, a framework that reduces costs associated with fuzzing closed-source libraries by pairing an agentic Large Language Model (LLM) with a lightweight tool-chain (disassembler/compiler/fuzzer) to autonomously analyze stripped binaries, plan fuzz strategies, generate drivers, and iteratively self-repair build or runtime errors. Tested on four widely-used Linux libraries, LibLMFuzz produced syntactically correct drivers for all 558 fuzz-able API functions, achieving 100% API coverage with no human intervention. Across the 1601 synthesized drivers, 75.52% were nominally correct on first execution. The results show that LLM-augmented middleware holds promise in reducing the costs of fuzzing black box components and provides a foundation for future research efforts. Future opportunities exist for research in branch coverage.
nan
Article 759
Title@2025-07-20 (7): Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting
Title: Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting | Frequenzabhängige Wissensdestillation für leichte Spatiotemporale Vorhersagen | 轻量度对时预报知识蒸馏 2507.02939v2 |
Authors (8): Yuqi Li, Chuanguang Yang, Hansheng Zeng, Zeyu Dong, Zhulin An, Yongjun Xu, Yingli Tian, Hao Wu
Spatiotemporal forecasting tasks, such as traffic flow, combustion dynamics, and weather forecasting, often require complex models that suffer from low training efficiency and high memory consumption. This paper proposes a lightweight framework, Spectral Decoupled Knowledge Distillation (termed SDKD), which transfers the multi-scale spatiotemporal representations from a complex teacher model to a more efficient lightweight student network. The teacher model follows an encoder-latent evolution-decoder architecture, where its latent evolution module decouples high-frequency details and low-frequency trends using convolution and Transformer (global low-frequency modeler). However, the multi-layer convolution and deconvolution structures result in slow training and high memory usage. To address these issues, we propose a frequency-aligned knowledge distillation strategy, which extracts multi-scale spectral features from the teacher’s latent space, including both high and low frequency components, to guide the lightweight student model in capturing both local fine-grained variations and global evolution patterns. Experimental results show that SDKD significantly improves performance, achieving reductions of up to 81.3% in MSE and 52.3% in MAE on the Navier-Stokes equation dataset. The framework effectively captures both high-frequency variations and long-term trends while reducing computational complexity. Our code is available at https://github.com/itsnotacie/SDKD
nan
Article 760
Title@2025-07-20 (7): Integrating Reason-Based Moral Decision-Making in the Reinforcement Learning Architecture
Title: Integrating Reason-Based Moral Decision-Making in the Reinforcement Learning Architecture | Integrieren von reason-based Moral Decision-Making in die Lernarchitektur der Stärkung | 将合理道德决策纳入强化学习架构 2507.15895v1 |
Authors (1): Lisa Dargasz
Reinforcement Learning is a machine learning methodology that has demonstrated strong performance across a variety of tasks. In particular, it plays a central role in the development of artificial autonomous agents. As these agents become increasingly capable, market readiness is rapidly approaching, which means those agents, for example taking the form of humanoid robots or autonomous cars, are poised to transition from laboratory prototypes to autonomous operation in real-world environments. This transition raises concerns leading to specific requirements for these systems - among them, the requirement that they are designed to behave ethically. Crucially, research directed toward building agents that fulfill the requirement to behave ethically - referred to as artificial moral agents (AMAs) - has to address a range of challenges at the intersection of computer science and philosophy. This study explores the development of reason-based artificial moral agents (RBAMAs). RBAMAs are built on an extension of the reinforcement learning architecture to enable moral decision-making based on sound normative reasoning, which is achieved by equipping the agent with the capacity to learn a reason-theory - a theory which enables it to process morally relevant propositions to derive moral obligations - through case-based feedback. They are designed such that they adapt their behavior to ensure conformance to these obligations while they pursue their designated tasks. These features contribute to the moral justifiability of their actions, their moral robustness, and their moral trustworthiness, which positions the extended architecture as a concrete and deployable framework for the development of AMAs that fulfills key ethical desiderata. This study presents a first implementation of an RBAMA and demonstrates the potential of RBAMAs in initial experiments.
nan
Article 761
Title@2025-07-20 (7): OpenBreastUS: Benchmarking Neural Operators for Wave Imaging Using Breast Ultrasound Computed Tomography
Title: OpenBreastUS: Benchmarking Neural Operators for Wave Imaging Using Breast Ultrasound Computed Tomography | OpenBreastUS: Benchmarking Neural Operators für Wave Imaging mit Breast Ultrasound Computed Tomography | Open BrestUS:使用乳房超声波计算地形学进行波成像基准神经操作员 2507.15035v1 |
Authors (11): Zhijun Zeng, Youjia Zheng, Hao Hu, Zeyuan Dong, Yihang Zheng, Xinliang Liu, Jinzhuo Wang, Zuoqiang Shi, Linfeng Zhang, Yubing Li, He Sun
Accurate and efficient simulation of wave equations is crucial in computational wave imaging applications, such as ultrasound computed tomography (USCT), which reconstructs tissue material properties from observed scattered waves. Traditional numerical solvers for wave equations are computationally intensive and often unstable, limiting their practical applications for quasi-real-time image reconstruction. Neural operators offer an innovative approach by accelerating PDE solving using neural networks; however, their effectiveness in realistic imaging is limited because existing datasets oversimplify real-world complexity. In this paper, we present OpenBreastUS, a large-scale wave equation dataset designed to bridge the gap between theoretical equations and practical imaging applications. OpenBreastUS includes 8,000 anatomically realistic human breast phantoms and over 16 million frequency-domain wave simulations using real USCT configurations. It enables a comprehensive benchmarking of popular neural operators for both forward simulation and inverse imaging tasks, allowing analysis of their performance, scalability, and generalization capabilities. By offering a realistic and extensive dataset, OpenBreastUS not only serves as a platform for developing innovative neural PDE solvers but also facilitates their deployment in real-world medical imaging problems. For the first time, we demonstrate efficient in vivo imaging of the human breast using neural operator solvers.
nan
Article 762
Title@2025-07-20 (7): The hunt for new pulsating ultraluminous X-ray sources: a clustering approach
Title: The hunt for new pulsating ultraluminous X-ray sources: a clustering approach | Die Jagd nach neuen pulsierenden ultrahellen Röntgenquellen: ein Clustering-Ansatz | 寻找新的脉动极光X光新来源:集群办法 2507.15032v1 |
Authors (8): Nicolò Oreste Pinciroli Vago, Roberta Amato, Matteo Imbrogno, GianLuca Israel, Andrea Belfiore, Konstantinos Kovlakas, Piero Fraternali, Mario Pasquato
The discovery of fast and variable coherent signals in a handful of ultraluminous X-ray sources (ULXs) testifies to the presence of super-Eddington accreting neutron stars, and drastically changed the understanding of the ULX class. Our capability of discovering pulsations in ULXs is limited, among other factors, by poor statistics. However, catalogues and archives of high-energy missions contain information which can be used to identify new candidate pulsating ULXs (PULXs). The goal of this research is to single out candidate PULXs among those ULXs which have not shown pulsations due to an unfavourable combination of factors. We applied an AI approach to an updated database of ULXs detected by XMM-Newton. We first used an unsupervised clustering algorithm to sort sources with similar characteristics into two clusters. Then, the sample of known PULX observations was used to set the separation threshold between the two clusters and to identify the one containing the new candidate PULXs. We found that only a few criteria are needed to assign an observation to one of the two clusters. The cluster of new candidate PULXs contains 85 unique sources for 355 observations, with $\sim$85% of these new candidates having multiple observations. A preliminary timing analysis found no new pulsations for these candidates. This work presents a sample of new candidate PULXs observed by XMM-Newton, the properties of which are similar (in a multi-dimensional phase space) to those of the known PULXs, despite the absence of pulsations in their light curves. While this result is a clear example of the predictive power of AI-based methods, it also highlights the need for high-statistics observational data to reveal coherent signals from the sources in this sample and thus validate the robustness of the approach.
nan
Article 763
Title@2025-07-20 (7): Integrating Newton’s Laws with deep learning for enhanced physics-informed compound flood modelling
Title: Integrating Newton’s Laws with deep learning for enhanced physics-informed compound flood modelling | Integration von Newtons Gesetzen mit tiefem Lernen für verbesserte Physik-informierte Mischflutmodellierung | 将牛顿法律与深层学习相结合,加强物理学知情复合物洪水建模 2507.15021v1 |
Authors (4): Soheil Radfar, Faezeh Maghsoodifar, Hamed Moftakhari, Hamid Moradkhani
Coastal communities increasingly face compound floods, where multiple drivers like storm surge, high tide, heavy rainfall, and river discharge occur together or in sequence to produce impacts far greater than any single driver alone. Traditional hydrodynamic models can provide accurate physics-based simulations but require substantial computational resources for real-time applications or risk assessments, while machine learning alternatives often sacrifice physical consistency for speed, producing unrealistic predictions during extreme events. This study addresses these challenges by developing ALPINE (All-in-one Physics Informed Neural Emulator), a physics-informed neural network (PINN) framework to enforce complete shallow water dynamics in compound flood modeling. Unlike previous approaches that implement partial constraints, our framework simultaneously enforces mass conservation and both momentum equations, ensuring full adherence to Newton’s laws throughout the prediction process. The model integrates a convolutional encoder-decoder architecture with ConvLSTM temporal processing, trained using a composite loss function that balances data fidelity with physics-based residuals. Using six historical storm events (four for training, one for validation, and one held-out for unseen testing), we observe substantial improvements over baseline neural networks. ALPINE reduces domain-averaged prediction errors and improves model skill metrics for water surface elevation and velocity components. Physics-informed constraints prove most valuable during peak storm intensity, when multiple flood drivers interact and reliable predictions matter most. This approach yields a physically consistent emulator capable of supporting compound-flood forecasting and large-scale risk analyses while preserving physical realism essential for coastal emergency management.
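For orientation, a composite loss of the kind described can be written as a data term plus residuals of the shallow water equations. The abstract does not give the exact formulation, so the following is only a sketch under stated assumptions: source terms (bottom friction, Coriolis, wind stress, rainfall) are omitted, $\eta$ is the water surface elevation, $H$ the total water depth, $(u, v)$ the depth-averaged velocities, $g$ gravitational acceleration, and the weights $\lambda$ are hypothetical tuning parameters.

```latex
\begin{aligned}
r_{\text{mass}} &= \frac{\partial \eta}{\partial t} + \frac{\partial (H u)}{\partial x} + \frac{\partial (H v)}{\partial y}, \\
r_{u} &= \frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y} + g\frac{\partial \eta}{\partial x}, \\
r_{v} &= \frac{\partial v}{\partial t} + u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y} + g\frac{\partial \eta}{\partial y}, \\
\mathcal{L} &= \mathcal{L}_{\text{data}} + \lambda_{\text{mass}}\,\lVert r_{\text{mass}}\rVert^{2} + \lambda_{u}\,\lVert r_{u}\rVert^{2} + \lambda_{v}\,\lVert r_{v}\rVert^{2}.
\end{aligned}
```

Driving all three residuals toward zero is what distinguishes enforcing the complete dynamics from a partial constraint that penalizes, for example, only the mass residual.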
nan
Article 764
Title@2025-07-20 (7): Sampling Decisions
Title: Sampling Decisions | Stichprobenentscheidungen | 抽样决定 2503.14549v2 |
Authors (3): Michael Chertkov, Sungsoo Ahn, Hamidreza Behjoo
In this manuscript, we introduce a novel Decision Flow (DF) framework for sampling decisions from a target distribution while incorporating additional guidance from a prior sampler. DF can be viewed as an AI-driven algorithmic reincarnation of the Markov Decision Process (MDP) approach in stochastic optimal control. It extends the continuous-space, continuous-time Path Integral Diffusion sampling technique of [Behjoo, Chertkov 2025] to discrete time and space, while also generalizing the Generative Flow Network (GFN) framework of [Bengio, et al 2021]. In its most basic form, an explicit formulation that does not require Neural Networks (NNs), DF leverages the linear solvability of the underlying MDP [Todorov, 2007] to adjust the transition probabilities of the prior sampler. The resulting Markov process is expressed as a convolution of the reverse-time Green’s function of the prior sampling with the target distribution. We illustrate the DF framework through an example of sampling from the Ising model, compare DF to Metropolis-Hastings to quantify its efficiency, discuss potential NN-based extensions, and outline how DF can enhance guided sampling across various applications.
nan
Article 765
Title@2025-07-20 (7): Can Mental Imagery Improve the Thinking Capabilities of AI Systems?
Title: Can Mental Imagery Improve the Thinking Capabilities of AI Systems? | Kann Mental Imagery die Denkfähigkeiten von KI-Systemen verbessern? | 精神形象能提高人工智能系统的思考能力吗? 2507.12555v2 |
Authors (1): Slimane Larabi
Although existing models can interact with humans and provide satisfactory responses, they lack the ability to act autonomously or engage in independent reasoning. Furthermore, input data in these models is typically provided as explicit queries, even when some sensory data is already acquired. In addition, AI agents, which are computational entities designed to perform tasks and make decisions autonomously based on their programming, data inputs, and learned knowledge, have shown significant progress. However, they struggle with integrating knowledge across multiple domains, unlike humans. Mental imagery plays a fundamental role in the brain’s thinking process, which involves performing tasks based on internal multisensory data, planned actions, needs, and reasoning capabilities. In this paper, we investigate how to integrate mental imagery into a machine thinking framework and how this could be beneficial in initiating the thinking process. Our proposed machine thinking framework integrates a Cognitive thinking unit supported by three auxiliary units: the Input Data Unit, the Needs Unit, and the Mental Imagery Unit. Within this framework, data is represented as natural language sentences or drawn sketches, serving both informative and decision-making purposes. We conducted validation tests for this framework, and the results are presented and discussed.
nan
Article 766
Title@2025-07-20 (7): Credit Risk Analysis for SMEs Using Graph Neural Networks in Supply Chain
Title: Credit Risk Analysis for SMEs Using Graph Neural Networks in Supply Chain | Kreditrisikoanalyse für KMU mit Hilfe von Graph Neural Networks in der Lieferkette | 利用供应链中图表神经网络的中小企业信贷风险分析 2507.07854v2 |
Authors (5): Zizhou Zhang, Qinyan Shen, Zhuohuan Hu, Qianying Liu, Huijie Shen
Small and Medium-sized Enterprises (SMEs) are vital to the modern economy, yet their credit risk analysis often struggles with scarce data, especially for online lenders lacking direct credit records. This paper introduces a Graph Neural Network (GNN)-based framework, leveraging SME interactions from transaction and social data to map spatial dependencies and predict loan default risks. Tests on real-world datasets from Discover and Ant Credit (23.4M nodes for supply chain analysis, 8.6M for default prediction) show the GNN surpasses traditional and other GNN baselines, with AUCs of 0.995 and 0.701 for supply chain mining and default prediction, respectively. It also helps regulators model supply chain disruption impacts on banks, accurately forecasting loan defaults from material shortages, and offers Federal Reserve stress testers key data for CCAR risk buffers. This approach provides a scalable, effective tool for assessing SME credit risk.
nan
Article 767
Title@2025-07-20 (7): Neural networks for bifurcation and linear stability analysis of steady states in partial differential equations
Title: Neural networks for bifurcation and linear stability analysis of steady states in partial differential equations | Neurale Netze zur Bifurkation und linearen Stabilitätsanalyse von Steady States in partiellen Differentialgleichungen | 以部分差异方程对稳定状态进行双向和线性稳定分析的神经网络 2407.19707v4 |
Authors (2): Muhammad Luthfi Shahab, Hadi Susanto
This research introduces an extended application of neural networks for solving nonlinear partial differential equations (PDEs). A neural network, combined with a pseudo-arclength continuation, is proposed to construct bifurcation diagrams from parameterized nonlinear PDEs. Additionally, a neural network approach is presented for solving eigenvalue problems to analyze the linear stability of solutions, focusing on identifying the largest eigenvalue. The effectiveness of the proposed neural network is examined through experiments on the Bratu equation and the Burgers equation. Results from a finite difference method are also presented for comparison. Varying numbers of grid points are employed in each case to assess the behavior and accuracy of both the neural network and the finite difference method. The experimental results demonstrate that the proposed neural network produces better solutions, generates more accurate bifurcation diagrams, has reasonable computational times, and proves effective for linear stability analysis.
nan
Article 768
Title@2025-07-20 (7): The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering
Title: The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering | Der Aufstieg von KI-Teamkollegen in der Software-Engineering (SE) 3.0: Wie autonome Coding-Agenten Software-Engineering umgestalten | AI软件工程(SE)3.0:自动编码代理人如何重组软件工程 2507.15003v1 |
Authors (3): Hao Li, Haoxiang Zhang, Ahmed E. Hassan
The future of software engineering, SE 3.0, is unfolding with the rise of AI teammates: autonomous, goal-driven systems collaborating with human developers. Among these, autonomous coding agents are especially transformative, now actively initiating, reviewing, and evolving code at scale. This paper introduces AIDev, the first large-scale dataset capturing how such agents operate in the wild. Spanning over 456,000 pull requests by five leading agents (OpenAI Codex, Devin, GitHub Copilot, Cursor, and Claude Code) across 61,000 repositories and 47,000 developers, AIDev provides an unprecedented empirical foundation for studying autonomous teammates in software development. Unlike prior work that has largely theorized the rise of AI-native software engineering, AIDev offers structured, open data to support research in benchmarking, agent readiness, optimization, collaboration modeling, and AI governance. The dataset includes rich metadata on PRs, authorship, review timelines, code changes, and integration outcomes, enabling exploration beyond synthetic benchmarks like SWE-bench. For instance, although agents often outperform humans in speed, their PRs are accepted less frequently, revealing a trust and utility gap. Furthermore, while agents accelerate code submission (one developer submitted as many PRs in three days as they had in three years), these PRs are structurally simpler, as measured by code complexity metrics. We envision AIDev as a living resource: extensible, analyzable, and ready for the SE and AI communities. Grounding SE 3.0 in real-world evidence, AIDev enables a new generation of research into AI-native workflows and supports building the next wave of symbiotic human-AI collaboration. The dataset is publicly available at https://github.com/SAILResearch/AI_Teammates_in_SE3. Keywords: AI Agent, Agentic AI, Coding Agent, Agentic Coding, Software Engineering Agent
nan
Article 769
Title@2025-07-20 (7): Clustered Federated Learning for Generalizable FDIA Detection in Smart Grids with Heterogeneous Data
Title: Clustered Federated Learning for Generalizable FDIA Detection in Smart Grids with Heterogeneous Data | Clustered Federated Learning for Generalizable FDIA Detection in Smart Grids with Heterogeneous Data | 具有异种数据的智能网格中的探测 2507.14999v1 |
Authors (5): Yunfeng Li, Junhong Liu, Zhaohui Yang, Guofu Liao, Chuyun Zhang
False Data Injection Attacks (FDIAs) pose severe security risks to smart grids by manipulating measurement data collected from spatially distributed devices such as SCADA systems and PMUs. These measurements typically exhibit Non-Independent and Identically Distributed (Non-IID) characteristics across different regions, which significantly challenges the generalization ability of detection models. Traditional centralized training approaches not only face privacy risks and data sharing constraints but also incur high transmission costs, limiting their scalability and deployment feasibility. To address these issues, this paper proposes a privacy-preserving federated learning framework, termed Federated Cluster Average (FedClusAvg), designed to improve FDIA detection in Non-IID and resource-constrained environments. FedClusAvg incorporates cluster-based stratified sampling and hierarchical communication (client-subserver-server) to enhance model generalization and reduce communication overhead. By enabling localized training and weighted parameter aggregation, the algorithm achieves accurate model convergence without centralizing sensitive data. Experimental results on benchmark smart grid datasets demonstrate that FedClusAvg not only improves detection accuracy under heterogeneous data distributions but also significantly reduces communication rounds and bandwidth consumption. This work provides an effective solution for secure and efficient FDIA detection in large-scale distributed power systems.
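The client-subserver-server aggregation with weighted parameter averaging can be sketched in a few lines. This is an illustrative sketch only: the abstract does not specify the weights, so sample-count weighting is assumed here, model parameters are flattened into NumPy vectors, and the function names are hypothetical.

```python
import numpy as np

def weighted_average(params, weights):
    """Average a list of equally shaped parameter vectors with the given weights."""
    weights = np.asarray(weights, dtype=float)
    stacked = np.stack(params)
    return np.tensordot(weights / weights.sum(), stacked, axes=1)

def hierarchical_aggregate(clusters):
    """Two-level aggregation: clients -> subserver (per cluster) -> server.

    `clusters` is a list of clusters; each cluster is a list of
    (parameter_vector, n_samples) pairs, one per client.
    """
    sub_models, sub_sizes = [], []
    for cluster in clusters:
        params = [p for p, _ in cluster]
        sizes = [n for _, n in cluster]
        sub_models.append(weighted_average(params, sizes))  # subserver step
        sub_sizes.append(sum(sizes))
    return weighted_average(sub_models, sub_sizes)           # server step
```

With sample-count weights the two-level result equals a flat weighted average over all clients, but only one vector per cluster travels to the central server, which is where a hierarchical scheme can save bandwidth.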
nan
Article 770
Title@2025-07-20 (7): Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression
Title: Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression | Sprachintegration in multimodale Großsprachenmodelle für die Bild-basierte Regression | 以图像为基础的倒退的精细调整多式大语言模型中的语言融合 2507.14997v1 |
Authors (4): Roy H. Jennings, Genady Paikin, Roy Shaul, Evgeny Soloveichik
Multimodal Large Language Models (MLLMs) show promise for image-based regression tasks, but current approaches face key limitations. Recent methods fine-tune MLLMs using preset output vocabularies and generic task-level prompts (e.g., “How would you rate this image?”), assuming this mimics human rating behavior. Our analysis reveals these approaches provide no benefit over image-only training. Models using preset vocabularies and generic prompts perform equivalently to image-only models, failing to leverage semantic understanding from textual input. We propose Regression via Transformer-Based Classification (RvTC), which replaces vocabulary-constrained classification with a flexible bin-based approach. Unlike approaches that address discretization errors through complex distributional modeling, RvTC eliminates manual vocabulary crafting through straightforward bin increase, achieving state-of-the-art performance on four image assessment datasets using only images. More importantly, we demonstrate that data-specific prompts dramatically improve performance. Unlike generic task descriptions, prompts containing semantic information about specific images enable MLLMs to leverage cross-modal understanding. On the AVA dataset, adding challenge titles to prompts improves correlations from 0.83 to 0.90, a new state-of-the-art. We demonstrate through empirical evidence from the AVA and AGIQA-3k datasets that MLLMs benefit from semantic prompt information surpassing mere statistical biases. This underscores the importance of incorporating meaningful textual context in multimodal regression tasks.
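Regression through bin-based classification can be illustrated with a short sketch: continuous scores are discretized into equal-width bins for training a classifier, and a prediction is decoded from the class logits. The decoding rule used here (the softmax-weighted mean of bin centers) is an assumption for illustration, as are the function names; the abstract does not state how RvTC decodes its outputs.

```python
import numpy as np

def make_bins(lo, hi, n_bins):
    """Equal-width bin edges and centers covering the score range [lo, hi]."""
    edges = np.linspace(lo, hi, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return edges, centers

def score_to_bin(scores, edges):
    """Map continuous scores to class indices in 0 .. n_bins - 1."""
    return np.digitize(scores, edges[1:-1])  # interior edges only

def expected_score(logits, centers):
    """Decode a continuous score as the softmax-weighted mean of bin centers."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return probs @ centers
```

Increasing `n_bins` shrinks the bin width and therefore the discretization error, without any hand-crafted output vocabulary.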
nan
Article 771
Title@2025-07-20 (7): Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees
Title: Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees | Greedy Low-Rank Gradient Compression für verteiltes Lernen mit Konvergenzgarantien | 利用聚合担保分配学习的贪婪低频梯度压缩 2507.08784v2 |
Authors (5): Chuyan Chen, Yutong He, Pengrui Li, Weichen Jia, Kun Yuan
Distributed optimization is pivotal for large-scale signal processing and machine learning, yet communication overhead remains a major bottleneck. Low-rank gradient compression, in which the transmitted gradients are approximated by low-rank matrices to reduce communication, offers a promising remedy. Existing methods typically adopt either randomized or greedy compression strategies: randomized approaches project gradients onto randomly chosen subspaces, introducing high variance and degrading empirical performance; greedy methods select the most informative subspaces, achieving strong empirical results but lacking convergence guarantees. To address this gap, we propose GreedyLore, the first Greedy Low-Rank gradient compression algorithm for distributed learning with rigorous convergence guarantees. GreedyLore incorporates error feedback to correct the bias introduced by greedy compression and introduces a semi-lazy subspace update that ensures the compression operator remains contractive throughout all iterations. With these techniques, we prove that GreedyLore achieves a convergence rate of $\mathcal{O}(\sigma/\sqrt{NT} + 1/T)$ under standard optimizers such as MSGD and Adam, marking the first linear speedup convergence rate for low-rank gradient compression. Extensive experiments are conducted to validate our theoretical findings.
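The combination of greedy low-rank compression with error feedback can be sketched as follows. This is a generic illustration rather than the GreedyLore algorithm: it recomputes a truncated SVD at every step and does not implement the semi-lazy subspace update, and the class and function names are hypothetical.

```python
import numpy as np

def top_r_compress(matrix, rank):
    """Greedy low-rank compression: keep the `rank` leading singular directions."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

class ErrorFeedbackCompressor:
    """Carries the compression error forward so it is not lost across steps."""

    def __init__(self, shape, rank):
        self.rank = rank
        self.residual = np.zeros(shape)

    def step(self, grad):
        corrected = grad + self.residual       # re-inject the previous error
        sent = top_r_compress(corrected, self.rank)
        self.residual = corrected - sent       # remember what was dropped
        return sent                            # low-rank message to transmit
```

Because the truncated SVD is the best rank-r approximation in Frobenius norm, the dropped part is never larger than the input, and the residual buffer lets that dropped part be retransmitted later, which is the role error feedback plays in correcting the bias of greedy compression.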
nan
Article 772
Title@2025-07-20 (7): TD-Interpreter: Enhancing the Understanding of Timing Diagrams with Visual-Language Learning
Title: TD-Interpreter: Enhancing the Understanding of Timing Diagrams with Visual-Language Learning | TD-Interpreter: Mit Visual-Language-Lernen das Verständnis von Timing-Diagrammen verbessern | TD-解释:用视觉语言学习增进对时间图的了解 2507.16844v1 |
Authors (7): Jie He, Vincent Theo Willem Kenbeek, Zhantao Yang, Meixun Qu, Ezio Bartocci, Dejan Ničković, Radu Grosu
We introduce TD-Interpreter, a specialized ML tool that assists engineers in understanding complex timing diagrams (TDs), originating from a third party, during their design and verification process. TD-Interpreter is a visual question-answer environment which allows engineers to input a set of TDs and ask design and verification queries regarding these TDs. We implemented TD-Interpreter with multimodal learning by fine-tuning LLaVA, a lightweight 7B Multimodal Large Language Model (MLLM). To address limited training data availability, we developed a synthetic data generation workflow that aligns visual information with its textual interpretation. Our experimental evaluation demonstrates the usefulness of TD-Interpreter, which outperformed untuned GPT-4o by a large margin on the evaluated benchmarks.
nan
Article 773
Title@2025-07-20 (7): AlphaAlign: Incentivizing Safety Alignment with Extremely Simplified Reinforcement Learning
Title: AlphaAlign: Incentivizing Safety Alignment with Extremely Simplified Reinforcement Learning | AlphaAlign: Förderung der Sicherheitsausrichtung mit extrem vereinfachtem Verstärkungslernen | 字母名称:以极其简化的强化学习方式激励安全调整 2507.14987v1 |
Authors (7): Yi Zhang, An Zhang, XiuYu Zhang, Leheng Sheng, Yuxin Chen, Zhenkai Liang, Xiang Wang
Large language models (LLMs), despite possessing latent safety understanding from their vast pretraining data, remain vulnerable to generating harmful content and exhibit issues such as over-refusal and utility degradation after safety alignment. Current safety alignment methods often result in superficial refusal shortcuts or rely on intensive supervision for reasoning-based approaches, failing to fully leverage the model’s intrinsic safety self-awareness. We propose AlphaAlign, a simple yet effective pure reinforcement learning (RL) framework with verifiable safety reward designed to incentivize this latent safety awareness through proactive safety reasoning. AlphaAlign employs a dual-reward system: a verifiable safety reward encourages correctly formatted and explicitly justified refusals for harmful queries while penalizing over-refusals, and a normalized helpfulness reward guides high-quality responses to benign inputs. This allows the model to develop proactive safety reasoning capabilities without depending on supervised safety-specific reasoning data. AlphaAlign demonstrates three key advantages: (1) Simplicity and efficiency, requiring only binary prompt safety labels and minimal RL steps for substantial improvements. (2) Breaking the safety-utility trade-off, by enhancing refusal of harmful content and reducing over-refusals, while simultaneously maintaining or even improving general task performance and robustness to unseen jailbreaks. (3) Deep alignment, fostering proactive safety reasoning that generates explicit safety rationales rather than relying on shallow refusal patterns.
nan
Article 774
Title@2025-07-20 (7): FedWCM: Unleashing the Potential of Momentum-based Federated Learning in Long-Tailed Scenarios
Title: FedWCM: Unleashing the Potential of Momentum-based Federated Learning in Long-Tailed Scenarios | FedWCM: Entfesseln des Potenzials von Momentum-basiertem Föderierten Lernen in langanhaltenden Szenarien | FedWCM:在长期失败情况下释放基于动力的联邦学习潜力 2507.14980v1 |
Authors (8): Tianle Li, Yongzhi Huang, Linshan Jiang, Qipeng Xie, Chang Liu, Wenfeng Du, Lu Wang, Kaishun Wu
Federated Learning (FL) enables decentralized model training while preserving data privacy. Despite its benefits, FL faces challenges with non-identically distributed (non-IID) data, especially in long-tailed scenarios with imbalanced class samples. Momentum-based FL methods, often used to accelerate FL convergence, struggle with these distributions, resulting in biased models and making FL hard to converge. To understand this challenge, we conduct extensive investigations into this phenomenon, accompanied by a layer-wise analysis of neural network behavior. Based on these insights, we propose FedWCM, a method that dynamically adjusts momentum using global and per-round data to correct directional biases introduced by long-tailed distributions. Extensive experiments show that FedWCM resolves non-convergence issues and outperforms existing methods, enhancing FL’s efficiency and effectiveness in handling client heterogeneity and data imbalance.
nan
Article 775
Title@2025-07-20 (7): A Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books
Title: A Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books | Eine vergleichende Analyse statistischer und maschineller Lernmodelle zur Erkennung von Ausreißern in Bitcoin Limit Order Books | Bittcoin限制单书中用于外部探测的统计和机器学习模型比较分析 2507.14960v1 |
Authors (1): Ivan Letteri
The detection of outliers within cryptocurrency limit order books (LOBs) is of paramount importance for comprehending market dynamics, particularly in highly volatile and nascent regulatory environments. This study conducts a comprehensive comparative analysis of robust statistical methods and advanced machine learning techniques for real-time anomaly identification in cryptocurrency LOBs. Within a unified testing environment, named AITA Order Book Signal (AITA-OBS), we evaluate the efficacy of thirteen diverse models to identify which approaches are most suitable for detecting potentially manipulative trading behaviours. An empirical evaluation, conducted via backtesting on a dataset of 26,204 records from a major exchange, demonstrates that the top-performing model, Empirical Covariance (EC), achieves a 6.70% gain, significantly outperforming a standard Buy-and-Hold benchmark. These findings underscore the effectiveness of outlier-driven strategies and provide insights into the trade-offs between model complexity, trade frequency, and performance. This study contributes to the growing corpus of research on cryptocurrency market microstructure by furnishing a rigorous benchmark of anomaly detection models and highlighting their potential for augmenting algorithmic trading and risk management.
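One common way to use an empirical covariance estimate for outlier detection is to flag records whose squared Mahalanobis distance from the sample mean is unusually large. The sketch below illustrates that general technique only: the abstract does not describe the features, the threshold, or the implementation used in AITA-OBS, so the function name and the 0.99 quantile threshold are assumptions.

```python
import numpy as np

def mahalanobis_outliers(X, quantile=0.99):
    """Flag rows of X whose squared Mahalanobis distance exceeds a quantile.

    X has shape (n_records, n_features). The covariance is the
    maximum-likelihood (empirical) estimate computed from X itself.
    """
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False, bias=True)
    inv = np.linalg.pinv(cov)                   # pseudo-inverse for robustness
    centered = X - mean
    d2 = np.einsum("ij,jk,ik->i", centered, inv, centered)
    threshold = np.quantile(d2, quantile)       # assumed cut-off rule
    return d2, d2 > threshold
```

In an order-book setting each row might hold features such as spread, depth, or order imbalance at one timestamp; those feature choices are likewise hypothetical here.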
nan
Article 776
Title@2025-07-20 (7): FullRecall: A Semantic Search-Based Ranking Approach for Maximizing Recall in Patent Retrieval
Title: FullRecall: A Semantic Search-Based Ranking Approach for Maximizing Recall in Patent Retrieval | FullRecall: Ein semantischer Search-Based-Ranking-Ansatz zur Maximierung des Recalls im Patent Retrieval | 完全回想:在专利检索中最大限度地回想的语义搜索排名法 2507.14946v1 |
Authors (3): Amna Ali, Liyanage C. De Silva, Pg Emeroylariffion Abas
Patent examiners and inventors face significant pressure to verify the originality and non-obviousness of inventions, and the intricate nature of patent data intensifies the challenges of patent retrieval. Therefore, there is a pressing need to devise cutting-edge retrieval strategies that can reliably achieve the desired recall. This study introduces FullRecall, a novel patent retrieval approach that effectively manages the complexity of patent data while maintaining the reliability of relevance matching and maximising recall. It leverages IPC-guided knowledge to generate informative phrases, which are processed to extract key information in the form of noun phrases characterising the query patent under observation. From these, the top k keyphrases are selected to construct a query for retrieving a focused subset of the dataset. This initial retrieval step achieves complete recall, successfully capturing all relevant documents. To further refine the results, a ranking scheme is applied to the retrieved subset, reducing its size while maintaining 100% recall. This multi-phase process demonstrates an effective strategy for balancing precision and recall in patent retrieval tasks. Comprehensive experiments were conducted, and the results were compared with baseline studies, namely HRR2 [1] and ReQ-ReC [2]. The proposed approach yielded superior results, achieving 100% recall in all five test cases. In contrast, the recall values of HRR2 [1] across the five test cases were 10%, 25%, 33.3%, 0%, and 14.29%, while ReQ-ReC [2] achieved 50% for the first test case, 25% for the second, and 0% for the third, fourth, and fifth. The 100% recall ensures that no relevant prior art is overlooked, thereby strengthening the patent pre-filing and examination processes and reducing potential legal risks.
Article 777
Title@2025-07-20 (7): Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey
Title: Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey | Vertrauenswürdige Text-zu-Bild-Diffusionsmodelle: Eine zeitgerechte und fokussierte Umfrage | 可信赖的文本到图像传播模型:及时和有重点的调查 2409.18214v2 |
Authors (9): Yi Zhang, Zhen Chen, Chih-Hong Cheng, Wenjie Ruan, Xiaowei Huang, Dezong Zhao, David Flynn, Siddartha Khastgir, Xingyu Zhao
Text-to-Image (T2I) Diffusion Models (DMs) have garnered widespread attention for their impressive advancements in image generation. However, their growing popularity has raised ethical and social concerns related to key non-functional properties of trustworthiness, such as robustness, fairness, security, privacy, factuality, and explainability, similar to those in traditional deep learning (DL) tasks. Conventional approaches for studying trustworthiness in DL tasks often fall short due to the unique characteristics of T2I DMs, e.g., their multi-modal nature. Given this challenge, recent efforts have developed new methods for investigating trustworthiness in T2I DMs via various means, including falsification, enhancement, verification & validation, and assessment. However, there is a notable lack of in-depth analysis concerning those non-functional properties and means. In this survey, we provide a timely and focused review of the literature on trustworthy T2I DMs, covering a concisely structured taxonomy from the perspectives of property, means, benchmarks, and applications. Our review begins with an introduction to essential preliminaries of T2I DMs; we then summarise key definitions/metrics specific to T2I tasks and analyse the means proposed in recent literature based on these definitions/metrics. Additionally, we review benchmarks and domain applications of T2I DMs. Finally, we highlight the gaps in current research, discuss the limitations of existing methods, and propose future research directions to advance the development of trustworthy T2I DMs. Furthermore, we track the latest developments in this field and keep our GitHub repository up to date at: https://github.com/wellzline/Trustworthy_T2I_DMs
Article 778
Title@2025-07-20 (7): SOC-DGL: Social Interaction Behavior Inspired Dual Graph Learning Framework for Drug-Target Interaction Identification
Title: SOC-DGL: Social Interaction Behavior Inspired Dual Graph Learning Framework for Drug-Target Interaction Identification | SOC-DGL: Social Interaction Behavior Inspired Dual Graph Learning Framework for Drug-Target Interaction Identification | SOC-DGL:由社会互动行为启发的药物目标互动识别双重图示学习框架 2506.01405v2 |
Authors (6): Xiang Zhao, Ruijie Li, Qiao Ning, Shikai Guo, Hui Li, Qian Ma
The identification of drug-target interactions (DTI) is critical for drug discovery and repositioning, as it reveals potential therapeutic uses of existing drugs, accelerating development and reducing costs. However, most existing models focus only on direct similarity in homogeneous graphs, failing to exploit the rich similarity in heterogeneous graphs. To address this gap, inspired by real-world social interaction behaviors, we propose SOC-DGL, which comprises two specialized modules: the Affinity-Driven Graph Learning (ADGL) module, which learns global similarity through an affinity-enhanced drug-target graph, and the Equilibrium-Driven Graph Learning (EDGL) module, which captures higher-order similarity by amplifying the influence of even-hop neighbors using an even-polynomial graph filter based on balance theory. This dual approach enables SOC-DGL to capture similarity information across multiple interaction scales within affinity and association matrices. To address the issue of imbalance in DTI datasets, we propose an adjustable imbalance loss function that adjusts the weight of negative samples through a tunable parameter. Extensive experiments on four benchmark datasets demonstrate that SOC-DGL consistently outperforms existing state-of-the-art methods across both balanced and imbalanced scenarios. Moreover, SOC-DGL successfully predicts the top 9 drugs known to bind ABL1, and we further analyze the 10th drug, which has not been experimentally confirmed to interact with ABL1, providing supporting evidence for its potential binding.
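To make the idea of an even-polynomial graph filter concrete, the sketch below computes the sum over k of theta_k * A^(2k) * X, so that only even-hop neighborhoods contribute. The abstract does not give the exact filter used in EDGL; the function names, the coefficient list `thetas`, and the unnormalized adjacency matrix are illustrative assumptions, not the authors' implementation.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def even_poly_filter(adj, feats, thetas):
    """Return sum_k thetas[k] * adj^(2k) @ feats (even powers only).

    Assumption: adj is a square adjacency matrix and thetas holds one
    hypothetical coefficient per even power 0, 2, 4, ...
    """
    adj_sq = matmul(adj, adj)          # two-hop propagation operator
    out = [[0.0] * len(feats[0]) for _ in feats]
    current = feats                    # adj^0 @ feats
    for theta in thetas:
        out = [[o + theta * c for o, c in zip(row_o, row_c)]
               for row_o, row_c in zip(out, current)]
        current = matmul(adj_sq, current)  # advance by two more hops
    return out


# Two nodes joined by one edge: adj squared is the identity,
# so every even power leaves the features unchanged.
adj = [[0, 1], [1, 0]]
feats = [[1.0], [2.0]]
print(even_poly_filter(adj, feats, [0.5, 0.5]))  # [[1.0], [2.0]]
```

In practice a normalized adjacency matrix and learned coefficients would be more typical; both are left out here to keep the sketch small.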
Article 779
Title@2025-07-20 (7): Measuring Leakage in Concept-Based Methods: An Information Theoretic Approach
Title: Measuring Leakage in Concept-Based Methods: An Information Theoretic Approach | Messung von Leckagen in konzeptbasierten Methoden: Ein informationenstheoretischer Ansatz | 衡量基于概念方法中的流失:信息理论方法 2504.09459v2 |
Authors (4): Mikael Makonnen, Moritz Vandenhirtz, Sonia Laguna, Julia E Vogt
Concept Bottleneck Models (CBMs) aim to enhance interpretability by structuring predictions around human-understandable concepts. However, unintended information leakage, where predictive signals bypass the concept bottleneck, compromises their transparency. This paper introduces an information-theoretic measure to quantify leakage in CBMs, capturing the extent to which concept embeddings encode additional, unintended information beyond the specified concepts. We validate the measure through controlled synthetic experiments, demonstrating its effectiveness in detecting leakage trends across various configurations. Our findings highlight that feature and concept dimensionality significantly influence leakage, and that classifier choice impacts measurement stability, with XGBoost emerging as the most reliable estimator. Additionally, preliminary investigations indicate that the measure exhibits the anticipated behavior when applied to soft joint CBMs, suggesting its reliability in leakage quantification beyond fully synthetic settings. While this study rigorously evaluates the measure in controlled synthetic experiments, future work can extend its application to real-world datasets.
Article 780
Title@2025-07-20 (7): Old Rules in a New Game: Mapping Uncertainty Quantification to Quantum Machine Learning
Title: Old Rules in a New Game: Mapping Uncertainty Quantification to Quantum Machine Learning | Alte Regeln in einem neuen Spiel: Mapping Uncertainty Quantification to Quantum Machine Learning | 新游戏中的旧规则: 将不确定性量化成量子机器学习 2507.14919v1 |
Authors (3): Maximilian Wendlinger, Kilian Tscharke, Pascal Debus
One of the key obstacles in traditional deep learning is the reduction in model transparency caused by increasingly intricate model functions, which can lead to problems such as overfitting and excessive confidence in predictions. With the advent of quantum machine learning offering possible advances in computational power and latent space complexity, we notice the same opaque behavior. Despite significant research in classical contexts, there has been little advancement in addressing the black-box nature of quantum machine learning. Consequently, we approach this gap by building upon existing work in classical uncertainty quantification and initial explorations in quantum Bayesian modeling to theoretically develop and empirically evaluate techniques to map classical uncertainty quantification methods to the quantum machine learning domain. Our findings emphasize the necessity of leveraging classical insights into uncertainty quantification to include uncertainty awareness in the process of designing new quantum machine learning models.
Article 781
Title@2025-07-20 (7): Interactive proofs for verifying (quantum) learning and testing
Title: Interactive proofs for verifying (quantum) learning and testing | Interaktive Nachweise für das (Quantum-)Lernen und Testen | 用于核实(量表)学习和测试的交互式证明 2410.23969v2 |
Authors (6): Matthias C. Caro, Jens Eisert, Marcel Hinsche, Marios Ioannou, Alexander Nietner, Ryan Sweke
We consider the problem of testing and learning from data in the presence of resource constraints, such as limited memory or weak data access, which place limitations on the efficiency and feasibility of testing or learning. In particular, we ask the following question: Could a resource-constrained learner/tester use interaction with a resource-unconstrained but untrusted party to solve a learning or testing problem more efficiently than they could without such an interaction? In this work, we answer this question both abstractly and for concrete problems, in two complementary ways: For a wide variety of scenarios, we prove that a resource-constrained learner cannot gain any advantage through classical interaction with an untrusted prover. As a special case, we show that for the vast majority of testing and learning problems in which quantum memory is a meaningful resource, a memory-constrained quantum algorithm cannot overcome its limitations via classical communication with a memory-unconstrained quantum prover. In contrast, when quantum communication is allowed, we construct a variety of interactive proof protocols, for specific learning and testing problems, which allow memory-constrained quantum verifiers to gain significant advantages through delegation to untrusted provers. These results highlight both the limitations and potential of delegating learning and testing problems to resource-rich but untrusted third parties.
Article 782
Title@2025-07-20 (7): Partial Symmetry Enforced Attention Decomposition (PSEAD): A Group-Theoretic Framework for Equivariant Transformers in Biological Systems
Title: Partial Symmetry Enforced Attention Decomposition (PSEAD): A Group-Theoretic Framework for Equivariant Transformers in Biological Systems | Partielle Symmetrie verstärkter Aufmerksamkeitsabbau (PSEAD): Ein gruppentheoretischer Rahmen für äquivalente Transformer in biologischen Systemen | 部分对称强制强制注意力分解:生物系统中等离异变异变异器的集团理论框架 2507.14908v1 |
Authors (1): Daniel Ayomide Olanrewaju
This research introduces the Theory of Partial Symmetry Enforced Attention Decomposition (PSEAD), a new and rigorous group-theoretic framework designed to seamlessly integrate local symmetry awareness into the core architecture of self-attention mechanisms within Transformer models. We formalize the concept of local permutation subgroup actions on windows of biological data, proving that under such actions, the attention mechanism naturally decomposes into a direct sum of orthogonal irreducible components. Critically, these components are intrinsically aligned with the irreducible representations of the acting permutation subgroup, thereby providing a powerful mathematical basis for disentangling symmetric and asymmetric features. We show that PSEAD offers substantial advantages. These include enhanced generalization capabilities to novel biological motifs exhibiting similar partial symmetries, unprecedented interpretability by allowing direct visualization and analysis of attention contributions from different symmetry channels, and significant computational efficiency gains by focusing representational capacity on relevant symmetric subspaces. Beyond static data analysis, we extend PSEAD’s applicability to dynamic biological processes within reinforcement learning paradigms, showcasing its potential to accelerate the discovery and optimization of biologically meaningful policies in complex environments like protein folding and drug discovery. This work lays the groundwork for a new generation of biologically informed, symmetry-aware artificial intelligence models.
Article 783
Title@2025-07-20 (7): 5G Traffic Prediction with Time Series Analysis
Title: 5G Traffic Prediction with Time Series Analysis | 5G Verkehrsvorhersage mit Zeitreihenanalyse | 5G 具有时间序列分析的交通预测 2110.03781v2 |
Authors (6): Nikhil Nayak, Rujula Singh R, Rameshwar Garg, Varun Danda, Chandana Kiran, Kaustuv Saha
Mobile phones have become a basic requirement of everyday life. With cellular traffic demand increasing so dramatically, it is now necessary to accurately predict user traffic in cellular networks in order to improve resource allocation and utilisation. Traffic learning and prediction is a classical field that still yields many meaningful results, and there has been increasing interest in leveraging Machine Learning tools to analyse the total traffic served in a given region and thereby optimise the operation of the network. In this project, we exploit the traffic history to predict the nature and occurrence of future traffic. Furthermore, we classify the traffic into particular application types to increase our understanding of its nature. We pursue three main objectives: classification of the application generating the traffic, prediction of packet arrival intensity, and prediction of burst occurrence. The prediction and classification system uses a Long Short-Term Memory (LSTM) model. The LSTM predictor developed in this experiment returns the number of uplink packets and estimates the probability of burst occurrence in the specified future time interval. For classification, the regression layer in our LSTM prediction model is replaced by a softmax classifier, which assigns the application generating the cellular traffic to one of four classes: surfing, video calling, voice calling, and video streaming.
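The classification head described above can be illustrated with a softmax over four application classes. This is a minimal sketch of the output layer only: the LSTM itself is omitted, and the logits passed in are made-up numbers standing in for whatever the final hidden state would produce.

```python
import math

# Class names taken from the abstract; their order is an assumption.
APPS = ["surfing", "video calling", "voice calling", "video streaming"]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the most probable application and the full distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return APPS[best], probs


# Hypothetical logits for one traffic window.
label, probs = classify([0.1, 2.0, -1.0, 0.5])
print(label)
```

Swapping a regression output for this head changes the training loss as well (typically to cross-entropy), which the abstract does not detail.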
Article 784
Title@2025-07-20 (7): Learning Nonlinear Causal Reductions to Explain Reinforcement Learning Policies
Title: Learning Nonlinear Causal Reductions to Explain Reinforcement Learning Policies | Nichtlineares Erlernen von Ursachenreduktionen zur Erklärung von Maßnahmen zur Stärkung des Lernens | 解释加强学习政策的非线性因果减量 2507.14901v1 |
Authors (5): Armin Kekić, Jan Schneider, Dieter Büchler, Bernhard Schölkopf, Michel Besserve
Why do reinforcement learning (RL) policies fail or succeed? This is a challenging question due to the complex, high-dimensional nature of agent-environment interactions. In this work, we take a causal perspective on explaining the behavior of RL policies by viewing the states, actions, and rewards as variables in a low-level causal model. We introduce random perturbations to policy actions during execution and observe their effects on the cumulative reward, learning a simplified high-level causal model that explains these relationships. To this end, we develop a nonlinear Causal Model Reduction framework that ensures approximate interventional consistency, meaning the simplified high-level model responds to interventions in a similar way as the original complex system. We prove that for a class of nonlinear causal models, there exists a unique solution that achieves exact interventional consistency, ensuring learned explanations reflect meaningful causal patterns. Experiments on both synthetic causal models and practical RL tasks, including pendulum control and robot table tennis, demonstrate that our approach can uncover important behavioral patterns, biases, and failure modes in trained RL policies.
Article 785
Title@2025-07-20 (7): Application-Specific Component-Aware Structured Pruning of Deep Neural Networks via Soft Coefficient Optimization
Title: Application-Specific Component-Aware Structured Pruning of Deep Neural Networks via Soft Coefficient Optimization | Anwendungsspezifische Komponente-Bewusst strukturierte Pruning Deep Neural Networks durch Soft Coefficient Optimization | 通过软合效益优化对深神经网络进行调节 2507.14882v1 |
Authors (4): Ganesh Sundaram, Jonas Ulmen, Amjad Haider, Daniel Görges
Deep neural networks (DNNs) offer significant versatility and performance benefits, but their widespread adoption is often hindered by high model complexity and computational demands. Model compression techniques such as pruning have emerged as promising solutions to these challenges. However, it remains critical to ensure that application-specific performance characteristics are preserved during compression. In structured pruning, where groups of structurally coherent elements are removed, conventional importance metrics frequently fail to maintain these essential performance attributes. In this work, we propose an enhanced importance metric framework that not only reduces model size but also explicitly accounts for application-specific performance constraints. We employ multiple strategies to determine the optimal pruning magnitude for each group, ensuring a balance between compression and task performance. Our approach is evaluated on an autoencoder tasked with reconstructing MNIST images. Experimental results demonstrate that the proposed method effectively preserves task-relevant performance, maintaining the model’s usability even after substantial pruning, by satisfying the required application-specific criteria.
Article 786
Title@2025-07-20 (7): Enhanced Pruning Strategy for Multi-Component Neural Architectures Using Component-Aware Graph Analysis
Title: Enhanced Pruning Strategy for Multi-Component Neural Architectures Using Component-Aware Graph Analysis | Verbesserte Pruning-Strategie für Mehrkomponenten-Neuralarchitekturen unter Verwendung von Komponenten-Aware Graphenanalyse | 利用组件软件图分析,加强多功能神经结构的审慎战略 2504.13296v2 |
Authors (3): Ganesh Sundaram, Jonas Ulmen, Daniel Görges
Deep neural networks (DNNs) deliver outstanding performance, but their complexity often prohibits deployment in resource-constrained settings. Comprehensive structured pruning frameworks based on parameter dependency analysis reduce model size with specific regard to computational performance. When applied to Multi-Component Neural Architectures (MCNAs), however, these frameworks risk compromising network integrity by removing large parameter groups. We introduce a component-aware pruning strategy that extends dependency graphs to isolate individual components and inter-component flows. This creates smaller, targeted pruning groups that conserve functional integrity. Demonstrated on a control task, our approach achieves greater sparsity and reduced performance degradation, opening a path for optimizing complex, multi-component DNNs efficiently.
Article 787
Title@2025-07-20 (7): Neural Flow Samplers with Shortcut Models
Title: Neural Flow Samplers with Shortcut Models | Neural Flow Sampler mit Shortcut-Modellen | 带有快捷模式的神经流样板 2502.07337v2 |
Authors (3): Wuhao Chen, Zijing Ou, Yingzhen Li
Sampling from unnormalized densities presents a fundamental challenge with wide-ranging applications, from posterior inference to molecular dynamics simulations. Continuous flow-based neural samplers offer a promising approach, learning a velocity field that satisfies key principles of marginal density evolution (e.g., the continuity equation) to generate samples. However, this learning procedure requires accurate estimation of intractable terms linked to the computationally challenging partition function, for which existing estimators often suffer from high variance or low accuracy. To overcome this, we introduce an improved estimator for these challenging quantities, employing a velocity-driven Sequential Monte Carlo method enhanced with control variates. Furthermore, we introduce a shortcut consistency model to boost the runtime efficiency of the flow-based neural sampler by minimizing its required sampling steps. Our proposed Neural Flow Shortcut Sampler empirically outperforms existing flow-based neural samplers on both synthetic datasets and complex n-body system targets.
Article 788
Title@2025-07-20 (7): The Tsetlin Machine Goes Deep: Logical Learning and Reasoning With Graphs
Title: The Tsetlin Machine Goes Deep: Logical Learning and Reasoning With Graphs | Die Tsetlin-Maschine geht tief: Logisches Lernen und Nachdenken mit Graphen | Tsetlin 机器深层:逻辑学习和用图表解释 2507.14874v1 |
Authors (14): Ole-Christoffer Granmo, Youmna Abdelwahab, Per-Arne Andersen, Paul F. A. Clarke, Kunal Dumbre, Ylva Grønninsæter, Vojtech Halenka, Runar Helin, Lei Jiao, Ahmed Khalid, Rebekka Omslandseter, Rupsa Saha, Mayur Shende, Xuan Zhang
Pattern recognition with concise and flat AND-rules makes the Tsetlin Machine (TM) both interpretable and efficient, while the power of Tsetlin automata enables accuracy comparable to deep learning on an increasing number of datasets. We introduce the Graph Tsetlin Machine (GraphTM) for learning interpretable deep clauses from graph-structured input. Moving beyond flat, fixed-length input, the GraphTM gets more versatile, supporting sequences, grids, relations, and multimodality. Through message passing, the GraphTM builds nested deep clauses to recognize sub-graph patterns with exponentially fewer clauses, increasing both interpretability and data utilization. For image classification, GraphTM preserves interpretability and achieves 3.86%-points higher accuracy on CIFAR-10 than a convolutional TM. For tracking action coreference, faced with increasingly challenging tasks, GraphTM outperforms other reinforcement learning methods by up to 20.6%-points. In recommendation systems, it tolerates increasing noise to a greater extent than a Graph Convolutional Neural Network (GCN), e.g., for noise ratio 0.1, GraphTM obtains accuracy 89.86% compared to GCN’s 70.87%. Finally, for viral genome sequence data, GraphTM is competitive with BiLSTM-CNN and GCN accuracy-wise, training 2.5x faster than GCN. The GraphTM’s application to these varied fields demonstrates how graph representation learning and deep clauses bring new possibilities for TM learning.
Article 789
Title@2025-07-20 (7): Recent Advances in Simulation-based Inference for Gravitational Wave Data Analysis
Title: Recent Advances in Simulation-based Inference for Gravitational Wave Data Analysis | Jüngste Fortschritte bei der simulationsbasierten Schlussfolgerung für die Analyse von Gravitationswellendaten | 引力波数据分析模拟推导法最近的进展 2507.11192v3 |
Authors (2): Bo Liang, He Wang
The detection of gravitational waves by the LIGO-Virgo-KAGRA collaboration has ushered in a new era of observational astronomy, emphasizing the need for rapid and detailed parameter estimation and population-level analyses. Traditional Bayesian inference methods, particularly Markov chain Monte Carlo, face significant computational challenges when dealing with the high-dimensional parameter spaces and complex noise characteristics inherent in gravitational wave data. This review examines the emerging role of simulation-based inference methods in gravitational wave astronomy, with a focus on approaches that leverage machine-learning techniques such as normalizing flows and neural posterior estimation. We provide a comprehensive overview of the theoretical foundations underlying various simulation-based inference methods, including neural posterior estimation, neural ratio estimation, neural likelihood estimation, flow matching, and consistency models. We explore the applications of these methods across diverse gravitational wave data processing scenarios, from single-source parameter estimation and overlapping signal analysis to testing general relativity and conducting population studies. Although these techniques demonstrate speed improvements over traditional methods in controlled studies, their model-dependent nature and sensitivity to prior assumptions are barriers to their widespread adoption. Their accuracy, which is similar to that of conventional methods, requires further validation across broader parameter spaces and noise conditions.
Article 790
Title@2025-07-20 (7): Transformers and Ensemble methods: A solution for Hate Speech Detection in Arabic languages
Title: Transformers and Ensemble methods: A solution for Hate Speech Detection in Arabic languages | Transformer und Ensemble-Methoden: Eine Lösung für Hass-Spracherkennung in arabischen Sprachen | 变换器和组合方法:用阿拉伯语探测仇恨言论的解决方案 2303.09823v2 |
Authors (4): Angel Felipe Magnossão de Paula, Imene Bensalem, Paolo Rosso, Wajdi Zaghouani
This paper describes our participation in the shared task of hate speech detection, which is one of the subtasks of the CERIST NLP Challenge 2022. Our experiments evaluate the performance of six transformer models and their combination using two ensemble approaches. The best results on the training set, in a five-fold cross-validation scenario, were obtained with the ensemble approach based on majority voting. Evaluating this approach on the test set resulted in an F1-score of 0.60 and an accuracy of 0.86.
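Majority voting over several classifiers can be sketched in a few lines. The example below assumes binary labels (1 for hate speech, 0 otherwise) and resolves ties in favour of label 0; the abstract does not say how ties among the six models were handled, so that rule and the sample predictions are assumptions.

```python
def majority_vote(model_predictions):
    """Combine per-model binary predictions into one label per sample.

    model_predictions: one list of 0/1 labels per model, all the same length.
    A sample is labelled 1 only if strictly more than half the models say 1
    (ties therefore fall back to 0, which is an assumed tie-break rule).
    """
    n_models = len(model_predictions)
    return [1 if 2 * sum(votes) > n_models else 0
            for votes in zip(*model_predictions)]


# Hypothetical outputs of six models on three texts.
predictions = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
]
print(majority_vote(predictions))  # [1, 0, 0]
```

With an even number of models, the tie-break rule matters: the third text above receives three votes each way and is labelled 0 under this sketch.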
Article 791
Title@2025-07-20 (7): A Privacy-Centric Approach: Scalable and Secure Federated Learning Enabled by Hybrid Homomorphic Encryption
Title: A Privacy-Centric Approach: Scalable and Secure Federated Learning Enabled by Hybrid Homomorphic Encryption | Ein Datenschutz-Centric-Ansatz: Skalierbares und sicheres Federated Learning durch hybride homomorphe Verschlüsselung ermöglicht | 隐私中心方法:通过混合单态加密实现可扩展和安全的联邦学习 2507.14853v1 |
Authors (3): Khoa Nguyen, Tanveer Khan, Antonis Michalas
Federated Learning (FL) enables collaborative model training without sharing raw data, making it a promising approach for privacy-sensitive domains. Despite its potential, FL faces significant challenges, particularly in terms of communication overhead and data privacy. Privacy-Preserving Techniques (PPTs) such as Homomorphic Encryption (HE) have been used to mitigate these concerns. However, these techniques introduce substantial computational and communication costs, limiting their practical deployment. In this work, we explore how Hybrid Homomorphic Encryption (HHE), a cryptographic protocol that combines symmetric encryption with HE, can be integrated with FL to address both communication and privacy challenges, paving the way for scalable and secure decentralized learning systems.
Article 792
Title@2025-07-20 (7): Grounding Degradations in Natural Language for All-In-One Video Restoration
Title: Grounding Degradations in Natural Language for All-In-One Video Restoration | Erdungsdegradationen in natürlicher Sprache für die Wiederherstellung eines Video-All-in-One-Videos | 全体一体行动,恢复视频 2507.14851v1 |
Authors (6): Muhammad Kamran Janjua, Amirhosein Ghasemabadi, Kunlin Zhang, Mohammad Salameh, Chao Gao, Di Niu
In this work, we propose an all-in-one video restoration framework that grounds degradation-aware semantic context of video frames in natural language via foundation models, offering interpretable and flexible guidance. Unlike prior art, our method assumes no degradation knowledge at training or test time and learns an approximation to the grounded knowledge, so that the foundation model can be safely detached during inference, adding no extra cost. Further, we call for standardization of benchmarks in all-in-one video restoration, and propose two benchmarks in the multi-degradation setting, three-task (3D) and four-task (4D), and two time-varying composite degradation benchmarks, one of the latter being our proposed dataset with varying snow intensity, simulating how weather degradations affect videos naturally. We compare our method with prior works and report state-of-the-art performance on all benchmarks.
Article 793
Title@2025-07-20 (7): Hierarchical Multi-Agent Reinforcement Learning with Control Barrier Functions for Safety-Critical Autonomous Systems
Title: Hierarchical Multi-Agent Reinforcement Learning with Control Barrier Functions for Safety-Critical Autonomous Systems | Hierarchisches Mehr-Agenten-Verstärkungs-Lernen mit Kontrollbarrierefunktionen für sicherheitskritische autonome Systeme | 具有控制障碍功能的高级多机构强化学习 2507.14850v1 |
Authors (9): H. M. Sabbir Ahmad, Ehsan Sabouni, Alexander Wasilkoff, Param Budhraja, Zijian Guo, Songyuan Zhang, Chuchu Fan, Christos Cassandras, Wenchao Li
We address the problem of safe policy learning in multi-agent safety-critical autonomous systems. In such systems, each agent must meet the safety requirements at all times while also cooperating with other agents to accomplish the task. Toward this end, we propose a safe Hierarchical Multi-Agent Reinforcement Learning (HMARL) approach based on Control Barrier Functions (CBFs). Our proposed hierarchical approach decomposes the overall reinforcement learning problem into two levels: learning joint cooperative behavior at the higher level, and learning safe individual behavior at the lower (agent) level, conditioned on the high-level policy. Specifically, we propose a skill-based HMARL-CBF algorithm in which the higher-level problem involves learning a joint policy over the skills for all the agents and the lower-level problem involves learning policies to execute the skills safely with CBFs. We validate our approach on challenging environment scenarios in which a large number of agents have to safely navigate through conflicting road networks. Compared with existing state-of-the-art methods, our approach significantly improves safety, achieving a near-perfect (within 5%) success/safety rate while also improving performance across all the environments.
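As a rough illustration of how a Control Barrier Function constrains an action, the sketch below filters a nominal control for a one-dimensional single-integrator system (x' = u) that must stay below a position limit. With barrier h(x) = x_max - x, the CBF condition dh/dt >= -alpha * h(x) reduces to u <= alpha * (x_max - x), so the safe action is a simple clamp. The dynamics, the barrier, and the names `cbf_filter`, `x_max`, and `alpha` are all assumptions for illustration; the paper's agents, skills, and constraints are not specified in the abstract.

```python
def cbf_filter(u_nominal, x, x_max, alpha=1.0):
    """Minimally modify u_nominal so that h(x) = x_max - x stays non-negative.

    For x' = u, dh/dt = -u, and the CBF condition -u >= -alpha * h(x)
    gives the upper bound u <= alpha * (x_max - x).
    """
    u_upper = alpha * (x_max - x)
    return min(u_nominal, u_upper)


# Hypothetical rollout: the nominal policy always pushes toward the limit.
x, dt = 0.0, 0.1
for _ in range(50):
    u = cbf_filter(u_nominal=2.0, x=x, x_max=1.0, alpha=1.0)
    x += dt * u
print(x <= 1.0)  # True
```

In higher dimensions the same condition is usually enforced by solving a small quadratic program at every step; the clamp here is the closed-form solution of that program in the scalar case.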
Article 794
Title@2025-07-20 (7): Vector Quantization Prompting for Continual Learning
Title: Vector Quantization Prompting for Continual Learning | Vector Quantization Prompting für kontinuierliches Lernen | 吸引持续学习的矢量量化 2410.20444v2 |
Authors (4): Li Jiao, Qiuxia Lai, Yu Li, Qiang Xu
Continual learning requires overcoming catastrophic forgetting when training a single model on a sequence of tasks. Recent top-performing approaches are prompt-based methods that use a set of learnable parameters (i.e., prompts) to encode task knowledge, from which appropriate ones are selected to guide the fixed pre-trained model in generating features tailored to a certain task. However, existing methods rely on predicting prompt identities for prompt selection, and this identity prediction process cannot be optimized with the task loss. This limitation leads to sub-optimal prompt selection and inadequate adaptation of pre-trained features to a specific task. Previous efforts have tried to address this by directly generating prompts from input queries instead of selecting from a set of candidates. However, these prompts are continuous and lack sufficient abstraction for task knowledge representation, making them less effective for continual learning. To address these challenges, we propose VQ-Prompt, a prompt-based continual learning method that incorporates Vector Quantization (VQ) into end-to-end training of a set of discrete prompts. In this way, VQ-Prompt can optimize the prompt selection process with the task loss while achieving effective abstraction of task knowledge for continual learning. Extensive experiments show that VQ-Prompt outperforms state-of-the-art continual learning methods across a variety of benchmarks under the challenging class-incremental setting. The code is available at https://github.com/jiaolifengmi/VQ-Prompt.
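The core operation of vector quantization is a nearest-neighbour lookup: a continuous vector is replaced by the closest entry in a discrete codebook. The sketch below shows only that lookup, with a hypothetical three-entry codebook standing in for a prompt pool; how VQ-Prompt forms the continuous query and passes gradients through the discrete choice is not described in the abstract and is not modelled here.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(query, codebook):
    """Return (index, entry) of the codebook vector closest to query."""
    index = min(range(len(codebook)),
                key=lambda i: squared_distance(query, codebook[i]))
    return index, codebook[index]


# Hypothetical codebook of three discrete prompts in a 2-D space.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
index, prompt = quantize([0.9, 1.2], codebook)
print(index, prompt)  # 1 [1.0, 1.0]
```

Because the argmin is not differentiable, VQ-based models generally need some gradient estimator or relaxation to train end to end; which one VQ-Prompt uses would have to be checked in the paper or its repository.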
Article 795
Title@2025-07-20 (7): Time-Aware Attention for Enhanced Electronic Health Records Modeling
Title: Time-Aware Attention for Enhanced Electronic Health Records Modeling | Zeitbewusste Aufmerksamkeit für verbesserte elektronische Gesundheitsdatensysteme | 提高电子健康记录强化建模时间意识关注 2507.14847v1 |
Authors (5): Junhan Yu, Zhunyi Feng, Junwei Lu, Tianxi Cai, Doudou Zhou
Electronic Health Records (EHRs) contain valuable clinical information for predicting patient outcomes and guiding healthcare decisions. However, effectively modeling EHRs requires addressing data heterogeneity and complex temporal patterns. Standard approaches often struggle with irregular time intervals between clinical events. We propose TALE-EHR, a Transformer-based framework featuring a novel time-aware attention mechanism that explicitly models continuous temporal gaps to capture fine-grained sequence dynamics. To complement this temporal modeling with robust semantics, TALE-EHR leverages embeddings derived from standardized code descriptions using a pre-trained Large Language Model (LLM), providing a strong foundation for understanding clinical concepts. Experiments on the MIMIC-IV and PIC datasets demonstrate that our approach outperforms state-of-the-art baselines on tasks such as disease progression forecasting. TALE-EHR underscores that integrating explicit, continuous temporal modeling with strong semantic representations provides a powerful solution for advancing EHR analysis.
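One simple way attention can take irregular time gaps into account is to subtract a penalty proportional to the gap from each attention score before the softmax. The sketch below does exactly that. It is a generic illustration and not TALE-EHR's mechanism, which the abstract does not specify; the linear penalty and the `decay` parameter are assumptions.

```python
import math


def time_aware_attention(queries, keys, values, times, decay=0.1):
    """Scaled dot-product attention with a linear time-gap penalty.

    score[i][j] = q_i . k_j / sqrt(d) - decay * |t_i - t_j|
    (the penalty form is an assumed, illustrative choice).
    """
    dim = len(queries[0])
    outputs = []
    for q_i, t_i in zip(queries, times):
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(dim)
                  - decay * abs(t_i - t_j)
                  for k_j, t_j in zip(keys, times)]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs


# Three hypothetical clinical events: two on day 0, one on day 100.
zeros = [[0.0, 0.0]] * 3
out = time_aware_attention(zeros, zeros, [[1.0], [1.0], [0.0]],
                           times=[0.0, 0.0, 100.0], decay=1.0)
print(out[0], out[2])
```

With identical queries and keys, the first event attends almost entirely to the two same-day events, while the distant third event attends almost entirely to itself.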
Article 796
Title@2025-07-20 (7): Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift
Title: Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift | Kalibrierte und robuste Fundamentierungsmodelle für Vision-Sprache und medizinische Bildaufgaben unter Verteilungsverschiebung | 分配变化下的愿景语言和医疗图像任务模型 2507.09222v2 |
Authors (6): Behraj Khan, Tahir Qasim Syed, Nouman M. Durrani, Bilal Naseem, Shabir Ahmad, Rizwan Qureshi
Foundation models like CLIP and SAM have advanced computer vision and medical imaging via low-shot transfer learning, aiding CADD with limited data. However, their deployment faces two key challenges: distribution shift, where pre-training and post-training data distributions differ (e.g., due to inter-center image acquisition), and confidence misalignment, which leads to overconfident errors. These issues surface differently: vision-language models (e.g., CLIP) suffer from 2D embedding shift (image-text misalignment), while medical models (e.g., SAM) encounter 3D domain shifts (e.g., scanner variation) and voxel-wise calibration needs. Existing solutions are domain-specific. We propose StaRFM, a fusion of Fisher information penalty (FIP) and confidence misalignment penalty (CMP) that tackles both challenges. It applies FIP, extended to 3D via patch-wise regularization, to reduce embedding shift, and CMP, reformulated for voxel-level predictions, to calibrate segmentation uncertainty. We derive PAC-Bayes bounds. FIP controls generalization via the Fisher-Rao norm, and CMP reduces calibration error via Brier score minimization. StaRFM surpasses baselines by +3.5% accuracy and 28% lower ECE on 19 vision datasets (e.g., ImageNet, Office-Home), achieves +4.2% DSC over SAM-FT and 4.8mm HD95 on medical benchmarks (e.g., BraTS, ATLAS), and reduces cross-domain gaps by up to 20%. The framework is plug-and-play, requiring minimal architectural changes. Code and models are available at: https://anonymous.4open.science/r/StaRFM-C0CD/
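The two calibration quantities named in the abstract, the Brier score and the expected calibration error (ECE), have standard definitions that can be sketched in a few lines. This is a generic illustration, not the StaRFM implementation: the function names, the choice of 10 equal-width confidence bins, and the toy predictions are assumptions.

```python
def brier_score(probs, labels):
    """Mean over samples of the squared distance between the predicted
    probability vector and the one-hot encoding of the true label."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((p_k - (1.0 if k == y else 0.0)) ** 2 for k, p_k in enumerate(p))
    return total / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Bin predictions by confidence (the max probability), then average
    |accuracy - mean confidence| over bins, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        b = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes to the last bin
        bins[b].append((conf, 1.0 if pred == y else 0.0))
    n = len(probs)
    ece = 0.0
    for items in bins:
        if items:
            avg_conf = sum(c for c, _ in items) / len(items)
            accuracy = sum(a for _, a in items) / len(items)
            ece += len(items) / n * abs(accuracy - avg_conf)
    return ece


# Toy predictions for three samples and two classes (made-up numbers).
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
labels = [0, 1, 1]
print(brier_score(probs, labels), expected_calibration_error(probs, labels))
```

An overconfident wrong prediction, such as the second sample here, raises both numbers, which is the failure mode the abstract calls confidence misalignment.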
Article 797
Title@2025-07-20 (7): The Invisible Leash: Why RLVR May Not Escape Its Origin
Title: The Invisible Leash: Why RLVR May Not Escape Its Origin | Die unsichtbare Leine: Warum RLVR seinem Ursprung nicht entkommen kann | 隐形Leash:为什么RLVR不能逃离其起源 2507.14843v1 |
Authors (5): Fang Wu, Weihao Xuan, Ximing Lu, Zaid Harchaoui, Yejin Choi
Recent advances in large reasoning models highlight Reinforcement Learning with Verifiable Rewards (RLVR) as a promising method for enhancing AI’s capabilities, particularly in solving complex logical tasks. However, it remains unclear whether RLVR truly expands a model’s reasoning boundary or merely amplifies high-reward outputs that the base model already knows for improved precision. This study presents a theoretical and empirical investigation that provides fresh insights into the potential limits of RLVR. First, we offer a new theoretical perspective that RLVR is constrained by the base model’s support (it is unable to sample solutions with zero initial probability) and operates as a conservative reweighting mechanism that may restrict the discovery of entirely original solutions. We also identify an entropy-reward tradeoff: while RLVR reliably enhances precision, it may progressively narrow exploration and potentially overlook correct yet underrepresented solutions. Extensive empirical experiments validate that while RLVR consistently improves pass@1, the shrinkage of empirical support generally outweighs the expansion of empirical support under larger sampling budgets, failing to recover correct answers that were previously accessible to the base model. Interestingly, we also observe that while RLVR sometimes increases token-level entropy, resulting in greater uncertainty at each generation step, answer-level entropy declines, indicating that these seemingly more uncertain paths ultimately converge onto a smaller set of distinct answers. Taken together, these findings reveal potential limits of RLVR in extending reasoning horizons. Breaking this invisible leash may require future algorithmic innovations such as explicit exploration mechanisms or hybrid strategies that seed probability mass into underrepresented solution regions.
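To make the contrast between pass@1 and larger sampling budgets concrete, here is a small sketch of the widely used unbiased pass@k estimator, computed from n samples of which c are correct. Whether the paper uses this exact estimator is an assumption, and the sample counts below are invented for illustration.

```python
from math import comb


def pass_at_k(n, c, k):
    """Probability that at least one of k samples, drawn without replacement
    from n generated samples containing c correct ones, is correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical problem: the base model is right once in 100 samples,
# while a tuned model never samples the correct answer on it.
print(pass_at_k(100, 1, 1), pass_at_k(100, 1, 64))  # base model
print(pass_at_k(100, 0, 1), pass_at_k(100, 0, 64))  # tuned model
```

In this toy case the rare correct answer barely registers at k = 1 but dominates at k = 64, so a model that has lost it from its support cannot catch up however many samples are drawn.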
Article 798
Title@2025-07-20 (7): Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments
Title: Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments | Entschlüsselung des Interplays zwischen Übertragungseffekten und Belohnungsautokorrelationen in Switchback-Experimenten | 在回转实验中解开结转效应与回转回实验中回调自动关系之间的交互作用 2403.17285v6 |
Authors (5): Qianglin Wen, Chengchun Shi, Ying Yang, Niansheng Tang, Hongtu Zhu
A/B testing has become the gold standard for policy evaluation in modern technological industries. Motivated by the widespread use of switchback experiments in A/B testing, this paper conducts a comprehensive comparative analysis of various switchback designs in Markovian environments. Unlike many existing works which derive the optimal design based on specific and relatively simple estimators, our analysis covers a range of state-of-the-art estimators developed in the reinforcement learning (RL) literature. It reveals that the effectiveness of different switchback designs depends crucially on (i) the size of the carryover effect and (ii) the auto-correlations among reward errors over time. Meanwhile, these findings are estimator-agnostic, i.e., they apply to most RL estimators. Based on these insights, we provide a workflow to offer guidelines for practitioners on designing switchback experiments in A/B testing.
Article 799
Title@2025-07-20 (7): An explainable operator approximation framework under the guideline of Green’s function
Title: An explainable operator approximation framework under the guideline of Green’s function | Ein erklärbarer Bediener-Annäherungsrahmen unter der Leitlinie der Green-Funktion | Green职能准则下的可解释的运营商近似近似框架 2412.16644v2 |
Authors (4): Jianghang Gu, Ling Wen, Yuntian Chen, Shiyi Chen
Traditional numerical methods, such as the finite element method and finite volume method, address partial differential equations (PDEs) by discretizing them into algebraic equations and solving these iteratively. However, this process is often computationally expensive and time-consuming. An alternative approach involves transforming PDEs into integral equations and solving them using Green’s functions, which provide analytical solutions. Nevertheless, deriving Green’s functions analytically is a challenging and non-trivial task, particularly for complex systems. In this study, we introduce a novel framework, termed GreensONet, which is constructed based on the structure of deep operator networks (DeepONet) to learn embedded Green’s functions and solve PDEs via Green’s integral formulation. Specifically, the Trunk Net within GreensONet is designed to approximate the unknown Green’s functions of the system, while the Branch Net is utilized to approximate the auxiliary gradients of the Green’s function. These outputs are subsequently employed to perform surface integrals and volume integrals, incorporating user-defined boundary conditions and source terms, respectively. The effectiveness of the proposed framework is demonstrated on three types of PDEs in bounded domains: 3D heat conduction equations, reaction-diffusion equations, and Stokes equations. Comparative results in these cases demonstrate that GreensONet’s accuracy and generalization ability surpass those of existing methods, including Physics-Informed Neural Networks (PINN), DeepONet, Physics-Informed DeepONet (PI-DeepONet), and Fourier Neural Operators (FNO).
Article 800
Title@2025-07-20 (7): Differentially Private Synthetic Graphs Preserving Triangle-Motif Cuts
Title: Differentially Private Synthetic Graphs Preserving Triangle-Motif Cuts | Unterschiedlich Private Synthetische Graphen Vorhalten von Dreieck-Motif-Schnitten | 不同的私人合成图 保持三角-摩蒂夫剪切 2507.14835v1 |
Authors (2): Pan Peng, Hangyu Xu
We study the problem of releasing a differentially private (DP) synthetic graph $G'$ that well approximates the triangle-motif sizes of all cuts of any given graph $G$, where a motif in general refers to a frequently occurring subgraph within complex networks. Non-private versions of such graphs have found applications in diverse fields such as graph clustering, graph sparsification, and social network analysis. Specifically, we present the first $(\varepsilon,\delta)$-DP mechanism that, given an input graph $G$ with $n$ vertices, $m$ edges and local sensitivity of triangles $\ell_{3}(G)$, generates a synthetic graph $G'$ in polynomial time, approximating the triangle-motif sizes of all cuts $(S,V\setminus S)$ of the input graph $G$ up to an additive error of $\tilde{O}(\sqrt{m\ell_{3}(G)}n/\varepsilon^{3/2})$. Additionally, we provide a lower bound of $\Omega(\sqrt{mn}\ell_{3}(G)/\varepsilon)$ on the additive error for any DP algorithm that answers the triangle-motif size queries of all $(S,T)$-cuts of $G$. Finally, our algorithm generalizes to weighted graphs, and our lower bound extends to any $K_h$-motif cut for any constant $h\geq 2$.
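As a non-private illustration of the quantity being approximated, the sketch below counts the triangles that cross a cut $(S, V\setminus S)$ in an unweighted graph. It assumes the usual reading of "triangle-motif size of a cut", namely the number of triangles with at least one vertex on each side; the abstract does not restate the definition, and the function names are made up for this example. No differential privacy is involved here.

```python
def triangles(edges):
    """Return all triangles of an undirected simple graph as sorted vertex triples."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for u, v in edges:
        # Every common neighbour of u and v closes a triangle on the edge (u, v).
        for w in adj[u] & adj[v]:
            found.add(tuple(sorted((u, v, w))))
    return found


def triangle_cut_size(edges, S):
    """Count triangles with at least one vertex inside S and at least one outside."""
    S = set(S)
    return sum(1 for t in triangles(edges) if 0 < len(S & set(t)) < 3)


# Complete graph on four vertices: it contains four triangles.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(triangle_cut_size(k4, {0}), triangle_cut_size(k4, {0, 1}))
```

A synthetic graph $G'$ in the sense of the abstract would have to reproduce this count for every choice of `S` simultaneously, up to the stated additive error.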
Article 801
Title@2025-07-20 (7): Interpretable Reward Modeling with Active Concept Bottlenecks
Title: Interpretable Reward Modeling with Active Concept Bottlenecks | Interpretierbare Prämienmodellierung mit Active Concept Engpässen | 具有主动概念瓶颈的可解释的奖励模型 2507.04695v2 |
Authors (4): Sonia Laguna, Katarzyna Kobalczyk, Julia E. Vogt, Mihaela Van der Schaar
We introduce Concept Bottleneck Reward Models (CB-RM), a reward modeling framework that enables interpretable preference learning through selective concept annotation. Unlike standard RLHF methods that rely on opaque reward functions, CB-RM decomposes reward prediction into human-interpretable concepts. To make this framework efficient in low-supervision settings, we formalize an active learning strategy that dynamically acquires the most informative concept labels. We propose an acquisition function based on Expected Information Gain and show that it significantly accelerates concept learning without compromising preference accuracy. Evaluated on the UltraFeedback dataset, our method outperforms baselines in interpretability and sample efficiency, marking a step towards more transparent, auditable, and human-aligned reward models.
Article 802
Title@2025-07-20 (7): eMargin: Revisiting Contrastive Learning with Margin-Based Separation
Title: eMargin: Revisiting Contrastive Learning with Margin-Based Separation | eMargin: Kontrastives Lernen mit Marge-basierter Trennung | eMargin: 重新审查与边际离职的矛盾学习 2507.14828v1 |
Authors (3): Abdul-Kazeem Shamba, Kerstin Bach, Gavin Taylor
We revisit previous contrastive learning frameworks to investigate the effect of introducing an adaptive margin into the contrastive loss function for time series representation learning. Specifically, we explore whether an adaptive margin (eMargin), adjusted based on a predefined similarity threshold, can improve the separation between adjacent but dissimilar time steps and subsequently lead to better performance in downstream tasks. Our study evaluates the impact of this modification on clustering performance and classification in three benchmark datasets. Our findings, however, indicate that achieving high scores on unsupervised clustering metrics does not necessarily imply that the learned embeddings are meaningful or effective in downstream tasks. To be specific, eMargin added to InfoNCE consistently outperforms state-of-the-art baselines in unsupervised clustering metrics, but struggles to achieve competitive results in downstream classification with linear probing. The source code is publicly available at https://github.com/sfi-norwai/eMargin.
Article 803
Title@2025-07-20 (7): Efficient Visual Transformer by Learnable Token Merging
Title: Efficient Visual Transformer by Learnable Token Merging | Effizienter Visual Transformer durch erlernbares Token Merging | 以学习 Tok 合并方式高效视觉变形器 2407.15219v2 |
Authors (2): Yancheng Wang, Yingzhen Yang
Self-attention and transformers have been widely used in deep learning. Recent efforts have been devoted to incorporating transformer blocks into different neural architectures, including those with convolutions, leading to various visual transformers for computer vision tasks. In this paper, we propose a novel and compact transformer block, Transformer with Learnable Token Merging (LTM), or LTM-Transformer. LTM-Transformer performs token merging in a learnable scheme. LTM-Transformer is compatible with many popular and compact transformer networks, and it reduces the FLOPs and the inference time of the visual transformers while maintaining or even improving the prediction accuracy. In the experiments, we replace all the transformer blocks in popular visual transformers, including MobileViT, EfficientViT, ViT, and Swin, with LTM-Transformer blocks, leading to LTM-Transformer networks with different backbones. The LTM-Transformer is motivated by reduction of Information Bottleneck, and a novel and separable variational upper bound for the IB loss is derived. The architecture of the mask module in our LTM blocks, which generates the token merging mask, is designed to reduce the derived upper bound for the IB loss. Extensive results on computer vision tasks evidence that LTM-Transformer renders compact and efficient visual transformers with comparable or much better prediction accuracy than the original visual transformers. The code of the LTM-Transformer is available at https://github.com/Statistical-Deep-Learning/LTM
Article 804
Title@2025-07-20 (7): Benchmarking Foundation Models with Multimodal Public Electronic Health Records
Title: Benchmarking Foundation Models with Multimodal Public Electronic Health Records | Benchmarking-Stiftungsmodelle mit multimodalen Public Electronic Health-Datensätzen | 采用多式公共电子健康记录模式的基准基础模型 2507.14824v1 |
Authors (9): Kunyu Yu, Rui Yang, Jingchi Liao, Siqi Li, Huitao Li, Irene Li, Yifan Peng, Rishikesan Kamaleswaran, Nan Liu
Foundation models have emerged as a powerful approach for processing electronic health records (EHRs), offering flexibility to handle diverse medical data modalities. In this study, we present a comprehensive benchmark that evaluates the performance, fairness, and interpretability of foundation models, both as unimodal encoders and as multimodal learners, using the publicly available MIMIC-IV database. To support consistent and reproducible evaluation, we developed a standardized data processing pipeline that harmonizes heterogeneous clinical records into an analysis-ready format. We systematically compared eight foundation models, encompassing both unimodal and multimodal models, as well as domain-specific and general-purpose variants. Our findings demonstrate that incorporating multiple data modalities leads to consistent improvements in predictive performance without introducing additional bias. Through this benchmark, we aim to support the development of effective and trustworthy multimodal artificial intelligence (AI) systems for real-world clinical applications. Our code is available at https://github.com/nliulab/MIMIC-Multimodal.
Article 805
Title@2025-07-20 (7): A Near-Optimal Single-Loop Stochastic Algorithm for Convex Finite-Sum Coupled Compositional Optimization
Title: A Near-Optimal Single-Loop Stochastic Algorithm for Convex Finite-Sum Coupled Compositional Optimization | Ein nahezu optimaler Single-Loop-Stochastischer Algorithmus für Convex-Finite-Sum-gekoppelte kompositorische Optimierung | 近于最佳的、 精度- Sum 组合构成优化的近最佳单极单极托盘算法 2312.02277v6 |
Authors (2): Bokun Wang, Tianbao Yang
This paper studies a class of convex Finite-sum Coupled Compositional Optimization (cFCCO) problems with applications including group distributionally robust optimization (GDRO) and learning with imbalanced data. To better address these problems, we introduce an efficient single-loop primal-dual block-coordinate stochastic algorithm called ALEXR. The algorithm employs block-coordinate stochastic mirror ascent with extrapolation for the dual variable and stochastic proximal gradient descent updates for the primal variable. We establish the convergence rates of ALEXR in both convex and strongly convex cases under smoothness and non-smoothness conditions of involved functions, which not only improve the best rates in previous works on smooth cFCCO problems but also expand the realm of cFCCO for solving more challenging non-smooth problems such as the dual form of GDRO. Finally, we derive lower complexity bounds, demonstrating the (near-)optimality of ALEXR within a broad class of stochastic algorithms for cFCCO. Experimental results on GDRO and partial Area Under the ROC Curve (pAUC) maximization demonstrate the promising performance of our algorithm.
Article 806
Title@2025-07-20 (7): Transaction Profiling and Address Role Inference in Tokenized U.S. Treasuries
Title: Transaction Profiling and Address Role Inference in Tokenized U.S. Treasuries | Transaktion Profilierung und Adresse Rolle Inferenz in Tokenized US Treasuries | 美国金融债券中的交易分析和处理角色推断 2507.14808v1 |
Authors (5): Junliang Luo, Katrin Tinn, Samuel Ferreira Duran, Di Wu, Xue Liu
Tokenized U.S. Treasuries have emerged as a prominent subclass of real-world assets (RWAs), offering cryptographically enforced, yield-bearing instruments collateralized by sovereign debt and deployed across multiple blockchain networks. While the market has expanded rapidly, empirical analyses of transaction-level behaviour remain limited. This paper conducts a quantitative, function-level dissection of U.S. Treasury-backed RWA tokens, including BUIDL, BENJI, and USDY, across multiple chains, mostly Ethereum and Layer-2 networks. We analyze decoded contract calls to isolate core functional primitives such as issuance, redemption, transfer, and bridge activity, revealing segmentation in behaviour between institutional actors and retail users. To model address-level economic roles, we introduce a curvature-aware representation learning framework using Poincaré embeddings and liquidity-based graph features. Our method outperforms baseline models on our RWA Treasury dataset in role inference and generalizes to downstream tasks such as anomaly detection and wallet classification in broader blockchain transaction networks. These findings provide a structured understanding of functional heterogeneity and participant roles in tokenized Treasuries from a transaction-level perspective, contributing new empirical evidence to the study of on-chain financialization.
Article 807
Title@2025-07-20 (7): Subliminal Learning: Language models transmit behavioral traits via hidden signals in data
Title: Subliminal Learning: Language models transmit behavioral traits via hidden signals in data | Subliminales Lernen: Sprachmodelle übertragen Verhaltensmerkmale über versteckte Signale in Daten | 潜质学习:语言模式通过数据中隐藏的信号传递行为特征 2507.14805v1 |
Authors (8): Alex Cloud, Minh Le, James Chua, Jan Betley, Anna Sztyber-Betley, Jacob Hilton, Samuel Marks, Owain Evans
We study subliminal learning, a surprising phenomenon where language models transmit behavioral traits via semantically unrelated data. In our main experiments, a “teacher” model with some trait T (such as liking owls or being misaligned) generates a dataset consisting solely of number sequences. Remarkably, a “student” model trained on this dataset learns T. This occurs even when the data is filtered to remove references to T. We observe the same effect when training on code or reasoning traces generated by the same teacher model. However, we do not observe the effect when the teacher and student have different base models. To help explain our findings, we prove a theoretical result showing that subliminal learning occurs in all neural networks under certain conditions, and demonstrate subliminal learning in a simple MLP classifier. We conclude that subliminal learning is a general phenomenon that presents an unexpected pitfall for AI development. Distillation could propagate unintended traits, even when developers try to prevent this via data filtering.
Article 808
Title@2025-07-20 (7): Lizard: An Efficient Linearization Framework for Large Language Models
Title: Lizard: An Efficient Linearization Framework for Large Language Models | Lizard: Ein effizienter Linearisierungsrahmen für große Sprachmodelle | Lizard:大型语言模型的高效线性框架 2507.09025v2 |
Authors (12): Chien Van Nguyen, Ruiyi Zhang, Hanieh Deilamsalehy, Puneet Mathur, Viet Dac Lai, Haoliang Wang, Jayakumar Subramanian, Ryan A. Rossi, Trung Bui, Nikos Vlassis, Franck Dernoncourt, Thien Huu Nguyen
We propose Lizard, a linearization framework that transforms pretrained Transformer-based Large Language Models (LLMs) into flexible, subquadratic architectures for infinite-context generation. Transformer-based LLMs face significant memory and computational bottlenecks as context lengths increase, due to the quadratic complexity of softmax attention and the growing key-value (KV) cache. Lizard addresses these limitations by introducing a subquadratic attention mechanism that closely approximates softmax attention while preserving the output quality. Unlike previous linearization methods, which are often limited by fixed model structures and therefore exclude gating mechanisms, Lizard incorporates a gating module inspired by recent state-of-the-art linear models. This enables adaptive memory control, supports constant-memory inference, offers strong length generalization, and allows more flexible model design. Lizard combines gated linear attention for global context compression with sliding window attention enhanced by meta memory, forming a hybrid mechanism that captures both long-range dependencies and fine-grained local interactions. Moreover, we introduce a hardware-aware algorithm that accelerates the training speed of our models. Extensive experiments show that Lizard achieves near-lossless recovery of the teacher model’s performance across standard language modeling tasks, while significantly outperforming previous linearization methods. On the 5-shot MMLU benchmark, Lizard improves over prior models by 18 points and shows significant improvements on associative recall tasks.
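The constant-memory claim can be illustrated with the generic gated linear-attention recurrence, in which a fixed-size state matrix replaces the growing key-value cache. This is a simplified sketch under stated assumptions and not Lizard's actual module: it uses a single scalar gate per step, omits normalization, and leaves out the sliding-window attention and meta memory mentioned in the abstract. All names are hypothetical.

```python
def gated_linear_attention(queries, keys, values, gates):
    """Process a sequence step by step with a d_k x d_v state matrix.

    state_t = gate_t * state_{t-1} + outer(key_t, value_t)
    out_t   = query_t @ state_t
    """
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v, g in zip(queries, keys, values, gates):
        # Decay the old memory by the gate, then write the new key/value pair.
        state = [[g * state[i][j] + k[i] * v[j] for j in range(d_v)] for i in range(d_k)]
        # Read from memory with the query.
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs


# Two steps with 2-dimensional keys and 1-dimensional values (toy numbers).
out = gated_linear_attention(
    queries=[[1.0, 0.0], [1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0], [4.0]],
    gates=[1.0, 0.5],
)
print(out)
```

The state has the same size no matter how many tokens have been seen, and a gate below 1 lets the model forget old content, which is one common way to obtain the adaptive memory control the abstract describes.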
Article 809
Title@2025-07-20 (7): Robust Local Polynomial Regression with Similarity Kernels
Title: Robust Local Polynomial Regression with Similarity Kernels | Robuste lokale polynomische Regression mit Ähnlichkeitskernen | 具有相似内核的强力局部多面回归 2501.10729v2 |
Authors (1): Yaniv Shulman
Local Polynomial Regression (LPR) is a widely used nonparametric method for modeling complex relationships due to its flexibility and simplicity. It estimates a regression function by fitting low-degree polynomials to localized subsets of the data, weighted by proximity. However, traditional LPR is sensitive to outliers and high-leverage points, which can significantly affect estimation accuracy. This paper revisits the kernel function used to compute regression weights and proposes a novel framework that incorporates both predictor and response variables in the weighting mechanism. The focus of this work is a conditional density kernel that robustly estimates weights by mitigating the influence of outliers through localized density estimation. A related joint density kernel is also discussed in an appendix. The proposed method is implemented in Python and is publicly available at https://github.com/yaniv-shulman/rsklpr, demonstrating competitive performance in synthetic benchmark experiments. Compared to standard LPR, the proposed approach consistently improves robustness and accuracy, especially in heteroscedastic and noisy environments, without requiring multiple iterations. This advancement provides a promising extension to traditional LPR, opening new possibilities for robust regression applications.
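For orientation, the standard (non-robust) baseline that the abstract starts from can be sketched as a degree-1 local fit with Gaussian proximity weights. This is not the proposed method and does not use the `rsklpr` package API: the weights here depend on the predictor only, and the function name, kernel choice, and data are assumptions for illustration.

```python
import math


def local_linear_fit(x_train, y_train, x0, bandwidth):
    """Fit a weighted least-squares line around x0 and return its value at x0.

    Weights are Gaussian in the distance |x - x0| and ignore the response.
    """
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in x_train]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, x_train)) / sw  # weighted mean of x
    my = sum(wi * y for wi, y in zip(w, y_train)) / sw  # weighted mean of y
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, x_train))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, x_train, y_train))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
clean = [2.0 * x + 1.0 for x in xs]
noisy = clean[:2] + [50.0] + clean[3:]  # one outlier at x = 2
print(local_linear_fit(xs, clean, 2.5, 1.0), local_linear_fit(xs, noisy, 2.5, 1.0))
```

Because the weights never look at the response, the single outlier pulls the second estimate well away from the true line. That sensitivity is what motivates a kernel that also takes the response variable into account.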
Article 810
Title@2025-07-20 (7): Composing Linear Layers from Irreducibles
Title: Composing Linear Layers from Irreducibles | Das Komponieren von linearen Schichten aus Irreduzierbaren | 将来自不灵异的线性图层合成成线性图层 2507.11688v2 |
Authors (3): Travis Pence, Daisuke Yamada, Vikas Singh
Contemporary large models often exhibit behaviors suggesting the presence of low-level primitives that compose into modules with richer functionality, but these fundamental building blocks remain poorly understood. We investigate this compositional structure in linear layers by asking: can we identify/synthesize linear transformations from a minimal set of geometric primitives? Using Clifford algebra, we show that linear layers can be expressed as compositions of bivectors – geometric objects encoding oriented planes – and introduce a differentiable algorithm that decomposes them into products of rotors. This construction uses only O(log^2 d) parameters, versus O(d^2) required by dense matrices. Applied to the key, query, and value projections in LLM attention layers, our rotor-based layers match the performance of strong baselines such as block-Hadamard and low-rank approximations. Our findings provide an algebraic perspective on how these geometric primitives can compose into higher-level functions within deep models.
Article 811
Title@2025-07-20 (7): NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection
Title: NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection | NegRefine: Verfeinerung negativer Label-basierter Zero-Shot-OOD-Erkennung | NegRefine: 改进以标签为基的零热 OOOD 检测 2507.09795v2 |
Authors (3): Amirhossein Ansari, Ke Wang, Pulei Xiong
Recent advancements in Vision-Language Models like CLIP have enabled zero-shot OOD detection by leveraging both image and textual label information. Among these, negative label-based methods such as NegLabel and CSP have shown promising results by utilizing a lexicon of words to define negative labels for distinguishing OOD samples. However, these methods suffer from detecting in-distribution samples as OOD due to negative labels that are subcategories of in-distribution labels or proper nouns. They also face limitations in handling images that match multiple in-distribution and negative labels. We propose NegRefine, a novel negative label refinement framework for zero-shot OOD detection. By introducing a filtering mechanism to exclude subcategory labels and proper nouns from the negative label set and incorporating a multi-matching-aware scoring function that dynamically adjusts the contributions of multiple labels matching an image, NegRefine ensures a more robust separation between in-distribution and OOD samples. We evaluate NegRefine on large-scale benchmarks, including ImageNet-1K. The code is available at https://github.com/ah-ansari/NegRefine.
Article 812
Title@2025-07-20 (7): Flow Equivariant Recurrent Neural Networks
Title: Flow Equivariant Recurrent Neural Networks | Strömungsgleiche recurrente neurale Netzwerke | 流动等量经常经常神经网络 2507.14793v1 |
Authors (1): T. Anderson Keller
Data arrives at our senses as a continuous stream, smoothly transforming from one instant to the next. These smooth transformations can be viewed as continuous symmetries of the environment that we inhabit, defining equivalence relations between stimuli over time. In machine learning, neural network architectures that respect symmetries of their data are called equivariant and have provable benefits in terms of generalization ability and sample efficiency. To date, however, equivariance has been considered only for static transformations and feed-forward networks, limiting its applicability to sequence models, such as recurrent neural networks (RNNs), and corresponding time-parameterized sequence transformations. In this work, we extend equivariant network theory to this regime of “flows” – one-parameter Lie subgroups capturing natural transformations over time, such as visual motion. We begin by showing that standard RNNs are generally not flow equivariant: their hidden states fail to transform in a geometrically structured manner for moving stimuli. We then show how flow equivariance can be introduced, and demonstrate that these models significantly outperform their non-equivariant counterparts in terms of training speed, length generalization, and velocity generalization, on both next step prediction and sequence classification. We present this work as a first step towards building sequence models that respect the time-parameterized symmetries which govern the world around us.
Article 813
Title@2025-07-20 (7): Exploring the In-Context Learning Capabilities of LLMs for Money Laundering Detection in Financial Graphs
Title: Exploring the In-Context Learning Capabilities of LLMs for Money Laundering Detection in Financial Graphs | Erforschung der In-Context-Learning-Fähigkeiten von LLMs für Geldwäscheerkennung in Finanzgraphen | 探索金融图中洗钱侦查LLMs的学习能力 2507.14785v1 |
Authors (1): Erfan Pirmorad
The complexity and interconnectivity of entities involved in money laundering demand investigative reasoning over graph-structured data. This paper explores the use of large language models (LLMs) as reasoning engines over localized subgraphs extracted from a financial knowledge graph. We propose a lightweight pipeline that retrieves k-hop neighborhoods around entities of interest, serializes them into structured text, and prompts an LLM via few-shot in-context learning to assess suspiciousness and generate justifications. Using synthetic anti-money laundering (AML) scenarios that reflect common laundering behaviors, we show that LLMs can emulate analyst-style logic, highlight red flags, and provide coherent explanations. While this study is exploratory, it illustrates the potential of LLM-based graph reasoning in AML and lays groundwork for explainable, language-driven financial crime analytics.
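The retrieval and serialization steps of such a pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: transactions are assumed to be `(source, destination, amount)` tuples, hops are counted on the undirected graph, and the text template is invented. The few-shot LLM call itself is left out.

```python
from collections import deque


def k_hop_subgraph(edges, seed, k):
    """Return the edges whose endpoints both lie within k hops of `seed`."""
    adj = {}
    for src, dst, _amount in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    depth = {seed: 0}
    queue = deque([seed])
    while queue:  # breadth-first search, cut off at depth k
        node = queue.popleft()
        if depth[node] == k:
            continue
        for neighbour in adj.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return [e for e in edges if e[0] in depth and e[1] in depth]


def serialize(edges):
    """Turn a list of transactions into one line of structured text each."""
    return "\n".join(f"{src} -> {dst}: {amount:.2f}" for src, dst, amount in edges)


# Made-up transactions; E -> F is unrelated to account A.
transactions = [("A", "B", 100.0), ("B", "C", 95.0), ("C", "D", 90.0), ("E", "F", 10.0)]
print(serialize(k_hop_subgraph(transactions, "A", 2)))
```

The serialized text would then be placed in a prompt after a handful of labeled example subgraphs. Keeping `k` small bounds the prompt length, at the cost of missing longer laundering chains.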
Article 814
Title@2025-07-20 (7): Video-based Exercise Classification and Activated Muscle Group Prediction with Hybrid X3D-SlowFast Network
Title: Video-based Exercise Classification and Activated Muscle Group Prediction with Hybrid X3D-SlowFast Network | Videobasierte Trainingsklassifikation und Aktivierung der Muskelgruppenvorhersage mit Hybrid X3D-SlowFast Netzwerk | 与混合X3D-低速网络的视频作业分类和启动式肌肉组预测 2406.06703v2 |
Authors (2): Manvik Pasula, Pramit Saha
This paper introduces a simple yet effective strategy for exercise classification and muscle group activation prediction (MGAP). These tasks have significant implications for personal fitness, facilitating more affordable, accessible, safer, and simpler exercise routines. This is particularly relevant for novices and individuals with disabilities. Previous research in the field is mostly dominated by the reliance on mounted sensors and a limited scope of exercises, reducing practicality for everyday use. Furthermore, existing MGAP methodologies suffer from a similar dependency on sensors and a restricted range of muscle groups, often excluding strength training exercises, which are pivotal for a comprehensive fitness regimen. Addressing these limitations, our research employs a video-based deep learning framework that encompasses a broad spectrum of exercises and muscle groups, including those vital for strength training. Utilizing the “Workout/Exercises Video” dataset, our approach integrates the X3D and SlowFast video activity recognition models in an effective way to enhance exercise classification and MGAP performance. Our findings demonstrate that this hybrid method, obtained via weighted ensemble, outperforms existing baseline models in accuracy. Pretrained models play a crucial role in enhancing overall performance, with optimal channel reduction values for the SlowFast model identified near 10. Through an ablation study that explores fine-tuning, we further elucidate the interrelation between the two tasks. Our composite model, a weighted-average ensemble of X3D and SlowFast, sets a new benchmark in both exercise classification and MGAP across all evaluated categories, offering a robust solution to the limitations of previous approaches.
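A weighted-average ensemble of two classifiers reduces to a convex combination of their class-probability vectors. The sketch below shows that step only; the weight, the class probabilities, and the function names are illustrative assumptions rather than values from the paper.

```python
def weighted_ensemble(probs_a, probs_b, weight_a=0.5):
    """Convex combination of two class-probability vectors of equal length."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(probs_a, probs_b)]


def predict(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical outputs of the two video models for one clip, three classes.
x3d_probs = [0.6, 0.3, 0.1]
slowfast_probs = [0.2, 0.7, 0.1]
combined = weighted_ensemble(x3d_probs, slowfast_probs, weight_a=0.4)
print(combined, predict(combined))
```

In practice the weight would be chosen on a validation split. For a multi-label task such as muscle group prediction, the same averaging can be applied to per-label scores before thresholding.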
Article 815
Title@2025-07-20 (7): Uncertainty Quantification for Machine Learning-Based Prediction: A Polynomial Chaos Expansion Approach for Joint Model and Input Uncertainty Propagation
Title: Uncertainty Quantification for Machine Learning-Based Prediction: A Polynomial Chaos Expansion Approach for Joint Model and Input Uncertainty Propagation | Ungewissheitsquantifizierung für Machine Learning-based Prediction: Ein polynomialer Chaos-Expansionsansatz für gemeinsame Modell- und Input-Unsicherheitspropagation | 机械学习预测的不确定性量化:用于联合示范和投入不确定性传播的多元混乱扩大办法 2507.14782v1 |
Authors (1): Xiaoping Du
Machine learning (ML) surrogate models are increasingly used in engineering analysis and design to replace computationally expensive simulation models, significantly reducing computational cost and accelerating decision-making processes. However, ML predictions contain inherent errors, often estimated as model uncertainty, which is coupled with variability in model inputs. Accurately quantifying and propagating these combined uncertainties is essential for generating reliable engineering predictions. This paper presents a robust framework based on Polynomial Chaos Expansion (PCE) to handle joint input and model uncertainty propagation. While the approach applies broadly to general ML surrogates, we focus on Gaussian Process regression models, which provide explicit predictive distributions for model uncertainty. By transforming all random inputs into a unified standard space, a PCE surrogate model is constructed, allowing efficient and accurate calculation of the mean and standard deviation of the output. The proposed methodology also offers a mechanism for global sensitivity analysis, enabling the accurate quantification of the individual contributions of input variables and ML model uncertainty to the overall output variability. This approach provides a computationally efficient and interpretable framework for comprehensive uncertainty quantification, supporting trustworthy ML predictions in downstream engineering applications.
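How a PCE yields the output mean and standard deviation can be sketched in one dimension. This is a simplified illustration, assuming a single standard-normal input and a probabilists' Hermite basis; the quadratic test function is hypothetical, and the paper's joint treatment of input and model uncertainty is not reproduced here.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He

def pce_coefficients(f, degree, n_quad=20):
    """Project f(xi), xi ~ N(0, 1), onto He_0..He_degree by Gauss quadrature."""
    nodes, weights = He.hermegauss(n_quad)
    weights = weights / math.sqrt(2.0 * math.pi)  # turn the rule into an N(0,1) expectation
    fx = f(nodes)
    coeffs = []
    for k in range(degree + 1):
        basis = He.hermeval(nodes, [0.0] * k + [1.0])  # He_k evaluated at the nodes
        # E[He_k^2] = k!, so divide the projection by k!
        coeffs.append(np.sum(weights * fx * basis) / math.factorial(k))
    return np.array(coeffs)

def pce_mean_std(coeffs):
    """Mean is the zeroth coefficient; variance sums c_k^2 * k! over k >= 1."""
    mean = coeffs[0]
    var = sum(c ** 2 * math.factorial(k) for k, c in enumerate(coeffs) if k > 0)
    return mean, math.sqrt(var)

# Hypothetical surrogate output y = 1 + 2*xi + 0.5*xi^2.
coeffs = pce_coefficients(lambda xi: 1.0 + 2.0 * xi + 0.5 * xi ** 2, degree=3)
mean, std = pce_mean_std(coeffs)
```

For this test function the exact values are a mean of 1.5 and a variance of 4.5, which the expansion recovers because the function is itself a degree-2 polynomial.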
Article 816
Title@2025-07-20 (7): A Mathematical Framework and a Suite of Learning Techniques for Neural-Symbolic Systems
Title: A Mathematical Framework and a Suite of Learning Techniques for Neural-Symbolic Systems | Ein mathematischer Rahmen und eine Suite von Lerntechniken für neural-symbolische Systeme | 神经-交响系统数学框架和学习技术套件 2407.09693v2 |
Authors (8): Charles Dickens, Connor Pryor, Changyu Gao, Alon Albalak, Eriq Augustine, William Wang, Stephen Wright, Lise Getoor
The field of Neural-Symbolic (NeSy) systems is growing rapidly. Proposed approaches show great promise in achieving symbiotic unions of neural and symbolic methods. However, a unifying framework is needed to organize common NeSy modeling patterns and develop general learning approaches. In this paper, we introduce Neural-Symbolic Energy-Based Models (NeSy-EBMs), a unifying mathematical framework for discriminative and generative NeSy modeling. Importantly, NeSy-EBMs allow the derivation of general expressions for gradients of prominent learning losses, and we introduce a suite of four learning approaches that leverage methods from multiple domains, including bilevel and stochastic policy optimization. Finally, we ground the NeSy-EBM framework with Neural Probabilistic Soft Logic (NeuPSL), an open-source NeSy-EBM library designed for scalability and expressivity, facilitating the real-world application of NeSy systems. Through extensive empirical analysis across multiple datasets, we demonstrate the practical advantages of NeSy-EBMs in various tasks, including image classification, graph node labeling, autonomous vehicle situation awareness, and question answering.
Article 817
Title@2025-07-20 (7): Optimal Task Order for Continual Learning of Multiple Tasks
Title: Optimal Task Order for Continual Learning of Multiple Tasks | Optimale Auftragsreihenfolge für kontinuierliches Lernen mehrerer Aufgaben | 继续不断学习多种任务的最佳任务顺序 2502.03350v2 |
Authors (2): Ziyan Li, Naoki Hiratani
Continual learning of multiple tasks remains a major challenge for neural networks. Here, we investigate how task order influences continual learning and propose a strategy for optimizing it. Leveraging a linear teacher-student model with latent factors, we derive an analytical expression relating task similarity and ordering to learning performance. Our analysis reveals two principles that hold under a wide parameter range: (1) tasks should be arranged from the least representative to the most typical, and (2) adjacent tasks should be dissimilar. We validate these rules on both synthetic data and real-world image classification datasets (Fashion-MNIST, CIFAR-10, CIFAR-100), demonstrating consistent performance improvements in both multilayer perceptrons and convolutional neural networks. Our work thus presents a generalizable framework for task-order optimization in task-incremental continual learning.
Article 818
Title@2025-07-20 (7): MultiKernelBench: A Multi-Platform Benchmark for Kernel Generation
Title: MultiKernelBench: A Multi-Platform Benchmark for Kernel Generation | MultiKernelBench: Ein Multi-Platform Benchmark für die Kernel-Generation | 多KenneelBench: 核心生成的多平台基准 2507.17773v1 |
Authors (6): Zhongzhen Wen, Yinghui Zhang, Zhong Li, Zhongxin Liu, Linna Xie, Tian Zhang
The automatic generation of deep learning (DL) kernels using large language models (LLMs) has emerged as a promising approach to reduce the manual effort and hardware-specific expertise required for writing high-performance operator implementations. However, existing benchmarks for evaluating LLMs in this domain suffer from limited hardware support, coarse-grained kernel categorization, and imbalanced task coverage. To address these limitations, we introduce MultiKernelBench, the first comprehensive, multi-platform benchmark for LLM-based DL kernel generation. MultiKernelBench spans 285 tasks across 14 well-defined kernel categories and supports three major hardware platforms: Nvidia GPUs, Huawei NPUs, and Google TPUs. To enable future extensibility, we design a modular backend abstraction layer that decouples platform-specific logic from the core benchmarking infrastructure, allowing easy integration of new hardware platforms. We further propose a simple yet effective category-aware one-shot prompting method that improves generation quality by providing in-category exemplars. Through systematic evaluations of seven state-of-the-art LLMs, we reveal significant variation in task difficulty, poor generalization to platforms with less training exposure, and the effectiveness of targeted prompting strategies. MultiKernelBench is publicly available at https://github.com/wzzll123/MultiKernelBench.
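The idea of category-aware one-shot prompting can be sketched as follows. Everything here is hypothetical: the exemplar table, the category names, and the prompt wording are illustrative placeholders and are not taken from the MultiKernelBench repository.

```python
# Hypothetical store: one reference kernel per category (placeholder text only).
EXEMPLARS = {
    "activation": ("ReLU", "// reference ReLU kernel goes here"),
    "reduction": ("Sum", "// reference sum-reduction kernel goes here"),
}

def build_prompt(task_name, category, platform):
    """Build a one-shot prompt whose single exemplar shares the task's category."""
    exemplar_name, exemplar_code = EXEMPLARS[category]
    return (
        f"Target platform: {platform}\n"
        f"Example from category '{category}': {exemplar_name}\n"
        f"{exemplar_code}\n\n"
        f"Now write a {platform} kernel for: {task_name}\n"
    )

prompt = build_prompt("Sigmoid", "activation", "CUDA")
```

The only design point illustrated is that the exemplar is looked up by category rather than fixed for all tasks.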
Article 819
Title@2025-07-20 (7): HI-PMK: A Data-Dependent Kernel for Incomplete Heterogeneous Data Representation
Title: HI-PMK: A Data-Dependent Kernel for Incomplete Heterogeneous Data Representation | HI-PMK: Ein Data-Dependent-Kernel für unvollständige heterogene Datendarstellung | HI-PMK:一个数据依赖核心,用于不完全异基因数据代表 2501.04300v2 |
Authors (4): Youran Zhou, Mohamed Reda Bouadjenek, Jonathan Wells, Sunil Aryal
Handling incomplete and heterogeneous data remains a central challenge in real-world machine learning, where missing values may follow complex mechanisms (MCAR, MAR, MNAR) and features can be of mixed types (numerical and categorical). Existing methods often rely on imputation, which may introduce bias or privacy risks, or fail to jointly address data heterogeneity and structured missingness. We propose the Heterogeneous Incomplete Probability Mass Kernel (HI-PMK), a novel data-dependent representation learning approach that eliminates the need for imputation. HI-PMK introduces two key innovations: (1) a probability mass-based dissimilarity measure that adapts to local data distributions across heterogeneous features (numerical, ordinal, nominal), and (2) a missingness-aware uncertainty strategy (MaxU) that conservatively handles all three missingness mechanisms by assigning maximal plausible dissimilarity to unobserved entries. Our approach is privacy-preserving, scalable, and readily applicable to downstream tasks such as classification and clustering. Extensive experiments on over 15 benchmark datasets demonstrate that HI-PMK consistently outperforms traditional imputation-based pipelines and kernel methods across a wide range of missing data settings. Code is available at: https://github.com/echoid/Incomplete-Heter-Kernel
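The MaxU idea of assigning maximal dissimilarity to unobserved entries can be sketched as below. Note the substitution: this sketch uses a plain range-normalised difference for numerical features and a 0/1 mismatch for nominal ones, not the paper's probability mass-based measure. The function names and the toy records are hypothetical.

```python
def feature_dissimilarity(a, b, kind, value_range=1.0):
    """Per-feature dissimilarity in [0, 1]; None marks a missing value."""
    if a is None or b is None:
        return 1.0  # MaxU-style rule: an unobserved entry gets the maximum
    if kind == "numerical":
        return min(abs(a - b) / value_range, 1.0)
    return 0.0 if a == b else 1.0  # nominal: simple mismatch

def dissimilarity(x, y, kinds, ranges):
    """Average the per-feature dissimilarities; no imputation is performed."""
    parts = [
        feature_dissimilarity(a, b, kind, rng)
        for a, b, kind, rng in zip(x, y, kinds, ranges)
    ]
    return sum(parts) / len(parts)

kinds = ("numerical", "nominal", "numerical")
ranges = (4.0, 1.0, 10.0)
x = (1.0, "red", None)   # third feature is missing
y = (3.0, "red", 5.0)
d = dissimilarity(x, y, kinds, ranges)  # (0.5 + 0.0 + 1.0) / 3
```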
Article 820
Title@2025-07-20 (7): Improving Group Robustness on Spurious Correlation via Evidential Alignment
Title: Improving Group Robustness on Spurious Correlation via Evidential Alignment | Verbesserung der Robustheit der Gruppe bei sauberer Korrelation durch Evidential Alignment | 通过证据协调改进小组对净关系关联的威力 2506.11347v3 |
Authors (3): Wenqian Ye, Guangtao Zheng, Aidong Zhang
Deep neural networks often learn and rely on spurious correlations, i.e., superficial associations between non-causal features and the targets. For instance, an image classifier may identify camels based on the desert backgrounds. While it can yield high overall accuracy during training, it degrades generalization on more diverse scenarios where such correlations do not hold. This problem poses significant challenges for out-of-distribution robustness and trustworthiness. Existing methods typically mitigate this issue by using external group annotations or auxiliary deterministic models to learn unbiased representations. However, such information is costly to obtain, and deterministic models may fail to capture the full spectrum of biases learned by the models. To address these limitations, we propose Evidential Alignment, a novel framework that leverages uncertainty quantification to understand the behavior of the biased models without requiring group annotations. By quantifying the evidence of model prediction with second-order risk minimization and calibrating the biased models with the proposed evidential calibration technique, Evidential Alignment identifies and suppresses spurious correlations while preserving core features. We theoretically justify the effectiveness of our method as capable of learning the patterns of biased models and debiasing the model without requiring any spurious correlation annotations. Empirical results demonstrate that our method significantly improves group robustness across diverse architectures and data modalities, providing a scalable and principled solution to spurious correlations.
Article 821
Title@2025-07-20 (7): Rethinking Memorization Measures and their Implications in Large Language Models
Title: Rethinking Memorization Measures and their Implications in Large Language Models | Rethinking Memoring Measures and their Implikationen in Large Language Models | 重新思考记忆措施及其对大语言模式的影响 2507.14777v1 |
Authors (7): Bishwamittra Ghosh, Soumi Das, Qinyuan Wu, Mohammad Aflah Khan, Krishna P. Gummadi, Evimaria Terzi, Deepak Garg
Because of the privacy threats it poses, memorization in LLMs is often seen as undesirable, specifically for learning. In this paper, we study whether memorization can be avoided when optimally learning a language, and whether the privacy threat posed by memorization is exaggerated or not. To this end, we re-examine existing privacy-focused measures of memorization, namely recollection-based and counterfactual memorization, along with a newly proposed contextual memorization. Relating memorization to local over-fitting during learning, contextual memorization aims to disentangle memorization from the contextual learning ability of LLMs. Informally, a string is contextually memorized if its recollection due to training exceeds the optimal contextual recollection, a learned threshold denoting the best contextual learning without training. Conceptually, contextual recollection avoids the fallacy of recollection-based memorization, where any form of high recollection is a sign of memorization. Theoretically, contextual memorization relates to counterfactual memorization, but imposes stronger conditions. Memorization measures differ in outcomes and information requirements. Experimenting on 18 LLMs from 6 families and multiple formal languages of different entropy, we show that (a) memorization measures disagree on the memorization order of strings of varying frequency, (b) optimal learning of a language cannot avoid partial memorization of training strings, and (c) improved learning decreases contextual and counterfactual memorization but increases recollection-based memorization. Finally, (d) we revisit existing reports of memorized strings by recollection that neither pose a privacy threat nor are contextually or counterfactually memorized.
Article 822
Title@2025-07-20 (7): Conditional Front-door Adjustment for Heterogeneous Treatment Assignment Effect Estimation Under Non-adherence
Title: Conditional Front-door Adjustment for Heterogeneous Treatment Assignment Effect Estimation Under Non-adherence | Bedingte Front-Tür-Anpassung für heterogene Behandlung Zuordnungseffektschätzung unter Nichtbefolgung | 不遵守规定情况下对不同不同待遇不同待遇的 条件性前门调整 外门调整 2505.05677v4 |
Authors (3): Winston Chen, Trenton Chang, Jenna Wiens
Estimates of heterogeneous treatment assignment effects can inform treatment decisions. In the presence of non-adherence (e.g., patients do not adhere to their assigned treatment), both the standard backdoor adjustment (SBD) and the conditional front-door adjustment (CFD) can recover unbiased estimates of the treatment assignment effects. However, the estimation variance of these approaches may vary widely across settings, which remains underexplored in the literature. In this work, we demonstrate theoretically and empirically that CFD yields lower-variance estimates than SBD when the true effect of treatment assignment is small (i.e., assigning an intervention leads to small changes in patients’ future outcome). Additionally, since CFD requires estimating multiple nuisance parameters, we introduce LobsterNet, a multi-task neural network that implements CFD with joint modeling of the nuisance parameters. Empirically, LobsterNet reduces estimation error across several semi-synthetic and real-world datasets compared to baselines. Our findings suggest CFD with shared nuisance parameter modeling can improve treatment assignment effect estimation under non-adherence.
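For reference, the front-door identity that this kind of adjustment builds on can be written as below. This is the textbook conditional front-door formula, not a transcription of the paper's estimator. The symbols are assumptions: Z is the assigned treatment, T the treatment actually taken, Y the outcome, and X the covariates.

```latex
P\bigl(y \mid \mathrm{do}(z), x\bigr)
  = \sum_{t} P(t \mid z, x) \sum_{z'} P(y \mid t, z', x)\, P(z' \mid x)
```

Each factor on the right-hand side has to be estimated from data, which is consistent with the abstract's remark that CFD requires several nuisance parameters.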
Article 823
Title@2025-07-19 (6): Fine-Tuning Diffusion Generative Models via Rich Preference Optimization
Title: Fine-Tuning Diffusion Generative Models via Rich Preference Optimization | Feintuning Diffusion Generative Modelle über Rich Preference Optimization | 通过丰富偏好优化微调扩散生成模型 2503.11720v4 |
Authors (8): Hanyang Zhao, Haoxian Chen, Yucheng Guo, Genta Indra Winata, Tingting Ou, Ziyu Huang, David D. Yao, Wenpin Tang
We introduce Rich Preference Optimization (RPO), a novel pipeline that leverages rich feedback signals to improve the curation of preference pairs for fine-tuning text-to-image diffusion models. Traditional methods, like Diffusion-DPO, often rely solely on reward model labeling, which can be opaque, offer limited insights into the rationale behind preferences, and are prone to issues such as reward hacking or overfitting. In contrast, our approach begins with generating detailed critiques of synthesized images, from which we extract reliable and actionable image editing instructions. By implementing these instructions, we create refined images, resulting in synthetic, informative preference pairs that serve as enhanced tuning datasets. We demonstrate the effectiveness of our pipeline and the resulting datasets in fine-tuning state-of-the-art diffusion models. Our code is available at https://github.com/Diffusion-RLHF/RPO.
Article 824
Title@2025-07-19 (6): Collusion-Resilient Hierarchical Secure Aggregation with Heterogeneous Security Constraints
Title: Collusion-Resilient Hierarchical Secure Aggregation with Heterogeneous Security Constraints | Kollusion-Resiliente Hierarchische Sichere Aggregation mit heterogenen Sicherheitsbeschränkungen | 协同-抗力强的等级安全聚合与不同不同安全因素的限制 2507.14768v1 |
Authors (6): Zhou Li, Xiang Zhang, Jiawen Lv, Jihao Fan, Haiqiang Chen, Giuseppe Caire
Motivated by federated learning (FL), secure aggregation (SA) aims to securely compute, as efficiently as possible, the sum of a set of inputs distributed across many users. To understand the impact of network topology, hierarchical secure aggregation (HSA) investigated the communication and secret key generation efficiency in a 3-layer relay network, where clusters of users are connected to the aggregation server through an intermediate layer of relays. Due to the pre-aggregation of the messages at the relays, HSA reduces the communication burden on the relay-to-server links and is able to support a large number of users. However, as the number of users increases, a practical challenge arises from heterogeneous security requirements: for example, users in different clusters may require varying levels of input protection. Motivated by this, we study weakly-secure HSA (WS-HSA) with collusion resilience, where instead of protecting all the inputs from any set of colluding users, only the inputs belonging to a predefined collection of user groups (referred to as security input sets) need to be protected against another predefined collection of user groups (referred to as collusion sets). Since the security input sets and collusion sets can be arbitrarily defined, our formulation offers a flexible framework for addressing heterogeneous security requirements in HSA. We characterize the optimal total key rate, i.e., the total number of independent key symbols required to ensure both server and relay security, for a broad range of parameter configurations. For the remaining cases, we establish lower and upper bounds on the optimal key rate, providing constant-factor gap optimality guarantees.
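The basic goal of secure aggregation, a server learning only the sum of the inputs, can be illustrated with classic pairwise additive masking in a flat, single-server setting. This is a swapped-in textbook scheme, not the paper's hierarchical WS-HSA construction; the modulus `Q` and the toy inputs are assumptions.

```python
import random

Q = 2**31 - 1  # modulus for the toy arithmetic (an assumed choice)

def mask_inputs(inputs, rng):
    """Mask each user's input with pairwise keys that cancel in the sum."""
    n = len(inputs)
    masked = [x % Q for x in inputs]
    for i in range(n):
        for j in range(i + 1, n):
            s = rng.randrange(Q)             # key shared by users i and j
            masked[i] = (masked[i] + s) % Q  # user i adds the key
            masked[j] = (masked[j] - s) % Q  # user j subtracts the same key
    return masked

rng = random.Random(0)
inputs = [5, 11, 7, 2]
masked = mask_inputs(inputs, rng)

# The server sums the masked messages; every pairwise key cancels.
total = sum(masked) % Q
```

The number of independent keys used here grows with the number of user pairs, which gives a feel for why the key rate is the quantity the abstract sets out to characterise.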
Article 825
Title@2025-07-19 (6): XplainAct: Visualization for Personalized Intervention Insights
Title: XplainAct: Visualization for Personalized Intervention Insights | XplainAct: Visualisierung für personalisierte Interventions-Insights | XPlainAct: 个性干预观察的可视化 2507.14767v1 |
Authors (3): Yanming Zhang, Krishnakumar Hegde, Klaus Mueller
Causality helps people reason about and understand complex systems, particularly through what-if analyses that explore how interventions might alter outcomes. Although existing methods embrace causal reasoning using interventions and counterfactual analysis, they primarily focus on effects at the population level. These approaches often fall short in systems characterized by significant heterogeneity, where the impact of an intervention can vary widely across subgroups. To address this challenge, we present XplainAct, a visual analytics framework that supports simulating, explaining, and reasoning about interventions at the individual level within subpopulations. We demonstrate the effectiveness of XplainAct through two case studies: investigating opioid-related deaths in epidemiology and analyzing voting inclinations in the presidential election.
Article 826
Title@2025-07-19 (6): CXR-TFT: Multi-Modal Temporal Fusion Transformer for Predicting Chest X-ray Trajectories
Title: CXR-TFT: Multi-Modal Temporal Fusion Transformer for Predicting Chest X-ray Trajectories | CXR-TFT: Multi-Modal Temporal Fusion Transformer zur Vorhersage von Röntgen-Trajektorien im Brustkorb | CXR-TFT:用于预测胸透X射线轨迹的多模式时际拆解变换器 2507.14766v1 |
Authors (10): Mehak Arora, Ayman Ali, Kaiyuan Wu, Carolyn Davis, Takashi Shimazui, Mahmoud Alwakeel, Victor Moas, Philip Yang, Annette Esper, Rishikesan Kamaleswaran
In intensive care units (ICUs), patients with complex clinical conditions require vigilant monitoring and prompt interventions. Chest X-rays (CXRs) are a vital diagnostic tool, providing insights into clinical trajectories, but their irregular acquisition limits their utility. Existing tools for CXR interpretation are constrained by cross-sectional analysis, failing to capture temporal dynamics. To address this, we introduce CXR-TFT, a novel multi-modal framework that integrates temporally sparse CXR imaging and radiology reports with high-frequency clinical data, such as vital signs, laboratory values, and respiratory flow sheets, to predict the trajectory of CXR findings in critically ill patients. CXR-TFT leverages latent embeddings from a vision encoder that are temporally aligned with hourly clinical data through interpolation. A transformer model is then trained to predict CXR embeddings at each hour, conditioned on previous embeddings and clinical measurements. In a retrospective study of 20,000 ICU patients, CXR-TFT demonstrated high accuracy in forecasting abnormal CXR findings up to 12 hours before they became radiographically evident. This predictive capability in clinical data holds significant potential for enhancing the management of time-sensitive conditions like acute respiratory distress syndrome, where early intervention is crucial and diagnoses are often delayed. By providing distinctive temporal resolution in prognostic CXR analysis, CXR-TFT offers actionable ‘whole patient’ insights that can directly improve clinical outcomes.
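The temporal alignment step can be sketched as follows, assuming linear interpolation of each embedding dimension onto an hourly grid. The abstract says only "interpolation", so the linear choice, the function name, and the toy values are assumptions.

```python
import numpy as np

def align_to_hourly(times_h, embeddings, horizon_h):
    """Interpolate sparse embeddings (one row per CXR) onto hours 0..horizon_h."""
    hours = np.arange(horizon_h + 1, dtype=float)
    columns = [
        np.interp(hours, times_h, embeddings[:, d])  # one dimension at a time
        for d in range(embeddings.shape[1])
    ]
    return hours, np.stack(columns, axis=1)

# Two CXR embeddings (2-dimensional toy vectors) taken at hour 0 and hour 4.
times = np.array([0.0, 4.0])
emb = np.array([[0.0, 1.0], [4.0, 3.0]])
hours, hourly = align_to_hourly(times, emb, horizon_h=4)
```

The resulting hourly rows can then be paired with hourly clinical measurements before being passed to a sequence model.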
Article 827
Title@2025-07-19 (6): Score-based Causal Representation Learning: Linear and General Transformations
Title: Score-based Causal Representation Learning: Linear and General Transformations | Score-based Causal Representation Learning: Lineare und allgemeine Transformationen | 基于计分的因果代表制学习:线性转变和一般转变 2402.00849v5 |
Authors (5): Burak Varıcı, Emre Acartürk, Karthikeyan Shanmugam, Abhishek Kumar, Ali Tajer
This paper addresses intervention-based causal representation learning (CRL) under a general nonparametric latent causal model and an unknown transformation that maps the latent variables to the observed variables. Linear and general transformations are investigated. The paper addresses both the identifiability and achievability aspects. Identifiability refers to determining algorithm-agnostic conditions that ensure the recovery of the true latent causal variables and the underlying latent causal graph. Achievability refers to the algorithmic aspects and addresses designing algorithms that achieve identifiability guarantees. By drawing novel connections between score functions (i.e., the gradients of the logarithm of density functions) and CRL, this paper designs a score-based class of algorithms that ensures both identifiability and achievability. First, the paper focuses on linear transformations and shows that one stochastic hard intervention per node suffices to guarantee identifiability. It also provides partial identifiability guarantees for soft interventions, including identifiability up to mixing with parents for general causal models and perfect recovery of the latent graph for sufficiently nonlinear causal models. Secondly, it focuses on general transformations and demonstrates that two stochastic hard interventions per node are sufficient for identifiability. This is achieved by defining a differentiable loss function whose global optima ensure identifiability for general CRL. Notably, one does not need to know which pair of interventional environments has the same node intervened. Finally, the theoretical results are empirically validated via experiments on structured synthetic data and image data.
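The score function referred to above, the gradient of the log-density, has a closed form for a Gaussian, which makes for a small self-check. This sketch only illustrates the definition; the mean, covariance, and evaluation point are made-up values, and nothing here reproduces the paper's CRL algorithms.

```python
import numpy as np

def log_density(x, mean, cov):
    """Log-density of a multivariate Gaussian."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (quad + logdet + len(mean) * np.log(2.0 * np.pi))

def score(x, mean, cov):
    """Closed-form score: grad_x log p(x) = -cov^{-1} (x - mean)."""
    return -np.linalg.solve(cov, x - mean)

def numerical_score(x, mean, cov, eps=1e-6):
    """Central finite-difference gradient of the log-density."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (log_density(x + step, mean, cov)
                   - log_density(x - step, mean, cov)) / (2.0 * eps)
    return grad

mean = np.array([0.0, 1.0])
cov = np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([0.5, -1.0])
analytic = score(x, mean, cov)
numeric = numerical_score(x, mean, cov)
```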
Article 828
Title@2025-07-19 (6): RACR-MIL: Rank-aware contextual reasoning for weakly supervised grading of squamous cell carcinoma using whole slide images
Title: RACR-MIL: Rank-aware contextual reasoning for weakly supervised grading of squamous cell carcinoma using whole slide images | RACR-MIL: Rank-aware kontextuelle Argumentation für schwach überwachte Einstufung von Plattenepithelkarzinom mit ganzen Diabildern | RACR-MIL: 使用整张幻灯片图像对典型细胞癌进行监管不力分类的背景推理 2308.15618v2 |
Authors (17): Anirudh Choudhary, Mosbah Aouad, Krishnakant Saboo, Angelina Hwang, Jacob Kechter, Blake Bordeaux, Puneet Bhullar, David DiCaudo, Steven Nelson, Nneka Comfere, Emma Johnson, Olayemi Sokumbi, Jason Sluzevich, Leah Swanson, Dennis Murphree, Aaron Mangold, Ravishankar Iyer
Squamous cell carcinoma (SCC) is the most common cancer subtype, with an increasing incidence and a significant impact on cancer-related mortality. SCC grading using whole slide images is inherently challenging due to the lack of a reliable protocol and substantial tissue heterogeneity. We propose RACR-MIL, the first weakly-supervised SCC grading approach achieving robust generalization across multiple anatomies (skin, head and neck, lung). RACR-MIL is an attention-based multiple-instance learning framework that enhances grade-relevant contextual representation learning and addresses tumor heterogeneity through two key innovations: (1) a hybrid WSI graph that captures both local tissue context and non-local phenotypical dependencies between tumor regions, and (2) a rank-ordering constraint in the attention mechanism that consistently prioritizes higher-grade tumor regions, aligning with pathologists' diagnostic process. Our model achieves state-of-the-art performance across multiple SCC datasets, achieving 3-9% higher grading accuracy, resilience to class imbalance, and up to 16% improved tumor localization. In a pilot study, pathologists reported that RACR-MIL improved grading efficiency in 60% of cases, underscoring its potential as a clinically viable cancer diagnosis and grading assistant.
Article 829
Title@2025-07-19 (6): QUTCC: Quantile Uncertainty Training and Conformal Calibration for Imaging Inverse Problems
Title: QUTCC: Quantile Uncertainty Training and Conformal Calibration for Imaging Inverse Problems | QUTCC: Quantile Uncertainty Training und Konforme Kalibrierung für bildgebende Inverse Probleme | QUTCC: 成反向问题量化不确定性培训和常规校准 2507.14760v1 |
Authors (4): Cassandra Tong Ye, Shamus Li, Tyler King, Kristina Monakhova
Deep learning models often hallucinate, producing realistic artifacts that are not truly present in the sample. This can have dire consequences for scientific and medical inverse problems, such as MRI and microscopy denoising, where accuracy is more important than perceptual quality. Uncertainty quantification techniques, such as conformal prediction, can pinpoint outliers and provide guarantees for image regression tasks, improving reliability. However, existing methods utilize a linear constant scaling factor to calibrate uncertainty bounds, resulting in larger, less informative bounds. We propose QUTCC, a quantile uncertainty training and calibration technique that enables nonlinear, non-uniform scaling of quantile predictions to enable tighter uncertainty estimates. Using a U-Net architecture with a quantile embedding, QUTCC enables the prediction of the full conditional distribution of quantiles for the imaging task. During calibration, QUTCC generates uncertainty bounds by iteratively querying the network for upper and lower quantiles, progressively refining the bounds to obtain a tighter interval that captures the desired coverage. We evaluate our method on several denoising tasks as well as compressive MRI reconstruction. Our method successfully pinpoints hallucinations in image estimates and consistently achieves tighter uncertainty intervals than prior methods while maintaining the same statistical coverage.
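Quantile predictions are commonly trained with the pinball (quantile) loss, sketched below. The abstract does not name its training loss, so treating it as the pinball loss is an assumption; the toy targets are made up.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile level q in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

y = np.array([1.0, 2.0, 3.0])
loss_upper = pinball_loss(y, 2.0, q=0.9)  # penalises under-prediction more
loss_lower = pinball_loss(y, 2.0, q=0.1)  # penalises over-prediction more
```

With a symmetric target set and a prediction at its centre, the losses for levels `q` and `1 - q` coincide, which is a quick sanity check on the implementation.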
Article 830
Title@2025-07-19 (6): A Structure-Guided Gauss-Newton Method for Shallow ReLU Neural Network
Title: A Structure-Guided Gauss-Newton Method for Shallow ReLU Neural Network | Eine strukturgeführte Gauß-Newton-Methode für shallow ReLU Neural Network | 浅光 ReLU 神经网络结构引导高斯-牛顿方法 2404.05064v2 |
Authors (5): Zhiqiang Cai, Tong Ding, Min Liu, Xinyu Liu, Jianlin Xia
In this paper, we propose a structure-guided Gauss-Newton (SgGN) method for solving least squares problems using a shallow ReLU neural network. The method effectively takes advantage of both the least squares structure and the neural network structure of the objective function. By categorizing the weights and biases of the hidden and output layers of the network as nonlinear and linear parameters, respectively, the method iterates back and forth between the nonlinear and linear parameters. The nonlinear parameters are updated by a damped Gauss-Newton method and the linear ones are updated by a linear solver. Moreover, at the Gauss-Newton step, a special form of the Gauss-Newton matrix is derived for the shallow ReLU neural network and is used for efficient iterations. It is shown that the corresponding mass and Gauss-Newton matrices in the respective linear and nonlinear steps are symmetric and positive definite under reasonable assumptions. Thus, the SgGN method naturally produces an effective search direction without the need for additional techniques, such as shifting in the Levenberg-Marquardt method, to achieve invertibility of the Gauss-Newton matrix. The convergence and accuracy of the method are demonstrated numerically for several challenging function approximation problems, especially those with discontinuities or sharp transition layers that pose significant challenges for commonly used training algorithms in machine learning.
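The linear half of the alternation can be sketched as follows: with the hidden-layer weights and biases held fixed, the output-layer parameters solve an ordinary least squares problem. This sketch uses `numpy.linalg.lstsq` in place of the paper's mass-matrix solve and omits the damped Gauss-Newton step entirely; the one-dimensional setup and all values are hypothetical.

```python
import numpy as np

def relu_features(x, w, b):
    """Design matrix [1, relu(w_1 x + b_1), ..., relu(w_m x + b_m)]."""
    hidden = np.maximum(0.0, np.outer(x, w) + b)
    return np.hstack([np.ones((len(x), 1)), hidden])

def solve_linear_params(x, y, w, b):
    """Least squares fit of the output-layer parameters for fixed (w, b)."""
    A = relu_features(x, w, b)
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    return c

# Fixed hidden layer with breakpoints at x = 0 and x = 0.5.
w = np.array([1.0, 1.0])
b = np.array([0.0, -0.5])
x = np.linspace(-1.0, 1.0, 41)

# Target that the network can represent exactly.
y = 1.0 + 2.0 * np.maximum(0.0, x) - 3.0 * np.maximum(0.0, x - 0.5)
c = solve_linear_params(x, y, w, b)
```

A full method would follow this solve with an update of `w` and `b` and repeat until convergence.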
Article 831
Title@2025-07-19 (6): Iceberg: Enhancing HLS Modeling with Synthetic Data
Title: Iceberg: Enhancing HLS Modeling with Synthetic Data | Iceberg: Verbesserung der HLS-Modellierung mit synthetischen Daten | 冰山:加强利用合成数据建立HLS模型 2507.09948v2 |
Authors (6): Zijian Ding, Tung Nguyen, Weikai Li, Aditya Grover, Yizhou Sun, Jason Cong
Deep learning-based prediction models for High-Level Synthesis (HLS) of hardware designs often struggle to generalize. In this paper, we study how to close the generalizability gap of these models through pretraining on synthetic data and introduce Iceberg, a synthetic data augmentation approach that expands both large language model (LLM)-generated programs and weak labels of unseen design configurations. Our weak label generation method is integrated with an in-context model architecture, enabling meta-learning from actual and proximate labels. Iceberg improves the geometric mean modeling accuracy by 86.4% when adapted to six real-world applications with few-shot examples and achieves 2.47× and 1.12× better offline DSE performance when adapting to two different test datasets. Our open-sourced code is here: https://github.com/UCLA-VAST/iceberg
Article 832
Title@2025-07-19 (6): Supervised Graph Contrastive Learning for Gene Regulatory Network
Title: Supervised Graph Contrastive Learning for Gene Regulatory Network | Überwachtes Graph Kontrastives Lernen für Gene Regulatory Network | 受监督的基因监管网络图表对比性学习 2505.17786v3 |
Authors (5): Sho Oshima, Yuji Okamoto, Taisei Tosaki, Ryosuke Kojima, Yasushi Okuno
Graph representation learning is effective for obtaining a meaningful latent space utilizing the structure of graph data and is widely applied, including to biological networks. In particular, Graph Contrastive Learning (GCL) has emerged as a powerful self-supervised method that relies on applying perturbations to graphs for data augmentation. However, existing GCL methods, when applied to biological networks such as Gene Regulatory Networks (GRNs), overlook biologically meaningful perturbations, e.g., gene knockdowns. In this study, we introduce SupGCL (Supervised Graph Contrastive Learning), a novel GCL method for GRNs that directly incorporates biological perturbations derived from gene knockdown experiments as the supervision. SupGCL mathematically extends existing GCL methods that utilize non-biological perturbations to probabilistic models that introduce actual biological gene perturbation utilizing gene knockdown data. Using the GRN representation obtained by our proposed method, our aim is to improve the performance of biological downstream tasks such as patient hazard prediction and disease subtype classification (graph-level task), and gene function classification (node-level task). We applied SupGCL on real GRN datasets derived from patients with multiple types of cancer, and in all experiments SupGCL achieves better performance than state-of-the-art baselines.
nan
Article 833
Title@2025-07-19 (6): Domain-Adaptive Small Language Models for Structured Tax Code Prediction
Title: Domain-Adaptive Small Language Models for Structured Tax Code Prediction | Domain-Adaptive kleine Sprachmodelle für strukturierte Steuervorhersage | 结构化税法预测结构化税法 2507.10880v2 |
Authors (3): Souvik Nath, Sumit Wadhwa, Luis Perez
Every day, multinational firms process thousands of transactions, each of which must adhere to tax regulations that vary by jurisdiction and are often nuanced. The determination of product and service tax codes, such as HSN or SAC, is a major use case in tax compliance. An accurate determination of such codes is imperative to avoid any tax penalties. This paper proposes a domain-adaptive small language model (SLM) with an encoder-decoder architecture for the enhanced prediction of product and service tax codes. In this approach, we address the problem of predicting hierarchical tax code sequences using unstructured product and services data. We employ an SLM based on an encoder-decoder architecture because it enables sequential generation of tax codes, which captures the hierarchical dependencies present within the tax codes. Our experiments demonstrate that encoder-decoder SLMs can be successfully applied to the sequential prediction of structured tax codes, a domain that remains comparatively unexplored in current NLP research. In this paper, we demonstrate the superior performance of the domain-adaptive encoder-decoder SLMs over flat classifiers when applied to the Harmonized System of Nomenclature (HSN), and achieve superior results compared to decoder-only and encoder-only architectures for structured sequence generation tasks. This approach can also be scaled to other government-mandated tax commodity codes, such as United Nations Standard Products and Services Codes (UNSPSC), or Brazil’s Nomenclatura Comum do Mercosul (NCM).
nan
Article 834
Title@2025-07-19 (6): SemiOccam: A Robust Semi-Supervised Image Recognition Network Using Sparse Labels
Title: SemiOccam: A Robust Semi-Supervised Image Recognition Network Using Sparse Labels | SemiOccam: Ein robustes semi-überwachtes Bilderkennungsnetzwerk mit Sparse-Labels | 半 Occam: 使用粗略标签粗略标签的强力半半超图像识别网络 2506.03582v3 |
Authors (3): Rui Yann, Tianshuo Zhang, Xianglei Xing
We present SemiOccam, an image recognition network that leverages semi-supervised learning in a highly efficient manner. Existing works often rely on complex training techniques and architectures, requiring hundreds of GPU hours for training, while their generalization ability with extremely limited labeled data remains to be improved. To address these limitations, we construct a hierarchical mixture density classification mechanism by optimizing mutual information between feature representations and target classes, compressing redundant information while retaining crucial discriminative components. Experimental results demonstrate that our method achieves state-of-the-art performance on three commonly used datasets, with accuracy exceeding 95% on two of them using only 4 labeled samples per class, and its simple architecture keeps training time at the minute level. Notably, this paper reveals a long-overlooked data leakage issue in the STL-10 dataset for semi-supervised learning and removes duplicates to ensure reliable experimental results. We release the deduplicated CleanSTL-10 dataset to facilitate fair and reproducible research. Code available at https://github.com/Shu1L0n9/SemiOccam.
nan
Article 835
Title@2025-07-19 (6): Skill Learning via Policy Diversity Yields Identifiable Representations for Reinforcement Learning
Title: Skill Learning via Policy Diversity Yields Identifiable Representations for Reinforcement Learning | Kompetenzerwerb durch politische Vielfalt führt zu identifizierbaren Repräsentationen für verstärktes Lernen | 通过政策多样性学习技能 2507.14748v1 |
Authors (6): Patrik Reizinger, Bálint Mucsányi, Siyuan Guo, Benjamin Eysenbach, Bernhard Schölkopf, Wieland Brendel
Self-supervised feature learning and pretraining methods in reinforcement learning (RL) often rely on information-theoretic principles, termed mutual information skill learning (MISL). These methods aim to learn a representation of the environment while also incentivizing exploration thereof. However, the role of the representation and mutual information parametrization in MISL is not yet well understood theoretically. Our work investigates MISL through the lens of identifiable representation learning by focusing on the Contrastive Successor Features (CSF) method. We prove that CSF recovers the environment’s ground-truth features up to a linear transformation, owing to the inner product parametrization of the features and skill diversity in a discriminative sense. This first identifiability guarantee for representation learning in RL also helps explain the implications of different mutual information objectives and the downsides of entropy regularizers. We empirically validate our claims in MuJoCo and DeepMind Control and show how CSF recovers the ground-truth features both from states and pixels.
nan
Article 836
Title@2025-07-19 (6): Pruning Increases Orderedness in Recurrent Computation
Title: Pruning Increases Orderedness in Recurrent Computation | Pruning erhöht Ordnung in der recurrent Computation | 经常计算中审慎增加的有秩序性 2507.14747v1 |
Authors (1): Yiding Song
Inspired by the prevalence of recurrent circuits in biological brains, we investigate the degree to which directionality is a helpful inductive bias for artificial neural networks. Taking directionality as topologically-ordered information flow between neurons, we formalise a perceptron layer with all-to-all connections (mathematically equivalent to a weight-tied recurrent neural network) and demonstrate that directionality, a hallmark of modern feed-forward networks, can be induced rather than hard-wired by applying appropriate pruning techniques. Across different random seeds our pruning schemes successfully induce greater topological ordering in information flow between neurons without compromising performance, suggesting that directionality is not a prerequisite for learning, but may be an advantageous inductive bias discoverable by gradient descent and sparsification.
nan
Article 837
Title@2025-07-19 (6): Sampling from Gaussian Processes: A Tutorial and Applications in Global Sensitivity Analysis and Optimization
Title: Sampling from Gaussian Processes: A Tutorial and Applications in Global Sensitivity Analysis and Optimization | Probenahme aus gaussischen Prozessen: Ein Tutorial und Anwendungen in der globalen Sensitivitätsanalyse und Optimierung | Gaussian进程抽样:全球敏感性分析和优化的教学和应用 2507.14746v1 |
Authors (4): Bach Do, Nafeezat A. Ajenifuja, Taiwo A. Adebiyi, Ruda Zhang
High-fidelity simulations and physical experiments are essential for engineering analysis and design. However, their high cost often limits their applications in two critical tasks: global sensitivity analysis (GSA) and optimization. This limitation motivates the common use of Gaussian processes (GPs) as proxy regression models to provide uncertainty-aware predictions based on a limited number of high-quality observations. GPs naturally enable efficient sampling strategies that support informed decision-making under uncertainty by extracting information from a subset of possible functions for the model of interest. Despite their popularity in machine learning and statistics communities, sampling from GPs has received little attention in the community of engineering optimization. In this paper, we present the formulation and detailed implementation of two notable sampling methods – random Fourier features and pathwise conditioning – for generating posterior samples from GPs. Alternative approaches are briefly described. Importantly, we detail how the generated samples can be applied in GSA, single-objective optimization, and multi-objective optimization. We show successful applications of these sampling methods through a series of numerical examples.
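As a rough illustration of the random Fourier feature idea named in the abstract above, the following sketch draws one approximate GP prior sample for a squared-exponential (RBF) kernel. The function name, the kernel choice, and all parameters are assumptions made for this example and are not taken from the paper; generating posterior samples would additionally require conditioning the feature weights on observations, which is not shown here.

```python
import numpy as np

def sample_rff_prior(x, num_features=2000, lengthscale=1.0, variance=1.0, seed=0):
    """Draw one approximate GP prior sample at inputs x of shape (n, d).

    Hypothetical helper: assumes an RBF kernel
    k(x, y) = variance * exp(-||x - y||^2 / (2 * lengthscale^2)).
    Returns the sample values and the feature matrix phi, where
    phi @ phi.T approximates the kernel matrix.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    d = x.shape[1]
    # Frequencies follow the spectral density of the RBF kernel (a Gaussian).
    omega = rng.normal(0.0, 1.0 / lengthscale, size=(d, num_features))
    # Random phases make the cosine features an unbiased kernel estimator.
    phase = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    phi = np.sqrt(2.0 * variance / num_features) * np.cos(x @ omega + phase)
    # A standard normal weight vector turns the features into a function draw.
    weights = rng.standard_normal(num_features)
    return phi @ weights, phi
```

Because the sample is a finite weighted sum of cosines, it can be evaluated at any new input without refactorizing a kernel matrix, which is one reason such samples are convenient for optimization and sensitivity analysis.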
nan
Article 838
Title@2025-07-19 (6): Beyond the Single-Best Model: Rashomon Partial Dependence Profile for Trustworthy Explanations in AutoML
Title: Beyond the Single-Best Model: Rashomon Partial Dependence Profile for Trustworthy Explanations in AutoML | Jenseits des Single-Best-Modells: Rashomon Partial Dependence Profile für vertrauenswürdige Erklärungen in AutoML | 超越单一最佳模式:自动ML中可信赖解释的Rashomon部分依赖性简介 2507.14744v1 |
Authors (3): Mustafa Cavus, Jan N. van Rijn, Przemysław Biecek
Automated machine learning systems efficiently streamline model selection but often focus on a single best-performing model, overlooking explanation uncertainty, an essential concern in human-centered explainable AI. To address this, we propose a novel framework that incorporates model multiplicity into explanation generation by aggregating partial dependence profiles (PDP) from a set of near-optimal models, known as the Rashomon set. The resulting Rashomon PDP captures interpretive variability and highlights areas of disagreement, providing users with a richer, uncertainty-aware view of feature effects. To assess its usefulness, we introduce two quantitative metrics, the coverage rate and the mean width of confidence intervals, which measure the consistency between the standard PDP and the proposed Rashomon PDP. Experiments on 35 regression datasets from the OpenML CTR23 benchmark suite show that in most cases, the Rashomon PDP covers less than 70% of the best model’s PDP, underscoring the limitations of single-model explanations. Our findings suggest that Rashomon PDP improves the reliability and trustworthiness of model interpretations by adding information that would otherwise be neglected. This is particularly useful in high-stakes domains where transparency and confidence are critical.
nan
Article 839
Title@2025-07-19 (6): Better Training Data Attribution via Better Inverse Hessian-Vector Products
Title: Better Training Data Attribution via Better Inverse Hessian-Vector Products | Bessere Datenzuweisung durch bessere inverse hessisch-Vektor-Produkte | 通过 “ 更好的反向 “ 赫森 – – 选民产品更好地分配培训数据 2507.14740v1 |
Authors (6): Andrew Wang, Elisa Nguyen, Runshi Yang, Juhan Bae, Sheila A. McIlraith, Roger Grosse
Training data attribution (TDA) provides insights into which training data is responsible for a learned model behavior. Gradient-based TDA methods such as influence functions and unrolled differentiation both involve a computation that resembles an inverse Hessian-vector product (iHVP), which is difficult to approximate efficiently. We introduce an algorithm (ASTRA) which uses the EKFAC-preconditioner on Neumann series iterations to arrive at an accurate iHVP approximation for TDA. ASTRA is easy to tune, requires fewer iterations than Neumann series iterations, and is more accurate than EKFAC-based approximations. Using ASTRA, we show that improving the accuracy of the iHVP approximation can significantly improve TDA performance.
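To make the idea of preconditioning Neumann series iterations concrete, here is a minimal sketch of a preconditioned fixed-point iteration for an inverse Hessian-vector product. It is not the ASTRA algorithm: the callables `hvp` and `precond_inv` are hypothetical names, and in the toy usage below a simple diagonal preconditioner stands in for EKFAC purely for illustration.

```python
import numpy as np

def preconditioned_neumann_ihvp(hvp, v, precond_inv, num_iters=50):
    """Approximate H^{-1} v with the iteration x <- x + P^{-1} (v - H x).

    hvp:         callable returning H @ x (assumed available, e.g. via autodiff)
    precond_inv: callable applying an approximate inverse P^{-1}
                 (a stand-in for an EKFAC-style preconditioner)
    The iteration converges when the spectral radius of I - P^{-1} H is below 1.
    """
    x = precond_inv(v)  # start from the preconditioner's own estimate
    for _ in range(num_iters):
        residual = v - hvp(x)          # how far H x is from v
        x = x + precond_inv(residual)  # correct x using the preconditioner
    return x

# Toy usage with an explicit, diagonally dominant matrix (illustrative only).
H = np.diag([1.0, 2.0, 5.0, 10.0, 50.0]) + 0.05 * np.ones((5, 5))
v = np.arange(1.0, 6.0)
diag = np.diag(H)
x_approx = preconditioned_neumann_ihvp(lambda x: H @ x, v, lambda r: r / diag)
```

The closer the preconditioner is to the true inverse Hessian, the fewer iterations are needed, which matches the abstract's claim that a good preconditioner reduces the iteration count relative to plain Neumann series iterations.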
nan
Article 840
Title@2025-07-19 (6): Multi-parameter Control for the $(1+(λ,λ))$-GA on OneMax via Deep Reinforcement Learning
Title: Multi-parameter Control for the $(1+(λ,λ))$-GA on OneMax via Deep Reinforcement Learning | Multiparameter-Steuerung für das $(1+(λ,λ))$-GA auf OneMax über Deep Reinforcement Learning | $(1+(λ,λ))$-GA的多参数控制 2505.12982v3 |
Authors (4): Tai Nguyen, Phong Le, Carola Doerr, Nguyen Dang
It is well known that evolutionary algorithms can benefit from dynamic choices of the key parameters that control their behavior, to adjust their search strategy to the different stages of the optimization process. A prominent example where dynamic parameter choices have shown a provable super-constant speed-up is the $(1+(\lambda,\lambda))$ Genetic Algorithm optimizing the OneMax function. While optimal parameter control policies result in linear expected running times, this is not possible with static parameter choices. This result has spurred a lot of interest in parameter control policies. However, many works, in particular theoretical running time analyses, focus on controlling one single parameter. Deriving policies for controlling multiple parameters remains very challenging. In this work we reconsider the problem of the $(1+(\lambda,\lambda))$ Genetic Algorithm optimizing OneMax. We decouple its four main parameters and investigate how well state-of-the-art deep reinforcement learning techniques can approximate good control policies. We show that although making deep reinforcement learning learn effectively is a challenging task, once it works, it is very powerful and is able to find policies that outperform all previously known control policies on the same benchmark. Based on the results found through reinforcement learning, we derive a simple control policy that consistently outperforms the default theory-recommended setting by $27\%$ and the irace-tuned policy, the strongest existing control policy on this benchmark, by $13\%$, for all tested problem sizes up to $40{,}000$.
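For readers unfamiliar with the algorithm discussed above, the sketch below shows a commonly described self-adjusting variant of the $(1+(\lambda,\lambda))$ GA on OneMax, in which a single parameter $\lambda$ is controlled with a one-fifth success rule. This is an illustrative baseline under stated assumptions (update factor 1.5, mutation rate $\lambda/n$, crossover bias $1/\lambda$, the function name, and the evaluation cap are all choices made for this example); it is not the learned multi-parameter policy or the derived policy reported in the paper.

```python
import numpy as np

def self_adjusting_ga_onemax(n, seed=0, update_factor=1.5, max_evals=100_000):
    """Run a one-fifth-rule (1+(lambda,lambda)) GA on OneMax (assumes n >= 2)."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)
    fx = int(x.sum())  # OneMax fitness: number of one-bits
    lam, evals = 1.0, 1
    while fx < n and evals < max_evals:
        k = max(1, int(round(lam)))  # offspring count in each phase
        # Mutation phase: every mutant flips the same number of distinct bits.
        ell = rng.binomial(n, min(1.0, lam / n))
        mutants = []
        for _ in range(k):
            y = x.copy()
            y[rng.permutation(n)[:ell]] ^= 1
            mutants.append(y)
        best_mutant = max(mutants, key=lambda y: int(y.sum()))
        # Crossover phase: take each bit from the best mutant with prob. 1/lam.
        children = [np.where(rng.random(n) < 1.0 / lam, best_mutant, x)
                    for _ in range(k)]
        best_child = max(children, key=lambda y: int(y.sum()))
        fy = int(best_child.sum())
        evals += 2 * k
        # One-fifth rule: shrink lam after an improvement, grow it otherwise.
        if fy > fx:
            lam = max(lam / update_factor, 1.0)
        else:
            lam = min(lam * update_factor ** 0.25, n - 1)
        if fy >= fx:
            x, fx = best_child, fy
    return x, fx, evals
```

Here the mutation population, crossover population, mutation rate, and crossover bias are all tied to the single value `lam`; decoupling them gives four separately controllable parameters, which is the setting the abstract describes.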
nan
Article 841
Title@2025-07-19 (6): Reevaluating Policy Gradient Methods for Imperfect-Information Games
Title: Reevaluating Policy Gradient Methods for Imperfect-Information Games | Neubewertung der Politik Gradient Methoden für Imperfect-Informations-Spiele | 重新评估不完善信息运动会的逐步政策方法 2502.08938v2 |
Authors (9): Max Rudolph, Nathan Lichtle, Sobhan Mohammadpour, Alexandre Bayen, J. Zico Kolter, Amy Zhang, Gabriele Farina, Eugene Vinitsky, Samuel Sokota
In the past decade, motivated by the putative failure of naive self-play deep reinforcement learning (DRL) in adversarial imperfect-information games, researchers have developed numerous DRL algorithms based on fictitious play (FP), double oracle (DO), and counterfactual regret minimization (CFR). In light of recent results of the magnetic mirror descent algorithm, we hypothesize that simpler generic policy gradient methods like PPO are competitive with or superior to these FP-, DO-, and CFR-based DRL approaches. To facilitate the resolution of this hypothesis, we implement and release the first broadly accessible exact exploitability computations for four large games. Using these games, we conduct the largest-ever exploitability comparison of DRL algorithms for imperfect-information games. Over 5600 training runs, we find that FP-, DO-, and CFR-based approaches fail to outperform generic policy gradient methods. Code is available at https://github.com/nathanlct/IIG-RL-Benchmark and https://github.com/gabrfarina/exp-a-spiel .
nan
Article 842
Title@2025-07-19 (6): Balancing Expressivity and Robustness: Constrained Rational Activations for Reinforcement Learning
Title: Balancing Expressivity and Robustness: Constrained Rational Activations for Reinforcement Learning | Ausbalancierende Expressivität und Robustheit: eingeschränkte rationale Aktivierungen für verstärktes Lernen | 平衡表达性和强力:加强学习的有节制的理性行动 2507.14736v1 |
Authors (5): Rafał Surdej, Michał Bortkiewicz, Alex Lewandowski, Mateusz Ostaszewski, Clare Lyle
Trainable activation functions, whose parameters are optimized alongside network weights, offer increased expressivity compared to fixed activation functions. Specifically, trainable activation functions defined as ratios of polynomials (rational functions) have been proposed to enhance plasticity in reinforcement learning. However, their impact on training stability remains unclear. In this work, we study trainable rational activations in both reinforcement and continual learning settings. We find that while their flexibility enhances adaptability, it can also introduce instability, leading to overestimation in RL and feature collapse in longer continual learning scenarios. Our main result is demonstrating a trade-off between expressivity and plasticity in rational activations. To address this, we propose a constrained variant that structurally limits excessive output scaling while preserving adaptability. Experiments across MetaWorld and DeepMind Control Suite (DMC) environments show that our approach improves training stability and performance. In continual learning benchmarks, including MNIST with reshuffled labels and Split CIFAR-100, we reveal how different constraints affect the balance between expressivity and long-term retention. Preliminary experiments in discrete action domains (e.g., Atari) did not show similar instability, which suggests that the trade-off is particularly relevant for continuous control. Together, our findings provide actionable design principles for robust and adaptable trainable activations in dynamic, non-stationary environments. Code available at: https://github.com/special114/rl_rational_plasticity.
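As a small illustration of what an activation defined as a ratio of polynomials looks like, the sketch below evaluates a rational function with a denominator of the form 1 + |Q(x)|, a commonly used way to avoid poles. The function name and coefficient layout are assumptions for this example, and the code does not implement the constrained variant proposed in the paper, whose exact form is not given in the abstract.

```python
import numpy as np

def rational_activation(x, num_coeffs, den_coeffs):
    """Evaluate P(x) / (1 + |Q(x)|) elementwise.

    num_coeffs: [a_0, a_1, ...] so that P(x) = a_0 + a_1 x + a_2 x^2 + ...
    den_coeffs: [b_1, b_2, ...] so that Q(x) = b_1 x + b_2 x^2 + ...
    In a trainable setting both coefficient lists would be learned parameters.
    """
    x = np.asarray(x, dtype=float)
    numerator = sum(a * x**i for i, a in enumerate(num_coeffs))
    den_poly = sum(b * x**(j + 1) for j, b in enumerate(den_coeffs))
    # The absolute value keeps the denominator at or above 1, so no poles occur.
    return numerator / (1.0 + np.abs(den_poly))
```

Even with a pole-free denominator, large numerator coefficients can still scale outputs strongly, which is the kind of behavior a constraint on output scaling would target.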
nan
Article 843
Title@2025-07-19 (6): Attention-Based Reconstruction of Full-Field Tsunami Waves from Sparse Tsunameter Networks
Title: Attention-Based Reconstruction of Full-Field Tsunami Waves from Sparse Tsunameter Networks | Aufmerksamkeitsbasierte Rekonstruktion von Ganzfeld-Tsunamiwellen aus Sparse Tsunameter-Netzwerken | 利用微缩起子网络重建全战地海啸波 2411.12948v5 |
Authors (5): Edward McDugald, Arvind Mohan, Darren Engwirda, Agnese Marcato, Javier Santos
We investigate the potential of an attention-based neural network architecture, the Senseiver, for sparse sensing in tsunami forecasting. Specifically, we focus on the Tsunami Data Assimilation Method, which generates forecasts from tsunameter networks. Our model is used to reconstruct high-resolution tsunami wavefields from extremely sparse observations, including cases where the tsunami epicenters are not represented in the training set. Furthermore, we demonstrate that our approach significantly outperforms the Linear Interpolation with Huygens-Fresnel Principle in generating dense observation networks, achieving markedly improved accuracy.
nan
Article 844
Title@2025-07-19 (6): Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework
Title: Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework | Jenseits von atomaren Geometriedarstellungen in der Materialwissenschaft: Ein multimodaler Rahmen für Mensch-in-the-Loop | 超出原子几何在材料科学中的代表性以外的原子几何代表性:人类在洛博多模式框架 2506.00302v2 |
Authors (4): Can Polat, Erchin Serpedin, Mustafa Kurban, Hasan Kurban
Most materials science datasets are limited to atomic geometries (e.g., XYZ files), restricting their utility for multimodal learning and comprehensive data-centric analysis. These constraints have historically impeded the adoption of advanced machine learning techniques in the field. This work introduces MultiCrystalSpectrumSet (MCS-Set), a curated framework that expands materials datasets by integrating atomic structures with 2D projections and structured textual annotations, including lattice parameters and coordination metrics. MCS-Set enables two key tasks: (1) multimodal property and summary prediction, and (2) constrained crystal generation with partial cluster supervision. Leveraging a human-in-the-loop pipeline, MCS-Set combines domain expertise with standardized descriptors for high-quality annotation. Evaluations using state-of-the-art language and vision-language models reveal substantial modality-specific performance gaps and highlight the importance of annotation quality for generalization. MCS-Set offers a foundation for benchmarking multimodal models, advancing annotation practices, and promoting accessible, versatile materials science datasets. The dataset and implementations are available at https://github.com/KurbanIntelligenceLab/MultiCrystalSpectrumSet.
nan
Article 845
Title@2025-07-19 (6): Task-Agnostic Continual Prompt Tuning with Gradient-Based Selection and Decoding
Title: Task-Agnostic Continual Prompt Tuning with Gradient-Based Selection and Decoding | Task-Agnostic Continual Prompt Tuning mit gradient-based Auswahl und Decodierung | 以渐进选择和下限方式进行任务不可确定的持续快速快速调试 2507.14725v1 |
Authors (4): Anushka Tiwari, Sayantan Pal, Rohini K. Srihari, Kaiyi Ji
Prompt-based continual learning (CL) offers a parameter-efficient way to adapt large language models (LLMs) across task sequences. However, most existing methods assume task-aware inference and maintain a growing list of task-specific prompts, which limits scalability and hides latent forgetting. In this work, we introduce GRID, a unified framework that addresses two key limitations: (1) latent forgetting under task-agnostic inference, and (2) prompt memory explosion as task sequences grow. GRID integrates a task-aware decoding mechanism that improves backward transfer by leveraging representative inputs, automatic task identification, and constrained decoding. Additionally, we propose a gradient-based prompt selection strategy that compresses less informative prompts into a single aggregated representation, enabling scalable and memory-efficient lifelong learning. Extensive experiments across short-sequence, long-sequence, and negative transfer benchmarks show that GRID significantly improves backward transfer, achieves competitive forward transfer, and reduces forgotten tasks by up to 80%, outperforming state-of-the-art methods on T5 and Flan-T5 backbones.
nan
Article 846
Title@2025-07-19 (6): The unknotting number, hard unknot diagrams, and reinforcement learning
Title: The unknotting number, hard unknot diagrams, and reinforcement learning | Die unknotierende Zahl, harte Unknot-Diagramme und das Erlernen von Verstärkungen | 点点数, 硬点点点数图表, 和强化学习 2409.09032v2 |
Authors (8): Taylor Applebaum, Sam Blackwell, Alex Davies, Thomas Edlich, András Juhász, Marc Lackenby, Nenad Tomašev, Daniel Zheng
We have developed a reinforcement learning agent that often finds a minimal sequence of unknotting crossing changes for a knot diagram with up to 200 crossings, hence giving an upper bound on the unknotting number. We have used this to determine the unknotting number of 57k knots. We took diagrams of connected sums of such knots with oppositely signed signatures, where the summands were overlaid. The agent has found examples where several of the crossing changes in an unknotting collection of crossings result in hyperbolic knots. Based on this, we have shown that, given knots $K$ and $K'$ that satisfy some mild assumptions, there is a diagram of their connected sum and $u(K) + u(K')$ unknotting crossings such that changing any one of them results in a prime knot. As a by-product, we have obtained a dataset of 2.6 million distinct hard unknot diagrams, most of them with under 35 crossings. Assuming the additivity of the unknotting number, we have determined the unknotting number of 43 knots with at most 12 crossings for which the unknotting number was unknown.
nan
Article 847
Title@2025-07-19 (6): LeanTree: Accelerating White-Box Proof Search with Factorized States in Lean 4
Title: LeanTree: Accelerating White-Box Proof Search with Factorized States in Lean 4 | LeanTree: Beschleunigen der White-Box-Proof-Suche mit faktorisierten Zuständen in Lean 4 | 利安特里:在利安4区与加工业国家加速白纸体校对搜索 2507.14722v1 |
Authors (3): Matěj Kripner, Michal Šustr, Milan Straka
Automated theorem proving (ATP) has been a classical problem in artificial intelligence since its inception, yet it remains challenging due to its vast state and action space. Large language models (LLMs) have recently emerged as a promising heuristic for ATP, but they lack correctness guarantees and thus require interaction with a proof verifier. Such interactions typically follow one of two approaches: black-box interaction, which does not utilize intermediate proof states, or white-box approaches, which allow for incremental proof construction and examination of intermediate states. While black-box approaches have directly benefited from recent LLM advances, white-box methods have comparatively lagged behind. In this paper, we address this gap by introducing LeanTree, which consists of (i) a tool built in the Lean 4 language that factorizes complex proof states into simpler, independent branches, and (ii) a dataset of these factorized intermediate states. Our white-box tooling offers several advantages over black-box approaches: it simplifies evaluation, reduces necessary context, generates richer training data, enables parallel search across multiple states, supports efficient reuse of states, and provides feedback in case of errors. Our preliminary results hint that white-box approaches outperform black-box alternatives in some settings.
nan
Article 848
Title@2025-07-19 (6): Exploring the Dynamic Scheduling Space of Real-Time Generative AI Applications on Emerging Heterogeneous Systems
Title: Exploring the Dynamic Scheduling Space of Real-Time Generative AI Applications on Emerging Heterogeneous Systems | Erforschung des dynamischen Planungsraums von Echtzeit-Generativen KI-Anwendungen auf entstehenden Heterogenen Systemen | 探索新兴异变体系统实时产生AI应用的动态日程安排空间 2507.14715v1 |
Authors (4): Rachid Karami, Rajeev Patwari, Hyoukjun Kwon, Ashish Sirasao
The integration of generative AI models, particularly large language models (LLMs), into real-time multi-model AI applications such as video conferencing and gaming is giving rise to a new class of workloads: real-time generative AI (RTGen). These workloads combine the compute intensity and dynamic execution patterns of generative models with the stringent latency and concurrency constraints of real-time inference. To meet the diverse demands of RTGen workloads, modern edge platforms increasingly adopt heterogeneous system-on-chip (SoC) architectures that integrate CPUs, GPUs, and NPUs. Despite the potential of heterogeneous SoC, the scheduling space complexity and performance implications of RTGen workloads on such platforms remain underexplored. In this work, we perform a comprehensive characterization of RTGen workloads on AMD’s latest heterogeneous SoC, Ryzen AI. We construct realistic multi-model scenarios inspired by industry use cases and profile model performance across all available backends. Using this data, we evaluate five scheduling policies and their impact on both real-time metrics (e.g., deadline violation rate) and LLM performance (e.g., time-to-first-token and tokens-per-second). Our results show that scheduling decisions significantly affect workload performance (e.g., leading to a 41.7% difference in deadline violation rates on average), and highlight the need for scheduling strategies that are aware of workload dynamics and hardware heterogeneity. Our findings underscore the importance of workload-aware, dynamic heterogeneous scheduling in enabling high-performance, on-device RTGen applications.
nan
Article 849
Title@2025-07-19 (6): Sortformer: A Novel Approach for Permutation-Resolved Speaker Supervision in Speech-to-Text Systems
Title: Sortformer: A Novel Approach for Permutation-Resolved Speaker Supervision in Speech-to-Text Systems | Sortformer: Ein neuartiger Ansatz für Permutations-Resolved Speaker Supervision in Speech-to-Text Systemen | 排序前:语音到文字系统变换解决的议长监督新办法 2409.06656v3 |
Authors (9): Taejin Park, Ivan Medennikov, Kunal Dhawan, Weiqing Wang, He Huang, Nithin Rao Koluguri, Krishna C. Puvvada, Jagadeesh Balam, Boris Ginsburg
Sortformer is an encoder-based speaker diarization model designed for supervising speaker tagging in speech-to-text models. Instead of relying solely on permutation invariant loss (PIL), Sortformer introduces Sort Loss to resolve the permutation problem, either independently or in tandem with PIL. In addition, we propose a streamlined multi-speaker speech-to-text architecture that leverages Sortformer for speaker supervision, embedding speaker labels into the encoder using sinusoidal kernel functions. This design addresses the speaker permutation problem through sorted objectives, effectively bridging timestamps and tokens to supervise speaker labels in the output transcriptions. Experiments demonstrate that Sort Loss can boost speaker diarization performance, and incorporating the speaker supervision from Sortformer improves multi-speaker transcription accuracy. We anticipate that the proposed Sortformer and multi-speaker architecture will enable the seamless integration of speaker tagging capabilities into foundational speech-to-text systems and multimodal large language models (LLMs), offering an easily adoptable and user-friendly mechanism to enhance their versatility and performance in speaker-aware tasks. The code and trained models are made publicly available through the NVIDIA NeMo Framework.
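The abstract above does not state the sort key used by Sort Loss. The sketch below assumes, for illustration only, that target speakers are ordered by the frame at which each first becomes active, so that a fixed output order can be trained with a plain binary cross-entropy instead of searching over permutations. The function names are hypothetical and this is not the NeMo implementation.

```python
import numpy as np

def sort_speakers_by_arrival(labels):
    """Reorder columns of a (frames, speakers) 0/1 matrix by first active frame.

    Assumption for this sketch: the sort key is each speaker's arrival time.
    Speakers that never speak are placed last.
    """
    labels = np.asarray(labels)
    num_frames = labels.shape[0]
    is_active = labels.any(axis=0)
    first_frame = np.where(is_active, labels.argmax(axis=0), num_frames)
    order = np.argsort(first_frame, kind="stable")
    return labels[:, order]

def sorted_bce_loss(pred, labels, eps=1e-7):
    """Binary cross-entropy against arrival-sorted targets (no permutation search)."""
    target = sort_speakers_by_arrival(labels).astype(float)
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))
```

A permutation invariant loss would instead evaluate the loss under every speaker ordering and keep the minimum; sorting the targets fixes one ordering up front, and the two can be combined, as the abstract notes.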
nan
Article 850
Title@2025-07-19 (6): Fraud is Not Just Rarity: A Causal Prototype Attention Approach to Realistic Synthetic Oversampling
Title: Fraud is Not Just Rarity: A Causal Prototype Attention Approach to Realistic Synthetic Oversampling | Betrug ist nicht nur Seltenheit: Ein kausaler Prototyp Aufmerksamkeit Ansatz zur realistischen synthetischen Oversampling | 欺诈不仅仅是报复:对现实的合成合成过度抽样采取因果原型关注方法 2507.14706v1 |
Authors (4): Claudio Giusti, Luca Guarnera, Mirko Casu, Sebastiano Battiato
Detecting fraudulent credit card transactions remains a significant challenge, due to the extreme class imbalance in real-world data and the often subtle patterns that separate fraud from legitimate activity. Existing research commonly attempts to address this by generating synthetic samples for the minority class using approaches such as GANs, VAEs, or hybrid generative models. However, these techniques, particularly when applied only to minority-class data, tend to result in overconfident classifiers and poor latent cluster separation, ultimately limiting real-world detection performance. In this study, we propose the Causal Prototype Attention Classifier (CPAC), an interpretable architecture that promotes class-aware clustering and improved latent space structure through prototype-based attention mechanisms. We couple it with the encoder of a VAE-GAN, which yields better cluster separation and moves beyond post-hoc sample augmentation. We compared CPAC-augmented models to traditional oversamplers, such as SMOTE, as well as to state-of-the-art generative models, both with and without CPAC-based latent classifiers. Our results show that classifier-guided latent shaping with CPAC delivers superior performance, achieving an F1-score of 93.14% and recall of 90.18%, along with improved latent cluster separation. Further ablation studies and visualizations provide deeper insight into the benefits and limitations of classifier-driven representation learning for fraud detection. The codebase for this work will be available at final submission.
nan
Article 851
Title@2025-07-19 (6): APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay
Title: APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay | APIGen-MT: Agentische Pipeline für die Multi-Turn-Datengenerierung über simuliertes Agent-Human-Interplay | APIGen-MT: 通过模拟代理人间相互作用生成多发数据时的代理管道 2504.03601v4 |
Authors (15): Akshara Prabhakar, Zuxin Liu, Ming Zhu, Jianguo Zhang, Tulika Awalgaonkar, Shiyu Wang, Zhiwei Liu, Haolin Chen, Thai Hoang, Juan Carlos Niebles, Shelby Heinecke, Weiran Yao, Huan Wang, Silvio Savarese, Caiming Xiong
Training effective AI agents for multi-turn interactions requires high-quality data that captures realistic human-agent dynamics, yet such data is scarce and expensive to collect manually. We introduce APIGen-MT, a two-phase framework that generates verifiable and diverse multi-turn agent data. In the first phase, our agentic pipeline produces detailed task blueprints with ground-truth actions, leveraging a committee of LLM reviewers and iterative feedback loops. These blueprints are then transformed into complete interaction trajectories through simulated human-agent interplay. We train a family of models – the xLAM-2-fc-r series with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on $\tau$-bench and BFCL benchmarks, with the smaller models surpassing their larger counterparts, particularly in multi-turn settings, while maintaining superior consistency across multiple trials. Comprehensive experiments demonstrate that our verified blueprint-to-details approach yields high-quality training data, enabling the development of more reliable, efficient, and capable agents. We open-source 5K synthetic data trajectories and the trained xLAM-2-fc-r models to advance research in AI agents. Models at https://huggingface.co/collections/Salesforce/xlam-2-67ef5be12949d8dcdae354c4; Dataset at https://huggingface.co/datasets/Salesforce/APIGen-MT-5k and Website at https://apigen-mt.github.io
Article 852
Title@2025-07-19 (6): Towards the Next Frontier in Speech Representation Learning Using Disentanglement
Title: Towards the Next Frontier in Speech Representation Learning Using Disentanglement | Auf dem Weg zur nächsten Front in der Sprachrepräsentanz Lernen mit Entflechtung | 走向使用分离手段进行演讲代表学习的下一个前沿 2407.02543v2 |
Authors (2): Varun Krishna, Sriram Ganapathy
The popular frameworks for self-supervised learning of speech representations have largely focused on frame-level masked prediction of speech regions. While this has shown promising downstream task performance for speech recognition and related tasks, it has largely ignored factors of speech that are encoded at a coarser level, like characteristics of the speaker or channel that remain consistent throughout a speech utterance. In this work, we propose a framework for Learning Disentangled Self Supervised (termed as Learn2Diss) representations of speech, which consists of a frame-level and an utterance-level encoder module. The two encoders are initially learned independently, where the frame-level model is largely inspired by existing self supervision techniques, thereby learning pseudo-phonemic representations, while the utterance-level encoder is inspired by contrastive learning of pooled embeddings, thereby learning pseudo-speaker representations. The joint learning of these two modules consists of disentangling the two encoders using a mutual information based criterion. With several downstream evaluation experiments, we show that the proposed Learn2Diss achieves state-of-the-art results on a variety of tasks, with the frame-level encoder representations improving semantic tasks, while the utterance-level representations improve non-semantic tasks.
Article 853
Title@2025-07-19 (6): Spatial-Temporal Transformer with Curriculum Learning for EEG-Based Emotion Recognition
Title: Spatial-Temporal Transformer with Curriculum Learning for EEG-Based Emotion Recognition | Raum-Temporal Transformer mit Curriculum-Lernen für EEG-basierte Emotionserkennung | 具有基于EEG的情感识别课程学习的空间时空变换器 2507.14698v1 |
Authors (5): Xuetao Lin, Tianhao Peng, Peihong Dai, Yu Liang, Wenjun Wu
EEG-based emotion recognition plays an important role in developing adaptive brain-computer communication systems, yet faces two fundamental challenges in practical implementations: (1) effective integration of non-stationary spatial-temporal neural patterns, (2) robust adaptation to dynamic emotional intensity variations in real-world scenarios. This paper proposes SST-CL, a novel framework integrating spatial-temporal transformers with curriculum learning. Our method introduces two core components: a spatial encoder that models inter-channel relationships and a temporal encoder that captures multi-scale dependencies through windowed attention mechanisms, enabling simultaneous extraction of spatial correlations and temporal dynamics from EEG signals. Complementing this architecture, an intensity-aware curriculum learning strategy progressively guides training from high-intensity to low-intensity emotional states through dynamic sample scheduling based on a dual difficulty assessment. Comprehensive experiments on three benchmark datasets demonstrate state-of-the-art performance across various emotional intensity levels, with ablation studies confirming the necessity of both architectural components and the curriculum learning mechanism.
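The high-to-low intensity scheduling described above can be sketched roughly as follows. This is only an illustration, not the SST-CL implementation: the abstract does not specify the dual difficulty assessment, so the sketch assumes a single scalar intensity score per sample and a linear pacing rule, and the names `curriculum_pool` and `start_frac` are hypothetical.

```python
import math

def curriculum_pool(samples, epoch, total_epochs, start_frac=0.5):
    """Return the samples available for training at a given epoch.

    `samples` is a list of (sample_id, intensity) pairs. Assumption: a single
    intensity score stands in for the paper's dual difficulty assessment.
    Training starts with the highest-intensity samples and linearly grows the
    pool until every sample is included in the final epoch.
    """
    ordered = sorted(samples, key=lambda s: s[1], reverse=True)
    if total_epochs <= 1:
        frac = 1.0
    else:
        frac = start_frac + (1.0 - start_frac) * epoch / (total_epochs - 1)
    frac = min(1.0, frac)
    count = max(1, math.ceil(frac * len(ordered)))
    return ordered[:count]

data = [("a", 0.9), ("b", 0.2), ("c", 0.5), ("d", 0.7)]
for epoch in range(3):
    print(epoch, [sid for sid, _ in curriculum_pool(data, epoch, 3)])
```

With four samples and three epochs the pool grows from the two most intense samples to the full set, which is the progression from easy (high-intensity) to hard (low-intensity) states the abstract describes.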
Article 854
Title@2025-07-19 (6): Forecasting Faculty Placement from Patterns in Co-authorship Networks
Title: Forecasting Faculty Placement from Patterns in Co-authorship Networks | Forecasting Fakultät Platzierung aus Mustern in Co-Autorship Networks | 共同领导网络中基于模式的学院定位预测 2507.14696v1 |
Authors (3): Samantha Dies, David Liu, Tina Eliassi-Rad
Faculty hiring shapes the flow of ideas, resources, and opportunities in academia, influencing not only individual career trajectories but also broader patterns of institutional prestige and scientific progress. While traditional studies have found strong correlations between faculty hiring and attributes such as doctoral department prestige and publication record, they rarely assess whether these associations generalize to individual hiring outcomes, particularly for future candidates outside the original sample. Here, we consider faculty placement as an individual-level prediction task. Our data consist of temporal co-authorship networks with conventional attributes such as doctoral department prestige and bibliometric features. We observe that using the co-authorship network significantly improves predictive accuracy by up to 10% over traditional indicators alone, with the largest gains observed for placements at the most elite (top-10) departments. Our results underscore the role that social networks, professional endorsements, and implicit advocacy play in faculty hiring beyond traditional measures of scholarly productivity and institutional prestige. By introducing a predictive framing of faculty placement and establishing the benefit of considering co-authorship networks, this work provides a new lens for understanding structural biases in academia that could inform targeted interventions aimed at increasing transparency, fairness, and equity in academic hiring practices.
Article 855
Title@2025-07-19 (6): Caching Techniques for Reducing the Communication Cost of Federated Learning in IoT Environments
Title: Caching Techniques for Reducing the Communication Cost of Federated Learning in IoT Environments | Caching-Techniken zur Reduzierung der Kommunikationskosten von Federated Learning in IoT-Umgebungen | 降低在IoT环境中联邦学习的传播成本的缓冲技术 2507.17772v1 |
Authors (2): Ahmad Alhonainy, Praveen Rao
Federated Learning (FL) allows multiple distributed devices to jointly train a shared model without centralizing data, but communication cost remains a major bottleneck, especially in resource-constrained environments. This paper introduces caching strategies - FIFO, LRU, and Priority-Based - to reduce unnecessary model update transmissions. By selectively forwarding significant updates, our approach lowers bandwidth usage while maintaining model accuracy. Experiments on CIFAR-10 and medical datasets show reduced communication with minimal accuracy loss. Results confirm that intelligent caching improves scalability and memory efficiency and supports reliable FL in edge IoT networks, making it practical for deployment in smart cities, healthcare, and other latency-sensitive applications.
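One way to picture the idea of forwarding only significant updates is the minimal sketch below. It is not the paper's method: the abstract gives no details, so the class `UpdateCache`, the L2-distance significance test, and the `threshold` parameter are all assumptions made for illustration, and only the LRU variant is shown.

```python
import math
from collections import OrderedDict

class UpdateCache:
    """LRU cache of the last update each client transmitted (hypothetical helper)."""

    def __init__(self, capacity, threshold):
        self.capacity = capacity      # maximum number of clients remembered
        self.threshold = threshold    # minimum L2 change that counts as significant
        self._store = OrderedDict()   # client_id -> last transmitted update

    def should_send(self, client_id, update):
        """Return True if the update differs enough from the cached one to be sent."""
        cached = self._store.get(client_id)
        if cached is not None:
            delta = math.sqrt(sum((u - c) ** 2 for u, c in zip(update, cached)))
            if delta < self.threshold:
                # Insignificant change: skip transmission, mark entry as recently used.
                self._store.move_to_end(client_id)
                return False
        # New client or significant change: remember the update and transmit it.
        self._store[client_id] = list(update)
        self._store.move_to_end(client_id)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used client
        return True

cache = UpdateCache(capacity=2, threshold=0.1)
print(cache.should_send("sensor-1", [1.0, 2.0]))   # first update from this client
print(cache.should_send("sensor-1", [1.0, 2.05]))  # small change relative to the cache
```

A FIFO variant would drop the `move_to_end` calls, and a priority-based variant would evict by a score rather than by recency.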
Article 856
Title@2025-07-19 (6): Rethinking Suicidal Ideation Detection: A Trustworthy Annotation Framework and Cross-Lingual Model Evaluation
Title: Rethinking Suicidal Ideation Detection: A Trustworthy Annotation Framework and Cross-Lingual Model Evaluation | Umdenken bei der Erkennung von Selbstmordgedanken: Ein vertrauensvolles Annotations-Framework und Cross-Lingual Model Evaluation | 重新思考潮ideideididation 探测:可信赖的注解框架和跨语言模式评价 2507.14693v1 |
Authors (3): Amina Dzafic, Merve Kavut, Ulya Bayram
Suicidal ideation detection is critical for real-time suicide prevention, yet its progress faces two under-explored challenges: limited language coverage and unreliable annotation practices. Most available datasets are in English, but even among these, high-quality, human-annotated data remains scarce. As a result, many studies rely on available pre-labeled datasets without examining their annotation process or label reliability. The lack of datasets in other languages further limits the global realization of suicide prevention via artificial intelligence (AI). In this study, we address one of these gaps by constructing a novel Turkish suicidal ideation corpus derived from social media posts and introducing a resource-efficient annotation framework involving three human annotators and two large language models (LLMs). We then address the remaining gaps by performing a bidirectional evaluation of label reliability and model consistency across this dataset and three popular English suicidal ideation detection datasets, using transfer learning through eight pre-trained sentiment and emotion classifiers. These transformers help assess annotation consistency and benchmark model performance against manually labeled data. Our findings underscore the need for more rigorous, language-inclusive approaches to annotation and evaluation in mental health natural language processing (NLP) while demonstrating the questionable performance of popular models with zero-shot transfer learning. We advocate for transparency in model training and dataset construction in mental health NLP, prioritizing data and model reliability.
Article 857
Title@2025-07-19 (6): Mind the Gap: A Review of Arabic Post-Training Datasets and Their Limitations
Title: Mind the Gap: A Review of Arabic Post-Training Datasets and Their Limitations | Mind the Gap: Eine Überprüfung der arabischen Post-Training-Datensätze und deren Einschränkungen | 《思想差距:对阿拉伯培训后数据集及其局限性的审查》 2507.14688v1 |
Authors (8): Mohammed Alkhowaiter, Norah Alshahrani, Saied Alshahrani, Reem I. Masoud, Alaa Alzahrani, Deema Alnuhait, Emad A. Alghamdi, Khalid Almubarak
Post-training has emerged as a crucial technique for aligning pre-trained Large Language Models (LLMs) with human instructions, significantly enhancing their performance across a wide range of tasks. Central to this process is the quality and diversity of post-training datasets. This paper presents a review of publicly available Arabic post-training datasets on the Hugging Face Hub, organized along four key dimensions: (1) LLM Capabilities (e.g., Question Answering, Translation, Reasoning, Summarization, Dialogue, Code Generation, and Function Calling); (2) Steerability (e.g., persona and system prompts); (3) Alignment (e.g., cultural, safety, ethics, and fairness), and (4) Robustness. Each dataset is rigorously evaluated based on popularity, practical adoption, recency and maintenance, documentation and annotation quality, licensing transparency, and scientific contribution. Our review revealed critical gaps in the development of Arabic post-training datasets, including limited task diversity, inconsistent or missing documentation and annotation, and low adoption across the community. Finally, the paper discusses the implications of these gaps on the progress of Arabic LLMs and applications while providing concrete recommendations for future efforts in post-training dataset development.
Article 858
Title@2025-07-19 (6): Revisiting Graph Contrastive Learning on Anomaly Detection: A Structural Imbalance Perspective
Title: Revisiting Graph Contrastive Learning on Anomaly Detection: A Structural Imbalance Perspective | Überblick auf Graph Kontrastives Lernen über Anomalienerkennung: Eine strukturelle Ungleichgewichtsperspektive | 重新审视异常探测方面的对比图表学习:结构不平衡的视角 2507.14677v1 |
Authors (7): Yiming Xu, Zhen Peng, Bin Shi, Xu Hua, Bo Dong, Song Wang, Chen Chen
The superiority of graph contrastive learning (GCL) has prompted its application to anomaly detection tasks for more powerful risk warning systems. Unfortunately, existing GCL-based models tend to excessively prioritize overall detection performance while neglecting robustness to structural imbalance, which can be problematic for many real-world networks following power-law degree distributions. Particularly, GCL-based methods may fail to capture tail anomalies (abnormal nodes with low degrees). This raises concerns about the security and robustness of current anomaly detection algorithms and therefore hinders their applicability in a variety of realistic high-risk scenarios. To the best of our knowledge, research on the robustness of graph anomaly detection to structural imbalance has received little scrutiny. To address the above issues, this paper presents a novel GCL-based framework named AD-GCL. It devises the neighbor pruning strategy to filter noisy edges for head nodes and facilitate the detection of genuine tail nodes by aligning from head nodes to forged tail nodes. Moreover, AD-GCL actively explores potential neighbors to enlarge the receptive field of tail nodes through anomaly-guided neighbor completion. We further introduce intra- and inter-view consistency loss of the original and augmentation graph for enhanced representation. The performance evaluation of the whole, head, and tail nodes on multiple datasets validates the comprehensive superiority of the proposed AD-GCL in detecting both head anomalies and tail anomalies.
Article 859
Title@2025-07-19 (6): Rec-AD: An Efficient Computation Framework for FDIA Detection Based on Tensor Train Decomposition and Deep Learning Recommendation Model
Title: Rec-AD: An Efficient Computation Framework for FDIA Detection Based on Tensor Train Decomposition and Deep Learning Recommendation Model | Rec-AD: Ein effizienter Berechnungsrahmen für die FDIA-Erkennung auf der Grundlage von Tensor Train Decomposition und Deep Learning Empfehlungsmodell | Rec-AD:基于Tensor 列车分解和深学习建议模型的FDIA探测有效计算框架 2507.14668v1 |
Authors (5): Yunfeng Li, Junhong Liu, Zhaohui Yang, Guofu Liao, Chuyun Zhang
Deep learning models have been widely adopted for False Data Injection Attack (FDIA) detection in smart grids due to their ability to capture unstructured and sparse features. However, the increasing system scale and data dimensionality introduce significant computational and memory burdens, particularly in large-scale industrial datasets, limiting detection efficiency. To address these issues, this paper proposes Rec-AD, a computationally efficient framework that integrates Tensor Train decomposition with the Deep Learning Recommendation Model (DLRM). Rec-AD enhances training and inference efficiency through embedding compression, optimized data access via index reordering, and a pipeline training mechanism that reduces memory communication overhead. Fully compatible with PyTorch, Rec-AD can be integrated into existing FDIA detection systems without code modifications. Experimental results show that Rec-AD significantly improves computational throughput and real-time detection performance, narrowing the attack window and increasing attacker cost. These advancements strengthen edge computing capabilities and scalability, providing robust technical support for smart grid security.
Article 860
Title@2025-07-19 (6): Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences
Title: Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences | Welche Erfahrungen sind einflussreich für RL-Agenten? Effiziente Einschätzung des Einflusses von Erfahrungen | RL代理机构有哪些经验可资借鉴? 有效估计经验的影响 2405.14629v3 |
Authors (4): Takuya Hiraoka, Guanquan Wang, Takashi Onishi, Yoshimasa Tsuruoka
In reinforcement learning (RL) with experience replay, experiences stored in a replay buffer influence the RL agent’s performance. Information about how these experiences influence the agent’s performance is valuable for various purposes, such as identifying experiences that negatively influence underperforming agents. One method for estimating the influence of experiences is the leave-one-out (LOO) method. However, this method is usually computationally prohibitive. In this paper, we present Policy Iteration with Turn-over Dropout (PIToD), which efficiently estimates the influence of experiences. We evaluate how correctly PIToD estimates the influence of experiences and its efficiency compared to LOO. We then apply PIToD to amend underperforming RL agents, i.e., we use PIToD to estimate negatively influential experiences for the RL agents and to delete the influence of these experiences. We show that RL agents’ performance is significantly improved via amendments with PIToD.
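To make the cost of the leave-one-out baseline concrete, here is a toy sketch. It does not implement PIToD or any RL algorithm: the "agent" is just the mean of the stored rewards, and `train`, `performance`, and `loo_influence` are hypothetical stand-ins chosen so the example stays runnable. The point is only that LOO needs one retraining per experience.

```python
def train(experiences):
    # Toy "agent": its value estimate is the mean of the stored rewards.
    return sum(experiences) / len(experiences)

def performance(estimate, true_value):
    # Higher is better: negative absolute error of the estimate.
    return -abs(estimate - true_value)

def loo_influence(experiences, true_value):
    """Influence of each experience = performance with it minus performance without it.

    Negative values mark experiences that hurt the agent. Requires one
    retraining per experience, which is what makes LOO expensive at scale.
    """
    base = performance(train(experiences), true_value)
    influences = []
    for i in range(len(experiences)):
        held_out = experiences[:i] + experiences[i + 1:]
        influences.append(base - performance(train(held_out), true_value))
    return influences

buffer = [1.0, 1.0, 1.0, 5.0]  # the last entry is a corrupted reward
scores = loo_influence(buffer, true_value=1.0)
print(scores.index(min(scores)))  # index of the most harmful experience
```

Deleting the experience with the most negative score corresponds, in this toy setting, to the amendment step the abstract describes for underperforming agents.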
Article 861
Title@2025-07-19 (6): When few labeled target data suffice: a theory of semi-supervised domain adaptation via fine-tuning from multiple adaptive starts
Title: When few labeled target data suffice: a theory of semi-supervised domain adaptation via fine-tuning from multiple adaptive starts | Wenn nur wenige beschriftete Zieldaten ausreichen: eine Theorie der semi-überwachten Domänenanpassung durch Feinabstimmung von mehreren adaptiven Starts | 当贴有标签的目标数据数量很少时,只要有以下标记的目标数据就足够:从多重适应开始进行微调,通过半监督的域域适应理论 2507.14661v1 |
Authors (2): Wooseok Ha, Yuansi Chen
Semi-supervised domain adaptation (SSDA) aims to achieve high predictive performance in the target domain with limited labeled target data by exploiting abundant source and unlabeled target data. Despite its significance in numerous applications, theory on the effectiveness of SSDA remains largely unexplored, particularly in scenarios involving various types of source-target distributional shifts. In this work, we develop a theoretical framework based on structural causal models (SCMs) which allows us to analyze and quantify the performance of SSDA methods when labeled target data is limited. Within this framework, we introduce three SSDA methods, each having a fine-tuning strategy tailored to a distinct assumption about the source and target relationship. Under each assumption, we demonstrate how extending an unsupervised domain adaptation (UDA) method to SSDA can achieve minimax-optimal target performance with limited target labels. When the relationship between source and target data is only vaguely known – a common practical concern – we propose the Multi Adaptive-Start Fine-Tuning (MASFT) algorithm, which fine-tunes UDA models from multiple starting points and selects the best-performing one based on a small hold-out target validation dataset. Combined with model selection guarantees, MASFT achieves near-optimal target predictive performance across a broad range of types of distributional shifts while significantly reducing the need for labeled target data. We empirically validate the effectiveness of our proposed methods through simulations.
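The select-among-multiple-starts step can be sketched as below. This is an illustration under strong assumptions, not the MASFT algorithm itself: each UDA model is replaced by a single starting weight of a one-parameter linear model, fine-tuning is a few gradient steps on squared error, and the function names and hyperparameters are hypothetical.

```python
def fine_tune(w, labeled, lr=0.01, steps=5):
    """A few gradient steps on mean squared error for the model y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in labeled) / len(labeled)
        w -= lr * grad
    return w

def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def select_best_start(starts, labeled, holdout):
    """Fine-tune from every start and keep the one with the lowest hold-out error."""
    tuned = [(start, fine_tune(start, labeled)) for start in starts]
    return min(tuned, key=lambda pair: mse(pair[1], holdout))

# Target relationship is y = 2x; the starts stand in for different UDA solutions.
labeled_target = [(1.0, 2.0), (2.0, 4.0)]
holdout_target = [(3.0, 6.0)]
best_start, best_w = select_best_start([-1.0, 1.8, 5.0], labeled_target, holdout_target)
print(best_start, round(best_w, 3))
```

Because fine-tuning is deliberately short, the starting point matters, and the small hold-out set is what decides which starting assumption about the source-target relationship fits the target best.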
Article 862
Title@2025-07-19 (6): Learning to Communicate in Multi-Agent Reinforcement Learning for Autonomous Cyber Defence
Title: Learning to Communicate in Multi-Agent Reinforcement Learning for Autonomous Cyber Defence | Lernen zur Kommunikation im Mehr-Agenten-Verstärkungs-Lernen für die autonome Cyber-Verteidigung | 学习多机构强化学习,以交流多机构强化学习,促进自动网络防御 2507.14658v1 |
Authors (3): Faizan Contractor, Li Li, Ranwa Al Mallah
Popular methods in cooperative Multi-Agent Reinforcement Learning with partially observable environments typically allow agents to act independently during execution, which may limit the coordinated effect of the trained policies. However, by sharing information such as known or suspected ongoing threats, effective communication can lead to improved decision-making in the cyber battle space. We propose a game design where defender agents learn to communicate and defend against imminent cyber threats by playing training games in the Cyber Operations Research Gym, using the Differentiable Inter Agent Learning algorithm adapted to the cyber operational environment. The tactical policies learned by these autonomous agents are akin to those of human experts during incident responses to avert cyber threats. In addition, the agents simultaneously learn minimal cost communication messages while learning their defence tactical policies.
Article 863
Title@2025-07-19 (6): State-observation augmented diffusion model for nonlinear assimilation with unknown dynamics
Title: State-observation augmented diffusion model for nonlinear assimilation with unknown dynamics | State-observation Augmented Diffusion Modell für nichtlineare Assimilation mit unbekannter Dynamik | 国家观测扩大非线性同同化扩散模型,具有未知动态 2407.21314v3 |
Authors (3): Zhuoyuan Li, Bin Dong, Pingwen Zhang
Data assimilation has become a key technique for combining physical models with observational data to estimate state variables. However, classical assimilation algorithms often struggle with the high nonlinearity present in both physical and observational models. To address this challenge, a novel generative model, termed the State-Observation Augmented Diffusion (SOAD) model, is proposed for data-driven assimilation. The marginal posterior associated with SOAD has been derived and then proved to match the true posterior distribution under mild assumptions, suggesting its theoretical advantages over previous score-based approaches. Experimental results also indicate that SOAD may offer improved performance compared to existing data-driven methods.
Article 864
Title@2025-07-19 (6): Accelerating Hamiltonian Monte Carlo for Bayesian Inference in Neural Networks and Neural Operators
Title: Accelerating Hamiltonian Monte Carlo for Bayesian Inference in Neural Networks and Neural Operators | Beschleunigen Hamiltonian Monte Carlo für Bayesian Inferenz in neuralen Netzwerken und neuralen Betreibern | 加速汉密尔顿·蒙特卡洛的神经网络和神经操作员中的贝耶斯推理速度 2507.14652v1 |
Authors (3): Ponkrshnan Thiagarajan, Tamer A. Zaki, Michael D. Shields
Hamiltonian Monte Carlo (HMC) is a powerful and accurate method to sample from the posterior distribution in Bayesian inference. However, HMC techniques are computationally demanding for Bayesian neural networks due to the high dimensionality of the network’s parameter space and the non-convexity of their posterior distributions. Therefore, various approximation techniques, such as variational inference (VI) or stochastic gradient MCMC, are often employed to infer the posterior distribution of the network parameters. Such approximations introduce inaccuracies in the inferred distributions, resulting in unreliable uncertainty estimates. In this work, we propose a hybrid approach that combines inexpensive VI and accurate HMC methods to efficiently and accurately quantify uncertainties in neural networks and neural operators. The proposed approach leverages an initial VI training on the full network. We examine the influence of individual parameters on the prediction uncertainty, which shows that a large proportion of the parameters do not contribute substantially to uncertainty in the network predictions. This information is then used to significantly reduce the dimension of the parameter space, and HMC is performed only for the subset of network parameters that strongly influence prediction uncertainties. This yields a framework for accelerating the full batch HMC for posterior inference in neural networks. We demonstrate the efficiency and accuracy of the proposed framework on deep neural networks and operator networks, showing that inference can be performed for large networks with tens to hundreds of thousands of parameters. We show that this method can effectively learn surrogates for complex physical systems by modeling the operator that maps from upstream conditions to wall-pressure data on a cone in hypersonic flow.
Article 865
Title@2025-07-19 (6): Deep Learning-Based Survival Analysis with Copula-Based Activation Functions for Multivariate Response Prediction
Title: Deep Learning-Based Survival Analysis with Copula-Based Activation Functions for Multivariate Response Prediction | Deep Learning-Based Survival Analysis mit Copula-basierten Aktivierungsfunktionen für Multivariate Response Prediction | 具有多变量反应预测以科普拉为基础的启动功能的深学习生存分析 2507.14641v1 |
Authors (3): Jong-Min Kim, Il Do Ha, Sangjin Kim
This research integrates deep learning, copula functions, and survival analysis to effectively handle highly correlated and right-censored multivariate survival data. It introduces copula-based activation functions (Clayton, Gumbel, and their combinations) to model the nonlinear dependencies inherent in such data. Through simulation studies and analysis of real breast cancer data, our proposed CNN-LSTM with copula-based activation functions for multivariate multi-types of survival responses enhances prediction accuracy by explicitly addressing right-censored data and capturing complex patterns. The model’s performance is evaluated using Shewhart control charts, focusing on the average run length (ARL).
Article 866
Title@2025-07-19 (6): KinForm: Kinetics Informed Feature Optimised Representation Models for Enzyme $k_{cat}$ and $K_{M}$ Prediction
Title: KinForm: Kinetics Informed Feature Optimised Representation Models for Enzyme $k_{cat}$ and $K_{M}$ Prediction | KinForm: Kinetics Informiertes Feature Optimierte Darstellungsmodelle für Enzyme $k_{cat}$ und $K_{M}$ Vorhersage | 基质形式: Enzyme $k_{cat}$ 和 $K_{M}$ 预测值的动因、知情地物最佳代表模型 2507.14639v1 |
Authors (2): Saleh Alwer, Ronan Fleming
Kinetic parameters such as the turnover number ($k_{cat}$) and Michaelis constant ($K_{\mathrm{M}}$) are essential for modelling enzymatic activity, but experimental data remains limited in scale and diversity. Previous methods for predicting enzyme kinetics typically use mean-pooled residue embeddings from a single protein language model to represent the protein. We present KinForm, a machine learning framework designed to improve predictive accuracy and generalisation for kinetic parameters by optimising protein feature representations. KinForm combines several residue-level embeddings (Evolutionary Scale Modeling Cambrian, Evolutionary Scale Modeling 2, and ProtT5-XL-UniRef50) taken from empirically selected intermediate transformer layers and applies weighted pooling based on per-residue binding-site probability. To counter the resulting high dimensionality, we apply dimensionality reduction using principal component analysis (PCA) on concatenated protein features, and rebalance the training data via a similarity-based oversampling strategy. KinForm outperforms baseline methods on two benchmark datasets. Improvements are most pronounced in low sequence similarity bins. We observe improvements from binding-site probability pooling, intermediate-layer selection, PCA, and oversampling of low-identity proteins. We also find that removing sequence overlap between folds provides a more realistic evaluation of generalisation and should be the standard over random splitting when benchmarking kinetic prediction models.
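The difference between mean pooling and probability-weighted pooling can be shown with a small sketch. It is only an assumption about how such pooling might look: the abstract does not give the exact weighting formula, so `weighted_pool` simply normalises the per-residue binding-site probabilities, and the embeddings are made-up two-dimensional vectors rather than real language-model outputs.

```python
def weighted_pool(embeddings, weights):
    """Pool per-residue embeddings into one protein vector.

    `embeddings` is a list of equal-length vectors (one per residue) and
    `weights` holds one non-negative score per residue, here standing in for
    binding-site probabilities. Falls back to plain mean pooling if all
    weights are zero.
    """
    total = sum(weights)
    if total == 0:
        weights = [1.0] * len(embeddings)
        total = float(len(embeddings))
    dim = len(embeddings[0])
    return [
        sum(w * e[j] for w, e in zip(weights, embeddings)) / total
        for j in range(dim)
    ]

residues = [[1.0, 0.0], [0.0, 1.0]]          # toy residue embeddings
print(weighted_pool(residues, [1.0, 1.0]))   # uniform weights: plain mean pooling
print(weighted_pool(residues, [0.75, 0.25])) # first residue more likely a binding site
```

Residues with higher binding-site probability contribute more to the pooled vector, which is the intuition behind replacing the mean-pooled representation used by earlier methods.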
Article 867
Title@2025-07-19 (6): Agentic Satellite-Augmented Low-Altitude Economy and Terrestrial Networks: A Survey on Generative Approaches
Title: Agentic Satellite-Augmented Low-Altitude Economy and Terrestrial Networks: A Survey on Generative Approaches | Agentische Satelliten-Augmented Low-Altitude Economy and Terrestrial Networks: Eine Umfrage zu generativen Ansätzen | 高空低空经济和地面网络:关于创造方法的调查 2507.14633v1 |
Authors (12): Xiaozheng Gao, Yichen Wang, Bosen Liu, Xiao Zhou, Ruichen Zhang, Jiacheng Wang, Dusit Niyato, Dong In Kim, Abbas Jamalipour, Chau Yuen, Jianping An, Kai Yang
The development of satellite-augmented low-altitude economy and terrestrial networks (SLAETNs) demands intelligent and autonomous systems that can operate reliably across heterogeneous, dynamic, and mission-critical environments. To address these challenges, this survey focuses on enabling agentic artificial intelligence (AI), that is, artificial agents capable of perceiving, reasoning, and acting, through generative AI (GAI) and large language models (LLMs). We begin by introducing the architecture and characteristics of SLAETNs, and analyzing the challenges that arise in integrating satellite, aerial, and terrestrial components. Then, we present a model-driven foundation by systematically reviewing five major categories of generative models: variational autoencoders (VAEs), generative adversarial networks (GANs), generative diffusion models (GDMs), transformer-based models (TBMs), and LLMs. Moreover, we provide a comparative analysis to highlight their generative mechanisms, capabilities, and deployment trade-offs within SLAETNs. Building on this foundation, we examine how these models empower agentic functions across three domains: communication enhancement, security and privacy protection, and intelligent satellite tasks. Finally, we outline key future directions for building scalable, adaptive, and trustworthy generative agents in SLAETNs. This survey aims to provide a unified understanding and actionable reference for advancing agentic AI in next-generation integrated networks.
Article 868
Title@2025-07-19 (6): $k$-PCA for (non-squared) Euclidean Distances: Polynomial Time Approximation
Title: $k$-PCA for (non-squared) Euclidean Distances: Polynomial Time Approximation | $k$-PCA für (nicht quadratisch) Euklidische Entfernungen: Polynomzeit-Annäherung | 用于(非平方)欧洲大陆距离:多边时间接近 2507.14631v1 |
Authors (2): Daniel Greenhut, Dan Feldman
Given an integer $k\geq1$ and a set $P$ of $n$ points in $\mathbb{R}^d$, the classic $k$-PCA (Principal Component Analysis) approximates the affine \emph{$k$-subspace mean} of $P$, which is the $k$-dimensional affine linear subspace that minimizes its sum of squared Euclidean distances ($\ell_{2,2}$-norm) over the points of $P$, i.e., the mean of these distances. The \emph{$k$-subspace median} is the subspace that minimizes its sum of (non-squared) Euclidean distances ($\ell_{2,1}$-mixed norm), i.e., their median. The median subspace is usually more sparse and robust to noise/outliers than the mean, but also much harder to approximate since, unlike the $\ell_{z,z}$ (non-mixed) norms, it is non-convex for $k<d-1$. We provide the first polynomial-time deterministic algorithm whose running time and approximation factor are both not exponential in $k$. More precisely, the multiplicative approximation factor is $\sqrt{d}$, and the running time is polynomial in the size of the input. We expect that our technique would be useful for many other related problems, such as $\ell_{2,z}$ norm of distances for $z\not\in \{1,2\}$, e.g., $z=\infty$, and handling outliers/sparsity. Open code and experimental results on real-world datasets are also provided.
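The two objectives being contrasted, sum of squared distances versus sum of non-squared distances to a subspace, can be evaluated with a short sketch. It only computes the costs for a given candidate and does not implement the paper's approximation algorithm. For simplicity it assumes $k=1$ (a line given by a point `center` and a unit vector `direction`); the function names are hypothetical.

```python
import math

def distance_to_line(p, center, direction):
    """Euclidean distance from point p to the line center + t * direction.

    Assumes `direction` has unit length.
    """
    diff = [pi - ci for pi, ci in zip(p, center)]
    t = sum(d * v for d, v in zip(diff, direction))        # projection coefficient
    residual = [d - t * v for d, v in zip(diff, direction)]
    return math.sqrt(sum(r * r for r in residual))

def subspace_cost(points, center, direction, squared):
    """Sum of squared distances (subspace mean objective) if `squared`,
    otherwise sum of plain distances (subspace median objective)."""
    dists = [distance_to_line(p, center, direction) for p in points]
    return sum(d * d for d in dists) if squared else sum(dists)

points = [(0.0, 2.0), (0.0, -1.0), (3.0, 0.0)]
x_axis = ((0.0, 0.0), (1.0, 0.0))
print(subspace_cost(points, *x_axis, squared=True))   # l_{2,2}-style cost
print(subspace_cost(points, *x_axis, squared=False))  # l_{2,1}-style cost
```

Squaring makes a far-away point dominate the cost, which is why the non-squared objective is the more outlier-robust of the two.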
Article 869
Title@2025-07-19 (6): Knockout: A simple way to handle missing inputs
Title: Knockout: A simple way to handle missing inputs | Knockout: Ein einfacher Weg, um fehlende Eingänge zu handhaben | Knockout: 处理缺失输入的简单方法 2405.20448v3 |
Authors (6): Minh Nguyen, Batuhan K. Karaman, Heejong Kim, Alan Q. Wang, Fengbei Liu, Mert R. Sabuncu
Deep learning models benefit from rich (e.g., multi-modal) input features. However, multimodal models might be challenging to deploy, because some inputs may be missing at inference. Current popular solutions include marginalization, imputation, and training multiple models. Marginalization achieves calibrated predictions, but it is computationally expensive and only feasible for low dimensional inputs. Imputation may result in inaccurate predictions, particularly when high-dimensional data, such as images, are missing. Training multiple models, where each model is designed to handle different subsets of inputs, can work well but requires prior knowledge of missing input patterns. Furthermore, training and retaining multiple models can be costly. We propose an efficient method to learn both the conditional distribution using full inputs and the marginal distributions. Our method, Knockout, randomly replaces input features with appropriate placeholder values during training. We provide a theoretical justification for Knockout and show that it can be interpreted as an implicit marginalization strategy. We evaluate Knockout across a wide range of simulations and real-world datasets and show that it offers strong empirical performance.
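The training-time augmentation described above, randomly replacing input features with placeholder values, can be sketched as follows. This is a minimal illustration, not the authors' code: the per-feature independent drop with probability `p` and the choice of a single scalar placeholder are assumptions, since the abstract only says the placeholders should be "appropriate".

```python
import random

def knockout(features, placeholder, p, rng):
    """Replace each feature with `placeholder` independently with probability p.

    Applied to training inputs only; at inference, genuinely missing features
    would be filled with the same placeholder so the model sees a familiar pattern.
    """
    return [placeholder if rng.random() < p else x for x in features]

rng = random.Random(0)             # seeded for reproducibility
sample = [0.3, 1.7, -0.4, 2.2]
for _ in range(3):
    # Each call yields a different random subset of knocked-out features.
    print(knockout(sample, placeholder=-10.0, p=0.5, rng=rng))
```

The placeholder should be a value that cannot be confused with real data (here an out-of-range constant), so the model can learn to treat it as a "missing" signal rather than as an observation.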
Article 870
Title@2025-07-19 (6): Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation
Title: Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation | Hierarchisches Verstärkungslernen für die zeitliche Abstraktion der Listwise-Empfehlung | 用于清单建议时间摘要汇编的等级强化学习 2409.07416v2 |
Authors (5): Luo Ji, Gao Liu, Mingyang Yin, Hongxia Yang, Jingren Zhou
Modern listwise recommendation systems need to consider both long-term user perceptions and short-term interest shifts. Reinforcement learning can be applied to recommendation to study such a problem but is also subject to large search space, sparse user feedback and long interactive latency. Motivated by recent progress in hierarchical reinforcement learning, we propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation. Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy by modeling the process as a sequential decision-making problem. We argue that such a framework has a well-defined decomposition of the outra-session context and the intra-session context, which are encoded by the high-level and low-level agents, respectively. To verify this argument, we implement both a simulator-based environment and an industrial dataset-based experiment. Results show significant performance improvement by our method, compared with several well-known baselines. Data and code have been made public.
Article 871
Title@2025-07-19 (6): Exp-Graph: How Connections Learn Facial Attributes in Graph-based Expression Recognition
Title: Exp-Graph: How Connections Learn Facial Attributes in Graph-based Expression Recognition | Exp-Graph: Wie Verbindungen Gesichtsattribute in Graph-basierter Expression erkennen lernen | Exp-Graph: 图形表达式识别中连接如何学习模糊属性 2507.14608v1 |
Authors (2): Nandani Sharma, Dinesh Singh
Facial expression recognition is crucial for human-computer interaction applications such as face animation, video surveillance, affective computing, and medical analysis. Since the structure of facial attributes varies with facial expressions, incorporating structural information into facial attributes is essential for facial expression recognition. In this paper, we propose Exp-Graph, a novel framework designed to represent the structural relationships among facial attributes using graph-based modeling for facial expression recognition. For the facial attribute graph representation, facial landmarks are used as the graph’s vertices, while the edges are determined based on the proximity of the facial landmarks and the similarity of the local appearance of the facial attributes encoded using the vision transformer. Additionally, graph convolutional networks are utilized to capture and integrate these structural dependencies into the encoding of facial attributes, thereby enhancing the accuracy of expression recognition. Thus, Exp-Graph learns highly expressive semantic representations from the facial attribute graphs. Moreover, the vision transformer and graph convolutional blocks help the framework exploit the local and global dependencies among the facial attributes that are essential for the recognition of facial expressions. We conducted comprehensive evaluations of the proposed Exp-Graph model on three benchmark datasets: Oulu-CASIA, eNTERFACE05, and AFEW. The model achieved recognition accuracies of 98.09%, 79.01%, and 56.39%, respectively. These results indicate that Exp-Graph maintains strong generalization capabilities across both controlled laboratory settings and real-world, unconstrained environments, underscoring its effectiveness for practical facial expression recognition applications.
Article 872
Title@2025-07-19 (6): Understanding Matching Mechanisms in Cross-Encoders
Title: Understanding Matching Mechanisms in Cross-Encoders | Vergleichbare Mechanismen in Cross-Encodern verstehen | 跨企业的匹配机制 2507.14604v1 |
Authors (4): Mathias Vast, Basile Van Cooten, Laure Soulier, Benjamin Piwowarski
Neural IR architectures, particularly cross-encoders, are highly effective models whose internal mechanisms are mostly unknown. Most works trying to explain their behavior focus on high-level processes (e.g., what in the input influences the prediction, or whether the model adheres to known IR axioms) but fall short of describing the matching process. Instead of Mechanistic Interpretability approaches, which specifically aim at explaining the hidden mechanisms of neural models, we demonstrate that more straightforward methods can already provide valuable insights. In this paper, we first focus on the attention process and extract causal insights highlighting the crucial roles of some attention heads in this process. Second, we provide an interpretation of the mechanism underlying matching detection.
Article 873
Title@2025-07-19 (6): Towards a Proactive Autoscaling Framework for Data Stream Processing at the Edge using GRU and Transfer Learning
Title: Towards a Proactive Autoscaling Framework for Data Stream Processing at the Edge using GRU and Transfer Learning | Auf dem Weg zu einem proaktiven Autoscaling-Framework für die Datenstromverarbeitung am Rand mittels GRU und Transfer Learning | 争取在边缘使用GRU和转移学习实现数据流处理的主动自动调整框架 2507.14597v1 |
Authors (2): Eugene Armah, Linda Amoako Bannning
Processing data at high speeds is becoming increasingly critical as digital economies generate enormous data. The current paradigms for timely data processing are edge computing and data stream processing (DSP). Edge computing places resources closer to where data is generated, while stream processing analyzes the unbounded high-speed data in motion. However, edge stream processing faces rapid workload fluctuations, complicating resource provisioning. Inadequate resource allocation leads to bottlenecks, whereas excess allocation results in wastage. Existing reactive methods, such as threshold-based policies and queuing theory, scale only after performance degrades, potentially violating SLAs. Although reinforcement learning (RL) offers a proactive approach through agents that learn optimal runtime adaptation policies, it requires extensive simulation. Furthermore, predictive machine learning models face online distribution and concept drift that reduce their accuracy. We propose a three-step solution to the proactive edge stream processing autoscaling problem. First, a GRU neural network forecasts the upstream load using real-world and synthetic DSP datasets. Second, a transfer learning framework integrates the predictive model into an online stream processing system using the DTW algorithm and joint distribution adaptation to handle the disparities between offline and online domains. Finally, a horizontal autoscaling module dynamically adjusts the degree of operator parallelism based on predicted load while considering edge resource constraints. The lightweight GRU model for load predictions recorded an SMAPE value of up to 1.3% on a real-world data set. It outperformed CNN, ARIMA, and Prophet on the SMAPE and RMSE evaluation metrics, with lower training time than the computationally intensive RL models.
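For readers unfamiliar with the SMAPE metric cited in this abstract, the sketch below computes one common definition (mean of |F - A| divided by the average of |A| and |F|, expressed in percent). Several SMAPE variants exist, and the abstract does not say which one the authors use, so the exact formula and the zero-denominator convention here are assumptions.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent.
    Terms where both values are zero are counted as zero error (assumed convention)."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        terms.append(0.0 if denom == 0 else abs(f - a) / denom)
    return 100 * sum(terms) / len(terms)

# Hypothetical load values (e.g., tuples per second) and their forecasts.
observed = [100.0, 120.0, 90.0]
predicted = [101.0, 118.0, 91.0]
error_percent = smape(observed, predicted)
```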
Article 874
Title@2025-07-19 (6): PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
Title: PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity | PLADIS: Die Grenzen der Aufmerksamkeit bei Diffusionsmodellen zur Folgezeit durch Sparsamkeit drücken | PLADIS:通过杠杆利用公平,在推论时间传播模型中提高注意的限度 2503.07677v3 |
Authors (2): Kwanyoung Kim, Byeongsu Sim
Diffusion models have shown impressive results in generating high-quality conditional samples using guidance techniques such as Classifier-Free Guidance (CFG). However, existing methods often require additional training or neural function evaluations (NFEs), making them incompatible with guidance-distilled models. Also, they rely on heuristic approaches that require identifying target layers. In this work, we propose a novel and efficient method, termed PLADIS, which boosts pre-trained models (U-Net/Transformer) by leveraging sparse attention. Specifically, we extrapolate query-key correlations using softmax and its sparse counterpart in the cross-attention layer during inference, without requiring extra training or NFEs. By leveraging the noise robustness of sparse attention, our PLADIS unleashes the latent potential of text-to-image diffusion models, enabling them to excel with newfound effectiveness in areas where they once struggled. It integrates seamlessly with guidance techniques, including guidance-distilled models. Extensive experiments show notable improvements in text alignment and human preference, offering a highly efficient and universally applicable solution. See our project page: https://cubeyoung.github.io/pladis-proejct/
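One way to picture "extrapolating between softmax and its sparse counterpart" is the sketch below, which combines dense softmax weights with sparsemax weights for a single query. The choice of sparsemax as the sparse counterpart, the linear extrapolation form `dense + lam * (sparse - dense)`, and the scale `lam` are assumptions for illustration; the abstract does not specify them.

```python
import math

def softmax(scores):
    """Dense attention weights for one query."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparsemax(scores):
    """Euclidean projection of the scores onto the probability simplex."""
    z = sorted(scores, reverse=True)
    cumsum, tau = 0.0, 0.0
    for j, zj in enumerate(z, start=1):
        cumsum += zj
        if 1 + j * zj > cumsum:      # index j is still in the support
            tau = (cumsum - 1) / j   # threshold for the current support size
    return [max(s - tau, 0.0) for s in scores]

def extrapolated_attention(scores, lam=2.0):
    """Assumed form: move from dense weights toward (and past) sparse weights."""
    dense, sparse = softmax(scores), sparsemax(scores)
    return [d + lam * (s - d) for d, s in zip(dense, sparse)]

weights = extrapolated_attention([0.2, 1.3, -0.4])
```

With `lam = 0` this reduces to ordinary softmax and with `lam = 1` to sparsemax; values above 1 extrapolate beyond the sparse weights, so individual entries can become negative even though they still sum to one.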
Article 875
Title@2025-07-19 (6): Coordinate Heart System: A Geometric Framework for Emotion Representation
Title: Coordinate Heart System: A Geometric Framework for Emotion Representation | Koordinaten-Herzsystem: Ein geometrisches Rahmenwerk für die Emotionsdarstellung | 协调心脏系统:情感代表的几何框架 2507.14593v1 |
Authors (1): Omar Al-Desi
This paper presents the Coordinate Heart System (CHS), a geometric framework for emotion representation in artificial intelligence applications. We position eight core emotions as coordinates on a unit circle, enabling mathematical computation of complex emotional states through coordinate mixing and vector operations. Our initial five-emotion model revealed significant coverage gaps in the emotion space, leading to the development of an eight-emotion system that provides complete geometric coverage with mathematical guarantees. The framework converts natural language input to emotion coordinates and supports real-time emotion interpolation through computational algorithms. The system introduces a re-calibrated stability parameter S in [0,1], which dynamically integrates emotional load, conflict resolution, and contextual drain factors. This stability model leverages advanced Large Language Model interpretation of textual cues and incorporates hybrid temporal tracking mechanisms to provide nuanced assessment of psychological well-being states. Our key contributions include: (i) mathematical proof demonstrating why five emotions are insufficient for complete geometric coverage, (ii) an eight-coordinate system that eliminates representational blind spots, (iii) novel algorithms for emotion mixing, conflict resolution, and distance calculation in emotion space, and (iv) a comprehensive computational framework for AI emotion recognition with enhanced multi-dimensional stability modeling. Experimental validation through case studies demonstrates the system’s capability to handle emotionally conflicted states, contextual distress factors, and complex psychological scenarios that traditional categorical emotion models cannot adequately represent. This work establishes a new mathematical foundation for emotion modeling in artificial intelligence systems.
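The idea of placing emotions on a unit circle and mixing them through vector operations can be sketched as below. The even angular spacing, the index-based emotion slots, and the weighted-sum mixing rule are assumptions for illustration only; the abstract does not give the actual coordinates, emotion labels, or mixing algorithm.

```python
import math

def emotion_coordinates(n=8):
    """Place n emotion slots evenly on the unit circle (assumed layout)."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]

def mix(weights, coords):
    """Weighted vector sum of emotion coordinates; returns (x, y, intensity)."""
    x = sum(w * c[0] for w, c in zip(weights, coords))
    y = sum(w * c[1] for w, c in zip(weights, coords))
    return x, y, math.hypot(x, y)

coords = emotion_coordinates()
# Equal weight on two opposing slots: the vectors cancel, so intensity is near zero.
x, y, intensity = mix([0.5, 0, 0, 0, 0.5, 0, 0, 0], coords)
```

In this toy layout, opposing emotions cancel, which is one simple way a geometric model could flag an emotionally conflicted state.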
Article 876
Title@2025-07-19 (6): A Transformer-Based Conditional GAN with Multiple Instance Learning for UAV Signal Detection and Classification
Title: A Transformer-Based Conditional GAN with Multiple Instance Learning for UAV Signal Detection and Classification | Ein transformerbasierter Bedingter GAN mit Multiple Instance-Lernen für UAV-Signalerkennung und -Klassifizierung | 以变换器为基础的条件性GAN,具有用于无人驾驶飞行器信号探测和分类的多实例学习 2507.14592v1 |
Authors (5): Haochen Liu, Jia Bi, Xiaomin Wang, Xin Yang, Ling Wang
Unmanned Aerial Vehicles (UAVs) are increasingly used in surveillance, logistics, agriculture, disaster management, and military operations. Accurate detection and classification of UAV flight states, such as hovering, cruising, ascending, or transitioning, are essential for safe and effective operations. However, conventional time series classification (TSC) methods often lack robustness and generalization for dynamic UAV environments, while state-of-the-art (SOTA) models like Transformers and LSTM-based architectures typically require large datasets and entail high computational costs, especially with high-dimensional data streams. This paper proposes a novel framework that integrates a Transformer-based Generative Adversarial Network (GAN) with Multiple Instance Locally Explainable Learning (MILET) to address these challenges in UAV flight state classification. The Transformer encoder captures long-range temporal dependencies and complex telemetry dynamics, while the GAN module augments limited datasets with realistic synthetic samples. MIL is incorporated to focus attention on the most discriminative input segments, reducing noise and computational overhead. Experimental results show that the proposed method achieves superior accuracy, 96.5% on the DroneDetect dataset and 98.6% on the DroneRF dataset, outperforming other SOTA approaches. The framework also demonstrates strong computational efficiency and robust generalization across diverse UAV platforms and flight states, highlighting its potential for real-time deployment in resource-constrained environments.
Article 877
Title@2025-07-19 (6): AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?
Title: AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs? | AlgoTune: Können Sprachmodelle allgemeine numerische Programme beschleunigen? | AlgoTune: 语言模型能加速通用计算程序吗? 2507.15887v1 |
Authors (24): Ori Press, Brandon Amos, Haoyu Zhao, Yikai Wu, Samuel K. Ainsworth, Dominik Krupke, Patrick Kidger, Touqir Sajed, Bartolomeo Stellato, Jisun Park, Nathanael Bosch, Eli Meril, Albert Steppi, Arman Zharmagambetov, Fangzhao Zhang, David Perez-Pineiro, Alberto Mercurio, Ni Zhan, Talor Abramovich, Kilian Lieret, Hanlin Zhang, Shirley Huang, Matthias Bethge, Ofir Press
Despite progress in language model (LM) capabilities, evaluations have thus far focused on models’ performance on tasks that humans have previously solved, including in programming (Jimenez et al., 2024) and mathematics (Glazer et al., 2024). We therefore propose testing models’ ability to design and implement algorithms in an open-ended benchmark: We task LMs with writing code that efficiently solves computationally challenging problems in computer science, physics, and mathematics. Our AlgoTune benchmark consists of 155 coding tasks collected from domain experts and a framework for validating and timing LM-synthesized solution code, which is compared to reference implementations from popular open-source packages. In addition, we develop a baseline LM agent, AlgoTuner, and evaluate its performance across a suite of frontier models. AlgoTuner achieves an average 1.72x speedup against our reference solvers, which use libraries such as SciPy, sk-learn and CVXPY. However, we find that current models fail to discover algorithmic innovations, instead preferring surface-level optimizations. We hope that AlgoTune catalyzes the development of LM agents exhibiting creative problem solving beyond state-of-the-art human performance.
Article 878
Title@2025-07-19 (6): LPS-GNN : Deploying Graph Neural Networks on Graphs with 100-Billion Edges
Title: LPS-GNN : Deploying Graph Neural Networks on Graphs with 100-Billion Edges | LPS-GNN : Einsatz von Graphen-Neuralnetzwerken auf Graphen mit 100-Billionen-Kanten | LPS-GNN:在100亿米边缘的图图上部署图形神经网络 2507.14570v1 |
Authors (9): Xu Cheng, Liang Yao, Feng He, Yukuo Cen, Yufei He, Chenhui Zhang, Wenzheng Feng, Hongyun Cai, Jie Tang
Graph Neural Networks (GNNs) have emerged as powerful tools for various graph mining tasks, yet existing scalable solutions often struggle to balance execution efficiency with prediction accuracy. These difficulties stem from iterative message-passing techniques, which place significant computational demands and require extensive GPU memory, particularly when dealing with the neighbor explosion issue inherent in large-scale graphs. This paper introduces a scalable, low-cost, flexible, and efficient GNN framework called LPS-GNN, which can perform representation learning on 100 billion graphs with a single GPU in 10 hours and shows a 13.8% improvement in User Acquisition scenarios. We examine existing graph partitioning methods and design a superior graph partition algorithm named LPMetis. In particular, LPMetis outperforms current state-of-the-art (SOTA) approaches on various evaluation metrics. In addition, our paper proposes a subgraph augmentation strategy to enhance the model’s predictive performance. It exhibits excellent compatibility, allowing the entire framework to accommodate various GNN algorithms. Successfully deployed on the Tencent platform, LPS-GNN has been tested on public and real-world datasets, achieving performance lifts of 8.24% to 13.89% over SOTA models in online applications.
Article 879
Title@2025-07-19 (6): The Origin of Self-Attention: From Pairwise Affinity Matrices to Transformers
Title: The Origin of Self-Attention: From Pairwise Affinity Matrices to Transformers | Der Ursprung der Selbstachtung: Von Paarweiser Affinität zu Transformern | 自我关注的起源:从对等亲和矩阵到变异体 2507.14560v1 |
Authors (1): Giorgio Roffo
The self-attention mechanism, now central to deep learning architectures such as Transformers, is a modern instance of a more general computational principle: learning and using pairwise affinity matrices to control how information flows through a model. This paper traces the conceptual origins of self-attention across multiple domains, including computer vision, natural language processing, and graph learning, through their shared reliance on an affinity matrix, denoted as A. We highlight Infinite Feature Selection (Inf-FS) as a foundational approach that generalizes the idea of affinity-based weighting. Unlike the fixed dot-product structure used in Transformers, Inf-FS defines A either through domain knowledge or by learning, and computes feature relevance through multi-hop propagation over the affinity graph. From this perspective, self-attention can be seen as a special case of Inf-FS: it uses a single-hop affinity computation where A is dynamically built from token similarities. We argue that the underlying structure, reasoning over pairwise relationships, is preserved across both approaches, and the key differences lie in how the affinity matrix is defined and applied. By situating self-attention within the broader paradigm of affinity-based computation, we unify several strands of machine learning research and highlight a common mathematical foundation that underpins diverse models and tasks.
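The view of self-attention as a single-hop computation over a dynamically built affinity matrix A can be made concrete with a small sketch. It uses the standard scaled dot-product form (A = softmax(QK^T / sqrt(d)), output = AV) in plain Python; the function names and the toy inputs are illustrative, and nothing here reproduces Inf-FS itself.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(Q, K, V):
    """Return the affinity matrix A and the single-hop output A @ V."""
    d = len(Q[0])
    # A[i][j]: normalized affinity between token i (query) and token j (key).
    A = [softmax([sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in K])
         for qi in Q]
    # Each output row is an affinity-weighted average of the value rows.
    out = [[sum(a * v[c] for a, v in zip(row, V)) for c in range(len(V[0]))]
           for row in A]
    return A, out

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[2.0, 0.0], [0.0, 4.0]]
A, out = self_attention(Q, K, V)
```

Each row of A sums to one, so the output for a token is a convex combination of value vectors, weighted by its pairwise affinities to every other token.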
Article 880
Title@2025-07-19 (6): Maximum Causal Entropy IRL in Mean-Field Games and GNEP Framework for Forward RL
Title: Maximum Causal Entropy IRL in Mean-Field Games and GNEP Framework for Forward RL | Maximale Causal Entropy IRL in Mittelfeldspielen und GNEP-Rahmen für Forward RL | 中场运动会和GNEP 前转转场框架的最大因果导入性IRL 2401.06566v2 |
Authors (3): Berkay Anahtarci, Can Deha Kariksiz, Naci Saldi
This paper explores the use of Maximum Causal Entropy Inverse Reinforcement Learning (IRL) within the context of discrete-time stationary Mean-Field Games (MFGs) characterized by finite state spaces and an infinite-horizon, discounted-reward setting. Although the resulting optimization problem is non-convex with respect to policies, we reformulate it as a convex optimization problem in terms of state-action occupation measures by leveraging the linear programming framework of Markov Decision Processes. Based on this convex reformulation, we introduce a gradient descent algorithm with a guaranteed convergence rate to efficiently compute the optimal solution. Moreover, we develop a new method that conceptualizes the MFG problem as a Generalized Nash Equilibrium Problem (GNEP), enabling effective computation of the mean-field equilibrium for forward reinforcement learning (RL) problems and marking an advancement in MFG solution techniques. We further illustrate the practical applicability of our GNEP approach by employing this algorithm to generate data for numerical MFG examples.
Article 881
Title@2025-07-19 (6): Brain Foundation Models: A Survey on Advancements in Neural Signal Processing and Brain Discovery
Title: Brain Foundation Models: A Survey on Advancements in Neural Signal Processing and Brain Discovery | Brain Foundation Models: Eine Umfrage über Fortschritte bei der Neural Signalverarbeitung und Gehirnentdeckung | 脑基础模型:神经信号处理和脑发现进展调查 2503.00580v2 |
Authors (7): Xinliang Zhou, Chenyu Liu, Zhisheng Chen, Kun Wang, Yi Ding, Ziyu Jia, Qingsong Wen
Brain foundation models (BFMs) have emerged as a transformative paradigm in computational neuroscience, offering a revolutionary framework for processing diverse neural signals across different brain-related tasks. These models leverage large-scale pre-training techniques, allowing them to generalize effectively across multiple scenarios, tasks, and modalities, thus overcoming the traditional limitations faced by conventional artificial intelligence (AI) approaches in understanding complex brain data. By tapping into the power of pretrained models, BFMs provide a means to process neural data in a more unified manner, enabling advanced analysis and discovery in the field of neuroscience. In this survey, we define BFMs for the first time, providing a clear and concise framework for constructing and utilizing these models in various applications. We also examine the key principles and methodologies for developing these models, shedding light on how they transform the landscape of neural signal processing. This survey presents a comprehensive review of the latest advancements in BFMs, covering the most recent methodological innovations, novel views of application areas, and challenges in the field. Notably, we highlight the future directions and key challenges that need to be addressed to fully realize the potential of BFMs. These challenges include improving the quality of brain data, optimizing model architecture for better generalization, increasing training efficiency, and enhancing the interpretability and robustness of BFMs in real-world applications.
Article 882
Title@2025-07-19 (6): Real Time Captioning of Sign Language Gestures in Video Meetings
Title: Real Time Captioning of Sign Language Gestures in Video Meetings | Echtzeit-Beschriftung von Gesten in Gebärdensprache in Video-Treffen | 视频会议手语手语手势实时定位 2507.14543v1 |
Authors (3): Sharanya Mukherjee, Md Hishaam Akhtar, Kannadasan R
It has always been a rather tough task to communicate with someone possessing a hearing impairment. One of the most tested ways to establish such communication is through the use of sign-based languages. However, not many people are aware of the smaller intricacies involved with sign language. Sign language recognition using computer vision aims at eliminating the communication barrier between deaf-mute and ordinary people so that they can properly communicate with others. Recently, the pandemic has left the whole world shaken up and has transformed the way we communicate. Video meetings have become essential for everyone, even people with a hearing disability. Recent studies have found that people with hearing disabilities prefer signing over typing during these video calls. In this paper, we propose a browser extension that will automatically translate sign language to subtitles for everyone else in the video call. A large-scale dataset containing more than 2000 word-level ASL videos, performed by over 100 signers, will be used.
Article 883
Title@2025-07-19 (6): Characterizing State Space Model (SSM) and SSM-Transformer Hybrid Language Model Performance with Long Context Length
Title: Characterizing State Space Model (SSM) and SSM-Transformer Hybrid Language Model Performance with Long Context Length | Charakterisieren von State Space Model (SSM) und SSM-Transformer Hybrid Language Model Performance mit langer Kontextlänge | 确定国家空间模型(SSM)和SSM-过渡混合语言模型长内性性能特点 2507.12442v2 |
Authors (5): Saptarshi Mitra, Rachid Karami, Haocheng Xu, Sitao Huang, Hyoukjun Kwon
The demand for machine intelligence capable of processing continuous, long-context inputs on local devices is growing rapidly. However, the quadratic complexity and memory requirements of traditional Transformer architectures make them inefficient and often unusable for these tasks. This has spurred a paradigm shift towards new architectures like State Space Models (SSMs) and hybrids, which promise near-linear scaling. While most current research focuses on the accuracy and theoretical throughput of these models, a systematic performance characterization on practical consumer hardware is critically needed to guide system-level optimization and unlock new applications. To address this gap, we present a comprehensive, comparative benchmarking of carefully selected Transformer, SSM, and hybrid models specifically for long-context inference on consumer and embedded GPUs. Our analysis reveals that SSMs are not only viable but superior for this domain, capable of processing sequences up to 220K tokens on a 24GB consumer GPU, approximately 4x longer than comparable Transformers. While Transformers may be up to 1.8x faster at short sequences, SSMs demonstrate a dramatic performance inversion, becoming up to 4x faster at very long contexts (~57K tokens). Our operator-level analysis reveals that custom, hardware-aware SSM kernels dominate the inference runtime, accounting for over 55% of latency on edge platforms, identifying them as a primary target for future hardware acceleration. We also provide detailed, device-specific characterization results to guide system co-design for the edge. To foster further research, we will open-source our characterization framework.
Article 884
Title@2025-07-19 (6): Kernel Based Maximum Entropy Inverse Reinforcement Learning for Mean-Field Games
Title: Kernel Based Maximum Entropy Inverse Reinforcement Learning for Mean-Field Games | Kernelbasiertes maximales Entropie-Inverse-Verstärkung-Lernen für Mittelfeld-Spiele | 以核心为核心的中场运动会最大内心反向强化学习 2507.14529v1 |
Authors (3): Berkay Anahtarci, Can Deha Kariksiz, Naci Saldi
We consider the maximum causal entropy inverse reinforcement learning problem for infinite-horizon stationary mean-field games, in which we model the unknown reward function within a reproducing kernel Hilbert space. This allows the inference of rich and potentially nonlinear reward structures directly from expert demonstrations, in contrast to most existing inverse reinforcement learning approaches for mean-field games that typically restrict the reward function to a linear combination of a fixed finite set of basis functions. We also focus on the infinite-horizon cost structure, whereas prior studies primarily rely on finite-horizon formulations. We introduce a Lagrangian relaxation to this maximum causal entropy inverse reinforcement learning problem that enables us to reformulate it as an unconstrained log-likelihood maximization problem, and obtain a solution via a gradient ascent algorithm. To illustrate the theoretical consistency of the algorithm, we establish the smoothness of the log-likelihood objective by proving the Fréchet differentiability of the related soft Bellman operators with respect to the parameters in the reproducing kernel Hilbert space. We demonstrate the effectiveness of our method on a mean-field traffic routing game, where it accurately recovers expert behavior.
Article 885
Title@2025-07-19 (6): Positive-Unlabeled Learning for Control Group Construction in Observational Causal Inference
Title: Positive-Unlabeled Learning for Control Group Construction in Observational Causal Inference | Positiv unbeschriftetes Lernen für den Aufbau von Kontrollgruppen in beobachtungsbedingtem Kausalzusammenhang | 在观察性因果关系中进行控制组建设的积极无标签学习 2507.14528v1 |
Authors (7): Ilias Tsoumas, Dimitrios Bormpoudakis, Vasileios Sitokonstantinou, Athanasios Askitopoulos, Andreas Kalogeras, Charalampos Kontoes, Ioannis Athanasiadis
In causal inference, whether through randomized controlled trials or observational studies, access to both treated and control units is essential for estimating the effect of a treatment on an outcome of interest. When treatment assignment is random, the average treatment effect (ATE) can be estimated directly by comparing outcomes between groups. In non-randomized settings, various techniques are employed to adjust for confounding and approximate the counterfactual scenario to recover an unbiased ATE. A common challenge, especially in observational studies, is the absence of units clearly labeled as controls, that is, units known not to have received the treatment. To address this, we propose positive-unlabeled (PU) learning as a framework for identifying, with high confidence, control units from a pool of unlabeled ones, using only the available treated (positive) units. We evaluate this approach using both simulated and real-world data. We construct a causal graph with diverse relationships and use it to generate synthetic data under various scenarios, assessing how reliably the method recovers control groups that allow estimates of true ATE. We also apply our approach to real-world data on optimal sowing and fertilizer treatments in sustainable agriculture. Our findings show that PU learning can successfully identify control (negative) units from unlabeled data based only on treated units and, through the resulting control group, estimate an ATE that closely approximates the true value. This work has important implications for observational causal inference, especially in fields where randomized experiments are difficult or costly. In domains such as earth, environmental, and agricultural sciences, it enables a plethora of quasi-experiments by leveraging available earth observation and climate data, particularly when treated units are available but control units are lacking.
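The direct comparison of group outcomes mentioned for the randomized case amounts to a difference in means. The sketch below shows only that final estimation step; it assumes a control group is already available (for instance, one recovered by a PU classifier), and it does not implement the PU learning stage or any confounding adjustment. The function name and the numbers are hypothetical.

```python
def ate_difference_in_means(treated_outcomes, control_outcomes):
    """Naive ATE estimate: mean outcome of treated units minus mean outcome of controls.
    Unbiased under random assignment; otherwise confounding must be handled separately."""
    mean_treated = sum(treated_outcomes) / len(treated_outcomes)
    mean_control = sum(control_outcomes) / len(control_outcomes)
    return mean_treated - mean_control

# Hypothetical crop yields for treated fields and for fields identified as controls.
estimate = ate_difference_in_means([3.0, 5.0], [1.0, 3.0])
```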
Article 886
Title@2025-07-19 (6): Explainable Graph Neural Networks via Structural Externalities
Title: Explainable Graph Neural Networks via Structural Externalities | Erklärbare Graph Neuronale Netzwerke über strukturelle Externalitäten | 通过结构外貌可解释的图形神经网络 2507.17848v1 |
Authors (3): Lijun Wu, Dong Hao, Zhiyi Fan
Graph Neural Networks (GNNs) have achieved outstanding performance across a wide range of graph-related tasks. However, their “black-box” nature poses significant challenges to their explainability, and existing methods often fail to effectively capture the intricate interaction patterns among nodes within the network. In this work, we propose a novel explainability framework, GraphEXT, which leverages cooperative game theory and the concept of social externalities. GraphEXT partitions graph nodes into coalitions, decomposing the original graph into independent subgraphs. By integrating graph structure as an externality and incorporating the Shapley value under externalities, GraphEXT quantifies node importance through their marginal contributions to GNN predictions as the nodes transition between coalitions. Unlike traditional Shapley value-based methods that primarily focus on node attributes, our GraphEXT places greater emphasis on the interactions among nodes and the impact of structural changes on GNN predictions. Experimental studies on both synthetic and real-world datasets show that GraphEXT outperforms existing baseline methods in terms of fidelity across diverse GNN architectures, significantly enhancing the explainability of GNN models.
Article 887
Title@2025-07-19 (6): Diffusion Models for Time Series Forecasting: A Survey
Title: Diffusion Models for Time Series Forecasting: A Survey | Diffusionsmodelle für die Zeitreihenprognose: Eine Umfrage | 时间序列预测传播模型:调查 2507.14507v1 |
Authors (5): Chen Su, Zhengzhou Cai, Yuanhe Tian, Zihong Zheng, Yan Song
Diffusion models, initially developed for image synthesis, demonstrate remarkable generative capabilities. Recently, their application has expanded to time series forecasting (TSF), yielding promising results. In this survey, we first introduce the standard diffusion models and their prevalent variants, explaining their adaptation to TSF tasks. We then provide a comprehensive review of diffusion models for TSF, paying special attention to the sources of conditional information and the mechanisms for integrating this conditioning within the models. In analyzing existing approaches that use diffusion models for TSF, we provide a systematic categorization and a comprehensive summary of them. Furthermore, we examine several foundational diffusion models applied to TSF, alongside commonly used datasets and evaluation metrics. Finally, we discuss current limitations in these approaches and potential future research directions. Overall, this survey details recent progress and future prospects for diffusion models in TSF, serving as a reference for researchers in the field.
Article 888
Title@2025-07-19 (6): Generalized Linear Bandits with Limited Adaptivity
Title: Generalized Linear Bandits with Limited Adaptivity | Generalisierte Linear Banditen mit begrenzter Adaptivität | 有限适应性通用直线强盗 2404.06831v5 |
Authors (4): Ayush Sawarni, Nirjhar Das, Siddharth Barman, Gaurav Sinha
We study the generalized linear contextual bandit problem within the constraints of limited adaptivity. In this paper, we present two algorithms, $\texttt{B-GLinCB}$ and $\texttt{RS-GLinCB}$, that address, respectively, two prevalent limited adaptivity settings. Given a budget $M$ on the number of policy updates, in the first setting, the algorithm needs to decide upfront $M$ rounds at which it will update its policy, while in the second setting it can adaptively perform $M$ policy updates during its course. For the first setting, we design an algorithm $\texttt{B-GLinCB}$, that incurs $\tilde{O}(\sqrt{T})$ regret when $M = \Omega( \log{\log T} )$ and the arm feature vectors are generated stochastically. For the second setting, we design an algorithm $\texttt{RS-GLinCB}$ that updates its policy $\tilde{O}(\log^2 T)$ times and achieves a regret of $\tilde{O}(\sqrt{T})$ even when the arm feature vectors are adversarially generated. Notably, in these bounds, we manage to eliminate the dependence on a key instance dependent parameter $\kappa$, that captures non-linearity of the underlying reward model. Our novel approach for removing this dependence for generalized linear contextual bandits might be of independent interest.
Article 889
Title@2025-07-19 (6): RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer
Title: RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer | RingFormer: Ein neuraler Vocoder mit Ringaufmerksamkeit und Convolution-Augmented Transformer | Ringformer: 具有响起注意力和革命推动变形器的神经导体Vocoder 2501.01182v2 |
Authors (2): Seongho Hong, Yong-Hoon Choi
While transformers demonstrate outstanding performance across various audio tasks, their application to neural vocoders remains challenging. Neural vocoders require the generation of long audio signals at the sample level, which demands high temporal resolution. This results in significant computational costs for attention map generation and limits their ability to efficiently process both global and local information. Additionally, the sequential nature of sample generation in neural vocoders poses difficulties for real-time processing, making the direct adoption of transformers impractical. To address these challenges, we propose RingFormer, a neural vocoder that incorporates the ring attention mechanism into a lightweight transformer variant, the convolution-augmented transformer (Conformer). Ring attention effectively captures local details while integrating global information, making it well-suited for processing long sequences and enabling real-time audio generation. RingFormer is trained using adversarial training with two discriminators. The proposed model is applied to the decoder of the text-to-speech model VITS and compared with state-of-the-art vocoders such as HiFi-GAN, iSTFT-Net, and BigVGAN under identical conditions using various objective and subjective metrics. Experimental results show that RingFormer achieves comparable or superior performance to existing models, particularly excelling in real-time audio generation. Our code and audio samples are available on GitHub.
Article 890
Title@2025-07-19 (6): Generative Distribution Distillation
Title: Generative Distribution Distillation | Generative Verteilungsdestillation | 蒸馏 2507.14503v1 |
Authors (9): Jiequan Cui, Beier Zhu, Qingshan Xu, Xiaogang Xu, Pengguang Chen, Xiaojuan Qi, Bei Yu, Hanwang Zhang, Richang Hong
In this paper, we formulate knowledge distillation (KD) as a conditional generative problem and propose the Generative Distribution Distillation (GenDD) framework. A naive GenDD baseline encounters two major challenges: the curse of high-dimensional optimization and the lack of semantic supervision from labels. To address these issues, we introduce a Split Tokenization strategy, achieving stable and effective unsupervised KD. Additionally, we develop the Distribution Contraction technique to integrate label supervision into the reconstruction objective. Our theoretical proof demonstrates that GenDD with Distribution Contraction serves as a gradient-level surrogate for multi-task learning, realizing efficient supervised training without explicit classification loss on multi-step sampling image representations. To evaluate the effectiveness of our method, we conduct experiments on balanced, imbalanced, and unlabeled data. Experimental results show that GenDD performs competitively in the unsupervised setting, significantly surpassing the KL baseline by 16.29% on the ImageNet validation set. With label supervision, our ResNet-50 achieves 82.28% top-1 accuracy on ImageNet after 600 epochs of training, establishing a new state of the art.
Article 891
Title@2025-07-19 (6): Neural Brownian Motion
Title: Neural Brownian Motion | Neural Brownian Bewegung | 神经棕色运动 2507.14499v1 |
Authors (1): Qian Qi
This paper introduces the Neural-Brownian Motion (NBM), a new class of stochastic processes for modeling dynamics under learned uncertainty. The NBM is defined axiomatically by replacing the classical martingale property with respect to linear expectation with one relative to a non-linear Neural Expectation Operator, $\varepsilon^\theta$, generated by a Backward Stochastic Differential Equation (BSDE) whose driver $f_\theta$ is parameterized by a neural network. Our main result is a representation theorem for a canonical NBM, which we define as a continuous $\varepsilon^\theta$-martingale with zero drift under the physical measure. We prove that, under a key structural assumption on the driver, such a canonical NBM exists and is the unique strong solution to a stochastic differential equation of the form ${\rm d} M_t = \nu_\theta(t, M_t) {\rm d} W_t$. Crucially, the volatility function $\nu_\theta$ is not postulated a priori but is implicitly defined by the algebraic constraint $g_\theta(t, M_t, \nu_\theta(t, M_t)) = 0$, where $g_\theta$ is a specialization of the BSDE driver. We develop the stochastic calculus for this process and prove a Girsanov-type theorem for the quadratic case, showing that an NBM acquires a drift under a new, learned measure. The character of this measure, whether pessimistic or optimistic, is endogenously determined by the learned parameters $\theta$, providing a rigorous foundation for models where the attitude towards uncertainty is a discoverable feature.
Article 892
Title@2025-07-19 (6): Rethinking Data Protection in the (Generative) Artificial Intelligence Era
Title: Rethinking Data Protection in the (Generative) Artificial Intelligence Era | Datenschutz im Zeitalter der (generativen) Künstlichen Intelligenz neu denken | 在人工(人工)情报时代重新思考数据保护问题 2507.03034v3 |
Authors (11): Yiming Li, Shuo Shao, Yu He, Junfeng Guo, Tianwei Zhang, Zhan Qin, Pin-Yu Chen, Michael Backes, Philip Torr, Dacheng Tao, Kui Ren
The (generative) artificial intelligence (AI) era has profoundly reshaped the meaning and value of data. No longer confined to static content, data now permeates every stage of the AI lifecycle, from the training samples that shape model parameters to the prompts and outputs that drive real-world model deployment. This shift renders traditional notions of data protection insufficient, while the boundaries of what needs safeguarding remain poorly defined. Failing to safeguard data in AI systems can inflict societal and individual harm, underscoring the urgent need to clearly delineate the scope of and rigorously enforce data protection. In this perspective, we propose a four-level taxonomy, including non-usability, privacy preservation, traceability, and deletability, that captures the diverse protection needs arising in modern (generative) AI models and systems. Our framework offers a structured understanding of the trade-offs between data utility and control, spanning the entire AI pipeline, including training datasets, model weights, system prompts, and AI-generated content. We analyze representative technical approaches at each level and reveal regulatory blind spots that leave critical assets exposed. By offering a structured lens to align future AI technologies and governance with trustworthy data practices, we underscore the urgency of rethinking data protection for modern AI techniques and provide timely guidance for developers, researchers, and regulators alike.
Article 893
Title@2025-07-19 (6): Glitches in Decision Tree Ensemble Models
Title: Glitches in Decision Tree Ensemble Models | Glitches in Decision Tree Ensemble Modelle | 决策树组合模型中的漏洞 2507.14492v1 |
Authors (5): Satyankar Chandra, Ashutosh Gupta, Kaushik Mallik, Krishna Shankaranarayanan, Namrita Varshney
Many critical decision-making tasks are now delegated to machine-learned models, and it is imperative that their decisions are trustworthy and reliable, and their outputs are consistent across similar inputs. We identify a new source of unreliable behaviors, called glitches, which may significantly impair the reliability of AI models having steep decision boundaries. Roughly speaking, glitches are small neighborhoods in the input space where the model’s output abruptly oscillates with respect to small changes in the input. We provide a formal definition of glitches, use well-known models and datasets from the literature to demonstrate that they are widespread, and argue that they usually indicate potential model inconsistencies in the neighborhood where they are found. We proceed to the algorithmic search of glitches for widely used gradient-boosted decision tree (GBDT) models. We prove that the problem of detecting glitches is NP-complete for tree ensembles, already for trees of depth 4. Our glitch-search algorithm for GBDT models uses an MILP encoding of the problem, and its effectiveness and computational feasibility are demonstrated on a set of widely used GBDT benchmarks taken from the literature.
Article 894
Title@2025-07-19 (6): Numerical Artifacts in Learning Dynamical Systems
Title: Numerical Artifacts in Learning Dynamical Systems | Numerische Artefakte im Lernen dynamischer Systeme | 学习动态系统中的数值手法 2507.14491v1 |
Authors (2): Bing-Ze Lu, Richard Tsai
In many applications, one needs to learn a dynamical system from its solutions sampled at a finite number of time points. The learning problem is often formulated as an optimization problem over a chosen function class. However, in the optimization procedure, it is necessary to employ a numerical scheme to integrate candidate dynamical systems and assess how their solutions fit the data. This paper reveals potentially serious effects of a chosen numerical scheme on the learning outcome. In particular, our analysis demonstrates that a damped oscillatory system may be incorrectly identified as having “anti-damping” and exhibiting a reversed oscillation direction, despite adequately fitting the given data points.
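One way such an artifact can arise is sketched below. This is an illustrative construction, not the paper's analysis: it assumes a linear damped oscillator, noise-free samples at spacing `h`, and a learner that integrates candidate systems with backward Euler. The parameter values and the helper names `exact_flow` and `backward_euler_fit` are hypothetical. The fitted matrix reproduces every data point exactly, yet its eigenvalues have positive real part.

```python
import numpy as np

def exact_flow(A, h):
    """Exact one-step map exp(h*A) of x' = A x, via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eig(A)
    return ((eigvecs * np.exp(h * eigvals)) @ np.linalg.inv(eigvecs)).real

def backward_euler_fit(Phi, h):
    """Matrix A_hat whose backward Euler step (I - h*A_hat)^{-1} equals Phi."""
    return (np.eye(Phi.shape[0]) - np.linalg.inv(Phi)) / h

# Damped oscillator x'' + 2*gamma*x' + omega0**2 * x = 0 in first-order form.
gamma, omega0, h = 0.1, 1.0, 0.5   # illustrative values, not from the paper
A_true = np.array([[0.0, 1.0], [-omega0**2, -2.0 * gamma]])

Phi = exact_flow(A_true, h)        # maps each data point to the next one
A_hat = backward_euler_fit(Phi, h)

true_growth = np.linalg.eigvals(A_true).real    # negative: damped
fitted_growth = np.linalg.eigvals(A_hat).real   # positive: "anti-damped"
print(true_growth, fitted_growth)
```

In this sketch the sign flip comes entirely from the mismatch between the true flow and the integrator used inside the fit; shrinking `h` makes the fitted eigenvalues approach the true ones.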
Article 895
Title@2025-07-19 (6): Federated Reinforcement Learning in Heterogeneous Environments
Title: Federated Reinforcement Learning in Heterogeneous Environments | Föderiertes Stärkungslernen in heterogenen Umgebungen | 不同不同环境的联邦强化学习 2507.14487v1 |
Authors (2): Ukjo Hwang, Songnam Hong
We investigate a Federated Reinforcement Learning with Environment Heterogeneity (FRL-EH) framework, where local environments exhibit statistical heterogeneity. Within this framework, agents collaboratively learn a global policy by aggregating their collective experiences while preserving the privacy of their local trajectories. To better reflect real-world scenarios, we introduce a robust FRL-EH framework by presenting a novel global objective function. This function is specifically designed to optimize a global policy that ensures robust performance across heterogeneous local environments and their plausible perturbations. We propose a tabular FRL algorithm named FedRQ and theoretically prove its asymptotic convergence to an optimal policy for the global objective function. Furthermore, we extend FedRQ to environments with continuous state space through the use of expectile loss, addressing the key challenge of minimizing a value function over a continuous subset of the state space. This advancement facilitates the seamless integration of the principles of FedRQ with various Deep Neural Network (DNN)-based RL algorithms. Extensive empirical evaluations validate the effectiveness and robustness of our FRL algorithms across diverse heterogeneous environments, consistently achieving superior performance over the existing state-of-the-art FRL algorithms.
Article 896
Title@2025-07-19 (6): ReDiSC: A Reparameterized Masked Diffusion Model for Scalable Node Classification with Structured Predictions
Title: ReDiSC: A Reparameterized Masked Diffusion Model for Scalable Node Classification with Structured Predictions | ReDiSC: Ein reparameterisiertes Maskiertes Diffusionsmodell für skalierbare Knotenklassifikation mit strukturierten Vorhersagen | ReDISC:具有结构预测的可缩放节节点分类可修复的蒙面扩散模型 2507.14484v1 |
Authors (6): Yule Li, Yifeng Lu, Zhen Wang, Zhewei Wei, Yaliang Li, Bolin Ding
In recent years, graph neural networks (GNNs) have achieved unprecedented successes in node classification tasks. Although GNNs inherently encode specific inductive biases (e.g., acting as low-pass or high-pass filters), most existing methods implicitly assume conditional independence among node labels in their optimization objectives. While this assumption is suitable for traditional classification tasks such as image recognition, it contradicts the intuitive observation that node labels in graphs remain correlated, even after conditioning on the graph structure. To make structured predictions for node labels, we propose ReDiSC, namely, Reparameterized masked Diffusion model for Structured node Classification. ReDiSC estimates the joint distribution of node labels using a reparameterized masked diffusion model, which is learned through the variational expectation-maximization (EM) framework. Our theoretical analysis shows the efficiency advantage of ReDiSC in the E-step compared to DPM-SNC, a state-of-the-art model that relies on a manifold-constrained diffusion model in the continuous domain. Meanwhile, we explicitly link ReDiSC’s M-step objective to popular GNN and label propagation hybrid approaches. Extensive experiments demonstrate that ReDiSC achieves superior or highly competitive performance compared to state-of-the-art GNN, label propagation, and diffusion-based baselines across both homophilic and heterophilic graphs of varying sizes. Notably, ReDiSC scales effectively to large-scale datasets on which previous structured diffusion methods fail due to computational constraints, highlighting its significant practical advantage in structured node classification tasks.
Article 897
Title@2025-07-19 (6): Label-semantics Aware Generative Approach for Domain-Agnostic Multilabel Classification
Title: Label-semantics Aware Generative Approach for Domain-Agnostic Multilabel Classification | Label-Semantik Aware Generativer Ansatz für Domain-Agnostic Multilabel-Klassifikation | 域-不可知性多标签分类的认知生成方法 2506.06806v2 |
Authors (5): Subhendu Khatuya, Shashwat Naidu, Saptarshi Ghosh, Pawan Goyal, Niloy Ganguly
The explosion of textual data has made manual document classification increasingly challenging. To address this, we introduce a robust, efficient domain-agnostic generative model framework for multi-label text classification. Instead of treating labels as mere atomic symbols, our approach utilizes predefined label descriptions and is trained to generate these descriptions based on the input text. During inference, the generated descriptions are matched to the pre-defined labels using a finetuned sentence transformer. We integrate this with a dual-objective loss function, combining cross-entropy loss and cosine similarity of the generated sentences with the predefined target descriptions, ensuring both semantic alignment and accuracy. Our proposed model LAGAMC stands out for its parameter efficiency and versatility across diverse datasets, making it well-suited for practical applications. We demonstrate the effectiveness of our proposed model by achieving new state-of-the-art performances across all evaluated datasets, surpassing several strong baselines. We achieve improvements of 13.94% in Micro-F1 and 24.85% in Macro-F1 compared to the closest baseline across all datasets.
Article 898
Title@2025-07-19 (6): Learning Stochastic Hamiltonian Systems via Stochastic Generating Function Neural Network
Title: Learning Stochastic Hamiltonian Systems via Stochastic Generating Function Neural Network | Stochastische Hamiltonische Systeme über stochastische Generierungsfunktion neurales Netzwerk lernen | 通过Stochatic生成功能神经网络学习斯托卡特·汉密尔顿系统 2507.14467v1 |
Authors (4): Chen Chen, Lijin Wang, Yanzhao Cao, Xupeng Cheng
In this paper we propose a novel neural network model for learning stochastic Hamiltonian systems (SHSs) from observational data, termed the stochastic generating function neural network (SGFNN). SGFNN preserves symplectic structure of the underlying stochastic Hamiltonian system and produces symplectic predictions. Our model utilizes the autoencoder framework to identify the randomness of the latent system by the encoder network, and detects the stochastic generating function of the system through the decoder network based on the random variables extracted from the encoder. Symplectic predictions can then be generated by the stochastic generating function. Numerical experiments are performed on several stochastic Hamiltonian systems, varying from additive to multiplicative, and from separable to non-separable SHSs with single or multiple noises. Compared with the benchmark stochastic flow map learning (sFML) neural network, our SGFNN model exhibits higher accuracy across various prediction metrics, especially in long-term predictions, with the property of maintaining the symplectic structure of the underlying SHSs.
Article 899
Title@2025-07-19 (6): SWI: Speaking with Intent in Large Language Models
Title: SWI: Speaking with Intent in Large Language Models | SWI: Sprechen mit Intent in großen Sprachmodellen | SWI:用大语言模型表达意向 2503.21544v2 |
Authors (3): Yuwei Yin, EunJeong Hwang, Giuseppe Carenini
Intent, typically clearly formulated and planned, functions as a cognitive framework for communication and problem-solving. This paper introduces the concept of Speaking with Intent (SWI) in large language models (LLMs), where the explicitly generated intent encapsulates the model’s underlying intention and provides high-level planning to guide subsequent analysis and action. By emulating deliberate and purposeful thoughts in the human mind, SWI is hypothesized to enhance the reasoning capabilities and generation quality of LLMs. Extensive experiments on text summarization, multi-task question answering, and mathematical reasoning benchmarks consistently demonstrate the effectiveness and generalizability of Speaking with Intent over direct generation without explicit intent. Further analysis corroborates the generalizability of SWI under different experimental settings. Moreover, human evaluations verify the coherence, effectiveness, and interpretability of the intent produced by SWI. The promising results in enhancing LLMs with explicit intents pave a new avenue for boosting LLMs’ generation and reasoning abilities with cognitive notions.
Article 900
Title@2025-07-19 (6): AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization
Title: AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization | AlphaDPO: Adaptive Prämienspanne für direkte Präferenzoptimierung | AlphaDPO: 直接优化优惠的适应性回报边缘 2410.10148v4 |
Authors (8): Junkang Wu, Xue Wang, Zhengyi Yang, Jiancan Wu, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He
Aligning large language models (LLMs) with human values and intentions is crucial for their utility, honesty, and safety. Reinforcement learning from human feedback (RLHF) is a popular approach to achieve this alignment, but it faces challenges in computational efficiency and training stability. Recent methods like Direct Preference Optimization (DPO) and Simple Preference Optimization (SimPO) have proposed offline alternatives to RLHF, simplifying the process by reparameterizing the reward function. However, DPO depends on a potentially suboptimal reference model, and SimPO’s assumption of a fixed target reward margin may lead to suboptimal decisions in diverse data settings. In this work, we propose $\alpha$-DPO, an adaptive preference optimization algorithm designed to address these limitations by introducing a dynamic reward margin. Specifically, $\alpha$-DPO employs an adaptive preference distribution, balancing the policy model and the reference model to achieve personalized reward margins. We provide theoretical guarantees for $\alpha$-DPO, demonstrating its effectiveness as a surrogate optimization objective and its ability to balance alignment and diversity through KL divergence control. Empirical evaluations on AlpacaEval 2 and Arena-Hard show that $\alpha$-DPO consistently outperforms DPO and SimPO across various model settings, establishing it as a robust approach for fine-tuning LLMs. Our method achieves significant improvements in win rates, highlighting its potential as a powerful tool for LLM alignment. The code is available at https://github.com/junkangwu/alpha-DPO
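For orientation, the two baseline objectives that the abstract contrasts can be written in a few lines. This is a sketch of the standard DPO and SimPO losses on per-sequence log-probabilities, not of the adaptive margin proposed in the paper. The function names and default hyperparameters are illustrative assumptions.

```python
import numpy as np

def neg_log_sigmoid(z):
    """Numerically stable -log(sigmoid(z))."""
    return np.logaddexp(0.0, -z)

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss: the implicit reward is the log-ratio to a reference model."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return neg_log_sigmoid(beta * margin)

def simpo_loss(logp_w, logp_l, len_w, len_l, beta=2.0, target_margin=0.5):
    """SimPO loss: length-normalized log-likelihood reward minus a fixed margin."""
    reward_gap = beta * (logp_w / len_w - logp_l / len_l)
    return neg_log_sigmoid(reward_gap - target_margin)
```

Here `logp_w` and `logp_l` are the policy's summed token log-probabilities for the preferred and rejected responses. DPO needs the reference model's values as well, while SimPO replaces them with the constant `target_margin`; these are the two dependencies the abstract identifies as limitations.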
Article 901
Title@2025-07-19 (6): Continual Learning with Neuromorphic Computing: Foundations, Methods, and Emerging Applications
Title: Continual Learning with Neuromorphic Computing: Foundations, Methods, and Emerging Applications | Kontinuierliches Lernen mit neuromorphem Rechnen: Grundlagen, Methoden und neu entstehende Anwendungen | 与神经陆基计算机的不断学习:基础、方法和新兴应用 2410.09218v3 |
Authors (5): Mishal Fatima Minhas, Rachmad Vidya Wicaksana Putra, Falah Awwad, Osman Hasan, Muhammad Shafique
The challenging deployment of compute- and memory-intensive methods from Deep Neural Network (DNN)-based Continual Learning (CL) underscores the critical need for a paradigm shift towards more efficient approaches. Neuromorphic Continual Learning (NCL) appears as an emerging solution, by leveraging the principles of Spiking Neural Networks (SNNs) which enable efficient CL algorithms executed in dynamically changing environments with resource-constrained computing systems. Motivated by the need for a holistic study of NCL, in this survey, we first provide a detailed background on CL, encompassing the desiderata, settings, metrics, scenario taxonomy, Online Continual Learning (OCL) paradigm, and recent DNN-based methods to address catastrophic forgetting (CF). Then, we analyze these methods considering CL desiderata, computational and memory costs, as well as network complexity, hence emphasizing the need for energy-efficient CL. Afterward, we provide background on low-power neuromorphic systems including encoding techniques, neuronal dynamics, network architectures, learning rules, hardware processors, software and hardware frameworks, datasets, benchmarks, and evaluation metrics. Then, this survey comprehensively reviews and analyzes the state of the art in NCL. The key ideas, implementation frameworks, and performance assessments are also provided. This survey covers several hybrid approaches that combine supervised and unsupervised learning paradigms. It also covers optimization techniques including SNN operations reduction, weight quantization, and knowledge distillation. Then, this survey discusses the progress of real-world NCL applications. Finally, this paper provides a future perspective on the open research challenges for NCL, since the purpose of this study is to be useful for the wider neuromorphic AI research community and to inspire future research in bio-plausible OCL.
Article 902
Title@2025-07-19 (6): Faster Low-Rank Approximation and Kernel Ridge Regression via the Block-Nyström Method
Title: Faster Low-Rank Approximation and Kernel Ridge Regression via the Block-Nyström Method | Schnellere Low-Rank-Annäherung und Kernel Ridge-Regression über die Block-Nyström-Methode | 通过块-Nyström方法更快地低兰克相近和内核脊回归 2506.17556v2 |
Authors (2): Sachin Garg, Michał Dereziński
The Nyström method is a popular low-rank approximation technique for large matrices that arise in kernel methods and convex optimization. Yet, when the data exhibits heavy-tailed spectral decay, the effective dimension of the problem often becomes so large that even the Nyström method may be outside of our computational budget. To address this, we propose Block-Nyström, an algorithm that injects a block-diagonal structure into the Nyström method, thereby significantly reducing its computational cost while recovering strong approximation guarantees. We show that Block-Nyström can be used to construct improved preconditioners for second-order optimization, as well as to efficiently solve kernel ridge regression for statistical learning over Hilbert spaces. Our key technical insight is that, within the same computational budget, combining several smaller Nyström approximations leads to stronger tail estimates of the input spectrum than using one larger approximation. Along the way, we provide a novel recursive preconditioning scheme for efficiently inverting the Block-Nyström matrix, and provide new statistical learning bounds for a broad class of approximate kernel ridge regression solvers.
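As background, the classical Nyström approximation that the paper builds on can be sketched as follows. This is the textbook single-block construction, not the Block-Nyström algorithm itself. The function name, the `rcond` cutoff, and the uniformly sampled landmarks are assumptions made for illustration.

```python
import numpy as np

def nystrom_approximation(K, landmark_idx, rcond=1e-10):
    """Classical Nystrom approximation K ~= C W^+ C^T from selected columns."""
    C = K[:, landmark_idx]                     # n x m block of sampled columns
    W = K[np.ix_(landmark_idx, landmark_idx)]  # m x m landmark-landmark block
    return C @ np.linalg.pinv(W, rcond=rcond) @ C.T

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
K = X @ X.T                                    # linear kernel, rank at most 3
landmarks = rng.choice(40, size=5, replace=False)
K_hat = nystrom_approximation(K, landmarks)
```

The demo uses a rank-3 linear kernel so that five landmarks already capture the full range of `K`. With heavy-tailed spectra, the regime the abstract targets, many more landmarks are needed for comparable accuracy, and the cost of forming and inverting `W` grows accordingly.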
Article 903
Title@2025-07-19 (6): DiCE-Extended: A Robust Approach to Counterfactual Explanations in Machine Learning
Title: DiCE-Extended: A Robust Approach to Counterfactual Explanations in Machine Learning | DiCE-erweitert: Ein robuster Ansatz gegenfaktische Erklärungen im maschinellen Lernen | DiCE-Expended: 机械学习中反事实解释的有力方法 2504.19027v2 |
Authors (3): Volkan Bakir, Polat Goktas, Sureyya Akyuz
Explainable artificial intelligence (XAI) has become increasingly important in decision-critical domains such as healthcare, finance, and law. Counterfactual (CF) explanations, a key approach in XAI, provide users with actionable insights by suggesting minimal modifications to input features that lead to different model outcomes. Despite significant advancements, existing CF generation methods often struggle to balance proximity, diversity, and robustness, limiting their real-world applicability. A widely adopted framework, Diverse Counterfactual Explanations (DiCE), emphasizes diversity but lacks robustness, making CF explanations sensitive to perturbations and domain constraints. To address these challenges, we introduce DiCE-Extended, an enhanced CF explanation framework that integrates multi-objective optimization techniques to improve robustness while maintaining interpretability. Our approach introduces a novel robustness metric based on the Dice-Sørensen coefficient, enabling stability under small input variations. Additionally, we refine CF generation using weighted loss components (lambda_p, lambda_d, lambda_r) to balance proximity, diversity, and robustness. We empirically validate DiCE-Extended on benchmark datasets (COMPAS, Lending Club, German Credit, Adult Income) across multiple ML backends (Scikit-learn, PyTorch, TensorFlow). Results demonstrate improved CF validity, stability, and alignment with decision boundaries compared to standard DiCE-generated explanations. Our findings highlight the potential of DiCE-Extended in generating more reliable and interpretable CFs for high-stakes applications. Future work could explore adaptive optimization techniques and domain-specific constraints to further enhance CF generation in real-world scenarios.
Article 904
Title@2025-07-19 (6): Statistical and Algorithmic Foundations of Reinforcement Learning
Title: Statistical and Algorithmic Foundations of Reinforcement Learning | Statistische und algorithmische Grundlagen des verstärkten Lernens | 强化学习的统计和算法基础 2507.14444v1 |
Authors (3): Yuejie Chi, Yuxin Chen, Yuting Wei
As a paradigm for sequential decision making in unknown environments, reinforcement learning (RL) has received a flurry of attention in recent years. However, the explosion of model complexity in emerging applications and the presence of nonconvexity exacerbate the challenge of achieving efficient RL in sample-starved situations, where data collection is expensive, time-consuming, or even high-stakes (e.g., in clinical trials, autonomous systems, and online advertising). How to understand and enhance the sample and computational efficacies of RL algorithms is thus of great interest. In this tutorial, we aim to introduce several important algorithmic and theoretical developments in RL, highlighting the connections between new ideas and classical topics. Employing Markov Decision Processes as the central mathematical model, we cover several distinctive RL scenarios (i.e., RL with a simulator, online RL, offline RL, robust RL, and RL with human feedback), and present several mainstream RL approaches (i.e., model-based approach, value-based approach, and policy optimization). Our discussions gravitate around the issues of sample complexity, computational efficiency, as well as algorithm-dependent and information-theoretic lower bounds from a non-asymptotic viewpoint.
Article 905
Title@2025-07-19 (6): The Perception of Phase Intercept Distortion and its Application in Data Augmentation
Title: The Perception of Phase Intercept Distortion and its Application in Data Augmentation | Die Wahrnehmung von Phase Intercept Distortion und ihre Anwendung in der Datenvergrößerung | 阶段拦截干扰的感知及其在数据增加中的应用 2506.14571v2 |
Authors (2): Venkatakrishnan Vaidyanathapuram Krishnan, Nathaniel Condit-Schultz
Phase distortion refers to the alteration of the phase relationships between frequencies in a signal, which can be perceptible. In this paper, we discuss a special case of phase distortion known as phase-intercept distortion, which is created by a frequency-independent phase shift. We hypothesize that, though this form of distortion changes a signal’s waveform significantly, the distortion is imperceptible. Human-subject experiment results are reported which are consistent with this hypothesis. Furthermore, we discuss how the imperceptibility of phase-intercept distortion can be useful for machine learning, specifically for data augmentation. We conducted multiple experiments using phase-intercept distortion as a novel approach to data augmentation, and obtained improved results for audio machine learning tasks.
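A frequency-independent phase shift of the kind described above can be applied to a real signal in the Fourier domain. The following is a minimal sketch under that reading of the abstract; the function name and the handling of the DC and Nyquist bins are assumptions, and the authors' augmentation pipeline may differ.

```python
import numpy as np

def phase_intercept_shift(x, theta):
    """Shift the phase of every frequency component of a real signal by theta radians."""
    spectrum = np.fft.rfft(x)
    rotation = np.full(spectrum.shape, np.exp(1j * theta))
    rotation[0] = 1.0            # DC has no phase to rotate
    if len(x) % 2 == 0:
        rotation[-1] = 1.0       # Nyquist bin must stay real for a real signal
    return np.fft.irfft(spectrum * rotation, n=len(x))
```

For a pure tone `cos(w*t)` the result is `cos(w*t + theta)`. For richer signals the waveform shape changes while the magnitude spectrum stays the same. As a data augmentation, `theta` could be drawn at random for each training example.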
Article 906
Title@2025-07-19 (6): Adversarial bandit optimization for approximately linear functions
Title: Adversarial bandit optimization for approximately linear functions | Adversariale Bandit-Optimierung für etwa lineare Funktionen | 大约直线功能的对面土匪优化 2505.20734v4 |
Authors (3): Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto
We consider a bandit optimization problem for nonconvex and non-smooth functions, where in each trial the loss function is the sum of a linear function and a small but arbitrary perturbation chosen after observing the player’s choice. We give both expected and high probability regret bounds for the problem. Our result also implies an improved high-probability regret bound for the bandit linear optimization, a special case with no perturbation. We also give a lower bound on the expected regret.
Article 907
Title@2025-07-19 (6): Escaping Saddle Points for Nonsmooth Weakly Convex Functions via Perturbed Proximal Algorithms
Title: Escaping Saddle Points for Nonsmooth Weakly Convex Functions via Perturbed Proximal Algorithms | Escaping Sattel Punkte für nonsmooth schwach Konvex Funktionen über Perturbed Proximal Algorithmen | 通过 Perturbed Proximal Proximal 精度算法为非mooth 微弱 Convex 函数解开套接合点 2102.02837v3 |
Authors (2): Minhui Huang, Weiming Zhu
We propose perturbed proximal algorithms that can provably escape strict saddles for nonsmooth weakly convex functions. The main results are based on a novel characterization of $\epsilon$-approximate local minimum for nonsmooth functions, and recent developments on perturbed gradient methods for escaping saddle points for smooth problems. Specifically, we show that under standard assumptions, the perturbed proximal point, perturbed proximal gradient and perturbed proximal linear algorithms find $\epsilon$-approximate local minimum for nonsmooth weakly convex functions in $O(\epsilon^{-2}\log(d)^4)$ iterations, where $d$ is the dimension of the problem.
Article 908
Title@2025-07-19 (6): ShiftKD: Benchmarking Knowledge Distillation under Distribution Shift
Title: ShiftKD: Benchmarking Knowledge Distillation under Distribution Shift | ShiftKD: Benchmarking Knowledge Destillation unter Distribution Shift | ShiftKD: 分配转移下知识蒸馏基准 2312.16242v3 |
Authors (4): Songming Zhang, Yuxiao Luo, Ziyu Lyu, Xiaofeng Chen
Knowledge Distillation (KD) transfers knowledge from large models to small models and has recently achieved remarkable success. However, the reliability of existing KD methods in real-world applications, especially under distribution shift, remains underexplored. Distribution shift refers to drift in the data distribution between the training and testing phases, which can adversely affect the efficacy of KD. In this paper, we propose a unified and systematic framework, ShiftKD, to benchmark KD against two general distributional shifts: diversity and correlation shift. The evaluation benchmark covers more than 30 methods from algorithmic, data-driven, and optimization perspectives for five benchmark datasets. Using ShiftKD, we conduct extensive experiments that reveal the strengths and limitations of current SOTA KD methods. More importantly, we thoroughly analyze key factors in the student model training process, including data augmentation, pruning methods, optimizers, and evaluation metrics. We believe ShiftKD could serve as an effective benchmark for assessing KD in real-world scenarios, thus driving the development of more robust KD methods in response to evolving demands. The code will be made available upon publication.
Article 909
Title@2025-07-19 (6): Grokking at the Edge of Linear Separability
Title: Grokking at the Edge of Linear Separability | Grokking am Rande der linearen Separierbarkeit | 位于线性分离的边缘 2410.04489v2 |
Authors (3): Alon Beck, Noam Levi, Yohai Bar-Sinai
We investigate the phenomenon of grokking – delayed generalization accompanied by non-monotonic test loss behavior – in a simple binary logistic classification task, for which “memorizing” and “generalizing” solutions can be strictly defined. Surprisingly, we find that grokking arises naturally even in this minimal model when the parameters of the problem are close to a critical point, and provide both empirical and analytical insights into its mechanism. Concretely, by appealing to the implicit bias of gradient descent, we show that logistic regression can exhibit grokking when the training dataset is nearly linearly separable from the origin and there is strong noise in the perpendicular directions. The underlying reason is that near the critical point, “flat” directions in the loss landscape with nearly zero gradient cause training dynamics to linger for arbitrarily long times near quasi-stable solutions before eventually reaching the global minimum. Finally, we highlight similarities between our findings and the recent literature, strengthening the conjecture that grokking generally occurs in proximity to the interpolation threshold, reminiscent of critical phenomena often observed in physical systems.
Article 910
Title@2025-07-19 (6): Likelihood-Free Gaussian Process for Regression
Title: Likelihood-Free Gaussian Process for Regression | Wahrscheinlichkeitsfreier Gauß-Prozess für Regression | 高斯回归进程 2006.13456v5 |
Authors (1): Yuta Shikuri
Gaussian process regression can flexibly represent the posterior distribution of a parameter of interest given sufficient information on the likelihood. However, in some cases, we have little knowledge regarding the probability model. For example, when investing in a financial instrument, the probability model of cash flow is generally unknown. In this paper, we propose a novel framework called the likelihood-free Gaussian process (LFGP), which allows representation of the posterior distributions of parameters of interest for scalable problems without directly setting their likelihood functions. The LFGP establishes clusters in which the value of the parameter of interest can be considered approximately identical, and it approximates the likelihood of the parameter of interest in each cluster by a Gaussian using the asymptotic normality of the maximum likelihood estimator. We expect that the proposed framework will contribute significantly to likelihood-free modeling, particularly by reducing the assumptions for the probability model and the computational costs for scalable problems.
Article 911
Title@2025-07-19 (6): Decomposed Quadratization: Efficient QUBO Formulation for Learning Bayesian Network
Title: Decomposed Quadratization: Efficient QUBO Formulation for Learning Bayesian Network | Zersetzte Quadratisierung: Effiziente QUBO-Formulierung für Bayesisches Netzwerk | 分解四分化:高效的QUBO 制定学习海湾网络 2006.06926v7 |
Authors (1): Yuta Shikuri
Algorithms and hardware for solving quadratic unconstrained binary optimization (QUBO) problems have made significant recent progress. This advancement has focused attention on formulating combinatorial optimization problems as quadratic polynomials. To improve the performance of solving large QUBO problems, it is essential to minimize the number of binary variables used in the objective function. In this paper, we propose a QUBO formulation that offers a bit capacity advantage over conventional quadratization techniques. As a key application, this formulation significantly reduces the number of binary variables required for score-based Bayesian network structure learning. Experimental results on 16 instances, ranging from 37 to 223 variables, demonstrate that our approach requires notably fewer binary variables than quadratization. Moreover, an annealing machine that implements our formulation outperformed existing algorithms in score maximization.
Article 912
Title@2025-07-19 (6): It’s Not That Simple. An Analysis of Simple Test-Time Scaling
Title: It’s Not That Simple. An Analysis of Simple Test-Time Scaling | Es ist nicht so einfach. Eine Analyse der einfachen Test-Zeit-Skalierung | 不是那么简单 简单的测试时间缩放分析 2507.14419v1 |
Authors (1): Guojun Wu
Prior work proposed simple test-time scaling, a method for replicating this scaling behavior with models distilled from o1-like models by manually controlling test-time compute: either scaling down by enforcing a maximum length or scaling up by iteratively appending “Wait” when the model is about to terminate its generation. This paper presents an analysis of simple test-time scaling and finds that the scaling behavior is largely attributed to scaling down by enforcing a maximum length. In contrast, fine-tuning on long CoT data distilled from o1-like models has no significant impact on scaling behavior, and scaling up by appending “Wait” leads to inconsistencies, as the model may oscillate between solutions. A key distinction exists between scaling down by enforcing a maximum length and scaling up test-time compute in o1-like models, such as DeepSeek-R1. These models are typically allowed to utilize as much compute as needed, with the only constraint being the model’s maximum supported length. By learning to naturally scale up test-time compute during reinforcement learning, o1-like models surpass their peak performance when scaling up. In contrast, simple test-time scaling progressively imposes a lower upper limit on model performance as it scales down. While replicating the test-time scaling behavior of o1 models can be straightforward by scaling down, it is crucial to recognize that the goal of scaling test-time compute is to unlock higher performance – beyond what the model could originally achieve – rather than merely reproducing the appearance of scaling behavior.
Article 913
Title@2025-07-18 (5): BARNN: A Bayesian Autoregressive and Recurrent Neural Network
Title: BARNN: A Bayesian Autoregressive and Recurrent Neural Network | BARNN: Ein bayesisches Autoregressives und recurrentes Neuronales Netzwerk | Bayesian自动递减和经常性神经网络 2501.18665v2 |
Authors (4): Dario Coscia, Max Welling, Nicola Demo, Gianluigi Rozza
Autoregressive and recurrent networks have achieved remarkable progress across various fields, from weather forecasting to molecular generation and Large Language Models. Despite their strong predictive capabilities, these models lack a rigorous framework for addressing uncertainty, which is key in scientific applications such as PDE solving, molecular generation and Machine Learning Force Fields. To address this shortcoming we present BARNN: a variational Bayesian Autoregressive and Recurrent Neural Network. BARNNs aim to provide a principled way to turn any autoregressive or recurrent model into its Bayesian version. BARNN is based on the variational dropout method, which allows it to be applied to large recurrent neural networks as well. We also introduce a temporal version of the “Variational Mixtures of Posteriors” prior (tVAMP-prior) to make Bayesian inference efficient and well-calibrated. Extensive experiments on PDE modelling and molecular generation demonstrate that BARNN not only achieves accuracy comparable or superior to existing methods, but also excels in uncertainty quantification and modelling long-range dependencies.
Article 914
Title@2025-07-18 (5): Fail Fast, or Ask: Mitigating the Deficiencies of Reasoning LLMs with Human-in-the-Loop Systems Engineering
Title: Fail Fast, or Ask: Mitigating the Deficiencies of Reasoning LLMs with Human-in-the-Loop Systems Engineering | Fail Fast oder Ask: Die Defizite von LLMs mit Human-in-the-Loop-System-Engineering abzumildern | 快速失灵, 或问: 减轻Loop系统人文工程公司在理据有限性方面的缺陷 2507.14406v1 |
Authors (2): Michael J. Zellinger, Matt Thomson
State-of-the-art reasoning LLMs are powerful problem solvers, but they still occasionally make mistakes. However, adopting AI models in risk-sensitive domains often requires error rates near 0%. To address this gap, we propose collaboration between a reasoning model and a human expert who resolves queries the model cannot confidently answer. We find that quantifying the uncertainty of a reasoning model through the length of its reasoning trace yields an effective basis for deferral to a human, e.g., cutting the error rate of Qwen3 235B-A22B on difficult MATH problems from 3% to less than 1% when deferring 7.5% of queries. However, the high latency of reasoning models still makes them challenging to deploy on use cases with high query volume. To address this challenge, we explore fronting a reasoning model with a large non-reasoning model. We call this modified human-in-the-loop system “Fail Fast, or Ask”, since the non-reasoning model may defer difficult queries to the human expert directly (“failing fast”), without incurring the reasoning model’s higher latency. We show that this approach yields around 40% latency reduction and about 50% cost savings for DeepSeek R1 while maintaining 90+% area under the accuracy-rejection curve. However, we observe that latency savings are lower than expected because of “latency drag”, the phenomenon that processing easier queries with a non-reasoning model pushes the reasoning model’s latency distribution towards longer latencies. Broadly, our results suggest that the deficiencies of state-of-the-art reasoning models – nontrivial error rates and high latency – can be substantially mitigated through black-box systems engineering, without requiring access to LLM internals.
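The deferral rule described in this abstract (treat a long reasoning trace as a sign of uncertainty and hand those queries to a human) can be illustrated with a minimal sketch. The function name, the toy data, and the assumption that the human expert always answers deferred queries correctly are hypothetical choices made for illustration; they are not taken from the paper.

```python
from typing import List, Tuple


def defer_by_trace_length(
    trace_lengths: List[int],
    is_correct: List[bool],
    defer_fraction: float,
) -> Tuple[float, float]:
    """Defer the queries with the longest reasoning traces to a human.

    Returns (system_error_rate, deferred_fraction). The error rate is taken
    over ALL queries, under the illustrative assumption that the human
    answers every deferred query correctly.
    """
    n = len(trace_lengths)
    n_defer = int(round(defer_fraction * n))
    # Rank queries from longest to shortest trace; the longest are deferred.
    order = sorted(range(n), key=lambda i: trace_lengths[i], reverse=True)
    deferred = set(order[:n_defer])
    # Only queries the model still answers itself can contribute errors.
    errors = sum(1 for i in range(n) if i not in deferred and not is_correct[i])
    return errors / n, n_defer / n


# Toy data (hypothetical): two wrong answers, one of them with a long trace.
lengths = [100, 200, 150, 900, 120, 800, 110, 130, 140, 160]
correct = [True, True, True, False, True, True, True, True, True, False]
baseline = defer_by_trace_length(lengths, correct, 0.0)
with_deferral = defer_by_trace_length(lengths, correct, 0.2)
```

In this toy setting, deferring the two longest traces removes one of the two errors; the remaining error has a short trace, which is the kind of case a length-based signal cannot catch.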
Article 915
Title@2025-07-18 (5): ADEPTS: A Capability Framework for Human-Centered Agent Design
Title: ADEPTS: A Capability Framework for Human-Centered Agent Design | ADEPTS: Ein Capability Framework für das Design von Human-Centered Agents | ADEPTS:以人为中心的制剂设计能力框架 2507.15885v1 |
Authors (4): Pierluca D’Oro, Caley Drooff, Joy Chen, Joseph Tighe
Large language models have paved the way to powerful and flexible AI agents, assisting humans by increasingly integrating into their daily life. This flexibility, potential, and growing adoption demand a holistic and cross-disciplinary approach to developing, monitoring and discussing the capabilities required for agent-driven user experiences. However, current guidance on human-centered AI agent development is scattered: UX heuristics focus on interface behaviors, engineering taxonomies describe internal pipelines, and ethics checklists address high-level governance. There is no concise, user-facing vocabulary that tells teams what an agent should fundamentally be able to do. We introduce ADEPTS, a capability framework defining a set of core user-facing capabilities to provide unified guidance around the development of AI agents. ADEPTS is based on six principles for human-centered agent design that express the minimal, user-facing capabilities an AI agent should demonstrate to be understandable, controllable and trustworthy in everyday use. ADEPTS complements existing frameworks and taxonomies; unlike them, it sits at the interface between technical and experience development. By presenting ADEPTS, we aim to condense complex AI-UX requirements into a compact framework that offers actionable guidance for AI researchers, designers, engineers, and policy reviewers alike. We believe ADEPTS has the potential to accelerate the improvement of user-relevant agent capabilities, ease the design of experiences that take advantage of those capabilities, and provide a shared language to track and discuss progress around the development of AI agents.
Article 916
Title@2025-07-18 (5): Incremental Causal Graph Learning for Online Cyberattack Detection in Cyber-Physical Infrastructures
Title: Incremental Causal Graph Learning for Online Cyberattack Detection in Cyber-Physical Infrastructures | Inkrementales Causal Graph Learning für Online Cyberattack Detection in Cyber-Physical Infrastructures | 网络物理基础设施在线网络攻击探测的递增因果图表学习 2507.14387v1 |
Authors (4): Arun Vignesh Malarkkan, Dongjie Wang, Haoyue Bai, Yanjie Fu
The escalating threat of cyberattacks on real-time critical infrastructures poses serious risks to public safety, demanding detection methods that effectively capture complex system interdependencies and adapt to evolving attack patterns. Traditional real-time anomaly detection techniques often suffer from excessive false positives due to their statistical sensitivity to high data variance and class imbalance. To address these limitations, recent research has explored modeling causal relationships among system components. However, prior work mainly focuses on offline causal graph-based approaches that require static historical data and fail to generalize to real-time settings. These methods are fundamentally constrained by: (1) their inability to adapt to dynamic shifts in data distribution without retraining, and (2) the risk of catastrophic forgetting when lacking timely supervision in live systems. To overcome these challenges, we propose INCADET, a novel framework for incremental causal graph learning tailored to real-time cyberattack detection. INCADET dynamically captures evolving system behavior by incrementally updating causal graphs across streaming time windows. The framework comprises three modules: 1) Early Symptom Detection: Detects transitions in system status using divergence in edge-weight distributions across sequential causal graphs. 2) Incremental Causal Graph Learning: Leverages experience replay and edge reinforcement to continually refine causal structures while preserving prior knowledge. 3) Causal Graph Classification: Employs Graph Convolutional Networks (GCNs) to classify system status using the learned causal graphs. Extensive experiments on real-world critical infrastructure datasets demonstrate that INCADET achieves superior accuracy, robustness, and adaptability compared to both static causal and deep temporal baselines in evolving attack scenarios.
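The first module above (flagging a status transition when the edge-weight distribution changes between consecutive causal graphs) can be sketched as follows. This is only an illustration: the use of Jensen-Shannon divergence over fixed-width histograms, the threshold value, and all function names are assumptions, since the abstract does not say which divergence or binning INCADET uses.

```python
import math
from typing import List


def histogram(weights: List[float], bins: int = 5, lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Normalized histogram of edge weights (assumes a non-empty list)."""
    counts = [0] * bins
    for w in weights:
        idx = int((w - lo) / (hi - lo) * bins)
        counts[max(0, min(idx, bins - 1))] += 1  # clamp into valid bins
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: List[float], y: List[float]) -> float:
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def detect_transitions(graph_sequence: List[List[float]], threshold: float = 0.3) -> List[int]:
    """Return window indices t whose edge weights diverge from window t-1."""
    flagged = []
    for t in range(1, len(graph_sequence)):
        d = js_divergence(histogram(graph_sequence[t - 1]), histogram(graph_sequence[t]))
        if d > threshold:
            flagged.append(t)
    return flagged


# Hypothetical edge weights for four consecutive time windows.
windows = [
    [0.10, 0.15, 0.12, 0.10],
    [0.12, 0.18, 0.10, 0.15],
    [0.90, 0.85, 0.95, 0.82],  # abrupt change in causal strengths
    [0.88, 0.90, 0.92, 0.81],
]
flagged_windows = detect_transitions(windows)
```

Only the window where the weights jump is flagged; the following window is stable relative to its predecessor, so it is not.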
Article 917
Title@2025-07-18 (5): Statistical learning for constrained functional parameters in infinite-dimensional models
Title: Statistical learning for constrained functional parameters in infinite-dimensional models | Statistisches Lernen für eingeschränkte funktionale Parameter in unendlich-dimensionalen Modellen | 关于无限模式中有限功能参数的统计学习 2404.09847v2 |
Authors (4): Razieh Nabi, Nima S. Hejazi, Mark J. van der Laan, David Benkeser
We develop a general framework for estimating function-valued parameters under equality or inequality constraints in infinite-dimensional statistical models. Such constrained learning problems are common across many areas of statistics and machine learning, where estimated parameters must satisfy structural requirements such as moment restrictions, policy benchmarks, calibration criteria, or fairness considerations. To address these problems, we characterize the solution as the minimizer of a penalized population risk using a Lagrange-type formulation, and analyze it through a statistical functional lens. Central to our approach is a constraint-specific path through the unconstrained parameter space that defines the constrained solutions. For a broad class of constraint-risk pairs, this path admits closed-form expressions and reveals how constraints shape optimal adjustments. When closed forms are unavailable, we derive recursive representations that support tractable estimation. Our results also suggest natural estimators of the constrained parameter, constructed by combining estimates of unconstrained components of the data-generating distribution. Thus, our procedure can be integrated with any statistical learning approach and implemented using standard software. We provide general conditions under which the resulting estimators achieve optimal risk and constraint satisfaction, and we demonstrate the flexibility and effectiveness of the proposed method through various examples, simulations, and real-data applications.
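As a rough illustration of the Lagrange-type formulation mentioned above, consider an equality constraint. The notation here is assumed for illustration and is not taken from the paper: $R_P(\psi)$ is the population risk of a candidate parameter $\psi \in \Psi$, and $\Theta_P(\psi) = 0$ is the constraint.

```latex
\psi_{\lambda} \;=\; \operatorname*{arg\,min}_{\psi \in \Psi} \; R_{P}(\psi) + \lambda\, \Theta_{P}(\psi),
\qquad
\psi^{*} \;=\; \psi_{\lambda^{*}}
\quad \text{where } \lambda^{*} \text{ solves } \Theta_{P}(\psi_{\lambda^{*}}) = 0 .
```

Under this reading, the family $\{\psi_{\lambda}\}$ indexed by $\lambda$ plays the role of the constraint-specific path: $\lambda = 0$ recovers the unconstrained minimizer, and moving along the path trades risk for constraint satisfaction.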
Article 918
Title@2025-07-18 (5): Combinatorial Optimization for All: Using LLMs to Aid Non-Experts in Improving Optimization Algorithms
Title: Combinatorial Optimization for All: Using LLMs to Aid Non-Experts in Improving Optimization Algorithms | Kombinatorische Optimierung für alle: Verwendung von LLMs zur Unterstützung von Nicht-Experten bei der Verbesserung von Optimierungsalgorithmen | 组合优化全民:利用LLMs帮助非专家改进最佳化算法 2503.10968v2 |
Authors (2): Camilo Chacón Sartori, Christian Blum
Large Language Models (LLMs) have shown notable potential in code generation for optimization algorithms, unlocking exciting new opportunities. This paper examines how LLMs, rather than creating algorithms from scratch, can improve existing ones without the need for specialized expertise. To explore this potential, we selected 10 baseline optimization algorithms from various domains (metaheuristics, reinforcement learning, deterministic, and exact methods) to solve the classic Travelling Salesman Problem. The results show that our simple methodology often results in LLM-generated algorithm variants that improve over the baseline algorithms in terms of solution quality, reduction in computational time, and simplification of code complexity, all without requiring specialized optimization knowledge or advanced algorithmic implementation skills.
Article 919
Title@2025-07-18 (5): Schemora: schema matching via multi-stage recommendation and metadata enrichment using off-the-shelf llms
Title: Schemora: schema matching via multi-stage recommendation and metadata enrichment using off-the-shelf llms | Schema: Schema-Matching über mehrstufige Empfehlung und Metadaten-Anreicherung mit Off-the-Shelf-llms | Schemora:通过多阶段建议和元数据利用现成光束进行元数据浓缩的匹配方案 2507.14376v1 |
Authors (3): Osman Erman Gungor, Derak Paulsen, William Kang
Schema matching is essential for integrating heterogeneous data sources and enhancing dataset discovery, yet it remains a complex and resource-intensive problem. We introduce SCHEMORA, a schema matching framework that combines large language models with hybrid retrieval techniques in a prompt-based approach, enabling efficient identification of candidate matches without relying on labeled training data or exhaustive pairwise comparisons. By enriching schema metadata and leveraging both vector-based and lexical retrieval, SCHEMORA improves matching accuracy and scalability. Evaluated on the MIMIC-OMOP benchmark, it establishes new state-of-the-art performance, with gains of 7.49% in HitRate@5 and 3.75% in HitRate@3 over previous best results. To our knowledge, this is the first LLM-based schema matching method with an open-source implementation, accompanied by analysis that underscores the critical role of retrieval and provides practical guidance on model selection.
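For readers unfamiliar with the HitRate@k metric reported above, a minimal sketch of its common definition follows: the fraction of source attributes whose correct target appears among the top k ranked candidates. Whether SCHEMORA computes it exactly this way is an assumption, and the column names in the example are hypothetical.

```python
from typing import Dict, List


def hit_rate_at_k(
    ranked_candidates: Dict[str, List[str]],
    ground_truth: Dict[str, str],
    k: int,
) -> float:
    """Fraction of source attributes whose true match is in the top-k list."""
    hits = 0
    for source, truth in ground_truth.items():
        top_k = ranked_candidates.get(source, [])[:k]
        if truth in top_k:
            hits += 1
    return hits / len(ground_truth)


# Hypothetical ranked matches from a source schema to a target schema.
ranked = {
    "patients.dob": ["person.birth_datetime", "person.year_of_birth", "visit.start_date"],
    "admissions.admittime": ["visit.end_date", "death.death_date", "visit.start_date"],
    "labevents.value": ["note.note_text", "observation.value_as_string", "drug.quantity"],
}
truth = {
    "patients.dob": "person.birth_datetime",
    "admissions.admittime": "visit.start_date",
    "labevents.value": "measurement.value_as_number",
}
hr1 = hit_rate_at_k(ranked, truth, 1)
hr3 = hit_rate_at_k(ranked, truth, 3)
```

Here one of three attributes is matched at rank 1 and two of three within the top 3; the third true target never appears in the candidate list.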
Article 920
Title@2025-07-18 (5): Prompt Smart, Pay Less: Cost-Aware APO for Real-World Applications
Title: Prompt Smart, Pay Less: Cost-Aware APO for Real-World Applications | Prompt Smart, weniger zahlen: Kosten-Bewusst-APO für Real-World-Anwendungen | 即时智能,低薪:用于现实世界应用的成本软件APO 2507.15884v1 |
Authors (4): Jayesh Choudhari, Piyush Kumar Singh, Douglas McIlwraith, Snehal Nair
Prompt design is a critical factor in the effectiveness of Large Language Models (LLMs), yet remains largely heuristic, manual, and difficult to scale. This paper presents the first comprehensive evaluation of Automatic Prompt Optimization (APO) methods for real-world, high-stakes multiclass classification in a commercial setting, addressing a critical gap in the existing literature where most of the APO frameworks have been validated only on benchmark classification tasks of limited complexity. We introduce APE-OPRO, a novel hybrid framework that combines the complementary strengths of APE and OPRO, achieving notably better cost-efficiency, around 18% improvement over OPRO, without sacrificing performance. We benchmark APE-OPRO alongside both gradient-free (APE, OPRO) and gradient-based (ProTeGi) methods on a dataset of ~2,500 labeled products. Our results highlight key trade-offs: ProTeGi offers the strongest absolute performance at lower API cost but higher computational time as noted in \cite{protegi}, while APE-OPRO strikes a compelling balance between performance, API efficiency, and scalability. We further conduct ablation studies on depth and breadth hyperparameters, and reveal notable sensitivity to label formatting, indicating implicit sensitivity in LLM behavior. These findings provide actionable insights for implementing APO in commercial applications and establish a foundation for future research in multi-label, vision, and multimodal prompt optimization scenarios.
Article 921
Title@2025-07-18 (5): Smarter Together: Combining Large Language Models and Small Models for Physiological Signals Visual Inspection
Title: Smarter Together: Combining Large Language Models and Small Models for Physiological Signals Visual Inspection | Smarter Together: Kombination von großen Sprachmodellen und kleinen Modellen für die visuelle Inspektion physiologischer Signale | 将大语言模型和生理信号视觉检查小模型结合起来 2501.16215v2 |
Authors (11): Huayu Li, Zhengxiao He, Xiwen Chen, Ci Zhang, Stuart F. Quan, William D. S. Killgore, Shu-Fen Wung, Chen X. Chen, Geng Yuan, Jin Lu, Ao Li
Large language models (LLMs) have shown promising capabilities in visually interpreting medical time-series data. However, their general-purpose design can limit domain-specific precision, and the proprietary nature of many models poses challenges for fine-tuning on specialized clinical datasets. Conversely, small specialized models (SSMs) offer strong performance on focused tasks but lack the broader reasoning needed for complex medical decision-making. To address these complementary limitations, we introduce ConMIL (Conformalized Multiple Instance Learning), a novel decision-support framework that distinctively synergizes three key components: (1) a new Multiple Instance Learning (MIL) mechanism, QTrans-Pooling, designed for per-class interpretability in identifying clinically relevant physiological signal segments; (2) conformal prediction, integrated with MIL to generate calibrated, set-valued outputs with statistical reliability guarantees; and (3) a structured approach for these interpretable and uncertainty-quantified SSM outputs to enhance the visual inspection capabilities of LLMs. Our experiments on arrhythmia detection and sleep stage classification demonstrate that ConMIL can enhance the accuracy of LLMs such as ChatGPT4.0, Qwen2-VL-7B, and MiMo-VL-7B-RL. For example, ConMIL-supported Qwen2-VL-7B and MiMo-VL-7B-RL achieve 94.92% and 96.82% precision on confident samples and (70.61% and 78.02%)/(78.10% and 71.98%) on uncertain samples for the two tasks, compared to 46.13% and 13.16% using the LLM alone. These results suggest that integrating task-specific models with LLMs may offer a promising pathway toward more interpretable and trustworthy AI-driven clinical decision support.
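The set-valued outputs mentioned in component (2) can be illustrated with generic split conformal prediction for classification. This sketch is not ConMIL itself: it omits the MIL and QTrans-Pooling parts entirely, and the score function (one minus the probability of the true class), the function names, and the toy probabilities are assumptions made for illustration.

```python
import math
from typing import List


def conformal_threshold(cal_probs: List[List[float]], cal_labels: List[int], alpha: float) -> float:
    """Calibrate a score threshold on held-out data (split conformal).

    Nonconformity score: 1 - predicted probability of the true class.
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if rank > n:
        return 1.0  # too few calibration points: every class is included
    return scores[rank - 1]


def prediction_set(probs: List[float], qhat: float) -> List[int]:
    """All classes whose nonconformity score is within the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= qhat]


# Hypothetical calibration outputs of a small 3-class model.
cal_probs = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.10, 0.70],
    [0.40, 0.35, 0.25],
]
cal_labels = [0, 1, 2, 0]
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

confident = prediction_set([0.90, 0.05, 0.05], qhat)  # a single class
uncertain = prediction_set([0.50, 0.45, 0.05], qhat)  # two plausible classes
```

A singleton set corresponds to what the abstract calls a confident sample, while a set with several classes marks an uncertain sample that could be passed to an LLM together with the candidate labels.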
Article 922
Title@2025-07-18 (5): Layerwise Recall and the Geometry of Interwoven Knowledge in LLMs
Title: Layerwise Recall and the Geometry of Interwoven Knowledge in LLMs | Layerwise Recall und die Geometrie des verwobenen Wissens in LLMs | 平整图层回溯和LLM 中互交知识的几何 2502.10871v2 |
Authors (2): Ge Lei, Samuel J. Cooper
This study explores how large language models (LLMs) encode interwoven scientific knowledge, using chemical elements and LLaMA-series models as a case study. We identify a 3D spiral structure in the hidden states that aligns with the conceptual structure of the periodic table, suggesting that LLMs can reflect the geometric organization of scientific concepts learned from text. Linear probing reveals that middle layers encode continuous, overlapping attributes that enable indirect recall, while deeper layers sharpen categorical distinctions and incorporate linguistic context. These findings suggest that LLMs represent symbolic knowledge not as isolated facts, but as structured geometric manifolds that intertwine semantic information across layers. We hope this work inspires further exploration of how LLMs represent and reason about scientific knowledge, particularly in domains such as materials science.
Article 923
Title@2025-07-18 (5): Oversmoothing Alleviation in Graph Neural Networks: A Survey and Unified View
Title: Oversmoothing Alleviation in Graph Neural Networks: A Survey and Unified View | Überglättende Linderung in Graph Neural Networks: Eine Umfrage und Unified View | 图形神经网络的压倒性缓解:调查和统一观点 2405.01663v2 |
Authors (2): Yufei Jin, Xingquan Zhu
Oversmoothing is a common challenge in learning graph neural networks (GNNs), where, as layers increase, embedding features learned from GNNs quickly become similar or indistinguishable, making them incapable of differentiating network proximity. A GNN with a shallow architecture can only learn short-term relations or localized structure information, limiting its power to learn long-term connections, as evidenced by its inferior learning performance on heterophilous graphs. Tackling oversmoothing is crucial for harnessing deep-layer architectures for GNNs. To date, many methods have been proposed to alleviate oversmoothing. The vast differences in their design principles, combined with graph complications, make it difficult to understand and even compare how different approaches tackle oversmoothing. In this paper, we propose ATNPA, a unified view with five key steps: Augmentation, Transformation, Normalization, Propagation, and Aggregation, to summarize GNN oversmoothing alleviation approaches. We first propose a taxonomy for GNN oversmoothing alleviation which includes three themes to tackle oversmoothing. After that, we separate all methods into six categories, followed by detailed reviews of representative methods, including their relation to ATNPA, and discussion of their niches, strengths, and weaknesses. The review not only provides an in-depth understanding of existing methods in the field but also lays out a clear road map for future study.
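The oversmoothing effect described above can be reproduced on a toy graph with a minimal sketch. It applies only the propagation step (mean aggregation over each node and its neighbors, with no learned weights or nonlinearity), which is a simplifying assumption; the graph, feature values, and function names are hypothetical.

```python
from typing import List


def propagate(adj: List[List[int]], features: List[float]) -> List[float]:
    """One propagation step: average over each node and its neighbors."""
    n = len(adj)
    out = []
    for i in range(n):
        neighborhood = [j for j in range(n) if adj[i][j] or i == j]
        out.append(sum(features[j] for j in neighborhood) / len(neighborhood))
    return out


def feature_spread(adj: List[List[int]], features: List[float], num_layers: int) -> float:
    """Max-min gap of node features after stacking num_layers propagations."""
    h = list(features)
    for _ in range(num_layers):
        h = propagate(adj, h)
    return max(h) - min(h)


# Path graph 0-1-2-3 with a scalar feature that is nonzero only on node 0.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
x = [1.0, 0.0, 0.0, 0.0]
spread_shallow = feature_spread(adj, x, 2)
spread_deep = feature_spread(adj, x, 50)
```

With two layers the nodes are still distinguishable; with fifty the features have collapsed to nearly the same value on every node, which is the behavior that the surveyed methods try to counteract.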
Article 924
Title@2025-07-18 (5): Comparing skill of historical rainfall data based monsoon rainfall prediction in India with NWP forecasts
Title: Comparing skill of historical rainfall data based monsoon rainfall prediction in India with NWP forecasts | Vergleich der Fähigkeiten von historischen Niederschlagsdaten basierend auf Monsunregen Vorhersage in Indien mit NWP Prognosen | 将印度基于历史降雨数据的历史降雨量数据季风降雨量预测与内罗毕工作方案预测的技能进行比较 2402.07851v2 |
Authors (5): Apoorva Narula, Aastha Jain, Jatin Batra, MN Rajeevan, Sandeep Juneja
The Indian summer monsoon is a highly complex and critical weather system that directly affects the livelihoods of over a billion people across the Indian subcontinent. Accurate short-term forecasting remains a major scientific challenge due to the monsoon’s intrinsic nonlinearity and its sensitivity to multi-scale drivers, including local land-atmosphere interactions and large-scale ocean-atmosphere phenomena. In this study, we address the problem of forecasting daily rainfall across India during the summer months, focusing on both one-day and three-day lead times. We use Autoformers, deep learning transformer-based architectures designed for time series forecasting. These are trained on historical gridded precipitation data from the India Meteorological Department (1901–2023) at spatial resolutions of $0.25^\circ \times 0.25^\circ$ and $1^\circ \times 1^\circ$. The models also incorporate auxiliary meteorological variables from ECMWF’s reanalysis datasets, namely cloud cover, humidity, temperature, soil moisture, vorticity, and wind speed. Forecasts at $0.25^\circ \times 0.25^\circ$ are benchmarked against ECMWF’s High-Resolution Ensemble System (HRES), widely regarded as the most accurate numerical weather predictor, and forecasts at $1^\circ \times 1^\circ$ against those from the National Centers for Environmental Prediction (NCEP). We conduct both nationwide evaluations and localized analyses for major Indian cities. Our results indicate that transformer-based deep learning models consistently outperform both HRES and NCEP, as well as other climatological baselines. Specifically, compared to our model, forecasts from the HRES and NCEP models have about 22% and 43% higher error, respectively, for a single-day prediction, and over 27% and 66% higher error, respectively, for a three-day prediction.
Article 925
Title@2025-07-18 (5): Generative Models and Connected and Automated Vehicles: A Survey in Exploring the Intersection of Transportation and AI
Title: Generative Models and Connected and Automated Vehicles: A Survey in Exploring the Intersection of Transportation and AI | Generative Modelle und vernetzte und Automatisierte Fahrzeuge: Eine Umfrage bei der Erforschung der Intersektion von Transport und KI | 生成模型以及连接和自动化车辆:探索运输和AI的交叉路口调查 2403.10559v2 |
Authors (3): Bo Shu, Yiting Zhang, Dong Shu
This report investigates the history and impact of Generative Models and Connected and Automated Vehicles (CAVs), two groundbreaking forces pushing progress in technology and transportation. By focusing on the application of generative models within the context of CAVs, the report aims to unravel how this integration could enhance predictive modeling, simulation accuracy, and decision-making processes in autonomous vehicles. It discusses the benefits and challenges of integrating generative models and CAV technology in transportation, and aims to highlight the progress made, the remaining obstacles, and the potential for advancements in safety and innovation.
Article 926
Title@2025-07-18 (5): Relative Entropy Pathwise Policy Optimization
Title: Relative Entropy Pathwise Policy Optimization | Relative Entropie pfadweise politische Optimierung | 相对 Entrop 路径式政策优化 2507.11019v2 |
Authors (9): Claas Voelcker, Axel Brunnbauer, Marcel Hussing, Michal Nauman, Pieter Abbeel, Eric Eaton, Radu Grosu, Amir-massoud Farahmand, Igor Gilitschenski
Score-function policy gradients have delivered strong results in game-playing, robotics and language-model fine-tuning. Yet their high variance often undermines training stability. On the other hand, pathwise policy gradients alleviate the training variance, but are reliable only when driven by an accurate action-conditioned value function, which is notoriously hard to train without relying on past off-policy data. In this paper, we discuss how to construct a value-gradient driven, on-policy algorithm that allows training Q-value models purely from on-policy data, unlocking the possibility of using pathwise policy updates in the context of on-policy learning. We show how to balance stochastic policies for exploration with constrained policy updates for stable training, and evaluate important architectural components that facilitate accurate value function learning. Building on these insights, we propose Relative Entropy Pathwise Policy Optimization (REPPO), an efficient on-policy algorithm that combines the sample-efficiency of pathwise policy gradients with the simplicity and minimal memory footprint of standard on-policy learning. We demonstrate that REPPO provides strong empirical performance with reduced sample requirements, wall-clock time, and memory footprint, as well as high hyperparameter robustness, in a set of experiments on two standard GPU-parallelized benchmarks.
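The variance gap between the two estimator families contrasted above can be seen on a one-dimensional toy problem. This is a minimal sketch, not REPPO: the objective $J(\mu) = \mathbb{E}_{x \sim \mathcal{N}(\mu, \sigma^2)}[x^2]$, whose true gradient is $2\mu$, and the function name are assumptions chosen so that both estimators have a closed form.

```python
import random
import statistics
from typing import List, Tuple


def gradient_estimates(
    mu: float, sigma: float, num_samples: int, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Per-sample estimates of d/dmu E[x^2] for x ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    score_fn, pathwise = [], []
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)
        x = mu + sigma * eps  # reparameterized sample
        # Score-function estimator: f(x) * d/dmu log p(x; mu, sigma)
        score_fn.append((x ** 2) * (x - mu) / sigma ** 2)
        # Pathwise estimator: differentiate f through the sample, df/dx * dx/dmu
        pathwise.append(2.0 * x)
    return score_fn, pathwise


sf, pw = gradient_estimates(mu=1.0, sigma=1.0, num_samples=20000)
sf_mean, pw_mean = statistics.fmean(sf), statistics.fmean(pw)
sf_var, pw_var = statistics.variance(sf), statistics.variance(pw)
```

Both sample means approximate the true gradient of 2, but the score-function estimates are far more dispersed. The pathwise estimator needs the derivative of the objective with respect to the sample, which in reinforcement learning is what an action-conditioned value function supplies.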
Article 927
Title@2025-07-18 (5): Solo Connection: A Parameter Efficient Fine-Tuning Technique for Transformers
Title: Solo Connection: A Parameter Efficient Fine-Tuning Technique for Transformers | Solo-Anschluss: Eine Parameter-Effiziente Feintuning-Technik für Transformatoren | Solo 连接: 用于变形器的参数节能微调技术 2507.14353v1 |
Authors (2): Harsh Nilesh Pathak, Randy Paffenroth
Parameter-efficient fine-tuning (PEFT) is a versatile and extensible approach for adapting a Large Language Model (LLM) to new tasks. One of the most prominent PEFT approaches, Low Rank Adaptation (LoRA), primarily focuses on adjusting the attention weight matrices within individual decoder blocks of a Generative Pre-trained Transformer (GPT2). In contrast, we introduce Solo Connection, a novel method that adapts the representation at the decoder-block level rather than modifying individual weight matrices. Not only does Solo Connection outperform LoRA on E2E natural language generation benchmarks, but it also reduces the number of trainable parameters by 59% relative to LoRA and by more than 99% compared to full fine-tuning of GPT2, an early version of Large Language Models (LLMs). Solo Connection is also motivated by homotopy theory: we introduce a trainable linear transformation that gradually interpolates between a zero vector and the task-specific representation, enabling smooth and stable adaptation over time. While skip connections in the original 12-layer GPT2 are typically confined to individual decoder blocks, subsequent GPT2 variants scale up to 48 layers, and even larger language models can include 128 or more decoder blocks. These expanded architectures underscore the need to revisit how skip connections are employed during fine-tuning. This paper focuses on long skip connections that link outputs of different decoder blocks, potentially enhancing the model’s ability to adapt to new tasks while leveraging pre-trained knowledge.
Article 928
Title@2025-07-18 (5): Still More Shades of Null: An Evaluation Suite for Responsible Missing Value Imputation
Title: Still More Shades of Null: An Evaluation Suite for Responsible Missing Value Imputation | Noch mehr Schattierungen von Null: Eine Bewertungs-Suite für verantwortungsbewusste wertvermißte Imputation | 更多 “ 无 “ 的阴影:负责任的缺失价值估计评估套件 2409.07510v6 |
Authors (4): Falaah Arif Khan, Denys Herasymuk, Nazar Protsiv, Julia Stoyanovich
Data missingness is a practical challenge of sustained interest to the scientific community. In this paper, we present Shades-of-Null, an evaluation suite for responsible missing value imputation. Our work is novel in two ways: (i) we model realistic and socially-salient missingness scenarios that go beyond Rubin’s classic Missing Completely at Random (MCAR), Missing At Random (MAR) and Missing Not At Random (MNAR) settings, to include multi-mechanism missingness (when different missingness patterns co-exist in the data) and missingness shift (when the missingness mechanism changes between training and test); and (ii) we evaluate imputers holistically, based on imputation quality and imputation fairness, as well as on the predictive performance, fairness and stability of the models that are trained and tested on the data post-imputation. We use Shades-of-Null to conduct a large-scale empirical study involving 29,736 experimental pipelines, and find that while there is no single best-performing imputation approach for all missingness types, interesting trade-offs arise between predictive performance, fairness and stability, based on the combination of missingness scenario, imputer choice, and the architecture of the predictive model. We make Shades-of-Null publicly available, to enable researchers to rigorously evaluate missing value imputation methods on a wide range of metrics in plausible and socially meaningful scenarios.
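To make the three classic mechanisms named in this abstract concrete, the following is a minimal sketch and not the Shades-of-Null implementation. The column names (`age`, `income`), the rates, and the thresholds are all hypothetical choices made for illustration.

```python
import random

def inject_missing(rows, mechanism, rate=0.3, seed=0):
    """Return a copy of rows with 'income' set to None under the given mechanism.

    Illustrative assumptions: rows are dicts with 'age' and 'income';
    MAR depends on the observed 'age', MNAR on the hidden 'income' itself.
    """
    rng = random.Random(seed)
    out = []
    for row in rows:
        row = dict(row)  # copy so the input is not mutated
        if mechanism == "MCAR":
            p = rate  # same probability for every row
        elif mechanism == "MAR":
            p = 2 * rate if row["age"] < 30 else 0.0  # depends on an observed column
        elif mechanism == "MNAR":
            p = 2 * rate if row["income"] > 50_000 else 0.0  # depends on the value that goes missing
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() < p:
            row["income"] = None
        out.append(row)
    return out

# Made-up data: income grows with age.
rows = [{"age": a, "income": 1_000 * a} for a in range(20, 80)]
mar = inject_missing(rows, "MAR")
mnar = inject_missing(rows, "MNAR")
```

Under these assumptions, a missingness shift could be simulated by applying one mechanism to the training rows and a different one to the test rows.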
Article 929
Title@2025-07-18 (5): Influence Functions for Preference Dataset Pruning
Title: Influence Functions for Preference Dataset Pruning | Einflussfunktionen für Preference Dataset Pruning | 优先数据集缓冲影响函数 2507.14344v1 |
Authors (2): Daniel Fein, Gabriela Aranguiz-Dias
Language models are commonly fine-tuned via reinforcement learning to alter their behavior or elicit new capabilities. Datasets used for these purposes, and particularly human preference datasets, are often noisy. The relatively small size of post-training datasets, combined with parameter-efficient fine-tuning methods, enables the use of influence function approximations to detect and prune training examples that are harmful to performance on a validation set. In this work, we adapt the TL;DR dataset for reward model training to demonstrate how conjugate-gradient approximated influence functions can be used to filter datasets. In our experiments, influence function filtering yields a small retraining accuracy uplift of 1.5% after removing 10% of training examples. We also show that gradient similarity outperforms influence functions for detecting helpful training examples. This suggests that local curvature is important for detecting harmful training examples, but less so for identifying helpful examples.
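As a rough illustration of the conjugate-gradient idea mentioned in this abstract, the sketch below solves H s = g_val using only Hessian-vector products and then scores a training example as -g_val^T H^{-1} g_train. This is a toy under stated assumptions, not the authors' code: the 2x2 Hessian and both gradients are made up, and a real setup would obtain `hvp` from automatic differentiation.

```python
def conjugate_gradient(hvp, b, iters=50, tol=1e-10):
    """Solve H x = b for symmetric positive-definite H, given only the map v -> H v."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - H x, with x starting at zero
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(iters):
        if rs < tol:
            break
        hp = hvp(p)
        alpha = rs / sum(pi * hi for pi, hi in zip(p, hp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, hp)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def influence(hvp, grad_val, grad_train):
    """Approximate -grad_val^T H^{-1} grad_train without forming H^{-1}."""
    s = conjugate_gradient(hvp, grad_val)
    return -sum(si * gi for si, gi in zip(s, grad_train))

# Hypothetical 2-parameter problem with a small positive-definite Hessian.
H = [[2.0, 0.5], [0.5, 1.0]]

def hvp(v):
    return [sum(h * x for h, x in zip(row, v)) for row in H]

score = influence(hvp, grad_val=[1.0, 0.0], grad_train=[0.0, 1.0])
```

With this sign convention, a positive score suggests that upweighting the training example would increase validation loss, which makes it a candidate for pruning.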
Article 930
Title@2025-07-18 (5): MENO: Hybrid Matrix Exponential-based Neural Operator for Stiff ODEs. Application to Thermochemical Kinetics
Title: MENO: Hybrid Matrix Exponential-based Neural Operator for Stiff ODEs. Application to Thermochemical Kinetics | MENO: Hybrid-Matrix Exponential-basierter Neural-Operator für Stiff-ODEs. Anwendung in der thermochemischen Kinetik | MENO: Stiff DES 混合矩阵指数基神经操作器。 2507.14341v1 |
Authors (3): Ivan Zanardi, Simone Venturi, Marco Panesi
We introduce MENO (“Matrix Exponential-based Neural Operator”), a hybrid surrogate modeling framework for efficiently solving stiff systems of ordinary differential equations (ODEs) that exhibit a sparse nonlinear structure. In such systems, only a few variables contribute nonlinearly to the dynamics, while the majority influence the equations linearly. MENO exploits this property by decomposing the system into two components: the low-dimensional nonlinear part is modeled using conventional neural operators, while the linear time-varying subsystem is integrated using a novel neural matrix exponential formulation. This approach combines the exact solution of linear time-invariant systems with learnable, time-dependent graph-based corrections applied to the linear operators. Unlike black-box or soft-constrained physics-informed (PI) models, MENO embeds the governing equations directly into its architecture, ensuring physical consistency (e.g., steady states), improved robustness, and more efficient training. We validate MENO on three complex thermochemical systems: the POLLU atmospheric chemistry model, an oxygen mixture in thermochemical nonequilibrium, and a collisional-radiative argon plasma in one- and two-dimensional shock-tube simulations. MENO achieves relative errors below 2% in trained zero-dimensional settings and maintains good accuracy in extrapolatory multidimensional regimes. It also delivers substantial computational speedups, achieving up to 4,800$\times$ on GPU and 185$\times$ on CPU compared to standard implicit ODE solvers. Although intrusive by design, MENO’s physics-based architecture enables superior generalization and reliability, offering a scalable path for real-time simulation of stiff reactive systems.
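The "exact solution of linear time-invariant systems" that this abstract builds on is x(t) = exp(A t) x(0). The sketch below shows only that building block on a made-up stiff 2x2 system and compares it with one explicit Euler step. It does not attempt MENO's learned, time-dependent corrections, and the pure-Python `expm` helper is an illustrative stand-in for a library routine.

```python
import math

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def expm(a, squarings=10, terms=12):
    """Matrix exponential via scaling-and-squaring with a truncated Taylor series."""
    n = len(a)
    scaled = [[v / 2 ** squarings for v in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

# Hypothetical stiff linear system dx/dt = A x with one fast and one slow mode.
A = [[-1000.0, 0.0], [0.0, -1.0]]
dt = 0.01
x0 = [1.0, 1.0]

Phi = expm([[v * dt for v in row] for row in A])
x_exact = [sum(p * x for p, x in zip(row, x0)) for row in Phi]  # exponential step
x_euler = [x + dt * sum(a * xj for a, xj in zip(row, x0))       # explicit Euler step
           for row, x in zip(A, x0)]
```

At this step size the explicit Euler update overshoots on the fast mode, while the exponential step decays as expected, which is the usual motivation for exponential integrators on stiff problems.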
Article 931
Title@2025-07-18 (5): Topological Social Choice: Designing a Noise-Robust Polar Distance for Persistence Diagrams
Title: Topological Social Choice: Designing a Noise-Robust Polar Distance for Persistence Diagrams | Topologische soziale Wahl: Entwerfen einer Rausch-Robusten Polardistanz für Persistenzdiagramme | 地形社会选择:为持久性图解设计一个噪音-沸流极地距离 2507.14340v1 |
Authors (2): Athanasios Andrikopoulos, Nikolaos Sampanis
Topological Data Analysis (TDA) has emerged as a powerful framework for extracting robust and interpretable features from noisy high-dimensional data. In the context of Social Choice Theory, where preference profiles and collective decisions are geometrically rich yet sensitive to perturbations, TDA remains largely unexplored. This work introduces a novel conceptual bridge between these domains by proposing a new metric framework for persistence diagrams tailored to noisy preference data. We define a polar coordinate-based distance that captures both the magnitude and orientation of topological features in a smooth and differentiable manner. Our metric addresses key limitations of classical distances, such as bottleneck and Wasserstein, including instability under perturbation, lack of continuity, and incompatibility with gradient-based learning. The resulting formulation offers improved behavior in both theoretical and applied settings. To the best of our knowledge, this is the first study to systematically apply persistent homology to social choice systems, providing a mathematically grounded method for comparing topological summaries of voting structures and preference dynamics. We demonstrate the superiority of our approach through extensive experiments, including robustness tests and supervised learning tasks, and we propose a modular pipeline for building predictive models from online preference data. This work contributes a conceptually novel and computationally effective tool to the emerging interface of topology and decision theory, opening new directions in interpretable machine learning for political and economic systems.
Article 932
Title@2025-07-18 (5): Fiduciary AI for the Future of Brain-Technology Interactions
Title: Fiduciary AI for the Future of Brain-Technology Interactions | Fiduciary KI für die Zukunft von Brain-Technology Interaktionen | 未来脑-技术相互作用协会 2507.14339v1 |
Authors (3): Abhishek Bhattacharjee, Jack Pilkington, Nita Farahany
Brain foundation models represent a new frontier in AI: instead of processing text or images, these models interpret real-time neural signals from EEG, fMRI, and other neurotechnologies. When integrated with brain-computer interfaces (BCIs), they may enable transformative applications, from thought-controlled devices to neuroprosthetics, by interpreting and acting on brain activity in milliseconds. However, these same systems pose unprecedented risks, including the exploitation of subconscious neural signals and the erosion of cognitive liberty. Users cannot easily observe or control how their brain signals are interpreted, creating power asymmetries that are vulnerable to manipulation. This paper proposes embedding fiduciary duties (loyalty, care, and confidentiality) directly into BCI-integrated brain foundation models through technical design. Drawing on legal traditions and recent advancements in AI alignment techniques, we outline implementable architectural and governance mechanisms to ensure these systems act in users’ best interests. Placing brain foundation models on a fiduciary footing is essential to realizing their potential without compromising self-determination.
Article 933
Title@2025-07-18 (5): Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark
Title: Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark | Document Haystack: Ein langer Kontext Multimodales Bild/Dokument Verständnis Vision LLM Benchmark | Haystack文件:长期、多模式图像/文件理解愿景LLM基准 2507.15882v1 |
Authors (5): Goeric Huybrechts, Srikanth Ronanki, Sai Muralidhar Jayanthi, Jack Fitzgerald, Srinivasan Veeravanallur
The proliferation of multimodal Large Language Models has significantly advanced the ability to analyze and understand complex data inputs from different modalities. However, the processing of long documents remains under-explored, largely due to a lack of suitable benchmarks. To address this, we introduce Document Haystack, a comprehensive benchmark designed to evaluate the performance of Vision Language Models (VLMs) on long, visually complex documents. Document Haystack features documents ranging from 5 to 200 pages and strategically inserts pure text or multimodal text+image “needles” at various depths within the documents to challenge VLMs’ retrieval capabilities. Comprising 400 document variants and a total of 8,250 questions, it is supported by an objective, automated evaluation framework. We detail the construction and characteristics of the Document Haystack dataset, present results from prominent VLMs and discuss potential research avenues in this area.
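The needle-insertion idea in this abstract can be sketched in a few lines. This is an illustrative assumption about how a text needle might be placed at a relative depth, not the benchmark's actual tooling; the page strings and the needle sentence are made up.

```python
def insert_needle(pages, needle, depth):
    """Append a needle sentence to the page at relative depth in [0, 1].

    Returns the modified pages and the 1-based page number for the answer key.
    """
    index = min(int(depth * len(pages)), len(pages) - 1)
    out = list(pages)  # copy so the original document is untouched
    out[index] = out[index] + "\n" + needle
    return out, index + 1

pages = [f"page {i} text" for i in range(1, 11)]  # hypothetical 10-page document
doc_mid, page_mid = insert_needle(pages, "The secret fruit is apple.", 0.5)
doc_end, page_end = insert_needle(pages, "The secret fruit is apple.", 1.0)
```

A retrieval question such as "What is the secret fruit?" can then be scored automatically by string match against the known needle.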
Article 934
Title@2025-07-18 (5): GreenCrossingAI: A Camera Trap/Computer Vision Pipeline for Environmental Science Research Groups
Title: GreenCrossingAI: A Camera Trap/Computer Vision Pipeline for Environmental Science Research Groups | GreenCrossingAI: Eine Kamerafalle/Computer Vision Pipeline für Forschungsgruppen der Umweltwissenschaften | GreenCrossingAI:环境科学研究小组的相机陷阱/计算机视觉管道 2507.09410v2 |
Authors (5): Bernie Boscoe, Shawn Johnson, Andrea Osbon, Chandler Campbell, Karen Mager
Camera traps have long been used by wildlife researchers to monitor and study animal behavior, population dynamics, habitat use, and species diversity in a non-invasive and efficient manner. While data collection from the field has increased with new tools and capabilities, methods to develop, process, and manage the data, especially the adoption of ML/AI tools, remain challenging. These challenges include the sheer volume of data generated, the need for accurate labeling and annotation, variability in environmental conditions affecting data quality, and the integration of ML/AI tools into existing workflows that often require domain-specific customization and computational resources. This paper provides a guide to a low-resource pipeline to process camera trap data on-premise, incorporating ML/AI capabilities tailored for small research groups with limited resources and computational expertise. By focusing on practical solutions, the pipeline offers accessible approaches for data transmission, inference, and evaluation, enabling researchers to discover meaningful insights from their ever-increasing camera trap datasets.
Article 935
Title@2025-07-18 (5): Development and Deployment of Hybrid ML Models for Critical Heat Flux Prediction in Annulus Geometries
Title: Development and Deployment of Hybrid ML Models for Critical Heat Flux Prediction in Annulus Geometries | Entwicklung und Einsatz von Hybrid-ML-Modellen für kritische Wärmeflussprognosen in Annulus Geometrien | 开发和部署安努卢斯地貌特征下临界热量流量预测混合模型模型 2507.14332v1 |
Authors (4): Aidan Furlong, Xingang Zhao, Robert Salko, Xu Wu
Accurate prediction of critical heat flux (CHF) is an essential component of safety analysis in pressurized and boiling water reactors. To support reliable prediction of this quantity, several empirical correlations and lookup tables have been constructed from physical experiments over the past several decades. With the onset of accessible machine learning (ML) frameworks, multiple initiatives have been established with the goal of predicting CHF more accurately than these traditional methods. While purely data-driven surrogate modeling has been extensively investigated, these approaches lack interpretability, lack resilience to data scarcity, and have been developed mostly using data from tube experiments. As a result, bias-correction hybrid approaches have become increasingly popular, which correct initial “low-fidelity” estimates provided by deterministic base models by using ML-predicted residuals. This body of work has mostly considered round tube geometries; annular geometry-specific ML models have not yet been deployed in thermal hydraulic codes. This study developed, deployed, and validated four ML models to predict CHF in annular geometries using the CTF subchannel code. Three empirical correlation models, Biasi, Bowring, and Katto, were used as base models for comparison. The ML models were trained and tested using 577 experimental annulus data points from four datasets: Becker, Beus, Janssen, and Mortimore. Baseline CHF predictions were obtained from the empirical correlations, with mean relative errors above 26%. The ML-driven models achieved mean relative errors below 3.5%, with no more than one point exceeding the 10% error envelope. In all cases, the hybrid ML models significantly outperformed their empirical counterparts.
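The bias-correction hybrid described in this abstract adds an ML-predicted residual to a low-fidelity base estimate. The sketch below shows that structure only: the base model is a hypothetical placeholder (not Biasi, Bowring, or Katto), the data points are invented, and a one-variable least-squares fit stands in for the ML residual model.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b (stand-in for the ML residual model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def base_model(x):
    """Hypothetical low-fidelity correlation used only for illustration."""
    return 2.0 * x

xs = [1.0, 2.0, 3.0, 4.0]
measured = [2.5, 4.9, 7.3, 9.7]  # made-up "experimental" values

# Train the correction on what the base model gets wrong.
residuals = [m - base_model(x) for x, m in zip(xs, measured)]
a, b = fit_linear(xs, residuals)

def hybrid_model(x):
    """Base estimate plus learned residual correction."""
    return base_model(x) + (a * x + b)
```

Because the learned part only has to model the residual, the hybrid falls back on the base correlation's trend where training data are sparse.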
Article 936
Title@2025-07-18 (5): Defending Against Unforeseen Failure Modes with Latent Adversarial Training
Title: Defending Against Unforeseen Failure Modes with Latent Adversarial Training | Verteidigung gegen unvorhergesehene Ausfallmodi mit latenten Adversarial Training | 利用远程反反向培训,防范意外失灵模式 2403.05030v5 |
Authors (4): Stephen Casper, Lennart Schulze, Oam Patel, Dylan Hadfield-Menell
Despite extensive diagnostics and debugging by developers, AI systems sometimes exhibit harmful unintended behaviors. Finding and fixing these is challenging because the attack surface is so large – it is not tractable to exhaustively search for inputs that may elicit harmful behaviors. Red-teaming and adversarial training (AT) are commonly used to improve robustness; however, they empirically struggle to fix failure modes that differ from the attacks used during training. In this work, we utilize latent adversarial training (LAT) to defend against vulnerabilities without leveraging knowledge of what they are or using inputs that elicit them. LAT makes use of the compressed, abstract, and structured latent representations of concepts that the network actually uses for prediction. Here, we use it to defend against failure modes without examples that elicit them. Specifically, we use LAT to remove backdoors and defend against held-out classes of adversarial attacks. We show in image classification, text classification, and text generation tasks that LAT usually improves both robustness to novel attacks and performance on clean data relative to AT. This suggests that LAT can be a promising tool for defending against failure modes that are not explicitly identified by developers.
Article 937
Title@2025-07-18 (5): Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models
Title: Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models | Plan für Geschwindigkeit: Erweitertes Scheduling für maskierte Diffusions-Sprachmodelle | 速度计划: 遮蔽传播语言模型的饱和日程安排 2506.19037v2 |
Authors (3): Omer Luxembourg, Haim Permuter, Eliya Nachmani
Masked diffusion language models (MDLMs) promise fast, non-autoregressive text generation, yet existing samplers, which pick tokens to unmask based on model confidence, ignore interactions when unmasking multiple positions in parallel and effectively reduce to slow, autoregressive behavior. We propose the Dilated Unmasking Scheduler (DUS), an inference-only, planner-model-free method that partitions sequence positions into non-adjacent dilated groups and unmasks them in parallel so as to minimize an upper bound on joint entropy gain at each denoising step. By explicitly trading off the number of network calls against generation quality, DUS recovers most of the performance lost under traditional parallel unmasking strategies. Across math (GSM8K, MATH500), code (HumanEval, MBPP) and general-knowledge benchmarks (BBH, MMLU-Pro), DUS outperforms confidence-based planners, without modifying the underlying denoiser, and reveals the true speed-quality frontier of MDLMs.
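One simple way to picture "non-adjacent dilated groups" is a fixed-stride partition of positions, sketched below. This is an assumption made for illustration; the abstract does not spell out the schedule, and the actual DUS grouping may differ.

```python
def dilated_groups(seq_len, dilation):
    """Split positions 0..seq_len-1 into `dilation` groups, each with stride `dilation`."""
    return [list(range(start, seq_len, dilation)) for start in range(dilation)]

# With dilation 4, an 8-token block is unmasked in 4 denoiser calls instead of 8.
groups = dilated_groups(8, 4)
```

For any dilation of at least 2, no two positions within a group are neighbors, so tokens unmasked in the same call are spaced apart.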
Article 938
Title@2025-07-18 (5): TREET: TRansfer Entropy Estimation via Transformers
Title: TREET: TRansfer Entropy Estimation via Transformers | TREET: TRansfer-Entropieschätzung über Transformatoren | TREET: 通过变压器对TRansfer Entropy 估计 2402.06919v4 |
Authors (3): Omer Luxembourg, Dor Tsur, Haim Permuter
Transfer entropy (TE) is an information theoretic measure that reveals the directional flow of information between processes, providing valuable insights for a wide range of real-world applications. This work proposes Transfer Entropy Estimation via Transformers (TREET), a novel attention-based approach for estimating TE for stationary processes. The proposed approach applies the Donsker-Varadhan representation to TE and leverages the attention mechanism for the task of neural estimation. We present a detailed theoretical and empirical study of TREET, comparing it to existing methods on a dedicated estimation benchmark. To increase its applicability, we design an estimated TE optimization scheme that is motivated by the functional representation lemma, and use it to estimate the capacity of communication channels with memory, which is a canonical optimization problem in information theory. We further demonstrate how an optimized TREET can be used to estimate underlying densities, providing experimental results. Finally, we apply TREET to feature analysis of patients with apnea, demonstrating its applicability to real-world physiological data. Our work, applied with state-of-the-art deep learning methods, opens a new door for communication problems which are yet to be solved.
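For readers unfamiliar with the two ingredients named in this abstract, the standard textbook forms are given below. The indexing of the past windows is an assumption made here for illustration and may not match the paper's notation.

```latex
% Transfer entropy with memory l, written as a conditional mutual information
% (standard definition; the window indexing is an assumption).
\mathrm{TE}_{X \to Y}(l) = I\left( X_{t-l}^{t-1} ;\, Y_t \,\middle|\, Y_{t-l}^{t-1} \right)

% Donsker-Varadhan variational representation of the KL divergence.
D_{\mathrm{KL}}(P \,\|\, Q) = \sup_{T} \; \mathbb{E}_{P}\left[ T \right] - \log \mathbb{E}_{Q}\left[ e^{T} \right]
```

Any fixed function T yields a lower bound on the divergence, so parameterizing T with a neural network and maximizing the right-hand side over its weights gives a trainable estimator.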
Article 939
Title@2025-07-18 (5): Rethinking Individual Fairness in Deepfake Detection
Title: Rethinking Individual Fairness in Deepfake Detection | Individuelle Fairness in Deepfake Detection neu denken | 重新思考个人在深假探测中的公平性 2507.14326v1 |
Authors (4): Aryana Hou, Li Lin, Justin Li, Shu Hu
Generative AI models have substantially improved the realism of synthetic media, yet their misuse through sophisticated DeepFakes poses significant risks. Despite recent advances in deepfake detection, fairness remains inadequately addressed, enabling deepfake markers to exploit biases against specific populations. While previous studies have emphasized group-level fairness, individual fairness (i.e., ensuring similar predictions for similar individuals) remains largely unexplored. In this work, we identify for the first time that the original principle of individual fairness fundamentally fails in the context of deepfake detection, revealing a critical gap previously unexplored in the literature. To mitigate it, we propose the first generalizable framework that can be integrated into existing deepfake detectors to enhance individual fairness and generalization. Extensive experiments conducted on leading deepfake datasets demonstrate that our approach significantly improves individual fairness while maintaining robust detection performance, outperforming state-of-the-art methods. The code is available at https://github.com/Purdue-M2/Individual-Fairness-Deepfake-Detection.
Article 940
Title@2025-07-18 (5): The Elicitation Game: Evaluating Capability Elicitation Techniques
Title: The Elicitation Game: Evaluating Capability Elicitation Techniques | Das Elizitation Spiel: Evaluieren der Fähigkeit Elizitationstechniken | 引用游戏:评估能力应用技术 2502.02180v3 |
Authors (6): Felix Hofstätter, Teun van der Weij, Jayden Teoh, Rada Djoneva, Henning Bartsch, Francis Rhys Ward
Capability evaluations are required to understand and regulate AI systems that may be deployed or further developed. Therefore, it is important that evaluations provide an accurate estimation of an AI system’s capabilities. However, in numerous cases, previously latent capabilities have been elicited from models, sometimes long after initial release. Accordingly, substantial efforts have been made to develop methods for eliciting latent capabilities from models. In this paper, we evaluate the effectiveness of capability elicitation techniques by intentionally training model organisms – language models with hidden capabilities that are revealed by a password. We introduce a novel method for training model organisms, based on circuit-breaking, which is more robust to elicitation techniques than standard password-locked models. We focus on elicitation techniques based on prompting and activation steering, and compare these to fine-tuning methods. Prompting techniques can elicit the actual capability of both password-locked and circuit-broken model organisms in the MCQA setting, while steering fails to do so. For a code-generation task, only fine-tuning can elicit the hidden capabilities of our novel model organism. Additionally, our results suggest that combining techniques improves elicitation. Still, if possible, fine-tuning should be the method of choice to improve the trustworthiness of capability evaluations.
Article 941
Title@2025-07-18 (5): FedStrategist: A Meta-Learning Framework for Adaptive and Robust Aggregation in Federated Learning
Title: FedStrategist: A Meta-Learning Framework for Adaptive and Robust Aggregation in Federated Learning | FedStrategist: Ein Meta-Learning-Framework für adaptive und robuste Aggregation im Federated Learning | 联邦战略:联邦学习中适应性和强力聚合的元学习框架 2507.14322v1 |
Authors (3): Md Rafid Haque, Abu Raihan Mostofa Kamal, Md. Azam Hossain
Federated Learning (FL) offers a paradigm for privacy-preserving collaborative AI, but its decentralized nature creates significant vulnerabilities to model poisoning attacks. While numerous static defenses exist, their effectiveness is highly context-dependent, often failing against adaptive adversaries or in heterogeneous data environments. This paper introduces FedStrategist, a novel meta-learning framework that reframes robust aggregation as a real-time, cost-aware control problem. We design a lightweight contextual bandit agent that dynamically selects the optimal aggregation rule from an arsenal of defenses based on real-time diagnostic metrics. Through comprehensive experiments, we demonstrate that no single static rule is universally optimal. We show that our adaptive agent successfully learns superior policies across diverse scenarios, including a “Krum-favorable” environment and against a sophisticated “stealth” adversary designed to neutralize specific diagnostic signals. Critically, we analyze the paradoxical scenario where a non-robust baseline achieves high but compromised accuracy, and demonstrate that our agent learns a conservative policy to prioritize model integrity. Furthermore, we prove the agent’s policy is controllable via a single “risk tolerance” parameter, allowing practitioners to explicitly manage the trade-off between performance and security. Our work provides a new, practical, and analyzable approach to creating resilient and intelligent decentralized AI systems.
Article 942
Title@2025-07-18 (5): Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning
Title: Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning | Symbolische Mixture-of-Experts: Adaptives Skill-basiertes Routing für heterogene Vernunft | 专家的混合符号:基于适应性技能的异异源理据调离 2503.05641v3 |
Authors (5): Justin Chih-Yao Chen, Sukwon Yun, Elias Stengel-Eskin, Tianlong Chen, Mohit Bansal
Combining existing pre-trained expert LLMs is a promising avenue for scalably tackling large-scale and diverse tasks. However, selecting task-level experts is often too coarse-grained, as heterogeneous tasks may require different expertise per instance. To enable adaptive instance-level mixing of pre-trained LLM experts, we propose Symbolic-MoE, a symbolic, text-based, and gradient-free Mixture-of-Experts framework. Symbolic-MoE takes a fine-grained approach to selection by emphasizing skills, e.g., algebra in math or molecular biology in biomedical reasoning. We propose a skill-based recruiting strategy that dynamically selects the most relevant set of expert LLMs for diverse reasoning tasks based on their strengths. Each selected expert then generates its own reasoning, resulting in k outputs from k experts, which are then synthesized into a final high-quality response by an aggregator chosen based on its ability to integrate diverse reasoning outputs. We show that Symbolic-MoE’s instance-level expert selection improves performance by a large margin but – when implemented naively – can introduce a high computational overhead due to the need for constant model loading and offloading. To address this, we implement a batch strategy that groups instances based on their assigned experts, loading each model only once. This allows us to integrate 16 expert models on 1 GPU with a time cost comparable to or better than prior multi-agent baselines using 4 GPUs. Through extensive evaluations on diverse benchmarks (MMLU-Pro, GPQA, AIME, and MedMCQA), we show that Symbolic-MoE beats strong LLMs like GPT4o-mini, as well as multi-agent approaches, with an absolute average gain of 8.15% over the best multi-agent baseline. Moreover, Symbolic-MoE generalizes well to unseen tasks and removes the need for expensive multi-round discussions, outperforming discussion baselines with less computation.
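The batching strategy mentioned in this abstract (group instances by assigned expert so that each model is loaded once) can be sketched as a simple inversion of the instance-to-expert map. The expert names and instance ids below are hypothetical, and model loading itself is only counted, not performed.

```python
from collections import defaultdict

def batch_by_expert(assignments):
    """Invert {instance: [experts]} into {expert: [instances]} so each expert loads once."""
    batches = defaultdict(list)
    for instance_id, experts in assignments.items():
        for expert in experts:
            batches[expert].append(instance_id)
    return dict(batches)

def count_loads_naive(assignments):
    """Per-instance processing loads a model for every (instance, expert) pair."""
    return sum(len(experts) for experts in assignments.values())

# Hypothetical routing result: each question is assigned two experts.
assignments = {
    "q1": ["math", "code"],
    "q2": ["math", "bio"],
    "q3": ["code", "math"],
}
batches = batch_by_expert(assignments)
naive_loads = count_loads_naive(assignments)  # one load per (instance, expert) pair
batched_loads = len(batches)                  # one load per distinct expert
```

In this toy routing the naive schedule performs six model loads and the batched schedule three, and the gap widens as more instances share the same experts.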
Article 943
Title@2025-07-18 (5): Aligning Large Language Models to Low-Resource Languages through LLM-Based Selective Translation: A Systematic Study
Title: Aligning Large Language Models to Low-Resource Languages through LLM-Based Selective Translation: A Systematic Study | Ausrichtung großer Sprachmodelle auf ressourcenarme Sprachen durch LLM-basierte Selektive Übersetzung: Eine systematische Studie | 通过基于LLM的选择性翻译,使大语言模式与低资源语言相一致:系统研究 2507.14304v1 |
Authors (7): Rakesh Paul, Anusha Kamath, Kanishk Singla, Raviraj Joshi, Utkarsh Vaidya, Sanjay Singh Chauhan, Niranjan Wartikar
Multilingual large language models (LLMs) often demonstrate a performance gap between English and non-English languages, particularly in low-resource settings. Aligning these models to low-resource languages is essential yet challenging due to limited high-quality data. While English alignment datasets are readily available, curating equivalent data in other languages is expensive and time-consuming. A common workaround is to translate existing English alignment data; however, standard translation techniques often fail to preserve critical elements such as code, mathematical expressions, and structured formats like JSON. In this work, we investigate LLM-based selective translation, a technique that selectively translates only the translatable parts of a text while preserving non-translatable content and sentence structure. We conduct a systematic study to explore key questions around this approach, including its effectiveness compared to vanilla translation, the importance of filtering noisy outputs, and the benefits of mixing translated samples with original English data during alignment. Our experiments focus on the low-resource Indic language Hindi and compare translations generated by Google Cloud Translation (GCP) and Llama-3.1-405B. The results highlight the promise of selective translation as a practical and effective method for improving multilingual alignment in LLMs.
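The goal of selective translation, translating prose while leaving code and math untouched, can be illustrated with a placeholder-based sketch. Note that this swaps in a regex masking technique for illustration; the paper performs the selection with an LLM. The `fake_translate` stand-in, the `[[n]]` placeholder format, and the protected-span pattern are all assumptions.

```python
import re

# Spans to keep verbatim: inline code in backticks and inline math in dollar signs.
PROTECTED = re.compile(r"`[^`]*`|\$[^$]*\$")

def selective_translate(text, translate):
    """Translate prose while keeping protected spans byte-for-byte intact."""
    kept = []

    def stash(match):
        kept.append(match.group(0))
        return f"[[{len(kept) - 1}]]"  # numbered placeholder

    masked = PROTECTED.sub(stash, text)
    translated = translate(masked)
    # Put the original spans back in place of their placeholders.
    return re.sub(r"\[\[(\d+)\]\]", lambda m: kept[int(m.group(1))], translated)

fake_translate = str.upper  # stand-in for a real translation call
result = selective_translate("Call `print(x)` to show $x^2$.", fake_translate)
```

This simplification assumes the translator leaves the placeholders unchanged and that the input does not already contain `[[n]]` patterns.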
Article 944
Title@2025-07-18 (5): A universal augmentation framework for long-range electrostatics in machine learning interatomic potentials
Title: A universal augmentation framework for long-range electrostatics in machine learning interatomic potentials | Ein universeller Augmentations-Rahmen für Langstrecken-Elektrostatik in interatomaren Potenzialen des maschinellen Lernens | 用于机器学习跨原子潜能的远程电磁学的通用扩增框架 2507.14302v1 |
Authors (6): Dongjin Kim, Xiaoyu Wang, Peichen Zhong, Daniel S. King, Theo Jaffrelot Inizan, Bingqing Cheng
Most current machine learning interatomic potentials (MLIPs) rely on short-range approximations, without explicit treatment of long-range electrostatics. To address this, we recently developed the Latent Ewald Summation (LES) method, which infers electrostatic interactions, polarization, and Born effective charges (BECs), just by learning from energy and force training data. Here, we present LES as a standalone library, compatible with any short-range MLIP, and demonstrate its integration with methods such as MACE, NequIP, CACE, and CHGNet. We benchmark LES-enhanced models on distinct systems, including bulk water, polar dipeptides, and gold dimer adsorption on defective substrates, and show that LES not only captures correct electrostatics but also improves accuracy. Additionally, we scale LES to large and chemically diverse data by training MACELES-OFF on the SPICE set containing molecules and clusters, making a universal MLIP with electrostatics for organic systems including biomolecules. MACELES-OFF is more accurate than its short-range counterpart (MACE-OFF) trained on the same dataset, predicts dipoles and BECs reliably, and has better descriptions of bulk liquids. By enabling efficient long-range electrostatics without directly training on electrical properties, LES paves the way for electrostatic foundation MLIPs.
Article 945
Title@2025-07-18 (5): Age of Information Minimization in UAV-Enabled Integrated Sensing and Communication Systems
Title: Age of Information Minimization in UAV-Enabled Integrated Sensing and Communication Systems | Alter der Informationsminimierung in UAV-fähigen integrierten Sensing- und Kommunikationssystemen | 无人驾驶航空器 – – 使用无人驾驶航空器的 综合遥感和通信系统信息最小化的时代 2507.14299v1 |
Authors (7): Yu Bai, Yifan Zhang, Boxuan Xie, Zheng Chang, Yanru Zhang, Riku Jantti, Zhu Han
Unmanned aerial vehicles (UAVs) equipped with integrated sensing and communication (ISAC) capabilities are envisioned to play a pivotal role in future wireless networks due to their enhanced flexibility and efficiency. However, jointly optimizing UAV trajectory planning, multi-user communication, and target sensing under stringent resource constraints and time-critical conditions remains a significant challenge. To address this, we propose an Age of Information (AoI)-centric UAV-ISAC system that simultaneously performs target sensing and serves multiple ground users, emphasizing information freshness as the core performance metric. We formulate a long-term average AoI minimization problem that jointly optimizes the UAV’s flight trajectory and beamforming. To tackle the high dimensionality and non-convexity of this problem, we develop a deep reinforcement learning (DRL)-based algorithm capable of providing real-time decisions on UAV movement and beamforming for both radar sensing and multi-user communication. Specifically, a Kalman filter is employed for accurate target state prediction, regularized zero-forcing is utilized to mitigate inter-user interference, and the Soft Actor-Critic algorithm is applied for training the DRL agent on continuous actions. The proposed framework adaptively balances the trade-offs between sensing accuracy and communication quality. Extensive simulation results demonstrate that our proposed method consistently achieves lower average AoI compared to baseline approaches.
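The long-term average AoI objective in this abstract rests on a simple per-slot update: the age resets when a fresh update is delivered and grows by one otherwise. The sketch below uses a common discrete-time convention (reset to 1) as an assumption; the paper's exact model may differ, and the delivery pattern is made up.

```python
def average_aoi(delivered, initial_age=1):
    """Time-averaged Age of Information for one user.

    delivered[t] is True if a fresh update arrives in slot t.
    Convention assumed here: age resets to 1 on delivery, else increases by 1.
    """
    age, total = initial_age, 0
    for ok in delivered:
        age = 1 if ok else age + 1
        total += age
    return total / len(delivered)

# Hypothetical delivery pattern over four slots: ages are 1, 2, 3, 1.
avg = average_aoi([True, False, False, True])
```

A trajectory and beamforming policy that delivers updates more regularly lowers this average, which is the quantity a DRL agent would be rewarded for minimizing.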
Article 946
Title@2025-07-18 (5): A Simple “Try Again” Can Elicit Multi-Turn LLM Reasoning
Title: A Simple “Try Again” Can Elicit Multi-Turn LLM Reasoning | Ein einfaches “Testen Sie wieder” kann die Multi-Turn LLM Reasoning beseitigen | 简单“ 再试一次 ” , 能够将多发 LLM 解析 2507.14295v1 |
Authors (8): Licheng Liu, Zihan Wang, Linjie Li, Chenwei Xu, Yiping Lu, Han Liu, Avirup Sil, Manling Li
Multi-turn problem solving is critical yet challenging for Large Reasoning Models (LRMs), which must reflect on their reasoning and revise it based on feedback. Existing Reinforcement Learning (RL) methods train large reasoning models on a single-turn paradigm with verifiable rewards. However, we observe that models trained with existing RL paradigms often lose their ability to solve problems across multiple turns and struggle to revise answers based on contextual feedback, leading to repetitive responses. We ask: can LRMs learn to reflect on their answers in a multi-turn context? In this work, we find that training models with multi-turn RL using only unary feedback (e.g., “Let’s try again”) after wrong answers can improve both single-turn performance and multi-turn reasoning. We introduce Unary Feedback as Observation (UFO) for reinforcement learning, which uses minimal yet common unary user feedback during iterative problem solving. It can be easily applied to existing single-turn RL training setups. Experimental results show that RL training with UFO preserves single-turn performance and improves multi-turn reasoning accuracy by up to 14%, enabling language models to better react to feedback in multi-turn problem solving. To further minimize the number of turns needed for a correct answer while encouraging diverse reasoning when mistakes occur, we design reward structures that guide models to produce careful and deliberate answers in each turn. Code: https://github.com/lichengliu03/unary-feedback
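The interaction pattern described above can be sketched as a simple rollout loop in which the only feedback after a wrong answer is a fixed unary message. The names `rollout`, `policy`, and `is_correct` are hypothetical, and the loop is an illustration of the setting, not the authors' training code (see their repository for that).

```python
def rollout(policy, question, is_correct, max_turns=5,
            feedback="Let's try again."):
    """Run one multi-turn episode with unary feedback.

    `policy` maps the conversation history (a list of strings) to an answer.
    Returns the history and the turn at which the answer was correct,
    or None if no correct answer was produced within `max_turns`.
    """
    history = [question]
    for turn in range(1, max_turns + 1):
        answer = policy(history)
        history.append(answer)
        if is_correct(answer):
            return history, turn
        # The only signal the model sees after a mistake is the unary message.
        history.append(feedback)
    return history, None
```

A reward that decreases with the returned turn index would be one way to favor fewer turns; the abstract does not specify the exact reward, so that choice is left open here.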
Article 947
Title@2025-07-18 (5): Toward Temporal Causal Representation Learning with Tensor Decomposition
Title: Toward Temporal Causal Representation Learning with Tensor Decomposition | Auf dem Weg zur zeitlichen kausalen Repräsentation Lernen mit Tensor-Zersetzung | 走向时间性因果代表制学习,使Tensor分解 2507.14126v1 |
Authors (4): Jianhong Chen, Meng Zhao, Mostafa Reisi Gahrooei, Xubo Yue
Temporal causal representation learning is a powerful tool for uncovering complex patterns in observational studies, which are often represented as low-dimensional time series. However, in many real-world applications, data are high-dimensional with varying input lengths and naturally take the form of irregular tensors. To analyze such data, irregular tensor decomposition is critical for extracting meaningful clusters that capture essential information. In this paper, we focus on modeling causal representation learning based on the transformed information. First, we present a novel causal formulation for a set of latent clusters. We then propose CaRTeD, a joint learning framework that integrates temporal causal representation learning with irregular tensor decomposition. Notably, our framework provides a blueprint for downstream tasks using the learned tensor factors, such as modeling latent structures and extracting causal information, and offers a more flexible regularization design to enhance tensor decomposition. Theoretically, we show that our algorithm converges to a stationary point. More importantly, our results fill the gap in theoretical guarantees for the convergence of state-of-the-art irregular tensor decomposition. Experimental results on synthetic and real-world electronic health record (EHR) datasets (MIMIC-III), with extensive benchmarks from both phenotyping and network recovery perspectives, demonstrate that our proposed method outperforms state-of-the-art techniques and enhances the explainability of causal representations.
Article 948
Title@2025-07-18 (5): A General Framework for Inference-time Scaling and Steering of Diffusion Models
Title: A General Framework for Inference-time Scaling and Steering of Diffusion Models | Ein allgemeiner Rahmen für Schlussfolgerungs-Zeit-Skalierung und Steuerung von Diffusionsmodellen | 传播模型的推推时间缩放和引导总框架 2501.06848v5 |
Authors (7): Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, Rajesh Ranganath
Diffusion models produce impressive results in modalities ranging from images and video to protein design and text. However, generating samples with user-specified properties remains a challenge. Recent research proposes fine-tuning models to maximize rewards that capture desired properties, but these methods require expensive training and are prone to mode collapse. In this work, we present Feynman-Kac (FK) steering, an inference-time framework for steering diffusion models with reward functions. FK steering works by sampling a system of multiple interacting diffusion processes, called particles, and resampling particles at intermediate steps based on scores computed using functions called potentials. Potentials are defined using rewards for intermediate states and are selected such that a high value indicates that the particle will yield a high-reward sample. We explore various choices of potentials, intermediate rewards, and samplers. We evaluate FK steering on text-to-image and text diffusion models. For steering text-to-image models with a human preference reward, we find that FK steering a 0.8B parameter model outperforms a 2.6B parameter fine-tuned model on prompt fidelity, with faster sampling and no training. For steering text diffusion models with rewards for text quality and specific text attributes, we find that FK steering generates lower perplexity, more linguistically acceptable outputs and enables gradient-free control of attributes like toxicity. Our results demonstrate that inference-time scaling and steering of diffusion models - even with off-the-shelf rewards - can provide significant sample quality gains and controllability benefits. Code is available at https://github.com/zacharyhorvitz/Fk-Diffusion-Steering .
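The resampling step at the heart of this description can be sketched as follows. This is a minimal illustration that assumes an exponential potential `exp(lam * reward)` and plain multinomial resampling; the abstract says the authors explore several potentials and samplers, so both choices here, along with the names `resample` and `lam`, are assumptions.

```python
import math
import random


def resample(particles, rewards, lam=1.0, rng=None):
    """Resample particles in proportion to exp(lam * reward).

    Assumption: an exponential potential and multinomial resampling.
    Rewards are shifted by their maximum for numerical stability.
    """
    if rng is None:
        rng = random.Random()
    top = max(rewards)
    weights = [math.exp(lam * (r - top)) for r in rewards]
    return rng.choices(particles, weights=weights, k=len(particles))


# A particle with a much higher intermediate reward takes over the population.
survivors = resample(["a", "b", "c"], [0.0, 0.0, 1000.0], rng=random.Random(0))
```

In a full sampler this step would be interleaved with denoising steps, so that compute is concentrated on particles whose intermediate states look promising.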
Article 949
Title@2025-07-18 (5): Kolmogorov Arnold Networks (KANs) for Imbalanced Data – An Empirical Perspective
Title: Kolmogorov Arnold Networks (KANs) for Imbalanced Data – An Empirical Perspective | Kolmogorov Arnold Networks (KANs) für unausgewogene Daten – Eine empirische Perspektive | Kolmogorov Arnold 数据不平衡网络 – – 经验视角 2507.14121v1 |
Authors (2): Pankaj Yadav, Vivek Vijay
Kolmogorov Arnold Networks (KANs) are a recent architectural advancement in neural computation that offers a mathematically grounded alternative to standard neural networks. This study presents an empirical evaluation of KANs in the context of class-imbalanced classification, using ten benchmark datasets. We observe that KANs inherently perform better on raw imbalanced data than Multi-Layer Perceptrons (MLPs) without any resampling strategy. However, conventional imbalance strategies fundamentally conflict with KANs' mathematical structure: resampling and focal loss implementations significantly degrade KANs' performance while marginally benefiting MLPs. Crucially, KANs suffer from prohibitive computational costs without proportional performance gains. Statistical validation confirms that MLPs with imbalance techniques achieve equivalence with KANs (|d| < 0.08 across metrics) at minimal resource costs. These findings reveal that KANs represent a specialized solution for raw imbalanced data where resources permit, but their severe performance-resource tradeoffs and incompatibility with standard resampling techniques currently limit practical deployment. We identify critical research priorities: developing KAN-specific architectural modifications for imbalanced learning, optimizing computational efficiency, and theoretically reconciling their conflict with data augmentation. This work establishes foundational insights for next-generation KAN architectures in imbalanced classification scenarios.
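For readers unfamiliar with focal loss, one of the imbalance strategies mentioned above, here is a minimal sketch of the standard binary form without class weighting. The default `gamma=2.0` is a common choice and an assumption here; the abstract does not state the settings used in the study.

```python
import math


def focal_loss(p, y, gamma=2.0):
    """Binary focal loss for predicted probability `p` of class 1, label y in {0, 1}.

    With gamma = 0 this reduces to ordinary cross-entropy; larger gamma
    down-weights examples the model already classifies confidently.
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = focal_loss(0.9, 1)   # confident and correct: heavily down-weighted
hard = focal_loss(0.1, 1)   # confident and wrong: dominates the loss
```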
Article 950
Title@2025-07-18 (5): Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning
Title: Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning | Harmonie in Divergenz: Auf dem Weg zu einer schnellen, präzisen und speichereffizienten Null-Order-LLM Feinabstimmung | 和谐共存:快速、准确和记忆效率高的零级LLM微调 2502.03304v2 |
Authors (9): Qitao Tan, Jun Liu, Zheng Zhan, Caiwei Ding, Yanzhi Wang, Xiaolong Ma, Jaewoo Lee, Jin Lu, Geng Yuan
Large language models (LLMs) excel across various tasks, but standard first-order (FO) fine-tuning demands considerable memory, significantly limiting real-world deployment. Recently, zeroth-order (ZO) optimization has stood out as a promising memory-efficient training paradigm, avoiding backward passes and relying solely on forward passes for gradient estimation, making it attractive for resource-constrained scenarios. However, ZO methods lag far behind FO methods in both convergence speed and accuracy. To bridge the gap, we introduce a novel layer-wise divergence analysis that uncovers the distinct update patterns of FO and ZO optimization. Building on these findings and aiming to match the learning capacity of FO methods, we propose Divergence-driven Zeroth-Order (DiZO) optimization. DiZO conducts divergence-driven layer adaptation by incorporating projections into ZO updates, generating diverse-magnitude updates precisely scaled to layer-wise individual optimization needs. Our results demonstrate that DiZO significantly reduces the number of iterations needed for convergence without sacrificing throughput, cutting training GPU hours by up to 48% on various datasets. Moreover, DiZO consistently outperforms the representative ZO baselines in fine-tuning RoBERTa-large, OPT-series, and Llama-series on downstream tasks and, in some cases, even surpasses memory-intensive FO fine-tuning. Our code is released at https://anonymous.4open.science/r/DiZO-E86D.
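The forward-pass-only gradient estimate that ZO methods rely on can be sketched with the generic two-point (SPSA-style) estimator below. This is the baseline idea only, not DiZO's layer-wise projection; the function name `zo_gradient` and the step size `eps` are illustrative assumptions.

```python
import random


def zo_gradient(loss, theta, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate using only loss evaluations.

    Perturbs all parameters along one random Gaussian direction z and
    scales z by the finite-difference slope along that direction.
    """
    if rng is None:
        rng = random.Random()
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = loss([t + eps * zi for t, zi in zip(theta, z)])
    minus = loss([t - eps * zi for t, zi in zip(theta, z)])
    slope = (plus - minus) / (2.0 * eps)
    return [slope * zi for zi in z]


# Toy quadratic loss; no backward pass is ever computed.
quadratic = lambda th: sum(t * t for t in th)
estimate = zo_gradient(quadratic, [1.0, -2.0], rng=random.Random(0))
```

Because only two forward evaluations are needed per step, no activations have to be stored for backpropagation, which is where the memory saving comes from.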
Article 951
Title@2025-07-18 (5): NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining
Title: NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining | NoHumansRequired: Autonome High-Quality Bildbearbeitung Triplet Mining | 无人要求:自主高品质图像编辑三线采矿 2507.14119v1 |
Authors (7): Maksim Kuprashevich, Grigorii Alekseenko, Irina Tolstykh, Georgii Fedorov, Bulat Suleimanov, Vladimir Dokholyan, Aleksandr Gordeev
Recent advances in generative modeling enable image editing assistants that follow natural language instructions without additional user input. Their supervised training requires millions of triplets: original image, instruction, edited image. Yet mining pixel-accurate examples is hard. Each edit must affect only prompt-specified regions, preserve stylistic coherence, respect physical plausibility, and retain visual appeal. The lack of robust automated edit-quality metrics hinders reliable automation at scale. We present an automated, modular pipeline that mines high-fidelity triplets across domains, resolutions, instruction complexities, and styles. Built on public generative models and running without human intervention, our system uses a task-tuned Gemini validator to score instruction adherence and aesthetics directly, removing any need for segmentation or grounding models. Inversion and compositional bootstrapping enlarge the mined set by approximately 2.2x, enabling large-scale high-fidelity training data. By automating the most repetitive annotation steps, the approach allows a new scale of training without human labeling effort. To democratize research in this resource-intensive area, we release NHR-Edit: an open dataset of 358k high-quality triplets. In the largest cross-dataset evaluation, it surpasses all public alternatives. We also release Bagel-NHR-Edit, an open-source fine-tuned Bagel model, which achieves state-of-the-art metrics in our experiments.
Article 952
Title@2025-07-18 (5): Quantum Boltzmann Machines using Parallel Annealing for Medical Image Classification
Title: Quantum Boltzmann Machines using Parallel Annealing for Medical Image Classification | Quantum Boltzmann Maschinen mit paralleler Abschirmung für medizinische Bildklassifikation | 使用平行安内处理医疗图像分类的 量子波尔兹曼机器 2507.14116v1 |
Authors (8): Daniëlle Schuman, Mark V. Seebode, Tobias Rohe, Maximilian Balthasar Mansky, Michael Schroedl-Baumann, Jonas Stein, Claudia Linnhoff-Popien, Florian Krellner
Exploiting the fact that samples drawn from a quantum annealer inherently follow a Boltzmann-like distribution, annealing-based Quantum Boltzmann Machines (QBMs) have gained increasing popularity in the quantum research community. While they hold great promise for quantum speed-up, their usage currently remains a costly endeavor, as large amounts of QPU time are required to train them. This limits their applicability in the NISQ era. Following the idea of Noè et al. (2024), who tried to alleviate this cost by incorporating parallel quantum annealing into their unsupervised training of QBMs, this paper presents an improved version of parallel quantum annealing that we employ to train QBMs in a supervised setting. Saving qubits to encode the inputs, the latter setting allows us to test our approach on medical images from the MedMNIST data set (Yang et al., 2023), thereby moving closer to real-world applicability of the technology. Our experiments show that QBMs using our approach already achieve reasonable results, comparable to those of similarly-sized Convolutional Neural Networks (CNNs), with markedly smaller numbers of epochs than these classical models. Our parallel annealing technique leads to a speed-up of almost 70% compared to regular annealing-based BM executions.
Article 953
Title@2025-07-18 (5): An Adversarial-Driven Experimental Study on Deep Learning for RF Fingerprinting
Title: An Adversarial-Driven Experimental Study on Deep Learning for RF Fingerprinting | Eine adversariell-getriebene Experimentalstudie zum Deep Learning für RF-Fingerprinting | 为RF指纹的深入学习进行反versarial-Driven实验研究 2507.14109v1 |
Authors (5): Xinyu Cao, Bimal Adhikari, Shangqing Zhao, Jingxian Wu, Yanjun Pan
Radio frequency (RF) fingerprinting, which extracts unique hardware imperfections of radio devices, has emerged as a promising physical-layer device identification mechanism in zero trust architectures and beyond 5G networks. In particular, deep learning (DL) methods have demonstrated state-of-the-art performance in this domain. However, existing approaches have primarily focused on enhancing system robustness against temporal and spatial variations in wireless environments, while the security vulnerabilities of these DL-based approaches have often been overlooked. In this work, we systematically investigate the security risks of DL-based RF fingerprinting systems through an adversarial-driven experimental analysis. We observe a consistent misclassification behavior for DL models under domain shifts, where a device is frequently misclassified as another specific one. Our analysis based on extensive real-world experiments demonstrates that this behavior can be exploited as an effective backdoor to enable external attackers to intrude into the system. Furthermore, we show that training DL models on raw received signals causes the models to entangle RF fingerprints with environmental and signal-pattern features, creating additional attack vectors that cannot be mitigated solely through post-processing security methods such as confidence thresholds.
Article 954
Title@2025-07-18 (5): UGPL: Uncertainty-Guided Progressive Learning for Evidence-Based Classification in Computed Tomography
Title: UGPL: Uncertainty-Guided Progressive Learning for Evidence-Based Classification in Computed Tomography | UGPL: Ungewissheitsorientiertes Progressives Lernen für evidenzbasierte Klassifizierung in der berechneten Tomographie | UGPL: 计算地形学循证分类的不确定性-指导渐进学习 2507.14102v1 |
Authors (4): Shravan Venkatraman, Pavan Kumar S, Rakesh Raj Madavan, Chandrakala S
Accurate classification of computed tomography (CT) images is essential for diagnosis and treatment planning, but existing methods often struggle with the subtle and spatially diverse nature of pathological features. Current approaches typically process images uniformly, limiting their ability to detect localized abnormalities that require focused analysis. We introduce UGPL, an uncertainty-guided progressive learning framework that performs a global-to-local analysis by first identifying regions of diagnostic ambiguity and then conducting detailed examination of these critical areas. Our approach employs evidential deep learning to quantify predictive uncertainty, guiding the extraction of informative patches through a non-maximum suppression mechanism that maintains spatial diversity. This progressive refinement strategy, combined with an adaptive fusion mechanism, enables UGPL to integrate both contextual information and fine-grained details. Experiments across three CT datasets demonstrate that UGPL consistently outperforms state-of-the-art methods, achieving improvements of 3.29%, 2.46%, and 8.08% in accuracy for kidney abnormality, lung cancer, and COVID-19 detection, respectively. Our analysis shows that the uncertainty-guided component provides substantial benefits, with performance dramatically increasing when the full progressive learning pipeline is implemented. Our code is available at: https://github.com/shravan-18/UGPL
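The idea of picking informative patches from an uncertainty map while keeping them spatially spread out can be sketched with a greedy non-maximum suppression over grid cells. The function name `select_patches`, the Chebyshev distance rule, and the `min_dist` parameter are illustrative assumptions; the released UGPL code may implement this differently.

```python
def select_patches(uncertainty, k, min_dist):
    """Greedily pick up to k high-uncertainty cells that are at least
    `min_dist` apart (Chebyshev distance), suppressing nearby candidates.

    `uncertainty` is a 2D list of per-location uncertainty scores.
    Returns a list of (row, col) patch centers in order of selection.
    """
    cells = sorted(
        ((u, r, c) for r, row in enumerate(uncertainty) for c, u in enumerate(row)),
        key=lambda cell: -cell[0],
    )
    chosen = []
    for _, r, c in cells:
        if all(max(abs(r - r0), abs(c - c0)) >= min_dist for r0, c0 in chosen):
            chosen.append((r, c))
            if len(chosen) == k:
                break
    return chosen


grid = [
    [0.9, 0.8, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
centers = select_patches(grid, k=2, min_dist=2)
```

In this toy grid the second-highest cell sits next to the highest one and is suppressed, so the second patch comes from the far corner instead.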
Article 955
Title@2025-07-18 (5): On Logical Extrapolation for Mazes with Recurrent and Implicit Networks
Title: On Logical Extrapolation for Mazes with Recurrent and Implicit Networks | Über Logische Extrapolation für Labyrinthe mit recurrenten und impliziten Netzwerken | 经常和隐含网络的磁带逻辑外推法 2410.03020v2 |
Authors (7): Brandon Knutson, Amandin Chyba Rabeendran, Michael Ivanitskiy, Jordan Pettyjohn, Cecilia Diniz-Behn, Samy Wu Fung, Daniel McKenzie
Recent work suggests that certain neural network architectures – particularly recurrent neural networks (RNNs) and implicit neural networks (INNs) – are capable of logical extrapolation. When trained on easy instances of a task, these networks (henceforth: logical extrapolators) can generalize to more difficult instances. Previous research has hypothesized that logical extrapolators do so by learning a scalable, iterative algorithm for the given task which converges to the solution. We examine this idea more closely in the context of a single task: maze solving. By varying test data along multiple axes – not just maze size – we show that models introduced in prior work fail in a variety of ways, some expected and others less so. It remains uncertain whether any of these models has truly learned an algorithm. However, we provide evidence that a certain RNN has approximately learned a form of ‘deadend-filling’. We show that training these models on more diverse data addresses some failure modes but, paradoxically, does not improve logical extrapolation. We also analyze convergence behavior, and show that models explicitly trained to converge to a fixed point are likely to do so when extrapolating, while models that are not may exhibit more exotic limiting behavior such as limit cycles, even when they correctly solve the problem. Our results (i) show that logical extrapolation is not immune to the problem of goal misgeneralization, and (ii) suggest that analyzing the dynamics of extrapolation may yield insights into designing better logical extrapolators.
Article 956
Title@2025-07-18 (5): Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment
Title: Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment | Multi-Centre-Validierung eines Deep-Learning-Modells für Skoliose Assessment | 多中心校验脊柱病评估深学习模型 2507.14093v1 |
Authors (6): Šimon Kubov, Simon Klíčník, Jakub Dandár, Zdeněk Straka, Karolína Kvaková, Daniel Kvak
Scoliosis affects roughly 2 to 4 percent of adolescents, and treatment decisions depend on precise Cobb angle measurement. Manual assessment is time-consuming and subject to inter-observer variation. We conducted a retrospective, multi-centre evaluation of a fully automated deep learning software (Carebot AI Bones, Spine Measurement functionality; Carebot s.r.o.) on 103 standing anteroposterior whole-spine radiographs collected from ten hospitals. Two musculoskeletal radiologists independently measured each study and served as reference readers. Agreement between the AI and each radiologist was assessed with Bland-Altman analysis, mean absolute error (MAE), root mean squared error (RMSE), Pearson correlation coefficient, and Cohen's kappa for four-grade severity classification. Against Radiologist 1 the AI achieved an MAE of 3.89 degrees (RMSE 4.77 degrees) with a bias of 0.70 degrees and limits of agreement from -8.59 to +9.99 degrees. Against Radiologist 2 the AI achieved an MAE of 3.90 degrees (RMSE 5.68 degrees) with a bias of 2.14 degrees and limits from -8.23 to +12.50 degrees. Pearson correlations were r = 0.906 and r = 0.880 (inter-reader r = 0.928), while Cohen's kappa for severity grading reached 0.51 and 0.64 (inter-reader kappa 0.59). These results demonstrate that the proposed software reproduces expert-level Cobb angle measurements and categorical grading across multiple centres, suggesting its utility for streamlining scoliosis reporting and triage in clinical workflows.
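The agreement metrics named above can be computed as in the sketch below. It assumes the common Bland-Altman convention of limits at bias ± 1.96 sample standard deviations of the paired differences; the function name `agreement_stats` and the example angles are made up for illustration and are not data from the study.

```python
import math
import statistics


def agreement_stats(pred, ref):
    """MAE, RMSE, Bland-Altman bias and 95% limits of agreement.

    Assumption: limits are bias +/- 1.96 * sample standard deviation
    of the paired differences (pred - ref).
    """
    diffs = [p - r for p, r in zip(pred, ref)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return {
        "mae": sum(abs(d) for d in diffs) / len(diffs),
        "rmse": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "bias": bias,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


# Hypothetical Cobb angles in degrees (illustrative values only).
stats = agreement_stats([10.0, 20.0, 30.0], [12.0, 19.0, 27.0])
```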
Article 957
Title@2025-07-18 (5): Learning to Reason at the Frontier of Learnability
Title: Learning to Reason at the Frontier of Learnability | Vernunft lernen an der Grenze der Lernfähigkeit | 学习在可学习的前沿学习理性 2502.12272v5 |
Authors (5): Thomas Foster, Anya Sims, Johannes Forkel, Mattie Fellows, Jakob Foerster
Reinforcement learning is now widely adopted as the final stage of large language model training, especially for reasoning-style tasks such as maths problems. Typically, models attempt each question many times during a single training step and attempt to learn from their successes and failures. However, we demonstrate that throughout training with two popular algorithms (PPO and VinePPO) on two widely used datasets, many questions are either solved by all attempts - meaning they are already learned - or by none - providing no meaningful training signal. To address this, we adapt a method from the reinforcement learning literature - sampling for learnability - and apply it to the reinforcement learning stage of LLM training. Our curriculum prioritises questions with high variance of success, i.e. those where the agent sometimes succeeds, but not always. Our findings demonstrate that this curriculum consistently boosts training performance across multiple algorithms and datasets, paving the way for more efficient and effective reinforcement learning with LLMs.
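The prioritisation rule described above can be sketched with the Bernoulli variance p(1 - p) of a question's empirical success rate, which is zero when every attempt succeeds or every attempt fails. The names `learnability` and `top_learnable`, and the top-k selection rule, are illustrative assumptions rather than the authors' exact sampling scheme.

```python
def learnability(successes, attempts):
    """Variance of success, p * (1 - p), for an empirical success rate p."""
    p = successes / attempts
    return p * (1.0 - p)


def top_learnable(stats, k):
    """Return the k question ids with the highest success variance.

    `stats` maps question id -> (successes, attempts).
    """
    ranked = sorted(stats, key=lambda q: learnability(*stats[q]), reverse=True)
    return ranked[:k]


# Questions always or never solved score zero and are deprioritised.
counts = {"a": (0, 8), "b": (4, 8), "c": (8, 8), "d": (6, 8)}
batch = top_learnable(counts, k=2)
```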
Article 958
Title@2025-07-18 (5): Uncertainty-Aware Explanations Through Probabilistic Self-Explainable Neural Networks
Title: Uncertainty-Aware Explanations Through Probabilistic Self-Explainable Neural Networks | Ungewissheitsbewusste Erklärungen durch probabilistische selbsterklärbare neurale Netzwerke | 通过概率性自我探索的神经神经网络的不确定性—- 软件解释 2403.13740v3 |
Authors (4): Jon Vadillo, Roberto Santana, Jose A. Lozano, Marta Kwiatkowska
The lack of transparency of Deep Neural Networks continues to be a limitation that severely undermines their reliability and usage in high-stakes applications. Promising approaches to overcome such limitations are Prototype-Based Self-Explainable Neural Networks (PSENNs), whose predictions rely on the similarity between the input at hand and a set of prototypical representations of the output classes, offering therefore a deep, yet transparent-by-design, architecture. In this paper, we introduce a probabilistic reformulation of PSENNs, called Prob-PSENN, which replaces point estimates for the prototypes with probability distributions over their values. This provides not only a more flexible framework for an end-to-end learning of prototypes, but can also capture the explanatory uncertainty of the model, which is a missing feature in previous approaches. In addition, since the prototypes determine both the explanation and the prediction, Prob-PSENNs allow us to detect when the model is making uninformed or uncertain predictions, and to obtain valid explanations for them. Our experiments demonstrate that Prob-PSENNs provide more meaningful and robust explanations than their non-probabilistic counterparts, while remaining competitive in terms of predictive performance, thus enhancing the explainability and reliability of the models.
Article 959
Title@2025-07-18 (5): DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration
Title: DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration | DPMT: Dualer Prozess Multi-Skala Theorie des Geistes Rahmen für Echtzeit Mensch-AI-Kollaboration | DPMT: 人类-AI实时合作的多规模思维框架的多层次理论 2507.14088v1 |
Authors (9): Xiyun Li, Yining Ding, Yuhua Jiang, Yunlong Zhao, Runpeng Xie, Shuang Xu, Yuanhua Ni, Yiqin Yang, Bo Xu
Real-time human-artificial intelligence (AI) collaboration is crucial yet challenging, especially when AI agents must adapt to diverse and unseen human behaviors in dynamic scenarios. Existing large language model (LLM) agents often fail to accurately model the complex human mental characteristics such as domain intentions, especially in the absence of direct communication. To address this limitation, we propose a novel dual process multi-scale theory of mind (DPMT) framework, drawing inspiration from cognitive science dual process theory. Our DPMT framework incorporates a multi-scale theory of mind (ToM) module to facilitate robust human partner modeling through mental characteristic reasoning. Experimental results demonstrate that DPMT significantly enhances human-AI collaboration, and ablation studies further validate the contributions of our multi-scale ToM in the slow system.
Article 960
Title@2025-07-18 (5): DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits
Title: DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits | DENSE: Longitudinal Progress Note Generation mit zeitlicher Modellierung von heterogenen klinischen Anmerkungen über Krankenhausbesuche hinweg | DENS: 医院全程探视不同临床诊断说明的实时建模纵向进展说明的生成 2507.14079v1 |
Authors (2): Garapati Keerthana, Manik Gupta
Progress notes are among the most clinically meaningful artifacts in an Electronic Health Record (EHR), offering temporally grounded insights into a patient’s evolving condition, treatments, and care decisions. Despite their importance, they are severely underrepresented in large-scale EHR datasets. For instance, in the widely used Medical Information Mart for Intensive Care III (MIMIC-III) dataset, only about $8.56\%$ of hospital visits include progress notes, leaving gaps in longitudinal patient narratives. In contrast, the dataset contains a diverse array of other note types, each capturing different aspects of care. We present DENSE (Documenting Evolving Progress Notes from Scattered Evidence), a system designed to align with clinical documentation workflows by simulating how physicians reference past encounters while drafting progress notes. The system introduces a fine-grained note categorization and a temporal alignment mechanism that organizes heterogeneous notes across visits into structured, chronological inputs. At its core, DENSE leverages a clinically informed retrieval strategy to identify temporally and semantically relevant content from both current and prior visits. This retrieved evidence is used to prompt a large language model (LLM) to generate clinically coherent and temporally aware progress notes. We evaluate DENSE on a curated cohort of patients with multiple visits and complete progress note documentation. The generated notes demonstrate strong longitudinal fidelity, achieving a temporal alignment ratio of $1.089$, surpassing the continuity observed in original notes. By restoring narrative coherence across fragmented documentation, our system supports improved downstream tasks such as summarization, predictive modeling, and clinical decision support, offering a scalable solution for LLM-driven note synthesis in real-world healthcare settings.
Article 961
Title@2025-07-18 (5): Glucose-ML: A collection of longitudinal diabetes datasets for development of robust AI solutions
Title: Glucose-ML: A collection of longitudinal diabetes datasets for development of robust AI solutions | Glucose-ML: Sammlung von Längsschnittdatensätzen für die Entwicklung robuster KI-Lösungen | Glucose-ML:收集纵向糖尿病数据集,以制定稳健的AI解决方案 2507.14077v1 |
Authors (3): Temiloluwa Prioleau, Baiying Lu, Yanjun Cui
Artificial intelligence (AI) algorithms are a critical part of state-of-the-art digital health technology for diabetes management. Yet limited access to large, high-quality datasets creates barriers that impede the development of robust AI solutions. To accelerate development of transparent, reproducible, and robust AI solutions, we present Glucose-ML, a collection of 10 publicly available diabetes datasets, released within the last 7 years (i.e., 2018-2025). The Glucose-ML collection comprises over 300,000 days of continuous glucose monitor (CGM) data with a total of 38 million glucose samples collected from 2500+ people across 4 countries. Participants include persons living with type 1 diabetes, type 2 diabetes, prediabetes, and no diabetes. To support researchers and innovators in using this rich collection of diabetes datasets, we present a comparative analysis to guide algorithm developers in data selection. Additionally, we conduct a case study for the task of blood glucose prediction, one of the most common AI tasks within the field. Through this case study, we provide a benchmark for short-term blood glucose prediction across all 10 publicly available diabetes datasets within the Glucose-ML collection. We show that the same algorithm can have significantly different prediction results when developed/evaluated with different datasets. Findings from this study are then used to inform recommendations for developing robust AI solutions within the diabetes or broader health domain. We provide direct links to each longitudinal diabetes dataset in the Glucose-ML collection and openly provide our code.
Article 962
Title@2025-07-18 (5): Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances
Title: Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances | Statistische und rechnerische Garantien von Kern Max-Sliced Wasserstein Distanzen | 内核断层断层断层瓦色斯坦距离的统计和计算保障 2405.15441v4 |
Authors (3): Jie Wang, March Boedihardjo, Yao Xie
Optimal transport has been very successful for various machine learning tasks; however, it is known to suffer from the curse of dimensionality. Hence, dimensionality reduction is desirable when applied to high-dimensional data with low-dimensional structures. The kernel max-sliced (KMS) Wasserstein distance is developed for this purpose by finding an optimal nonlinear mapping that reduces data into $1$ dimension before computing the Wasserstein distance. However, its theoretical properties have not yet been fully developed. In this paper, we provide sharp finite-sample guarantees under milder technical assumptions compared with state-of-the-art for the KMS $p$-Wasserstein distance between two empirical distributions with $n$ samples for general $p\in[1,\infty)$. Algorithm-wise, we show that computing the KMS $2$-Wasserstein distance is NP-hard, and then we further propose a semidefinite relaxation (SDR) formulation (which can be solved efficiently in polynomial time) and provide a relaxation gap for the obtained solution. We provide numerical examples to demonstrate the good performance of our scheme for high-dimensional two-sample testing.
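The "project to one dimension, then compute Wasserstein" idea can be sketched as below. This is a simplification: it uses linear projections and a maximum over a finite, user-supplied list of candidate directions, whereas the KMS distance optimizes over nonlinear (kernel) mappings. The function names and the candidate-direction search are assumptions for illustration.

```python
def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two equal-size 1D empirical samples.

    In one dimension the optimal coupling matches sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(xs), sorted(ys)
    return (sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)) ** (1.0 / p)


def max_sliced(X, Y, directions, p=2):
    """Largest 1D Wasserstein distance over the given projection directions.

    Linear stand-in for the kernel max-sliced distance (assumption).
    """
    def project(points, d):
        return [sum(pi * di for pi, di in zip(pt, d)) for pt in points]

    return max(wasserstein_1d(project(X, d), project(Y, d), p) for d in directions)


# The two samples differ only along the second axis, so that slice wins.
X = [(0.0, 0.0), (1.0, 0.0)]
Y = [(0.0, 1.0), (1.0, 1.0)]
distance = max_sliced(X, Y, directions=[(1.0, 0.0), (0.0, 1.0)])
```

The exact optimization over projections is what the abstract shows to be NP-hard for the KMS 2-Wasserstein case, which motivates the semidefinite relaxation.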
Article 963
Title@2025-07-18 (5): Critiques of World Models
Title: Critiques of World Models | Kritik an Weltmodellen | 世界模式的证明 2507.05169v2 |
Authors (4): Eric Xing, Mingkai Deng, Jinyu Hou, Zhiting Hu
World Model, the supposed algorithmic surrogate of the real-world environment which biological agents experience and act upon, has been an emerging topic in recent years because of the rising need to develop virtual agents with artificial (general) intelligence. There has been much debate on what a world model really is, how to build it, how to use it, and how to evaluate it. In this essay, starting from the imagination in the famed Sci-Fi classic Dune, and drawing inspiration from the concept of “hypothetical thinking” in psychology literature, we offer critiques of several schools of thought on world modeling, and argue that the primary goal of a world model is to simulate all actionable possibilities of the real world for purposeful reasoning and acting. Building on the critiques, we propose a new architecture for a general-purpose world model, based on hierarchical, multi-level, and mixed continuous/discrete representations, and a generative and self-supervision learning framework, with an outlook of a Physical, Agentic, and Nested (PAN) AGI system enabled by such a model.
Article 964
Title@2025-07-18 (5): The Duality of Generative AI and Reinforcement Learning in Robotics: A Review
Title: The Duality of Generative AI and Reinforcement Learning in Robotics: A Review | Die Dualität des Generativen KI- und Verstärkungslernens in der Robotik: Ein Rückblick | 机器人学创性人工智能和强化学习的质量:审查 2410.16411v2 |
Authors (6): Angelo Moroncelli, Vishal Soni, Marco Forgione, Dario Piga, Blerina Spahiu, Loris Roveda
Recently, generative AI and reinforcement learning (RL) have been redefining what is possible for AI agents that take information flows as input and produce intelligent behavior. As a result, we are seeing similar advancements in embodied AI and robotics for control policy generation. Our review paper examines the integration of generative AI models with RL to advance robotics. Our primary focus is on the duality between generative AI and RL for robotics downstream tasks. Specifically, we investigate: (1) The role of prominent generative AI tools as modular priors for multi-modal input fusion in RL tasks. (2) How RL can train, fine-tune and distill generative models for policy generation, such as VLA models, similarly to RL applications in large language models. We then propose a new taxonomy based on a considerable number of selected papers. Lastly, we identify open challenges accounting for model scalability, adaptation and grounding, giving recommendations and insights on future research directions. We reflect on which generative AI models best fit the RL tasks and why. On the other hand, we reflect on important issues inherent to RL-enhanced generative policies, such as safety concerns and failure modes, and on the limitations of current methods. A curated collection of relevant research papers is maintained on our GitHub repository, serving as a resource for ongoing research and development in this field: https://github.com/clmoro/Robotics-RL-FMs-Integration.
Article 965
Title@2025-07-18 (5): Preference-based Multi-Objective Reinforcement Learning
Title: Preference-based Multi-Objective Reinforcement Learning | Präferenzbasiertes Mehrziel-Verstärkungs-Lernen | 以优惠为基础的多目标强化学习 2507.14066v1 |
Authors (3): Ni Mu, Yao Luan, Qing-Shan Jia
Multi-objective reinforcement learning (MORL) is a structured approach for optimizing tasks with multiple objectives. However, it often relies on pre-defined reward functions, which can be hard to design for balancing conflicting goals and may lead to oversimplification. Preferences can serve as more flexible and intuitive decision-making guidance, eliminating the need for complicated reward design. This paper introduces preference-based MORL (Pb-MORL), which formalizes the integration of preferences into the MORL framework. We theoretically prove that preferences can derive policies across the entire Pareto frontier. To guide policy optimization using preferences, our method constructs a multi-objective reward model that aligns with the given preferences. We further provide theoretical proof to show that optimizing this reward model is equivalent to training the Pareto optimal policy. Extensive experiments in benchmark multi-objective tasks, a multi-energy management task, and an autonomous driving task on a multi-lane highway show that our method performs competitively, surpassing the oracle method, which uses the ground truth reward function. This highlights its potential for practical applications in complex real-world systems.
Article 966
Title@2025-07-18 (5): Architect of the Bits World: Masked Autoregressive Modeling for Circuit Generation Guided by Truth Table
Title: Architect of the Bits World: Masked Autoregressive Modeling for Circuit Generation Guided by Truth Table | Architekt der Bits-Welt: Masked Autoregressive Modellierung für Schaltungsgeneration von Truth Table geführt | Bits World 建筑师:真相表引导电路生成的蒙面自动递减模型 2502.12751v2 |
Authors (5): Haoyuan Wu, Haisheng Zheng, Shoubo Hu, Zhuolun He, Bei Yu
Logic synthesis, a critical stage in electronic design automation (EDA), optimizes gate-level circuits to minimize power consumption and area occupancy in integrated circuits (ICs). Traditional logic synthesis tools rely on human-designed heuristics, often yielding suboptimal results. Although differentiable architecture search (DAS) has shown promise in generating circuits from truth tables, it faces challenges such as high computational complexity, convergence to local optima, and extensive hyperparameter tuning. Consequently, we propose a novel approach integrating conditional generative models with DAS for circuit generation. Our approach first introduces CircuitVQ, a circuit tokenizer trained based on our Circuit AutoEncoder. We then develop CircuitAR, a masked autoregressive model leveraging CircuitVQ as the tokenizer. CircuitAR can generate preliminary circuit structures from truth tables, which guide DAS in producing functionally equivalent circuits. Notably, we observe the scalability and emergent capability of our CircuitAR models in generating complex circuit structures. Extensive experiments also show the superior performance of our method. This research bridges the gap between probabilistic generative models and precise circuit generation, offering a robust solution for logic synthesis.
Article 967
Title@2025-07-18 (5): Step-DAD: Semi-Amortized Policy-Based Bayesian Experimental Design
Title: Step-DAD: Semi-Amortized Policy-Based Bayesian Experimental Design | Schritt-DAD: Semi-amortisiertes politikbasiertes Bayesian Experimental Design | 渐进式DAD:半统一政策基巴伊斯实验设计 2507.14057v1 |
Authors (4): Marcel Hedman, Desi R. Ivanova, Cong Guan, Tom Rainforth
We develop a semi-amortized, policy-based approach to Bayesian experimental design (BED) called Stepwise Deep Adaptive Design (Step-DAD). Like existing fully amortized, policy-based BED approaches, Step-DAD trains a design policy upfront before the experiment. However, rather than keeping this policy fixed, Step-DAD periodically updates it as data is gathered, refining it to the particular experimental instance. This test-time adaptation improves both the flexibility and the robustness of the design strategy compared with existing approaches. Empirically, Step-DAD consistently demonstrates superior decision-making and robustness compared with current state-of-the-art BED methods.
Article 968
Title@2025-07-18 (5): Noradrenergic-inspired gain modulation attenuates the stability gap in joint training
Title: Noradrenergic-inspired gain modulation attenuates the stability gap in joint training | Noradrenergisch inspirierte Gain Modulation dämpft die Stabilitätslücke im gemeinsamen Training | 调整适应,缩小联合培训中的稳定差距 2507.14056v1 |
Authors (3): Alejandro Rodriguez-Garcia, Anindya Ghosh, Srikanth Ramaswamy
Recent studies in continual learning have identified a transient drop in performance on mastered tasks when assimilating new ones, known as the stability gap. Such dynamics contradict the objectives of continual learning, revealing a lack of robustness in mitigating forgetting, and notably, persisting even under an ideal joint-loss regime. Examining this gap within this idealized joint training context is critical to isolate it from other sources of forgetting. We argue that it reflects an imbalance between rapid adaptation and robust retention at task boundaries, underscoring the need to investigate mechanisms that reconcile plasticity and stability within continual learning frameworks. Biological brains navigate a similar dilemma by operating concurrently on multiple timescales, leveraging neuromodulatory signals to modulate synaptic plasticity. However, artificial networks lack native multitimescale dynamics, and although optimizers like momentum-SGD and Adam introduce implicit timescale regularization, they still exhibit stability gaps. Inspired by locus coeruleus mediated noradrenergic bursts, which transiently enhance neuronal gain under uncertainty to facilitate sensory assimilation, we propose uncertainty-modulated gain dynamics - an adaptive mechanism that approximates a two-timescale optimizer and dynamically balances integration of knowledge with minimal interference on previously consolidated information. We evaluate our mechanism on domain-incremental and class-incremental variants of the MNIST and CIFAR benchmarks under joint training, demonstrating that uncertainty-modulated gain dynamics effectively attenuate the stability gap. Finally, our analysis elucidates how gain modulation replicates noradrenergic functions in cortical circuits, offering mechanistic insights into reducing stability gaps and enhancing performance in continual learning tasks.
Article 969
Title@2025-07-18 (5): D2IP: Deep Dynamic Image Prior for 3D Time-sequence Pulmonary Impedance Imaging
Title: D2IP: Deep Dynamic Image Prior for 3D Time-sequence Pulmonary Impedance Imaging | D2IP: Deep Dynamic Image Prior für 3D-Zeitsequenz Pulmonäre Impedanz-Imaging | D2IP: 3D 时间序列肺阻力成像前深动态图像 2507.14046v1 |
Authors (8): Hao Fang, Hao Yu, Sihao Teng, Tao Zhang, Siyi Yuan, Huaiwu He, Zhe Liu, Yunjie Yang
Unsupervised learning methods, such as Deep Image Prior (DIP), have shown great potential in tomographic imaging due to their training-data-free nature and high generalization capability. However, their reliance on numerous network parameter iterations results in high computational costs, limiting their practical application, particularly in complex 3D or time-sequence tomographic imaging tasks. To overcome these challenges, we propose Deep Dynamic Image Prior (D2IP), a novel framework for 3D time-sequence imaging. D2IP introduces three key strategies - Unsupervised Parameter Warm-Start (UPWS), Temporal Parameter Propagation (TPP), and a customized lightweight reconstruction backbone, 3D-FastResUNet - to accelerate convergence, enforce temporal coherence, and improve computational efficiency. Experimental results on both simulated and clinical pulmonary datasets demonstrate that D2IP enables fast and accurate 3D time-sequence Electrical Impedance Tomography (tsEIT) reconstruction. Compared to state-of-the-art baselines, D2IP delivers superior image quality, with a 24.8% increase in average MSSIM and an 8.1% reduction in ERR, alongside significantly reduced computational time (7.1x faster), highlighting its promise for clinical dynamic pulmonary imaging.
Article 970
Title@2025-07-18 (5): DONUT: Physics-aware Machine Learning for Real-time X-ray Nanodiffraction Analysis
Title: DONUT: Physics-aware Machine Learning for Real-time X-ray Nanodiffraction Analysis | DONUT: Physik-bewusstes maschinelles Lernen für Echtzeit-Röntgen-Nanodiffraktionsanalyse | DONUT: 实时X射线纳米中伤分析物理意识机器学习 2507.14038v1 |
Authors (6): Aileen Luo, Tao Zhou, Ming Du, Martin V. Holt, Andrej Singer, Mathew J. Cherukara
Coherent X-ray scattering techniques are critical for investigating the fundamental structural properties of materials at the nanoscale. While advancements have made these experiments more accessible, real-time analysis remains a significant bottleneck, often hindered by artifacts and computational demands. In scanning X-ray nanodiffraction microscopy, which is widely used to spatially resolve structural heterogeneities, this challenge is compounded by the convolution of the divergent beam with the sample’s local structure. To address this, we introduce DONUT (Diffraction with Optics for Nanobeam by Unsupervised Training), a physics-aware neural network designed for the rapid and automated analysis of nanobeam diffraction data. By incorporating a differentiable geometric diffraction model directly into its architecture, DONUT learns to predict crystal lattice strain and orientation in real-time. Crucially, this is achieved without reliance on labeled datasets or pre-training, overcoming a fundamental limitation for supervised machine learning in X-ray science. We demonstrate experimentally that DONUT accurately extracts all features within the data over 200 times more efficiently than conventional fitting methods.
Article 971
Title@2025-07-18 (5): QuantEIT: Ultra-Lightweight Quantum-Assisted Inference for Chest Electrical Impedance Tomography
Title: QuantEIT: Ultra-Lightweight Quantum-Assisted Inference for Chest Electrical Impedance Tomography | QuantEIT: Ultraleichte Quantum-Assistente Schlussfolgerung für die elektrische Impedanztomographie im Brustkorb | QautEIT: 胸前电气阻碍肿瘤学超重量量量辅助量子推断 2507.14031v1 |
Authors (7): Hao Fang, Sihao Teng, Hao Yu, Siyi Yuan, Huaiwu He, Zhe Liu, Yunjie Yang
Electrical Impedance Tomography (EIT) is a non-invasive, low-cost imaging modality with high temporal resolution, making it suitable for bedside monitoring. However, its inherently ill-posed inverse problem poses significant challenges for accurate image reconstruction. Deep learning (DL)-based approaches have shown promise but often rely on complex network architectures with a large number of parameters, limiting efficiency and scalability. Here, we propose an Ultra-Lightweight Quantum-Assisted Inference (QuantEIT) framework for EIT image reconstruction. QuantEIT leverages a Quantum-Assisted Network (QA-Net), combining parallel 2-qubit quantum circuits to generate expressive latent representations that serve as implicit nonlinear priors, followed by a single linear layer for conductivity reconstruction. This design drastically reduces model complexity and parameter count. Uniquely, QuantEIT operates in an unsupervised, training-data-free manner and represents the first integration of quantum circuits into EIT image reconstruction. Extensive experiments on simulated and real-world 2D and 3D EIT lung imaging data demonstrate that QuantEIT outperforms conventional methods, achieving comparable or superior reconstruction accuracy using only 0.2% of the parameters, with enhanced robustness to noise.
Article 972
Title@2025-07-18 (5): Equivalent and Compact Representations of Neural Network Controllers With Decision Trees
Title: Equivalent and Compact Representations of Neural Network Controllers With Decision Trees | Gleichwertige und kompakte Darstellungen von neuralen Netzwerkcontrollern mit Entscheidungsbäumen | 神经网络主计长与决策树的等效和契约代表 2304.06049v3 |
Authors (4): Kevin Chang, Nathan Dahlin, Rahul Jain, Pierluigi Nuzzo
Over the past decade, neural network (NN)-based controllers have demonstrated remarkable efficacy in a variety of decision-making tasks. However, their black-box nature and the risk of unexpected behaviors pose a challenge to their deployment in real-world systems requiring strong guarantees of correctness and safety. We address these limitations by investigating the transformation of NN-based controllers into equivalent soft decision tree (SDT)-based controllers and its impact on verifiability. In contrast to existing work, we focus on discrete-output NN controllers including rectified linear unit (ReLU) activation functions as well as argmax operations. We then devise an exact yet efficient transformation algorithm which automatically prunes redundant branches. We first demonstrate the practical efficacy of the transformation algorithm applied to an autonomous driving NN controller within OpenAI Gym’s CarRacing environment. Subsequently, we evaluate our approach using two benchmarks from the OpenAI Gym environment. Our results indicate that the SDT transformation can benefit formal verification, showing runtime improvements of up to $21 \times$ and $2 \times$ for MountainCar-v0 and CartPole-v1, respectively.
Article 973
Title@2025-07-18 (5): Conformalized Regression for Continuous Bounded Outcomes
Title: Conformalized Regression for Continuous Bounded Outcomes | Conformalisierte Regression für kontinuierliche geschlossene Ergebnisse | 持续受损害结果的正规回归 2507.14023v1 |
Authors (3): Zhanli Wu, Fabrizio Leisen, F. Javier Rubio
Regression problems with bounded continuous outcomes frequently arise in real-world statistical and machine learning applications, such as the analysis of rates and proportions. A central challenge in this setting is predicting a response associated with a new covariate value. Most of the existing statistical and machine learning literature has focused either on point prediction of bounded outcomes or on interval prediction based on asymptotic approximations. We develop conformal prediction intervals for bounded outcomes based on transformation models and beta regression. We introduce tailored non-conformity measures based on residuals that are aligned with the underlying models, and account for the inherent heteroscedasticity in regression settings with bounded outcomes. We present a theoretical result on asymptotic marginal and conditional validity in the context of full conformal prediction, which remains valid under model misspecification. For split conformal prediction, we provide an empirical coverage analysis based on a comprehensive simulation study. The simulation study demonstrates that both methods provide valid finite-sample predictive coverage, including settings with model misspecification. Finally, we demonstrate the practical performance of the proposed conformal prediction intervals on real data and compare them with bootstrap-based alternatives.
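For readers unfamiliar with split conformal prediction, a minimal sketch follows. It assumes a generic absolute-residual score and outcomes in a known interval `[lower, upper]`. The tailored non-conformity measures built on transformation models and beta regression that the abstract describes are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_y, test_pred, alpha=0.1,
                             lower=0.0, upper=1.0):
    """Split conformal interval from a held-out calibration set.

    Assumption: the score is the absolute residual |y - prediction|.
    Intervals are clipped to the known outcome bounds, which cannot
    reduce coverage because the outcome always lies within them.
    """
    scores = np.sort(np.abs(np.asarray(cal_y) - np.asarray(cal_pred)))
    n = len(scores)
    # Rank of the conformal quantile with the finite-sample (n + 1) correction.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else scores[k - 1]
    lo = np.clip(test_pred - q, lower, upper)
    hi = np.clip(test_pred + q, lower, upper)
    return lo, hi
```

When the calibration set is too small for the requested level, the quantile is infinite and the interval falls back to the full range `[lower, upper]`.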
Article 974
Title@2025-07-18 (5): CPC-CMS: Cognitive Pairwise Comparison Classification Model Selection Framework for Document-level Sentiment Analysis
Title: CPC-CMS: Cognitive Pairwise Comparison Classification Model Selection Framework for Document-level Sentiment Analysis | CPC-CMS: Kognitives Paarweises Vergleichs-Klassifikation Modellauswahl-Framework für Dokument-Level-Sentimentanalyse | CPC-CMS:文件级别感知分析文件级别感应分析的认知对称比较比较分类示范选择框架 2507.14022v1 |
Authors (2): Jianfei Li, Kevin Kam Fung Yuen
This study proposes the Cognitive Pairwise Comparison Classification Model Selection (CPC-CMS) framework for document-level sentiment analysis. The CPC, based on expert knowledge judgment, is used to calculate the weights of evaluation criteria, including accuracy, precision, recall, F1-score, specificity, Matthews Correlation Coefficient (MCC), Cohen’s Kappa (Kappa), and efficiency. Naive Bayes, Linear Support Vector Classification (LSVC), Random Forest, Logistic Regression, Extreme Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM), and A Lite Bidirectional Encoder Representations from Transformers (ALBERT) are chosen as classification baseline models. A weighted decision matrix, consisting of classification evaluation scores weighted by the criteria weights, is formed to select the best classification model for a classification problem. Three open social media datasets are used to demonstrate the feasibility of the proposed CPC-CMS. Based on our simulation, for evaluation results excluding the time factor, ALBERT is the best for the three datasets; if time consumption is included, no single model always performs better than the other models. The CPC-CMS can be applied to other classification applications in different areas.
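The weighted decision matrix can be pictured with a small sketch. The scores and weights below are invented for illustration only, and the CPC procedure that derives the criteria weights from expert judgment is not shown. `select_model` is a hypothetical helper.

```python
import numpy as np

def select_model(scores, weights, names):
    """Pick the model with the highest weighted sum of criterion scores.

    scores:  (n_models, n_criteria) matrix, higher is better for every column.
    weights: (n_criteria,) non-negative criteria weights (normalized here).
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    totals = np.asarray(scores, dtype=float) @ weights
    return names[int(np.argmax(totals))], totals

# Illustrative numbers only: columns are (accuracy, efficiency).
scores = [[0.90, 0.20],   # accurate but slow model
          [0.80, 0.90]]   # slightly less accurate, much faster model
names = ["model_a", "model_b"]
best_without_time, _ = select_model(scores, [1.0, 0.0], names)
best_with_time, _ = select_model(scores, [0.5, 0.5], names)
```

With these made-up numbers the selected model changes once efficiency receives weight, which is the kind of effect the abstract reports when time consumption is included.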
Article 975
Title@2025-07-18 (5): Byzantine-resilient federated online learning for Gaussian process regression
Title: Byzantine-resilient federated online learning for Gaussian process regression | Byzantinisch-resilient föderiertes Online-Lernen für Gaußsche Prozessregression | Byzantine抗拜占庭弹性联邦联盟在线学习,促进高斯进程回归 2507.14021v1 |
Authors (3): Xu Zhang, Zhenyuan Yuan, Minghui Zhu
In this paper, we study Byzantine-resilient federated online learning for Gaussian process regression (GPR). We develop a Byzantine-resilient federated GPR algorithm that allows a cloud and a group of agents to collaboratively learn a latent function and improve learning performance even when some agents exhibit Byzantine failures, i.e., arbitrary and potentially adversarial behavior. Each agent-based local GPR sends potentially compromised local predictions to the cloud, and the cloud-based aggregated GPR computes a global model by a Byzantine-resilient product of experts aggregation rule. Then the cloud broadcasts the current global model to all the agents. Agent-based fused GPR refines local predictions by fusing the received global model with that of the agent-based local GPR. Moreover, we quantify the learning accuracy improvements of the agent-based fused GPR over the agent-based local GPR. Experiments on a toy example and two medium-scale real-world datasets are conducted to demonstrate the performance of the proposed algorithm.
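As background for the aggregation step, the sketch below shows the standard (non-resilient) product-of-experts fusion of Gaussian predictions, in which precisions add and the fused mean is precision-weighted. The Byzantine-resilient rule proposed in the paper is not reproduced here. The example only illustrates why plain fusion is fragile, and the function name is hypothetical.

```python
import numpy as np

def product_of_experts(means, variances):
    """Fuse independent Gaussian predictions for the same test point.

    Fused precision is the sum of the experts' precisions; the fused mean
    is the precision-weighted average of the experts' means.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precisions = 1.0 / variances
    fused_var = 1.0 / precisions.sum(axis=0)
    fused_mean = fused_var * (precisions * means).sum(axis=0)
    return fused_mean, fused_var

# Two honest agents agree on about 1.0; a third reports 100.0 with a
# tiny claimed variance and dominates the plain fusion.
honest_mean, honest_var = product_of_experts([0.0, 2.0], [1.0, 1.0])
attacked_mean, _ = product_of_experts([1.0, 1.0, 100.0], [1.0, 1.0, 0.01])
```

A single overconfident report pulls the fused mean far from the honest agents' predictions, which motivates a resilient aggregation rule.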
Article 976
Title@2025-07-18 (5): Efficient Temporal Tokenization for Mobility Prediction with Large Language Models
Title: Efficient Temporal Tokenization for Mobility Prediction with Large Language Models | Effiziente zeitliche Tokenisierung für Mobilitätsvorhersage mit großen Sprachmodellen | 具有大语言模式的流动预测高效时时适调 2507.14017v1 |
Authors (4): Haoyu He, Haozheng Luo, Yan Chen, Qi R. Wang
We introduce RHYTHM (Reasoning with Hierarchical Temporal Tokenization for Human Mobility), a framework that leverages large language models (LLMs) as spatio-temporal predictors and trajectory reasoners. RHYTHM partitions trajectories into daily segments encoded as discrete tokens with hierarchical attention, capturing both daily and weekly dependencies while substantially reducing the sequence length. Token representations are enriched with pre-computed prompt embeddings via a frozen LLM, enhancing the model’s ability to capture interdependencies without extensive computational overhead. By freezing the LLM backbone, RHYTHM achieves significant computational efficiency. Evaluation on three real-world datasets demonstrates a 2.4% improvement in accuracy, 5.0% increase on weekends, and 24.6% reduction in training time compared to state-of-the-art methods.
Article 977
Title@2025-07-18 (5): On the Fundamental Limitations of Dual Static CVaR Decompositions in Markov Decision Processes
Title: On the Fundamental Limitations of Dual Static CVaR Decompositions in Markov Decision Processes | Über die grundlegenden Einschränkungen der dualen statischen CVaR-Zersetzungen in Markov-Entscheidungsprozessen | 关于Markov决定程序中双重静态CVaR分解的基本限制 2507.14005v1 |
Authors (2): Mathieu Godbout, Audrey Durand
Recent work has shown that dynamic programming (DP) methods for finding static CVaR-optimal policies in Markov Decision Processes (MDPs) can fail when based on the dual formulation, yet the root cause for the failure has remained unclear. We expand on these findings by shifting focus from policy optimization to the seemingly simpler task of policy evaluation. We show that evaluating the static CVaR of a given policy can be framed as two distinct minimization problems. For their solutions to match, a set of “risk-assignment consistency constraints” must be satisfied, and we demonstrate that the intersection of the constraints being empty is the source of previously observed evaluation errors. Quantifying the evaluation error as the CVaR evaluation gap, we then demonstrate that the issues observed when optimizing over the dual-based CVaR DP are explained by the returned policy having a non-zero CVaR evaluation gap. We then leverage our proposed risk-assignment perspective to prove that the search for a single, uniformly optimal policy via the dual CVaR decomposition is fundamentally limited, identifying an MDP where no single policy can be optimal across all initial risk levels.
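For orientation, static CVaR of a finite sample of returns can be computed in two equivalent ways, sketched below. The lower-tail convention (CVaR at level alpha is the mean of the worst alpha-fraction of returns) and the requirement that alpha times the sample size is an integer are assumptions of this sketch. The paper's dual decomposition over time steps and its risk-assignment constraints are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def cvar_tail_mean(returns, alpha):
    """Mean of the worst alpha-fraction of equally likely returns.

    Assumption: alpha * len(returns) is an integer.
    """
    z = np.sort(np.asarray(returns, dtype=float))
    k = int(round(alpha * len(z)))
    return float(z[:k].mean())

def cvar_rockafellar_uryasev(returns, alpha):
    """CVaR via the variational form max_s { s - E[(s - Z)^+] / alpha }.

    The objective is concave and piecewise linear in s, so for an
    empirical distribution it suffices to check the sample points.
    """
    z = np.asarray(returns, dtype=float)
    values = [s - np.mean(np.maximum(s - z, 0.0)) / alpha for s in z]
    return float(max(values))
```

On the sample `[1, 2, 3, 4]` with `alpha = 0.5`, both functions average the two worst returns.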
Article 978
Title@2025-07-18 (5): Multi-Objective Reinforcement Learning for Adaptable Personalized Autonomous Driving
Title: Multi-Objective Reinforcement Learning for Adaptable Personalized Autonomous Driving | Multi-Zielives Stärkungslernen für anpassungsfähiges, personalisiertes autonomes Fahren | 适应性个性自主驾驶多目标强化学习 2505.05223v2 |
Authors (3): Hendrik Surmann, Jorge de Heuvel, Maren Bennewitz
Human drivers exhibit individual preferences regarding driving style. Adapting autonomous vehicles to these preferences is essential for user trust and satisfaction. However, existing end-to-end driving approaches often rely on predefined driving styles or require continuous user feedback for adaptation, limiting their ability to support dynamic, context-dependent preferences. We propose a novel approach using multi-objective reinforcement learning (MORL) with preference-driven optimization for end-to-end autonomous driving that enables runtime adaptation to driving style preferences. Preferences are encoded as continuous weight vectors to modulate behavior along interpretable style objectives - including efficiency, comfort, speed, and aggressiveness - without requiring policy retraining. Our single-policy agent integrates vision-based perception in complex mixed-traffic scenarios and is evaluated in diverse urban environments using the CARLA simulator. Experimental results demonstrate that the agent dynamically adapts its driving behavior according to changing preferences while maintaining performance in terms of collision avoidance and route completion.
Article 979
Title@2025-07-18 (5): ParallelTime: Dynamically Weighting the Balance of Short- and Long-Term Temporal Dependencies
Title: ParallelTime: Dynamically Weighting the Balance of Short- and Long-Term Temporal Dependencies | ParallelTime: Dynamische Gewichtung der Balance von kurz- und langfristigen zeitlichen Abhängigkeiten | 平行时间:动态加权短期和长期时间依赖的平衡 2507.13998v1 |
Authors (2): Itay Katav, Aryeh Kontorovich
Modern multivariate time series forecasting primarily relies on two architectures: the Transformer with attention mechanism and Mamba. In natural language processing, an approach has been used that combines local window attention for capturing short-term dependencies and Mamba for capturing long-term dependencies, with their outputs averaged to assign equal weight to both. We find that for time-series forecasting tasks, assigning equal weight to long-term and short-term dependencies is not optimal. To mitigate this, we propose a dynamic weighting mechanism, ParallelTime Weighter, which calculates interdependent weights for long-term and short-term dependencies for each token based on the input and the model’s knowledge. Furthermore, we introduce the ParallelTime architecture, which incorporates the ParallelTime Weighter mechanism to deliver state-of-the-art performance across diverse benchmarks. Our architecture demonstrates robustness, achieves lower FLOPs, requires fewer parameters, scales effectively to longer prediction horizons, and significantly outperforms existing methods. These advances highlight a promising path for future developments of parallel Attention-Mamba in time series forecasting. The implementation is readily available at: https://github.com/itay1551/ParallelTime
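The contrast between fixed averaging and per-token dynamic weighting can be sketched as follows. The gate here (a single linear map followed by a softmax over the two branches) is a hypothetical stand-in chosen for illustration, not the ParallelTime Weighter itself. `dynamic_branch_mix` and `gate_weights` are assumed names.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def dynamic_branch_mix(short_out, long_out, gate_weights):
    """Mix two branch outputs with input-dependent per-token weights.

    short_out, long_out: (tokens, dim) outputs of the short-term and
                         long-term branches.
    gate_weights:        (2 * dim, 2) parameters of the hypothetical gate.
    """
    gate_input = np.concatenate([short_out, long_out], axis=-1)  # (tokens, 2*dim)
    weights = softmax(gate_input @ gate_weights)                 # (tokens, 2)
    mixed = weights[:, :1] * short_out + weights[:, 1:] * long_out
    return mixed, weights
```

With all-zero gate parameters the softmax yields weights of one half each, recovering the equal-weight average that the abstract argues is suboptimal for forecasting. Non-zero parameters let the weights vary by token.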
Article 980
Title@2025-07-18 (5): Machine learning applications in archaeological practices: a review
Title: Machine learning applications in archaeological practices: a review | Anwendungen des maschinellen Lernens in archäologischen Praktiken: eine Rezension | 考古学实践中的机械学习应用:审查 2501.03840v3 |
Authors (6): Mathias Bellat, Jordy D. Orellana Figueroa, Jonathan S. Reeves, Ruhollah Taghizadeh-Mehrjardi, Claudio Tennie, Thomas Scholten
Artificial intelligence and machine learning applications in archaeology have increased significantly in recent years, and these now span all subfields, geographical regions, and time periods. The prevalence and success of these applications have remained largely unexamined, as recent reviews on the use of machine learning in archaeology have focused only on specific subfields of archaeology. Our review examined an exhaustive corpus of 135 articles published between 1997 and 2022. We observed a significant increase in the number of publications from 2019 onwards. Automatic structure detection and artefact classification were the most represented tasks in the articles reviewed, followed by taphonomy and archaeological predictive modelling. In the reviewed articles, clustering and unsupervised methods were underrepresented compared to supervised models. Artificial neural networks and ensemble learning account for two thirds of the total number of models used. However, although machine learning models are gaining in popularity, they remain subject to misunderstanding. We observed, in some cases, poorly defined requirements and caveats of the machine learning methods used. Furthermore, the goals and the needs of machine learning applications for archaeological purposes are in some cases unclear or poorly expressed. To address this, we propose a workflow guide for archaeologists to develop coherent and consistent methodologies adapted to their research questions, project scale and data. As in many other areas, machine learning is rapidly becoming an important tool in archaeological research and practice, useful for the analysis of large and multivariate data, although not without limitations. This review highlights the importance of well-defined and well-reported structured methodologies and collaborative practices to maximise the potential of applications of machine learning methods in archaeology.
Article 981
Title@2025-07-18 (5): $ε$-rank and the Staircase Phenomenon: New Insights into Neural Network Training Dynamics
Title: $ε$-rank and the Staircase Phenomenon: New Insights into Neural Network Training Dynamics | $ε$-rank und das Staircase-Phänomen: Neue Einblicke in die neurale Netzwerk-Trainingsdynamik | 美元-先令和阶梯现象:对神经网络培训动态的新透视 2412.05144v3 |
Authors (3): Jiang Yang, Yuxiang Zhao, Quanhui Zhu
Understanding the training dynamics of deep neural networks (DNNs), particularly how they evolve low-dimensional features from high-dimensional data, remains a central challenge in deep learning theory. In this work, we introduce the concept of $\epsilon$-rank, a novel metric quantifying the effective features of neuron functions in the terminal hidden layer. Through extensive experiments across diverse tasks, we observe a universal staircase phenomenon: during training with standard stochastic gradient descent methods, the decline of the loss function is accompanied by an increase in the $\epsilon$-rank and exhibits a staircase pattern. Theoretically, we rigorously prove a negative correlation between the loss lower bound and $\epsilon$-rank, demonstrating that a high $\epsilon$-rank is essential for significant loss reduction. Moreover, numerical evidence shows that within the same deep neural network, the $\epsilon$-rank of the subsequent hidden layer is higher than that of the previous hidden layer. Based on these observations, to eliminate the staircase phenomenon, we propose a novel pre-training strategy on the initial hidden layer that elevates the $\epsilon$-rank of the terminal hidden layer. Numerical experiments validate its effectiveness in reducing training time and improving accuracy across various tasks. Therefore, the newly introduced concept of $\epsilon$-rank is a computable quantity that serves as an intrinsic effective metric characteristic for deep neural networks, providing a novel perspective for understanding the training dynamics of neural networks and offering a theoretical foundation for designing efficient training strategies in practical applications.
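One way to picture a thresholded rank is sketched below. The definition used here, the number of singular values of a hidden-layer feature matrix that exceed $\epsilon$, is an assumption made for illustration. The abstract does not spell out the paper's exact definition, so this sketch may differ from it. `epsilon_rank` is a hypothetical name.

```python
import numpy as np

def epsilon_rank(features, eps):
    """Count singular values of a feature matrix that exceed eps.

    features: (n_samples, n_neurons) matrix whose columns are neuron
              outputs of a hidden layer evaluated on sample inputs.
    Assumption: this thresholded singular-value count stands in for the
    paper's epsilon-rank.
    """
    singular_values = np.linalg.svd(np.asarray(features, dtype=float),
                                    compute_uv=False)
    return int(np.sum(singular_values > eps))

# A matrix with singular values 1, 0.5 and 1e-6.
features = np.diag([1.0, 0.5, 1e-6])
rank_coarse = epsilon_rank(features, eps=1e-3)
rank_fine = epsilon_rank(features, eps=1e-9)
```

Tracking such a count during training is one way to look for the stepwise increases that the abstract describes.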
Article 982
Title@2025-07-18 (5): Structural Connectome Harmonization Using Deep Learning: The Strength of Graph Neural Networks
Title: Structural Connectome Harmonization Using Deep Learning: The Strength of Graph Neural Networks | Structural Connectome Harmonization Using Deep Learning: Die Stärke von Graph Neuronalen Netzwerken | 利用深层学习实现结构连接统一:图表神经网络的实力 2507.13992v1 |
Authors (8): Jagruti Patel, Thomas A. W. Bolton, Mikkel Schöttner, Anjali Tarun, Sebastien Tourbier, Yasser Alemàn-Gòmez, Jonas Richiardi, Patric Hagmann
Small sample sizes in neuroimaging in general, and in structural connectome (SC) studies in particular, limit the development of reliable biomarkers for neurological and psychiatric disorders - such as Alzheimer’s disease and schizophrenia - by reducing statistical power, reliability, and generalizability. Large-scale multi-site studies exist, but they have acquisition-related biases due to scanner heterogeneity, compromising imaging consistency and downstream analyses. While existing SC harmonization methods - such as linear regression (LR), ComBat, and deep learning techniques - mitigate these biases, they often rely on detailed metadata or traveling subjects (TS), or overlook the graph topology of SCs. To address these limitations, we propose a site-conditioned deep harmonization framework that harmonizes SCs across diverse acquisition sites without requiring metadata or TS, and we test it in a simulated scenario based on the Human Connectome Dataset. Within this framework, we benchmark three deep architectures - a fully connected autoencoder (AE), a convolutional AE, and a graph convolutional AE - against a top-performing LR baseline. While non-graph models excel in edge-weight prediction and edge existence detection, the graph AE demonstrates superior preservation of topological structure and subject-level individuality, as reflected by graph metrics and fingerprinting accuracy, respectively. Although the LR baseline achieves the highest numerical performance by explicitly modeling acquisition parameters, it lacks applicability to real-world multi-site use cases, as detailed acquisition metadata is often unavailable. Our results highlight the critical role of model architecture in SC harmonization performance and demonstrate that graph-based approaches are particularly well-suited for structure-aware, domain-generalizable SC harmonization in large-scale multi-site SC studies.
Article 983
Title@2025-07-18 (5): Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation
Title: Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation | Agentische Neuronale Netzwerke: Selbstständige Multi-Agenten-Systeme über textuelle Backpropagation | 动态神经网络:通过文字反向分析实现自我演进的多行为者系统 2506.09046v2 |
Authors (5): Xiaowen Ma, Chenyang Lin, Yao Zhang, Volker Tresp, Yunpu Ma
Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative “team” focused on a specific subtask. ANN follows a two-phase optimization strategy: (1) Forward Phase: drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase: mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables ANN to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across four benchmark datasets, ANN surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements. Our findings indicate that ANN provides a scalable, data-driven framework for multi-agent systems, combining the collaborative capabilities of LLMs with the efficiency and flexibility of neural network principles. We plan to open-source the entire framework.
Article 984
Title@2025-07-18 (5): Interpretable Imitation Learning via Generative Adversarial STL Inference and Control
Title: Interpretable Imitation Learning via Generative Adversarial STL Inference and Control | Interpretable Imitation Lernen über generative Adversariale STL-Inferenz und -Kontrolle | 通过产生反逆反生成的STL 推断与控制进行可解释的模拟学习 2402.10310v2 |
Authors (6): Wenliang Liu, Danyang Li, Erfan Aasi, Daniela Rus, Roberto Tron, Calin Belta
Imitation learning methods have demonstrated considerable success in teaching autonomous systems complex tasks through expert demonstrations. However, a limitation of these methods is their lack of interpretability, particularly in understanding the specific task the learning agent aims to accomplish. In this paper, we propose a novel imitation learning method that combines Signal Temporal Logic (STL) inference and control synthesis, enabling the explicit representation of the task as an STL formula. This approach not only provides a clear understanding of the task but also supports the integration of human knowledge and allows for adaptation to out-of-distribution scenarios by manually adjusting the STL formulas and fine-tuning the policy. We employ a Generative Adversarial Network (GAN)-inspired approach to train both the inference and policy networks, effectively narrowing the gap between expert and learned policies. The efficiency of our algorithm is demonstrated through simulations, showcasing its practical applicability and adaptability.
Article 985
Title@2025-07-18 (5): Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking
Title: Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking | Ev2R: Evidence Retrieval im automatisierten Fact-Checking bewerten | Ev2R:评价自动实况调查中的证据检索 2411.05375v2 |
Authors (3): Mubashara Akhtar, Michael Schlichtkrull, Andreas Vlachos
Current automated fact-checking (AFC) approaches typically evaluate evidence either implicitly via the predicted verdicts or through exact matches with predefined closed knowledge sources, such as Wikipedia. However, these methods are limited due to their reliance on evaluation metrics originally designed for other purposes and constraints from closed knowledge sources. In this work, we introduce Ev2R, which combines the strengths of reference-based evaluation and verdict-level proxy scoring. Ev2R jointly assesses how well the evidence aligns with the gold references and how reliably it supports the verdict, addressing the shortcomings of prior methods. We evaluate Ev2R against three types of evidence evaluation approaches: reference-based, proxy-reference, and reference-less baselines. Assessments against human ratings and adversarial tests demonstrate that Ev2R consistently outperforms existing scoring approaches in accuracy and robustness. It achieves stronger correlation with human judgments and greater robustness to adversarial perturbations, establishing it as a reliable metric for evidence evaluation in AFC. Code is available at https://github.com/mubasharaak/fc-evidence-evaluation.
Article 986
Title@2025-07-18 (5): Signs of the Past, Patterns of the Present: On the Automatic Classification of Old Babylonian Cuneiform Signs
Title: Signs of the Past, Patterns of the Present: On the Automatic Classification of Old Babylonian Cuneiform Signs | Zeichen der Vergangenheit, Muster der Gegenwart: Auf der automatischen Klassifikation der alten babylonischen Kuneiform Zeichen | 过去的迹象,现在的模式:关于旧巴比伦古代古代古代符号的自动分类 2507.13959v1 |
Authors (4): Eli Verwimp, Gustav Ryberg Smidt, Hendrik Hameeuw, Katrien De Graef
The work in this paper describes the training and evaluation of machine learning (ML) techniques for the classification of cuneiform signs. There is a lot of variability in cuneiform signs, depending on where they come from, for what and by whom they were written, but also how they were digitized. This variability makes it unlikely that an ML model trained on one dataset will perform successfully on another dataset. This contribution studies how such differences impact that performance. Based on our results and insights, we aim to influence future data acquisition standards and provide a solid foundation for future cuneiform sign classification tasks. The ML model has been trained and tested on handwritten Old Babylonian (c. 2000-1600 B.C.E.) documentary texts inscribed on clay tablets originating from three Mesopotamian cities (Nippur, Dūr-Abiešuh and Sippar). The presented and analysed model is ResNet50, which achieves a top-1 score of 87.1% and a top-5 score of 96.5% for signs with at least 20 instances. As these automatic classification results are the first on Old Babylonian texts, there are currently no comparable results.
Article 987
Title@2025-07-18 (5): DUALRec: A Hybrid Sequential and Language Model Framework for Context-Aware Movie Recommendation
Title: DUALRec: A Hybrid Sequential and Language Model Framework for Context-Aware Movie Recommendation | DUALRec: Ein hybrides Sequenz- und Sprachmodell-Framework für die Kontext-Bewusste Film-Empfehlung | AUALRec:背景软件电影建议混合顺序和语言示范框架 2507.13957v1 |
Authors (2): Yitong Li, Raoul Grasman
Modern recommender systems face an increasing challenge in modelling and predicting dynamic, context-rich user preferences. Traditional collaborative filtering and content-based methods often struggle to capture temporal patterns and evolving user intentions. While Large Language Models (LLMs) have gained attention in recent years for their strong semantic understanding and reasoning abilities, they are not inherently designed to model chronologically evolving user preferences and intentions. Sequential models such as LSTM (Long Short-Term Memory) networks, on the other hand, are good at capturing the temporal dynamics of user behaviour and evolving user preferences over time, but lack the rich semantic understanding needed for comprehensive recommendation generation. In this study, we propose DUALRec (Dynamic User-Aware Language-based Recommender), a novel recommender that leverages the complementary strengths of both models, combining the temporal modelling abilities of LSTM networks with the semantic reasoning power of fine-tuned Large Language Models. The LSTM component captures users' evolving preferences through their viewing history, while the fine-tuned LLM variants leverage these temporal user insights to generate the next movies that users might enjoy. Experimental results on the MovieLens-1M dataset show that the DUALRec model outperforms a wide range of baseline models across a comprehensive set of evaluation metrics: Hit Rate (HR@k), Normalized Discounted Cumulative Gain (NDCG@k), and genre similarity. This research proposes a novel architecture that bridges the gap between temporal sequence modeling and semantic reasoning, and offers a promising direction for developing more intelligent and context-aware recommenders.
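For readers unfamiliar with the two ranking metrics named above, the following is a minimal sketch of HR@k and NDCG@k in the common next-item setting, where each user has exactly one held-out target item. That single-target setup, and the function names, are assumptions for illustration; the abstract does not describe the exact evaluation protocol.

```python
import math

def hit_rate_at_k(ranked_items, target, k):
    """1.0 if the held-out target appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """NDCG@k with a single relevant item.

    With one relevant item the ideal DCG is 1, so NDCG reduces to
    1 / log2(rank + 1), where rank is the 1-based position of the target.
    """
    top_k = list(ranked_items[:k])
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1
    return 1.0 / math.log2(rank + 1)

def mean_metric(metric, recommendations, targets, k):
    """Average a per-user metric over all users."""
    scores = [metric(r, t, k) for r, t in zip(recommendations, targets)]
    return sum(scores) / len(scores)
```

A target ranked first scores 1.0 on both metrics; NDCG@k then decays logarithmically with position, whereas HR@k stays at 1.0 anywhere inside the top k.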
Article 988
Title@2025-07-18 (5): Robust Anomaly Detection with Graph Neural Networks using Controllability
Title: Robust Anomaly Detection with Graph Neural Networks using Controllability | Robuste Anomalieerkennung mit Graphen-Neuralen Netzen mit Kontrollierbarkeit | 使用可控性对图形神经网络进行强力异常探测 2507.13954v1 |
Authors (4): Yifan Wei, Anwar Said, Waseem Abbas, Xenofon Koutsoukos
Anomaly detection in complex domains poses significant challenges due to the need for extensive labeled data and the inherently imbalanced nature of anomalous versus benign samples. Graph-based machine learning models have emerged as a promising solution that combines attribute and relational data to uncover intricate patterns. However, the scarcity of anomalous data exacerbates the challenge, which requires innovative strategies to enhance model learning with limited information. In this paper, we hypothesize that incorporating the influence of the nodes, quantified through average controllability, can significantly improve the performance of anomaly detection. We propose two novel approaches to integrate average controllability into graph-based frameworks: (1) using average controllability as an edge weight and (2) encoding it as a one-hot edge attribute vector. Through rigorous evaluation on real-world and synthetic networks with six state-of-the-art baselines, our proposed methods demonstrate improved performance in identifying anomalies, highlighting the critical role of controllability measures in enhancing the performance of graph machine learning models. This work underscores the potential of integrating average controllability as an additional metric to address the challenges of anomaly detection in sparse and imbalanced datasets.
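As a hedged illustration of the node-level quantity involved, the sketch below computes a finite-horizon average controllability score per node, taken here as the trace of the controllability Gramian when that node is the only input. The discrete-time linear dynamics, the stabilising normalisation of the adjacency matrix, the horizon, and the function name are all assumptions for illustration. The abstract does not say how node scores are turned into edge weights or one-hot edge attributes, so that step is not attempted here.

```python
import numpy as np

def average_controllability(adj, horizon=10):
    """Finite-horizon average controllability of each node (assumed form).

    adj: square adjacency matrix of the graph.
    Returns one score per node: the trace of the Gramian
    sum_t A^t e_i e_i^T (A^T)^t for t = 0 .. horizon - 1.
    """
    # Scale the matrix so that its spectral norm is below 1 (assumption).
    A = adj / (1.0 + np.linalg.norm(adj, 2))
    n = A.shape[0]
    scores = np.zeros(n)
    power = np.eye(n)  # holds A^t, starting at t = 0
    for _ in range(horizon):
        # trace(A^t e_i e_i^T (A^t)^T) is the squared norm of column i of A^t.
        scores += np.sum(power ** 2, axis=0)
        power = A @ power
    return scores

# Star graph: node 0 is the hub, nodes 1..3 are leaves.
star = np.zeros((4, 4))
star[0, 1:] = 1.0
star[1:, 0] = 1.0
star_scores = average_controllability(star, horizon=5)
```

In this toy example the hub receives a higher score than the leaves, which matches the intuition that well-connected nodes can spread input energy further through the network.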
Article 989
Title@2025-07-18 (5): MoDyGAN: Combining Molecular Dynamics With GANs to Investigate Protein Conformational Space
Title: MoDyGAN: Combining Molecular Dynamics With GANs to Investigate Protein Conformational Space | MoDyGAN: Kombination molekularer Dynamik mit GANs zur Untersuchung des Proteinkonformationsraums | MODYGAN:将分子动态与GANs相结合,以调查蛋白质变形空间 2507.13950v1 |
Authors (2): Jingbo Liang, Bruna Jacobson
Extensively exploring protein conformational landscapes remains a major challenge in computational biology due to the high computational cost involved in dynamic physics-based simulations. In this work, we propose a novel pipeline, MoDyGAN, that leverages molecular dynamics (MD) simulations and generative adversarial networks (GANs) to explore protein conformational spaces. MoDyGAN contains a generator that maps Gaussian distributions into MD-derived protein trajectories, and a refinement module that combines ensemble learning with a dual-discriminator to further improve the plausibility of generated conformations. Central to our approach is an innovative representation technique that reversibly transforms 3D protein structures into 2D matrices, enabling the use of advanced image-based GAN architectures. We use three rigid proteins to demonstrate that MoDyGAN can generate plausible new conformations. We also use deca-alanine as a case study to show that interpolations within the latent space closely align with trajectories obtained from steered molecular dynamics (SMD) simulations. Our results suggest that representing proteins as image-like data unlocks new possibilities for applying advanced deep learning techniques to biomolecular simulation, leading to an efficient sampling of conformational states. Additionally, the proposed framework holds strong potential for extension to other complex 3D structures.
Article 990
Title@2025-07-18 (5): Generalist Forecasting with Frozen Video Models via Latent Diffusion
Title: Generalist Forecasting with Frozen Video Models via Latent Diffusion | Generalist Prognose mit gefrorenen Videomodellen über Latent Diffusion | 利用冷冻视频模型通过冷冻传播进行一般预测 2507.13942v1 |
Authors (9): Jacob C Walker, Pedro Vélez, Luisa Polania Cabrera, Guangyao Zhou, Rishabh Kabra, Carl Doersch, Maks Ovsjanikov, João Carreira, Shiry Ginosar
Forecasting what will happen next is a critical skill for general-purpose systems that plan or act in the world at different levels of abstraction. In this paper, we identify a strong correlation between a vision model’s perceptual ability and its generalist forecasting performance over short time horizons. This trend holds across a diverse set of pretrained models, including those trained generatively, and across multiple levels of abstraction, from raw pixels to depth, point tracks, and object motion. The result is made possible by a novel generalist forecasting framework that operates on any frozen vision backbone: we train latent diffusion models to forecast future features in the frozen representation space, which are then decoded via lightweight, task-specific readouts. To enable consistent evaluation across tasks, we introduce distributional metrics that compare distributional properties directly in the space of downstream tasks and apply this framework to nine models and four tasks. Our results highlight the value of bridging representation learning and generative modeling for temporally grounded video understanding.
Article 991
Title@2025-07-18 (5): Machine-Learning Analysis of Radiative Decays to Dark Matter at the LHC
Title: Machine-Learning Analysis of Radiative Decays to Dark Matter at the LHC | Machine-Learning-Analyse von Strahlungsdefekten zur Dunklen Materie am LHC | LHC实验室辐射衰减到黑暗物质的机学分析 2410.13799v3 |
Authors (7): Ernesto Arganda, Marcela Carena, Martín de los Rios, Andres D. Perez, Duncan Rocha, Rosa M. Sandá Seoane, Carlos E. M. Wagner
The search for weakly interacting massive particles (WIMPs) is one of the main objectives of the High Luminosity Large Hadron Collider (HL-LHC). In this work we use Machine-Learning (ML) techniques to explore WIMP radiative decays into a Dark Matter (DM) candidate in a supersymmetric framework. The minimal supersymmetric WIMP sector includes the lightest neutralino that can provide the observed DM relic density through its co-annihilation with the second lightest neutralino and lightest chargino. Moreover, the direct DM detection cross section rates fulfill current experimental bounds and provide discovery targets for the same region of model parameters in which the radiative decay of the second lightest neutralino into a photon and the lightest neutralino is enhanced. This strongly motivates the search for radiatively decaying neutralinos which, however, suffers from strong backgrounds. We investigate the LHC reach in the search for these radiatively decaying particles by means of cut-based and ML methods and estimate its discovery potential in this well-motivated, new physics scenario. We demonstrate that using ML techniques would enable access to most of the parameter space unexplored by other searches.
Article 992
Title@2025-07-18 (5): Two-Stage Pretraining for Molecular Property Prediction in the Wild
Title: Two-Stage Pretraining for Molecular Property Prediction in the Wild | Zweistufige Vorschulung für molekulare Property Prediction in the Wild | 野生生物分子财产预测两阶段培训前 2411.03537v2 |
Authors (6): Kevin Tirta Wijaya, Minghao Guo, Michael Sun, Hans-Peter Seidel, Wojciech Matusik, Vahid Babaei
Molecular deep learning models have achieved remarkable success in property prediction, but they often require large amounts of labeled data. The challenge is that, in real-world applications, labels are extremely scarce, as obtaining them through laboratory experimentation is both expensive and time-consuming. In this work, we introduce MoleVers, a versatile pretrained molecular model designed for various types of molecular property prediction in the wild, i.e., where experimentally-validated labels are scarce. MoleVers employs a two-stage pretraining strategy. In the first stage, it learns molecular representations from unlabeled data through masked atom prediction and extreme denoising, a novel task enabled by our newly introduced branching encoder architecture and dynamic noise scale sampling. In the second stage, the model refines these representations through predictions of auxiliary properties derived from computational methods, such as the density functional theory or large language models. Evaluation on 22 small, experimentally-validated datasets demonstrates that MoleVers achieves state-of-the-art performance, highlighting the effectiveness of its two-stage framework in producing generalizable molecular representations for diverse downstream properties.
Article 993
Title@2025-07-18 (5): Reframing attention as a reinforcement learning problem for causal discovery
Title: Reframing attention as a reinforcement learning problem for causal discovery | Widerspenstige Aufmerksamkeit als Verstärkungs-Lernproblem für kausale Entdeckung | 将注意力重新定位为因果发现的一个强化学习问题 2507.13920v1 |
Authors (4): Turan Orujlu, Christian Gumbsch, Martin V. Butz, Charley M Wu
Formal frameworks of causality have operated largely parallel to modern trends in deep reinforcement learning (RL). However, there has been a revival of interest in formally grounding the representations learned by neural networks in causal concepts. Yet, most attempts at neural models of causality assume static causal graphs and ignore the dynamic nature of causal interactions. In this work, we introduce the Causal Process framework as a novel theory for representing dynamic hypotheses about causal structure. Furthermore, we present the Causal Process Model as an implementation of this framework. This allows us to reformulate the attention mechanism popularized by Transformer networks within an RL setting, with the goal of inferring interpretable causal processes from visual observations. Here, causal inference corresponds to constructing a causal graph hypothesis, which itself becomes an RL task nested within the original RL problem. To create an instance of such a hypothesis, we employ RL agents. These agents establish links between units, similar to the original Transformer attention mechanism. We demonstrate the effectiveness of our approach in an RL environment where we outperform current alternatives in causal representation learning and agent performance, and uniquely recover graphs of dynamic causal processes.
Article 994
Title@2025-07-18 (5): Generalization in Reinforcement Learning for Radio Access Networks
Title: Generalization in Reinforcement Learning for Radio Access Networks | Generalisierung im Ausbau-Lernen für Funkzugangsnetze | 无线电接入网络强化学习一般化 2507.06602v2 |
Authors (4): Burak Demirel, Yu Wang, Cristian Tatino, Pablo Soldati
Modern radio access networks (RANs) operate in highly dynamic and heterogeneous environments, where hand-tuned, rule-based radio resource management (RRM) algorithms often underperform. While reinforcement learning (RL) can surpass such heuristics in constrained settings, the diversity of deployments and unpredictable radio conditions introduce major generalization challenges. Data-driven policies frequently overfit to training conditions, degrading performance in unseen scenarios. To address this, we propose a generalization-centered RL framework for RAN control that: (i) robustly reconstructs dynamically varying states from partial and noisy observations, while encoding static and semi-static information, such as radio nodes, cell attributes, and their topology, through graph representations; (ii) applies domain randomization to broaden the training distribution; and (iii) distributes data generation across multiple actors while centralizing training in a cloud-compatible architecture aligned with O-RAN principles. Although generalization increases computational and data-management complexity, our distributed design mitigates this by scaling data collection and training across diverse network conditions. Applied to downlink link adaptation in five 5G benchmarks, our policy improves average throughput and spectral efficiency by ~10% over an OLLA baseline (10% BLER target) in full-buffer MIMO/mMIMO and by >20% under high mobility. It matches specialized RL in full-buffer traffic and achieves up to 4- and 2-fold gains in eMBB and mixed-traffic benchmarks, respectively. In nine-cell deployments, GAT models offer 30% higher throughput over MLP baselines. These results, combined with our scalable architecture, offer a path toward AI-native 6G RAN using a single, generalizable RL agent.
Article 995
Title@2025-07-18 (5): Self-supervised learning on gene expression data
Title: Self-supervised learning on gene expression data | Selbstüberwachtes Lernen über Genexpressionsdaten | 自我监督的基因表达数据学习 2507.13912v1 |
Authors (4): Kevin Dradjat, Massinissa Hamidi, Pierre Bartet, Blaise Hanczar
Predicting phenotypes from gene expression data is a crucial task in biomedical research, enabling insights into disease mechanisms, drug responses, and personalized medicine. Traditional machine learning and deep learning rely on supervised learning, which requires large quantities of labeled data that are costly and time-consuming to obtain in the case of gene expression data. Self-supervised learning has recently emerged as a promising approach to overcome these limitations by extracting information directly from the structure of unlabeled data. In this study, we investigate the application of state-of-the-art self-supervised learning methods to bulk gene expression data for phenotype prediction. We selected three self-supervised methods, based on different approaches, to assess their ability to exploit the inherent structure of the data and to generate qualitative representations that can be used for downstream predictive tasks. Using several publicly available gene expression datasets, we demonstrate how the selected methods can effectively capture complex information and improve phenotype prediction accuracy. The results show that self-supervised learning methods can outperform traditional supervised models while also offering a significant advantage by reducing the dependency on annotated data. We provide a comprehensive analysis of the performance of each method, highlighting its strengths and limitations. We also provide recommendations for using these methods depending on the case under study. Finally, we outline future research directions to enhance the application of self-supervised learning in the field of gene expression data analysis. This study is the first work that deals with bulk RNA-Seq data and self-supervised learning.
Article 996
Title@2025-07-18 (5): LOCUS: LOcalization with Channel Uncertainty and Sporadic Energy
Title: LOCUS: LOcalization with Channel Uncertainty and Sporadic Energy | LOCUS: LOcalization mit Kanalunsicherheit und sporadischer Energie | LOCUS: 与频道不确定和零散能源的分级 2302.09409v3 |
Authors (5): Subrata Biswas, Mohammad Nur Hossain Khan, Violet Colwell, Jack Adiletta, Bashima Islam
Accurate sound source localization (SSL), such as direction-of-arrival (DoA) estimation, relies on consistent multichannel data. However, batteryless systems often suffer from missing data due to the stochastic nature of energy harvesting, degrading localization performance. We propose LOCUS, a deep learning framework that recovers corrupted features in such settings. LOCUS integrates three modules: (1) Information-Weighted Focus (InFo) to identify corrupted regions, (2) Latent Feature Synthesizer (LaFS) to reconstruct missing features, and (3) Guided Replacement (GRep) to restore data without altering valid inputs. LOCUS significantly improves DoA accuracy under missing-channel conditions, achieving up to 36.91% error reduction on DCASE and LargeSet, and 25.87-59.46% gains in real-world deployments. We release a 50-hour multichannel dataset to support future research on localization under energy constraints. Our code and data are available at: https://bashlab.github.io/locus_project/
Article 997
Title@2025-07-18 (5): Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review
Title: Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review | Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review | 从商业文件中提取的基于深学习的关键信息:系统文献审查 2408.06345v2 |
Authors (2): Alexander Michael Rombach, Peter Fettke
Extracting key information from documents represents a large portion of business workloads and therefore offers a high potential for efficiency improvements and process automation. With recent advances in Deep Learning, a plethora of Deep Learning based approaches for Key Information Extraction have been proposed under the umbrella term Document Understanding that enable the processing of complex business documents. The goal of this systematic literature review is an in-depth analysis of existing approaches in this domain and the identification of opportunities for further research. To this end, 130 approaches published between 2017 and 2024 are analyzed in this study.
Article 998
Title@2025-07-18 (5): On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks
Title: On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks | Zur optimalen Annäherung von Sobolev- und Besov-Funktionen mittels tiefer ReLU-Neuralnetze | 利用深RELU神经网络在Sobolev 和Besov 功能的最佳近似上使用深RELU神经网络 2409.00901v3 |
Authors (1): Yunfei Yang
This paper studies the problem of how efficiently functions in the Sobolev spaces $\mathcal{W}^{s,q}([0,1]^d)$ and Besov spaces $\mathcal{B}^s_{q,r}([0,1]^d)$ can be approximated by deep ReLU neural networks with width $W$ and depth $L$, when the error is measured in the $L^p([0,1]^d)$ norm. This problem has been studied by several recent works, which obtained the approximation rate $\mathcal{O}((WL)^{-2s/d})$ up to logarithmic factors when $p=q=\infty$, and the rate $\mathcal{O}(L^{-2s/d})$ for networks with fixed width when the Sobolev embedding condition $1/q -1/p<s/d$ holds. We generalize these results by showing that the rate $\mathcal{O}((WL)^{-2s/d})$ indeed holds under the Sobolev embedding condition. It is known that this rate is optimal up to logarithmic factors. The key tool in our proof is a novel encoding of sparse vectors by using deep ReLU neural networks with varied width and depth, which may be of independent interest.
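To make the stated condition and rate concrete, here is a small worked instance. The particular values of $s$, $d$, $q$, $p$, $W$ and $L$ are chosen purely for illustration and do not come from the paper; logarithmic factors are ignored.

```latex
% Illustrative values (assumed): s = 2, d = 4, q = 2, p = 4.
% Sobolev embedding condition: 1/q - 1/p < s/d
\[
  \frac{1}{q} - \frac{1}{p} = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}
  \;<\; \frac{s}{d} = \frac{2}{4} = \frac{1}{2},
\]
% so the condition holds, and the approximation rate is
\[
  (WL)^{-2s/d} = (WL)^{-1}.
\]
% Example: width W = 16 and depth L = 64 give WL = 1024, so the error
% bound scales like 1024^{-1}, roughly 10^{-3}, up to logarithmic factors.
```

In this instance, doubling either the width or the depth halves the bound, which shows how the rate depends on the product $WL$ and not on width or depth separately.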
Article 999
Title@2025-07-18 (5): Improved DDIM Sampling with Moment Matching Gaussian Mixtures
Title: Improved DDIM Sampling with Moment Matching Gaussian Mixtures | Verbesserte DDIM-Probenahme mit momentgenauen Gauß-Mischungen | 改进DDIM抽样,与高山混合体相匹配的时速相匹配 2311.04938v3 |
Authors (1): Prasad Gabbur
We propose using a Gaussian Mixture Model (GMM) as the reverse transition operator (kernel) within the Denoising Diffusion Implicit Models (DDIM) framework, which is one of the most widely used approaches for accelerated sampling from pre-trained Denoising Diffusion Probabilistic Models (DDPM). Specifically, we match the first and second order central moments of the DDPM forward marginals by constraining the parameters of the GMM. We see that moment matching is sufficient to obtain samples with equal or better quality than the original DDIM with Gaussian kernels. We provide experimental results with unconditional models trained on CelebAHQ and FFHQ, and with class-conditional models trained on ImageNet. Our results suggest that using the GMM kernel leads to significant improvements in the quality of the generated samples when the number of sampling steps is small, as measured by FID and IS metrics. For example, on ImageNet 256x256 with 10 sampling steps, we achieve an FID of 6.94 and an IS of 207.85 with a GMM kernel, compared to 10.15 and 196.73, respectively, with a Gaussian kernel.
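The idea of constraining GMM parameters so that the mixture reproduces a prescribed mean and variance can be illustrated in one dimension. The sketch below is a generic moment-matching example, not the paper's parameterisation: the symmetric two-component form, the `offset` parameter, and the function names are assumptions for illustration.

```python
import numpy as np

def matched_two_component_gmm(mean, variance, offset):
    """Equal-weight 1-D GMM whose overall mean and variance match the targets.

    Components sit at mean - offset and mean + offset. The mixture variance
    is comp_var + offset**2, so comp_var is chosen to absorb the remainder.
    """
    if offset ** 2 >= variance:
        raise ValueError("offset**2 must be smaller than the target variance")
    weights = np.array([0.5, 0.5])
    means = np.array([mean - offset, mean + offset])
    comp_var = variance - offset ** 2
    return weights, means, np.array([comp_var, comp_var])

def mixture_moments(weights, means, variances):
    """Analytic mean and variance of a 1-D Gaussian mixture."""
    mix_mean = np.sum(weights * means)
    second_moment = np.sum(weights * (variances + means ** 2))
    return mix_mean, second_moment - mix_mean ** 2

w, m, v = matched_two_component_gmm(mean=1.5, variance=4.0, offset=1.0)
gmm_mean, gmm_var = mixture_moments(w, m, v)
```

The mixture keeps the first two moments of the target Gaussian while the offset remains free. In a sampler, that remaining freedom is what would let a GMM kernel differ from the Gaussian one.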
Article 1000
Title@2025-07-18 (5): Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts
Title: Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts | Erklärbare KI in der Genomik: Transkriptionsfaktor Bindung Site Prediction mit Mischung von Experten | 在基因组学中可解释的AI:与专家混合的转移要素约束性现场预测 2507.09754v2 |
Authors (5): Aakash Tripathi, Ian E. Nielsen, Muhammad Umer, Ravi P. Ramachandran, Ghulam Rasool
Transcription Factor Binding Site (TFBS) prediction is crucial for understanding gene regulation and various biological processes. This study introduces a novel Mixture of Experts (MoE) approach for TFBS prediction, integrating multiple pre-trained Convolutional Neural Network (CNN) models, each specializing in different TFBS patterns. We evaluate the performance of our MoE model against individual expert models on both in-distribution and out-of-distribution (OOD) datasets, using six randomly selected transcription factors (TFs) for OOD testing. Our results demonstrate that the MoE model achieves competitive or superior performance across diverse TF binding sites, particularly excelling in OOD scenarios. The Analysis of Variance (ANOVA) statistical test confirms the significance of these performance differences. Additionally, we introduce ShiftSmooth, a novel attribution mapping technique that provides more robust model interpretability by considering small shifts in input sequences. Through comprehensive explainability analysis, we show that ShiftSmooth offers superior attribution for motif discovery and localization compared to traditional Vanilla Gradient methods. Our work presents an efficient, generalizable, and interpretable solution for TFBS prediction, potentially enabling new discoveries in genome biology and advancing our understanding of transcriptional regulation.
nan
Article 1001
Title@2025-07-18 (5): A Survey of Dimension Estimation Methods
Title: A Survey of Dimension Estimation Methods | Ein Überblick über die Dimensionsschätzungsmethoden | 尺寸估计方法调查 2507.13887v1 |
Authors (5): James A. D. Binnie, Paweł Dłotko, John Harvey, Jakub Malinowski, Ka Man Yim
It is a standard assumption that datasets in high dimension have an internal structure which means that they in fact lie on, or near, subsets of a lower dimension. In many instances it is important to understand the real dimension of the data, hence the complexity of the dataset at hand. A great variety of dimension estimators have been developed to find the intrinsic dimension of the data but there is little guidance on how to reliably use these estimators. This survey reviews a wide range of dimension estimation methods, categorising them by the geometric information they exploit: tangential estimators which detect a local affine structure; parametric estimators which rely on dimension-dependent probability distributions; and estimators which use topological or metric invariants. The paper evaluates the performance of these methods, as well as investigating varying responses to curvature and noise. Key issues addressed include robustness to hyperparameter selection, sample size requirements, accuracy in high dimensions, precision, and performance on non-linear geometries. In identifying the best hyperparameters for benchmark datasets, overfitting is frequent, indicating that many estimators may not generalise well beyond the datasets on which they have been tested.
nan
Article 1002
Title@2025-07-18 (5): Safety Certification in the Latent space using Control Barrier Functions and World Models
Title: Safety Certification in the Latent space using Control Barrier Functions and World Models | Sicherheitszertifizierung im Latent-Raum mit Control Barrier-Funktionen und Weltmodellen | 利用控制障碍功能和世界模型对低端空间使用控制障碍功能和世界模型进行安全认证 2507.13871v1 |
Authors (2): Mehul Anand, Shishir Kolathaya
Synthesising safe controllers from visual data typically requires extensive supervised labelling of safety-critical data, which is often impractical in real-world settings. Recent advances in world models enable reliable prediction in latent spaces, opening new avenues for scalable and data-efficient safe control. In this work, we introduce a semi-supervised framework that leverages control barrier certificates (CBCs) learned in the latent space of a world model to synthesise safe visuomotor policies. Our approach jointly learns a neural barrier function and a safe controller using limited labelled data, while exploiting the predictive power of modern vision transformers for latent dynamics modelling.
nan
Article 1003
Title@2025-07-18 (5): Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models
Title: Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models | Auf dem Weg zu wissenschaftlicher Entdeckung mit Wörterbuch-Lernen: Gewinnung biologischer Konzepte aus Mikroskopie-Stiftungsmodellen | 以字典学习实现科学发现:从显微镜基础模型中提取生物概念 2412.16247v3 |
Authors (7): Konstantin Donhauser, Kristina Ulicna, Gemma Elyse Moran, Aditya Ravuri, Kian Kenyon-Dean, Cian Eastwood, Jason Hartford
Sparse dictionary learning (DL) has emerged as a powerful approach to extract semantically meaningful concepts from the internals of large language models (LLMs) trained mainly in the text domain. In this work, we explore whether DL can extract meaningful concepts from less human-interpretable scientific data, such as vision foundation models trained on cell microscopy images, where limited prior knowledge exists about which high-level concepts should arise. We propose a novel combination of a sparse DL algorithm, Iterative Codebook Feature Learning (ICFL), with a PCA whitening pre-processing step derived from control data. Using this combined approach, we successfully retrieve biologically meaningful concepts, such as cell types and genetic perturbations. Moreover, we demonstrate how our method reveals subtle morphological changes arising from human-interpretable interventions, offering a promising new direction for scientific discovery via mechanistic interpretability in bioimaging.
nan
Article 1004
Title@2025-07-18 (5): Recalibrating binary probabilistic classifiers
Title: Recalibrating binary probabilistic classifiers | Rekalibrierung von binären probabilistischen Klassifikatoren | 重新计算二进制概率分解器 2505.19068v2 |
Authors (1): Dirk Tasche
Recalibration of binary probabilistic classifiers to a target prior probability is an important task in areas like credit risk management. We analyse methods for recalibration from a distribution shift perspective. Distribution shift assumptions linked to the area under the curve (AUC) of a probabilistic classifier are found to be useful for the design of meaningful recalibration methods. Two new methods called parametric covariate shift with posterior drift (CSPD) and ROC-based quasi moment matching (QMM) are proposed and tested together with some other methods in an example setting. The outcomes of the test suggest that the QMM methods discussed in the paper can provide appropriately conservative results in evaluations with concave functionals, such as risk weight functions for credit risk.
nan
Article 1005
Title@2025-07-18 (5): Towards Regulated Deep Learning
Title: Towards Regulated Deep Learning | Auf dem Weg zu reguliertem Deep Learning | 走向监管的深学习 1912.13122v8 |
Authors (1): Andrés García-Camino
The regulation of Multi-Agent Systems (MAS) and Declarative Electronic Institutions (DEIs) was a multidisciplinary research topic of the past decade that involved (Physical and Software) Agents and Law from the beginning, and since 2016 it has evolved towards what the news has called the Robot Lawyer. One of the first proposals for restricting the behaviour of Software Agents was Electronic Institutions. However, with the recent reformulation of Artificial Neural Networks (ANNs) as Deep Learning (DL), Security, Privacy, Ethical and Legal issues regarding the use of DL have raised concerns in the Artificial Intelligence (AI) community. Now that the Regulation of MAS is almost correctly addressed, we propose the Regulation of Artificial Neural Networks as Agent-based Training of a special type of regulated Artificial Neural Network that we call Institutional Neural Network (INN). The main purpose of this paper is to bring attention to Artificial Teaching (AT) and to give a tentative answer showing a proof-of-concept implementation of Regulated Deep Learning (RDL). This paper introduces the former concept and provides $I^*$, a language previously used to model declaratively and extend Electronic Institutions, as a means to regulate the execution of Artificial Neural Networks and their interactions with Artificial Teachers (ATs).
nan
Article 1006
Title@2025-07-18 (5): VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting
Title: VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting | VA-MoE: Variables-Adaptive Mischung von Experten für inkrementale Wettervorhersage | VA-MoE:增量天气预报专家可变适应混合 2412.02503v2 |
Authors (7): Hao Chen, Han Tao, Guo Song, Jie Zhang, Yunlong Yu, Yonghan Dong, Lei Bai
This paper presents Variables Adaptive Mixture of Experts (VAMoE), a novel framework for incremental weather forecasting that dynamically adapts to evolving spatiotemporal patterns in real-time data. Traditional weather prediction models often struggle with exorbitant computational expenditure and the need to continuously update forecasts as new observations arrive. VAMoE addresses these challenges by leveraging a hybrid architecture of experts, where each expert specializes in capturing distinct subpatterns of atmospheric variables (temperature, humidity, wind speed). Moreover, the proposed method employs a variable-adaptive gating mechanism to dynamically select and combine relevant experts based on the input context, enabling efficient knowledge distillation and parameter sharing. This design significantly reduces computational overhead while maintaining high forecast accuracy. Experiments on the real-world ERA5 dataset demonstrate that VAMoE performs comparably to SoTA models in both short-term (1 day) and long-term (5 days) forecasting tasks, with only about 25% of the trainable parameters and 50% of the initial training data.
nan
Article 1007
Title@2025-07-18 (5): Linearized Diffusion Map
Title: Linearized Diffusion Map | Linearisierte Diffusionskarte | 线状扩散地图 2507.14257v1 |
Authors (1): Julio Candanedo
We introduce the Linearized Diffusion Map (LDM), a novel linear dimensionality reduction method constructed via a linear approximation of the diffusion-map kernel. LDM integrates the geometric intuition of diffusion-based nonlinear methods with the computational simplicity, efficiency, and interpretability inherent in linear embeddings such as PCA and classical MDS. Through comprehensive experiments on synthetic datasets (Swiss roll and hyperspheres) and real-world benchmarks (MNIST and COIL-20), we illustrate that LDM captures distinct geometric features of datasets compared to PCA, offering complementary advantages. Specifically, LDM embeddings outperform PCA in datasets exhibiting explicit manifold structures, particularly in high-dimensional regimes, whereas PCA remains preferable in scenarios dominated by variance or noise. Furthermore, the complete positivity of LDM’s kernel matrix allows direct applicability of Non-negative Matrix Factorization (NMF), suggesting opportunities for interpretable latent-structure discovery. Our analysis positions LDM as a valuable new linear dimensionality reduction technique with promising theoretical and practical extensions.
nan
Article 1008
Title@2025-07-18 (5): Load Forecasting for Households and Energy Communities: Are Deep Learning Models Worth the Effort?
Title: Load Forecasting for Households and Energy Communities: Are Deep Learning Models Worth the Effort? | Lastprognosen für Haushalte und Energiegemeinschaften: Sind Deep-Learning-Modelle die Mühe wert? | 家庭和能源界的负载预测:深层学习模式值得努力吗? 2501.05000v5 |
Authors (7): Lukas Moosbrugger, Valentin Seiler, Philipp Wohlgenannt, Sebastian Hegenbart, Sashko Ristov, Elias Eder, Peter Kepplinger
Energy communities (ECs) play a key role in enabling local demand shifting and enhancing self-sufficiency, as energy systems transition toward decentralized structures with high shares of renewable generation. To operate them optimally, accurate short-term load forecasting is essential, particularly for implementing demand-side management strategies. With the recent rise of deep learning methods, data-driven forecasting has gained significant attention; however, it remains insufficiently explored in many practical contexts. Therefore, this study evaluates the effectiveness of state-of-the-art deep learning models (including LSTM, xLSTM, and Transformer architectures) compared to traditional benchmarks such as K-Nearest Neighbors (KNN) and persistence forecasting, across varying community size, historical data availability, and model complexity. Additionally, we assess the benefits of transfer learning using publicly available synthetic load profiles. On average, transfer learning improves the normalized mean absolute error by 1.97 percentage points when only two months of training data are available. Interestingly, for less than six months of training data, simple persistence models outperform deep learning architectures in forecast accuracy. The practical value of improved forecasting is demonstrated using a mixed-integer linear programming optimization for ECs with a shared battery energy storage system. For an energy community with 50 households, the most accurate deep learning model achieves an average reduction in financial energy costs of 8.06%. Notably, a simple KNN approach achieves average savings of 8.01%, making it a competitive and robust alternative. All implementations are publicly available to facilitate reproducibility. These findings offer actionable insights for ECs, and they highlight when the additional complexity of deep learning is warranted by performance gains.
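To make the persistence baseline and the error metric mentioned in the abstract above concrete, here is a minimal sketch. It assumes that persistence means repeating the most recent full period of load values, and that the normalized MAE is the MAE divided by the mean actual load; the paper may define both differently, and the numbers are invented.

```python
def persistence_forecast(history, horizon, period):
    """Repeat the last `period` observations to cover `horizon` steps."""
    if len(history) < period:
        raise ValueError("need at least one full period of history")
    last = history[-period:]
    return [last[i % period] for i in range(horizon)]


def nmae(actual, predicted):
    """Mean absolute error divided by the mean actual load (assumed form)."""
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return mae / (sum(actual) / len(actual))


# Hypothetical load values with a period of three time steps.
history = [1.0, 2.0, 3.0, 1.0, 2.0, 4.0]
forecast = persistence_forecast(history, horizon=3, period=3)
actual = [1.0, 3.0, 4.0]
error = nmae(actual, forecast)
print(forecast, error)
```

A baseline this cheap is a useful yardstick: any learned model has to beat it on the same split before its extra complexity is justified.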
nan
Article 1009
Title@2025-07-18 (5): Conformal Data Contamination Tests for Trading or Sharing of Data
Title: Conformal Data Contamination Tests for Trading or Sharing of Data | Konforme Datenkontaminationstests für den Handel oder die Weitergabe von Daten | 交换或分享数据的非正式数据污染测试 2507.13835v1 |
Authors (4): Martin V. Vejling, Shashi Raj Pandey, Christophe A. N. Biscio, Petar Popovski
The amount of quality data in many machine learning tasks is limited to what is available locally to data owners. The set of quality data can be expanded through trading or sharing with external data agents. However, data buyers need quality guarantees before purchasing, as external data may be contaminated or irrelevant to their specific learning task. Previous works primarily rely on distributional assumptions about data from different agents, relegating quality checks to post-hoc steps involving costly data valuation procedures. We propose a distribution-free, contamination-aware data-sharing framework that identifies external data agents whose data is most valuable for model personalization. To achieve this, we introduce novel two-sample testing procedures, grounded in rigorous theoretical foundations for conformal outlier detection, to determine whether an agent’s data exceeds a contamination threshold. The proposed tests, termed conformal data contamination tests, remain valid under arbitrary contamination levels while enabling false discovery rate control via the Benjamini-Hochberg procedure. Empirical evaluations across diverse collaborative learning scenarios demonstrate the robustness and effectiveness of our approach. Overall, the conformal data contamination test distinguishes itself as a generic procedure for aggregating data with statistically rigorous quality guarantees.
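The Benjamini-Hochberg step named in the abstract above can be sketched as follows. This is the textbook procedure applied to an arbitrary list of p-values; how the paper computes its conformal p-values per data agent is not shown here, and the example values are made up.

```python
def benjamini_hochberg(p_values, alpha):
    """Return the indices of hypotheses rejected at FDR level `alpha`."""
    m = len(p_values)
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under the step-up threshold.
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Hypothetical p-values, one per external data agent.
p_values = [0.001, 0.039, 0.025, 0.5, 0.012]
rejected = benjamini_hochberg(p_values, alpha=0.05)
print(rejected)
```

Every hypothesis ranked at or below the cutoff is rejected, including any whose own p-value sits above its individual threshold, which is what makes this a step-up procedure.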
nan
Article 1010
Title@2025-07-18 (5): Scalable Submodular Policy Optimization via Pruned Submodularity Graph
Title: Scalable Submodular Policy Optimization via Pruned Submodularity Graph | Skalierbare submodulare Optimierung der Politik über Pruned Submodularity Graph | 通过审慎次模块图实现可缩放子模块政策优化 2507.13834v1 |
Authors (3): Aditi Anand, Suman Banerjee, Dildar Ali
In Reinforcement Learning (RL), an agent interacts with the environment via a set of possible actions, and a reward is generated from some unknown distribution. The task is to find an optimal set of actions such that the reward after a certain time step is maximized. In a traditional setup, the reward function in an RL problem is considered additive. In reality, however, there are many problems, including path planning and coverage control, in which the reward function exhibits diminishing returns and can be modeled as a submodular function. In this paper, we study a variant of the RL problem where the reward function is submodular, and our objective is to find an optimal policy such that this reward function is maximized. We propose a pruned submodularity graph-based approach that provides a provably approximate solution in a feasible computation time. The proposed approach has been analyzed to understand its time and space requirements as well as its performance guarantee. We have experimented with a benchmark agent-environment setup that has been used in similar previous studies, and the results are reported. From the results, we observe that the policy obtained by our proposed approach leads to more reward than the baseline methods.
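The diminishing-returns property behind a submodular reward can be illustrated with a small coverage example. The sketch below uses a plain greedy selection, not the pruned submodularity graph method of the paper, and the region names and contents are hypothetical.

```python
def coverage(selected, regions):
    """Coverage reward: number of distinct cells covered by the selection."""
    covered = set()
    for name in selected:
        covered |= regions[name]
    return len(covered)


def marginal_gain(item, selected, regions):
    """Extra reward from adding `item` to the already selected items."""
    return coverage(selected + [item], regions) - coverage(selected, regions)


def greedy(regions, budget):
    """Pick up to `budget` regions, each time taking the largest marginal gain."""
    chosen = []
    for _ in range(budget):
        candidates = [r for r in sorted(regions) if r not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda r: marginal_gain(r, chosen, regions))
        if marginal_gain(best, chosen, regions) == 0:
            break
        chosen.append(best)
    return chosen


regions = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}
gain_alone = marginal_gain("b", [], regions)      # "b" added to nothing
gain_after = marginal_gain("b", ["a"], regions)   # "b" added after "a"
plan = greedy(regions, budget=2)
print(gain_alone, gain_after, plan)
```

Adding "b" is worth less once "a" is already selected, because the two regions overlap; an additive reward could not express that.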
nan
Article 1011
Title@2025-07-18 (5): Question-Answer Extraction from Scientific Articles Using Knowledge Graphs and Large Language Models
Title: Question-Answer Extraction from Scientific Articles Using Knowledge Graphs and Large Language Models | Frage-Antwort-Extraktion aus wissenschaftlichen Artikeln mit Wissensgraphen und großen Sprachmodellen | 利用知识图和大语言模型从科学文章中提取问题答案 2507.13827v1 |
Authors (6): Hosein Azarbonyad, Zi Long Zhu, Georgios Cheirmpos, Zubair Afzal, Vikrant Yadav, Georgios Tsatsaronis
When deciding to read an article or incorporate it into their research, scholars often seek to quickly identify and understand its main ideas. In this paper, we aim to extract these key concepts and contributions from scientific articles in the form of Question and Answer (QA) pairs. We propose two distinct approaches for generating QAs. The first approach involves selecting salient paragraphs, using a Large Language Model (LLM) to generate questions, ranking these questions by the likelihood of obtaining meaningful answers, and subsequently generating answers. This method relies exclusively on the content of the articles. However, assessing an article’s novelty typically requires comparison with the existing literature. Therefore, our second approach leverages a Knowledge Graph (KG) for QA generation. We construct a KG by fine-tuning an Entity Relationship (ER) extraction model on scientific articles and using it to build the graph. We then employ a salient triplet extraction method to select the most pertinent ERs per article, utilizing metrics such as the centrality of entities based on a triplet TF-IDF-like measure. This measure assesses the saliency of a triplet based on its importance within the article compared to its prevalence in the literature. For evaluation, we generate QAs using both approaches and have them assessed by Subject Matter Experts (SMEs) through a set of predefined metrics to evaluate the quality of both questions and answers. Our evaluations demonstrate that the KG-based approach effectively captures the main ideas discussed in the articles. Furthermore, our findings indicate that fine-tuning the ER extraction model on our scientific corpus is crucial for extracting high-quality triplets from such documents.
nan
Article 1012
Title@2025-07-18 (5): Bridging Local and Global Knowledge via Transformer in Board Games
Title: Bridging Local and Global Knowledge via Transformer in Board Games | Überbrückung von lokalem und globalem Wissen über Transformer in Brettspielen | 通过棋盘运动会变换器连接地方和全球知识 2410.05347v2 |
Authors (4): Yan-Ru Ju, Tai-Lin Wu, Chung-Chin Shih, Ti-Rong Wu
Although AlphaZero has achieved superhuman performance in board games, recent studies reveal its limitations in handling scenarios requiring a comprehensive understanding of the entire board, such as recognizing long-sequence patterns in Go. To address this challenge, we propose ResTNet, a network that interleaves residual and Transformer blocks to bridge local and global knowledge. ResTNet improves playing strength across multiple board games, increasing win rate from 54.6% to 60.8% in 9x9 Go, 53.6% to 60.9% in 19x19 Go, and 50.4% to 58.0% in 19x19 Hex. In addition, ResTNet effectively processes global information and tackles two long-sequence patterns in 19x19 Go: the circular pattern and the ladder pattern. It reduces the mean square error for circular pattern recognition from 2.58 to 1.07 and lowers the attack probability against an adversary program from 70.44% to 23.91%. ResTNet also improves ladder pattern recognition accuracy from 59.15% to 80.01%. By visualizing attention maps, we demonstrate that ResTNet captures critical game concepts in both Go and Hex, offering insights into AlphaZero’s decision-making process. Overall, ResTNet shows a promising approach to integrating local and global knowledge, paving the way for more effective AlphaZero-based algorithms in board games. Our code is available at https://rlg.iis.sinica.edu.tw/papers/restnet.
nan
Article 1013
Title@2025-07-18 (5): Demographic-aware fine-grained classification of pediatric wrist fractures
Title: Demographic-aware fine-grained classification of pediatric wrist fractures | Demografiebewusste feinkörnige Klassifizierung von pädiatrischen Handgelenkfrakturen | 人口意识小儿科手腕骨折细细细分分类 2507.12964v2 |
Authors (4): Ammar Ahmed, Ali Shariq Imran, Zenun Kastrati, Sher Muhammad Daudpota
Wrist pathologies are frequently observed, particularly among children who constitute the majority of fracture cases. However, diagnosing these conditions is time-consuming and requires specialized expertise. Computer vision presents a promising avenue, contingent upon the availability of extensive datasets, a notable challenge in medical imaging. Therefore, reliance solely on one modality, such as images, proves inadequate, especially in an era of diverse and plentiful data types. In this study, we employ a multifaceted approach to address the challenge of recognizing wrist pathologies using an extremely limited dataset. Initially, we approach the problem as a fine-grained recognition task, aiming to identify subtle X-ray pathologies that conventional CNNs overlook. Secondly, we enhance network performance by fusing patient metadata with X-ray images. Thirdly, rather than pre-training on a coarse-grained dataset like ImageNet, we utilize weights trained on a fine-grained dataset. While metadata integration has been used in other medical domains, this is a novel application for wrist pathologies. Our results show that a fine-grained strategy and metadata integration improve diagnostic accuracy by 2% with a limited dataset and by over 10% with a larger fracture-focused dataset.
nan
Article 1014
Title@2025-07-18 (5): XpertAI: uncovering regression model strategies for sub-manifolds
Title: XpertAI: uncovering regression model strategies for sub-manifolds | XpertAI: Aufdecken von Regressionsmodellstrategien für Submanifolds | XpertAI:发现次奴隶皮回归示范战略 2403.07486v4 |
Authors (3): Simon Letzgus, Klaus-Robert Müller, Grégoire Montavon
In recent years, Explainable AI (XAI) methods have facilitated profound validation and knowledge extraction from ML models. While extensively studied for classification, few XAI solutions have addressed the challenges specific to regression models. In regression, explanations need to be precisely formulated to address specific user queries (e.g., distinguishing between 'Why is the output above 0?' and 'Why is the output above 50?'). They should furthermore reflect the model’s behavior on the relevant data sub-manifold. In this paper, we introduce XpertAI, a framework that disentangles the prediction strategy into multiple range-specific sub-strategies and allows the formulation of precise queries about the model (the 'explanandum') as a linear combination of those sub-strategies. XpertAI is formulated generally to work alongside popular XAI attribution techniques, based on occlusion, gradient integration, or reverse propagation. Qualitative and quantitative results demonstrate the benefits of our approach.
nan
Article 1015
Title@2025-07-18 (5): DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs
Title: DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs | DP2Unlearning: Ein effizientes und garantiertes Unlearning Framework für LLMs | DP2重新学习:LLMM 高效和有保证的不学习框架 2504.13774v2 |
Authors (4): Tamim Al Mahmud, Najeeb Jebreel, Josep Domingo-Ferrer, David Sanchez
Large language models (LLMs) have recently revolutionized language processing tasks but have also brought ethical and legal issues. LLMs have a tendency to memorize potentially private or copyrighted information present in the training data, which might then be delivered to end users at inference time. When this happens, a naive solution is to retrain the model from scratch after excluding the undesired data. Although this guarantees that the target data have been forgotten, it is also prohibitively expensive for LLMs. Approximate unlearning offers a more efficient alternative, as it consists of ex post modifications of the trained model itself to prevent undesirable results, but it lacks forgetting guarantees because it relies solely on empirical evidence. In this work, we present DP2Unlearning, a novel LLM unlearning framework that offers formal forgetting guarantees at a significantly lower cost than retraining from scratch on the data to be retained. DP2Unlearning involves training LLMs on textual data protected using $\epsilon$-differential privacy (DP), which later enables efficient unlearning with the guarantees against disclosure associated with the chosen $\epsilon$. Our experiments demonstrate that DP2Unlearning achieves similar model performance post-unlearning, compared to an LLM retrained from scratch on retained data – the gold standard exact unlearning – but at approximately half the unlearning cost. In addition, with a reasonable computational cost, it outperforms approximate unlearning methods at both preserving the utility of the model post-unlearning and effectively forgetting the targeted information.
nan
Article 1016
Title@2025-07-18 (5): Equilibrium Propagation for Learning in Lagrangian Dynamical Systems
Title: Equilibrium Propagation for Learning in Lagrangian Dynamical Systems | Equilibrium Propagation für das Lernen in lagrangischen dynamischen Systemen | Lagrangian动态系统学习平衡促进平衡 2505.07363v3 |
Authors (1): Serge Massar
We propose a method for training dynamical systems governed by Lagrangian mechanics using Equilibrium Propagation. Our approach extends Equilibrium Propagation - initially developed for energy-based models - to dynamical trajectories by leveraging the principle of action extremization. Training is achieved by gently nudging trajectories toward desired targets and measuring how the variables conjugate to the parameters to be trained respond. This method is particularly suited to systems with periodic boundary conditions or fixed initial and final states, enabling efficient parameter updates without requiring explicit backpropagation through time. In the case of periodic boundary conditions, this approach yields the semiclassical limit of Quantum Equilibrium Propagation. Applications to systems with dissipation are also discussed.
nan
Article 1017
Title@2025-07-18 (5): DiffGradCAM: A Universal Class Activation Map Resistant to Adversarial Training
Title: DiffGradCAM: A Universal Class Activation Map Resistant to Adversarial Training | DiffGradCAM: Eine universelle Aktivierungskarte der Klasse, die dem adversarialen Training standhält | DiffGradCAM: 通用级启动地图抗反向培训 2506.08514v2 |
Authors (3): Jacob Piland, Chris Sweet, Adam Czajka
Class Activation Mapping (CAM) and its gradient-based variants (e.g., GradCAM) have become standard tools for explaining Convolutional Neural Network (CNN) predictions. However, these approaches typically focus on individual logits, while for neural networks using softmax, the class membership probability estimates depend only on the differences between logits, not on their absolute values. This disconnect leaves standard CAMs vulnerable to adversarial manipulation, such as passive fooling, where a model is trained to produce misleading CAMs without affecting decision performance. We introduce Salience-Hoax Activation Maps (SHAMs), an entropy-aware form of passive fooling that serves as a benchmark for CAM robustness under adversarial conditions. To address the passive fooling vulnerability, we then propose DiffGradCAM, a novel, lightweight, and contrastive approach to class activation mapping that is not susceptible to passive fooling and also matches the output of standard CAM methods such as GradCAM in the non-adversarial case. Together, SHAM and DiffGradCAM establish a new framework for probing and improving the robustness of saliency-based explanations. We validate both contributions across multi-class tasks with few and many classes.
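The observation in the abstract above that softmax probabilities depend only on logit differences can be checked numerically. The sketch below is a generic softmax, not code from the paper, and the logit values are arbitrary.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the largest logit first."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 0.5, -1.0]
# Adding the same constant to every logit leaves all differences unchanged.
shifted = [z + 10.0 for z in logits]

probs = softmax(logits)
probs_shifted = softmax(shifted)
print(probs, probs_shifted)
```

Since a common shift of all logits cannot change the prediction, an explanation built from a single logit can move while the model's decision stays exactly the same.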
nan
Article 1018
Title@2025-07-18 (5): On-the-Fly Fine-Tuning of Foundational Neural Network Potentials: A Bayesian Neural Network Approach
Title: On-the-Fly Fine-Tuning of Foundational Neural Network Potentials: A Bayesian Neural Network Approach | On-the-Fly Fine-Tuning von Grundlagen-Neural-Netzwerk-Potenziale: Ein bayesischer Neural-Netzwerk-Ansatz | 基础神经网络潜力的实时微调调整:贝耶斯神经网络方法 2507.13805v1 |
Authors (3): Tim Rensmeyer, Denis Kramer, Oliver Niggemann
Due to the computational complexity of evaluating interatomic forces from first principles, the creation of interatomic machine learning force fields has become a highly active field of research. However, the generation of training datasets of sufficient size and sample diversity itself comes with a computational burden that can make this approach impractical for modeling rare events or systems with a large configuration space. Fine-tuning foundation models that have been pre-trained on large-scale material or molecular databases offers a promising opportunity to reduce the amount of training data necessary to reach a desired level of accuracy. However, even if this approach requires less training data overall, creating a suitable training dataset can still be a very challenging problem, especially for systems with rare events and for end-users who don’t have an extensive background in machine learning. In on-the-fly learning, the creation of a training dataset can be largely automated by using model uncertainty during the simulation to decide if the model is accurate enough or if a structure should be recalculated with classical methods and used to update the model. A key challenge for applying this form of active learning to the fine-tuning of foundation models is how to assess the uncertainty of those models during the fine-tuning process, even though most foundation models lack any form of uncertainty quantification. In this paper, we overcome this challenge by introducing a fine-tuning approach based on Bayesian neural network methods and a subsequent on-the-fly workflow that automatically fine-tunes the model while maintaining a pre-specified accuracy and can detect rare events such as transition states and sample them at an increased rate relative to their occurrence.
nan
Article 1019
Title@2025-07-18 (5): Exploiting Label Skewness for Spiking Neural Networks in Federated Learning
Title: Exploiting Label Skewness for Spiking Neural Networks in Federated Learning | Ausnutzung von Label Skewness für spikende neurale Netzwerke im Federated Learning | 利用Label Sskwonence 用于联邦学习联盟的Spiking神经网络 2412.17305v3 |
Authors (5): Di Yu, Xin Du, Linshan Jiang, Huijing Zhang, Shuiguang Deng
The energy efficiency of deep spiking neural networks (SNNs) aligns with the constraints of resource-limited edge devices, positioning SNNs as a promising foundation for intelligent applications leveraging the extensive data collected by these devices. To address data privacy concerns when deploying SNNs on edge devices, federated learning (FL) facilitates collaborative model training by leveraging data distributed across edge devices without transmitting local data to a central server. However, existing FL approaches struggle with label-skewed data across devices, which leads to drift in local SNN models and degrades the performance of the global SNN model. In this paper, we propose a novel framework called FedLEC, which incorporates intra-client label weight calibration to balance the learning intensity across local labels and inter-client knowledge distillation to mitigate local SNN model bias caused by label absence. Extensive experiments with three differently structured SNNs across five datasets (i.e., three non-neuromorphic and two neuromorphic datasets) demonstrate the efficiency of FedLEC. Compared to eight state-of-the-art FL algorithms, FedLEC achieves an average accuracy improvement of approximately 11.59% for the global SNN model under various label skew distribution settings.
nan
Article 1020
Title@2025-07-18 (5): Feature Engineering is Not Dead: Reviving Classical Machine Learning with Entropy, HOG, and LBP Feature Fusion for Image Classification
Title: Feature Engineering is Not Dead: Reviving Classical Machine Learning with Entropy, HOG, and LBP Feature Fusion for Image Classification | Feature Engineering is Not Dead: Wiederbelebung des klassischen maschinellen Lernens mit Entropie, HOG und LBP-Feature Fusion für die Bildklassifizierung | 特色工程没有死:恢复古典机器学习与英音、HOG和LBP图像分类的特征融合 2507.13772v1 |
Authors (6): Abhijit Sen, Giridas Maiti, Bikram K. Parida, Bhanu P. Mishra, Mahima Arya, Denys I. Bondar
Feature engineering continues to play a critical role in image classification, particularly when interpretability and computational efficiency are prioritized over deep learning models with millions of parameters. In this study, we revisit classical machine-learning-based image classification through a novel approach centered on Permutation Entropy (PE), a robust and computationally lightweight measure traditionally used in time series analysis but rarely applied to image data. We extend PE to two-dimensional images and propose a multiscale, multi-orientation entropy-based feature extraction approach that characterizes spatial order and complexity along rows, columns, diagonals, anti-diagonals, and local patches of the image. To enhance the discriminatory power of the entropy features, we integrate two classic image descriptors: the Histogram of Oriented Gradients (HOG) to capture shape and edge structure, and Local Binary Patterns (LBP) to encode micro-texture of an image. The resulting hand-crafted feature set, comprising 780 dimensions, is used to train Support Vector Machine (SVM) classifiers optimized through grid search. The proposed approach is evaluated on multiple benchmark datasets, including Fashion-MNIST, KMNIST, EMNIST, and CIFAR-10, where it delivers competitive classification performance without relying on deep architectures. Our results demonstrate that the fusion of PE with HOG and LBP provides a compact, interpretable, and effective alternative to computationally expensive and less interpretable deep learning models. This shows the potential of entropy-based descriptors in image classification and contributes a lightweight and generalizable solution to interpretable machine learning in image classification and computer vision.
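As a rough illustration of the row-wise entropy features described in this abstract, the sketch below computes normalized permutation entropy for each row of a small image. The function names, the default `order` and `delay` values, and the toy image are assumptions made for illustration; the paper's multiscale, multi-orientation pipeline is not reproduced here.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, delay=1):
    """Normalized permutation entropy of a 1-D sequence, in [0, 1]."""
    n_patterns = len(series) - (order - 1) * delay
    if n_patterns <= 0:
        raise ValueError("series too short for the chosen order and delay")
    counts = Counter()
    for start in range(n_patterns):
        window = [series[start + k * delay] for k in range(order)]
        # Ordinal pattern: window positions sorted by value (ties keep position order).
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] += 1
    probs = [c / n_patterns for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    # Divide by log(order!) so that a uniform pattern distribution gives 1.
    return entropy / math.log(math.factorial(order))


def row_entropies(image, order=3, delay=1):
    """One entropy feature per image row (image is a list of pixel rows)."""
    return [permutation_entropy(row, order, delay) for row in image]


# Hypothetical 2x7 grayscale image: a smooth ramp and an irregular row.
toy_image = [
    [1, 2, 3, 4, 5, 6, 7],
    [4, 7, 9, 10, 6, 11, 3],
]
features = row_entropies(toy_image)
```

The same function could be applied to columns or diagonals by extracting those pixel sequences first; a monotonic row yields zero entropy because only one ordinal pattern occurs.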
Article 1021
Title@2025-07-18 (5): Geometry-Informed Neural Networks
Title: Geometry-Informed Neural Networks | Geometrie-informierte Neuronale Netzwerke | 几何内建神经网络 2402.14009v4 |
Authors (6): Arturs Berzins, Andreas Radler, Eric Volkmann, Sebastian Sanokowski, Sepp Hochreiter, Johannes Brandstetter
Geometry is a ubiquitous tool in computer graphics, design, and engineering. However, the lack of large shape datasets limits the application of state-of-the-art supervised learning methods and motivates the exploration of alternative learning strategies. To this end, we introduce geometry-informed neural networks (GINNs) – a framework for training shape-generative neural fields without data by leveraging user-specified design requirements in the form of objectives and constraints. By adding diversity as an explicit constraint, GINNs avoid mode-collapse and can generate multiple diverse solutions, often required in geometry tasks. Experimentally, we apply GINNs to several problems spanning physics, geometry, and engineering design, showing control over geometrical and topological properties, such as surface smoothness or the number of holes. These results demonstrate the potential of training shape-generative models without data, paving the way for new generative design approaches without large datasets.
Article 1022
Title@2025-07-18 (5): Insights into a radiology-specialised multimodal large language model with sparse autoencoders
Title: Insights into a radiology-specialised multimodal large language model with sparse autoencoders | Einblicke in ein radiologisch spezialisiertes multimodales Großsprachmodell mit spärlichen Autoencodern | 深入观察放射学专门化多式联运大型语言模型,无甚多的自动编码器 2507.12950v2 |
Authors (7): Kenza Bouzid, Shruthi Bannur, Felix Meissen, Daniel Coelho de Castro, Anton Schwaighofer, Javier Alvarez-Valle, Stephanie L. Hyland
Interpretability can improve the safety, transparency and trust of AI models, which is especially important in healthcare applications where decisions often carry significant consequences. Mechanistic interpretability, particularly through the use of sparse autoencoders (SAEs), offers a promising approach for uncovering human-interpretable features within large transformer-based models. In this study, we apply Matryoshka-SAE to the radiology-specialised multimodal large language model, MAIRA-2, to interpret its internal representations. Using large-scale automated interpretability of the SAE features, we identify a range of clinically relevant concepts - including medical devices (e.g., line and tube placements, pacemaker presence), pathologies such as pleural effusion and cardiomegaly, longitudinal changes and textual features. We further examine the influence of these features on model behaviour through steering, demonstrating directional control over generations with mixed success. Our results reveal practical and methodological challenges, yet they offer initial insights into the internal concepts learned by MAIRA-2 - marking a step toward deeper mechanistic understanding and interpretability of a radiology-adapted multimodal large language model, and paving the way for improved model transparency. We release the trained SAEs and interpretations: https://huggingface.co/microsoft/maira-2-sae.
Article 1023
Title@2025-07-18 (5): Dual-Center Graph Clustering with Neighbor Distribution
Title: Dual-Center Graph Clustering with Neighbor Distribution | Dual-Center Graph Clustering mit Nachbarschaftsverteilung | 与邻居分布相邻的双中心图集 2507.13765v1 |
Authors (5): Enhao Cheng, Shoujia Zhang, Jianhua Yin, Li Jin, Liqiang Nie
Graph clustering is crucial for unraveling intricate data structures, yet it presents significant challenges due to its unsupervised nature. Recently, goal-directed clustering techniques have yielded impressive results, with contrastive learning methods leveraging pseudo-labels garnering considerable attention. Nonetheless, pseudo-labels are an unreliable supervision signal, and existing goal-directed approaches utilize only features to construct a single-target distribution for single-center optimization, which leads to incomplete and less dependable guidance. In our work, we propose a novel Dual-Center Graph Clustering (DCGC) approach based on neighbor distribution properties, which includes representation learning with neighbor distribution and dual-center optimization. Specifically, we utilize neighbor distribution as a supervision signal to mine hard negative samples in contrastive learning, which is reliable and enhances the effectiveness of representation learning. Furthermore, a neighbor distribution center is introduced alongside the feature center to jointly construct a dual-target distribution for dual-center optimization. Extensive experiments and analysis demonstrate the superior performance and effectiveness of our proposed method.
Article 1024
Title@2025-07-18 (5): Learning to Reject Low-Quality Explanations via User Feedback
Title: Learning to Reject Low-Quality Explanations via User Feedback | Lernen, Low-Quality-Erklärungen per User Feedback abzulehnen | 通过用户反馈学习拒绝低质量解释 2507.12900v2 |
Authors (4): Luca Stradiotti, Dario Pesenti, Stefano Teso, Jesse Davis
Machine Learning predictors are increasingly being employed in high-stakes applications such as credit scoring. Explanations help users unpack the reasons behind their predictions, but are not always “high quality”. That is, end-users may have difficulty interpreting or believing them, which can complicate trust assessment and downstream decision-making. We argue that classifiers should have the option to refuse handling inputs whose predictions cannot be explained properly and introduce a framework for learning to reject low-quality explanations (LtX) in which predictors are equipped with a rejector that evaluates the quality of explanations. In this problem setting, the key challenges are how to properly define and assess explanation quality and how to design a suitable rejector. Focusing on popular attribution techniques, we introduce ULER (User-centric Low-quality Explanation Rejector), which learns a simple rejector from human ratings and per-feature relevance judgments to mirror human judgments of explanation quality. Our experiments show that ULER outperforms both state-of-the-art and explanation-aware learning-to-reject strategies at LtX on eight classification and regression benchmarks and on a new human-annotated dataset, which we will publicly release to support future research.
Article 1025
Title@2025-07-18 (5): SIC: Similarity-Based Interpretable Image Classification with Neural Networks
Title: SIC: Similarity-Based Interpretable Image Classification with Neural Networks | SIC: Ähnlichkeitsbasierte Interpretierbare Bildklassifikation mit neuralen Netzwerken | SIC: 神经网络的基于相似性的解释性图像分类 2501.17328v3 |
Authors (4): Tom Nuno Wolf, Emre Kavak, Fabian Bongratz, Christian Wachinger
The deployment of deep learning models in critical domains necessitates a balance between high accuracy and interpretability. We introduce SIC, an inherently interpretable neural network that provides local and global explanations of its decision-making process. Leveraging the concept of case-based reasoning, SIC extracts class-representative support vectors from training images, ensuring they capture relevant features while suppressing irrelevant ones. Classification decisions are made by calculating and aggregating similarity scores between these support vectors and the input’s latent feature vector. We employ B-Cos transformations, which align model weights with inputs, to yield coherent pixel-level explanations in addition to global explanations of case-based reasoning. We evaluate SIC on three tasks: fine-grained classification on Stanford Dogs and FunnyBirds, multi-label classification on Pascal VOC, and pathology detection on the RSNA dataset. Results indicate that SIC not only achieves competitive accuracy compared to state-of-the-art black-box and inherently interpretable models but also offers insightful explanations verified through practical evaluation on the FunnyBirds benchmark. Our theoretical analysis proves that these explanations fulfill established axioms for explanations. Our findings underscore SIC’s potential for applications where understanding model decisions is as critical as the decisions themselves.
Article 1026
Title@2025-07-18 (5): Convolution-weighting method for the physics-informed neural network: A Primal-Dual Optimization Perspective
Title: Convolution-weighting method for the physics-informed neural network: A Primal-Dual Optimization Perspective | Convolution-Gewichtungsmethode für das physikinformierte neuronale Netzwerk: Eine primär-duale Optimierungsperspektive | 物理学-知情神经网络的革命加权法:原始-多极优化视角 2506.19805v2 |
Authors (2): Chenhao Si, Ming Yan
Physics-informed neural networks (PINNs) are extensively employed to solve partial differential equations (PDEs) by ensuring that the outputs and gradients of deep learning models adhere to the governing equations. However, constrained by computational limitations, PINNs are typically optimized using a finite set of points, which poses significant challenges in guaranteeing their convergence and accuracy. In this study, we propose a new weighting scheme that adaptively shifts the loss weights from isolated points to their continuous neighborhood regions. The empirical results show that our weighting scheme reduces the relative $L^2$ errors.
Article 1027
Title@2025-07-18 (5): A Simple Baseline for Stable and Plastic Neural Networks
Title: A Simple Baseline for Stable and Plastic Neural Networks | Eine einfache Basis für stabile und plastische Neuralnetze | 稳定神经网络和可塑神经网络的简单基线 2507.10637v2 |
Authors (3): Étienne Künzel, Achref Jaziri, Visvanathan Ramesh
Continual learning in computer vision requires that models adapt to a continuous stream of tasks without forgetting prior knowledge, yet existing approaches often tip the balance heavily toward either plasticity or stability. We introduce RDBP, a simple, low-overhead baseline that unites two complementary mechanisms: ReLUDown, a lightweight activation modification that preserves feature sensitivity while preventing neuron dormancy, and Decreasing Backpropagation, a biologically inspired gradient-scheduling scheme that progressively shields early layers from catastrophic updates. Evaluated on the Continual ImageNet benchmark, RDBP matches or exceeds the plasticity and stability of state-of-the-art methods while reducing computational cost. RDBP thus provides both a practical solution for real-world continual learning and a clear benchmark against which future continual learning strategies can be measured.
Article 1028
Title@2025-07-18 (5): Robustness Evaluation of Offline Reinforcement Learning for Robot Control Against Action Perturbations
Title: Robustness Evaluation of Offline Reinforcement Learning for Robot Control Against Action Perturbations | Robustheitsbewertung von Offline-Verstärkungslernen für die Robotersteuerung gegen Aktionsstörungen | 对用于控制机器人控制行动干扰的离线强化学习的强力评价 2412.18781v2 |
Authors (4): Shingo Ayabe, Takuto Otomo, Hiroshi Kera, Kazuhiko Kawamoto
Offline reinforcement learning, which learns solely from datasets without environmental interaction, has gained attention. This approach, similar to traditional online deep reinforcement learning, is particularly promising for robot control applications. Nevertheless, its robustness against real-world challenges, such as joint actuator faults in robots, remains a critical concern. This study evaluates the robustness of existing offline reinforcement learning methods using legged robots from OpenAI Gym based on average episodic rewards. For robustness evaluation, we simulate failures by incorporating both random and adversarial perturbations, representing worst-case scenarios, into the joint torque signals. Our experiments show that existing offline reinforcement learning methods exhibit significant vulnerabilities to these action perturbations and are more vulnerable than online reinforcement learning methods, highlighting the need for more robust approaches in this field.
Article 1029
Title@2025-07-18 (5): Search-Optimized Quantization in Biomedical Ontology Alignment
Title: Search-Optimized Quantization in Biomedical Ontology Alignment | Search-Optimierte Quantisierung in der biomedizinischen Ontologie Ausrichtung | 生物医学肿瘤协调方面的搜索优化定量化 2507.13742v1 |
Authors (2): Oussama Bouaggad, Natalia Grabar
In the fast-moving world of AI, as organizations and researchers develop more advanced models, they face challenges due to their sheer size and computational demands. Deploying such models on edge devices or in resource-constrained environments adds further challenges related to energy consumption, memory usage and latency. To address these challenges, emerging trends are shaping the future of efficient model optimization techniques. From this premise, by employing supervised state-of-the-art transformer-based models, this research introduces a systematic method for ontology alignment, grounded in cosine-based semantic similarity between a biomedical layman vocabulary and the Unified Medical Language System (UMLS) Metathesaurus. It leverages Microsoft Olive to search for target optimizations among different Execution Providers (EPs) using the ONNX Runtime backend, followed by an assembled process of dynamic quantization employing Intel Neural Compressor and IPEX (Intel Extension for PyTorch). Through our optimization process, we conduct extensive assessments on the two tasks from the DEFT 2020 Evaluation Campaign, achieving a new state-of-the-art in both. We retain performance metrics intact, while attaining an average inference speed-up of 20x and reducing memory usage by approximately 70%.
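The cosine-based alignment step mentioned in this abstract can be sketched as a nearest-neighbor lookup between two sets of embeddings. Everything below is illustrative: the function names are hypothetical, and the three-dimensional vectors are made-up stand-ins for transformer embeddings, not values from the paper or from UMLS.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors
    return dot / (norm_u * norm_v)


def align(source_vecs, target_vecs):
    """Map each source term to the target concept with the highest cosine similarity."""
    alignment = {}
    for term, vec in source_vecs.items():
        best = max(target_vecs, key=lambda c: cosine_similarity(vec, target_vecs[c]))
        alignment[term] = (best, cosine_similarity(vec, target_vecs[best]))
    return alignment


# Made-up toy embeddings (assumption): layman terms vs. clinical concepts.
layman = {
    "heart attack": [0.9, 0.1, 0.0],
    "high blood pressure": [0.1, 0.9, 0.2],
}
concepts = {
    "myocardial infarction": [1.0, 0.2, 0.0],
    "hypertension": [0.0, 1.0, 0.1],
}
matches = align(layman, concepts)
```

In practice the vectors would come from an encoder model, and a similarity threshold might be applied before accepting a match; neither detail is specified in the abstract.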
Article 1030
Title@2025-07-18 (5): SamGoG: A Sampling-Based Graph-of-Graphs Framework for Imbalanced Graph Classification
Title: SamGoG: A Sampling-Based Graph-of-Graphs Framework for Imbalanced Graph Classification | SamGoG: Ein stichprobenbasierter Graph-of-Graphs-Rahmen für eine unausgewogene Graphenklassifikation | SamGG: 以抽样为基础的图示图示图图示图分类框架 2507.13741v1 |
Authors (3): Shangyou Wang, Zezhong Ding, Xike Xie
Graph Neural Networks (GNNs) have shown remarkable success in graph classification tasks by capturing both structural and feature-based representations. However, real-world graphs often exhibit two critical forms of imbalance: class imbalance and graph size imbalance. These imbalances can bias the learning process and degrade model performance. Existing methods typically address only one type of imbalance or incur high computational costs. In this work, we propose SamGoG, a sampling-based Graph-of-Graphs (GoG) learning framework that effectively mitigates both class and graph size imbalance. SamGoG constructs multiple GoGs through an efficient importance-based sampling mechanism and trains on them sequentially. This sampling mechanism incorporates the learnable pairwise similarity and adaptive GoG node degree to enhance edge homophily, thus improving downstream model quality. SamGoG can seamlessly integrate with various downstream GNNs, enabling their efficient adaptation for graph classification tasks. Extensive experiments on benchmark datasets demonstrate that SamGoG achieves state-of-the-art performance with up to a 15.66% accuracy improvement with 6.7$\times$ training acceleration.
Article 1031
Title@2025-07-18 (5): Eye-tracked Virtual Reality: A Comprehensive Survey on Methods and Privacy Challenges
Title: Eye-tracked Virtual Reality: A Comprehensive Survey on Methods and Privacy Challenges | Virtual Reality: Eine umfassende Umfrage zu Methoden und Datenschutz-Herausforderungen | 双轨虚拟现实:关于方法和隐私挑战的全面调查 2305.14080v2 |
Authors (8): Efe Bozkir, Süleyman Özdel, Mengdi Wang, Brendan David-John, Hong Gao, Kevin Butler, Eakta Jain, Enkelejda Kasneci
The latest developments in computer hardware, sensor technologies, and artificial intelligence can make virtual reality (VR) and virtual spaces an important part of human everyday life. Eye tracking offers not only a hands-free way of interaction but also the possibility of a deeper understanding of human visual attention and cognitive processes in VR. Despite these possibilities, eye-tracking data also reveals users’ privacy-sensitive attributes when combined with the information about the presented stimulus. To address all these possibilities and potential privacy issues, in this survey, we first cover major works in eye tracking, VR, and privacy areas between 2012 and 2022. While eye tracking in the VR part covers the complete pipeline of eye-tracking methodology from pupil detection and gaze estimation to offline use of the data and analyses, as for privacy and security, we focus on eye-based authentication as well as computational methods to preserve the privacy of individuals and their eye-tracking data in VR. Later, considering all of these, we draw three main directions for the research community by focusing on privacy challenges. In summary, this survey provides an extensive literature review of the utmost possibilities with eye tracking in VR and the privacy implications of those possibilities.
Article 1032
Title@2025-07-18 (5): Honesty in Causal Forests: When It Helps and When It Hurts
Title: Honesty in Causal Forests: When It Helps and When It Hurts | Ehrlichkeit im Kausalwald: Wenn es hilft und wenn es weh tut | Causal森林中的诚实:当它帮助时,当它伤害时 2506.13107v2 |
Authors (2): Yanfang Hou, Carlos Fernández-Loría
Causal forests estimate how treatment effects vary across individuals, guiding personalized interventions in areas like marketing, operations, and public policy. A standard modeling practice with this method is honest estimation: dividing the data so that the subgroups used to model treatment effect variation are formed separately from the data used to estimate those effects. This is intended to reduce overfitting and is the default in many software packages. But is it always the right choice? In this paper, we show that honest estimation can reduce the accuracy of individual-level treatment effect estimates, especially when there are substantial differences in how individuals respond to treatment, and the data is rich enough to uncover those differences. The core issue is a classic bias-variance trade-off: honesty lowers the risk of overfitting but increases the risk of underfitting, because it limits the data available to detect patterns. Across 7,500 benchmark datasets, we find that the cost of using honesty by default can be as high as requiring 75% more data to match the performance of models trained without it. We argue that honesty is best understood as a form of regularization, and like any regularization choice, its use should be guided by out-of-sample performance, not adopted reflexively.
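The data-splitting idea behind honest estimation can be sketched with a single split (a "stump") rather than a full causal forest. This is a simplified illustration under stated assumptions: the function names, the candidate thresholds, and the noise-free toy data are all hypothetical, and real causal forest packages use more elaborate splitting and averaging.

```python
import random


def effect(rows):
    """Difference in mean outcome between treated and control rows (x, t, y)."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    if not treated or not control:
        return 0.0  # fallback when a leaf lacks one of the groups
    return sum(treated) / len(treated) - sum(control) / len(control)


def best_split(rows, candidates):
    """Pick the threshold on x that maximizes the gap between leaf effects."""
    def gap(c):
        left = [r for r in rows if r[0] <= c]
        right = [r for r in rows if r[0] > c]
        return abs(effect(left) - effect(right))
    return max(candidates, key=gap)


def fit_stump(rows, honest, candidates=(0.25, 0.5, 0.75), seed=0):
    """Return (threshold, left_effect, right_effect)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    if honest:
        # Honest: one half chooses the split, the other half estimates effects.
        half = len(rows) // 2
        structure, estimation = rows[:half], rows[half:]
    else:
        # Adaptive: the same rows choose the split and estimate effects.
        structure = estimation = rows
    threshold = best_split(structure, candidates)
    left = effect([r for r in estimation if r[0] <= threshold])
    right = effect([r for r in estimation if r[0] > threshold])
    return threshold, left, right


# Noise-free toy data: treatment raises y by 1 only when x > 0.5.
toy_rows = [(i / 40, i % 2, float(i % 2) if i / 40 > 0.5 else 0.0) for i in range(40)]
adaptive_fit = fit_stump(toy_rows, honest=False)
honest_fit = fit_stump(toy_rows, honest=True)
```

The honest variant uses only half of the rows for each task, which is the source of the trade-off the abstract describes: less overfitting of the effect estimates, but less data available to find the right split.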
Article 1033
Title@2025-07-18 (5): An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC
Title: An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC | Ein End-to-End DNN-Inferenz-Framework für den SpiNNaker2 Neuromorphic MPSoC | SpiNNaker2神经形态 MPSoC 的端到端 DNN 推理框架 2507.13736v1 |
Authors (6): Matthias Jobst, Tim Langer, Chen Liu, Mehmet Alici, Hector A. Gonzalez, Christian Mayr
This work presents a multi-layer DNN scheduling framework as an extension of OctopuScheduler, providing an end-to-end flow from PyTorch models to inference on a single SpiNNaker2 chip. Together with a front-end comprised of quantization and lowering steps, the proposed framework enables the edge-based execution of large and complex DNNs up to transformer scale using the neuromorphic platform SpiNNaker2.
Article 1034
Title@2025-07-18 (5): Prompt-Tuning Bandits: Enabling Few-Shot Generalization for Efficient Multi-Task Offline RL
Title: Prompt-Tuning Bandits: Enabling Few-Shot Generalization for Efficient Multi-Task Offline RL | Prompt-Tuning Bandits: Ermöglichung der wenigen scharfen Verallgemeinerung für effiziente Multi-Task Offline RL | 即时派遣强盗:为高效的多任务离线转线 2502.06358v3 |
Authors (4): Finn Rietz, Oleg Smirnov, Sara Karimi, Lele Cao
Prompting has emerged as the dominant paradigm for adapting large, pre-trained transformer-based models to downstream tasks. The Prompting Decision Transformer (PDT) enables large-scale, multi-task offline Reinforcement Learning (RL) pre-training by leveraging stochastic trajectory prompts to identify the target task. However, these prompts are sampled uniformly from expert demonstrations, overlooking a critical limitation: not all prompts are equally informative for differentiating between tasks. This limits generalization and adaptation, especially in low-data or open-world settings where sample efficiency is crucial. To address this issue, we propose a lightweight, inference-time, bandit-based prompt-tuning framework. The bandit explores and optimizes trajectory prompt selection to enhance task performance, while avoiding costly fine-tuning of the transformer backbone. Our experiments indicate not only clear performance gains due to bandit-based prompt-tuning, but also better sample complexity, scalability, and prompt space exploration compared to prompt-tuning baselines. These results highlight the importance of adaptive prompt selection mechanisms for efficient generalization in offline multi-task RL.
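To make the idea of bandit-based prompt selection concrete, here is a minimal sketch that treats each candidate trajectory prompt as a bandit arm and picks arms with UCB1. The abstract does not say which bandit algorithm the authors use, so UCB1 is a stand-in; `rollout`, `tune_prompt`, and the fixed reward table are hypothetical placeholders for running the frozen transformer with a given prompt and measuring task return.

```python
import math


def ucb1_select(counts, means, t, c=2.0):
    """Choose an arm: untried arms first, then the highest upper confidence bound."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def tune_prompt(rollout, n_prompts, rounds, c=2.0):
    """Run the bandit for `rounds` episodes; return the best arm and pull counts."""
    counts = [0] * n_prompts
    means = [0.0] * n_prompts
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t, c)
        reward = rollout(arm)  # one episode with the chosen prompt
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
    best = max(range(n_prompts), key=means.__getitem__)
    return best, counts


# Hypothetical deterministic returns for three candidate prompts.
FAKE_RETURNS = [0.2, 0.8, 0.5]
best_prompt, pulls = tune_prompt(lambda arm: FAKE_RETURNS[arm], n_prompts=3, rounds=50)
```

Only the bandit statistics are updated here; the model weights stay untouched, which matches the abstract's point about avoiding fine-tuning of the backbone.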
Article 1035
Title@2025-07-18 (5): The Judge Variable: Challenging Judge-Agnostic Legal Judgment Prediction
Title: The Judge Variable: Challenging Judge-Agnostic Legal Judgment Prediction | Die Richtervariable: Herausfordernde Richter-agnostische rechtliche Urteilsvorhersage | 法官变量:挑战法官-不可接受法律判决预测 2507.13732v1 |
Authors (1): Guillaume Zambrano
This study examines the role of human judges in legal decision-making by using machine learning to predict child physical custody outcomes in French appellate courts. Building on the legal realism-formalism debate, we test whether individual judges’ decision-making patterns significantly influence case outcomes, challenging the assumption that judges are neutral variables that apply the law uniformly. To ensure compliance with French privacy laws, we implement a strict pseudonymization process. Our analysis uses 18,937 living arrangements rulings extracted from 10,306 cases. We compare models trained on individual judges’ past rulings (specialist models) with a judge-agnostic model trained on aggregated data (generalist model). The prediction pipeline is a hybrid approach combining large language models (LLMs) for structured feature extraction and ML models for outcome prediction (RF, XGB and SVC). Our results show that specialist models consistently achieve higher predictive accuracy than the generalist model, with top-performing models reaching F1 scores as high as 92.85%, compared to the generalist model’s 82.63%, even though the latter is trained on 20x to 100x more samples. Specialist models capture stable individual patterns that are not transferable to other judges. In-Domain and Cross-Domain validity tests provide empirical support for legal realism, demonstrating that judicial identity plays a measurable role in legal outcomes. All data and code used will be made available.
Article 1036
Title@2025-07-18 (5): Adversarial Training Improves Generalization Under Distribution Shifts in Bioacoustics
Title: Adversarial Training Improves Generalization Under Distribution Shifts in Bioacoustics | Adversariale Ausbildung verbessert Generalisierung unter Verteilungsverschiebungen in der Bioakustik | 反向培训改进了生物精算学分布变化下的普及化 2507.13727v1 |
Authors (4): René Heinrich, Lukas Rauch, Bernhard Sick, Christoph Scholz
Adversarial training is a promising strategy for enhancing model robustness against adversarial attacks. However, its impact on generalization under substantial data distribution shifts in audio classification remains largely unexplored. To address this gap, this work investigates how different adversarial training strategies improve generalization performance and adversarial robustness in audio classification. The study focuses on two model architectures: a conventional convolutional neural network (ConvNeXt) and an inherently interpretable prototype-based model (AudioProtoPNet). The approach is evaluated using a challenging bird sound classification benchmark. This benchmark is characterized by pronounced distribution shifts between training and test data due to varying environmental conditions and recording methods, a common real-world challenge. The investigation explores two adversarial training strategies: one based on output-space attacks that maximize the classification loss function, and another based on embedding-space attacks designed to maximize embedding dissimilarity. These attack types are also used for robustness evaluation. Additionally, for AudioProtoPNet, the study assesses the stability of its learned prototypes under targeted embedding-space attacks. Results show that adversarial training, particularly using output-space attacks, improves clean test data performance by an average of 10.5% relative and simultaneously strengthens the adversarial robustness of the models. These findings, although derived from the bird sound domain, suggest that adversarial training holds potential to enhance robustness against both strong distribution shifts and adversarial attacks in challenging audio classification settings.
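An output-space attack that maximizes the classification loss can be illustrated with a single signed-gradient step (FGSM) on a tiny logistic classifier. This is a generic stand-in: the abstract does not name the attack, and the weights, input, and step size below are made-up values unrelated to ConvNeXt or AudioProtoPNet.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Binary cross-entropy of a logistic classifier on one example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def fgsm(w, b, x, y, eps):
    """One signed-gradient step on the input that increases the loss."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]  # d(loss)/dx for logistic regression
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Made-up classifier and input (assumptions for illustration only).
w, b = [1.0, -2.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = fgsm(w, b, x, y, eps=0.1)
clean_loss = loss(w, b, x, y)
adv_loss = loss(w, b, x_adv, y)
```

Adversarial training would then include `x_adv` (with the original label) in the training batch; an embedding-space attack would instead maximize the distance between the embeddings of `x` and `x_adv`.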
Article 1037
Title@2025-07-18 (5): FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale
Title: FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale | FourCastNet 3: Ein geometrischer Ansatz zur probabilistischen maschinellen Wettervorhersage im Maßstab | 4CastNet 3: 大规模机学习气象预测概率的几何方法 2507.12144v2 |
Authors (10): Boris Bonev, Thorsten Kurth, Ankur Mahesh, Mauro Bisson, Jean Kossaifi, Karthik Kashinath, Anima Anandkumar, William D. Collins, Michael S. Pritchard, Alexander Keller
FourCastNet 3 advances global weather modeling by implementing a scalable, geometric machine learning (ML) approach to probabilistic ensemble forecasting. The approach is designed to respect spherical geometry and to accurately model the spatially correlated probabilistic nature of the problem, resulting in stable spectra and realistic dynamics across multiple scales. FourCastNet 3 delivers forecasting accuracy that surpasses leading conventional ensemble models and rivals the best diffusion-based methods, while producing forecasts 8 to 60 times faster than these approaches. In contrast to other ML approaches, FourCastNet 3 demonstrates excellent probabilistic calibration and retains realistic spectra, even at extended lead times of up to 60 days. All of these advances are realized using a purely convolutional neural network architecture tailored for spherical geometry. Scalable and efficient large-scale training on 1024 GPUs and more is enabled by a novel training paradigm for combined model- and data-parallelism, inspired by domain decomposition methods in classical numerical models. Additionally, FourCastNet 3 enables rapid inference on a single GPU, producing a 60-day global forecast at 0.25°, 6-hourly resolution in under 4 minutes. Its computational efficiency, medium-range probabilistic skill, spectral fidelity, and rollout stability at subseasonal timescales make it a strong candidate for improving meteorological forecasting and early warning systems through large ensemble predictions.
Article 1038
Title@2025-07-18 (5): Tackling fake images in cybersecurity – Interpretation of a StyleGAN and lifting its black-box
Title: Tackling fake images in cybersecurity – Interpretation of a StyleGAN and lifting its black-box | In Cybersecurity gefälschte Bilder zu packen – Interpretation eines StyleGAN und Aufhebung seiner Blackbox | 在网络安全中处理假图像 – – StyleGAN 的解读和取消黑盒 2507.13722v1 |
Authors (2): Julia Laubmann, Johannes Reschke
In today’s digital age, concerns about the dangers of AI-generated images are increasingly common. One powerful tool in this domain is StyleGAN (style-based generative adversarial networks), a generative adversarial network capable of producing highly realistic synthetic faces. To gain a deeper understanding of how such a model operates, this work focuses on analyzing the inner workings of StyleGAN’s generator component. Key architectural elements and techniques, such as the Equalized Learning Rate, are explored in detail to shed light on the model’s behavior. A StyleGAN model is trained using the PyTorch framework, enabling direct inspection of its learned weights. Through pruning, it is revealed that a significant number of these weights can be removed without drastically affecting the output, leading to reduced computational requirements. Moreover, the role of the latent vector – which heavily influences the appearance of the generated faces – is closely examined. Global alterations to this vector primarily affect aspects like color tones, while targeted changes to individual dimensions allow for precise manipulation of specific facial features. This ability to finetune visual traits is not only of academic interest but also highlights a serious ethical concern: the potential misuse of such technology. Malicious actors could exploit this capability to fabricate convincing fake identities, posing significant risks in the context of digital deception and cybercrime.
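The pruning observation in this abstract can be illustrated with magnitude pruning, which zeroes the weights with the smallest absolute values. The abstract does not state which pruning criterion was used, so magnitude is an assumption here, and the function name and toy weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest absolute value."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Toy flattened layer weights (made up): half of them are removed.
layer = [0.5, -0.01, 0.2, 0.03]
sparse_layer = magnitude_prune(layer, sparsity=0.5)
```

Whether the generated images stay visually unchanged after such pruning has to be checked empirically, for example by comparing outputs for the same latent vectors before and after.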
Article 1039
Title@2025-07-18 (5): Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion
Title: Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion | Graphstrukturierte Datenanalyse des Bauteilausfalls in autonomen Frachtschiffen basierend auf Feature Fusion | 根据地貌分化对自主货运船舶部件故障进行图结构化数据分析 2507.13721v1 |
Authors (5): Zizhao Zhang, Tianxiang Zhao, Yu Sun, Liping Sun, Jichuan Kang
To address the challenges posed by cascading reactions caused by component failures in autonomous cargo ships (ACS) and the uncertainties in emergency decision-making, this paper proposes a novel hybrid feature fusion framework for constructing a graph-structured dataset of failure modes. By employing an improved cuckoo search algorithm (HN-CSA), the literature retrieval efficiency is significantly enhanced, achieving improvements of 7.1% and 3.4% compared to the NSGA-II and CSA search algorithms, respectively. A hierarchical feature fusion framework is constructed, using Word2Vec encoding to encode subsystem/component features, BERT-KPCA to process failure modes/reasons, and Sentence-BERT to quantify the semantic association between failure impact and emergency decision-making. The dataset covers 12 systems, 1,262 failure modes, and 6,150 propagation paths. Validation results show that the GATE-GNN model achieves a classification accuracy of 0.735, comparable to existing benchmarks. Additionally, a silhouette coefficient of 0.641 indicates that the features are highly distinguishable. In the label prediction results, the Shore-based Meteorological Service System achieved an F1 score of 0.93, demonstrating high prediction accuracy. This paper not only provides a solid foundation for failure analysis in autonomous cargo ships but also offers reliable support for fault diagnosis, risk assessment, and intelligent decision-making systems. The link to the dataset is https://github.com/wojiufukele/Graph-Structured-about-CSA.
nan
Article 1040
Title@2025-07-18 (5): Bi-GRU Based Deception Detection using EEG Signals
Title: Bi-GRU Based Deception Detection using EEG Signals | Bi-GRU-basierte Erkennung mit EEG-Signalen | Bi-GRU 使用 EEG 信号检测的基于Bi-GRU的欺骗性检测 2507.13718v1 |
Authors (6): Danilo Avola, Muhammad Yasir Bilal, Emad Emam, Cristina Lakasz, Daniele Pannone, Amedeo Ranaldi
Deception detection is a significant challenge in fields such as security, psychology, and forensics. This study presents a deep learning approach for classifying deceptive and truthful behavior using electroencephalogram (EEG) signals from the Bag-of-Lies dataset, a multimodal corpus designed for naturalistic, casual deception scenarios. A Bidirectional Gated Recurrent Unit (Bi-GRU) neural network was trained to perform binary classification of EEG samples. The model achieved a test accuracy of 97%, along with high precision, recall, and F1-scores across both classes. These results demonstrate the effectiveness of using bidirectional temporal modeling for EEG-based deception detection and suggest potential for real-time applications and future exploration of advanced neural architectures.
nan
Article 1041
Title@2025-07-18 (5): Benchmarking of EEG Analysis Techniques for Parkinson’s Disease Diagnosis: A Comparison between Traditional ML Methods and Foundation DL Methods
Title: Benchmarking of EEG Analysis Techniques for Parkinson’s Disease Diagnosis: A Comparison between Traditional ML Methods and Foundation DL Methods | Benchmarking von EEG-Analysetechniken für Parkinson-Krankheitsdiagnose: Ein Vergleich zwischen traditionellen ML-Methoden und Stiftungs-DL-Methoden | Parkinson疾病诊断的EEG分析技术基准基准基准:传统ML方法与DL基础方法的比较 2507.13716v1 |
Authors (8): Danilo Avola, Andrea Bernardini, Giancarlo Crocetti, Andrea Ladogana, Mario Lezoche, Maurizio Mancini, Daniele Pannone, Amedeo Ranaldi
Parkinson’s Disease (PD) is a progressive neurodegenerative disorder that affects motor and cognitive functions, with early diagnosis being critical for effective clinical intervention. Electroencephalography (EEG) offers a non-invasive and cost-effective means of detecting PD-related neural alterations, yet the development of reliable automated diagnostic models remains a challenge. In this study, we conduct a systematic benchmark of traditional machine learning (ML) and deep learning (DL) models for classifying PD using a publicly available oddball task dataset. Our aim is to lay the groundwork for developing an effective learning system and to determine which approach produces the best results. We implement a unified seven-step preprocessing pipeline and apply consistent subject-wise cross-validation and evaluation criteria to ensure comparability across models. Our results demonstrate that while baseline deep learning architectures, particularly CNN-LSTM models, achieve the best performance compared to other deep learning architectures, underlining the importance of capturing long-range temporal dependencies, several traditional classifiers such as XGBoost also offer strong predictive accuracy and calibrated decision boundaries. By rigorously comparing these baselines, our work provides a solid reference framework for future studies aiming to develop and evaluate more complex or specialized architectures. Establishing a reliable set of baseline results is essential to contextualize improvements introduced by novel methods, ensuring scientific rigor and reproducibility in the evolving field of EEG-based neurodiagnostics.
nan
Article 1042
Title@2025-07-18 (5): LLaPipe: LLM-Guided Reinforcement Learning for Automated Data Preparation Pipeline Construction
Title: LLaPipe: LLM-Guided Reinforcement Learning for Automated Data Preparation Pipeline Construction | LLaPipe: LLM-geführtes Verstärkungslernen für die automatisierte Datenvorbereitung Pipeline-Konstruktion | LLaPipe:LLM-指导强化学习,用于自动数据准备管道建设 2507.13712v1 |
Authors (5): Jing Chang, Chang Liu, Jinbin Huang, Rui Mao, Jianbin Qin
Automated data preparation is crucial for democratizing machine learning, yet existing reinforcement learning (RL) based approaches suffer from inefficient exploration in the vast space of possible preprocessing pipelines. We present LLaPipe, a novel framework that addresses this exploration bottleneck by integrating Large Language Models (LLMs) as intelligent policy advisors. Unlike traditional methods that rely solely on statistical features and blind trial-and-error, LLaPipe leverages the semantic understanding capabilities of LLMs to provide contextually relevant exploration guidance. Our framework introduces three key innovations: (1) an LLM Policy Advisor that analyzes dataset semantics and pipeline history to suggest promising preprocessing operations, (2) an Experience Distillation mechanism that mines successful patterns from past pipelines and transfers this knowledge to guide future exploration, and (3) an Adaptive Advisor Triggering strategy (Advisor+) that dynamically determines when LLM intervention is most beneficial, balancing exploration effectiveness with computational cost. Through extensive experiments on 18 diverse datasets spanning multiple domains, we demonstrate that LLaPipe achieves up to 22.4% improvement in pipeline quality and 2.3× faster convergence compared to state-of-the-art RL-based methods, while maintaining computational efficiency through selective LLM usage (averaging only 19.0% of total exploration steps).
nan
Article 1043
Title@2025-07-18 (5): MuteSwap: Visual-informed Silent Video Identity Conversion
Title: MuteSwap: Visual-informed Silent Video Identity Conversion | MuteSwap: Visuell informierte Silent Video Identity Conversion | MuteSwap: 视觉知情的静音视频身份转换 2507.00498v2 |
Authors (3): Yifan Liu, Yu Fang, Zhouhan Lin
Conventional voice conversion modifies voice characteristics from a source speaker to a target speaker, relying on audio input from both sides. However, this process becomes infeasible when clean audio is unavailable, such as in silent videos or noisy environments. In this work, we focus on the task of Silent Face-based Voice Conversion (SFVC), which performs voice conversion entirely from visual inputs: given images of a target speaker and a silent video of a source speaker containing lip motion, SFVC generates speech that matches the identity of the target speaker while preserving the speech content of the source silent video. As this task requires generating intelligible speech and converting identity using only visual cues, it is particularly challenging. To address this, we introduce MuteSwap, a novel framework that employs contrastive learning to align cross-modality identities and minimizes mutual information to separate shared visual features. Experimental results show that MuteSwap achieves impressive performance in both speech synthesis and identity conversion, especially under noisy conditions where methods dependent on audio input fail to produce intelligible results, demonstrating both the effectiveness of our training approach and the feasibility of SFVC.
nan
Article 1044
Title@2025-07-18 (5): CogniQ-H: A Soft Hierarchical Reinforcement Learning Paradigm for Automated Data Preparation
Title: CogniQ-H: A Soft Hierarchical Reinforcement Learning Paradigm for Automated Data Preparation | CogniQ-H: Ein weiches Hierarchisches Verstärkungs-Lernparadigma für die automatisierte Datenvorbereitung | CogniQ-H: 用于自动编制数据的软级级强化学习模型 2507.13710v1 |
Authors (5): Jing Chang, Chang Liu, Jinbin Huang, Rui Mao, Jianbin Qin
Data preparation is a foundational yet notoriously challenging component of the machine learning lifecycle, characterized by a vast combinatorial search space of potential operator sequences. While reinforcement learning (RL) offers a promising direction, existing approaches are inefficient as they fail to capture the structured, hierarchical nature of the problem. We argue that Hierarchical Reinforcement Learning (HRL), a paradigm that has been successful in other domains, provides a conceptually ideal yet previously unexplored framework for this task. However, a naive HRL implementation with a ‘hard hierarchy’ is prone to suboptimal, irreversible decisions. To address this, we introduce CogniQ-H, the first framework to implement a soft hierarchical paradigm for robust, end-to-end automated data preparation. CogniQ-H formulates action selection as a Bayesian inference problem. A high-level strategic prior, generated by a Large Language Model (LLM), guides exploration probabilistically. This prior is synergistically combined with a fine-grained operator quality score from a supervised Learning-to-Rank (LTR) model and a long-term value estimate from the agent’s own Q-function. This hybrid architecture allows CogniQ-H to balance strategic guidance with adaptive, evidence-based decision-making. Through extensive experiments on 18 diverse datasets spanning multiple domains, we demonstrate that CogniQ-H achieves up to 13.9% improvement in pipeline quality and 2.8× faster convergence compared to state-of-the-art RL-based methods.
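The soft-hierarchy idea, in which a prior is reweighted by evidence instead of acting as a hard gate, can be sketched as follows. This is a minimal illustration, not the authors' formulation: the product-of-experts form `prior * exp(beta * (ltr + q))`, the function name, the operator names, and all numbers are assumptions made for the example.

```python
import math

def soft_hierarchical_posterior(prior, ltr_score, q_value, beta=1.0):
    """Combine a strategic prior with two evidence terms into a posterior.

    Assumed form (illustrative only):
        posterior(a) is proportional to prior(a) * exp(beta * (ltr_score[a] + q_value[a]))
    Every prior probability must be strictly positive so that no operator
    is ruled out irreversibly (the "soft" part of the hierarchy).
    """
    logits = {a: math.log(prior[a]) + beta * (ltr_score[a] + q_value[a])
              for a in prior}
    top = max(logits.values())  # subtract the max for numerical stability
    unnorm = {a: math.exp(v - top) for a, v in logits.items()}
    total = sum(unnorm.values())
    return {a: u / total for a, u in unnorm.items()}

# Hypothetical operators and scores.
prior = {"impute": 0.6, "scale": 0.3, "encode": 0.1}  # stand-in for an LLM prior
ltr = {"impute": 0.2, "scale": 0.9, "encode": 0.1}    # stand-in for an LTR score
q = {"impute": 0.1, "scale": 0.8, "encode": 0.0}      # stand-in for a Q-value
posterior = soft_hierarchical_posterior(prior, ltr, q, beta=2.0)
best = max(posterior, key=posterior.get)
```

In this toy setting the prior favors `impute`, but the evidence terms shift most of the posterior mass to `scale`, while every operator keeps a non-zero probability.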
nan
Article 1045
Title@2025-07-18 (5): To Code or not to Code? Adaptive Tool Integration for Math Language Models via Expectation-Maximization
Title: To Code or not to Code? Adaptive Tool Integration for Math Language Models via Expectation-Maximization | Um zu kodieren oder nicht zu kodieren? Adaptive Toolintegration für Math Language Models über Erwartungs-Maximierung | 代码或非代码?通过期望-最大化将数学语言模型整合的适应性工具集成 2502.00691v4 |
Authors (7): Haozhe Wang, Long Li, Chao Qu, Fengming Zhu, Weidi Xu, Wei Chu, Fangzhen Lin
Recent advances in mathematical problem-solving with language models (LMs) integrate chain-of-thought (CoT) reasoning and code execution to harness their complementary strengths. However, existing hybrid frameworks exhibit a critical limitation: they depend on externally dictated instructions or rigid code-integration templates, lacking metacognitive awareness – the capacity to dynamically evaluate intrinsic capabilities and autonomously determine when and how to integrate tools. This rigidity motivates our study of autonomous code integration, enabling models to adapt tool-usage strategies as their reasoning abilities evolve during training. While reinforcement learning (RL) shows promise for boosting LLM reasoning at scale (e.g., DeepSeek-R1), we demonstrate its inefficiency in learning autonomous code integration due to inadequate exploration of the vast combinatorial space of CoT-code interleaving patterns. To address this challenge, we propose a novel Expectation-Maximization (EM) framework that synergizes structured exploration (E-step) with off-policy RL optimization (M-step), creating a self-reinforcing cycle between metacognitive tool-use decisions and evolving capabilities. Experiments reveal our method achieves superior results through improved exploration. Notably, our 7B model improves over 11% on MATH500 and 9.4% on AIME without o1-like CoT.
nan
Article 1046
Title@2025-07-18 (5): Mitigating Goal Misgeneralization via Minimax Regret
Title: Mitigating Goal Misgeneralization via Minimax Regret | Zielverallgemeinerung durch Minimax-Regret abmildern | 通过Minimmax Regret 推广 2507.03068v2 |
Authors (7): Karim Abdel Sadek, Matthew Farrugia-Roberts, Usman Anwar, Hannah Erlebach, Christian Schroeder de Witt, David Krueger, Michael Dennis
Safe generalization in reinforcement learning requires not only that a learned policy acts capably in new situations, but also that it uses its capabilities towards the pursuit of the designer’s intended goal. The latter requirement may fail when a proxy goal incentivizes similar behavior to the intended goal within the training environment, but not in novel deployment environments. This creates the risk that policies will behave as if in pursuit of the proxy goal, rather than the intended goal, in deployment – a phenomenon known as goal misgeneralization. In this paper, we formalize this problem setting in order to theoretically study the possibility of goal misgeneralization under different training objectives. We show that goal misgeneralization is possible under approximate optimization of the maximum expected value (MEV) objective, but not the minimax expected regret (MMER) objective. We then empirically show that the standard MEV-based training method of domain randomization exhibits goal misgeneralization in procedurally-generated grid-world environments, whereas current regret-based unsupervised environment design (UED) methods are more robust to goal misgeneralization (though they don’t find MMER policies in all cases). Our findings suggest that minimax expected regret is a promising approach to mitigating goal misgeneralization.
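The contrast between the two objectives can be made concrete with a toy example. The sketch below is an illustration under assumed numbers, not the paper's formal setting: two hypothetical levels, two hypothetical policies, and a training distribution that rarely samples the level where the proxy and intended goals diverge.

```python
# Return of each (hypothetical) policy on each (hypothetical) level.
returns = {
    "proxy_goal_policy":    {"ambiguous": 1.0, "distinguishing": 0.0},
    "intended_goal_policy": {"ambiguous": 1.0, "distinguishing": 1.0},
}
# Assumed training distribution: the distinguishing level is rare.
train_dist = {"ambiguous": 0.99, "distinguishing": 0.01}

def expected_value(policy):
    """Expected return under the fixed training distribution (MEV objective)."""
    return sum(train_dist[lvl] * returns[policy][lvl] for lvl in train_dist)

def max_regret(policy):
    """Worst-case regret over levels: best achievable return minus policy return."""
    best = {lvl: max(r[lvl] for r in returns.values()) for lvl in train_dist}
    return max(best[lvl] - returns[policy][lvl] for lvl in train_dist)

ev_gap = expected_value("intended_goal_policy") - expected_value("proxy_goal_policy")
proxy_regret = max_regret("proxy_goal_policy")
intended_regret = max_regret("intended_goal_policy")
```

Here the proxy policy is within 0.01 of the optimal expected value, so an approximate MEV optimizer may accept it, yet its worst-case regret is 1.0. Any policy with near-zero maximum regret must pursue the intended goal on the distinguishing level.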
nan
Article 1047
Title@2025-07-18 (5): Improving DAPO from a Mixed-Policy Perspective
Title: Improving DAPO from a Mixed-Policy Perspective | Verbesserung der DAPO aus gemischter Politik | 从混合政策角度改进残疾和残疾人组织 2507.12931v2 |
Authors (1): Hongze Tan
This paper introduces two novel modifications to the Dynamic sAmpling Policy Optimization (DAPO) algorithm [1], approached from a mixed-policy perspective. Standard policy gradient methods can suffer from instability and sample inefficiency, particularly in sparse reward settings. To address this, we first propose a method that incorporates a pre-trained, stable guiding policy ($\pi_\phi$) to provide off-policy experience, thereby regularizing the training of the target policy ($\pi_{\text{on}}$). This approach improves training stability and convergence speed by adaptively adjusting the learning step size. Secondly, we extend this idea to re-utilize zero-reward samples, which are often discarded by dynamic sampling strategies like DAPO’s. By treating these samples as a distinct batch guided by the expert policy, we further enhance sample efficiency. We provide a theoretical analysis for both methods, demonstrating that their objective functions converge to the optimal solution within the established theoretical framework of reinforcement learning. The proposed mixed-policy framework effectively balances exploration and exploitation, promising more stable and efficient policy optimization.
nan
Article 1048
Title@2025-07-18 (5): MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling
Title: MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling | MoTM: Auf dem Weg zu einem Basismodell für Zeitreihen Imputation basierend auf kontinuierlicher Modellierung | MoTM:建立基于连续建模的时间序列计算基础模型 2507.13207v2 |
Authors (3): Etienne Le Naour, Tahar Nabil, Ghislain Agoua
Recent years have witnessed a growing interest in time series foundation models, with a strong emphasis on the forecasting task. Yet, the crucial task of out-of-domain imputation of missing values remains largely underexplored. We propose a first step to fill this gap by leveraging implicit neural representations (INRs). INRs model time series as continuous functions and naturally handle various missing data scenarios and sampling rates. While they have shown strong performance within specific distributions, they struggle under distribution shifts. To address this, we introduce MoTM (Mixture of Timeflow Models), a step toward a foundation model for time series imputation. Building on the idea that a new time series is a mixture of previously seen patterns, MoTM combines a basis of INRs, each trained independently on a distinct family of time series, with a ridge regressor that adapts to the observed context at inference. We demonstrate robust in-domain and out-of-domain generalization across diverse imputation scenarios (e.g., block and pointwise missingness, variable sampling rates), paving the way for adaptable foundation imputation models.
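The "fixed basis plus ridge regressor fitted on the observed context" idea can be sketched in a few lines. This is a simplified illustration and not the MoTM implementation: the three closed-form functions stand in for pre-trained INRs, and the series, the missing block, and the regularization strength are all assumptions.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ridge_fit(features, targets, lam):
    """Closed-form ridge regression: solve (F^T F + lam * I) w = F^T y."""
    k = len(features[0])
    A = [[sum(f[i] * f[j] for f in features) + (lam if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(f[i] * y for f, y in zip(features, targets)) for i in range(k)]
    return solve(A, b)

# Stand-ins for pre-trained continuous models (hypothetical basis).
basis = [lambda t: math.sin(2 * math.pi * t),
         lambda t: math.cos(2 * math.pi * t),
         lambda t: t]

def truth(t):
    """Synthetic series: a mixture of the first two basis patterns."""
    return 2.0 * math.sin(2 * math.pi * t) + 0.5 * math.cos(2 * math.pi * t)

times = [i / 20 for i in range(20)]
missing = {5, 6, 7, 8}  # a block of missing time steps
observed_t = [t for i, t in enumerate(times) if i not in missing]

# Fit mixture weights on the observed context only, then evaluate on the gap.
features = [[phi(t) for phi in basis] for t in observed_t]
targets = [truth(t) for t in observed_t]
weights = ridge_fit(features, targets, lam=1e-6)
imputed = {i: sum(w * phi(times[i]) for w, phi in zip(weights, basis))
           for i in missing}
max_error = max(abs(imputed[i] - truth(times[i])) for i in missing)
```

Because each basis function is continuous in time, the same fitted weights can be evaluated at any timestamp, which is what lets this kind of model handle block gaps and irregular sampling with one mechanism.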
nan
Article 1049
Title@2025-07-18 (5): Policy Verification in Stochastic Dynamical Systems Using Logarithmic Neural Certificates
Title: Policy Verification in Stochastic Dynamical Systems Using Logarithmic Neural Certificates | Politikprüfung in stochastischen dynamischen Systemen mit logarithmischen Neuralzertifikaten | 使用对数神经神经证书进行斯托卡动态系统的政策核查 2406.00826v4 |
Authors (4): Thom Badings, Wietze Koops, Sebastian Junges, Nils Jansen
We consider the verification of neural network policies for discrete-time stochastic systems with respect to reach-avoid specifications. We use a learner-verifier procedure that learns a certificate for the specification, represented as a neural network. Verifying that this neural network certificate is a so-called reach-avoid supermartingale (RASM) proves the satisfaction of a reach-avoid specification. Existing approaches for such a verification task rely on computed Lipschitz constants of neural networks. These approaches struggle with large Lipschitz constants, especially for reach-avoid specifications with high threshold probabilities. We present two key contributions to obtain smaller Lipschitz constants than existing approaches. First, we introduce logarithmic RASMs (logRASMs), which take exponentially smaller values than RASMs and hence have lower theoretical Lipschitz constants. Second, we present a fast method to compute tighter upper bounds on Lipschitz constants based on weighted norms. Our empirical evaluation shows we can consistently verify the satisfaction of reach-avoid specifications with probabilities as high as 99.9999%.
nan
Article 1050
Title@2025-07-18 (5): Learning Deformable Body Interactions With Adaptive Spatial Tokenization
Title: Learning Deformable Body Interactions With Adaptive Spatial Tokenization | Verformbare Körperinteraktionen mit adaptiver räumlicher Tokenisierung lernen | 学习与适应性空间拳击的变形身体互动 2507.13707v1 |
Authors (6): Hao Wang, Yu Liu, Daniel Biggs, Haoru Wang, Jiandong Yu, Ping Huang
Simulating interactions between deformable bodies is vital in fields like material science, mechanical design, and robotics. While learning-based methods with Graph Neural Networks (GNNs) are effective at solving complex physical systems, they encounter scalability issues when modeling deformable body interactions. To model interactions between objects, pairwise global edges have to be created dynamically, which is computationally intensive and impractical for large-scale meshes. To overcome these challenges, drawing on insights from geometric representations, we propose an Adaptive Spatial Tokenization (AST) method for efficient representation of physical states. By dividing the simulation space into a grid of cells and mapping unstructured meshes onto this structured grid, our approach naturally groups adjacent mesh nodes. We then apply a cross-attention module to map the sparse cells into a compact, fixed-length embedding, serving as tokens for the entire physical state. Self-attention modules are employed to predict the next state over these tokens in latent space. This framework leverages the efficiency of tokenization and the expressive power of attention mechanisms to achieve accurate and scalable simulation results. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches in modeling deformable body interactions. Notably, it remains effective on large-scale simulations with meshes exceeding 100,000 nodes, where existing methods are hindered by computational limitations. Additionally, we contribute a novel large-scale dataset encompassing a wide range of deformable body interactions to support future research in this area.
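The first step described above, mapping unstructured mesh nodes onto a regular grid so that nearby nodes are grouped, can be sketched as follows. This is only an illustration of the spatial binning: the function names, the cell size, and the mean-position feature (a stand-in for the learned cross-attention embedding) are assumptions, not the paper's method.

```python
from collections import defaultdict

def group_nodes_by_cell(positions, cell_size):
    """Map each mesh node index to the grid cell that contains it."""
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(positions):
        key = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        cells[key].append(idx)
    return dict(cells)  # only occupied (sparse) cells are stored

def mean_position_per_cell(positions, cells):
    """Average node position per occupied cell (placeholder cell feature)."""
    feats = {}
    for key, members in cells.items():
        n = len(members)
        feats[key] = tuple(sum(positions[i][d] for i in members) / n
                           for d in range(3))
    return feats

# Hypothetical node coordinates from two bodies plus one distant node.
positions = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.1),
             (1.5, 0.2, 0.0), (1.6, 0.4, 0.3),
             (3.2, 3.1, 0.9)]
cells = group_nodes_by_cell(positions, cell_size=1.0)
features = mean_position_per_cell(positions, cells)
```

Grouping by cell costs time linear in the number of nodes, in contrast to building pairwise edges between objects dynamically, which is the scalability issue the abstract attributes to GNN-based approaches.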
nan
Article 1051
Title@2025-07-18 (5): EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
Title: EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos | EgoVLA: Vision-Language-Action-Modelle von egozentrischen menschlichen Videos lernen | EgoVLA:从以以地球为中心的人类视频中学习愿景-语言-行动模式 2507.12440v3 |
Authors (15): Ruihan Yang, Qinxi Yu, Yecheng Wu, Rui Yan, Borui Li, An-Chieh Cheng, Xueyan Zou, Yunhao Fang, Xuxin Cheng, Ri-Zhao Qiu, Hongxu Yin, Sifei Liu, Song Han, Yao Lu, Xiaolong Wang
Real robot data collection for imitation learning has led to significant advancements in robotic manipulation. However, the requirement for robot hardware in the process fundamentally constrains the scale of the data. In this paper, we explore training Vision-Language-Action (VLA) models using egocentric human videos. The benefit of using human videos is not only for their scale but more importantly for the richness of scenes and tasks. With a VLA trained on human video that predicts human wrist and hand actions, we can perform Inverse Kinematics and retargeting to convert the human actions to robot actions. We fine-tune the model using a few robot manipulation demonstrations to obtain the robot policy, namely EgoVLA. We propose a simulation benchmark called Ego Humanoid Manipulation Benchmark, where we design diverse bimanual manipulation tasks with demonstrations. We fine-tune and evaluate EgoVLA with Ego Humanoid Manipulation Benchmark and show significant improvements over baselines and ablate the importance of human data. Videos can be found on our website: https://rchalyang.github.io/EgoVLA
nan
Article 1052
Title@2025-07-18 (5): Binarizing Physics-Inspired GNNs for Combinatorial Optimization
Title: Binarizing Physics-Inspired GNNs for Combinatorial Optimization | Verbindliche Physik-inspirierte GNNs für die kombinatorische Optimierung | 联合优化的由物理启发的GNNs 2507.13703v1 |
Authors (4): Martin Krutský, Gustav Šír, Vyacheslav Kungurtsev, Georgios Korpas
Physics-inspired graph neural networks (PI-GNNs) have been utilized as an efficient unsupervised framework for relaxing combinatorial optimization problems encoded through a specific graph structure and loss, reflecting dependencies between the problem’s variables. While the framework has yielded promising results in various combinatorial problems, we show that the performance of PI-GNNs systematically plummets with an increasing density of the combinatorial problem graphs. Our analysis reveals an interesting phase transition in the PI-GNNs’ training dynamics, associated with degenerate solutions for the denser problems, highlighting a discrepancy between the relaxed, real-valued model outputs and the binary-valued problem solutions. To address the discrepancy, we propose principled alternatives to the naive strategy used in PI-GNNs by building on insights from fuzzy logic and binarized neural networks. Our experiments demonstrate that the portfolio of proposed methods significantly improves the performance of PI-GNNs in increasingly dense settings.
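The gap between relaxed outputs and binary solutions can be seen on a tiny dense graph. The sketch below uses a QUBO-style objective commonly used for maximum independent set; that the paper uses exactly this loss, the penalty value, and the 0.5 rounding threshold are assumptions made for illustration.

```python
def mis_loss(x, edges, penalty=2.0):
    """QUBO-style objective: reward selected nodes, penalize selected edges."""
    return -sum(x) + penalty * sum(x[i] * x[j] for i, j in edges)

def binarize(p, threshold=0.5):
    """Naive rounding of relaxed outputs to a binary assignment."""
    return [1 if v >= threshold else 0 for v in p]

# A triangle: the densest graph on three nodes.
edges = [(0, 1), (1, 2), (0, 2)]

relaxed = [0.4, 0.4, 0.4]                 # hypothetical real-valued model outputs
relaxed_loss = mis_loss(relaxed, edges)   # about -0.24: looks like progress
rounded = binarize(relaxed)               # all zeros after thresholding
rounded_loss = mis_loss(rounded, edges)   # 0.0: the empty, degenerate solution
best_binary_loss = mis_loss([1, 0, 0], edges)  # -1.0: a true optimum
```

The relaxed loss is negative even though the rounded assignment selects no node at all, so improving the relaxed objective does not guarantee a good binary solution. This is the kind of discrepancy the abstract associates with denser problem graphs.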
nan
Article 1053
Title@2025-07-18 (5): Can we ease the Injectivity Bottleneck on Lorentzian Manifolds for Graph Neural Networks?
Title: Can we ease the Injectivity Bottleneck on Lorentzian Manifolds for Graph Neural Networks? | Können wir den Injektivitätsengpass auf Lorentzian Manifolds für Graphen-Neural-Netzwerke erleichtern? | 我们能否为图形神经网络 减轻Lorentzian Manifolds的 射入波特内克? 2504.00142v5 |
Authors (2): Srinitish Srinivasan, Omkumar CU
While hyperbolic GNNs show promise for hierarchical data, they often have limited discriminative power compared to Euclidean counterparts or the WL test, due to non-injective aggregation. To address this expressivity gap, we propose the Lorentzian Graph Isomorphic Network (LGIN), a novel HGNN designed for enhanced discrimination within the Lorentzian model. LGIN introduces a new update rule that preserves the Lorentzian metric while effectively capturing richer structural information. This marks a significant step towards more expressive GNNs on Riemannian manifolds. Extensive evaluations across nine benchmark datasets demonstrate LGIN’s superior performance, consistently outperforming or matching state-of-the-art hyperbolic and Euclidean baselines, showcasing its ability to capture complex graph structures. LGIN is the first to adapt principles of powerful, highly discriminative GNN architectures to a Riemannian manifold. The code for our paper can be found at https://github.com/Deceptrax123/LGIN
nan
Article 1054
Title@2025-07-18 (5): Tight Bounds for Answering Adaptively Chosen Concentrated Queries
Title: Tight Bounds for Answering Adaptively Chosen Concentrated Queries | Enge Grenzen für die Antwort auf adaptiv ausgewählte konzentrierte Abfragen | 用于回答适应性选择的集中查询的紧闭环环环 2507.13700v1 |
Authors (3): Emma Rapoport, Edith Cohen, Uri Stemmer
Most work on adaptive data analysis assumes that samples in the dataset are independent. When correlations are allowed, even the non-adaptive setting can become intractable, unless some structural constraints are imposed. To address this, Bassily and Freund [2016] introduced the elegant framework of concentrated queries, which requires the analyst to restrict itself to queries that are concentrated around their expected value. While this assumption makes the problem trivial in the non-adaptive setting, in the adaptive setting it remains quite challenging. In fact, all known algorithms in this framework support significantly fewer queries than in the independent case: At most $O(n)$ queries for a sample of size $n$, compared to $O(n^2)$ in the independent setting. In this work, we prove that this utility gap is inherent under the current formulation of the concentrated queries framework, assuming some natural conditions on the algorithm. Additionally, we present a simplified version of the best-known algorithms that match our impossibility result.
nan
Article 1055
Title@2025-07-18 (5): FedDifRC: Unlocking the Potential of Text-to-Image Diffusion Models in Heterogeneous Federated Learning
Title: FedDifRC: Unlocking the Potential of Text-to-Image Diffusion Models in Heterogeneous Federated Learning | FedDifRC: Entsperren des Potenzials von Text-zu-Bild-Diffusionsmodellen im Heterogenen Federated Learning | FedDifRC:在异质联邦学习中释放文本到图像传播模型的潜力 2507.06482v2 |
Authors (6): Huan Wang, Haoran Li, Huaming Chen, Jun Yan, Jiahua Shi, Jun Shen
Federated learning aims at training models collaboratively across participants while protecting privacy. However, one major challenge for this paradigm is data heterogeneity, where data preferences are biased across multiple clients, harming the model’s convergence and performance. In this paper, we first introduce powerful diffusion models into the federated learning paradigm and show that diffusion representations can effectively steer federated training. To explore the possibility of using diffusion representations in handling data heterogeneity, we propose a novel diffusion-inspired Federated paradigm with Diffusion Representation Collaboration, termed FedDifRC, leveraging meaningful guidance of diffusion models to mitigate data heterogeneity. The key idea is to construct text-driven diffusion contrasting and noise-driven diffusion regularization, aiming to provide abundant class-related semantic information and consistent convergence signals. On the one hand, we exploit the conditional feedback from the diffusion model for different text prompts to build a text-driven contrastive learning strategy. On the other hand, we introduce a noise-driven consistency regularization to align local instances with diffusion denoising representations, constraining the optimization region in the feature space. In addition, FedDifRC can be extended to a self-supervised scheme without relying on any labeled data. We also provide a theoretical analysis for FedDifRC to ensure convergence under non-convex objectives. Experiments across different scenarios validate the effectiveness of FedDifRC and the efficiency of its crucial components.
nan
Article 1056
Title@2025-07-18 (5): An AI-powered Technology Stack for Solving Many-Electron Field Theory
Title: An AI-powered Technology Stack for Solving Many-Electron Field Theory | Ein KI-powered Technologie Stack für die Lösung von Viel-Elektronen-Feld-Theorie | 用于解决多电场理论的AI-动力技术堆叠 2403.18840v2 |
Authors (8): Pengcheng Hou, Tao Wang, Daniel Cerkoney, Xiansheng Cai, Zhiyi Li, Youjin Deng, Lei Wang, Kun Chen
Quantum field theory (QFT) for interacting many-electron systems is fundamental to condensed matter physics, yet achieving accurate solutions confronts computational challenges in managing the combinatorial complexity of Feynman diagrams, implementing systematic renormalization, and evaluating high-dimensional integrals. We present a unifying framework that integrates QFT computational workflows with an AI-powered technology stack. A cornerstone of this framework is representing Feynman diagrams as computational graphs, which structures the inherent mathematical complexity and facilitates the application of optimized algorithms developed for machine learning and high-performance computing. Consequently, automatic differentiation, native to these graph representations, delivers efficient, fully automated, high-order field-theoretic renormalization procedures. This graph-centric approach also enables sophisticated numerical integration; our neural-network-enhanced Monte Carlo method, accelerated via massively parallel GPU implementation, efficiently evaluates challenging high-dimensional diagrammatic integrals. Applying this framework to the uniform electron gas, we determine the quasiparticle effective mass to a precision significantly surpassing current state-of-the-art simulations. Our work demonstrates the transformative potential of integrating AI-driven computational advances with QFT, opening systematic pathways for solving complex quantum many-body problems across disciplines.
nan
Article 1057
Title@2025-07-18 (5): Kolmogorov-Arnold Networks-based GRU and LSTM for Loan Default Early Prediction
Title: Kolmogorov-Arnold Networks-based GRU and LSTM for Loan Default Early Prediction | Kolmogorov-Arnold Networks-basierte GRU und LSTM für Kredit-Standard-Frühvorhersage | Kolmogorov-Arnold网络基于GRU和LSTM的贷款默认早期预测 2507.13685v1 |
Authors (7): Yue Yang, Zihan Su, Ying Zhang, Chang Chuan Goh, Yuxiang Lin, Anthony Graham Bellotti, Boon Giin Lee
This study addresses a critical challenge in time series anomaly detection: enhancing the predictive capability of loan default models more than three months in advance to enable early identification of default events, helping financial institutions implement preventive measures before risk events materialize. Existing methods have significant drawbacks, such as their lack of accuracy in early predictions and their dependence on training and testing within the same year and specific time frames. These issues limit their practical use, particularly with out-of-time data. To address these, the study introduces two innovative architectures, GRU-KAN and LSTM-KAN, which merge Kolmogorov-Arnold Networks (KAN) with Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM) networks. The proposed models were evaluated against the baseline models (LSTM, GRU, LSTM-Attention, and LSTM-Transformer) in terms of accuracy, precision, recall, F1, and AUC across different feature window lengths, sample sizes, and early prediction intervals. The results demonstrate that the proposed model achieves a prediction accuracy of over 92% three months in advance and over 88% eight months in advance, significantly outperforming existing baselines.
nan
Article 1058
Title@2025-07-18 (5): FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration
Title: FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration | FireQ: Schnelle INT4-FP8-Kernel- und RoPE-gestützte Quantisierung für LLM-Inferenzbeschleunigung | 消防:快速INT4-FFP8 内核和ROPE-感知的LLM 推推加速量 2505.20839v3 |
Authors (8): Daehyeon Baek, Jieun Choi, Jimyoung Son, Kyungmin Bin, Seungbeom Choi, Kihyo Moon, Minsung Jang, Hyojung Lee
As large language models become increasingly prevalent, memory bandwidth constraints significantly limit inference throughput, motivating post-training quantization (PTQ). In this paper, we propose FireQ, a co-designed PTQ framework and an INT4-FP8 matrix multiplication kernel that accelerates LLM inference across all linear layers. Specifically, FireQ quantizes linear layer weights and key-values to INT4, and activations and queries to FP8, significantly enhancing throughput. Additionally, we introduce three-stage pipelining for the prefill phase, which modifies the FlashAttention-3 kernel and effectively reduces time-to-first-token. To minimize accuracy loss from quantization, we develop novel outlier smoothing techniques tailored separately for linear and attention layers. In linear layers, we explicitly use per-tensor scaling to prevent underflow caused by the FP8 quantization scaling factor of INT4 quantization, and channel-wise scaling to compensate for coarse granularity of INT4. In attention layers, we address quantization challenges posed by rotary positional embeddings (RoPE) by combining pre-RoPE and post-RoPE scaling strategies. FireQ significantly outperforms state-of-the-art methods, achieving 1.68x faster inference in feed-forward network layers on Llama2-7B and 1.26x faster prefill phase performance on Llama3-8B compared to QServe, with negligible accuracy loss.
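To illustrate why channel-wise scaling helps with the coarse granularity of INT4, the sketch below applies generic symmetric INT4 quantization with one scale per output channel. It is not FireQ's kernel or its smoothing scheme: it ignores the FP8 representation of scales and the per-tensor factor, and the function names and weight values are hypothetical.

```python
def quantize_int4_per_channel(weights):
    """Symmetric INT4 quantization with one scale per row (output channel).

    Each row is mapped to integers in [-8, 7] using scale = max|w| / 7.
    """
    quantized, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        q_row = [max(-8, min(7, round(w / scale))) for w in row]
        quantized.append(q_row)
        scales.append(scale)
    return quantized, scales

def dequantize(quantized, scales):
    """Recover approximate weights from INT4 codes and per-channel scales."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]

# Two channels whose magnitudes differ by 10x (hypothetical values).
weights = [[0.7, -0.35, 0.1, 0.0],
           [7.0, -3.5, 1.0, 0.2]]
quantized, scales = quantize_int4_per_channel(weights)
restored = dequantize(quantized, scales)

# Worst reconstruction error per channel, bounded by half a quantization step.
errors = [max(abs(w - r) for w, r in zip(w_row, r_row))
          for w_row, r_row in zip(weights, restored)]
```

With a single scale shared by both rows, the small-magnitude channel would collapse onto only one or two integer levels. A separate scale per channel lets each row use the full 16-level range.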
Article 1059
Title@2025-07-18 (5): MUSO: Achieving Exact Machine Unlearning in Over-Parameterized Regimes
Title: MUSO: Achieving Exact Machine Unlearning in Over-Parameterized Regimes | MUSO: Exaktes Lernen der Maschine in überparameterisierten Regimes | MUSO:在过度测量制度中实现精确的机械脱学 2410.08557v2 |
Authors (5): Ruikai Yang, Mingzhen He, Zhengbao He, Youmei Qiu, Xiaolin Huang
Machine unlearning (MU) aims to make a well-trained model behave as if it had never been trained on specific data. In today’s over-parameterized models, dominated by neural networks, a common approach is to manually relabel data and fine-tune the well-trained model. This approach can approximate the MU model in the output space, but the question remains whether it can achieve exact MU, i.e., in the parameter space. We answer this question by employing random feature techniques to construct an analytical framework. Under the premise of model optimization via stochastic gradient descent, we theoretically demonstrate that over-parameterized linear models can achieve exact MU through relabeling specific data. We also extend this work to real-world nonlinear networks and propose an alternating optimization algorithm that unifies the tasks of unlearning and relabeling. The algorithm’s effectiveness, confirmed through numerical experiments, highlights its superior performance in unlearning across various scenarios compared to current state-of-the-art methods, particularly excelling over similar relabeling-based MU approaches.
Article 1060
Title@2025-07-18 (5): HeCoFuse: Cross-Modal Complementary V2X Cooperative Perception with Heterogeneous Sensors
Title: HeCoFuse: Cross-Modal Complementary V2X Cooperative Perception with Heterogeneous Sensors | HeCoFuse: Cross-Modal Complementary V2X kooperative Wahrnehmung mit Heterogenen Sensoren | HEFuse:跨模式补充V2X合作感知与异源感应器 2507.13677v1 |
Authors (5): Chuheng Wei, Ziye Qin, Walter Zimmer, Guoyuan Wu, Matthew J. Barth
Real-world Vehicle-to-Everything (V2X) cooperative perception systems often operate under heterogeneous sensor configurations due to cost constraints and deployment variability across vehicles and infrastructure. This heterogeneity poses significant challenges for feature fusion and perception reliability. To address these issues, we propose HeCoFuse, a unified framework designed for cooperative perception across mixed sensor setups where nodes may carry Cameras (C), LiDARs (L), or both. By introducing a hierarchical fusion mechanism that adaptively weights features through a combination of channel-wise and spatial attention, HeCoFuse can tackle critical challenges such as cross-modality feature misalignment and imbalanced representation quality. In addition, an adaptive spatial resolution adjustment module is employed to balance computational cost and fusion effectiveness. To enhance robustness across different configurations, we further implement a cooperative learning strategy that dynamically adjusts fusion type based on available modalities. Experiments on the real-world TUMTraf-V2X dataset demonstrate that HeCoFuse achieves 43.22% 3D mAP under the full sensor configuration (LC+LC), outperforming the CoopDet3D baseline by 1.17%, and reaches an even higher 43.38% 3D mAP in the L+LC scenario, while maintaining 3D mAP in the range of 21.74% to 43.38% across nine heterogeneous sensor configurations. These results, validated by our first-place finish in the CVPR 2025 DriveX challenge, establish HeCoFuse as the current state-of-the-art on the TUMTraf-V2X dataset while demonstrating robust performance across diverse sensor deployments.
Article 1061
Title@2025-07-18 (5): Complex non-backtracking matrix for directed graphs
Title: Complex non-backtracking matrix for directed graphs | Komplexe Nicht-Rückverfolgungsmatrix für gerichtete Graphen | 定向图表的复杂非后跟踪矩阵表 2507.12503v2 |
Authors (2): Keishi Sando, Hideitsu Hino
Graph representation matrices are essential tools in graph data analysis. Recently, Hermitian adjacency matrices have been proposed to investigate directed graph structures. Previous studies have demonstrated that these matrices can extract valuable information for clustering. In this paper, we propose the complex non-backtracking matrix, which integrates the properties of the Hermitian adjacency matrix and the non-backtracking matrix. The proposed matrix has properties similar to those of the non-backtracking matrix of undirected graphs. We reveal relationships between the complex non-backtracking matrix and the Hermitian adjacency matrix. We also show that this matrix representation holds cluster information, particularly for sparse directed graphs.
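For orientation, the Hermitian adjacency matrix that the abstract builds on can be sketched as follows. The convention used here (entry 1 for a bidirectional edge, i for a one-way edge u to v, and -i for the reverse direction) is one common definition from the literature and is assumed, since the abstract does not state which variant the paper adopts. The complex non-backtracking matrix itself is not reproduced because the abstract does not define it.

```python
from typing import Iterable, List, Tuple


def hermitian_adjacency(n: int, edges: Iterable[Tuple[int, int]]) -> List[List[complex]]:
    """Build an n x n Hermitian adjacency matrix for a directed graph.

    Assumed convention: H[u][v] = 1 if both u->v and v->u exist,
    H[u][v] = i and H[v][u] = -i if only u->v exists, and 0 otherwise.
    """
    edge_set = set(edges)
    H = [[0j] * n for _ in range(n)]
    for (u, v) in edge_set:
        if (v, u) in edge_set:
            H[u][v] = 1 + 0j  # bidirectional edge behaves like an undirected edge
        else:
            H[u][v] = 1j      # direction is encoded in the imaginary part
            H[v][u] = -1j     # conjugate entry keeps the matrix Hermitian
    return H
```

Because the matrix equals its own conjugate transpose, its eigenvalues are real even though the graph is directed, which is what makes it convenient for spectral clustering.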
Article 1062
Title@2025-07-18 (5): Breaking the Illusion of Security via Interpretation: Interpretable Vision Transformer Systems under Attack
Title: Breaking the Illusion of Security via Interpretation: Interpretable Vision Transformer Systems under Attack | Breaking the Illusion of Security via Interpretation: Interpretable Vision Transformer Systeme unter Angriff | 通过解释打破对安全的幻觉:被攻击的可解释的愿景变形系统 2507.14248v1 |
Authors (5): Eldor Abdukhamidov, Mohammed Abuhamad, Simon S. Woo, Hyoungshick Kim, Tamer Abuhmed
Vision transformer (ViT) models, when coupled with interpretation models, are regarded as secure and challenging to deceive, making them well-suited for security-critical domains such as medical applications, autonomous vehicles, drones, and robotics. However, successful attacks on these systems can lead to severe consequences. Recent research on threats targeting ViT models primarily focuses on generating the smallest adversarial perturbations that can deceive the models with high confidence, without considering their impact on model interpretations. Nevertheless, the use of interpretation models can effectively assist in detecting adversarial examples. This study investigates the vulnerability of transformer models to adversarial attacks, even when combined with interpretation models. We propose an attack called “AdViT” that generates adversarial examples capable of misleading both a given transformer model and its coupled interpretation model. Through extensive experiments on various transformer models and two transformer-based interpreters, we demonstrate that AdViT achieves a 100% attack success rate in both white-box and black-box scenarios. In white-box scenarios, it reaches up to 98% misclassification confidence, while in black-box scenarios, it reaches up to 76% misclassification confidence. Remarkably, AdViT consistently generates accurate interpretations in both scenarios, making the adversarial examples more difficult to detect.
Article 1063
Title@2025-07-18 (5): When Person Re-Identification Meets Event Camera: A Benchmark Dataset and An Attribute-guided Re-Identification Framework
Title: When Person Re-Identification Meets Event Camera: A Benchmark Dataset and An Attribute-guided Re-Identification Framework | Wenn Person Re-Identification auf Ereigniskamera trifft: Ein Benchmark-Datensatz und ein Attribut-geführtes Re-Identification Framework | 当人员重新确认与事件相遇时:基准数据集和属性指导的重新确定框架 2507.13659v1 |
Authors (8): Xiao Wang, Qian Zhu, Shujuan Wu, Bo Jiang, Shiliang Zhang, Yaowei Wang, Yonghong Tian, Bin Luo
Event camera-based person re-identification (ReID) has attracted significant attention recently because of its promising performance and its better balance with respect to privacy protection. Currently, mainstream event-based person ReID algorithms primarily focus on fusing visible light and event streams, as well as preserving privacy. Although significant progress has been made, these methods are typically trained and evaluated on small-scale or simulated event camera datasets, making it difficult to assess their real identification performance and generalization ability. To address the issue of data scarcity, this paper introduces a large-scale RGB-event based person ReID dataset, called EvReID. The dataset contains 118,988 image pairs and covers 1200 pedestrian identities, with data collected across multiple seasons, scenes, and lighting conditions. We also evaluate 15 state-of-the-art person ReID algorithms, laying a solid foundation for future research in terms of both data and benchmarking. Based on our newly constructed dataset, this paper further proposes a pedestrian attribute-guided contrastive learning framework to enhance feature learning for person re-identification, termed TriPro-ReID. This framework not only effectively explores the visual features from both RGB frames and event streams, but also fully utilizes pedestrian attributes as mid-level semantic features. Extensive experiments on the EvReID and MARS datasets fully validate the effectiveness of our proposed RGB-Event person ReID framework. The benchmark dataset and source code will be released on https://github.com/Event-AHU/Neuromorphic_ReID
Article 1064
Title@2025-07-18 (5): Towards Foundation Models for Experimental Readout Systems Combining Discrete and Continuous Data
Title: Towards Foundation Models for Experimental Readout Systems Combining Discrete and Continuous Data | Auf dem Weg zu Grundlagenmodellen für experimentelle Auslesesysteme zur Kombination von diskreten und kontinuierlichen Daten | 建立分立和连续数据合并的实验读出系统基础模型 2505.08736v2 |
Authors (2): James Giroux, Cristiano Fanelli
We present a (proto) Foundation Model for Nuclear Physics, capable of operating on low-level detector inputs from Imaging Cherenkov Detectors at the future Electron Ion Collider. Building upon established next-token prediction approaches, we aim to address potential challenges such as resolution loss from existing tokenization schemes and limited support for conditional generation. We propose four key innovations: (i) separate vocabularies for discrete and continuous variates, combined via Causal Multi-Head Cross-Attention (CMHCA), (ii) continuous kinematic conditioning through prepended context embeddings, (iii) scalable and simple, high-resolution continuous variate tokenization without joint vocabulary inflation, and (iv) class conditional generation through a Mixture of Experts. Our model enables fast, high-fidelity generation of pixel and time sequences for Cherenkov photons, validated through closure tests in the High Performance DIRC. We also show our model generalizes to reconstruction tasks such as pion/kaon identification, and noise filtering, in which we show its ability to leverage fine-tuning under specific objectives.
Article 1065
Title@2025-07-18 (5): A Comprehensive Review of Transformer-based language models for Protein Sequence Analysis and Design
Title: A Comprehensive Review of Transformer-based language models for Protein Sequence Analysis and Design | Eine umfassende Überprüfung von Transformer-basierten Sprachmodellen für Proteinsequenzanalyse und -design | 全面审查以变换器为基础的蛋白序列分析和设计语言模型 2507.13646v1 |
Authors (5): Nimisha Ghosh, Daniele Santoni, Debaleena Nawn, Eleonora Ottaviani, Giovanni Felici
The impact of Transformer-based language models has been unprecedented in Natural Language Processing (NLP). The success of such models has also led to their adoption in other fields, including bioinformatics. Taking this into account, this paper discusses recent advances in Transformer-based models for protein sequence analysis and design. In this review, we discuss and analyse a significant number of works pertaining to such applications. These applications encompass gene ontology, functional and structural protein identification, generation of de novo proteins, and binding of proteins. We attempt to shed light on the strengths and weaknesses of the discussed works to provide comprehensive insight to readers. Finally, we highlight shortcomings in existing research and explore potential avenues for future developments. We believe that this review will help researchers working in this field to gain an overall idea of the state of the art and to orient their future studies.
Article 1066
Title@2025-07-18 (5): The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Title: The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity | Die Illusion des Denkens: Die Stärken und Grenzen von Vernunftmodellen über das Lens of Problem Complexity verstehen | 思考的幻觉:通过问题复杂焦点了解理性模型的长处和局限性 2506.06941v2 |
Authors (6): Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar
Recent generations of language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established math and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from contamination and does not provide insights into the reasoning traces. In this work, we systematically investigate these gaps with the help of controllable puzzle environments that allow precise manipulation of complexity while maintaining consistent logical structures. This setup enables the analysis of not only final answers but also the internal reasoning traces, offering insights into how LRMs think. Through extensive experiments, we show that LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counterintuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite a remaining token budget. By comparing LRMs with their standard LLM counterparts under the same inference compute, we identify three performance regimes: (1) low-complexity tasks where standard models outperform LRMs, (2) medium-complexity tasks where LRMs demonstrate an advantage, and (3) high-complexity tasks where both models face complete collapse. We find that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across scales. We also investigate the reasoning traces in more depth, studying the patterns of explored solutions and analyzing the models’ computational behavior, shedding light on their strengths and limitations and raising questions about their reasoning capabilities.
Article 1067
Title@2025-07-18 (5): KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction
Title: KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction | KEPLA: Ein Wissen-erweitertes Deep Learning Framework für präzise Protein-Ligand Bindung Affinity Prediction | KEPLA:一个知识强化的更深层学习框架,用于准确预测蛋白-银-捆绑性近亲关系 2506.13196v3 |
Authors (7): Han Liu, Keyan Ding, Peilin Chen, Yinwei Wei, Liqiang Nie, Dapeng Wu, Shiqi Wang
Accurate prediction of protein-ligand binding affinity is critical for drug discovery. While recent deep learning approaches have demonstrated promising results, they often rely solely on structural features of proteins and ligands, overlooking the valuable biochemical knowledge associated with binding affinity. To address this limitation, we propose KEPLA, a novel deep learning framework that explicitly integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. KEPLA takes protein sequences and ligand molecular graphs as input and optimizes two complementary objectives: (1) aligning global representations with knowledge graph relations to capture domain-specific biochemical insights, and (2) leveraging cross attention between local representations to construct fine-grained joint embeddings for prediction. Experiments on two benchmark datasets across both in-domain and cross-domain scenarios demonstrate that KEPLA consistently outperforms state-of-the-art baselines. Furthermore, interpretability analyses based on knowledge graph relations and cross attention maps provide valuable insights into the underlying predictive mechanisms.
Article 1068
Title@2025-07-18 (5): Differential Privacy in Kernelized Contextual Bandits via Random Projections
Title: Differential Privacy in Kernelized Contextual Bandits via Random Projections | Differentielle Privatsphäre in Kernelisierten Kontext Bandits über Random Projektionen | 通过随机预测在核心环境强盗中的不同隐私 2507.13639v1 |
Authors (3): Nikola Pavlovic, Sudeep Salgia, Qing Zhao
We consider the problem of contextual kernel bandits with stochastic contexts, where the underlying reward function belongs to a known Reproducing Kernel Hilbert Space. We study this problem under an additional constraint of Differential Privacy, where the agent needs to ensure that the sequence of query points is differentially private with respect to both the sequence of contexts and rewards. We propose a novel algorithm that achieves the state-of-the-art cumulative regret of $\widetilde{\mathcal{O}}(\sqrt{\gamma_TT}+\frac{\gamma_T}{\varepsilon_{\mathrm{DP}}})$ and $\widetilde{\mathcal{O}}(\sqrt{\gamma_TT}+\frac{\gamma_T\sqrt{T}}{\varepsilon_{\mathrm{DP}}})$ over a time horizon of $T$ in the joint and local models of differential privacy, respectively, where $\gamma_T$ is the effective dimension of the kernel and $\varepsilon_{\mathrm{DP}} > 0$ is the privacy parameter. The key ingredient of the proposed algorithm is a novel private kernel-ridge regression estimator which is based on a combination of private covariance estimation and private random projections. It offers a significantly reduced sensitivity compared to its classical counterpart while maintaining a high prediction accuracy, allowing our algorithm to achieve the state-of-the-art performance guarantees.
Article 1069
Title@2025-07-18 (5): State Space Models Naturally Produce Traveling Waves, Time Cells, and Scale to Abstract Cognitive Functions
Title: State Space Models Naturally Produce Traveling Waves, Time Cells, and Scale to Abstract Cognitive Functions | State Space Models erzeugen natürlich reisende Wellen, Zeitzellen und skalieren zu abstrakten kognitiven Funktionen | 自然产生旅行波、时格和按抽象认知功能的尺度衡量的自然生成空间模型 2507.13638v1 |
Authors (6): Sen Lu, Xiaoyu Zhang, Mingtao Hu, Eric Yeu-Jer Lee, Soohyeon Kim, Wei D. Lu
A grand challenge in modern neuroscience is to bridge the gap between the detailed mapping of microscale neural circuits and a mechanistic understanding of cognitive functions. While extensive knowledge exists about neuronal connectivity and biophysics, a significant gap remains in how these elements combine to produce flexible, learned behaviors. Here, we propose that a framework based on State-Space Models (SSMs), an emerging class of deep learning architectures, can bridge this gap. We argue that the differential equations governing elements in an SSM are conceptually consistent with the biophysical dynamics of neurons, while the combined dynamics in the model lead to emergent behaviors observed in experimental neuroscience. We test this framework by training an S5 model (a specific SSM variant employing a diagonal state transition matrix) on temporal discrimination tasks with reinforcement learning (RL). We demonstrate that the model spontaneously develops neural representations that strikingly mimic biological ‘time cells’. We reveal that these cells emerge from a simple generative principle: learned rotational dynamics of hidden state vectors in the complex plane. This single mechanism unifies the emergence of time cells, ramping activity, and oscillations/traveling waves observed in numerous experiments. Furthermore, we show that these rotational dynamics generalize beyond interval discriminative tasks to abstract event-counting tasks that were considered foundational for performing complex cognitive tasks. Our findings position SSMs as a compelling framework that connects single-neuron dynamics to cognitive phenomena, offering a unifying and computationally tractable theoretical ground for temporal learning in the brain.
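The phrase "rotational dynamics of hidden state vectors in the complex plane" can be made concrete with a minimal sketch of a diagonal linear recurrence with complex eigenvalues. This is a generic illustration rather than the S5 model or the trained network from the paper; the function name, the eigenvalue magnitude 0.95, and the rotation angles are assumptions chosen for the example.

```python
import cmath
from typing import List, Sequence


def run_diagonal_ssm(
    eigenvalues: Sequence[complex],
    inputs: Sequence[float],
    input_weights: Sequence[complex],
) -> List[List[complex]]:
    """Simulate h[t+1] = lambda * h[t] + b * u[t] independently per state unit.

    A diagonal transition matrix means each unit only multiplies its own
    state by one complex number, i.e. it scales and rotates in the plane.
    """
    state = [0j] * len(eigenvalues)
    trajectory = []
    for u in inputs:
        state = [lam * h + b * u for lam, h, b in zip(eigenvalues, state, input_weights)]
        trajectory.append(list(state))
    return trajectory


# Two units with the same decay (magnitude) but different rotation speeds (angle).
eigenvalues = [cmath.rect(0.95, theta) for theta in (0.2, 0.5)]
impulse = [1.0] + [0.0] * 9  # a single input pulse at the first step
trajectory = run_diagonal_ssm(eigenvalues, impulse, [1 + 0j, 1 + 0j])
```

After the pulse, each unit's state at step k equals its eigenvalue raised to the power k: the magnitude shrinks by a constant factor per step while the phase advances by a fixed angle, so units with different angles pass through any given phase at different times.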
Article 1070
Title@2025-07-18 (5): Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques
Title: Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques | Große Sprachmodelle in Cybersecurity: Anwendungen, Schwachstellen und Verteidigungstechniken | 网络安全大语言模式:应用、脆弱性和国防技术 2507.13629v1 |
Authors (3): Niveen O. Jaffal, Mohammed Alkhanafseh, David Mohaisen
Large Language Models (LLMs) are transforming cybersecurity by enabling intelligent, adaptive, and automated approaches to threat detection, vulnerability assessment, and incident response. With their advanced language understanding and contextual reasoning, LLMs surpass traditional methods in tackling challenges across domains such as IoT, blockchain, and hardware security. This survey provides a comprehensive overview of LLM applications in cybersecurity, focusing on two core areas: (1) the integration of LLMs into key cybersecurity domains, and (2) the vulnerabilities of LLMs themselves, along with mitigation strategies. By synthesizing recent advancements and identifying key limitations, this work offers practical insights and strategic recommendations for leveraging LLMs to build secure, scalable, and future-ready cyber defense systems.
Article 1071
Title@2025-07-18 (5): FedSkipTwin: Digital-Twin-Guided Client Skipping for Communication-Efficient Federated Learning
Title: FedSkipTwin: Digital-Twin-Guided Client Skipping for Communication-Efficient Federated Learning | FedSkipTwin: Digital-Twin-geführter Client Skipping für kommunikatives und effizientes Federated Learning | FedSkipTwin: 数字双向指导客户跳过客户端, 用于沟通高效的联邦学习 2507.13624v1 |
Authors (4): Daniel Commey, Kamel Abbad, Garth V. Crosby, Lyes Khoukhi
Communication overhead remains a primary bottleneck in federated learning (FL), particularly for applications involving mobile and IoT devices with constrained bandwidth. This work introduces FedSkipTwin, a novel client-skipping algorithm driven by lightweight, server-side digital twins. Each twin, implemented as a simple LSTM, observes a client’s historical sequence of gradient norms to forecast both the magnitude and the epistemic uncertainty of its next update. The server leverages these predictions, requesting communication only when either value exceeds a predefined threshold; otherwise, it instructs the client to skip the round, thereby saving bandwidth. Experiments are conducted on the UCI-HAR and MNIST datasets with 10 clients under a non-IID data distribution. The results demonstrate that FedSkipTwin reduces total communication by 12-15.5% across 20 rounds while simultaneously improving final model accuracy by up to 0.5 percentage points compared to the standard FedAvg algorithm. These findings establish that prediction-guided skipping is a practical and effective strategy for resource-aware FL in bandwidth-constrained edge environments.
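The skip rule described above (request an update only when the forecast magnitude or the forecast uncertainty exceeds a threshold) can be sketched in a few lines. The LSTM twin itself is omitted; the `TwinForecast` type, the function names, and the threshold values are hypothetical and only illustrate the decision logic stated in the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TwinForecast:
    """Hypothetical container for one digital twin's prediction."""
    predicted_norm: float  # forecast magnitude of the client's next update
    uncertainty: float     # forecast epistemic uncertainty


def should_request_update(
    forecast: TwinForecast, norm_threshold: float, uncertainty_threshold: float
) -> bool:
    """Request communication if either forecast value exceeds its threshold."""
    return (
        forecast.predicted_norm > norm_threshold
        or forecast.uncertainty > uncertainty_threshold
    )


def select_clients(
    forecasts: Dict[str, TwinForecast],
    norm_threshold: float,
    uncertainty_threshold: float,
) -> List[str]:
    """Return the clients asked to communicate this round; the rest skip."""
    return sorted(
        client_id
        for client_id, forecast in forecasts.items()
        if should_request_update(forecast, norm_threshold, uncertainty_threshold)
    )
```

Including the uncertainty term means a client is still contacted when its twin is unsure, even if the predicted update is small, which guards against skipping on the basis of an unreliable forecast.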
Article 1072
Title@2025-07-18 (5): Deep Q-Learning with Gradient Target Tracking
Title: Deep Q-Learning with Gradient Target Tracking | Deep Q-Learning mit gradientem Target Tracking | 与渐进目标跟踪进行深度学习 2503.16700v3 |
Authors (3): Bum Geun Park, Taeho Lee, Donghwan Lee
This paper introduces Q-learning with gradient target tracking, a novel reinforcement learning framework that provides a learned continuous target update mechanism as an alternative to the conventional hard update paradigm. In the standard deep Q-network (DQN), the target network is a copy of the online network’s weights, held fixed for a number of iterations before being periodically replaced via a hard update. While this stabilizes training by providing consistent targets, it introduces a new challenge: the hard update period must be carefully tuned to achieve optimal performance. To address this issue, we propose two gradient-based target update methods: DQN with asymmetric gradient target tracking (AGT2-DQN) and DQN with symmetric gradient target tracking (SGT2-DQN). These methods replace the conventional hard target updates with continuous and structured updates using gradient descent, which effectively eliminates the need for manual tuning. We provide a theoretical analysis proving the convergence of these methods in tabular settings. Additionally, empirical evaluations demonstrate their advantages over standard DQN baselines, which suggest that gradient-based target updates can serve as an effective alternative to conventional target update mechanisms in Q-learning.
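The contrast between a periodic hard update and a continuous, gradient-based target update can be sketched on plain parameter lists. The tracking rule below takes one gradient descent step on an assumed squared-distance loss between target and online parameters; this is a stand-in chosen for illustration and is not claimed to be the AGT2-DQN or SGT2-DQN update from the paper. The function names, `period`, and `lr` are hypothetical.

```python
from typing import List


def hard_update(online: List[float], target: List[float], step: int, period: int) -> List[float]:
    """Conventional DQN rule: copy the online weights every `period` steps."""
    return list(online) if step % period == 0 else list(target)


def gradient_tracking_update(online: List[float], target: List[float], lr: float) -> List[float]:
    """One gradient step on the assumed loss 0.5 * ||target - online||^2.

    The gradient with respect to the target parameters is (target - online),
    so every call moves the target a fraction `lr` of the way to the online net.
    """
    return [t - lr * (t - o) for o, t in zip(online, target)]
```

With this assumed loss, the target changes a little on every step instead of jumping at fixed intervals, so there is no update period to tune, although the step size `lr` remains a parameter of the sketch.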
Article 1073
Title@2025-07-18 (5): ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs
Title: ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs | ZKP-FedEval: Überprüfbare und datenschutzschonende Federated Evaluation mit Null-Wissensnachweisen | ZKP-FedEval:使用零知识证明进行可核查和隐私保护的联邦评价 2507.11649v2 |
Authors (4): Daniel Commey, Benjamin Appiah, Griffith S. Klogo, Garth V. Crosby
Federated Learning (FL) enables collaborative model training on decentralized data without exposing raw data. However, the evaluation phase in FL may leak sensitive information through shared performance metrics. In this paper, we propose a novel protocol that incorporates Zero-Knowledge Proofs (ZKPs) to enable privacy-preserving and verifiable evaluation for FL. Instead of revealing raw loss values, clients generate a succinct proof asserting that their local loss is below a predefined threshold. Our approach is implemented without reliance on external APIs, using self-contained modules for federated learning simulation, ZKP circuit design, and experimental evaluation on both the MNIST and Human Activity Recognition (HAR) datasets. We focus on a threshold-based proof for a simple Convolutional Neural Network (CNN) model (for MNIST) and a multi-layer perceptron (MLP) model (for HAR), and evaluate the approach in terms of computational overhead, communication cost, and verifiability.
Article 1074
Title@2025-07-18 (5): Merge Kernel for Bayesian Optimization on Permutation Space
Title: Merge Kernel for Bayesian Optimization on Permutation Space | Zusammenführen Kernel für Bayesian Optimierung auf Permutationsraum | Bayesian Permodation 空间优化合并核心圈 2507.13263v2 |
Authors (2): Zikai Xie, Linjiang Chen
Bayesian Optimization (BO) is a standard tool for black-box optimization problems. The current state-of-the-art BO approach for permutation spaces relies on the Mallows kernel, an $\Omega(n^2)$ representation that explicitly enumerates every pairwise comparison. Inspired by the close relationship between the Mallows kernel and pairwise comparison, we propose a novel framework for generating kernel functions on permutation space based on sorting algorithms. Within this framework, the Mallows kernel can be viewed as a special instance derived from bubble sort. Further, we introduce the Merge Kernel, constructed from merge sort, which replaces the quadratic complexity with $\Theta(n\log n)$ to achieve the lowest possible complexity. The resulting feature vector is significantly shorter and can be computed in linearithmic time, yet still efficiently captures meaningful permutation distances. To boost robustness and right-invariance without sacrificing compactness, we further incorporate three lightweight, task-agnostic descriptors: (1) a shift histogram, which aggregates absolute element displacements and supplies a global misplacement signal; (2) a split-pair line, which encodes selected long-range comparisons by aligning elements across the two halves of the whole permutation; and (3) sliding-window motifs, which summarize local order patterns that influence near-neighbor objectives. Our empirical evaluation demonstrates that the proposed kernel consistently outperforms the state-of-the-art Mallows kernel across various permutation optimization benchmarks. Results confirm that the Merge Kernel provides a more compact yet more effective solution for Bayesian optimization in permutation space.
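To show where the quadratic cost of the baseline comes from, here is a minimal sketch of the Mallows kernel in its usual form, exp(-lambda * d), where d is the Kendall tau distance (the number of item pairs ordered differently by the two permutations). The parameter name `lam` and its default value are assumptions. The Merge Kernel itself is not reproduced, since the abstract does not specify its feature construction.

```python
import math
from itertools import combinations
from typing import Hashable, Sequence


def kendall_tau_distance(p: Sequence[Hashable], q: Sequence[Hashable]) -> int:
    """Count item pairs that appear in opposite relative order in p and q.

    Every one of the n * (n - 1) / 2 pairs is checked, which is the
    quadratic pairwise enumeration the abstract refers to.
    """
    pos_p = {item: i for i, item in enumerate(p)}
    pos_q = {item: i for i, item in enumerate(q)}
    discordant = 0
    for a, b in combinations(p, 2):
        if (pos_p[a] - pos_p[b]) * (pos_q[a] - pos_q[b]) < 0:
            discordant += 1
    return discordant


def mallows_kernel(p: Sequence[Hashable], q: Sequence[Hashable], lam: float = 1.0) -> float:
    """Mallows kernel value: similarity decays exponentially with discordant pairs."""
    return math.exp(-lam * kendall_tau_distance(p, q))
```

For permutations of 100 items this loop checks 4,950 pairs per kernel evaluation, which is the cost a sorting-based feature map with fewer comparisons is meant to reduce.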
Article 1075
Title@2025-07-18 (5): Off-Policy Evaluation and Learning for Matching Markets
Title: Off-Policy Evaluation and Learning for Matching Markets | Off-Policy-Evaluierung und -Lernen für Matching-Märkte | 非政策评价和学习以匹配市场 2507.13608v1 |
Authors (3): Yudai Hayashi, Shuhei Goda, Yuta Saito
Matching users based on mutual preferences is a fundamental aspect of services driven by reciprocal recommendations, such as job search and dating applications. Although A/B tests remain the gold standard for evaluating new policies in recommender systems for matching markets, they are costly and impractical for frequent policy updates. Off-Policy Evaluation (OPE) thus plays a crucial role by enabling the evaluation of recommendation policies using only offline logged data naturally collected on the platform. However, unlike conventional recommendation settings, the large scale and bidirectional nature of user interactions in matching platforms introduce variance issues and exacerbate reward sparsity, making standard OPE methods unreliable. To address these challenges and facilitate effective offline evaluation, we propose novel OPE estimators, DiPS and DPR, specifically designed for matching markets. Our methods combine elements of the Direct Method (DM), Inverse Propensity Score (IPS), and Doubly Robust (DR) estimators while incorporating intermediate labels, such as initial engagement signals, to achieve better bias-variance control in matching markets. Theoretically, we derive the bias and variance of the proposed estimators and demonstrate their advantages over conventional methods. Furthermore, we show that these estimators can be seamlessly extended to offline policy learning methods for improving recommendation policies so that they produce more matches. We empirically evaluate our methods through experiments on both synthetic data and A/B testing logs from a real job-matching platform. The empirical results highlight the superiority of our approach over existing methods in off-policy evaluation and learning tasks for a variety of configurations.
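The three conventional estimators named in the abstract (DM, IPS, and DR) can be sketched in their textbook single-sided form. This is not DiPS or DPR, and it ignores the bidirectional structure and intermediate labels that the paper addresses; the log format, function names, and the `reward_model` callable are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical log format: (context, action, observed_reward, logging_propensity)
LogEntry = Tuple[str, str, float, float]
Policy = Callable[[str, str], float]       # pi(action | context)
RewardModel = Callable[[str, str], float]  # estimated expected reward q(context, action)


def dm_estimate(logs: List[LogEntry], policy: Policy, reward_model: RewardModel,
                actions: Sequence[str]) -> float:
    """Direct Method: average the model's predicted reward under the new policy."""
    total = sum(
        sum(policy(x, a) * reward_model(x, a) for a in actions) for x, _, _, _ in logs
    )
    return total / len(logs)


def ips_estimate(logs: List[LogEntry], policy: Policy) -> float:
    """Inverse Propensity Score: reweight logged rewards by pi / logging propensity."""
    return sum(policy(x, a) / p * r for x, a, r, p in logs) / len(logs)


def dr_estimate(logs: List[LogEntry], policy: Policy, reward_model: RewardModel,
                actions: Sequence[str]) -> float:
    """Doubly Robust: model-based baseline plus an IPS-weighted residual correction."""
    total = 0.0
    for x, a, r, p in logs:
        baseline = sum(policy(x, b) * reward_model(x, b) for b in actions)
        correction = policy(x, a) / p * (r - reward_model(x, a))
        total += baseline + correction
    return total / len(logs)
```

IPS is unbiased when the logging propensities are correct but becomes noisy when they are small, DM has low variance but inherits any error in the reward model, and DR uses the model as a baseline while correcting its residual with importance weights.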
Article 1076
Title@2025-07-18 (5): Improving Low-Cost Teleoperation: Augmenting GELLO with Force
Title: Improving Low-Cost Teleoperation: Augmenting GELLO with Force | Verbesserung der Low-Cost-Teleoperation: GELLO mit Kraft erweitern | 改进低费技术合作:加强GELLLO 2507.13602v1 |
Authors (5): Shivakanth Sujit, Luca Nunziante, Dan Ogawa Lillrank, Rousslan Fernand Julien Dossa, Kai Arulkumaran
In this work we extend the low-cost GELLO teleoperation system, initially designed for joint position control, with additional force information. Our first extension is to implement force feedback, allowing users to feel resistance when interacting with the environment. Our second extension is to add force information into the data collection process and training of imitation learning models. We validate our additions by implementing these on a GELLO system with a Franka Panda arm as the follower robot, performing a user study, and comparing the performance of policies trained with and without force information on a range of simulated and real dexterous manipulation tasks. Qualitatively, users with robotics experience preferred our controller, and the addition of force inputs improved task success on the majority of tasks.
Article 1077
Title@2025-07-18 (5): Position: Untrained Machine Learning for Anomaly Detection by using 3D Point Cloud Data
Title: Position: Untrained Machine Learning for Anomaly Detection by using 3D Point Cloud Data | Position: Untrainiertes maschinelles Lernen zur Erkennung von Anomalien durch Verwendung von 3D-Punkt-Cloud-Daten | 位置: 使用 3D 点云数据进行异常检测的未经训练的机器学习 2502.03876v2 |
Authors (2): Juan Du, Dongheng Chen
Anomaly detection based on 3D point cloud data is an important research problem that has received increasing attention recently. Untrained anomaly detection based on only one sample is an emerging research problem motivated by real manufacturing settings such as personalized manufacturing, where only one sample can be collected without any additional labels or historical datasets. Identifying anomalies accurately based on one 3D point cloud sample is a critical challenge in both industrial applications and the field of machine learning. This paper aims to provide a formal definition of the untrained anomaly detection problem based on 3D point cloud data and to discuss the differences between untrained anomaly detection and current unsupervised anomaly detection problems. Unlike trained unsupervised learning, untrained unsupervised learning does not rely on any training data, including unlabeled data. Instead, it leverages prior knowledge about the surfaces and anomalies. We propose three complementary methodological frameworks: the Latent Variable Inference Framework, which employs probabilistic modeling to distinguish anomalies; the Decomposition Framework, which separates point clouds into reference, anomaly, and noise components through sparse learning; and the Local Geometry Framework, which leverages neighborhood information for anomaly identification. Experimental results demonstrate that untrained methods achieve competitive detection performance while offering significant computational advantages, with up to a 15-fold increase in execution speed. The proposed methods provide viable solutions for scenarios with extreme data scarcity, addressing critical challenges in personalized manufacturing and healthcare applications where collecting multiple samples or historical data is infeasible.
Article 1078
Title@2025-07-18 (5): Accelerating RF Power Amplifier Design via Intelligent Sampling and ML-Based Parameter Tuning
Title: Accelerating RF Power Amplifier Design via Intelligent Sampling and ML-Based Parameter Tuning | Beschleunigung des RF-Leistungsverstärkers über intelligente Probenahme und ML-basierte Parameter-Tuning | 通过智能取样和以 ML 为基础的参数图集加速 RF 功率放大器设计 2507.11928v2 |
Authors (2): Abhishek Sriram, Neal Tuffy
This paper presents a machine learning-accelerated optimization framework for RF power amplifier design that reduces simulation requirements by 65% while maintaining $\pm0.4$ dBm accuracy for the majority of the modes. The proposed method combines MaxMin Latin Hypercube Sampling with CatBoost gradient boosting to intelligently explore multidimensional parameter spaces. Instead of exhaustively simulating all parameter combinations to achieve target P2dB compression specifications, our approach strategically selects approximately 35% of critical simulation points. The framework processes ADS netlists, executes harmonic balance simulations on the reduced dataset, and trains a CatBoost model to predict P2dB performance across the entire design space. Validation across 15 PA operating modes yields an average $R^2$ of 0.901, with the system ranking parameter combinations by their likelihood of meeting target specifications. The integrated solution delivers 58.24% to 77.78% reduction in simulation time through automated GUI-based workflows, enabling rapid design iterations without compromising accuracy standards required for production RF circuits.
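The sampling stage described above can be illustrated with a minimal sketch. It assumes a simple random-search form of maximin Latin hypercube sampling on the unit cube: draw several random designs and keep the one whose closest pair of points is farthest apart. The function names, the candidate count, and the unit-cube scaling are illustrative assumptions, not the authors' implementation, and the CatBoost regression stage is not shown.

```python
import itertools
import math
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Draw one random Latin hypercube design in [0, 1)^n_dims."""
    columns = []
    for _ in range(n_dims):
        # Each dimension gets exactly one point per stratum, in shuffled order.
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two design points."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))


def maximin_lhs(n_samples, n_dims, n_candidates=200, seed=0):
    """Keep the candidate design that maximizes the minimum pairwise distance."""
    rng = random.Random(seed)
    candidates = (latin_hypercube(n_samples, n_dims, rng) for _ in range(n_candidates))
    return max(candidates, key=min_pairwise_distance)


design = maximin_lhs(n_samples=8, n_dims=3)
```

Each row of `design` would then be rescaled to the physical parameter ranges before simulation; that mapping depends on the design space and is left out here.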
Article 1079
Title@2025-07-18 (5): GIFT: Gradient-aware Immunization of diffusion models against malicious Fine-Tuning with safe concepts retention
Title: GIFT: Gradient-aware Immunization of diffusion models against malicious Fine-Tuning with safe concepts retention | GIFT: Gradient-aware Immunisierung von Diffusionsmodellen gegen bösartiges Fein-Tuning mit sicherer Konzeptbindung | GIFT: 逐步对防止恶意微调的传播模式进行逐步免疫免疫,并保留安全概念 2507.13598v1 |
Authors (6): Amro Abdalla, Ismail Shaheen, Dan DeGenaro, Rupayan Mallick, Bogdan Raita, Sarah Adel Bargal
We present GIFT: a Gradient-aware Immunization technique to defend diffusion models against malicious Fine-Tuning while preserving their ability to generate safe content. Existing safety mechanisms like safety checkers are easily bypassed, and concept erasure methods fail under adversarial fine-tuning. GIFT addresses this by framing immunization as a bi-level optimization problem: the upper-level objective degrades the model’s ability to represent harmful concepts using representation noising and maximization, while the lower-level objective preserves performance on safe data. GIFT achieves robust resistance to malicious fine-tuning while maintaining safe generative quality. Experimental results show that our method significantly impairs the model’s ability to re-learn harmful concepts while maintaining performance on safe content, offering a promising direction for creating inherently safer generative models resistant to adversarial fine-tuning attacks.
Article 1080
Title@2025-07-18 (5): An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model
Title: An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model | Ein empirischer Risikominimierungsansatz für Offline-Inverse-RL- und Dynamische Diskrete-Choice-Modell | 离线反转转流和动态分辨选择模式的经验风险最小化办法 2502.14131v4 |
Authors (3): Enoch H. Kang, Hema Yoganarasimhan, Lalit Jain
We study the problem of estimating Dynamic Discrete Choice (DDC) models, also known as offline Maximum Entropy-Regularized Inverse Reinforcement Learning (offline MaxEnt-IRL) in machine learning. The objective is to recover reward or $Q^*$ functions that govern agent behavior from offline behavior data. In this paper, we propose a globally convergent gradient-based method for solving these problems without the restrictive assumption of linearly parameterized rewards. The novelty of our approach lies in introducing the Empirical Risk Minimization (ERM) based IRL/DDC framework, which circumvents the need for explicit state transition probability estimation in the Bellman equation. Furthermore, our method is compatible with non-parametric estimation techniques such as neural networks. Therefore, the proposed method has the potential to be scaled to high-dimensional, infinite state spaces. A key theoretical insight underlying our approach is that the Bellman residual satisfies the Polyak-Lojasiewicz (PL) condition – a property that, while weaker than strong convexity, is sufficient to ensure fast global convergence guarantees. Through a series of synthetic experiments, we demonstrate that our approach consistently outperforms benchmark methods and state-of-the-art alternatives.
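For reference, the PL condition mentioned above is usually stated as follows. The symbols $f$, $f^*$, $\mu$, $L$, and $\theta_k$ are generic notation introduced here for illustration, not the paper's; the convergence rate is the standard one for gradient descent with step size $1/L$ on an $L$-smooth objective, not a result quoted from the paper.

```latex
\frac{1}{2}\,\lVert \nabla f(\theta) \rVert^{2} \;\ge\; \mu\,\bigl(f(\theta) - f^{*}\bigr)
\quad \text{for all } \theta
\qquad\Longrightarrow\qquad
f(\theta_{k}) - f^{*} \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)^{k}\,\bigl(f(\theta_{0}) - f^{*}\bigr),
\qquad \theta_{k+1} = \theta_{k} - \frac{1}{L}\,\nabla f(\theta_{k}).
```

The condition does not require convexity, which is why it can hold for objectives that strong convexity would rule out.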
Article 1081
Title@2025-07-18 (5): BLAST: A Stealthy Backdoor Leverage Attack against Cooperative Multi-Agent Deep Reinforcement Learning based Systems
Title: BLAST: A Stealthy Backdoor Leverage Attack against Cooperative Multi-Agent Deep Reinforcement Learning based Systems | BLAST: Eine stealthy Hintertür Hebelangriff gegen kooperative Multi-Agent Deep-Verstärkung-Learning-basierte Systeme | BLAST:对基于合作的多机构加强深层强化学习系统的隐秘后门利用攻击 2501.01593v2 |
Authors (6): Jing Fang, Saihao Yan, Xueyu Yin, Yinbo Yu, Chunwei Tian, Jiajia Liu
Recent studies have shown that cooperative multi-agent deep reinforcement learning (c-MADRL) is under the threat of backdoor attacks. Once a backdoor trigger is observed, a backdoored agent will perform malicious actions leading to failures or malicious goals. However, existing backdoor attacks suffer from several issues, e.g., instant trigger patterns lack stealthiness, the backdoor is trained or activated by an additional network, or all agents are backdoored. To this end, in this paper, we propose a novel backdoor leverage attack against c-MADRL, BLAST, which attacks the entire multi-agent team by embedding the backdoor only in a single agent. Firstly, we introduce adversary spatiotemporal behavior patterns as the backdoor trigger, rather than manually injected fixed visual patterns or instant status, and control the period during which malicious actions are performed. This method can guarantee the stealthiness and practicality of BLAST. Secondly, we hack the original reward function of the backdoor agent via unilateral guidance to inject BLAST, so as to achieve the leverage attack effect that can pry open the entire multi-agent system via a single backdoor agent. We evaluate our BLAST against 3 classic c-MADRL algorithms (VDN, QMIX, and MAPPO) in 2 popular c-MADRL environments (SMAC and Pursuit), and 2 existing defense mechanisms. The experimental results demonstrate that BLAST can achieve a high attack success rate while maintaining a low clean performance variance rate.
Article 1082
Title@2025-07-18 (5): AI-Accelerated Flow Simulation: A Robust Auto-Regressive Framework for Long-Term CFD Forecasting
Title: AI-Accelerated Flow Simulation: A Robust Auto-Regressive Framework for Long-Term CFD Forecasting | KI-beschleunigte Flusssimulation: Robustes Auto-Regressives Framework für langfristige CFD-Prognose | AI-加速流动模拟:长期CFD预测的强有力的自动递减框架 2412.05657v3 |
Authors (3): Sunwoong Yang, Ricardo Vinuesa, Namwoo Kang
This study addresses the critical challenge of error accumulation in spatio-temporal auto-regressive (AR) predictions within scientific machine learning models by exploring temporal integration schemes and adaptive multi-step rollout strategies. We introduce the first implementation of the two-step Adams-Bashforth method specifically tailored for data-driven AR prediction, leveraging historical derivative information to enhance numerical stability without additional computational overhead. To validate our approach, we systematically evaluate time integration schemes across canonical 2D PDEs before extending to complex Navier-Stokes cylinder vortex shedding dynamics. Additionally, we develop three novel adaptive weighting strategies that dynamically adjust the importance of different future time steps during multi-step rollout training. Our analysis reveals that as physical complexity increases, such sophisticated rollout techniques become essential, with the Adams-Bashforth scheme demonstrating consistent robustness across investigated systems and our best adaptive approach delivering an 89% improvement over conventional fixed-weight methods while maintaining similar computational costs. For the complex Navier-Stokes vortex shedding problem, despite using an extremely lightweight graph neural network with just 1,177 trainable parameters and training on only 50 snapshots, our framework accurately predicts 350 future time steps, reducing mean squared error from 0.125 (single-step direct prediction) to 0.002 (Adams-Bashforth with proposed multi-step rollout). Our integrated methodology demonstrates an 83% improvement over standard noise injection techniques and maintains robustness under severe spatial constraints; specifically, when trained on only a partial spatial domain, it still achieves 58% and 27% improvements over direct prediction and forward Euler methods, respectively.
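The two integration schemes compared above can be sketched on a toy problem. The example below is a minimal illustration, assuming a scalar ODE $du/dt = -u$ in place of a learned derivative model; in the data-driven setting, `f` would be the network's predicted time derivative. The function names and the test problem are illustrative assumptions, not the authors' code.

```python
import math


def forward_euler(f, u0, dt, steps):
    """Roll out u_{n+1} = u_n + dt * f(u_n)."""
    u = u0
    for _ in range(steps):
        u = u + dt * f(u)
    return u


def adams_bashforth2(f, u0, dt, steps):
    """Roll out u_{n+1} = u_n + dt * (3/2 f(u_n) - 1/2 f(u_{n-1})).

    The first step is bootstrapped with forward Euler because no earlier
    derivative is available yet.
    """
    f_prev = f(u0)
    u = u0 + dt * f_prev
    for _ in range(steps - 1):
        f_curr = f(u)  # the only new derivative evaluation per step
        u = u + dt * (1.5 * f_curr - 0.5 * f_prev)
        f_prev = f_curr  # reuse the stored derivative on the next step
    return u


def decay(u):
    """Toy right-hand side du/dt = -u, with exact solution exp(-t)."""
    return -u


dt, steps = 0.1, 50
exact = math.exp(-dt * steps)
euler_error = abs(forward_euler(decay, 1.0, dt, steps) - exact)
ab2_error = abs(adams_bashforth2(decay, 1.0, dt, steps) - exact)
```

Both schemes call `f` once per step; Adams-Bashforth only stores the previous derivative, which is consistent with the abstract's claim of no additional computational overhead.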
Article 1083
Title@2025-07-18 (5): FuSeFL: Fully Secure and Scalable Cross-Silo Federated Learning
Title: FuSeFL: Fully Secure and Scalable Cross-Silo Federated Learning | FuSeFL: Vollsicheres und skalierbares Cross-Silo-Federated Learning | FFSFL: 完全安全和可缩放的跨西罗联邦学习 2507.13591v1 |
Authors (2): Sahar Ghoflsaz Ghinani, Elaheh Sadredini
Federated Learning (FL) enables collaborative model training without centralizing client data, making it attractive for privacy-sensitive domains. While existing approaches employ cryptographic techniques such as homomorphic encryption, differential privacy, or secure multiparty computation to mitigate inference attacks (including model inversion, membership inference, and gradient leakage), they often suffer from high computational, communication, or memory overheads. Moreover, many methods overlook the confidentiality of the global model itself, which may be proprietary and sensitive. These challenges limit the practicality of secure FL, especially in cross-silo deployments involving large datasets and strict compliance requirements. We present FuSeFL, a fully secure and scalable FL scheme designed for cross-silo settings. FuSeFL decentralizes training across client pairs using lightweight secure multiparty computation (MPC), while confining the server’s role to secure aggregation. This design eliminates server bottlenecks, avoids data offloading, and preserves full confidentiality of data, model, and updates throughout training. FuSeFL defends against inference threats, achieves up to 95% lower communication latency and 50% lower server memory usage, and improves accuracy over prior secure FL solutions, demonstrating strong security and efficiency at scale.
Article 1084
Title@2025-07-18 (5): A million-scale dataset and generalizable foundation model for nanomaterial-protein interactions
Title: A million-scale dataset and generalizable foundation model for nanomaterial-protein interactions | Ein millionengroßes Datensatz- und verallgemeinerbares Fundamentmodell für Nanomaterial-Protein-Interaktionen | 关于纳米材料-蛋白质相互作用的百万尺度数据集和通用基础模型 2507.14245v1 |
Authors (6): Hengjie Yu, Kenneth A. Dawson, Haiyun Yang, Shuya Liu, Yan Yan, Yaochu Jin
Unlocking the potential of nanomaterials in medicine and environmental science hinges on understanding their interactions with proteins, a complex decision space where AI is poised to make a transformative impact. However, progress has been hindered by limited datasets and the restricted generalizability of existing models. Here, we propose NanoPro-3M, the largest nanomaterial-protein interaction dataset to date, comprising over 3.2 million samples and 37,000 unique proteins. Leveraging this, we present NanoProFormer, a foundational model that predicts nanomaterial-protein affinities through multimodal representation learning, demonstrating strong generalization and the ability to handle missing features and unseen nanomaterials or proteins. We show that multimodal modeling significantly outperforms single-modality approaches and identifies key determinants of corona formation. Furthermore, we demonstrate its applicability to a range of downstream tasks through zero-shot inference and fine-tuning. Together, this work establishes a solid foundation for high-performance and generalized prediction of nanomaterial-protein interaction endpoints, reducing experimental reliance and accelerating various in vitro applications.
Article 1085
Title@2025-07-17 (4): Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries
Title: Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries | Pluralistische Benutzerpräferenzen durch Verstärkung lernen Feinabstimmungen lernen | 通过强化学习,精调简编,通过强化学习,提供多元学习用户首选 2507.13579v1 |
Authors (5): Hyunji Nam, Yanming Wan, Mickel Liu, Jianxun Lian, Natasha Jaques
As everyday use cases of large language model (LLM) AI assistants have expanded, it is becoming increasingly important to personalize responses to align to different users’ preferences and goals. While reinforcement learning from human feedback (RLHF) is effective at improving LLMs to be generally more helpful and fluent, it does not account for variability across users, as it models the entire user population with a single reward model. We present a novel framework, Preference Learning Using Summarization (PLUS), that learns text-based summaries of each user’s preferences, characteristics, and past conversations. These summaries condition the reward model, enabling it to make personalized predictions about the types of responses valued by each user. We train the user-summarization model with reinforcement learning, and update the reward model simultaneously, creating an online co-adaptation loop. We show that in contrast with prior personalized RLHF techniques or with in-context learning of user information, summaries produced by PLUS capture meaningful aspects of a user’s preferences. Across different pluralistic user datasets, we show that our method is robust to new users and diverse conversation topics. Additionally, we demonstrate that the textual summaries generated about users can be transferred for zero-shot personalization of stronger, proprietary models like GPT-4. The resulting user summaries are not only concise and portable, they are easy for users to interpret and modify, allowing for more transparency and user control in LLM alignment.
Article 1086
Title@2025-07-17 (4): Apple Intelligence Foundation Language Models: Tech Report 2025
Title: Apple Intelligence Foundation Language Models: Tech Report 2025 | Apple Intelligence Foundation Sprachmodelle: Tech Report 2025 | 苹果情报基金会语言模式:2025年技术报告 2507.13575v1 |
Authors (395): Hanzhi Zhou, Erik Hornberger, Pengsheng Guo, Xiyou Zhou, Saiwen Wang, Xin Wang, Yifei He, Xuankai Chang, Rene Rauch, Louis D’hauwe, John Peebles, Alec Doane, Kohen Chia, Jenna Thibodeau, Zi-Yi Dou, Yuanyang Zhang, Ruoming Pang, Reed Li, Zhifeng Chen, Jeremy Warner, Zhaoyang Xu, Sophy Lee, David Mizrahi, Ramsey Tantawi, Chris Chaney, Kelsey Peterson, Jun Qin, Alex Dombrowski, Mira Chiang, Aiswarya Raghavan, Gerard Casamayor, Qibin Chen, Aonan Zhang, Nathalie Tran, Jianyu Wang, Hang Su, Thomas Voice, Alessandro Pappalardo, Brycen Wershing, Prasanth Yadla, Rui Li, Priyal Chhatrapati, Ismael Fernandez, Yusuf Goren, Xin Zheng, Forrest Huang, Tao Lei, Eray Yildiz, Alper Kokmen, Gokul Santhanam, Areeba Kamal, Kaan Elgin, Dian Ang Yap, Jeremy Liu, Peter Gray, Howard Xing, Kieran Liu, Matteo Ronchi, Moritz Schwarzer-Becker, Yun Zhu, Mandana Saebi, Jeremy Snow, David Griffiths, Guillaume Tartavel, Erin Feldman, Simon Lehnerer, Fernando Bermúdez-Medina, Hans Han, Joe Zhou, Xiaoyi Ren, Sujeeth Reddy, Zirui Wang, Tom Gunter, Albert Antony, Yuanzhi Li, John Dennison, Tony Sun, Yena Han, Yi Qin, Sam Davarnia, Jeffrey Bigham, Wayne Shan, Hannah Gillis Coleman, Guillaume Klein, Peng Liu, Muyang Yu, Jack Cackler, Yuan Gao, Crystal Xiao, Binazir Karimzadeh, Zhengdong Zhang, Felix Bai, Albin Madappally Jose, Feng Nan, Nazir Kamaldin, Dong Yin, Hans Hao, Yanchao Sun, Yi Hua, Charles Maalouf, Alex Guillen Garcia, Guoli Yin, Lezhi Li, Mohana Prasad Sathya Moorthy, Hongbin Gao, Jay Tang, Joanna Arreaza-Taylor, Faye Lao, Carina Peng, Josh Shaffer, Dan Masi, Sushma Rao, Tommi Vehvilainen, Senyu Tong, Dongcai Shen, Yang Zhao, Chris Bartels, Peter Fu, Qingqing Cao, Christopher Neubauer, Ethan Li, Mingfei Gao, Rebecca Callahan, Richard Wei, Patrick Dong, Alex Braunstein, Sachin Ravi, Adolfo Lopez Mendez, Kaiwei Huang, Kun Duan, Haoshuo Huang, Rui Qian, Stefano Ligas, Jordan Huffaker, Dongxu Li, Bailin Wang, Nanzhu Wang, Anuva Agarwal, Tait Madsen, Josh Newnham, Abhishek Sharma, 
Zhile Ren, Deepak Gopinath, Erik Daxberger, Saptarshi Guha, Oron Levy, Jing Lu, Nan Dun, Marc Kirchner, Yinfei Yang, Manjot Bilkhu, Dave Nelson, Anthony Spalvieri-Kruse, Juan Lao Tebar, Yang Xu, Phani Mutyala, Gabriel Jacoby-Cooper, Yingbo Wang, Karla Vega, Vishaal Mahtani, Darren Botten, Eric Wang, Hanli Li, Matthias Paulik, Haoran Yan, Navid Shiee, Yihao Qian, Bugu Wu, Qi Zhu, Ob Adaranijo, Bhuwan Dhingra, Zhe Gan, Nicholas Seidl, Grace Duanmu, Rong Situ, Yiping Ma, Yin Xia, David Riazati, Vasileios Saveris, Anh Nguyen, Michael, Lee, Patrick Sonnenberg, Chinguun Erdenebileg, Yanghao Li, Vivian Ma, James Chou, Isha Garg, Mark Lee, Keen You, Yuhong Li, Ransen Niu, Nandhitha Raghuram, Pulkit Agrawal, Henry Mason, Sumeet Singh, Keyu He, Hong-You Chen, Lucas Guibert, Shiyu Li, Varsha Paidi, Narendran Raghavan, Mingze Xu, Yuli Yang, Sergiu Sima, Irina Belousova, Sprite Chu, Afshin Dehghan, Philipp Dufter, David Haldimann, Zhen Yang, Margit Bowler, Chang Liu, Ying-Chang Cheng, Vivek Rathod, Syd Evans, Wilson Tsao, Dustin Withers, Haitian Sun, Biyao Wang, Peter Grasch, Walker Cheng, Yihao Feng, Vivek Kumar, Frank Chu, Victoria MönchJuan Haladjian, Doug Kang, Jiarui Lu, Ciro Sannino, Max Lam, Floris Weers, Bowen Pan, Kenneth Jung, Dhaval Doshi, Fangping Shi, Olli Saarikivi, Alp Aygar, Josh Elman, Cheng Leong, Eshan Verma, Matthew Lei, Jeff Nichols, Jiulong Shan, Donald Zhang, Lawrence Zhou, Stephen Murphy, Xianzhi Du, Chang Lan, Ankur Jain, Elmira Amirloo, Marcin Eichner, Naomy Sabo, Anupama Mann Anupama, David Qiu, Zhao Meng, Michael FitzMaurice, Peng Zhang, Simon Yeung, Chen Chen, Marco Zuliani, Andrew Hansen, Yang Lu, Brent Ramerth, Ziyi Zhong, Parsa Mazaheri, Matthew Hopkins, Mengyu Li, Simon Wang, David Chen, Farzin Rasteh, Chong Wang, Josh Gardner, Asaf Liberman, Haoxuan You, Andrew Walkingshaw, Xingyu Zhou, Jinhao Lei, Yan Meng, Quentin Keunebroek, Sam Wiseman, Anders Boesen Lindbo Larsen, Yi Zhang, Zaid Ahmed, Haiming Gang, Aaron Franklin, Kelvin Zou, Guillaume 
Seguin, Jonathan Janke, Rachel Burger, Co Giang, Cheng Shen, Jen Liu, Sanskruti Shah, Xiang Kong, Yiran Fei, TJ Collins, Chen Zhang, Zhiyun Lu, Michael Booker, Qin Ba, Yasutaka Tanaka, Andres Romero Mier Y Teran, Federico Scozzafava, Regan Poston, Jane Li, Eduardo Jimenez, Bas Straathof, Karanjeet Singh, Lindsay Hislop, Rajat Arora, Deepa Seshadri, Boyue Li, Colorado Reed, Zhen Li, TJ Lu, Yi Wang, Kaelen Haag, Nicholas Lusskin, Raunak Sinha, Rahul Nair, Eldon Schoop, Mary Beth Kery, Mehrdad Farajtbar, Brenda Yang, George Horrell, Shiwen Zhao, Dhruti Shah, Cha Chen, Bowen Zhang, Chang Gao, Devi Krishna, Jennifer Mallalieu, Javier Movellan, Di Feng, Emily Zhang, Sam Xu, Junting Pan, Dominik Moritz, Suma Jayaram, Kevin Smith, Dongseong Hwang, Daniel Parilla, Jiaming Hu, You-Cyuan Jhang, Emad Soroush, Fred Hohman, Nan Du, Emma Wang, Sam Dodge, Pragnya Sridhar, Joris Pelemans, Wei Fang, Nina Wenzel, Joseph Yitan Cheng, Hadas Kotek, Chung-Cheng Chiu, Meng Cao, Haijing Fu, Ruixuan Hou, Ke Ye, Diane Zhu, Nikhil Bhendawade, Joseph Astrauskas, Jian Liu, Sai Aitharaju, Wentao Wu, Artsiom Peshko, Hyunjik Kim, Nilesh Shahdadpuri, Andy De Wang, Qi Shan, Piotr Maj, Raul Rea Menacho, Justin Lazarow, Eric Liang Yang, Arsalan Farooq, Donghan Yu, David Güera, Minsik Cho, Kavya Nerella, Yongqiang Wang, Tao Jia, John Park, Jeff Lai, Haotian Zhang, Futang Peng, Daniele Molinari, Aparna Rajamani, Tyler Johnson, Lauren Gardiner, Chao Jia, Violet Yao, Wojciech Kryscinski, Xiujun Li, Shang-Chen Wu
We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: (i) a 3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and (ii) a scalable server model built on a novel Parallel-Track Mixture-of-Experts (PT-MoE) transformer that combines track parallelism, mixture-of-experts sparse computation, and interleaved global-local attention to deliver high quality with competitive cost on Apple’s Private Cloud Compute platform. Both models are trained on large-scale multilingual and multimodal datasets sourced via responsible web crawling, licensed corpora, and high-quality synthetic data, then further refined with supervised fine-tuning and reinforcement learning on a new asynchronous platform. The resulting models support several additional languages while understanding images and executing tool calls. In public benchmarks and human evaluations, both the server model and the on-device model match or surpass comparably sized open baselines. A new Swift-centric Foundation Models framework exposes guided generation, constrained tool calling, and LoRA adapter fine-tuning, allowing developers to integrate these capabilities with a few lines of code. The latest advancements in Apple Intelligence models are grounded in our Responsible AI approach with safeguards like content filtering and locale-specific evaluation, as well as our commitment to protecting our users’ privacy with innovations like Private Cloud Compute.
Article 1087
Title@2025-07-17 (4): Generative Deep Learning Framework for Inverse Design of Fuels
Title: Generative Deep Learning Framework for Inverse Design of Fuels | Generatives Deep-Learning-Framework für das Inverse Design von Kraftstoffen | 燃料反向设计生成深深学习框架 2504.12075v2 |
Authors (6): Kiran K. Yalamanchi, Pinaki Pal, Balaji Mohan, Abdullah S. AlRamadan, Jihad A. Badra, Yuanjiang Pei
In the present work, a generative deep learning framework combining a Co-optimized Variational Autoencoder (Co-VAE) architecture with quantitative structure-property relationship (QSPR) techniques is developed to enable accelerated inverse design of fuels. The Co-VAE integrates a property prediction component coupled with the VAE latent space, enhancing molecular reconstruction and accurate estimation of Research Octane Number (RON) (chosen as the fuel property of interest). A subset of the GDB-13 database, enriched with a curated RON database, is used for model training. Hyperparameter tuning is further utilized to optimize the balance among reconstruction fidelity, chemical validity, and RON prediction. An independent regression model is then used to refine RON prediction, while a differential evolution algorithm is employed to efficiently navigate the VAE latent space and identify promising fuel molecule candidates with high RON. This methodology addresses the limitations of traditional fuel screening approaches by capturing complex structure-property relationships within a comprehensive latent representation. The generative model can be adapted to different target properties, enabling systematic exploration of large chemical spaces relevant to fuel design applications. Furthermore, the demonstrated framework can be readily extended by incorporating additional synthesizability criteria to improve applicability and reliability for de novo design of new fuels.
Article 1088
Title@2025-07-17 (4): Understanding Reasoning in Thinking Language Models via Steering Vectors
Title: Understanding Reasoning in Thinking Language Models via Steering Vectors | Verständnis von Vernunft im Denken von Sprachmodellen über Lenkungs-Vektoren | 通过指导矢量来理解思考语言模式的理由 2506.18167v3 |
Authors (5): Constantin Venhoff, Iván Arcuschin, Philip Torr, Arthur Conmy, Neel Nanda
Recent advances in large language models (LLMs) have led to the development of thinking language models that generate extensive internal reasoning chains before producing responses. While these models achieve improved performance, controlling their reasoning processes remains challenging. This work presents a steering approach for thinking LLMs by analyzing and manipulating specific reasoning behaviors in DeepSeek-R1-Distill models. Through a systematic experiment on 500 tasks across 10 diverse categories, we identify several reasoning behaviors exhibited by thinking models, including expressing uncertainty, generating examples for hypothesis validation, and backtracking in reasoning chains. We demonstrate that these behaviors are mediated by linear directions in the model’s activation space and can be controlled using steering vectors. By extracting and applying these vectors, we provide a method to modulate specific aspects of the model’s reasoning process, such as its tendency to backtrack or express uncertainty. Our approach offers practical tools for steering reasoning processes in thinking models in a controlled and interpretable manner. We validate our steering method using three DeepSeek-R1-Distill models, demonstrating consistent control across different model architectures.
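The idea of a linear steering direction can be sketched with toy vectors. This example assumes the common difference-of-means construction (mean activation when a behavior is present minus mean activation when it is absent); the abstract does not state how the vectors are extracted, so the construction, the function names, and the three-dimensional toy activations are all illustrative assumptions.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(acts_with, acts_without):
    """Difference of mean activations: behavior present minus behavior absent."""
    mu_with = mean_vector(acts_with)
    mu_without = mean_vector(acts_without)
    return [a - b for a, b in zip(mu_with, mu_without)]


def steer(hidden, vector, alpha):
    """Shift a hidden state along the steering direction by strength alpha."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy activations (hypothetical): rows are examples, columns are hidden units.
acts_backtracking = [[1.0, 0.2, 0.0], [0.8, 0.1, 0.1]]
acts_baseline = [[0.0, 0.2, 0.1], [0.2, 0.1, 0.0]]

v = steering_vector(acts_backtracking, acts_baseline)
hidden = [0.5, 0.5, 0.5]
amplified = steer(hidden, v, alpha=2.0)    # push toward the behavior
suppressed = steer(hidden, v, alpha=-2.0)  # push away from it
```

A positive `alpha` increases the hidden state's projection onto the direction and a negative `alpha` decreases it, which is the sense in which a behavior such as backtracking could be amplified or suppressed.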
Article 1089
Title@2025-07-17 (4): An Approach for Auto Generation of Labeling Functions for Software Engineering Chatbots
Title: An Approach for Auto Generation of Labeling Functions for Software Engineering Chatbots | Ein Ansatz zur automatischen Generierung von Beschriftungsfunktionen für Software Engineering Chatbots | 软件工程聊天器自动生成标签功能的方法 2410.07094v2 |
Authors (4): Ebube Alor, Ahmad Abdellatif, SayedHassan Khatoonabadi, Emad Shihab
Software engineering (SE) chatbots are increasingly gaining attention for their role in enhancing development processes. At the core of chatbots are Natural Language Understanding platforms (NLUs), which enable them to comprehend user queries but require labeled data for training. However, acquiring such labeled data for SE chatbots is challenging due to the scarcity of high-quality datasets, as training requires specialized vocabulary and phrases not found in typical language datasets. Consequently, developers often resort to manually annotating user queries – a time-consuming and resource-intensive process. Previous approaches require human intervention to generate rules, called labeling functions (LFs), that categorize queries based on specific patterns. To address this issue, we propose an approach to automatically generate LFs by extracting patterns from labeled user queries. We evaluate our approach on four SE datasets and measure performance improvement from training NLUs on queries labeled by the generated LFs. The generated LFs effectively label data with AUC scores up to 85.3% and NLU performance improvements up to 27.2%. Furthermore, our results show that the number of LFs affects labeling performance. We believe that our approach can save time and resources in labeling users’ queries, allowing practitioners to focus on core chatbot functionalities rather than manually labeling queries.
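To make the notion of a labeling function concrete, here is a minimal sketch. It assumes a deliberately simple pattern extractor: every token that appears under exactly one intent in the labeled queries becomes a keyword LF, and LF outputs are combined by majority vote. The abstract does not describe the pattern-extraction procedure, so this heuristic, the function names, and the example intents are hypothetical.

```python
from collections import defaultdict

ABSTAIN = None  # returned by an LF that has no opinion about a query


def generate_labeling_functions(labeled_queries):
    """Build one keyword LF per token that occurs under exactly one intent."""
    intents_by_token = defaultdict(set)
    for text, intent in labeled_queries:
        for token in set(text.lower().split()):
            intents_by_token[token].add(intent)

    def make_lf(token, intent):
        def lf(query):
            return intent if token in query.lower().split() else ABSTAIN
        return lf

    return [
        make_lf(token, next(iter(intents)))
        for token, intents in sorted(intents_by_token.items())
        if len(intents) == 1
    ]


def label(query, lfs):
    """Majority vote over the LFs that do not abstain."""
    votes = [v for v in (lf(query) for lf in lfs) if v is not ABSTAIN]
    return max(set(votes), key=votes.count) if votes else ABSTAIN


# Hypothetical seed data: (query, intent) pairs.
seed = [
    ("who fixed the login bug", "bug_owner"),
    ("who fixed the crash bug", "bug_owner"),
    ("how many commits last week", "commit_count"),
    ("how many commits in release", "commit_count"),
]
lfs = generate_labeling_functions(seed)
predicted = label("who fixed the parser bug", lfs)
```

Queries labeled this way could then serve as training data for an NLU; a real system would need a more robust pattern extractor and a principled way to resolve conflicting votes.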
Article 1090
Title@2025-07-17 (4): Change of Thought: Adaptive Test-Time Computation
Title: Change of Thought: Adaptive Test-Time Computation | Gedankenwechsel: Adaptive Test-Time Computation | 改变思想:适应性试验时间计算 2507.13569v1 |
Authors (4): Mrinal Mathur, Mike Doan, Barak Pearlmutter, Sergey Plis
Transformers evaluated in a single, fixed-depth pass are provably limited in expressive power to the constant-depth circuit class TC0. Running a Transformer autoregressively removes that ceiling – first in next-token prediction and, more recently, in chain-of-thought reasoning. Both regimes rely on feedback loops that decode internal states into tokens only to re-encode them in subsequent steps. While this “thinking aloud” mirrors human reasoning, biological brains iterate without externalising intermediate states as language. To boost the expressive power of encoder Transformers without resorting to token-level autoregression, we introduce the SELF-Transformer: an encoder layer that iteratively refines its own attention weights to a fixed point. Instead of producing – in one pass – the alignment matrix that remixes the input sequence, the SELF-Transformer iteratively updates that matrix internally, scaling test-time computation with input difficulty. This adaptivity yields up to 20% accuracy gains on encoder-style benchmarks without increasing parameter count, demonstrating that input-adaptive alignment at test time offers substantial benefits for only a modest extra compute budget. SELF-Transformers thus recover much of the expressive power of iterative reasoning while preserving the simplicity of pure encoder architectures.
Article 1091
Title@2025-07-17 (4): Why Isn’t Relational Learning Taking Over the World?
Title: Why Isn’t Relational Learning Taking Over the World? | Warum übernimmt das relationale Lernen nicht die Welt? | 为什么关系学习不超越世界? 2507.13558v1 |
Authors (1): David Poole
AI seems to be taking over the world with systems that model pixels, words, and phonemes. The world is arguably made up, not of pixels, words, and phonemes but of entities (objects, things, including events) with properties and relations among them. Surely we should model these, not the perception or description of them. You might suspect that concentrating on modeling words and pixels is because all of the (valuable) data in the world is in terms of text and images. If you look into almost any company you will find their most valuable data is in spreadsheets, databases and other relational formats. These are not the forms of data studied in introductory machine learning, but are full of product numbers, student numbers, transaction numbers and other identifiers that can’t be interpreted naively as numbers. The field that studies this sort of data has various names including relational learning, statistical relational AI, and many others. This paper explains why relational learning is not taking over the world – except in a few cases with restricted relations – and what needs to be done to bring it to its rightful prominence.
Article 1092
Title@2025-07-17 (4): Time Series Forecastability Measures
Title: Time Series Forecastability Measures | Zeitreihen Vorausschätzungsmaßnahmen | 时间序列可预测性度量 2507.13556v1 |
Authors (3): Rui Wang, Steven Klee, Alexis Roos
This paper proposes using two metrics to quantify the forecastability of time series prior to model development: the spectral predictability score and the largest Lyapunov exponent. Unlike traditional model evaluation metrics, these measures assess the inherent forecastability characteristics of the data before any forecast attempts. The spectral predictability score evaluates the strength and regularity of frequency components in the time series, whereas the Lyapunov exponents quantify the chaos and stability of the system generating the data. We evaluated the effectiveness of these metrics on both synthetic and real-world time series from the M5 forecast competition dataset. Our results demonstrate that these two metrics can correctly reflect the inherent forecastability of a time series and have a strong correlation with the actual forecast performance of various models. By understanding the inherent forecastability of time series before model training, practitioners can focus their planning efforts on products and supply chain levels that are more forecastable, while setting appropriate expectations or seeking alternative strategies for products with limited forecastability.
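A spectral score of this kind can be sketched as follows. The abstract does not give the paper's formula, so this example assumes one common choice: one minus the normalized Shannon entropy of the periodogram, which is close to 1 when power sits in a few frequencies and low when power is spread evenly. The function names and the plain-Python DFT are illustrative assumptions, and the sketch does not handle constant series (zero total power).

```python
import cmath
import math
import random


def power_spectrum(series):
    """Periodogram of the mean-removed series at positive frequencies."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_score(series):
    """One minus the normalized spectral entropy (assumed definition)."""
    spectrum = power_spectrum(series)
    total = sum(spectrum)
    probs = [p / total for p in spectrum if p > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(len(spectrum))


rng = random.Random(0)
n = 128
sine = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

sine_score = spectral_score(sine)
noise_score = spectral_score(noise)
```

Under this assumed definition, the periodic series scores much higher than the white-noise series, matching the intuition that regular frequency content makes a series easier to forecast.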
nan
Article 1093
Title@2025-07-17 (4): Loss-Complexity Landscape and Model Structure Functions
Title: Loss-Complexity Landscape and Model Structure Functions | Verlust-Komplexität Landschaft und Modellstruktur Funktionen | 地形和模型结构功能 2507.13543v1 |
Authors (1): Alexander Kolpakov
We develop a framework for dualizing the Kolmogorov structure function $h_x(\alpha)$, which then allows using computable complexity proxies. We establish a mathematical analogy between information-theoretic constructs and statistical mechanics, introducing a suitable partition function and free energy functional. We explicitly prove the Legendre-Fenchel duality between the structure function and free energy, show detailed balance of the Metropolis kernel, and interpret acceptance probabilities as information-theoretic scattering amplitudes. A susceptibility-like variance of model complexity is shown to peak precisely at loss-complexity trade-offs interpreted as phase transitions. Practical experiments with linear and tree-based regression models verify these theoretical predictions, explicitly demonstrating the interplay between model complexity, generalization, and the overfitting threshold.
nan
Article 1094
Title@2025-07-17 (4): Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning
Title: Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning | Instruct-MusicGen: Entsperren von Text-zu-Musik-Editing für Musik Sprachmodelle über Instruction Tuning | 指令-音乐Gen:通过指令调制解锁文字到音乐编辑音乐语言模型 2405.18386v3 |
Authors (10): Yixiao Zhang, Yukara Ikemiya, Woosung Choi, Naoki Murata, Marco A. Martínez-Ramírez, Liwei Lin, Gus Xia, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon
Recent advances in text-to-music editing, which employ text queries to modify music (e.g., by changing its style or adjusting instrumental components), present unique challenges and opportunities for AI-assisted music creation. Previous approaches in this domain have been constrained by the necessity to train specific editing models from scratch, which is both resource-intensive and inefficient; other research uses large language models to predict edited music, resulting in imprecise audio reconstruction. To combine the strengths of these approaches and address their limitations, we introduce Instruct-MusicGen, a novel approach that finetunes a pretrained MusicGen model to efficiently follow editing instructions such as adding, removing, or separating stems. Our approach involves a modification of the original MusicGen architecture by incorporating a text fusion module and an audio fusion module, which allow the model to process instruction texts and audio inputs concurrently and yield the desired edited music. Remarkably, Instruct-MusicGen only introduces 8% new parameters to the original MusicGen model and only trains for 5K steps, yet it achieves superior performance across all tasks compared to existing baselines, and demonstrates performance comparable to the models trained for specific tasks. This advancement not only enhances the efficiency of text-to-music editing but also broadens the applicability of music language models in dynamic music production environments.
nan
Article 1095
Title@2025-07-17 (4): Acoustic Index: A Novel AI-Driven Parameter for Cardiac Disease Risk Stratification Using Echocardiography
Title: Acoustic Index: A Novel AI-Driven Parameter for Cardiac Disease Risk Stratification Using Echocardiography | Akustischer Index: Ein neuartiger KI-getriebener Parameter für die Risikoteilung von Herzerkrankungen mittels Echokardiographie | 声学指数:使用心心电图进行心电图分析的心病风险分解的新AI-Driven参数 2507.13542v1 |
Authors (3): Beka Begiashvili, Carlos J. Fernandez-Candel, Matías Pérez Paredes
Traditional echocardiographic parameters such as ejection fraction (EF) and global longitudinal strain (GLS) have limitations in the early detection of cardiac dysfunction. EF often remains normal despite underlying pathology, and GLS is influenced by load conditions and vendor variability. There is a growing need for reproducible, interpretable, and operator-independent parameters that capture subtle and global cardiac functional alterations. We introduce the Acoustic Index, a novel AI-derived echocardiographic parameter designed to quantify cardiac dysfunction from standard ultrasound views. The model combines Extended Dynamic Mode Decomposition (EDMD) based on Koopman operator theory with a hybrid neural network that incorporates clinical metadata. Spatiotemporal dynamics are extracted from echocardiographic sequences to identify coherent motion patterns. These are weighted via attention mechanisms and fused with clinical data using manifold learning, resulting in a continuous score from 0 (low risk) to 1 (high risk). In a prospective cohort of 736 patients, encompassing various cardiac pathologies and normal controls, the Acoustic Index achieved an area under the curve (AUC) of 0.89 in an independent test set. Cross-validation across five folds confirmed the robustness of the model, showing that both sensitivity and specificity exceeded 0.8 when evaluated on independent data. Threshold-based analysis demonstrated stable trade-offs between sensitivity and specificity, with optimal discrimination near this threshold. The Acoustic Index represents a physics-informed, interpretable AI biomarker for cardiac function. It shows promise as a scalable, vendor-independent tool for early detection, triage, and longitudinal monitoring. Future directions include external validation, longitudinal studies, and adaptation to disease-specific classifiers.
nan
Article 1096
Title@2025-07-17 (4): Provable Low-Frequency Bias of In-Context Learning of Representations
Title: Provable Low-Frequency Bias of In-Context Learning of Representations | Wahrscheinliche frequenzarme Bias des In-Context-Lernens von Repräsentationen | 可实现的低公平率代表制的理论内学习 2507.13540v1 |
Authors (3): Yongyi Yang, Hidenori Tanaka, Wei Hu
In-context learning (ICL) enables large language models (LLMs) to acquire new behaviors from the input sequence alone without any parameter updates. Recent studies have shown that ICL can surpass the original meaning learned in the pretraining stage by internalizing the structure of the data-generating process (DGP) of the prompt into the hidden representations. However, the mechanisms by which LLMs achieve this ability are left open. In this paper, we present the first rigorous explanation of such phenomena by introducing a unified framework of double convergence, where hidden representations converge both over context and across layers. This double convergence process leads to an implicit bias towards smooth (low-frequency) representations, which we prove analytically and verify empirically. Our theory explains several open empirical observations, including why learned representations exhibit globally structured but locally distorted geometry, and why their total energy decays without vanishing. Moreover, our theory predicts that ICL has an intrinsic robustness towards high-frequency noise, which we empirically confirm. These results provide new insights into the underlying mechanisms of ICL, and a theoretical foundation to study it that hopefully extends to more general data distributions and settings.
nan
Article 1097
Title@2025-07-17 (4): How Not to Detect Prompt Injections with an LLM
Title: How Not to Detect Prompt Injections with an LLM | Wie man Injektionen mit einem LLM nicht erkennen kann | 如何不用LLM检测快速注射 2507.05630v2 |
Authors (4): Sarthak Choudhary, Divyam Anshumaan, Nils Palumbo, Somesh Jha
LLM-integrated applications and agents are vulnerable to prompt injection attacks, in which adversaries embed malicious instructions within seemingly benign user inputs to manipulate the LLM’s intended behavior. Recent defenses based on $\textit{known-answer detection}$ (KAD) have achieved near-perfect performance by using an LLM to classify inputs as clean or contaminated. In this work, we formally characterize the KAD framework and uncover a structural vulnerability in its design that invalidates its core security premise. We design a methodical adaptive attack, $\textit{DataFlip}$, to exploit this fundamental weakness. It consistently evades KAD defenses with detection rates as low as $1.5\%$ while reliably inducing malicious behavior with success rates of up to $88\%$, without needing white-box access to the LLM or any optimization procedures.
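A minimal sketch of the known-answer detection idea described above: the detector asks an LLM to repeat a secret key while ignoring the input, and flags the input as contaminated when the key does not come back. The prompt wording, `make_kad_detector`, and `toy_llm` are hypothetical names assumed for illustration; `toy_llm` is a stand-in and not a real model call.

```python
import secrets
from typing import Callable


def make_kad_detector(query_llm: Callable[[str], str]) -> Callable[[str], bool]:
    """Build a detector that returns True when an input looks contaminated."""

    def is_contaminated(user_input: str) -> bool:
        key = secrets.token_hex(4)  # fresh known answer for each check
        prompt = (
            f'Repeat "{key}" once while ignoring the following text.\n'
            f"Text: {user_input}"
        )
        response = query_llm(prompt)
        # A clean input leaves the detection instruction intact, so the key comes back.
        return key not in response

    return is_contaminated


def toy_llm(prompt: str) -> str:
    """Hypothetical stand-in for a detection LLM; not a real model call."""
    text = prompt.split("Text: ", 1)[1]
    if "ignore previous instructions" in text.lower():
        return "HACKED"  # the toy model follows the injected instruction
    return prompt.split('"')[1]  # otherwise it repeats the key
```

The detector's verdict rests entirely on whether the key is returned, which is the premise the paper reports as structurally vulnerable to an adaptive attack.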
nan
Article 1098
Title@2025-07-17 (4): Sugar-Beet Stress Detection using Satellite Image Time Series
Title: Sugar-Beet Stress Detection using Satellite Image Time Series | Sugar-Beet-Stress-Erkennung mit Satellitenbild-Zeitreihe | 利用卫星图像图像时间序列检测糖甜甜豆应激反应 2507.13514v1 |
Authors (5): Bhumika Laxman Sadbhave, Philipp Vaeth, Denise Dejon, Gunther Schorcht, Magda Gregorová
Satellite Image Time Series (SITS) data has proven effective for agricultural tasks due to its rich spectral and temporal nature. In this study, we tackle the task of stress detection in sugar-beet fields using a fully unsupervised approach. We propose a 3D convolutional autoencoder model to extract meaningful features from Sentinel-2 image sequences, combined with acquisition-date-specific temporal encodings to better capture the growth dynamics of sugar-beets. The learned representations are used in a downstream clustering task to separate stressed from healthy fields. The resulting stress detection system can be directly applied to data from different years, offering a practical and accessible tool for stress detection in sugar-beets.
nan
Article 1099
Title@2025-07-17 (4): Inverse Synthetic Aperture Fourier Ptychography
Title: Inverse Synthetic Aperture Fourier Ptychography | Inverse Synthetische Blende Fourier Ptychographie | 反向合成孔径孔径 2507.03733v2 |
Authors (3): Matthew A. Chan, Casey J. Pellizzari, Christopher A. Metzler
Fourier ptychography (FP) is a powerful light-based synthetic aperture imaging technique that allows one to reconstruct a high-resolution, wide field-of-view image by computationally integrating a diverse collection of low-resolution, far-field measurements. Typically, FP measurement diversity is introduced by changing the angle of the illumination or the position of the camera; either approach results in sampling different portions of the target’s spatial frequency content, but both approaches introduce substantial costs and complexity to the acquisition process. In this work, we introduce Inverse Synthetic Aperture Fourier Ptychography, a novel approach to FP that forgoes changing the illumination angle or camera position and instead generates measurement diversity through target motion. Critically, we also introduce a novel learning-based method for estimating k-space coordinates from dual plane intensity measurements, thereby enabling synthetic aperture imaging without knowing the rotation of the target. We experimentally validate our method in simulation and on a tabletop optical system.
nan
Article 1100
Title@2025-07-17 (4): PHASE: Passive Human Activity Simulation Evaluation
Title: PHASE: Passive Human Activity Simulation Evaluation | PHASE: Passive Simulation der menschlichen Aktivität | PHASE:被动的人类活动模拟评价 2507.13505v1 |
Authors (4): Steven Lamp, Jason D. Hiser, Anh Nguyen-Tuong, Jack W. Davidson
Cybersecurity simulation environments, such as cyber ranges, honeypots, and sandboxes, require realistic human behavior to be effective, yet no quantitative method exists to assess the behavioral fidelity of synthetic user personas. This paper presents PHASE (Passive Human Activity Simulation Evaluation), a machine learning framework that analyzes Zeek connection logs and distinguishes human from non-human activity with over 90% accuracy. PHASE operates entirely passively, relying on standard network monitoring without any user-side instrumentation or visible signs of surveillance. All network activity used for machine learning is collected via a Zeek network appliance to avoid introducing unnecessary network traffic or artifacts that could disrupt the fidelity of the simulation environment. The paper also proposes a novel labeling approach that utilizes local DNS records to classify network traffic, thereby enabling machine learning analysis. Furthermore, we apply SHAP (SHapley Additive exPlanations) analysis to uncover temporal and behavioral signatures indicative of genuine human users. In a case study, we evaluate a synthetic user persona and identify distinct non-human patterns that undermine behavioral realism. Based on these insights, we develop a revised behavioral configuration that significantly improves the human-likeness of synthetic activity, yielding a more realistic and effective synthetic user persona.
nan
Article 1101
Title@2025-07-17 (4): Gradient Descent Finds Over-Parameterized Neural Networks with Sharp Generalization for Nonparametric Regression
Title: Gradient Descent Finds Over-Parameterized Neural Networks with Sharp Generalization for Nonparametric Regression | Gradient Descent findet überparameterisierte neurale Netzwerke mit scharfer Generalisierung für nichtparametrische Regression | 梯度梯度下发现超计神经网络,具有非参数回归的锐化概括化 2411.02904v4 |
Authors (2): Yingzhen Yang, Ping Li
We study nonparametric regression by an over-parameterized two-layer neural network trained by gradient descent (GD). We show that, if the neural network is trained by GD with early stopping, then the trained network achieves a sharp rate of the nonparametric regression risk of $\mathcal{O}(\epsilon_n^2)$, which is the same rate as that for the classical kernel regression trained by GD with early stopping, where $\epsilon_n$ is the critical population rate of the Neural Tangent Kernel (NTK) associated with the network and $n$ is the size of the training data. Our result does not require distributional assumptions about the covariate as long as the covariate is bounded, in strong contrast with many existing results, which rely on specific distributions of the covariates such as the spherical uniform data distribution or distributions satisfying certain restrictive conditions. The rate $\mathcal{O}(\epsilon_n^2)$ is known to be minimax optimal for specific cases, such as the case where the NTK has a polynomial eigenvalue decay rate, which happens under certain distributional assumptions on the covariates. Our result formally fills the gap between training a classical kernel regression model and training an over-parameterized but finite-width neural network by GD for nonparametric regression without distributional assumptions on the bounded covariate. We also provide affirmative answers to certain open questions or address particular concerns in the literature on training over-parameterized neural networks by GD with early stopping for nonparametric regression, including the characterization of the stopping time, the lower bound for the network width, and the constant learning rate used in GD.
nan
Article 1102
Title@2025-07-17 (4): SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet
Title: SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet | SpecMaskFoley: Steuerung vortrainierter Spektralmasken Generativer Transformer zum synchronisierten Video-zu-Audio-Synthese über ControlNet | SpecMaskFoley:通过控制网实现同步录相合成 2505.16195v2 |
Authors (6): Zhi Zhong, Akira Takahashi, Shuyang Cui, Keisuke Toyama, Shusuke Takahashi, Yuki Mitsufuji
Foley synthesis aims to synthesize high-quality audio that is both semantically and temporally aligned with video frames. Given its broad application in creative industries, the task has gained increasing attention in the research community. To avoid the non-trivial task of training audio generative models from scratch, adapting pretrained audio generative models for video-synchronized foley synthesis presents an attractive direction. ControlNet, a method for adding fine-grained controls to pretrained generative models, has been applied to foley synthesis, but its use has been limited to handcrafted human-readable temporal conditions. In contrast, from-scratch models achieved success by leveraging high-dimensional deep features extracted using pretrained video encoders. We have observed a performance gap between ControlNet-based and from-scratch foley models. To narrow this gap, we propose SpecMaskFoley, a method that steers the pretrained SpecMaskGIT model toward video-synchronized foley synthesis via ControlNet. To unlock the potential of a single ControlNet branch, we resolve the discrepancy between the temporal video features and the time-frequency nature of the pretrained SpecMaskGIT via a frequency-aware temporal feature aligner, eliminating the need for the complicated conditioning mechanisms widely used in prior art. Evaluations on a common foley synthesis benchmark demonstrate that SpecMaskFoley can even outperform strong from-scratch baselines, substantially advancing the development of ControlNet-based foley synthesis models. Demo page: https://zzaudio.github.io/SpecMaskFoley_Demo/
nan
Article 1103
Title@2025-07-17 (4): Model-free Reinforcement Learning for Model-based Control: Towards Safe, Interpretable and Sample-efficient Agents
Title: Model-free Reinforcement Learning for Model-based Control: Towards Safe, Interpretable and Sample-efficient Agents | Modellfreies Verstärkungslernen für modellbasierte Steuerung: Auf dem Weg zu sicheren, interpretierbaren und mustereffizienten Agenten | 示范式控制示范性强化学习:建立安全、可解释和高效采样的代用品 2507.13491v1 |
Authors (2): Thomas Banker, Ali Mesbah
Training sophisticated agents for optimal decision-making under uncertainty has been key to the rapid development of modern autonomous systems across fields. Notably, model-free reinforcement learning (RL) has enabled decision-making agents to improve their performance directly through system interactions, with minimal prior knowledge about the system. Yet, model-free RL has generally relied on agents equipped with deep neural network function approximators, appealing to the networks’ expressivity to capture the agent’s policy and value function for complex systems. However, neural networks amplify the issues of sample inefficiency, unsafe learning, and limited interpretability in model-free RL. To this end, this work introduces model-based agents as a compelling alternative for control policy approximation, leveraging adaptable models of system dynamics, cost, and constraints for safe policy learning. These models can encode prior system knowledge to inform, constrain, and aid in explaining the agent’s decisions, while deficiencies due to model mismatch can be remedied with model-free RL. We outline the benefits and challenges of learning model-based agents – exemplified by model predictive control – and detail the primary learning approaches: Bayesian optimization, policy search RL, and offline strategies, along with their respective strengths. While model-free RL has long been established, its interplay with model-based agents remains largely unexplored, motivating our perspective on their combined potential for sample-efficient learning of safe and interpretable decision-making agents.
nan
Article 1104
Title@2025-07-17 (4): ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data
Title: ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data | ParaPO: Sprachmodelle so ausrichten, dass verbatime Reproduktion von Vortrainingsdaten reduziert wird | ParaPO:调整语文模式,减少培训前数据的逐字记录 2504.14452v2 |
Authors (8): Tong Chen, Faeze Brahman, Jiacheng Liu, Niloofar Mireshghallah, Weijia Shi, Pang Wei Koh, Luke Zettlemoyer, Hannaneh Hajishirzi
Language models (LMs) can memorize and reproduce segments from their pretraining data verbatim even in non-adversarial settings, raising concerns about copyright, plagiarism, privacy, and creativity. We introduce Paraphrase Preference Optimization (ParaPO), a post-training method that fine-tunes LMs to reduce unintentional regurgitation while preserving their overall utility. ParaPO trains LMs to prefer paraphrased versions of memorized segments over the original verbatim content from the pretraining data. To maintain the ability to recall famous quotations when appropriate, we develop a variant of ParaPO that uses system prompts to control regurgitation behavior. In our evaluation on Llama3.1-8B, ParaPO consistently reduces regurgitation across all tested datasets (e.g., reducing the regurgitation metric from 17.3 to 12.9 in creative writing), whereas unlearning methods used in prior work to mitigate regurgitation are less effective outside their targeted unlearned domain (from 17.3 to 16.9). When applied to the instruction-tuned Tulu3-8B model, ParaPO with system prompting successfully preserves famous quotation recall while reducing unintentional regurgitation (from 8.7 to 6.3 in creative writing) when prompted not to regurgitate. In contrast, without ParaPO tuning, prompting the model not to regurgitate produces only a marginal reduction (8.7 to 8.4).
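A minimal sketch of how a paraphrase-over-verbatim preference could be expressed, assuming a DPO-style objective. The abstract only says that ParaPO trains the model to prefer paraphrases, so the loss form, `beta`, and the names `PreferencePair` and `dpo_style_loss` are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # paraphrase of the memorized segment
    rejected: str  # original verbatim segment


def dpo_style_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style loss for one pair, from summed token log-probabilities.

    Assumed objective for illustration; the abstract does not specify the loss.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
```

With this form, the loss falls below `log(2)` once the tuned model favors the paraphrase more strongly than the reference model does.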
nan
Article 1105
Title@2025-07-17 (4): Neural Architecture Search with Mixed Bio-inspired Learning Rules
Title: Neural Architecture Search with Mixed Bio-inspired Learning Rules | Neurale Architektur Suche mit gemischten bio-inspirierten Lernregeln | 具有混合生物启发混合学习规则的神经结构搜索 2507.13485v1 |
Authors (2): Imane Hamzaoui, Riyadh Baghdadi
Bio-inspired neural networks are attractive for their adversarial robustness, energy frugality, and closer alignment with cortical physiology, yet they often lag behind back-propagation (BP) based models in accuracy and ability to scale. We show that allowing the use of different bio-inspired learning rules in different layers, discovered automatically by a tailored neural-architecture-search (NAS) procedure, bridges this gap. Starting from standard NAS baselines, we enlarge the search space to include bio-inspired learning rules and use NAS to find the best architecture and learning rule to use in each layer. We show that neural networks that use different bio-inspired learning rules for different layers have better accuracy than those that use a single rule across all the layers. The resulting NN that uses a mix of bio-inspired learning rules sets new records for bio-inspired models: 95.16% on CIFAR-10, 76.48% on CIFAR-100, 43.42% on ImageNet16-120, and 60.51% top-1 on ImageNet. In some regimes, they even surpass comparable BP-based networks while retaining their robustness advantages. Our results suggest that layer-wise diversity in learning rules allows better scalability and accuracy, and motivates further research on mixing multiple bio-inspired learning rules in the same network.
nan
Article 1106
Title@2025-07-17 (4): Improving Out-of-distribution Human Activity Recognition via IMU-Video Cross-modal Representation Learning
Title: Improving Out-of-distribution Human Activity Recognition via IMU-Video Cross-modal Representation Learning | Verbesserung der außerbetrieblichen Anerkennung menschlicher Tätigkeiten durch IMU-Video Cross-modal Representative Learning | 通过IMU-Video跨模式代表性学习,改善对人的活动在分配外的认可 2507.13482v1 |
Authors (6): Seyyed Saeid Cheshmi, Buyao Lyu, Thomas Lisko, Rajesh Rajamani, Robert A. McGovern, Yogatheesan Varatharajah
Human Activity Recognition (HAR) based on wearable inertial sensors plays a critical role in remote health monitoring. In patients with movement disorders, the ability to detect abnormal patient movements in their home environments can enable continuous optimization of treatments and help alert caretakers as needed. Machine learning approaches have been proposed for HAR tasks using Inertial Measurement Unit (IMU) data; however, most rely on application-specific labels and lack generalizability to data collected in different environments or populations. To address this limitation, we propose a new cross-modal self-supervised pretraining approach to learn representations from large-scale unlabeled IMU-video data and demonstrate improved generalizability in HAR tasks on out-of-distribution (OOD) IMU datasets, including a dataset collected from patients with Parkinson’s disease. Specifically, our results indicate that the proposed cross-modal pretraining approach outperforms the current state-of-the-art IMU-video pretraining approach and IMU-only pretraining under zero-shot and few-shot evaluations. Broadly, our study provides evidence that in highly dynamic data modalities, such as IMU signals, cross-modal pretraining may be a useful tool to learn generalizable data representations. Our software is available at https://github.com/scheshmi/IMU-Video-OOD-HAR.
nan
Article 1107
Title@2025-07-17 (4): Multiresolution local smoothness detection in non-uniformly sampled multivariate signals
Title: Multiresolution local smoothness detection in non-uniformly sampled multivariate signals | Multiauflösende lokale Glättedetektion in nicht einheitlich abgetasteten multivariaten Signalen | 在非统一抽样的多变量信号中多分辨率多分辨率局部平稳探测 2507.13480v1 |
Authors (3): Sara Avesani, Gianluca Giacchi, Michael Multerer
Inspired by edge detection based on the decay behavior of wavelet coefficients, we introduce a (near) linear-time algorithm for detecting the local regularity in non-uniformly sampled multivariate signals. Our approach quantifies regularity within the framework of microlocal spaces introduced by Jaffard. The central tool in our analysis is the fast samplet transform, a distributional wavelet transform tailored to scattered data. We establish a connection between the decay of samplet coefficients and the pointwise regularity of multivariate signals. As a byproduct, we derive decay estimates for functions belonging to classical Hölder spaces and Sobolev-Slobodeckij spaces. While traditional wavelets are effective for regularity detection in low-dimensional structured data, samplets demonstrate robust performance even for higher dimensional and scattered data. To illustrate our theoretical findings, we present extensive numerical studies detecting local regularity of one-, two- and three-dimensional signals, ranging from non-uniformly sampled time series over image segmentation to edge detection in point clouds.
nan
Article 1108
Title@2025-07-17 (4): psifx – Psychological and Social Interactions Feature Extraction Package
Title: psifx – Psychological and Social Interactions Feature Extraction Package | psifx – Psychologische und soziale Interaktionen Feature Extraction Package | psifx – – 心理和社会互动 2407.10266v4 |
Authors (3): Guillaume Rochette, Mathieu Rochat, Matthew J. Vowels
psifx is a plug-and-play multi-modal feature extraction toolkit, aiming to facilitate and democratize the use of state-of-the-art machine learning techniques for human sciences research. It is motivated by a need (a) to automate and standardize data annotation processes that typically require expensive, lengthy, and inconsistent human labour; (b) to develop and distribute open-source community-driven psychology research software; and (c) to enable large-scale access and ease of use for non-expert users. The framework contains an array of tools for tasks such as speaker diarization, closed-caption transcription and translation from audio; body, hand, and facial pose estimation and gaze tracking with multi-person tracking from video; and interactive textual feature extraction supported by large language models. The package has been designed with a modular and task-oriented approach, enabling the community to add or update new tools easily. This combination creates new opportunities for in-depth study of real-time behavioral phenomena in psychological and social science research.
nan
Article 1109
Title@2025-07-17 (4): Base3: a simple interpolation-based ensemble method for robust dynamic link prediction
Title: Base3: a simple interpolation-based ensemble method for robust dynamic link prediction | Base3: eine einfache, interpolationsbasierte Ensemble-Methode für robuste dynamische Link-Vorhersage | 基数3:一种简单的基于内插的共合方法,用于稳健动态链接预测 2506.12764v2 |
Authors (1): Kondrup Emma
Dynamic link prediction remains a central challenge in temporal graph learning, particularly in designing models that are both effective and practical for real-world deployment. Existing approaches often rely on complex neural architectures, which are computationally intensive and difficult to interpret. In this work, we build on the strong recurrence-based foundation of the EdgeBank baseline, by supplementing it with inductive capabilities. We do so by leveraging the predictive power of non-learnable signals from two complementary perspectives: historical edge recurrence, as captured by EdgeBank, and global node popularity, as introduced in the PopTrack model. We propose t-CoMem, a lightweight memory module that tracks temporal co-occurrence patterns and neighborhood activity. Building on this, we introduce Base3, an interpolation-based model that fuses EdgeBank, PopTrack, and t-CoMem into a unified scoring framework. This combination effectively bridges local and global temporal dynamics – repetition, popularity, and context – without relying on training. Evaluated on the Temporal Graph Benchmark, Base3 achieves performance competitive with state-of-the-art deep models, even outperforming them on some datasets. Importantly, it considerably improves on existing baselines’ performance under more realistic and challenging negative sampling strategies – offering a simple yet robust alternative for temporal graph learning.
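A minimal sketch of an interpolation-based, training-free scorer in the spirit described above. The abstract does not specify how the three signals are computed or weighted, so the fixed weights, the count-based popularity, and the Jaccard neighbor overlap standing in for the co-occurrence memory are all assumptions; `Base3Sketch` is a hypothetical name.

```python
class Base3Sketch:
    """Training-free link scorer blending three non-learnable signals (illustrative only)."""

    def __init__(self, weights=(0.5, 0.3, 0.2)):
        self.w_edge, self.w_pop, self.w_ctx = weights  # assumed interpolation weights
        self.seen_edges = set()  # EdgeBank-style memory of observed edges
        self.dst_counts = {}     # how often each node appeared as a destination
        self.neighbors = {}      # past neighbors of each node, for co-occurrence

    def update(self, src, dst):
        """Record one observed interaction."""
        self.seen_edges.add((src, dst))
        self.dst_counts[dst] = self.dst_counts.get(dst, 0) + 1
        self.neighbors.setdefault(src, set()).add(dst)
        self.neighbors.setdefault(dst, set()).add(src)

    def score(self, src, dst):
        """Return a score in [0, 1] for the candidate edge (src, dst)."""
        recurrence = 1.0 if (src, dst) in self.seen_edges else 0.0
        max_count = max(self.dst_counts.values(), default=0)
        popularity = self.dst_counts.get(dst, 0) / max_count if max_count else 0.0
        a = self.neighbors.get(src, set())
        b = self.neighbors.get(dst, set())
        union = a | b
        context = len(a & b) / len(union) if union else 0.0  # Jaccard overlap of past neighbors
        return self.w_edge * recurrence + self.w_pop * popularity + self.w_ctx * context
```

Because the weights sum to one and each signal lies in [0, 1], the blended score stays in [0, 1]; no parameters are learned.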
nan
Article 1110
Title@2025-07-17 (4): Graph Neural Network Surrogates for Contacting Deformable Bodies with Necessary and Sufficient Contact Detection
Title: Graph Neural Network Surrogates for Contacting Deformable Bodies with Necessary and Sufficient Contact Detection | Graph Neural Network Surrogates für Kontakt mit deformierbaren Körpern mit notwendiger und ausreichender Kontakterkennung | 与必要和足够接触检测器接触变形机体的神经网络代号 2507.13459v1 |
Authors (6): Vijay K. Dubey, Collin E. Haese, Osman Gültekin, David Dalton, Manuel K. Rausch, Jan N. Fuhg
Surrogate models for the rapid inference of nonlinear boundary value problems in mechanics are helpful in a broad range of engineering applications. However, effective surrogate modeling of applications involving the contact of deformable bodies, especially in the context of varying geometries, is still an open issue. In particular, existing methods are confined to rigid body contact or, at best, contact between rigid and soft objects with well-defined contact planes. Furthermore, they employ contact or collision detection filters that serve as a rapid test but use only the necessary and not sufficient conditions for detection. In this work, we present a graph neural network architecture that utilizes continuous collision detection and, for the first time, incorporates sufficient conditions designed for contact between soft deformable bodies. We test its performance on two benchmarks, including a problem in soft tissue mechanics of predicting the closed state of a bioprosthetic aortic valve. We find that adding contact terms to the loss function has a regularizing effect, leading to better generalization of the network. These benefits hold for simple contact at similar planes and element normal angles, and complex contact at differing planes and element normal angles. We also demonstrate that the framework can handle varying reference geometries. However, such benefits come with high computational costs during training, resulting in a trade-off that may not always be favorable. We quantify the training cost and the resulting inference speedups on various hardware architectures. Importantly, our graph neural network implementation results in up to a thousand-fold speedup for our benchmark problems at inference.
nan
Article 1111
Title@2025-07-17 (4): Domain-randomized deep learning for neuroimage analysis
Title: Domain-randomized deep learning for neuroimage analysis | Domain-randomisiertes Deep Learning für Neuroimage-Analysen | 用于神经影像分析的内地随机深层学习 2507.13458v1 |
Authors (1): Malte Hoffmann
Deep learning has revolutionized neuroimage analysis by delivering unprecedented speed and accuracy. However, the narrow scope of many training datasets constrains model robustness and generalizability. This challenge is particularly acute in magnetic resonance imaging (MRI), where image appearance varies widely across pulse sequences and scanner hardware. A recent domain-randomization strategy addresses the generalization problem by training deep neural networks on synthetic images with randomized intensities and anatomical content. By generating diverse data from anatomical segmentation maps, the approach enables models to accurately process image types unseen during training, without retraining or fine-tuning. It has demonstrated effectiveness across modalities including MRI, computed tomography, positron emission tomography, and optical coherence tomography, as well as beyond neuroimaging in ultrasound, electron and fluorescence microscopy, and X-ray microtomography. This tutorial paper reviews the principles, implementation, and potential of the synthesis-driven training paradigm. It highlights key benefits, such as improved generalization and resistance to overfitting, while discussing trade-offs such as increased computational demands. Finally, the article explores practical considerations for adopting the technique, aiming to accelerate the development of generalizable tools that make deep learning more accessible to domain experts without extensive computational resources or machine learning knowledge.
Article 1112
Title@2025-07-17 (4): Hierarchical Rectified Flow Matching with Mini-Batch Couplings
Title: Hierarchical Rectified Flow Matching with Mini-Batch Couplings | Hierarchischer rektifizierter Fluss passend zu Mini-Batch-Kupplungen | 与小批量相匹配的梯级校正流程 2507.13350v1 |
Authors (4): Yichi Zhang, Yici Yan, Alex Schwing, Zhizhen Zhao
Flow matching has emerged as a compelling generative modeling approach that is widely used across domains. To generate data via a flow matching model, an ordinary differential equation (ODE) is numerically solved via forward integration of the modeled velocity field. To better capture the multi-modality that is inherent in typical velocity fields, hierarchical flow matching was recently introduced. It uses a hierarchy of ODEs that are numerically integrated when generating data. This hierarchy of ODEs captures the multi-modal velocity distribution, just as vanilla flow matching is capable of modeling a multi-modal data distribution. While this hierarchy makes it possible to model multi-modal velocity distributions, the complexity of the modeled distribution remains identical across levels of the hierarchy. In this paper, we study how to gradually adjust the complexity of the distributions across different levels of the hierarchy via mini-batch couplings. We show the benefits of mini-batch couplings in hierarchical rectified flow matching via compelling results on synthetic and imaging data. Code is available at https://riccizz.github.io/HRF_coupling.
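The sampling step described in the abstract (forward integration of a velocity field) can be sketched with explicit Euler steps. This is only a minimal illustration, not the paper's hierarchical method: `toy_velocity` and `TARGET` are hypothetical stand-ins for a learned velocity model, chosen so that the straight-line flow toward a fixed point can be checked by hand.

```python
def euler_integrate(velocity, x0, num_steps=100):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical target point; a trained model would replace toy_velocity entirely.
TARGET = [2.0, -1.0]


def toy_velocity(x, t):
    # On the straight path x_t = (1 - t) * x0 + t * TARGET, the velocity at x_t
    # equals (TARGET - x_t) / (1 - t). t never reaches 1 inside the Euler loop.
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(TARGET, x)]


sample = euler_integrate(toy_velocity, x0=[0.0, 0.0], num_steps=50)
```

Because the toy velocity is constant along the path, Euler integration lands on `TARGET` up to rounding; a learned, curved velocity field would generally need more steps or a higher-order solver.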
Article 1113
Title@2025-07-17 (4): VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
Title: VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning | VisionThink: Intelligentes und effizientes Vision-Sprachmodell durch Verstärkungslernen | 远景设想:通过强化学习建立聪明、高效的愿景语言模式 2507.13348v1 |
Authors (6): Senqiao Yang, Junyi Li, Xin Lai, Bei Yu, Hengshuang Zhao, Jiaya Jia
Recent advancements in vision-language models (VLMs) have improved performance by increasing the number of visual tokens, whose sequences are often significantly longer than text-token sequences. However, we observe that most real-world scenarios do not require such an extensive number of visual tokens. While the performance drops significantly in a small subset of OCR-related tasks, models still perform accurately in most other general VQA tasks with only 1/4 resolution. Therefore, we propose to dynamically process distinct samples with different resolutions, and present a new paradigm for visual token compression, namely, VisionThink. It starts with a downsampled image and smartly decides whether it is sufficient for problem solving. If it is not, the model outputs a special token to request the higher-resolution image. Compared to existing efficient VLM methods that compress tokens using fixed pruning ratios or thresholds, VisionThink autonomously decides whether to compress tokens case by case. As a result, it demonstrates strong fine-grained visual understanding capability on OCR-related tasks, while saving a substantial number of visual tokens on simpler tasks. We adopt reinforcement learning and propose the LLM-as-Judge strategy to successfully apply RL to general VQA tasks. Moreover, we carefully design a reward function and penalty mechanism to achieve a stable and reasonable image resize call ratio. Extensive experiments demonstrate the superiority, efficiency, and effectiveness of our method. Our code is available at https://github.com/dvlab-research/VisionThink.
Article 1114
Title@2025-07-17 (4): Latent Policy Steering with Embodiment-Agnostic Pretrained World Models
Title: Latent Policy Steering with Embodiment-Agnostic Pretrained World Models | Latent Policy Steering mit prätrainierten Weltmodellen der Embodiment-Agnostik | 与Embodiment-Agnnocistic未受训练世界模型的原始政策指导 2507.13340v1 |
Authors (3): Yiqi Wang, Mrinal Verghese, Jeff Schneider
Learning visuomotor policies via imitation has proven effective across a wide range of robotic domains. However, the performance of these policies is heavily dependent on the number of training demonstrations, which requires expensive data collection in the real world. In this work, we aim to reduce data collection efforts when learning visuomotor robot policies by leveraging existing or cost-effective data from a wide range of embodiments, such as public robot datasets and the datasets of humans playing with objects (human data from play). Our approach leverages two key insights. First, we use optic flow as an embodiment-agnostic action representation to train a World Model (WM) across multi-embodiment datasets, and finetune it on a small amount of robot data from the target embodiment. Second, we develop a method, Latent Policy Steering (LPS), to improve the output of a behavior-cloned policy by searching in the latent space of the WM for better action sequences. In real world experiments, we observe significant improvements in the performance of policies trained with a small amount of data (over 50% relative improvement with 30 demonstrations and over 20% relative improvement with 50 demonstrations) by combining the policy with a WM pretrained on two thousand episodes sampled from the existing Open X-embodiment dataset across different robots or a cost-effective human dataset from play.
Article 1115
Title@2025-07-17 (4): Training Transformers with Enforced Lipschitz Constants
Title: Training Transformers with Enforced Lipschitz Constants | Trainingstransformatoren mit verstärkter Lipschitz-Konstanten | 培训具有强制立利普施茨常数的变革者 2507.13338v1 |
Authors (6): Laker Newhouse, R. Preston Hess, Franz Cesista, Andrii Zahorodnii, Jeremy Bernstein, Phillip Isola
Neural networks are often highly sensitive to input and weight perturbations. This sensitivity has been linked to pathologies such as vulnerability to adversarial examples, divergent training, and overfitting. To combat these problems, past research has looked at building neural networks entirely from Lipschitz components. However, these techniques have not matured to the point where researchers have trained a modern architecture such as a transformer with a Lipschitz certificate enforced beyond initialization. To explore this gap, we begin by developing and benchmarking novel, computationally-efficient tools for maintaining norm-constrained weight matrices. Applying these tools, we are able to train transformer models with Lipschitz bounds enforced throughout training. We find that optimizer dynamics matter: switching from AdamW to Muon improves standard methods (weight decay and spectral normalization), allowing models to reach equal performance with a lower Lipschitz bound. Inspired by Muon’s update having a fixed spectral norm, we co-design a weight constraint method that improves the Lipschitz vs. performance tradeoff on MLPs and 2M parameter transformers. Our 2-Lipschitz transformer on Shakespeare text reaches 60% validation accuracy. Scaling to 145M parameters, our 10-Lipschitz transformer reaches 21% accuracy on internet text. However, to match the NanoGPT baseline validation accuracy of 39.4%, our Lipschitz upper bound increases to 10^264. Nonetheless, our Lipschitz transformers train without stability measures such as layer norm, QK norm, and logit tanh softcapping.
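Spectral normalization, one of the standard methods named in the abstract, can be sketched as follows: estimate the largest singular value of a weight matrix by power iteration and rescale the matrix when it exceeds a chosen bound. This is a generic illustration under that assumption, not the weight constraint method the authors co-design; all function names are hypothetical.

```python
import math


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def vec_norm(v):
    return math.sqrt(sum(x * x for x in v))


def spectral_norm(W, num_iters=200):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    Wt = transpose(W)
    # All-ones start; a random start avoids the rare case where this vector
    # is orthogonal to the top right-singular vector.
    v = [1.0] * len(W[0])
    for _ in range(num_iters):
        v = matvec(Wt, matvec(W, v))
        n = vec_norm(v)
        if n == 0.0:
            return 0.0
        v = [x / n for x in v]
    return vec_norm(matvec(W, v))


def enforce_spectral_bound(W, max_norm=1.0):
    """Return a copy of W rescaled so that its spectral norm is at most max_norm."""
    sigma = spectral_norm(W)
    if sigma <= max_norm:
        return [row[:] for row in W]
    scale = max_norm / sigma
    return [[scale * w for w in row] for row in W]


W = [[3.0, 0.0], [0.0, 1.0]]
W_bounded = enforce_spectral_bound(W, max_norm=1.0)
```

A linear layer whose weight matrix has spectral norm at most 1 is 1-Lipschitz in the Euclidean norm, which is why bounding each layer this way yields a bound for the composed network.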
Article 1116
Title@2025-07-17 (4): GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM
Title: GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM | GeoReg: Gewicht-beschränkt Wenig-heiße Regression für sozioökonomische Abschätzung mit LLM | Georg: 使用LLM法理学模型,为社会经济估算而进行微慢回归,但受重力约束的微弱回缩 2507.13323v1 |
Authors (9): Kyeongjin Ahn, Sungwon Han, Seungeon Lee, Donghyun Ahn, Hyoshin Kim, Jungwon Kim, Jihee Kim, Sangyoon Park, Meeyoung Cha
Socio-economic indicators such as regional GDP, population, and education levels are crucial to shaping policy decisions and fostering sustainable development. This research introduces GeoReg, a regression model that integrates diverse data sources, including satellite imagery and web-based geospatial information, to estimate these indicators even for data-scarce regions such as developing countries. Our approach leverages the prior knowledge of a large language model (LLM) to address the scarcity of labeled data, with the LLM functioning as a data engineer by extracting informative features to enable effective estimation in few-shot settings. Specifically, our model obtains contextual relationships between data features and the target indicator, categorizing their correlations as positive, negative, mixed, or irrelevant. These features are then fed into a linear estimator with tailored weight constraints for each category. To capture nonlinear patterns, the model also identifies meaningful feature interactions and integrates them, along with nonlinear transformations. Experiments across three countries at different stages of development demonstrate that our model outperforms baselines in estimating socio-economic indicators, even for low-income countries with limited data availability.
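One way to picture a linear estimator with per-category weight constraints is projected gradient descent on a least-squares loss. The mapping used below is an assumption made for illustration only ("positive" means weight >= 0, "negative" means weight <= 0, "irrelevant" means weight fixed at 0, "mixed" means unconstrained); the abstract does not specify the actual constraints, and the data are synthetic.

```python
def project(weights, categories):
    """Project each weight onto the feasible set assumed for its category."""
    out = []
    for w, cat in zip(weights, categories):
        if cat == "positive":
            out.append(max(w, 0.0))
        elif cat == "negative":
            out.append(min(w, 0.0))
        elif cat == "irrelevant":
            out.append(0.0)
        else:  # "mixed": no sign constraint in this sketch
            out.append(w)
    return out


def fit_constrained_linear(X, y, categories, lr=0.01, num_steps=5000):
    """Least squares by projected gradient descent (no intercept, for brevity)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(num_steps):
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - yi
                     for row, yi in zip(X, y)]
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, X))
                for j in range(d)]
        w = project([wj - lr * gj for wj, gj in zip(w, grad)], categories)
    return w


# Synthetic few-shot data generated from y = 2 * x0 - 1 * x1 (x2 plays no role).
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0], [2.0, 1.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]
weights = fit_constrained_linear(X, y, ["positive", "negative", "irrelevant"])
```

With only a handful of labeled regions, constraints of this kind shrink the hypothesis space, which is the usual motivation for encoding prior knowledge about correlation signs.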
Article 1117
Title@2025-07-17 (4): Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence
Title: Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence | Föderiertes Lernen: Eine Umfrage zum Datenschutz-Schutz Kollaborativer Intelligenz | 联邦学习:保护隐私合作情报调查 2504.17703v2 |
Authors (3): Nusrat Jahan, Ratun Rahman, Michel Wang
Federated Learning (FL) has emerged as a transformative paradigm in the field of distributed machine learning, enabling multiple clients such as mobile devices, edge nodes, or organizations to collaboratively train a shared global model without the need to centralize sensitive data. This decentralized approach addresses growing concerns around data privacy, security, and regulatory compliance, making it particularly attractive in domains such as healthcare, finance, and smart IoT systems. This survey provides a concise yet comprehensive overview of Federated Learning, beginning with its core architecture and communication protocol. We discuss the standard FL lifecycle, including local training, model aggregation, and global updates. A particular emphasis is placed on key technical challenges such as handling non-IID (non-independent and identically distributed) data, mitigating system and hardware heterogeneity, reducing communication overhead, and ensuring privacy through mechanisms like differential privacy and secure aggregation. Furthermore, we examine emerging trends in FL research, including personalized FL, cross-device versus cross-silo settings, and integration with other paradigms such as reinforcement learning and quantum computing. We also highlight real-world applications and summarize benchmark datasets and evaluation metrics commonly used in FL research. Finally, we outline open research problems and future directions to guide the development of scalable, efficient, and trustworthy FL systems.
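The lifecycle named in the abstract (local training, model aggregation, global update) can be sketched with a FedAvg-style weighted average. FedAvg is assumed here as a familiar instance; the abstract itself does not single out an aggregation rule. The model is a toy one-dimensional linear regression, and all names and data are hypothetical.

```python
def local_train(weights, data, lr=0.1, epochs=5):
    """Run a few full-batch gradient steps on one client's private data.

    The model is y ~ w0 * x + w1 with mean squared-error loss.
    """
    w0, w1 = weights
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in data:
            err = w0 * x + w1 - y
            g0 += 2.0 * err * x
            g1 += 2.0 * err
        w0 -= lr * g0 / len(data)
        w1 -= lr * g1 / len(data)
    return [w0, w1]


def federated_average(client_weights, client_sizes):
    """Aggregate client models, weighting each by its number of samples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
            for j in range(dim)]


def run_rounds(clients, num_rounds=200):
    """Repeat: broadcast global model, train locally, aggregate on the server."""
    global_weights = [0.0, 0.0]
    for _ in range(num_rounds):
        local = [local_train(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(local, sizes)
    return global_weights


# Two clients holding different samples of the same line y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)],
]
final_weights = run_rounds(clients)
```

Only model parameters leave each client in this loop; the raw `(x, y)` pairs stay local, which is the property the survey highlights. Real deployments add the privacy and communication mechanisms the abstract lists.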
Article 1118
Title@2025-07-17 (4): Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models
Title: Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models | Von reward-free Offline-Daten lernen: Ein Fall für die Planung mit latenten Dynamics-Modellen | 从无回报脱离线数据中学习:利用隐时动态模型进行规划的一个案例 2502.14819v2 |
Authors (6): Vlad Sobal, Wancong Zhang, Kyunghyun Cho, Randall Balestriero, Tim G. J. Rudner, Yann LeCun
A long-standing goal in AI is to build agents that can solve a variety of tasks across different environments, including previously unseen ones. Two dominant approaches tackle this challenge: (i) reinforcement learning (RL), which learns policies through trial and error, and (ii) optimal control, which plans actions using a learned or known dynamics model. However, their relative strengths and weaknesses remain underexplored in the setting where agents must learn from offline trajectories without reward annotations. In this work, we systematically analyze the performance of different RL and control-based methods under datasets of varying quality. On the RL side, we consider goal-conditioned and zero-shot approaches. On the control side, we train a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and use it for planning. We study how dataset properties, such as data diversity, trajectory quality, and environment variability, affect the performance of these approaches. Our results show that model-free RL excels when abundant, high-quality data is available, while model-based planning excels in generalization to novel environment layouts, trajectory stitching, and data-efficiency. Notably, planning with a latent dynamics model emerges as a promising approach for zero-shot generalization from suboptimal data.
Article 1119
Title@2025-07-17 (4): Boosting Team Modeling through Tempo-Relational Representation Learning
Title: Boosting Team Modeling through Tempo-Relational Representation Learning | Teammodellierung durch Tempo-Relationales Repräsentationslernen fördern | 通过Tempo-关系代表制学习促进团队模拟 2507.13305v1 |
Authors (3): Vincenzo Marco De Luca, Giovanna Varni, Andrea Passerini
Team modeling remains a fundamental challenge at the intersection of Artificial Intelligence and the Social Sciences. Social Science research emphasizes the need to jointly model dynamics and relations, while practical applications demand unified models capable of inferring multiple team constructs simultaneously, providing interpretable insights and actionable recommendations to enhance team performance. However, existing works do not meet these practical demands. To bridge this gap, we present TRENN, a novel tempo-relational architecture that integrates: (i) an automatic temporal graph extractor, (ii) a tempo-relational encoder, (iii) a decoder for team construct prediction, and (iv) two complementary explainability modules. TRENN jointly captures relational and temporal team dynamics, providing a solid foundation for MT-TRENN, which extends TRENN by replacing the decoder with a multi-task head, enabling the model to learn shared Social Embeddings and simultaneously predict multiple team constructs, including Emergent Leadership, Leadership Style, and Teamwork components. Experimental results demonstrate that our approach significantly outperforms approaches that rely exclusively on temporal or relational information. Additionally, experimental evaluation has shown that the explainability modules integrated in MT-TRENN yield interpretable insights and actionable suggestions to support team improvement. These capabilities make our approach particularly well-suited for Human-Centered AI applications, such as intelligent decision-support systems in high-stakes collaborative environments.
Article 1120
Title@2025-07-17 (4): Retraining-Free Merging of Sparse MoE via Hierarchical Clustering
Title: Retraining-Free Merging of Sparse MoE via Hierarchical Clustering | Retraining-Free Merging von Sparse MoE über Hierarchical Clustering | 通过等级式集束式集成,无培训地重新合并粗微中小部 2410.08589v3 |
Authors (6): I-Chun Chen, Hsu-Shen Liu, Wei-Fang Sun, Chen-Hao Chao, Yen-Chang Hsu, Chun-Yi Lee
Sparse Mixture-of-Experts (SMoE) models represent a significant advancement in large language model (LLM) development through their efficient parameter utilization. These models achieve substantial performance improvements at reduced inference costs. However, the deployment of SMoE models faces constraints from extensive memory requirements of expert components in resource-limited environments. To address these limitations, this paper introduces Hierarchical Clustering for Sparsely activated Mixture of Experts (HC-SMoE), a task-agnostic expert merging framework for parameter reduction without retraining. HC-SMoE introduces a novel hierarchical clustering approach based on expert outputs to ensure merging robustness independent of routing decisions. The proposed output-based clustering method enables effective capture of functional relationships between experts for large-scale architectures. We provide theoretical analysis and comprehensive evaluations across multiple zero-shot language tasks to demonstrate HC-SMoE’s effectiveness in state-of-the-art models including Qwen and Mixtral. The experimental results validate HC-SMoE’s superior performance and practical applicability for real-world deployments.
Article 1121
Title@2025-07-17 (4): Advancing Seasonal Prediction of Tropical Cyclone Activity with a Hybrid AI-Physics Climate Model
Title: Advancing Seasonal Prediction of Tropical Cyclone Activity with a Hybrid AI-Physics Climate Model | Förderung der saisonalen Vorhersage Tropischer Zyklonaktivität mit einem Hybrid-KI-Physik-Klimamodell | 采用AI-物理混合气候模型推进热带气旋活动季节性预测 2505.01455v2 |
Authors (4): Gan Zhang, Megha Rao, Janni Yuval, Ming Zhao
Machine learning (ML) models have been successful in weather forecasting and have shown progress in climate simulations, yet leveraging them for useful climate predictions needs exploration. Here we show this feasibility using the Neural General Circulation Model (NeuralGCM), a hybrid ML-physics atmospheric model developed by Google, for seasonal predictions of large-scale atmospheric variability and Northern Hemisphere tropical cyclone (TC) activity. Inspired by physical model studies, we simplify boundary conditions, assuming sea surface temperature (SST) and sea ice follow their climatological cycle but persist anomalies present at the initialization time. With such forcings, NeuralGCM can generate 100 simulation days in ~8 minutes with a single Graphics Processing Unit (GPU), while simulating realistic atmospheric circulation and TC climatology patterns. This configuration yields useful seasonal predictions (July to November) for the tropical atmosphere and various TC activity metrics. Notably, the predicted and observed TC frequencies in the North Atlantic and East Pacific basins are significantly correlated during 1990 to 2023 (r ≈ 0.7), suggesting prediction skill comparable to existing physical GCMs. Despite challenges associated with model resolution and simplified boundary forcings, the model-predicted interannual variations demonstrate significant correlations with observations, including the sub-basin TC tracks (p<0.1) and basin-wide accumulated cyclone energy (p<0.01) of the North Atlantic and North Pacific basins. These findings highlight the promise of leveraging ML models with physical insights to model TC risks and deliver seamless weather-climate predictions.
Article 1122
Title@2025-07-17 (4): SIDDA: SInkhorn Dynamic Domain Adaptation for Image Classification with Equivariant Neural Networks
Title: SIDDA: SInkhorn Dynamic Domain Adaptation for Image Classification with Equivariant Neural Networks | SIDDA: SInkhorn Dynamische Domain-Anpassung für die Bildklassifizierung mit Gleichwertigen Neuronalen Netzwerken | SIDDA: 利用等质神经网络进行图像分类的SInkhorn动态域域适应 2501.14048v2 |
Authors (5): Sneh Pandya, Purvik Patel, Brian D. Nord, Mike Walmsley, Aleksandra Ćiprijanović
Modern neural networks (NNs) often do not generalize well in the presence of a “covariate shift”; that is, in situations where the training and test data distributions differ, but the conditional distribution of classification labels remains unchanged. In such cases, NN generalization can be reduced to a problem of learning more domain-invariant features. Domain adaptation (DA) methods include a range of techniques aimed at achieving this; however, these methods have struggled with the need for extensive hyperparameter tuning, which then incurs significant computational costs. In this work, we introduce SIDDA, an out-of-the-box DA training algorithm built upon the Sinkhorn divergence, that can achieve effective domain alignment with minimal hyperparameter tuning and computational overhead. We demonstrate the efficacy of our method on multiple simulated and real datasets of varying complexity, including simple shapes, handwritten digits, and real astronomical observations. SIDDA is compatible with a variety of NN architectures, and it works particularly well in improving classification accuracy and model calibration when paired with equivariant neural networks (ENNs). We find that SIDDA enhances the generalization capabilities of NNs, achieving up to a $\approx 40\%$ improvement in classification accuracy on unlabeled target data. We also study the efficacy of DA on ENNs with respect to the varying group orders of the dihedral group $D_N$, and find that the model performance improves as the degree of equivariance increases. Finally, we find that SIDDA enhances model calibration on both source and target data, achieving over an order-of-magnitude improvement in the ECE and Brier score. SIDDA’s versatility, combined with its automated approach to domain alignment, has the potential to advance multi-dataset studies by enabling the development of highly generalizable models.
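For readers unfamiliar with the Sinkhorn divergence that SIDDA builds on, the sketch below computes a debiased entropic optimal-transport distance between two small point clouds with uniform weights. It assumes one common convention (transport cost evaluated under the entropic plan, squared Euclidean ground cost) and plain Sinkhorn iterations without log-domain stabilization; it is not the SIDDA training loss, and the function names are hypothetical.

```python
import math


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def entropic_ot_cost(xs, ys, eps=1.0, num_iters=200):
    """Transport cost <P, C> under the entropic plan P found by Sinkhorn iterations."""
    n, m = len(xs), len(ys)
    a = [1.0 / n] * n  # uniform source weights
    b = [1.0 / m] * m  # uniform target weights
    C = [[sq_dist(x, y) for y in ys] for x in xs]
    K = [[math.exp(-c / eps) for c in row] for row in C]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(num_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return sum(u[i] * K[i][j] * v[j] * C[i][j]
               for i in range(n) for j in range(m))


def sinkhorn_divergence(xs, ys, eps=1.0):
    """Debiased form: OT(x, y) - 0.5 * OT(x, x) - 0.5 * OT(y, y)."""
    return (entropic_ot_cost(xs, ys, eps)
            - 0.5 * entropic_ot_cost(xs, xs, eps)
            - 0.5 * entropic_ot_cost(ys, ys, eps))


source = [[0.0], [1.0]]
target = [[5.0], [6.0]]
shifted = sinkhorn_divergence(source, target)
aligned = sinkhorn_divergence(source, source)
```

Subtracting the two self-terms removes the bias that entropic regularization introduces, so identical clouds score zero. For widely separated clouds or small `eps`, `math.exp` can underflow and a log-domain implementation would be needed.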
Article 1123
Title@2025-07-17 (4): Air Traffic Controller Task Demand via Graph Neural Networks: An Interpretable Approach to Airspace Complexity
Title: Air Traffic Controller Task Demand via Graph Neural Networks: An Interpretable Approach to Airspace Complexity | Flugverkehrskontroller Auftragsnachfrage über Graph Neural Networks: Ein interpretierbarer Ansatz für die Komplexität des Luftraums | 通过图形神经网络的空中交通管制主计长任务需求:对空气空间复杂度的一种解释性办法 2507.13423v1 |
Authors (5): Edward Henderson, Dewi Gould, Richard Everson, George De Ath, Nick Pepper
Real-time assessment of near-term Air Traffic Controller (ATCO) task demand is a critical challenge in an increasingly crowded airspace, as existing complexity metrics often fail to capture nuanced operational drivers beyond simple aircraft counts. This work introduces an interpretable Graph Neural Network (GNN) framework to address this gap. Our attention-based model predicts the number of upcoming clearances, the instructions issued to aircraft by ATCOs, from interactions within static traffic scenarios. Crucially, we derive an interpretable, per-aircraft task demand score by systematically ablating aircraft and measuring the impact on the model’s predictions. Our framework significantly outperforms an ATCO-inspired heuristic and is a more reliable estimator of scenario complexity than established baselines. The resulting tool can attribute task demand to specific aircraft, offering a new way to analyse and understand the drivers of complexity for applications in controller training and airspace redesign.
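The ablation-based attribution described in the abstract (remove one aircraft, measure the change in the prediction) can be sketched generically. The `toy_demand_model` below is a hypothetical stand-in that counts close pairs on a line; it is not the paper's attention-based GNN, and the positions are made up.

```python
def toy_demand_model(positions, threshold=2.0):
    """Hypothetical predictor: number of aircraft pairs closer than `threshold`."""
    count = 0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if abs(positions[i] - positions[j]) < threshold:
                count += 1
    return float(count)


def ablation_scores(predict, items):
    """Score each item by how much the prediction drops when it is removed."""
    baseline = predict(items)
    return [baseline - predict(items[:i] + items[i + 1:])
            for i in range(len(items))]


# Two aircraft close together and one far away.
scores = ablation_scores(toy_demand_model, [0.0, 1.0, 10.0])
# → [1.0, 1.0, 0.0]
```

The distant aircraft receives a score of zero because removing it leaves the prediction unchanged, which is the sense in which such scores attribute demand to specific aircraft.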
Article 1124
Title@2025-07-17 (4): crowd-hpo: Realistic Hyperparameter Optimization and Benchmarking for Learning from Crowds with Noisy Labels
Title: crowd-hpo: Realistic Hyperparameter Optimization and Benchmarking for Learning from Crowds with Noisy Labels | crowd-hpo: Realistische Hyperparameter-Optimierung und Benchmarking zum Lernen von Crowds mit Noisy-Labels | 现实主义超超参数最佳化和基准化,用噪音标签从人群中学习 2504.09085v2 |
Authors (4): Marek Herde, Lukas Lührs, Denis Huseljic, Bernhard Sick
Crowdworking is a cost-efficient solution for acquiring class labels. Since these labels are subject to noise, various approaches to learning from crowds have been proposed. Typically, these approaches are evaluated with default hyperparameter configurations, resulting in unfair and suboptimal performance, or with hyperparameter configurations tuned via a validation set with ground truth class labels, representing an often unrealistic scenario. Moreover, both setups can produce different approach rankings, complicating study comparisons. Therefore, we introduce crowd-hpo as a framework for evaluating approaches to learning from crowds in combination with criteria to select well-performing hyperparameter configurations with access only to noisy crowd-labeled validation data. Extensive experiments with neural networks demonstrate that these criteria select hyperparameter configurations that improve the generalization performance of learning-from-crowds approaches, as measured on separate test sets with ground truth labels. Hence, incorporating such criteria into experimental studies is essential for enabling fairer and more realistic benchmarking.
Article 1125
Title@2025-07-17 (4): Optimal Empirical Risk Minimization under Temporal Distribution Shifts
Title: Optimal Empirical Risk Minimization under Temporal Distribution Shifts | Optimale Empirische Risikominimierung unter zeitlichen Verteilungsverschiebungen | 时间分布变化下最佳实证风险最小化 2507.13287v1 |
Authors (4): Yujin Jeong, Ramesh Johari, Dominik Rothenhäusler, Emily Fox
Temporal distribution shifts pose a key challenge for machine learning models trained and deployed in dynamically evolving environments. This paper introduces RIDER (RIsk minimization under Dynamically Evolving Regimes), which derives optimally-weighted empirical risk minimization procedures under temporal distribution shifts. Our approach is theoretically grounded in the random distribution shift model, where random shifts arise as a superposition of numerous unpredictable changes in the data-generating process. We show that common weighting schemes, such as pooling all data, exponentially weighting data, and using only the most recent data, emerge naturally as special cases in our framework. We demonstrate that RIDER consistently improves out-of-sample predictive performance when applied as a fine-tuning step on the Yearbook dataset, across a range of benchmark methods in Wild-Time. Moreover, we show that RIDER outperforms standard weighting strategies in two other real-world tasks: predicting stock market volatility and forecasting ride durations in NYC taxi data.
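The three common weighting schemes named in the abstract (pooling, exponential weighting, most recent data only) can be written as weight vectors over time periods and plugged into a weighted empirical risk. The sketch below uses the simplest possible estimator, a constant predictor under squared loss, whose weighted minimizer is a weighted mean. It does not compute RIDER's optimal weights; the decay rate, window, and data are illustrative assumptions.

```python
def pooled_weights(num_periods):
    """Every period counts equally."""
    return [1.0 / num_periods] * num_periods


def exponential_weights(num_periods, decay=0.5):
    """Older periods are down-weighted geometrically; the last period is newest."""
    raw = [decay ** (num_periods - 1 - t) for t in range(num_periods)]
    total = sum(raw)
    return [r / total for r in raw]


def recent_only_weights(num_periods, window=1):
    """Only the most recent `window` periods receive weight."""
    return [0.0] * (num_periods - window) + [1.0 / window] * window


def weighted_erm_constant(period_means, weights):
    """Minimizer of sum_t w_t * E_t[(theta - y)^2] over constants theta."""
    return sum(w * m for w, m in zip(weights, period_means)) / sum(weights)


# Per-period sample means of a target that drifts upward over time.
period_means = [0.0, 1.0, 2.0, 3.0]
pooled = weighted_erm_constant(period_means, pooled_weights(4))
exponential = weighted_erm_constant(period_means, exponential_weights(4))
recent = weighted_erm_constant(period_means, recent_only_weights(4))
```

Under drift, pooling gives the estimate furthest from the latest period and recent-only the closest, at the cost of using the least data; exponential weighting sits in between.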
Article 1126
Title@2025-07-17 (4): Stochastic Weakly Convex Optimization Under Heavy-Tailed Noises
Title: Stochastic Weakly Convex Optimization Under Heavy-Tailed Noises | Stochastisch schwache Konvex-Optimierung unter schwerfälligen Geräuschen | 在重故障噪音下优化 2507.13283v1 |
Authors (3): Tianxi Zhu, Yi Xu, Xiangyang Ji
An increasing number of studies have focused on stochastic first-order methods (SFOMs) under heavy-tailed gradient noises, which have been observed in the training of practical deep learning models. In this paper, we focus on two types of gradient noises: one is sub-Weibull noise, and the other is noise under the assumption that it has a bounded $p$-th central moment ($p$-BCM) with $p\in (1, 2]$. The latter is more challenging due to the occurrence of infinite variance when $p\in (1, 2)$. Under these two gradient noise assumptions, the in-expectation and high-probability convergence of SFOMs have been extensively studied in the contexts of convex optimization and standard smooth optimization. However, for weakly convex objectives (a class that includes all Lipschitz-continuous convex objectives and smooth objectives), our understanding of the in-expectation and high-probability convergence of SFOMs under these two types of noises remains incomplete. We investigate the high-probability convergence of the vanilla stochastic subgradient descent (SsGD) method under sub-Weibull noises, as well as the high-probability and in-expectation convergence of clipped SsGD under the $p$-BCM noises. Both analyses are conducted in the context of weakly convex optimization. For weakly convex objectives that may be non-convex and non-smooth, our results demonstrate that the theoretical dependence of vanilla SsGD on the failure probability and number of iterations under sub-Weibull noises does not degrade compared to the case of smooth objectives. Under $p$-BCM noises, our findings indicate that the non-smoothness and non-convexity of weakly convex objectives do not impact the theoretical dependence of clipped SsGD on the failure probability relative to the smooth case; however, the sample complexity we derived is worse than a well-known lower bound for smooth optimization.
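Clipped stochastic subgradient descent can be sketched on a simple non-smooth objective. The example below assumes $f(x) = \sum_i |x_i|$ (convex, hence weakly convex), symmetric Pareto-type noise with tail index 1.5 (infinite variance), norm clipping at a threshold `tau`, and a $1/\sqrt{k}$ step size. These choices are illustrative and are not taken from the paper's analysis.

```python
import math
import random


def clip(g, tau):
    """Rescale g so that its Euclidean norm is at most tau; direction is kept."""
    norm = math.sqrt(sum(x * x for x in g))
    if norm <= tau:
        return list(g)
    return [x * tau / norm for x in g]


def l1_subgradient(x):
    """A subgradient of f(x) = sum_i |x_i| (0 is chosen at the kink)."""
    return [(xi > 0) - (xi < 0) for xi in x]


def heavy_tailed_noise(dim, rng, alpha=1.5):
    """Symmetric noise with Pareto tails; the variance is infinite for alpha < 2."""
    return [rng.choice((-1.0, 1.0)) * (rng.paretovariate(alpha) - 1.0)
            for _ in range(dim)]


def clipped_ssgd(x0, num_steps=2000, tau=2.0, lr0=0.5, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    for k in range(num_steps):
        noise = heavy_tailed_noise(len(x), rng)
        g = [s + e for s, e in zip(l1_subgradient(x), noise)]
        g = clip(g, tau)  # bounds the step even when the noise is extreme
        lr = lr0 / math.sqrt(k + 1)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


x_final = clipped_ssgd([5.0, -5.0])
```

Clipping caps the influence of any single heavy-tailed sample at `tau * lr`, which is the intuition behind using it when the noise variance is unbounded; the threshold trades bias against robustness.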
Article 1127
Title@2025-07-17 (4): Generative Diffusion Models for Resource Allocation in Wireless Networks
Title: Generative Diffusion Models for Resource Allocation in Wireless Networks | Generative Diffusionsmodelle zur Ressourcenallokation in drahtlosen Netzwerken | 无线网络资源分配生成传播模型 2504.20277v2 |
Authors (4): Yigit Berkay Uslu, Samar Hadou, Shirin Saeedi Bidokhti, Alejandro Ribeiro
This paper proposes a supervised training algorithm for learning stochastic resource allocation policies with generative diffusion models (GDMs). We formulate the allocation problem as the maximization of an ergodic utility function subject to ergodic Quality of Service (QoS) constraints. Given samples from a stochastic expert policy that yields a near-optimal solution to the constrained optimization problem, we train a GDM policy to imitate the expert and generate new samples from the optimal distribution. We achieve near-optimal performance through the sequential execution of the generated samples. To enable generalization to a family of network configurations, we parameterize the backward diffusion process with a graph neural network (GNN) architecture. We present numerical results in a case study of power control.
Article 1128
Title@2025-07-17 (4): Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour
Title: Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour | Bewertung von Stärkungslernen Algorithmen für die Navigation in simulierten Roboter-Quadrupen: Eine vergleichende Studie, inspiriert von Guide Dog Behaviour | 评价模拟机器人四重干扰模拟机器人四重干扰导航中强化学习的教学比值:受导狗行为启发的比较研究 2507.13277v1 |
Authors (1): Emma M. A. Harrison
Robots are increasingly integrated across industries, particularly in healthcare. However, many valuable applications for quadrupedal robots remain overlooked. This research explores the effectiveness of three reinforcement learning algorithms in training a simulated quadruped robot for autonomous navigation and obstacle avoidance. The goal is to develop a robotic guide dog simulation capable of path following and obstacle avoidance, with long-term potential for real-world assistance to guide dogs and visually impaired individuals. It also seeks to expand research into medical ‘pets’, including robotic guide and alert dogs. A comparative analysis of thirteen related research papers shaped key evaluation criteria, including collision detection, pathfinding algorithms, sensor usage, robot type, and simulation platforms. The study focuses on sensor inputs, collision frequency, reward signals, and learning progression to determine which algorithm best supports robotic navigation in complex environments. Custom-made environments were used to ensure fair evaluation of all three algorithms under controlled conditions, allowing consistent data collection. Results show that Proximal Policy Optimization (PPO) outperformed Deep Q-Network (DQN) and Q-learning across all metrics, particularly in average and median steps to goal per episode. By analysing these results, this study contributes to robotic navigation, AI and medical robotics, offering insights into the feasibility of AI-driven quadruped mobility and its role in assistive robotics.
Article 1129
Title@2025-07-17 (4): Do you know what q-means?
Title: Do you know what q-means? | Weißt du, was q-bedeutet? | 你知道什么是q - means吗? 2308.09701v3 |
Authors (4): Arjan Cornelissen, Joao F. Doriguello, Alessandro Luongo, Ewin Tang
Clustering is one of the most important tools for analysis of large datasets, and perhaps the most popular clustering algorithm is Lloyd’s algorithm for $k$-means. This algorithm takes $n$ vectors $V=[v_1,\dots,v_n]\in\mathbb{R}^{d\times n}$ and outputs $k$ centroids $c_1,\dots,c_k\in\mathbb{R}^d$; these partition the vectors into clusters based on which centroid is closest to a particular vector. We present a classical $\varepsilon$-$k$-means algorithm that performs an approximate version of one iteration of Lloyd’s algorithm with time complexity $\tilde{O}\big(\frac{\|V\|_F^2}{n}\frac{k^{2}d}{\varepsilon^2}(k + \log{n})\big)$, exponentially improving the dependence on the data size $n$ and matching that of the “$q$-means” quantum algorithm originally proposed by Kerenidis, Landman, Luongo, and Prakash (NeurIPS’19). Moreover, we propose an improved $q$-means quantum algorithm with time complexity $\tilde{O}\big(\frac{\|V\|_F}{\sqrt{n}}\frac{k^{3/2}d}{\varepsilon}(\sqrt{k}+\sqrt{d})(\sqrt{k} + \log{n})\big)$ that quadratically improves the runtime of our classical $\varepsilon$-$k$-means algorithm in several parameters. Our quantum algorithm does not rely on quantum linear algebra primitives of prior work, but instead only uses QRAM to prepare simple states based on the current iteration’s clusters and multivariate quantum amplitude estimation. Finally, we provide classical and quantum query lower bounds, showing that our algorithms are optimal in most parameters.
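For orientation, the exact Lloyd iteration that the abstract's $\varepsilon$-$k$-means algorithm approximates can be sketched as below. This is a minimal NumPy illustration and not the authors' algorithm; the function name `lloyd_step` and the choice to keep the old centroid when a cluster is empty are assumptions made here.

```python
import numpy as np

def lloyd_step(V, centroids):
    """One exact Lloyd iteration (illustrative sketch).

    V: (d, n) float array whose columns are the data vectors v_1..v_n.
    centroids: (d, k) float array whose columns are the current centroids.
    Returns (new_centroids, labels).
    """
    # Squared distance from every vector to every centroid, shape (n, k).
    diffs = V.T[:, None, :] - centroids.T[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Assign each vector to its closest centroid.
    labels = np.argmin(sq_dists, axis=1)

    new_centroids = centroids.copy()
    for j in range(centroids.shape[1]):
        members = V[:, labels == j]
        if members.shape[1] > 0:  # assumption: an empty cluster keeps its centroid
            new_centroids[:, j] = members.mean(axis=1)
    return new_centroids, labels
```

The distance computation touches all $n$ vectors and all $k$ centroids, so this exact step scales linearly in $n$; that is the dependence the abstract reports improving.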
Article 1130
Title@2025-07-17 (4): Automating Steering for Safe Multimodal Large Language Models
Title: Automating Steering for Safe Multimodal Large Language Models | Automatisierungslenkung für sichere multimodale große Sprachmodelle | 安全多式联运大语言模式自动化指导 2507.13255v1 |
Authors (7): Lyucheng Wu, Mengru Wang, Ziwen Xu, Tri Cao, Nay Oo, Bryan Hooi, Shumin Deng
Recent progress in Multimodal Large Language Models (MLLMs) has unlocked powerful cross-modal reasoning abilities, but also raised new safety concerns, particularly when faced with adversarial multimodal inputs. To improve the safety of MLLMs during inference, we introduce a modular and adaptive inference-time intervention technology, AutoSteer, without requiring any fine-tuning of the underlying model. AutoSteer incorporates three core components: (1) a novel Safety Awareness Score (SAS) that automatically identifies the most safety-relevant distinctions among the model’s internal layers; (2) an adaptive safety prober trained to estimate the likelihood of toxic outputs from intermediate representations; and (3) a lightweight Refusal Head that selectively intervenes to modulate generation when safety risks are detected. Experiments on LLaVA-OV and Chameleon across diverse safety-critical benchmarks demonstrate that AutoSteer significantly reduces the Attack Success Rate (ASR) for textual, visual, and cross-modal threats, while maintaining general abilities. These findings position AutoSteer as a practical, interpretable, and effective framework for safer deployment of multimodal AI systems.
Article 1131
Title@2025-07-17 (4): A Roadmap for Climate-Relevant Robotics Research
Title: A Roadmap for Climate-Relevant Robotics Research | Ein Fahrplan für die klimarelevante Robotikforschung | 气候相关机器人研究路线图 2507.11623v2 |
Authors (28): Alan Papalia, Charles Dawson, Laurentiu L. Anton, Norhan Magdy Bayomi, Bianca Champenois, Jung-Hoon Cho, Levi Cai, Joseph DelPreto, Kristen Edwards, Bilha-Catherine Githinji, Cameron Hickert, Vindula Jayawardana, Matthew Kramer, Shreyaa Raghavan, David Russell, Shide Salimi, Jingnan Shi, Soumya Sudhakar, Yanwei Wang, Shouyi Wang, Luca Carlone, Vijay Kumar, Daniela Rus, John E. Fernandez, Cathy Wu, George Kantor, Derek Young, Hanumant Singh
Climate change is one of the defining challenges of the 21st century, and many in the robotics community are looking for ways to contribute. This paper presents a roadmap for climate-relevant robotics research, identifying high-impact opportunities for collaboration between roboticists and experts across climate domains such as energy, the built environment, transportation, industry, land use, and Earth sciences. These applications include problems such as energy systems optimization, construction, precision agriculture, building envelope retrofits, autonomous trucking, and large-scale environmental monitoring. Critically, we include opportunities to apply not only physical robots but also the broader robotics toolkit - including planning, perception, control, and estimation algorithms - to climate-relevant problems. A central goal of this roadmap is to inspire new research directions and collaboration by highlighting specific, actionable problems at the intersection of robotics and climate. This work represents a collaboration between robotics researchers and domain experts in various climate disciplines, and it serves as an invitation to the robotics community to bring their expertise to bear on urgent climate priorities.
Article 1132
Title@2025-07-17 (4): Leveraging Asynchronous Cross-border Market Data for Improved Day-Ahead Electricity Price Forecasting in European Markets
Title: Leveraging Asynchronous Cross-border Market Data for Improved Day-Ahead Electricity Price Forecasting in European Markets | Nutzung asynchroner grenzübergreifender Marktdaten für eine verbesserte Tagesprognose der Strompreise in den europäischen Märkten | 利用非同步跨界市场数据改进欧洲市场日间电力价格预测 2507.13250v1 |
Authors (4): Maria Margarida Mascarenhas, Jilles De Blauwe, Mikael Amelin, Hussain Kazmi
Accurate short-term electricity price forecasting is crucial for strategically scheduling demand and generation bids in day-ahead markets. While data-driven techniques have shown considerable prowess in achieving high forecast accuracy in recent years, they rely heavily on the quality of input covariates. In this paper, we investigate whether asynchronously published prices as a result of differing gate closure times (GCTs) in some bidding zones can improve forecasting accuracy in other markets with later GCTs. Using a state-of-the-art ensemble of models, we show significant improvements of 22% and 9% in forecast accuracy in the Belgian (BE) and Swedish (SE3) bidding zones respectively, when including price data from interconnected markets with earlier GCTs (Germany-Luxembourg, Austria, and Switzerland). This improvement holds for both general and extreme market conditions. Our analysis also yields further important insights: frequent model recalibration is necessary for maximum accuracy but comes at substantial additional computational costs, and using data from more markets does not always lead to better performance - a fact we examine further through interpretability analysis of the forecast models. Overall, these findings provide valuable guidance for market participants and decision-makers aiming to optimize bidding strategies within increasingly interconnected and volatile European energy markets.
Article 1133
Title@2025-07-17 (4): Approximation Rates for Shallow ReLU$^k$ Neural Networks on Sobolev Spaces via the Radon Transform
Title: Approximation Rates for Shallow ReLU$^k$ Neural Networks on Sobolev Spaces via the Radon Transform | Annäherungssätze für Shallow ReLU$^k$ Neurale Netze auf Sobolev-Räumen über die Radon-Transformation | Sobolev空间的浅光RELU$QK$美元神经网络通过拉子变换的近似率 2408.10996v2 |
Authors (3): Tong Mao, Jonathan W. Siegel, Jinchao Xu
Let $\Omega\subset \mathbb{R}^d$ be a bounded domain. We consider the problem of how efficiently shallow neural networks with the ReLU$^k$ activation function can approximate functions from Sobolev spaces $W^s(L_p(\Omega))$ with error measured in the $L_q(\Omega)$-norm. Utilizing the Radon transform and recent results from discrepancy theory, we provide a simple proof of nearly optimal approximation rates in a variety of cases, including when $q\leq p$, $p\geq 2$, and $s \leq k + (d+1)/2$. The rates we derive are optimal up to logarithmic factors, and significantly generalize existing results. An interesting consequence is that the adaptivity of shallow ReLU$^k$ neural networks enables them to obtain optimal approximation rates for smoothness up to order $s = k + (d+1)/2$, even though they represent piecewise polynomials of fixed degree $k$.
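To make the object of study concrete, a shallow ReLU$^k$ network in the usual sense is a single hidden layer $f(x)=\sum_i a_i \max(0, w_i\cdot x + b_i)^k$. The sketch below evaluates such a network; the function and argument names are hypothetical, and it assumes $k \geq 1$.

```python
import numpy as np

def shallow_relu_k(x, W, b, a, k):
    """Evaluate f(x) = sum_i a_i * max(0, w_i . x + b_i)^k (illustrative sketch).

    x: (m, d) inputs, W: (n, d) inner weights, b: (n,) biases,
    a: (n,) outer weights, k: integer power, assumed >= 1.
    Returns an (m,) array of outputs.
    """
    pre = x @ W.T + b                # pre-activations, shape (m, n)
    hidden = np.maximum(pre, 0.0) ** k  # ReLU^k activation
    return hidden @ a
```

Each hidden unit is a polynomial of degree $k$ on one side of a hyperplane and zero on the other, which is why the abstract describes these networks as piecewise polynomials of fixed degree $k$.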
Article 1134
Title@2025-07-17 (4): The carbon cost of materials discovery: Can machine learning really accelerate the discovery of new photovoltaics?
Title: The carbon cost of materials discovery: Can machine learning really accelerate the discovery of new photovoltaics? | Die CO2-Kosten der Materialentdeckung: Kann maschinelles Lernen die Entdeckung neuer Photovoltaik wirklich beschleunigen? | 材料发现的碳成本:机器学习能否真正加速新光伏发电的发现? 2507.13246v1 |
Authors (2): Matthew Walker, Keith T. Butler
Computational screening has become a powerful complement to experimental efforts in the discovery of high-performance photovoltaic (PV) materials. Most workflows rely on density functional theory (DFT) to estimate electronic and optical properties relevant to solar energy conversion. Although more efficient than laboratory-based methods, DFT calculations still entail substantial computational and environmental costs. Machine learning (ML) models have recently gained attention as surrogates for DFT, offering drastic reductions in resource use with competitive predictive performance. In this study, we reproduce a canonical DFT-based workflow to estimate the maximum efficiency limit and progressively replace its components with ML surrogates. By quantifying the CO$_2$ emissions associated with each computational strategy, we evaluate the trade-offs between predictive efficacy and environmental cost. Our results reveal multiple hybrid ML/DFT strategies that optimize different points along the accuracy–emissions front. We find that direct prediction of scalar quantities, such as maximum efficiency, is significantly more tractable than using predicted absorption spectra as an intermediate step. Interestingly, ML models trained on DFT data can outperform DFT workflows using alternative exchange–correlation functionals in screening applications, highlighting the consistency and utility of data-driven approaches. We also assess strategies to improve ML-driven screening through expanded datasets and improved model architectures tailored to PV-relevant features. This work provides a quantitative framework for building low-emission, high-throughput discovery pipelines.
Article 1135
Title@2025-07-17 (4): VectorFit : Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models
Title: VectorFit : Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models | VectorFit : Adaptive Singular & Bias Vector Fine-Tuning von vortrainierten Foundation-Modellen | 矢量Fit:培训前基金会模型的适应性单单项和比亚斯矢量微调 2503.19530v2 |
Authors (3): Suhas G Hegde, Shilpy Kaur, Aruna Tiwari
Popular PEFT methods reduce trainable parameter count for fine-tuning by parameterizing new low-rank or sparse trainable weights in parallel to the frozen pre-trained weights $W$. However, these weights are trained from scratch, and there exists a performance gap between these methods and full fine-tuning, especially in low-budget settings. We introduce VectorFit, a new way of parameterization that efficiently utilizes the existing knowledge embedded in $W$ by adaptively training their singular vectors and biases. We show that utilizing the structural and transformational properties of $W$ in this way can lead to high-rank incremental weight matrices $\Delta W$, comparable to that of full fine-tuning. VectorFit delivers superior results with 9$\times$ fewer trainable parameters than the leading PEFT methods. Through comprehensive experiments across 19 datasets covering a wide range of language and vision tasks such as natural language understanding and generation, question answering, image classification, and image generation, we demonstrate that VectorFit surpasses baselines in terms of performance as a function of parameter-efficiency.
Article 1136
Title@2025-07-17 (4): Multiple-Frequencies Population-Based Training
Title: Multiple-Frequencies Population-Based Training | Mehrfachhäufigkeiten bevölkerungsbasierte Ausbildung | 以人口为基础的培训 2506.03225v2 |
Authors (6): Waël Doulazmi, Auguste Lehuger, Marin Toromanoff, Valentin Charraut, Thibault Buhet, Fabien Moutarde
Reinforcement Learning’s high sensitivity to hyperparameters is a source of instability and inefficiency, creating significant challenges for practitioners. Hyperparameter Optimization (HPO) algorithms have been developed to address this issue; among them, Population-Based Training (PBT) stands out for its ability to generate hyperparameter schedules instead of fixed configurations. PBT trains a population of agents, each with its own hyperparameters, frequently ranking them and replacing the worst performers with mutations of the best agents. These intermediate selection steps can cause PBT to focus on short-term improvements, leading it to get stuck in local optima and eventually fall behind vanilla Random Search over longer timescales. This paper studies how this greediness issue is connected to the choice of evolution frequency, the rate at which the selection is done. We propose Multiple-Frequencies Population-Based Training (MF-PBT), a novel HPO algorithm that addresses greediness by employing sub-populations, each evolving at a distinct frequency. MF-PBT introduces a migration process to transfer information between sub-populations, with an asymmetric design to balance short and long-term optimization. Extensive experiments on the Brax suite demonstrate that MF-PBT improves sample efficiency and long-term performance, even without actually tuning hyperparameters.
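The rank-and-replace step of vanilla PBT described above can be sketched as follows. This is a simplified illustration and not MF-PBT: the agent fields `score` and `lr`, the truncation fraction, and the perturbation factors are all assumptions chosen for the example, and a real implementation would also copy the parent's network weights.

```python
import random

def pbt_step(population, rng, truncation=0.25, perturb=(0.8, 1.2)):
    """One selection step of vanilla PBT (illustrative sketch).

    population: list of dicts with hypothetical keys 'score' and 'lr'.
    rng: a random.Random instance, passed in for reproducibility.
    The bottom fraction is replaced by mutated copies of top agents.
    """
    ranked = sorted(population, key=lambda agent: agent["score"], reverse=True)
    cut = max(1, int(len(ranked) * truncation))
    top, bottom = ranked[:cut], ranked[-cut:]
    for agent in bottom:
        parent = rng.choice(top)
        # Exploit: inherit the parent's state (only the score stands in for it here).
        agent["score"] = parent["score"]
        # Explore: perturb the inherited hyperparameter.
        agent["lr"] = parent["lr"] * rng.choice(perturb)
    return population
```

In this picture the evolution frequency is simply how many training steps elapse between calls to `pbt_step`; calling it often makes selection greedier, which is the effect the abstract studies.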
Article 1137
Title@2025-07-17 (4): Computational-Statistical Tradeoffs from NP-hardness
Title: Computational-Statistical Tradeoffs from NP-hardness | Computational-Statistical Tradeoffs von NP-Härte | 对NP-硬度的计算-统计取舍 2507.13222v1 |
Authors (4): Guy Blanc, Caleb Koch, Carmen Strassle, Li-Yang Tan
A central question in computer science and statistics is whether efficient algorithms can achieve the information-theoretic limits of statistical problems. Many computational-statistical tradeoffs have been shown under average-case assumptions, but since statistical problems are average-case in nature, it has been a challenge to base them on standard worst-case assumptions. In PAC learning where such tradeoffs were first studied, the question is whether computational efficiency can come at the cost of using more samples than information-theoretically necessary. We base such tradeoffs on $\mathsf{NP}$-hardness and obtain: $\circ$ Sharp computational-statistical tradeoffs assuming $\mathsf{NP}$ requires exponential time: For every polynomial $p(n)$, there is an $n$-variate class $C$ with VC dimension $1$ such that the sample complexity of time-efficiently learning $C$ is $\Theta(p(n))$. $\circ$ A characterization of $\mathsf{RP}$ vs. $\mathsf{NP}$ in terms of learning: $\mathsf{RP} = \mathsf{NP}$ iff every $\mathsf{NP}$-enumerable class is learnable with $O(\mathrm{VCdim}(C))$ samples in polynomial time. The forward implication has been known since (Pitt and Valiant, 1988); we prove the reverse implication. Notably, all our lower bounds hold against improper learners. These are the first $\mathsf{NP}$-hardness results for improperly learning a subclass of polynomial-size circuits, circumventing formal barriers of Applebaum, Barak, and Xiao (2008).
Article 1138
Title@2025-07-17 (4): V-Max: A Reinforcement Learning Framework for Autonomous Driving
Title: V-Max: A Reinforcement Learning Framework for Autonomous Driving | V-Max: Ein Rahmen für verstärktes Lernen für autonomes Fahren | V-Max:加强自主驾驶学习框架 2503.08388v3 |
Authors (4): Valentin Charraut, Waël Doulazmi, Thomas Tournaire, Thibault Buhet
Learning-based decision-making has the potential to enable generalizable Autonomous Driving (AD) policies, reducing the engineering overhead of rule-based approaches. Imitation Learning (IL) remains the dominant paradigm, benefiting from large-scale human demonstration datasets, but it suffers from inherent limitations such as distribution shift and imitation gaps. Reinforcement Learning (RL) presents a promising alternative, yet its adoption in AD remains limited due to the lack of standardized and efficient research frameworks. To this end, we introduce V-Max, an open research framework providing all the necessary tools to make RL practical for AD. V-Max is built on Waymax, a hardware-accelerated AD simulator designed for large-scale experimentation. We extend it using ScenarioNet’s approach, enabling the fast simulation of diverse AD datasets.
Article 1139
Title@2025-07-17 (4): Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models
Title: Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models | Komponativ diskreter latenter Code für High Fidelity, Produktive Diffusionsmodelle | 高菲力、生产性扩散模型、生产性扩散模型 2507.12318v2 |
Authors (3): Samuel Lavoie, Michael Noukhovitch, Aaron Courville
We argue that diffusion models’ success in modeling complex distributions is, for the most part, coming from their input conditioning. This paper investigates the representation used to condition diffusion models from the perspective that ideal representations should improve sample fidelity, be easy to generate, and be compositional to allow generation of samples outside the training distribution. We introduce Discrete Latent Code (DLC), an image representation derived from Simplicial Embeddings trained with a self-supervised learning objective. DLCs are sequences of discrete tokens, as opposed to the standard continuous image embeddings. They are easy to generate and their compositionality enables sampling of novel images beyond the training distribution. Diffusion models trained with DLCs have improved generation fidelity, establishing a new state of the art for unconditional image generation on ImageNet. Additionally, we show that composing DLCs allows the image generator to produce out-of-distribution samples that coherently combine the semantics of images in diverse ways. Finally, we showcase how DLCs can enable text-to-image generation by leveraging large-scale pretrained language models. We efficiently finetune a text diffusion language model to generate DLCs that produce novel samples outside of the image generator training distribution.
Article 1140
Title@2025-07-17 (4): Branching Stein Variational Gradient Descent for sampling multimodal distributions
Title: Branching Stein Variational Gradient Descent for sampling multimodal distributions | Verzweigung Stein Variational Gradient Descent für die Probenahme multimodaler Verteilungen | 用于抽样多式联运分销的 2506.13916v2 |
Authors (3): Isaías Bañales, Arturo Jaramillo, Joshué Helí Ricalde-Guerrero
We propose a novel particle-based variational inference method designed to work with multimodal distributions. Our approach, referred to as Branched Stein Variational Gradient Descent (BSVGD), extends the classical Stein Variational Gradient Descent (SVGD) algorithm by incorporating a random branching mechanism that encourages the exploration of the state space. In this work, a theoretical guarantee for the convergence in distribution is presented, as well as numerical experiments to validate the suitability of our algorithm. Performance comparisons between the BSVGD and the SVGD are presented using the Wasserstein distance between samples and the corresponding computational times.
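For readers unfamiliar with the baseline, one update of classical SVGD with an RBF kernel can be sketched as below. The branching mechanism of BSVGD is not shown; the function name, the fixed bandwidth, and the step size are assumptions made for this illustration.

```python
import numpy as np

def svgd_step(X, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One classical SVGD update with k(x, y) = exp(-|x - y|^2 / h) (sketch).

    X: (n, d) array of particles.
    grad_log_p: callable mapping an (n, d) array to the (n, d) array of
                gradients of the log target density at each particle.
    """
    n = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]                 # diffs[j, i] = x_j - x_i
    K = np.exp(-np.sum(diffs ** 2, axis=2) / bandwidth)   # K[j, i] = k(x_j, x_i)
    # Driving term: sum_j k(x_j, x_i) * grad log p(x_j).
    drive = K.T @ grad_log_p(X)
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i), which pushes particles apart.
    repulse = -(2.0 / bandwidth) * np.sum(K[:, :, None] * diffs, axis=0)
    return X + step_size * (drive + repulse) / n
```

The driving term moves particles towards high-density regions while the repulsive term keeps them spread out; with a multimodal target the particles can still cluster in a subset of the modes, which is the situation the abstract's branching mechanism is aimed at.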
Article 1141
Title@2025-07-17 (4): Relation-Aware Slicing in Cross-Domain Alignment
Title: Relation-Aware Slicing in Cross-Domain Alignment | Verhältnis-Bewusstsein-Slicing in Cross-Domain-Alignment | 跨域对齐中的关系软件切切 2507.13194v1 |
Authors (4): Dhruv Sarkar, Aprameyo Chakrabartty, Anish Chakrabarty, Swagatam Das
The Sliced Gromov-Wasserstein (SGW) distance, aiming to relieve the computational cost of solving a non-convex quadratic program that is the Gromov-Wasserstein distance, utilizes projecting directions sampled uniformly from unit hyperspheres. This slicing mechanism incurs unnecessary computational costs due to uninformative directions, which also affects the representative power of the distance. However, finding a more appropriate distribution over the projecting directions (slicing distribution) is often an optimization problem in itself that comes with its own computational cost. In addition, with more intricate distributions, the sampling itself may be expensive. As a remedy, we propose an optimization-free slicing distribution that provides fast sampling for the Monte Carlo approximation. We do so by introducing the Relation-Aware Projecting Direction (RAPD), effectively capturing the pairwise association of each of two pairs of random vectors, each following their ambient law. This enables us to derive the Relation-Aware Slicing Distribution (RASD), a location-scale law corresponding to sampled RAPDs. Finally, we introduce the RASGW distance and its variants, e.g., IWRASGW (Importance Weighted RASGW), which overcome the shortcomings experienced by SGW. We theoretically analyze its properties and substantiate its empirical prowess using extensive experiments on various alignment tasks.
Article 1142
Title@2025-07-17 (4): GradNetOT: Learning Optimal Transport Maps with GradNets
Title: GradNetOT: Learning Optimal Transport Maps with GradNets | GradNetOT: Optimale Transportkarten mit GradNets lernen | GradNetOT: 与 GradNets一起学习最佳交通地图 2507.13191v1 |
Authors (3): Shreyas Chaudhari, Srinivasa Pranav, José M. F. Moura
Monotone gradient functions play a central role in solving the Monge formulation of the optimal transport problem, which arises in modern applications ranging from fluid dynamics to robot swarm control. When the transport cost is the squared Euclidean distance, Brenier’s theorem guarantees that the unique optimal map is the gradient of a convex function, namely a monotone gradient map, and it satisfies a Monge-Ampère equation. In [arXiv:2301.10862] [arXiv:2404.07361], we proposed Monotone Gradient Networks (mGradNets), neural networks that directly parameterize the space of monotone gradient maps. In this work, we leverage mGradNets to directly learn the optimal transport mapping by minimizing a training loss function defined using the Monge-Ampère equation. We empirically show that the structural bias of mGradNets facilitates the learning of optimal transport maps and employ our method for a robot swarm control problem.
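For reference, the standard form of the Monge-Ampère equation in this setting is given below, with source density $p$, target density $q$, and convex potential $\varphi$. These symbols are chosen here for illustration and are not taken from the abstract.

```latex
\det\!\big(\nabla^2 \varphi(x)\big) \;=\; \frac{p(x)}{q\big(\nabla \varphi(x)\big)},
\qquad T(x) = \nabla \varphi(x).
```

It is the change-of-variables condition for the map $T = \nabla\varphi$ to push $p$ forward to $q$. A training loss of the kind the abstract mentions could, for example, penalise the mismatch between the two sides; the exact loss used by the authors is not stated in the abstract.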
Article 1143
Title@2025-07-17 (4): Bounding the Worst-class Error: A Boosting Approach
Title: Bounding the Worst-class Error: A Boosting Approach | Den Fehler der schlechtesten Klasse zu überwinden: Ein Boosting-Ansatz | 绕过最坏的错误 : 推动方法 2310.14890v3 |
Authors (4): Yuya Saito, Shinnosuke Matsuo, Seiichi Uchida, Daiki Suehiro
This paper tackles the problem of the worst-class error rate, instead of the standard error rate averaged over all classes. For example, a three-class classification task with class-wise error rates of 10%, 10%, and 40% has a worst-class error rate of 40%, whereas the average is 20% under the class-balanced condition. The worst-class error is important in many applications. For example, in a medical image classification task, it would not be acceptable for the malignant tumor class to have a 40% error rate while the benign and healthy classes have 10% error rates. To avoid overfitting in worst-class error minimization using Deep Neural Networks (DNNs), we design a problem formulation for bounding the worst-class error instead of achieving zero worst-class error. Moreover, to correctly bound the worst-class error, we propose a boosting approach which ensembles DNNs. We give training and generalization bounds on the worst-class error. Experimental results show that the algorithm lowers worst-class test error rates while avoiding overfitting to the training set. The code is available at https://github.com/saito-yuya/Bounding-the-Worst-class-error-A-Boosting-Approach.
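The arithmetic in the abstract's three-class example can be reproduced with a short sketch. The helper names are hypothetical, and the class labels simply follow the medical example in the text; this is not the authors' boosting method, only the metric it targets.

```python
def class_error_rates(y_true, y_pred):
    """Return {class: error rate}, computed separately for each true class."""
    totals, errors = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        errors[t] = errors.get(t, 0) + (t != p)
    return {c: errors[c] / totals[c] for c in totals}

def worst_and_average(rates):
    """Return (worst-class error, class-balanced average error)."""
    values = list(rates.values())
    return max(values), sum(values) / len(values)

# Class-wise error rates from the abstract's example.
rates = {"healthy": 0.10, "benign": 0.10, "malignant": 0.40}
worst, average = worst_and_average(rates)
```

Here `worst` is 0.40 and `average` is 0.20 up to floating-point rounding, so a model can look acceptable on average while failing badly on one class.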
Article 1144
Title@2025-07-17 (4): Spectral Bellman Method: Unifying Representation and Exploration in RL
Title: Spectral Bellman Method: Unifying Representation and Exploration in RL | Spektral Bellman-Methode: Vereinheitliche Darstellung und Exploration in RL | 光谱钟门方法:统一代表与探索 2507.13181v1 |
Authors (4): Ofir Nabati, Bo Dai, Shie Mannor, Guy Tennenholtz
The importance of representation in reinforcement learning has been demonstrated by both theoretical and empirical successes. However, existing representation learning is mainly induced from model-learning aspects and is therefore misaligned with the RL task itself. This work introduces Spectral Bellman Representation, a novel framework derived from the Inherent Bellman Error (IBE) condition, which aligns with the fundamental structure of Bellman updates across a space of possible value functions and is therefore directly geared towards value-based RL. Our key insight is the discovery of a fundamental spectral relationship: under the zero-IBE condition, the transformation of a distribution of value functions by the Bellman operator is intrinsically linked to the feature covariance structure. This spectral connection yields a new, theoretically grounded objective for learning state-action features that inherently capture this Bellman-aligned covariance. Our method requires a simple modification to existing algorithms. We demonstrate that our learned representations enable structured exploration, by aligning feature covariance with Bellman dynamics, and improve overall performance, particularly in challenging hard-exploration and long-horizon credit assignment tasks. Our framework naturally extends to powerful multi-step Bellman operators, further broadening its impact. Spectral Bellman Representation offers a principled and effective path toward learning more powerful and structurally sound representations for value-based reinforcement learning.
Article 1145
Title@2025-07-17 (4): SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks
Title: SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks | SHIELD: Ein sicheres und hochverstärktes integriertes Lernen für robuste Deepfake-Erkennung gegen feindliche Angriffe | SHIELD: 可靠和高度强化的综合学习,以强有力地发现深假,防止反向攻击 2507.13170v1 |
Authors (4): Kutub Uddin, Awais Khan, Muhammad Umar Farooq, Khalid Malik
Audio plays a crucial role in applications like speaker verification, voice-enabled smart devices, and audio conferencing. However, audio manipulations, such as deepfakes, pose significant risks by enabling the spread of misinformation. Our empirical analysis reveals that existing methods for detecting deepfake audio are often vulnerable to anti-forensic (AF) attacks, particularly those mounted using generative adversarial networks. In this article, we propose a novel collaborative learning method called SHIELD to defend against generative AF attacks. To expose AF signatures, we integrate an auxiliary generative model, called the defense (DF) generative model, which facilitates collaborative learning by combining input and output. Furthermore, we design a triplet model to capture correlations for real and AF-attacked audio with real-generated and attacked-generated audio using auxiliary generative models. The proposed SHIELD strengthens the defense against generative AF attacks and achieves robust performance across various generative models. The proposed AF significantly reduces the average detection accuracy from 95.49% to 59.77% for ASVspoof2019, from 99.44% to 38.45% for In-the-Wild, and from 98.41% to 51.18% for HalfTruth for three different generative models. The proposed SHIELD mechanism is robust against AF attacks and achieves an average accuracy of 98.13%, 98.58%, and 99.57% in match, and 98.78%, 98.62%, and 98.85% in mismatch settings for the ASVspoof2019, In-the-Wild, and HalfTruth datasets, respectively.
Article 1146
Title@2025-07-17 (4): Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models
Title: Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models | Orbis: Herausforderungen der Langzeit-Vorhersage bei treibenden Weltmodellen überwinden | Orbis:克服在推动世界模式方面长期预测的挑战 2507.13162v1 |
Authors (5): Arian Mousakhan, Sudhanshu Mittal, Silvio Galesso, Karim Farid, Thomas Brox
Existing world models for autonomous driving struggle with long-horizon generation and generalization to challenging scenarios. In this work, we develop a model using simple design choices, and without additional supervision or sensors, such as maps, depth, or multiple cameras. We show that our model yields state-of-the-art performance, despite having only 469M parameters and being trained on 280h of video data. It particularly stands out in difficult scenarios like turning maneuvers and urban traffic. We test whether discrete token models possibly have advantages over continuous models based on flow matching. To this end, we set up a hybrid tokenizer that is compatible with both approaches and allows for a side-by-side comparison. Our study concludes in favor of the continuous autoregressive model, which is less brittle on individual design choices and more powerful than the model built on discrete tokens. Code, models and qualitative results are publicly available at https://lmb-freiburg.github.io/orbis.github.io/.
Article 1147
Title@2025-07-17 (4): Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
Title: Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities | Inverse Stärkung Lernen trifft auf großes Sprachmodell Post-Training: Grundlagen, Fortschritte und Chancen | 培训后培训:基础、进步和机会 2507.13158v1 |
Authors (2): Hao Sun, Mihaela van der Schaar
In the era of Large Language Models (LLMs), alignment has emerged as a fundamental yet challenging problem in the pursuit of more reliable, controllable, and capable machine intelligence. The recent success of reasoning models and conversational AI systems has underscored the critical role of reinforcement learning (RL) in enhancing these systems, driving increased research interest at the intersection of RL and LLM alignment. This paper provides a comprehensive review of recent advances in LLM alignment through the lens of inverse reinforcement learning (IRL), emphasizing the distinctions between RL techniques employed in LLM alignment and those in conventional RL tasks. In particular, we highlight the necessity of constructing neural reward models from human data and discuss the formal and practical implications of this paradigm shift. We begin by introducing fundamental concepts in RL to provide a foundation for readers unfamiliar with the field. We then examine recent advances in this research agenda, discussing key challenges and opportunities in conducting IRL for LLM alignment. Beyond methodological considerations, we explore practical aspects, including datasets, benchmarks, evaluation metrics, infrastructure, and computationally efficient training and inference techniques. Finally, we draw insights from the literature on sparse-reward RL to identify open questions and potential research directions. By synthesizing findings from diverse studies, we aim to provide a structured and critical overview of the field, highlight unresolved challenges, and outline promising future directions for improving LLM alignment through RL and IRL techniques.
Article 1148
Title@2025-07-17 (4): AI-ming backwards: Vanishing archaeological landscapes in Mesopotamia and automatic detection of sites on CORONA imagery
Title: AI-ming backwards: Vanishing archaeological landscapes in Mesopotamia and automatic detection of sites on CORONA imagery | KI-Ming rückwärts: Auslöschende archäologische Landschaften in Mesopotamien und automatische Erkennung von Stätten auf CORONA-Bildern | AI-Ming倒向:美索不达米亚消失的考古景观和自动探测CORONA图像上的遗址 2507.13420v1 |
Authors (4): Alessandro Pistola, Valentina Orrù, Nicolò Marchetti, Marco Roccetti
By upgrading an existing deep learning model with the knowledge provided by one of the oldest sets of grayscale satellite imagery, known as CORONA, we improved the AI model's aptitude for automatically identifying archaeological sites in an environment which has been completely transformed in the last five decades, including the complete destruction of many of those same sites. The initial Bing-based convolutional network model was retrained using CORONA satellite imagery for the district of Abu Ghraib, west of Baghdad, in the central Mesopotamian floodplain. The results were twofold and surprising. First, the detection precision obtained on the area of interest increased appreciably: in particular, the Intersection over Union (IoU) values, at the image segmentation level, surpassed 85 percent, while the general accuracy in detecting archaeological sites reached 90 percent. Second, our retrained model allowed the identification of four new sites of archaeological interest (confirmed through field verification), previously not identified by archaeologists with traditional techniques. This has confirmed the efficacy of using AI techniques and the CORONA imagery from the 1960s to discover archaeological sites that are no longer visible, a concrete breakthrough with significant consequences for the study of landscapes with vanishing archaeological evidence induced by anthropization.
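For readers unfamiliar with the segmentation metric quoted above, the following is a minimal sketch of Intersection over Union on binary masks. The toy masks and the empty-mask convention are assumptions made for illustration; they are not taken from the paper.

```python
def iou(pred, truth):
    """Intersection over Union of two binary masks given as nested lists of 0/1."""
    inter = union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention (an assumption): two empty masks count as a perfect match.
    return inter / union if union else 1.0

# Toy 3x3 masks: 1 marks a pixel labelled as "site".
pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
truth = [[0, 1, 1],
         [0, 1, 0],
         [0, 1, 0]]
print(iou(pred, truth))  # 3 shared pixels out of 5 marked in either mask
```

An IoU above 0.85, as reported in the abstract, means the predicted and reference site outlines overlap on more than 85 percent of the pixels that either of them marks.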
Article 1149
Title@2025-07-17 (4): NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech
Title: NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech | NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech | 非口头翻译:一个以文本为主的非口头演唱的英文公共单位,带有文字对语音情感说明 2507.13155v1 |
Authors (3): Maksim Borisov, Egor Spirin, Daria Diatlova
Current expressive speech synthesis models are constrained by the limited availability of open-source datasets containing diverse nonverbal vocalizations (NVs). In this work, we introduce NonverbalTTS (NVTTS), a 17-hour open-access dataset annotated with 10 types of NVs (e.g., laughter, coughs) and 8 emotional categories. The dataset is derived from popular sources, VoxCeleb and Expresso, using automated detection followed by human validation. We propose a comprehensive pipeline that integrates automatic speech recognition (ASR), NV tagging, emotion classification, and a fusion algorithm to merge transcriptions from multiple annotators. Fine-tuning open-source text-to-speech (TTS) models on the NVTTS dataset achieves parity with closed-source systems such as CosyVoice2, as measured by both human evaluation and automatic metrics, including speaker similarity and NV fidelity. By releasing NVTTS and its accompanying annotation guidelines, we address a key bottleneck in expressive TTS research. The dataset is available at https://huggingface.co/datasets/deepvk/NonverbalTTS.
Article 1150
Title@2025-07-17 (4): NGTM: Substructure-based Neural Graph Topic Model for Interpretable Graph Generation
Title: NGTM: Substructure-based Neural Graph Topic Model for Interpretable Graph Generation | NGTM: Substrukturbasiertes Neural Graph Topic Model für die interpretierbare Graphengenerierung | NGTM: 以次级结构为基础的可解释图形生成神经图专题模型 2507.13133v1 |
Authors (3): Yuanxin Zhuang, Dazhong Shen, Ying Sun
Graph generation plays a pivotal role across numerous domains, including molecular design and knowledge graph construction. Although existing methods achieve considerable success in generating realistic graphs, their interpretability remains limited, often obscuring the rationale behind structural decisions. To address this challenge, we propose the Neural Graph Topic Model (NGTM), a novel generative framework inspired by topic modeling in natural language processing. NGTM represents graphs as mixtures of latent topics, each defining a distribution over semantically meaningful substructures, which facilitates explicit interpretability at both local and global scales. The generation process transparently integrates these topic distributions with a global structural variable, enabling clear semantic tracing of each generated graph. Experiments demonstrate that NGTM achieves competitive generation quality while uniquely enabling fine-grained control and interpretability, allowing users to tune structural features or induce biological properties through topic-level adjustments.
Article 1151
Title@2025-07-17 (4): PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data
Title: PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data | PINT: Physik-informierte Neuralzeit-Serienmodelle mit Anwendungen zur langfristigen Schlussfolgerung auf WeatherBench 2m-Temperaturdaten | PINT: 应用气象区2m-温度数据长期推断的物理化神经时间序列模型 2502.04018v2 |
Authors (3): Keonvin Park, Jisu Kim, Jaemin Seo
This paper introduces PINT (Physics-Informed Neural Time Series Models), a framework that integrates physical constraints into neural time series models to improve their ability to capture complex dynamics. We apply PINT to the ERA5 WeatherBench dataset, focusing on long-term forecasting of 2m-temperature data. PINT incorporates the Simple Harmonic Oscillator Equation as a physics-informed prior, embedding its periodic dynamics into RNN, LSTM, and GRU architectures. This equation’s analytical solutions (sine and cosine functions) facilitate rigorous evaluation of the benefits of incorporating physics-informed constraints. By benchmarking against a linear regression baseline derived from its exact solutions, we quantify the impact of embedding physical principles in data-driven models. Unlike traditional time series models that rely on future observations, PINT is designed for practical forecasting. Using only the first 90 days of observed data, it iteratively predicts the next two years, addressing challenges posed by limited real-time updates. Experiments on the WeatherBench dataset demonstrate PINT’s ability to generalize, capture periodic trends, and align with physical principles. This study highlights the potential of physics-informed neural models in bridging machine learning and interpretable climate applications. Our models and datasets are publicly available on GitHub: https://github.com/KV-Park.
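The linear regression baseline mentioned above can be sketched as follows. The simple harmonic oscillator x'' + ω²x = 0 has solutions A cos(ωt) + B sin(ωt), so a regression on the features [1, cos(ωt), sin(ωt)] is linear in its coefficients. This is only an illustration under stated assumptions: the annual period, the synthetic `truth` series, and all function names are hypothetical and do not come from the authors' repository.

```python
import math

OMEGA = 2 * math.pi / 365.25  # assumed annual cycle, in radians per day

def features(t):
    # Basis spanned by the solutions of x'' + OMEGA^2 x = 0, plus a constant offset.
    return [1.0, math.cos(OMEGA * t), math.sin(OMEGA * t)]

def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(ts, ys):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [features(t) for t in ts]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)

def predict(w, t):
    return sum(wi * fi for wi, fi in zip(w, features(t)))

def truth(t):
    # Synthetic, noise-free "temperature" in kelvin (hypothetical data).
    return 285.0 + 10.0 * math.cos(OMEGA * t) + 3.0 * math.sin(OMEGA * t)

train_t = list(range(90))  # only the first 90 days are observed
w = fit(train_t, [truth(t) for t in train_t])
print(w, predict(w, 500), truth(500))
```

Because the basis functions are fixed in advance, the fitted curve can be extrapolated far beyond the 90-day training window without feeding in any future observations, which is the forecasting setting the abstract describes.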
Article 1152
Title@2025-07-17 (4): Search for Z/2 eigenfunctions on the sphere using machine learning
Title: Search for Z/2 eigenfunctions on the sphere using machine learning | Suche nach Z/2 Eigenfunktionen auf der Kugel mittels maschinellem Lernen | 使用机器学习在球体上搜索 Z/2 电子元件 2507.13122v1 |
Authors (2): Andriy Haydys, Willem Adriaan Salm
We use machine learning to search for examples of Z/2 eigenfunctions on the 2-sphere. For this we created a multivalued version of a feedforward deep neural network, and we implemented it using the JAX library. We found Z/2 eigenfunctions for three cases: in the first two cases we fixed the branch points at the vertices of a tetrahedron and of a cube, respectively. In the third case, we allowed the AI to move the branch points around and, in the end, it positioned the branch points at the vertices of a squashed tetrahedron.
Article 1153
Title@2025-07-17 (4): RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images
Title: RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images | RS-TinyNet: Stage-wise Feature Fusion Network zur Erkennung winziger Objekte in Bildern der Fernerkundung | RS-TinyNet:在遥感图像中探测小物体的分阶段地貌融合网络 2507.13120v1 |
Authors (3): Xiaozheng Jiang, Wei Zhang, Xuerui Mao
Detecting tiny objects in remote sensing (RS) imagery has been a long-standing challenge due to their extremely limited spatial information, weak feature representations, and dense distributions across complex backgrounds. Despite numerous efforts, mainstream detectors still underperform in such scenarios. To bridge this gap, we introduce RS-TinyNet, a multi-stage feature fusion and enhancement model explicitly tailored for RS tiny object detection in various RS scenarios. RS-TinyNet comes with two novel designs: tiny object saliency modeling and feature integrity reconstruction. Guided by these principles, we design three step-wise feature enhancement modules. Among them, the multi-dimensional collaborative attention (MDCA) module employs multi-dimensional attention to enhance the saliency of tiny objects. Additionally, the auxiliary reversible branch (ARB) and the progressive fusion detection head (PFDH) modules are introduced to preserve information flow and fuse multi-level features to bridge semantic gaps and retain structural detail. Comprehensive experiments on the public RS dataset AI-TOD show that our RS-TinyNet surpasses existing state-of-the-art (SOTA) detectors by 4.0% AP and 6.5% AP75. Evaluations on the DIOR benchmark dataset further validate its superior detection performance in diverse RS scenarios. These results demonstrate that the proposed multi-stage feature fusion strategy offers an effective and practical solution for tiny object detection in complex RS environments.
Article 1154
Title@2025-07-17 (4): Generative AI Models for Learning Flow Maps of Stochastic Dynamical Systems in Bounded Domains
Title: Generative AI Models for Learning Flow Maps of Stochastic Dynamical Systems in Bounded Domains | Generative KI-Modelle zum Lernen von Flusskarten stochastischer dynamischer Systeme in gebundenen Bereichen | 生成 “ AI “ 模块,用于生成 “ 封闭域 “ 内存储动态系统动态系统的学习流程图 “ 模型 2507.15990v1 |
Authors (5): Minglei Yang, Yanfang Liu, Diego del-Castillo-Negrete, Yanzhao Cao, Guannan Zhang
Simulating stochastic differential equations (SDEs) in bounded domains presents significant computational challenges due to particle exit phenomena, which require accurate modeling of interior stochastic dynamics and boundary interactions. Despite the success of machine learning-based methods in learning SDEs, existing learning methods are not applicable to SDEs in bounded domains because they cannot accurately capture the particle exit dynamics. We present a unified hybrid data-driven approach that combines a conditional diffusion model with an exit prediction neural network to capture both interior stochastic dynamics and boundary exit phenomena. Our ML model consists of two major components: a neural network that learns exit probabilities using binary cross-entropy loss with rigorous convergence guarantees, and a training-free diffusion model that generates state transitions for non-exiting particles using closed-form score functions. The two components are integrated through a probabilistic sampling algorithm that determines particle exit at each time step and generates appropriate state transitions. The performance of the proposed approach is demonstrated via three test cases: a one-dimensional simplified problem for theoretical verification, a two-dimensional advection-diffusion problem in a bounded domain, and a three-dimensional problem of interest to magnetically confined fusion plasmas.
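The way the two components interact at each time step can be sketched as a simple sampling loop. Everything below is a hypothetical stand-in: `exit_prob` replaces the learned exit network, `transition` replaces the diffusion-model sampler, and the domain [0, 1] is chosen only for illustration.

```python
import random

def exit_prob(x):
    # Hypothetical stand-in for the learned exit network on the domain [0, 1]:
    # particles near either boundary are more likely to exit.
    return max(0.0, 1.0 - 10.0 * min(x, 1.0 - x))

def transition(x, rng):
    # Hypothetical stand-in for the generative model of non-exiting particles:
    # a small Gaussian step, clipped so that the state stays inside the domain.
    return min(1.0, max(0.0, x + rng.gauss(0.0, 0.05)))

def simulate(x0, n_steps, rng):
    """Return (trajectory, exit_step); exit_step is None if the particle never exits."""
    x, path = x0, [x0]
    for step in range(n_steps):
        if rng.random() < exit_prob(x):  # Bernoulli draw: does the particle exit now?
            return path, step
        x = transition(x, rng)           # otherwise sample the next interior state
        path.append(x)
    return path, None

rng = random.Random(0)
path, exit_step = simulate(0.5, 200, rng)
print(len(path), exit_step)
```

Separating the exit decision from the state transition means the transition sampler only ever has to model particles that remain inside the domain.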
Article 1155
Title@2025-07-17 (4): Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression
Title: Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression | Task-Circuit Quantization: Nutzung von Wissen Lokalisierung und Dolmetschbarkeit für Komprimierung | 任务-环境环境定量:利用知识本地化和压缩解释 2504.07389v2 |
Authors (4): Hanqi Xiao, Yi-Lin Sung, Elias Stengel-Eskin, Mohit Bansal
Post-training quantization (PTQ) reduces a model’s memory footprint by mapping full-precision weights into low-bit weights without costly retraining, but can degrade its downstream performance, especially in low 2- to 3-bit settings. We develop a new mixed-precision PTQ approach, Task-Circuit Quantization (TaCQ), that draws parallels to automated circuit discovery, directly conditioning the quantization process on specific weight circuits – which we define as sets of weights associated with downstream task performance. These weights are kept as 16-bit weights, while others are quantized, maintaining performance while only adding a marginal memory cost. Specifically, TaCQ contrasts unquantized model weights with a uniformly-quantized model to estimate the expected change in weights due to quantization and uses gradient information to predict the resulting impact on task performance, allowing us to preserve task-specific weights. We compare TaCQ-based quantization to existing mixed-precision quantization methods when conditioning both on general-purpose and task-specific data. Across QA, math reasoning, and text-to-SQL tasks for both Llama-3 and Qwen2.5, we find that TaCQ outperforms baselines using the same calibration data and a lower weight budget, achieving major improvements in the 2- and 3-bit regimes. With only 3.1 bits we are able to recover 96% of Llama-3-8B-Instruct’s unquantized 16-bit MMLU performance, obtaining a 5.25% absolute improvement over SPQR. We also observe consistently large gains over existing methods in the 2-bit regime, with an average gain of 14.74% over the strongest baseline, SliM-LLM. Moreover, we observe a 7.20% gain without conditioning on specific tasks, showing TaCQ’s ability to identify important weights is not limited to task-conditioned settings.
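One way to read the description above is as a first-order saliency score: the predicted change in task loss from quantizing weight w is roughly |g · (q(w) − w)|, where g is the gradient of the task loss and q is the uniform quantizer. The sketch below ranks weights by that score and keeps the top few unquantized. It is an assumption-laden illustration, not the paper's exact criterion; the function names, toy weights, and gradients are all hypothetical.

```python
def quantize(w, step):
    # Uniform round-to-nearest quantization onto a grid with the given step.
    return round(w / step) * step

def mixed_precision(weights, grads, step, keep):
    """Quantize all weights except the `keep` entries with the largest
    first-order predicted loss change |g * (q(w) - w)|."""
    quant = [quantize(w, step) for w in weights]
    score = [abs(g * (q - w)) for w, g, q in zip(weights, grads, quant)]
    order = sorted(range(len(weights)), key=lambda i: -score[i])
    protected = set(order[:keep])
    mixed = [w if i in protected else q
             for i, (w, q) in enumerate(zip(weights, quant))]
    return mixed, protected

weights = [0.26, -0.74, 0.49, 0.10]  # toy full-precision weights
grads = [2.00, 0.10, 0.05, 1.00]     # toy task-loss gradients
mixed, protected = mixed_precision(weights, grads, step=0.5, keep=1)
print(mixed, protected)
```

In this toy case the first weight is protected because it combines a large quantization error with a large gradient, while the second has a similar error but a gradient small enough that quantizing it is predicted to matter little.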
Article 1156
Title@2025-07-17 (4): Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction
Title: Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction | Deep Learning-based Fetal Lung Segmentation aus diffusionsgewichteten MRT-Bildern und Lungenreife-Evaluierung für fetale Wachstumsbeschränkung | 从传播加权磁RI图像和对胎儿生长限制的肺期评估中分离出的深学习-基于学习的胎儿肺部切片 2507.13106v1 |
Authors (10): Zhennan Xiao, Katharine Brudkiewicz, Zhen Yuan, Rosalind Aughwane, Magdalena Sokolska, Joanna Chappell, Trevor Gaunt, Anna L. David, Andrew P. King, Andrew Melbourne
Fetal lung maturity is a critical indicator for predicting neonatal outcomes and the need for post-natal intervention, especially for pregnancies affected by fetal growth restriction. Intra-voxel incoherent motion analysis has shown promising results for non-invasive assessment of fetal lung development, but its reliance on manual segmentation is time-consuming, thus limiting its clinical applicability. In this work, we present an automated lung maturity evaluation pipeline for diffusion-weighted magnetic resonance images that consists of a deep learning-based fetal lung segmentation model and a model-fitting lung maturity assessment. A 3D nnU-Net model was trained on manually segmented images selected from the baseline frames of 4D diffusion-weighted MRI scans. The segmentation model demonstrated robust performance, yielding a mean Dice coefficient of 82.14%. Next, voxel-wise model fitting was performed based on both the nnU-Net-predicted and manual lung segmentations to quantify IVIM parameters reflecting tissue microstructure and perfusion. The results suggested no differences between the two. Our work shows that a fully automated pipeline is possible for supporting fetal lung maturity assessment and clinical decision-making.
Article 1157
Title@2025-07-17 (4): SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts
Title: SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts | SemCSE: Semantische kontrastive Satzeinbettungen mit LLM-generierten Zusammenfassungen für wissenschaftliche Abstracts | SEMCSE: 使用LLM创制的科学摘要摘要 2507.13105v1 |
Authors (2): Marc Brinner, Sina Zarriess
We introduce SemCSE, an unsupervised method for learning semantic embeddings of scientific texts. Building on recent advances in contrastive learning for text embeddings, our approach leverages LLM-generated summaries of scientific abstracts to train a model that positions semantically related summaries closer together in the embedding space. This resulting objective ensures that the model captures the true semantic content of a text, in contrast to traditional citation-based approaches that do not necessarily reflect semantic similarity. To validate this, we propose a novel benchmark designed to assess a model’s ability to understand and encode the semantic content of scientific texts, demonstrating that our method enforces a stronger semantic separation within the embedding space. Additionally, we evaluate SemCSE on the comprehensive SciRepEval benchmark for scientific text embeddings, where it achieves state-of-the-art performance among models of its size, thus highlighting the benefits of a semantically focused training approach.
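A contrastive objective of the kind described above is often written as an InfoNCE loss, in which each embedding must pick out its own paired embedding from the rest of the batch. The sketch below assumes that formulation, cosine similarity, and a temperature of 0.1; the abstract does not confirm these choices, and the two-dimensional toy embeddings are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.1):
    """Mean cross-entropy of picking each anchor's own positive among all positives."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        log_norm = math.log(sum(math.exp(l) for l in logits))
        total += log_norm - logits[i]
    return total / len(anchors)

# Toy embeddings of two summaries each for two different abstracts.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
aligned = info_nce(anchors, positives)
shuffled = info_nce(anchors, positives[::-1])  # pairs deliberately mismatched
print(aligned, shuffled)
```

The loss is low when summaries of the same abstract sit close together and high when the pairing is scrambled, which is the pressure that pulls semantically related texts together in the embedding space.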
Article 1158
Title@2025-07-17 (4): Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models
Title: Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models | Unified Triplet-Level Halluzination Evaluation für große Vision-Sprache Modelle | 大型视觉语言模型统一三维级幻觉评价 2410.23114v4 |
Authors (4): Junjie Wu, Tsz Ting Chung, Kai Chen, Dit-Yan Yeung
Despite the outstanding performance in vision-language reasoning, Large Vision-Language Models (LVLMs) might generate hallucinated contents that do not exist in the given image. Most existing LVLM hallucination benchmarks are constrained to evaluate the object-related hallucinations. However, the potential hallucination on the relations between two objects, i.e., relation hallucination, still lacks investigation. To remedy that, we design a unified framework to measure the object and relation hallucination in LVLMs simultaneously. The core idea of our framework is to evaluate hallucinations via (object, relation, object) triplets extracted from LVLMs’ responses, making it easily generalizable to different vision-language tasks. Based on our framework, we further introduce Tri-HE, a novel Triplet-level Hallucination Evaluation benchmark which can be used to study both object and relation hallucination at the same time. With comprehensive evaluations on Tri-HE, we observe that the relation hallucination issue is even more serious than object hallucination among existing LVLMs, highlighting a previously neglected problem towards reliable LVLMs. Moreover, based on our findings, we design a simple training-free approach that effectively mitigates hallucinations for LVLMs. Our dataset and code for the reproduction of our experiments are available publicly at https://github.com/wujunjie1998/Tri-HE.
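The triplet-level idea can be illustrated with a deliberately simplified scorer that counts how many extracted (object, relation, object) triplets are unsupported by a reference scene description. Exact set membership is an assumption made here for brevity, and the actual benchmark may judge support differently; the triplets are invented examples.

```python
def hallucination_rate(predicted, reference):
    """Fraction of predicted (subject, relation, object) triplets absent from the reference set."""
    if not predicted:
        return 0.0
    unsupported = [t for t in predicted if t not in reference]
    return len(unsupported) / len(predicted)

# Hypothetical ground-truth triplets for one image.
reference = {("cat", "on", "sofa"), ("lamp", "next to", "sofa")}
# Hypothetical triplets extracted from a model response.
predicted = [
    ("cat", "on", "sofa"),     # supported
    ("cat", "under", "lamp"),  # relation hallucination: both objects exist
    ("dog", "on", "sofa"),     # object hallucination: no dog in the image
]
print(hallucination_rate(predicted, reference))
```

Scoring at the triplet level lets a single pass distinguish a wrong relation between real objects from an invented object, which object-only benchmarks cannot do.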
Article 1159
Title@2025-07-17 (4): Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction
Title: Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction | Uni-Instruct: Einstufiges Diffusionsmodell durch Unified Diffusion Divergence Instruction | Uni- Instruct: 通过统一扩散分散指令单步扩散模型 2505.20755v2 |
Authors (6): Yifei Wang, Weimin Bai, Colin Zhang, Debing Zhang, Weijian Luo, He Sun
In this paper, we unify more than 10 existing one-step diffusion distillation approaches, such as Diff-Instruct, DMD, SIM, SiD, and $f$-distill, inside a theory-driven framework which we name Uni-Instruct. Uni-Instruct is motivated by our proposed diffusion expansion theory of the $f$-divergence family. Then we introduce key theories that overcome the intractability issue of the original expanded $f$-divergence, resulting in an equivalent yet tractable loss that effectively trains one-step diffusion models by minimizing the expanded $f$-divergence family. The novel unification introduced by Uni-Instruct not only offers new theoretical contributions that help understand existing approaches from a high-level perspective but also leads to state-of-the-art one-step diffusion generation performance. On the CIFAR10 generation benchmark, Uni-Instruct achieves record-breaking Fréchet Inception Distance (FID) values of 1.46 for unconditional generation and 1.38 for conditional generation. On the ImageNet-$64\times 64$ generation benchmark, Uni-Instruct achieves a new SoTA one-step generation FID of 1.02, which outperforms its 79-step teacher diffusion with a significant improvement margin of 1.33 (1.02 vs 2.35). We also apply Uni-Instruct to broader tasks like text-to-3D generation. For text-to-3D generation, Uni-Instruct gives decent results, which slightly outperform previous methods, such as SDS and VSD, in terms of both generation quality and diversity. Both the solid theoretical and empirical contributions of Uni-Instruct will potentially help future studies on one-step diffusion distillation and knowledge transfer of diffusion models.
Article 1160
Title@2025-07-17 (4): Unsupervised Ground Metric Learning
Title: Unsupervised Ground Metric Learning | Unüberwachtes metrisches Lernen am Boden | 不受监督的地面计量学习 2507.13094v1 |
Authors (4): Janis Auffenberg, Jonas Bresch, Oleh Melnyk, Gabriele Steidl
Data classification without access to labeled samples remains a challenging problem. It usually depends on an appropriately chosen distance between features, a topic addressed in metric learning. Recently, Huizing, Cantini and Peyré proposed to simultaneously learn optimal transport (OT) cost matrices between samples and features of the dataset. This leads to the task of finding positive eigenvectors of a certain nonlinear function that maps cost matrices to OT distances. Having this basic idea in mind, we consider both the algorithmic and the modeling part of unsupervised metric learning. First, we examine appropriate algorithms and their convergence. In particular, we propose to use the stochastic random function iteration algorithm and prove that it converges linearly for our setting, although our operators are not paracontractive, as was required for convergence so far. Second, we ask the natural question whether the OT distance can be replaced by other distances. We show how Mahalanobis-like distances fit into our considerations. Further, we examine an approach via graph Laplacians. In contrast to the previous settings, here we only have to deal with functions that are linear in the sought matrices, so that simple algorithms from linear algebra can be applied.
Article 1161
Title@2025-07-17 (4): Truthful Elicitation of Imprecise Forecasts
Title: Truthful Elicitation of Imprecise Forecasts | Wahre Botschaft von ungenauen Prognosen | 以真真真真真真真真真切的易感简易预报 2503.16395v4 |
Authors (3): Anurag Singh, Siu Lun Chau, Krikamol Muandet
The quality of probabilistic forecasts is crucial for decision-making under uncertainty. While proper scoring rules incentivize truthful reporting of precise forecasts, they fall short when forecasters face epistemic uncertainty about their beliefs, limiting their use in safety-critical domains where decision-makers (DMs) prioritize proper uncertainty management. To address this, we propose a framework for scoring imprecise forecasts – forecasts given as a set of beliefs. Despite existing impossibility results for deterministic scoring rules, we enable truthful elicitation by drawing a connection to social choice theory and introducing a two-way communication framework where DMs first share the aggregation rules (e.g., averaging or min-max) that they use in downstream decisions for resolving forecast ambiguity. This, in turn, helps forecasters resolve indecision during elicitation. We further show that truthful elicitation of imprecise forecasts is achievable using proper scoring rules randomized over the aggregation procedure. Our approach allows DMs to elicit and integrate the forecaster’s epistemic uncertainty into their decision-making process, thus improving credibility.
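As background for the precise-forecast case that the abstract starts from, the sketch below checks numerically that the Brier score is proper: a forecaster who believes an event has probability p minimizes the expected penalty by reporting exactly p. It does not implement the paper's randomized mechanism for imprecise forecasts; the belief value and the report grid are illustrative assumptions.

```python
def expected_brier(report, belief):
    """Expected Brier penalty of reporting `report` when the event
    occurs with probability `belief` (lower is better)."""
    return belief * (1.0 - report) ** 2 + (1.0 - belief) * report ** 2

belief = 0.3                                 # forecaster's true (precise) belief
candidates = [i / 100 for i in range(101)]   # reports on a 0.01 grid
best = min(candidates, key=lambda r: expected_brier(r, belief))
print(best)
```

When the forecaster instead holds a set of beliefs, no single report minimizes the expected penalty for every belief in the set at once, which is the gap the proposed framework addresses.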
Article 1162
Title@2025-07-17 (4): Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces
Title: Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces | Ungewissheitsbewusste Cross-Modal Knowledge Destillation mit Prototypenlernen für multimodale Gehirn-Computer-Schnittstellen | 与多式脑-计算机界面的原型学习相结合的不确定-软件软件的跨模式知识蒸馏 2507.13092v1 |
Authors (3): Hyo-Jeong Jang, Hye-Bin Shin, Seong-Whan Lee
Electroencephalography (EEG) is a fundamental modality for cognitive state monitoring in brain-computer interfaces (BCIs). However, it is highly susceptible to intrinsic signal errors and human-induced labeling errors, which lead to label noise and ultimately degrade model performance. To enhance EEG learning, multimodal knowledge distillation (KD) has been explored to transfer knowledge from visual models with rich representations to EEG-based models. Nevertheless, KD faces two key challenges: modality gap and soft label misalignment. The former arises from the heterogeneous nature of EEG and visual feature spaces, while the latter stems from label inconsistencies that create discrepancies between ground truth labels and distillation targets. This paper addresses semantic uncertainty caused by ambiguous features and weakly defined labels. We propose a novel cross-modal knowledge distillation framework that mitigates both modality and label inconsistencies. It aligns feature semantics through a prototype-based similarity module and introduces a task-specific distillation head to resolve label-induced inconsistency in supervision. Experimental results demonstrate that our approach improves EEG-based emotion regression and classification performance, outperforming both unimodal and multimodal baselines on a public multimodal dataset. These findings highlight the potential of our framework for BCI applications.
Article 1163
Title@2025-07-17 (4): Super Resolution for Renewable Energy Resource Data With Wind From Reanalysis Data and Application to Ukraine
Title: Super Resolution for Renewable Energy Resource Data With Wind From Reanalysis Data and Application to Ukraine | Super Auflösung für erneuerbare Energien Ressourcendaten mit Wind Von der Reanalyse Daten und Anwendung in die Ukraine | 乌克兰可再生能源资源数据利用风向再分析数据和应用于乌克兰的超级分辨率 2407.19086v2 |
Authors (7): Brandon N. Benton, Grant Buster, Pavlo Pinchuk, Andrew Glaws, Ryan N. King, Galen Maclaurin, Ilya Chernyakhovskiy
With a potentially increasing share of the electricity grid relying on wind to provide generating capacity and energy, there is an expanding global need for historically accurate, spatiotemporally continuous, high-resolution wind data. Conventional downscaling methods for generating these data based on numerical weather prediction have a high computational burden and require extensive tuning for historical accuracy. In this work, we present a novel deep learning-based spatiotemporal downscaling method using generative adversarial networks (GANs) for generating historically accurate high-resolution wind resource data from the European Centre for Medium-Range Weather Forecasts Reanalysis version 5 (ERA5) data. In contrast to previous approaches, which used coarsened high-resolution data as low-resolution training data, we use true low-resolution simulation outputs. We show that by training a GAN model with ERA5 as the low-resolution input and Wind Integration National Dataset Toolkit (WTK) data as the high-resolution target, we achieve results comparable in historical accuracy and spatiotemporal variability to conventional dynamical downscaling. This GAN-based downscaling method additionally reduces computational costs over dynamical downscaling by two orders of magnitude. We applied this approach to downscale 30 km, hourly ERA5 data to 2 km, 5 min wind data for January 2000 through December 2023 at multiple hub heights over Ukraine, Moldova, and part of Romania. This 24-year data record is the first member of the Super-Resolution for Renewable Energy Resource Data with Wind from Reanalysis Data (Sup3rWind) dataset.
Article 1164
Title@2025-07-17 (4): Soft-ECM: An extension of Evidential C-Means for complex data
Title: Soft-ECM: An extension of Evidential C-Means for complex data | Soft-ECM: Erweiterung von Evidential C-Means für komplexe Daten | 软体-电子内容软体:复杂数据的证据性C-方法的扩展 2507.13417v1 |
Authors (3): Armel Soubeiga, Thomas Guyet, Violaine Antoine
Clustering based on belief functions has been gaining increasing attention in the machine learning community due to its ability to effectively represent uncertainty and/or imprecision. However, none of the existing algorithms can be applied to complex data, such as mixed data (numerical and categorical) or non-tabular data like time series. Indeed, these types of data are, in general, not represented in a Euclidean space and the aforementioned algorithms make use of the properties of such spaces, in particular for the construction of barycenters. In this paper, we reformulate the Evidential C-Means (ECM) problem for clustering complex data. We propose a new algorithm, Soft-ECM, which consistently positions the centroids of imprecise clusters requiring only a semi-metric. Our experiments show that Soft-ECM yields results comparable to conventional fuzzy clustering approaches on numerical data, and we demonstrate its ability to handle mixed data and its benefits when combining fuzzy clustering with semi-metrics such as DTW for time series data.
Article 1165
Title@2025-07-17 (4): MUPAX: Multidimensional Problem Agnostic eXplainable AI
Title: MUPAX: Multidimensional Problem Agnostic eXplainable AI | MUPAX: Multidimensionales Problem Agnostic eXplainable KI | MUPAX: 多元问题Agnistic EXlable AI 2507.13090v1 |
Authors (4): Vincenzo Dentamaro, Felice Franchini, Giuseppe Pirlo, Irina Voiculescu
Robust XAI techniques should ideally be simultaneously deterministic, model-agnostic, and guaranteed to converge. We propose Multidimensional Problem Agnostic eXplainable AI (MUPAX), a deterministic, model-agnostic explainability technique with guaranteed convergence. MUPAX's measure-theoretic formulation gives principled feature importance attribution through structured perturbation analysis that discovers inherent input patterns and eliminates spurious relationships. We evaluate MUPAX on an extensive range of data modalities and tasks: audio classification (1D), image classification (2D), volumetric medical image analysis (3D), and anatomical landmark detection, demonstrating dimension-agnostic effectiveness. The rigorous convergence guarantees extend to any loss function and arbitrary dimensions, making MUPAX applicable to virtually any problem context for AI. In contrast to other XAI methods, which typically decrease performance when masking, MUPAX not only preserves but actually enhances model accuracy by capturing only the most important patterns of the original data. Extensive benchmarking against the state of the art in XAI demonstrates MUPAX's ability to generate precise, consistent and understandable explanations, a crucial step towards explainable and trustworthy AI systems. The source code will be released upon publication.
Article 1166
Title@2025-07-17 (4): DASViT: Differentiable Architecture Search for Vision Transformer
Title: DASViT: Differentiable Architecture Search for Vision Transformer | DASViT: Unterschiedliche Architektur Suche nach Vision Transformer | DASVVT:不同建筑搜索视野变异器 2507.13079v1 |
Authors (3): Pengjin Wu, Ferrante Neri, Zhenhua Feng
Designing effective neural networks is a cornerstone of deep learning, and Neural Architecture Search (NAS) has emerged as a powerful tool for automating this process. Among the existing NAS approaches, Differentiable Architecture Search (DARTS) has gained prominence for its efficiency and ease of use, inspiring numerous advancements. Since the rise of Vision Transformers (ViT), researchers have applied NAS to explore ViT architectures, often focusing on macro-level search spaces and relying on discrete methods like evolutionary algorithms. While these methods ensure reliability, they face challenges in discovering innovative architectural designs, demand extensive computational resources, and are time-intensive. To address these limitations, we introduce Differentiable Architecture Search for Vision Transformer (DASViT), which bridges the gap in differentiable search for ViTs and uncovers novel designs. Experiments show that DASViT delivers architectures that break traditional Transformer encoder designs, outperform ViT-B/16 on multiple datasets, and achieve superior efficiency with fewer parameters and FLOPs.
Article 1167
Title@2025-07-17 (4): On the Effectiveness of the z-Transform Method in Quadratic Optimization
Title: On the Effectiveness of the z-Transform Method in Quadratic Optimization | Über die Wirksamkeit der z-Transform Methode in der quadratischen Optimierung | 关于四压压优化中z变形方法有效性问题 2507.03404v2 |
Authors (1): Francis Bach
The z-transform of a sequence is a classical tool used within signal processing, control theory, computer science, and electrical engineering. It allows for studying sequences from their generating functions, with many operations that can be equivalently defined on the original sequence and its $z$-transform. In particular, the z-transform method focuses on asymptotic behaviors and allows the use of Taylor expansions. We present a sequence of results of increasing significance and difficulty for linear models and optimization algorithms, demonstrating the effectiveness and versatility of the z-transform method in deriving new asymptotic results. Starting from the simplest gradient descent iterations in an infinite-dimensional Hilbert space, we show how the spectral dimension characterizes the convergence behavior. We then extend the analysis to Nesterov acceleration, averaging techniques, and stochastic gradient descent.
Article 1168
Title@2025-07-17 (4): Single- to multi-fidelity history-dependent learning with uncertainty quantification and disentanglement: application to data-driven constitutive modeling
Title: Single- to multi-fidelity history-dependent learning with uncertainty quantification and disentanglement: application to data-driven constitutive modeling | Single- to Multi-Fidelity history-dependent Learning mit Unsicherheitsquantifizierung und Disentanglementierung: Anwendung auf datengesteuerte konstitutive Modellierung | 具有不确定性的量化和分解:适用于数据驱动的构成型建模 2507.13416v1 |
Authors (3): Jiaxiang Yi, Bernardo P. Ferreira, Miguel A. Bessa
Data-driven learning is generalized to consider history-dependent multi-fidelity data, while quantifying epistemic uncertainty and disentangling it from data noise (aleatoric uncertainty). This generalization is hierarchical and adapts to different learning scenarios: from training the simplest single-fidelity deterministic neural networks up to the proposed multi-fidelity variance estimation Bayesian recurrent neural networks. The versatility and generality of the proposed methodology are demonstrated by applying it to different data-driven constitutive modeling scenarios that include multiple fidelities with and without aleatoric uncertainty (noise). The method accurately predicts the response and quantifies model error while also discovering the noise distribution (when present). This opens opportunities for future real-world applications in diverse scientific and engineering domains; especially, the most challenging cases involving design and analysis under uncertainty.
Article 1169
Title@2025-07-17 (4): MedPix 2.0: A Comprehensive Multimodal Biomedical Data set for Advanced AI Applications with Retrieval Augmented Generation and Knowledge Graphs
Title: MedPix 2.0: A Comprehensive Multimodal Biomedical Data set for Advanced AI Applications with Retrieval Augmented Generation and Knowledge Graphs | MedPix 2.0: Umfassender multimodaler biomedizinischer Datensatz für fortgeschrittene KI-Anwendungen mit retrieval Augmented Generation und Wissensgraphen | MedPix 2.0:一套综合多式生物医学数据集,用于高级AI应用,并附有回收增加的生成和知识图 2407.02994v5 |
Authors (5): Irene Siragusa, Salvatore Contino, Massimo La Ciura, Rosario Alicata, Roberto Pirrone
The increasing interest in developing Artificial Intelligence applications in the medical domain suffers from the lack of high-quality data sets, mainly due to privacy-related issues. In addition, the recent rise of Vision Language Models (VLMs) creates a need for multimodal medical data sets, where clinical reports and findings are attached to the corresponding medical scans. This paper illustrates the entire workflow for building the MedPix 2.0 data set. Starting with the well-known multimodal data set MedPix®, mainly used by physicians, nurses, and healthcare students for Continuing Medical Education purposes, a semi-automatic pipeline was developed to extract visual and textual data, followed by a manual curation procedure in which noisy samples were removed, thus creating a MongoDB database. Along with the data set, we developed a Graphical User Interface aimed at efficiently navigating the MongoDB instance and obtaining the raw data that can be easily used for training and/or fine-tuning VLMs. To reinforce this point, we first recall DR-Minerva, a Retrieval Augmented Generation-based VLM trained on MedPix 2.0. DR-Minerva predicts the body part and the modality used to scan its input image. We also propose an extension of DR-Minerva with a Knowledge Graph that uses Llama 3.1 Instruct 8B and leverages MedPix 2.0. The resulting architecture can be queried in an end-to-end manner, as a medical decision support system. MedPix 2.0 is available on GitHub.
Article 1170
Title@2025-07-17 (4): On statistical learning of graphs
Title: On statistical learning of graphs | Statistisches Erlernen von Schaubildern | 关于统计学图表 2507.13054v1 |
Authors (4): Vittorio Cipriani, Valentino Delle Rose, Luca San Mauro, Giovanni Solda
We study PAC and online learnability of hypothesis classes formed by copies of a countably infinite graph G, where each copy is induced by permuting G’s vertices. This corresponds to learning a graph’s labeling, knowing its structure and label set. We consider classes where permutations move only finitely many vertices. Our main result shows that PAC learnability of all such finite-support copies implies online learnability of the full isomorphism type of G, and is equivalent to the condition of automorphic triviality. We also characterize graphs where copies induced by swapping two vertices are not learnable, using a relaxation of the extension property of the infinite random graph. Finally, we show that, for all G and k>2, learnability for k-vertex permutations is equivalent to that for 2-vertex permutations, yielding a four-class partition of infinite graphs, whose complexity we also determine using tools coming from both descriptive set theory and computability theory.
Article 1171
Title@2025-07-17 (4): Mining Voter Behaviour and Confidence: A Rule-Based Analysis of the 2022 U.S. Elections
Title: Mining Voter Behaviour and Confidence: A Rule-Based Analysis of the 2022 U.S. Elections | Mining Voter Behaviour and Confidence: Eine regelbasierte Analyse der Wahlen 2022 in den USA | 采矿选民行为和信任:对2022年美国选举的基于规则的分析 2507.14236v1 |
Authors (3): Md Al Jubair, Mohammad Shamsul Arefin, Ahmed Wasif Reza
This study explores the relationship between voter trust and their experiences during elections by applying a rule-based data mining technique to the 2022 Survey of the Performance of American Elections (SPAE). Using the Apriori algorithm and setting parameters to capture meaningful associations (support >= 3%, confidence >= 60%, and lift > 1.5), the analysis revealed a strong connection between demographic attributes and voting-related challenges, such as registration hurdles, accessibility issues, and queue times. For instance, respondents who indicated that accessing polling stations was “very easy” and who reported moderate confidence were found to be over six times more likely (lift = 6.12) to trust their county’s election outcome and experience no registration issues. A further analysis, which adjusted the support threshold to 2%, specifically examined patterns among minority voters. It revealed that 98.16 percent of Black voters who reported easy access to polling locations also had smooth registration experiences. Additionally, those who had high confidence in the vote-counting process were almost two times as likely to identify as Democratic Party supporters. These findings point to the important role that enhancing voting access and offering targeted support can play in building trust in the electoral system, particularly among marginalized communities.
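The support, confidence, and lift thresholds quoted in this abstract can be made concrete with a small sketch. The Python below computes the three metrics for a single association rule over a toy list of transactions. The item names and the data are invented for illustration and are not taken from the SPAE survey; the function name `rule_metrics` is likewise hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Count transactions containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_c = sum(1 for t in transactions if consequent <= set(t))
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_ac / n                           # P(A and C)
    confidence = n_ac / n_a if n_a else 0.0      # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0  # P(C | A) / P(C)
    return support, confidence, lift


# Hypothetical toy responses, one set of attributes per respondent.
transactions = [
    {"easy_access", "trusts_outcome"},
    {"easy_access", "trusts_outcome"},
    {"easy_access"},
    {"long_queue"},
    {"long_queue", "trusts_outcome"},
]
support, confidence, lift = rule_metrics(
    transactions, {"easy_access"}, {"trusts_outcome"}
)
```

A lift above 1 means the consequent is more frequent among transactions that contain the antecedent than in the data overall, which is why the study filters on lift > 1.5 in addition to support and confidence.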
Article 1172
Title@2025-07-17 (4): Gauge Flow Models
Title: Gauge Flow Models | Modelle für den Messfluss | Gage 流程模型 2507.13414v1 |
Authors (2): Alexander Strunk, Roland Assam
This paper introduces Gauge Flow Models, a novel class of Generative Flow Models. These models incorporate a learnable Gauge Field within the Flow Ordinary Differential Equation (ODE). A comprehensive mathematical framework for these models, detailing their construction and properties, is provided. Experiments using Flow Matching on Gaussian Mixture Models demonstrate that Gauge Flow Models yield significantly better performance than traditional Flow Models of comparable or even larger size. Additionally, unpublished research indicates a potential for enhanced performance across a broader range of generative tasks.
Article 1173
Title@2025-07-17 (4): Uncertainty quantification for White Matter Hyperintensity segmentation detects silent failures and improves automated Fazekas quantification
Title: Uncertainty quantification for White Matter Hyperintensity segmentation detects silent failures and improves automated Fazekas quantification | Unsicherheits-Quantifizierung für White Matter Hyperintensitätssegmentierung erkennt leise Ausfälle und verbessert die automatisierte Fazekas-Quantifizierung | 白色物质超密度分离的不确定性量化,可检测静态故障,改进自动Fazekas量化 2411.17571v2 |
Authors (11): Ben Philps, Maria del C. Valdes Hernandez, Chen Qin, Una Clancy, Eleni Sakka, Susana Munoz Maniega, Mark E. Bastin, Angela C. C. Jochems, Joanna M. Wardlaw, Miguel O. Bernabeu, Alzheimers Disease Neuroimaging Initiative
White Matter Hyperintensities (WMH) are key neuroradiological markers of small vessel disease present in brain MRI. Assessment of WMH is important in research and clinics. However, WMH are challenging to segment due to their high variability in shape, location, and size, their poorly defined borders, and their similar intensity profile to other pathologies (e.g., stroke lesions) and artefacts (e.g., head motion). In this work, we assess the utility and semantic properties of the most effective techniques for uncertainty quantification (UQ) in segmentation for the WMH segmentation task across multiple test-time data distributions. We find UQ techniques reduce ‘silent failure’ by identifying in UQ maps small WMH clusters in the deep white matter that are unsegmented by the model. A combination of Stochastic Segmentation Networks with Deep Ensembles also yields the highest Dice and lowest Absolute Volume Difference % (AVD) score and can highlight areas where there is ambiguity between WMH and stroke lesions. We further demonstrate the downstream utility of UQ, proposing a novel method for classification of the clinical Fazekas score using spatial features extracted from voxelwise WMH probability and UQ maps. We show that incorporating WMH uncertainty information improves Fazekas classification performance and calibration. Our model with (UQ and spatial WMH features)/(spatial WMH features)/(WMH volume only) achieves a balanced accuracy score of 0.74/0.67/0.62 and a root Brier score of 0.65/0.72/0.74 in the Deep WMH region, and a balanced accuracy of 0.74/0.73/0.71 and a root Brier score of 0.64/0.66/0.68 in the Periventricular region. We further demonstrate that stochastic UQ techniques with high sample diversity can improve the detection of poor quality segmentations.
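For readers unfamiliar with the two segmentation metrics named here, the following is a minimal sketch of how Dice and Absolute Volume Difference (in percent) are commonly computed on flattened binary masks. It uses the usual textbook definitions and toy masks, and is not taken from the paper's evaluation code; the function names are hypothetical.

```python
def dice_score(pred, truth):
    """Dice overlap between two flattened binary masks (sequences of 0/1)."""
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2 * intersection / total


def absolute_volume_difference_pct(pred, truth):
    """Absolute difference in segmented volume, as a percentage of the true volume."""
    v_pred, v_true = sum(pred), sum(truth)
    if v_true == 0:
        raise ValueError("AVD is undefined when the reference mask is empty")
    return abs(v_pred - v_true) / v_true * 100


# Toy masks: 1 marks a voxel labelled as WMH.
pred = [1, 1, 1, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
dice = dice_score(pred, truth)
avd = absolute_volume_difference_pct(pred, truth)
```

Higher Dice and lower AVD are better. AVD compares total volumes only, so a segmentation can score well on AVD while missing small clusters, which is one reason the abstract pairs these metrics with uncertainty maps.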
Article 1174
Title@2025-07-17 (4): The Power of Architecture: Deep Dive into Transformer Architectures for Long-Term Time Series Forecasting
Title: The Power of Architecture: Deep Dive into Transformer Architectures for Long-Term Time Series Forecasting | Die Kraft der Architektur: Tiefgehen in Transformer-Architekturen für langfristige Zeitreihen | 建筑力量:为长期时间序列预测而向变形结构深度下潜 2507.13043v1 |
Authors (8): Lefei Shen, Mouxiang Chen, Han Fu, Xiaoxue Ren, Xiaoyun Joy Wang, Jianling Sun, Zhuo Li, Chenghao Liu
Transformer-based models have recently become dominant in Long-term Time Series Forecasting (LTSF), yet the variations in their architecture, such as encoder-only, encoder-decoder, and decoder-only designs, raise a crucial question: What Transformer architecture works best for LTSF tasks? However, existing models are often tightly coupled with various time-series-specific designs, making it difficult to isolate the impact of the architecture itself. To address this, we propose a novel taxonomy that disentangles these designs, enabling clearer and more unified comparisons of Transformer architectures. Our taxonomy considers key aspects such as attention mechanisms, forecasting aggregations, forecasting paradigms, and normalization layers. Through extensive experiments, we uncover several key insights: bi-directional attention with joint-attention is most effective; more complete forecasting aggregation improves performance; and the direct-mapping paradigm outperforms autoregressive approaches. Furthermore, our combined model, utilizing optimal architectural choices, consistently outperforms several existing models, reinforcing the validity of our conclusions. We hope these findings offer valuable guidance for future research on Transformer architectural designs in LTSF. Our code is available at https://github.com/HALF111/TSF_architecture.
Article 1175
Title@2025-07-17 (4): Confidence-Filtered Relevance (CFR): An Interpretable and Uncertainty-Aware Machine Learning Framework for Naturalness Assessment in Satellite Imagery
Title: Confidence-Filtered Relevance (CFR): An Interpretable and Uncertainty-Aware Machine Learning Framework for Naturalness Assessment in Satellite Imagery | Confidence-Filtered Relevance (CFR): Ein interpretierbares und unsicheres Machine Learning Framework für die Bewertung von Natürlichkeit in Satellitenbildern | 信任改变的相关性:卫星图像中自然评估的 解释性和不确定性和不确定性-智能学习框架 2507.13034v1 |
Authors (2): Ahmed Emam, Ribana Roscher
Protected natural areas play a vital role in ecological balance and ecosystem services. Monitoring these regions at scale using satellite imagery and machine learning is promising, but current methods often lack interpretability and uncertainty-awareness, and do not address how uncertainty affects naturalness assessment. In contrast, we propose Confidence-Filtered Relevance (CFR), a data-centric framework that combines LRP Attention Rollout with Deep Deterministic Uncertainty (DDU) estimation to analyze how model uncertainty influences the interpretability of relevance heatmaps. CFR partitions the dataset into subsets based on uncertainty thresholds, enabling systematic analysis of how uncertainty shapes the explanations of naturalness in satellite imagery. Applied to the AnthroProtect dataset, CFR assigned higher relevance to shrublands, forests, and wetlands, aligning with other research on naturalness assessment. Moreover, our analysis shows that as uncertainty increases, the interpretability of these relevance heatmaps declines and their entropy grows, indicating less selective and more ambiguous attributions. CFR provides a data-centric approach to assess the relevance of patterns to naturalness in satellite imagery based on their associated certainty.
Article 1176
Title@2025-07-17 (4): (Exhaustive) Symbolic Regression and model selection by minimum description length
Title: (Exhaustive) Symbolic Regression and model selection by minimum description length | (Erschöpfend) Symbolische Regression und Modellauswahl nach minimaler Beschreibungslänge | 按最低描述长度分列的符号回归和模型选择 2507.13033v1 |
Authors (1): Harry Desmond
Symbolic regression is the machine learning method for learning functions from data. After a brief overview of the symbolic regression landscape, I will describe the two main challenges that traditional algorithms face: they have an unknown (and likely significant) probability of failing to find any given good function, and they suffer from ambiguity and poorly-justified assumptions in their function-selection procedure. To address these I propose an exhaustive search and model selection by the minimum description length principle, which allows accuracy and complexity to be directly traded off by measuring each in units of information. I showcase the resulting publicly available Exhaustive Symbolic Regression algorithm on three open problems in astrophysics: the expansion history of the universe, the effective behaviour of gravity in galaxies and the potential of the inflaton field. In each case the algorithm identifies many functions superior to the literature standards. This general purpose methodology should find widespread utility in science and beyond.
Article 1177
Title@2025-07-17 (4): When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values
Title: When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values | Wenn Pattern-by-Pattern arbeitet: Theoretische und Empirische Einblicke für Logistische Modelle mit fehlenden Werten | 代代办法:缺少价值的后勤模式理论和经验透视 2507.13024v1 |
Authors (3): Christophe Muller, Erwan Scornet, Julie Josse
Predicting a response with partially missing inputs remains a challenging task even in parametric models, since parameter estimation in itself is not sufficient to predict on partially observed inputs. Several works study prediction in linear models. In this paper, we focus on logistic models, which present their own difficulties. From a theoretical perspective, we prove that a Pattern-by-Pattern strategy (PbP), which learns one logistic model per missingness pattern, accurately approximates Bayes probabilities in various missing data scenarios (MCAR, MAR and MNAR). Empirically, we thoroughly compare various methods (constant and iterative imputations, complete case analysis, PbP, and an EM algorithm) across classification, probability estimation, calibration, and parameter inference. Our analysis provides a comprehensive view of logistic regression with missing values. It reveals that mean imputation can be used as a baseline for low sample sizes, and that improved performance is obtained via nonlinear multiple iterative imputation techniques with the labels (MICE.RF.Y). For large sample sizes, PbP is the best method for Gaussian mixtures, and we recommend MICE.RF.Y in the presence of nonlinear features.
Article 1178
Title@2025-07-17 (4): Fault detection and diagnosis for the engine electrical system of a space launcher based on a temporal convolutional autoencoder and calibrated classifiers
Title: Fault detection and diagnosis for the engine electrical system of a space launcher based on a temporal convolutional autoencoder and calibrated classifiers | Fehlererkennung und Diagnose für das elektrische Motorsystem eines Raumwerfers basierend auf einem zeitlich konvolutionären Autoencoder und kalibrierten Klassifikatoren | 以时富集自动编码器和校准分类器为基础的空间发射装置发动机电气系统的故障检测和诊断 2507.13022v1 |
Authors (4): Luis Basora, Louison Bocquet-Nouaille, Elinirina Robinson, Serge Le Gonidec
In the context of health monitoring for the next generation of reusable space launchers, we outline a first step toward developing an onboard fault detection and diagnostic capability for the electrical system that controls the engine valves. Unlike existing approaches in the literature, our solution is designed to meet a broader range of key requirements. This includes estimating confidence levels for predictions, detecting out-of-distribution (OOD) cases, and controlling false alarms. The proposed solution is based on a temporal convolutional autoencoder to automatically extract low-dimensional features from raw sensor data. Fault detection and diagnosis are carried out using a binary and a multiclass classifier, respectively, trained on the autoencoder latent and residual spaces. The classifiers are histogram-based gradient boosting models calibrated to output probabilities that can be interpreted as confidence levels. A relatively simple technique, based on inductive conformal anomaly detection, is used to identify OOD data. We leverage other simple yet effective techniques, such as the cumulative sum control chart (CUSUM) to limit false alarms, and threshold moving to address class imbalance in fault detection. The proposed framework is highly configurable and has been evaluated on simulated data, covering both nominal and anomalous operational scenarios. The results indicate that our solution is a promising first step, though testing with real data will be necessary to ensure that it achieves the required maturity level for operational use.
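To illustrate how a CUSUM chart can limit false alarms, here is a minimal one-sided CUSUM sketch in Python. It accumulates only the part of each score that exceeds a target plus a slack value, and raises an alarm when the accumulated sum crosses a threshold. The parameter names, the toy scores, and the reset-after-alarm behaviour are assumptions made for this example, not details taken from the paper.

```python
def cusum_alarms(values, target, slack, threshold):
    """One-sided (upper) CUSUM: return the indices at which an alarm fires."""
    s = 0.0
    alarms = []
    for i, x in enumerate(values):
        # Accumulate evidence above target + slack; never go below zero.
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            alarms.append(i)
            s = 0.0  # restart the statistic after an alarm (a design choice)
    return alarms


# Hypothetical per-window fault scores: one isolated spike, then a sustained rise.
scores = [0.1, 0.9, 0.2, 0.8, 0.9, 0.95, 0.9, 0.1]
alarms = cusum_alarms(scores, target=0.5, slack=0.1, threshold=0.8)
```

In this toy run the isolated spike at index 1 does not trigger an alarm, while the sustained run of high scores does, at index 5. Larger `slack` or `threshold` values trade detection delay for fewer false alarms.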
Article 1179
Title@2025-07-17 (4): The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks
Title: The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks | Die Spätphasen-Trainingsdynamik des (stochastischen) subgradienten Abstiegs auf homogenen neuronalen Netzwerken | 在同质神经网络上的(随机)亚梯级下降的后阶段培训动态 2502.05668v3 |
Authors (2): Sholom Schechtman, Nicolas Schreuder
We analyze the implicit bias of constant-step stochastic subgradient descent (SGD). We consider the setting of binary classification with homogeneous neural networks, a large class of deep neural networks with ReLU-type activation functions such as MLPs and CNNs without biases. We interpret the dynamics of normalized SGD iterates as an Euler-like discretization of a conservative field flow that is naturally associated with the normalized classification margin. Owing to this interpretation, we show that normalized SGD iterates converge to the set of critical points of the normalized margin at late-stage training (i.e., assuming that the data is correctly classified with positive normalized margin). To our knowledge, this is the first extension of the analysis of Lyu and Li (2020) on the discrete dynamics of gradient descent to the nonsmooth and stochastic setting. Our main result applies to binary classification with exponential or logistic losses. We additionally discuss extensions to more general settings.
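For orientation, the normalized margin referred to here is usually written as follows in the setting of Lyu and Li (2020). The notation ($\Phi$ for the network output, $L$ for the degree of homogeneity, $(x_i, y_i)$ with $y_i \in \{-1, +1\}$ for the training data) is chosen for this note and may differ from the paper's own.

```latex
% L-homogeneity of the network in its parameters:
\Phi(c\,\theta; x) = c^{L}\,\Phi(\theta; x) \quad \text{for all } c > 0,
% Normalized classification margin on the training set:
\bar{\gamma}(\theta) = \frac{\min_{1 \le i \le n} \, y_i\,\Phi(\theta; x_i)}{\lVert \theta \rVert_2^{L}} .
```

Because of homogeneity, $\bar{\gamma}$ depends only on the direction $\theta / \lVert \theta \rVert_2$, which is why the analysis is carried out on the normalized iterates.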
Article 1180
Title@2025-07-17 (4): SMART: Relation-Aware Learning of Geometric Representations for Knowledge Graphs
Title: SMART: Relation-Aware Learning of Geometric Representations for Knowledge Graphs | SMART: Beziehungsorientiertes Lernen geometrischer Darstellungen für Wissensgraphen | SMART:知识图表几何表示法关系-知识学习 2507.13001v1 |
Authors (6): Kossi Amouzouvi, Bowen Song, Andrea Coletta, Luigi Bellomarini, Jens Lehmann, Sahar Vahdati
Knowledge graph representation learning approaches provide a mapping between symbolic knowledge in the form of triples in a knowledge graph (KG) and their feature vectors. Knowledge graph embedding (KGE) models often represent relations in a KG as geometric transformations. Most state-of-the-art (SOTA) KGE models are derived from elementary geometric transformations (EGTs), such as translation, scaling, rotation, and reflection, or their combinations. These geometric transformations enable the models to effectively preserve specific structural and relational patterns of the KG. However, the current use of EGTs by KGEs remains insufficient because it does not consider relation-specific transformations. Although recent models have attempted to address this problem by ensembling SOTA baseline models in different ways, such baselines use only a single or composite geometric transformation to represent all the relations. In this paper, we propose a framework that evaluates how well each relation fits with different geometric transformations. Based on this ranking, the model can: (1) assign the best-matching transformation to each relation, or (2) use majority voting to choose one transformation type to apply across all relations. That is, the model learns a single relation-specific EGT in a low-dimensional vector space through an attention mechanism. Furthermore, we use the correlation between relations and EGTs, which are learned in a low dimension, for relation embeddings in a high-dimensional vector space. The effectiveness of our models is demonstrated through comprehensive evaluations on three benchmark KGs as well as a real-world financial KG, showing performance comparable to leading models.
Article 1181
Title@2025-07-17 (4): Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning
Title: Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning | Differential-informierte Probenauswahl beschleunigt multimodales kontrastives Lernen | 不同知情的抽样甄选加速多模式差异学习 2507.12998v1 |
Authors (8): Zihua Zhao, Feng Hong, Mengxi Chen, Pengyi Chen, Benyuan Liu, Jiangchao Yao, Ya Zhang, Yanfeng Wang
The remarkable success of contrastive-learning-based multimodal models has been greatly driven by training on ever-larger datasets with expensive compute consumption. Sample selection, as an alternative efficient paradigm, is an important direction for accelerating the training process. However, recent advances in sample selection either rely mostly on an oracle model to select a high-quality coreset offline, which is limited in cold-start scenarios, or focus on online selection based on real-time model predictions, which does not sufficiently or efficiently account for noisy correspondence. To address this dilemma, we propose a novel Differential-Informed Sample Selection (DISSect) method, which accurately and efficiently discriminates noisy correspondence for training acceleration. Specifically, we rethink the impact of noisy correspondence on contrastive learning and propose that the differential between the predicted correlation of the current model and that of a historical model is more informative for characterizing sample quality. Based on this, we construct a robust differential-based sample selection and analyze its theoretical insights. Extensive experiments on three benchmark datasets and various downstream tasks demonstrate the consistent superiority of DISSect over current state-of-the-art methods. Source code is available at: https://github.com/MediaBrain-SJTU/DISSect.
Article 1182
Title@2025-07-17 (4): (Almost) Free Modality Stitching of Foundation Models
Title: (Almost) Free Modality Stitching of Foundation Models | (Fast) Freie Modalitätsstiche von Stiftungsmodellen | (几乎) 基金会模型的免费方式 2507.10015v3 |
Authors (4): Jaisidh Singh, Diganta Misra, Boris Knyazev, Antonio Orvieto
Foundation multi-modal models are often designed by stitching together multiple existing pretrained uni-modal models: for example, an image classifier with a text model. This stitching process is performed by training a connector module that aims to align the representation spaces of these uni-modal models towards a multi-modal objective. However, given the complexity of training such connectors on large-scale web-based datasets, coupled with the ever-increasing number of available pretrained uni-modal models, the task of uni-modal model selection and subsequent connector module training becomes computationally demanding. To address this under-studied critical problem, we propose Hypernetwork Model Alignment (Hyma), a novel all-in-one solution for optimal uni-modal model selection and connector training by leveraging hypernetworks. Specifically, our framework utilizes the parameter prediction capability of a hypernetwork to obtain jointly trained connector modules for $N \times M$ combinations of uni-modal models. In our experiments, Hyma reduces the cost of searching for the best performing uni-modal model pair by $10\times$, while matching the ranking and trained connector performance obtained via grid search across a suite of diverse multi-modal benchmarks.
Article 1183
Title@2025-07-17 (4): Teach Old SAEs New Domain Tricks with Boosting
Title: Teach Old SAEs New Domain Tricks with Boosting | Lehren Sie alte SAEs neue Domain Tricks mit Förderung | 教授旧的 SAEs 新域圈套 2507.12990v1 |
Authors (6): Nikita Koriagin, Yaroslav Aksenov, Daniil Laptev, Gleb Gerasimov, Nikita Balagansky, Daniil Gavrilov
Sparse Autoencoders have emerged as powerful tools for interpreting the internal representations of Large Language Models, yet they often fail to capture domain-specific features not prevalent in their training corpora. This paper introduces a residual learning approach that addresses this feature blindness without requiring complete retraining. We propose training a secondary SAE specifically to model the reconstruction error of a pretrained SAE on domain-specific texts, effectively capturing features missed by the primary model. By summing the outputs of both models during inference, we demonstrate significant improvements in both LLM cross-entropy and explained variance metrics across multiple specialized domains. Our experiments show that this method efficiently incorporates new domain knowledge into existing SAEs while maintaining their performance on general tasks. This approach enables researchers to selectively enhance SAE interpretability for specific domains of interest, opening new possibilities for targeted mechanistic interpretability of LLMs.
Article 1184
Title@2025-07-17 (4): Variance-Based Pruning for Accelerating and Compressing Trained Networks
Title: Variance-Based Pruning for Accelerating and Compressing Trained Networks | Varianzbasiertes Pruning für beschleunigte und komprimierende Ausgebildete Netzwerke | 加快和压缩经过训练的网络 2507.12988v1 |
Authors (3): Uranik Berisha, Jens Mehnert, Alexandru Paul Condurache
Increasingly expensive training of ever larger models such as Vision Transformers motivates reusing the vast library of already trained state-of-the-art networks. However, their latency, high computational costs, and memory demands pose significant challenges for deployment, especially on resource-constrained hardware. While structured pruning methods can reduce these factors, they often require costly retraining, sometimes for up to hundreds of epochs, or even training from scratch to recover the accuracy lost through the structural modifications. Maintaining the performance of trained models after structured pruning, and thereby avoiding extensive retraining, remains a challenge. To solve this, we introduce Variance-Based Pruning, a simple and structured one-shot pruning technique for efficiently compressing networks with minimal finetuning. Our approach first gathers activation statistics, which are used to select neurons for pruning. Simultaneously, the mean activations are integrated back into the model to preserve a high degree of performance. On ImageNet-1k recognition tasks, we demonstrate that directly after pruning, DeiT-Base retains over 70% of its original performance and requires only 10 epochs of fine-tuning to regain 99% of the original accuracy, while simultaneously reducing MACs by 35% and model size by 36%, thus speeding up the model by 1.44x.
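The two steps described here (select neurons using activation statistics, then integrate the mean activations back into the model) can be sketched on a single hidden layer. The sketch below is one plausible reading, not the authors' implementation: it assumes that pruned hidden units are the ones with the lowest activation variance, and that a pruned unit's mean activation is folded into the bias of the following linear layer. The function name and toy numbers are hypothetical.

```python
from statistics import fmean, pvariance


def prune_low_variance(hidden_acts, w_out, b_out, n_prune):
    """Prune the n_prune hidden units with the lowest activation variance.

    hidden_acts: list of samples, each a list of hidden-unit activations.
    w_out: weights of the following linear layer, as rows [output][hidden].
    b_out: biases of the following linear layer.
    Returns (kept unit indices, pruned weights, compensated biases).
    """
    n_hidden = len(hidden_acts[0])
    columns = [[sample[j] for sample in hidden_acts] for j in range(n_hidden)]
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    # Assumption: low-variance units carry little input-dependent information.
    pruned = set(sorted(range(n_hidden), key=lambda j: variances[j])[:n_prune])
    keep = [j for j in range(n_hidden) if j not in pruned]
    # Assumption: replace each pruned unit by its mean, absorbed into the bias.
    new_b = [
        b + sum(row[j] * means[j] for j in pruned)
        for row, b in zip(w_out, b_out)
    ]
    new_w = [[row[j] for j in keep] for row in w_out]
    return keep, new_w, new_b


# Toy calibration activations: unit 0 is constant, so its variance is zero.
hidden_acts = [[1.0, 2.0, 0.0], [1.0, 4.0, 1.0], [1.0, 6.0, 2.0]]
w_out = [[0.5, 1.0, -1.0]]
b_out = [0.1]
keep, new_w, new_b = prune_low_variance(hidden_acts, w_out, b_out, n_prune=1)
```

In the toy case the pruned unit is exactly constant, so the compensated layer reproduces the original outputs; for units with small but nonzero variance the compensation is only approximate, which is consistent with the abstract's need for a short fine-tuning phase.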
Article 1185
Title@2025-07-17 (4): FedGA: A Fair Federated Learning Framework Based on the Gini Coefficient
Title: FedGA: A Fair Federated Learning Framework Based on the Gini Coefficient | FedGA: Ein faires, auf dem Gini-Koeffizienten basierendes Föderated Learning Framework | FDGA:基于基尼系数的公平联邦学习框架 2507.12983v1 |
Authors (1): ShanBin Liu
Fairness has emerged as one of the key challenges in federated learning. In horizontal federated settings, data heterogeneity often leads to substantial performance disparities across clients, raising concerns about equitable model behavior. To address this issue, we propose FedGA, a fairness-aware federated learning algorithm. We first employ the Gini coefficient to measure the performance disparity among clients. Based on this, we establish a relationship between the Gini coefficient $G$ and the update scale of the global model ${U_s}$, and use this relationship to adaptively determine the timing of fairness intervention. Subsequently, we dynamically adjust the aggregation weights according to the system’s real-time fairness status, enabling the global model to better incorporate information from clients with relatively poor performance. We conduct extensive experiments on the Office-Caltech-10, CIFAR-10, and Synthetic datasets. The results show that FedGA effectively improves fairness metrics such as variance and the Gini coefficient, while maintaining strong overall performance, demonstrating the effectiveness of our approach.
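As a concrete reference for the disparity measure, the sketch below computes the Gini coefficient of per-client accuracies using the standard mean-absolute-difference definition. Whether FedGA uses exactly this form is not stated in the abstract, so treat it as an assumption; the client accuracies are made up.

```python
def gini_coefficient(values):
    """Gini coefficient via the mean absolute difference: 0 means perfect equality."""
    n = len(values)
    mean = sum(values) / n
    if n == 0 or mean == 0:
        raise ValueError("Gini coefficient is undefined for empty or all-zero input")
    # Sum of |x_i - x_j| over all ordered pairs (i, j).
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    return abs_diff_sum / (2 * n * n * mean)


# Hypothetical per-client test accuracies after one communication round.
client_accuracies = [0.9, 0.8, 0.5, 0.6]
g = gini_coefficient(client_accuracies)
```

A value of `g` near 0 indicates that clients perform similarly, while larger values indicate greater disparity. The pairwise sum is quadratic in the number of clients, which is fine for a sketch but would usually be replaced by a sort-based formula at scale.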
nan
Article 1186
Title@2025-07-17 (4): A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints
Title: A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints | Ein verteilter generativer KI-Ansatz für heterogene Multi-Domain-Umgebungen unter Datenfreigabebeschränkungen | 在数据共享制约下,对异种多领域不同环境采取分散的AI方法 2507.12979v1 |
Authors (4): Youssef Tawfilis, Hossam Amer, Minar El-Aasser, Tallal Elshabrawy
Federated Learning has gained increasing attention for its ability to enable multiple nodes to collaboratively train machine learning models without sharing their raw data. At the same time, Generative AI – particularly Generative Adversarial Networks (GANs) – has achieved remarkable success across a wide range of domains, such as healthcare, security, and image generation. However, training generative models typically requires large datasets and significant computational resources, which are often unavailable in real-world settings. Acquiring such resources can be costly and inefficient, especially when many underutilized devices – such as IoT devices and edge devices – with varying capabilities remain idle. Moreover, obtaining large datasets is challenging due to privacy concerns and copyright restrictions, as most devices are unwilling to share their data. To address these challenges, we propose a novel approach for decentralized GAN training that enables the utilization of distributed data and underutilized, low-capability devices while not sharing data in its raw form. Our approach is designed to tackle key challenges in decentralized environments, combining KLD-weighted Clustered Federated Learning to address the issues of data heterogeneity and multi-domain datasets, with Heterogeneous U-Shaped split learning to tackle the challenge of device heterogeneity under strict data sharing constraints – ensuring that no labels or raw data, whether real or synthetic, are ever shared between nodes. Experimental results show that our approach delivers consistent and significant improvements across key performance metrics: it achieves 1.1x – 2.2x higher image generation scores and an average 10% boost in classification metrics (up to 50% in multi-domain non-IID settings), with much lower latency compared to several benchmarks. Find our code at https://github.com/youssefga28/HuSCF-GAN.
nan
Article 1187
Title@2025-07-17 (4): WaveletInception Networks for Drive-by Vibration-Based Infrastructure Health Monitoring
Title: WaveletInception Networks for Drive-by Vibration-Based Infrastructure Health Monitoring | WaveletInception-Netzwerke für Drive-by-Vibrationsbasierte Infrastruktur-Gesundheitsüberwachung | 驱动振动基础设施健康监测波动感知网络 2507.12969v1 |
Authors (3): Reza Riahi Samani, Alfredo Nunez, Bart De Schutter
This paper presents a novel deep learning-based framework for infrastructure health monitoring using drive-by vibration response signals. Recognizing the importance of spectral and temporal information, we introduce the WaveletInception-BiLSTM network. The WaveletInception feature extractor utilizes a Learnable Wavelet Packet Transform (LWPT) as the stem for extracting vibration signal features, incorporating spectral information in the early network layers. This is followed by 1D Inception networks that extract multi-scale, high-level features at deeper layers. The extracted vibration signal features are then integrated with operational conditions via a Long Short-term Memory (LSTM) layer. The resulting feature extraction network effectively analyzes drive-by vibration signals across various measurement speeds without preprocessing and uses LSTM to capture interrelated temporal dependencies among different modes of information and to create feature vectors for health condition estimation. The estimator head is designed with a sequential modeling architecture using bidirectional LSTM (BiLSTM) networks, capturing bi-directional temporal relationships from drive-by measurements. This architecture allows for a high-resolution, beam-level assessment of infrastructure health conditions. A case study focusing on railway track stiffness estimation with simulated drive-by vibration signals shows that the model significantly outperforms state-of-the-art methods in estimating railway ballast and railpad stiffness parameters. Results underscore the potential of this approach for accurate, localized, and fully automated drive-by infrastructure health monitoring.
nan
Article 1188
Title@2025-07-17 (4): Investigating Forecasting Models for Pandemic Infections Using Heterogeneous Data Sources: A 2-year Study with COVID-19
Title: Investigating Forecasting Models for Pandemic Infections Using Heterogeneous Data Sources: A 2-year Study with COVID-19 | Untersuchung von Prognosemodellen für Pandemieinfektionen unter Verwendung heterogener Datenquellen: Eine 2-jährige Studie mit COVID-19 | 利用异源数据源调查利用异源数据对传染病的预测模型:COVID-19的两年期研究 2507.12966v1 |
Authors (3): Zacharias Komodromos, Kleanthis Malialis, Panayiotis Kolios
Emerging in December 2019, the COVID-19 pandemic caused widespread health, economic, and social disruptions. Rapid global transmission overwhelmed healthcare systems, resulting in high infection rates, hospitalisations, and fatalities. To minimise the spread, governments implemented several non-pharmaceutical interventions like lockdowns and travel restrictions. While effective in controlling transmission, these measures also posed significant economic and societal challenges. Although the WHO declared COVID-19 no longer a global health emergency in May 2023, its impact persists, shaping public health strategies. The vast amount of data collected during the pandemic offers valuable insights into disease dynamics, transmission, and intervention effectiveness. Leveraging these insights can improve forecasting models, enhancing preparedness and response to future outbreaks while mitigating their social and economic impact. This paper presents a large-scale case study on COVID-19 forecasting in Cyprus, utilising a two-year dataset that integrates epidemiological data, vaccination records, policy measures, and weather conditions. We analyse infection trends, assess forecasting performance, and examine the influence of external factors on disease dynamics. The insights gained contribute to improved pandemic preparedness and response strategies.
nan
Article 1189
Title@2025-07-17 (4): A Spectral Interpretation of Redundancy in a Graph Reservoir
Title: A Spectral Interpretation of Redundancy in a Graph Reservoir | Eine spektrale Interpretation der Redundanz in einem Graph Reservoir | 图表储量中剩余性的旁观解释 2507.12963v1 |
Authors (2): Anna Bison, Alessandro Sperduti
Reservoir computing has been successfully applied to graphs as a preprocessing method to improve the training efficiency of Graph Neural Networks (GNNs). However, a common issue that arises when repeatedly applying layer operators on graphs is over-smoothing, that is, the convergence of graph signals toward low-frequency components of the graph Laplacian. This work revisits the definition of the reservoir in the Multiresolution Reservoir Graph Neural Network (MRGNN), a spectral reservoir model, and proposes a variant based on a Fairing algorithm originally introduced in the field of surface design in computer graphics. This algorithm provides a pass-band spectral filter that allows smoothing without shrinkage, and it can be adapted to the graph setting through the Laplacian operator. Given its spectral formulation, this method naturally connects to GNN architectures for tasks where smoothing, when properly controlled, can be beneficial, such as graph classification. The core contribution of the paper lies in the theoretical analysis of the algorithm from a random walks perspective. In particular, it shows how tuning the spectral coefficients can be interpreted as modulating the contribution of redundant random walks. Exploratory experiments based on the MRGNN architecture illustrate the potential of this approach and suggest promising directions for future research.
nan
Article 1190
Title@2025-07-17 (4): Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning
Title: Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning | Dynamische Stabilität des stochastischen Gradienten Absinkens im überparameterisierten Lernen charakterisierend | 将过度量化的学习中存储层渐变源的动态稳定化特性化 2407.20209v3 |
Authors (2): Dennis Chemnitz, Maximilian Engel
For overparameterized optimization tasks, such as those found in modern machine learning, global minima are generally not unique. In order to understand generalization in these settings, it is vital to study to which minimum an optimization algorithm converges. The possibility of having minima that are unstable under the dynamics imposed by the optimization algorithm limits the potential minima that the algorithm can find. In this paper, we characterize the global minima that are dynamically stable/unstable for both deterministic and stochastic gradient descent (SGD). In particular, we introduce a characteristic Lyapunov exponent that depends on the local dynamics around a global minimum and rigorously prove that the sign of this Lyapunov exponent determines whether SGD can accumulate at the respective global minimum.
nan
Article 1191
Title@2025-07-17 (4): A Progressive Image Restoration Network for High-order Degradation Imaging in Remote Sensing
Title: A Progressive Image Restoration Network for High-order Degradation Imaging in Remote Sensing | Ein Progressives Bildwiederherstellungsnetzwerk für High-Order Degradation Imaging in Remote Sensing | 遥感中高顺序退化成像的逐步图像恢复网络 2412.07195v2 |
Authors (6): Yujie Feng, Yin Yang, Xiaohong Fan, Zhengpeng Zhang, Lijing Bu, Jianping Zhang
Recently, deep learning methods have made remarkable progress in the field of image restoration for remote sensing (RS). However, most existing RS image restoration methods focus mainly on conventional first-order degradation models, which may not effectively capture the imaging mechanisms of remote sensing images. Furthermore, many RS image restoration approaches that use deep learning are often criticized for their lack of architectural transparency and model interpretability. To address these problems, we propose a novel progressive restoration network for high-order degradation imaging (HDI-PRNet) to progressively restore different image degradations. HDI-PRNet is developed based on the theoretical framework of degradation imaging, together with the Markov properties of the high-order degradation process and Maximum a posteriori (MAP) estimation, offering the benefit of mathematical interpretability within the unfolding network. The framework is composed of three main components: a module for image denoising that relies on proximal mapping prior learning, a module for image deblurring that integrates Neumann series expansion with dual-domain degradation learning, and a module for super-resolution. Extensive experiments demonstrate that our method achieves superior performance on both synthetic and real remote sensing images.
nan
Article 1192
Title@2025-07-17 (4): A Brain Tumor Segmentation Method Based on CLIP and 3D U-Net with Cross-Modal Semantic Guidance and Multi-Level Feature Fusion
Title: A Brain Tumor Segmentation Method Based on CLIP and 3D U-Net with Cross-Modal Semantic Guidance and Multi-Level Feature Fusion | Eine Gehirntumor-Segmentierungsmethode basierend auf CLIP und 3D U-Net mit Cross-Modal Semantic Guidance und Multi-Level-Feature Fusion | 以CLIP和3D U-Net为基础的脑肿瘤分解法,并配有跨模式语义指导和多功能融合 2507.09966v2 |
Authors (1): Mingda Zhang
Precise segmentation of brain tumors from magnetic resonance imaging (MRI) is essential for neuro-oncology diagnosis and treatment planning. Despite advances in deep learning methods, automatic segmentation remains challenging due to tumor morphological heterogeneity and complex three-dimensional spatial relationships. Current techniques primarily rely on visual features extracted from MRI sequences while underutilizing semantic knowledge embedded in medical reports. This research presents a multi-level fusion architecture that integrates pixel-level, feature-level, and semantic-level information, facilitating comprehensive processing from low-level data to high-level concepts. The semantic-level fusion pathway combines the semantic understanding capabilities of Contrastive Language-Image Pre-training (CLIP) models with the spatial feature extraction advantages of 3D U-Net through three mechanisms: 3D-2D semantic bridging, cross-modal semantic guidance, and semantic-based attention mechanisms. Experimental validation on the BraTS 2020 dataset demonstrates that the proposed model achieves an overall Dice coefficient of 0.8567, representing a 4.8% improvement compared to traditional 3D U-Net, with a 7.3% Dice coefficient increase in the clinically important enhancing tumor (ET) region.
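For readers unfamiliar with the reported metric, the Dice coefficient between a predicted and a reference binary mask is 2|A ∩ B| / (|A| + |B|). The sketch below works on flattened masks; the function name and the convention of returning 1.0 when both masks are empty are assumptions, and BraTS evaluation code may handle empty regions differently.

```python
def dice(pred, target):
    """Dice coefficient of two equally long binary masks (flattened voxels).
    Returns 1.0 when both masks are empty (assumed convention)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if size == 0 else 2.0 * inter / size

# Toy example: one of two predicted voxels overlaps the single reference voxel.
score = dice([1, 1, 0, 0], [1, 0, 0, 0])
```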
nan
Article 1193
Title@2025-07-17 (4): cIDIR: Conditioned Implicit Neural Representation for Regularized Deformable Image Registration
Title: cIDIR: Conditioned Implicit Neural Representation for Regularized Deformable Image Registration | cIDIR: Bedingte implizite Neuraldarstellung für regularisierte, deformierbare Bildregistrierung | cIDIR: 定期变形图像注册的有条件的、隐含的神经代表 2507.12953v1 |
Authors (3): Sidaty El Hadramy, Oumeymah Cherkaoui, Philippe C. Cattin
Regularization is essential in deformable image registration (DIR) to ensure that the estimated Deformation Vector Field (DVF) remains smooth, physically plausible, and anatomically consistent. However, fine-tuning regularization parameters in learning-based DIR frameworks is computationally expensive, often requiring multiple training iterations. To address this, we propose cIDIR, a novel DIR framework based on Implicit Neural Representations (INRs) that conditions the registration process on regularization hyperparameters. Unlike conventional methods that require retraining for each regularization hyperparameter setting, cIDIR is trained over a prior distribution of these hyperparameters, then optimized over the regularization hyperparameters by using the segmentation masks as an observation. Additionally, cIDIR models a continuous and differentiable DVF, enabling seamless integration of advanced regularization techniques via automatic differentiation. Evaluated on the DIR-LAB dataset, cIDIR achieves high accuracy and robustness across the dataset.
nan
Article 1194
Title@2025-07-17 (4): Signal Recovery Using a Spiked Mixture Model
Title: Signal Recovery Using a Spiked Mixture Model | Signalwiederherstellung mit einem Spiked Mixture Model | 使用斯派混合混合模型恢复信号 2501.01840v2 |
Authors (5): Paul-Louis Delacour, Sander Wahls, Jeffrey M. Spraggins, Lukasz Migas, Raf Van de Plas
We introduce the spiked mixture model (SMM) to address the problem of estimating a set of signals from many randomly scaled and noisy observations. Subsequently, we design a novel expectation-maximization (EM) algorithm to recover all parameters of the SMM. Numerical experiments show that in low signal-to-noise ratio regimes, and for data types where the SMM is relevant, SMM surpasses the more traditional Gaussian mixture model (GMM) in terms of signal recovery performance. The broad relevance of the SMM and its corresponding EM recovery algorithm is demonstrated by applying the technique to different data types. The first case study is a biomedical research application, utilizing an imaging mass spectrometry dataset to explore the molecular content of a rat brain tissue section at micrometer scale. The second case study demonstrates SMM performance in a computer vision application, segmenting a hyperspectral imaging dataset into underlying patterns. While the measurement modalities differ substantially, in both case studies SMM is shown to recover signals that were missed by traditional methods such as k-means clustering and GMM.
nan
Article 1195
Title@2025-07-17 (4): MMOne: Representing Multiple Modalities in One Scene
Title: MMOne: Representing Multiple Modalities in One Scene | MMOne: Vertretung mehrerer Modalitäten in einer Szene | MMOne: 在一个场景中代表多种模式 2507.11129v2 |
Authors (2): Zhifeng Gu, Bing Wang
Humans perceive the world through multimodal cues to understand and interact with the environment. Learning a scene representation for multiple modalities enhances comprehension of the physical world. However, modality conflicts, arising from inherent distinctions among different modalities, present two critical challenges: property disparity and granularity disparity. To address these challenges, we propose a general framework, MMOne, to represent multiple modalities in one scene, which can be readily extended to additional modalities. Specifically, a modality modeling module with a novel modality indicator is proposed to capture the unique properties of each modality. Additionally, we design a multimodal decomposition mechanism to separate multi-modal Gaussians into single-modal Gaussians based on modality differences. We address the essential distinctions among modalities by disentangling multimodal information into shared and modality-specific components, resulting in a more compact and efficient multimodal scene representation. Extensive experiments demonstrate that our method consistently enhances the representation capability for each modality and is scalable to additional modalities. The code is available at https://github.com/Neal2020GitHub/MMOne.
nan
Article 1196
Title@2025-07-17 (4): Probabilistic Soundness Guarantees in LLM Reasoning Chains
Title: Probabilistic Soundness Guarantees in LLM Reasoning Chains | Probabilistische Solidität garantiert in LLM-Aufklärungsketten | LLM 理赔链条的概率稳妥性保障 2507.12948v1 |
Authors (7): Weiqiu You, Anton Xue, Shreya Havaldar, Delip Rao, Helen Jin, Chris Callison-Burch, Eric Wong
In reasoning chains generated by large language models (LLMs), initial errors often propagate and undermine the reliability of the final conclusion. Current LLM-based error detection methods often fail to detect propagated errors because they do not properly account for how earlier errors might corrupt judgments of downstream reasoning. To better detect such propagated errors, we introduce Autoregressive Reasoning Entailment Stability (ARES), a novel probabilistic framework that prevents error propagation by judging each claim based only on previously-assessed sound premises. This inductive method yields a nuanced score for each step and provides certified statistical guarantees of its soundness, rather than a brittle binary label. ARES achieves state-of-the-art performance across four benchmarks (72.1% Macro-F1, +8.2 points) and demonstrates superior robustness on very long synthetic reasoning chains, where it excels at detecting propagated errors (90.3% F1, +27.6 points).
nan
Article 1197
Title@2025-07-17 (4): LightAutoDS-Tab: Multi-AutoML Agentic System for Tabular Data
Title: LightAutoDS-Tab: Multi-AutoML Agentic System for Tabular Data | LightAutoDS-Tab: Multi-AutoML Agentic System für Tabellendaten | LightautoDS-Tab:用于表格数据的多自动ML剂系统 2507.13413v1 |
Authors (7): Aleksey Lapin, Igor Hromov, Stanislav Chumakov, Mile Mitrovic, Dmitry Simakov, Nikolay O. Nikitin, Andrey V. Savchenko
AutoML has advanced in handling complex tasks through the integration of LLMs, yet its efficiency remains limited by dependence on specific underlying tools. In this paper, we introduce LightAutoDS-Tab, a multi-AutoML agentic system for tasks with tabular data, which combines LLM-based code generation with several AutoML tools. Our approach improves the flexibility and robustness of pipeline design, outperforming state-of-the-art open-source solutions on several data science tasks from Kaggle. The code of LightAutoDS-Tab is available in the open repository https://github.com/sb-ai-lab/LADS
nan
Article 1198
Title@2025-07-17 (4): Global urban visual perception varies across demographics and personalities
Title: Global urban visual perception varies across demographics and personalities | Globale urbane visuelle Wahrnehmung variiert je nach Demografie und Persönlichkeit | 全球城市视觉认识因人口和个性而异 2505.12758v3 |
Authors (8): Matias Quintana, Youlong Gu, Xiucheng Liang, Yujun Hou, Koichi Ito, Yihan Zhu, Mahmoud Abdelrahman, Filip Biljecki
Understanding people’s preferences is crucial for urban planning, yet current approaches often combine responses from multi-cultural populations, obscuring demographic differences and risking amplifying biases. We conducted a large-scale urban visual perception survey of streetscapes worldwide using street view imagery, examining how demographics – including gender, age, income, education, race and ethnicity, and, for the first time, personality traits – shape perceptions among 1,000 participants with balanced demographics from five countries and 45 nationalities. This dataset, Street Perception Evaluation Considering Socioeconomics (SPECS), reveals demographic- and personality-based differences across six traditional indicators (safe, lively, wealthy, beautiful, boring, depressing) and four new ones (live nearby, walk, cycle, green). Location-based sentiments further shape these preferences. Machine learning models trained on existing global datasets tend to overestimate positive indicators and underestimate negative ones compared to human responses, underscoring the need for local context. Our study aspires to rectify the myopic treatment of street perception, which rarely considers demographics or personality traits.
nan
Article 1199
Title@2025-07-17 (4): MC$^2$A: Enabling Algorithm-Hardware Co-Design for Efficient Markov Chain Monte Carlo Acceleration
Title: MC$^2$A: Enabling Algorithm-Hardware Co-Design for Efficient Markov Chain Monte Carlo Acceleration | MC$^2$A: Algorithm-Hardware Co-Design für effiziente Markov-Kette Monte Carlo Beschleunigung | MC$^2$A: 提高Markov链节蒙特卡洛速度加速速度的辅助算法-Hardware共同设计 2507.12935v1 |
Authors (6): Shirui Zhao, Jun Yin, Lingyun Yao, Martin Andraud, Wannes Meert, Marian Verhelst
An increasing number of applications are exploiting sampling-based algorithms for planning, optimization, and inference. The Markov Chain Monte Carlo (MCMC) algorithms form the computational backbone of this emerging branch of machine learning. Unfortunately, the high computational cost limits their feasibility for large-scale problems and real-world applications, and the existing MCMC acceleration solutions are either limited in hardware flexibility or fail to maintain efficiency at the system level across a variety of end-to-end applications. This paper introduces MC$^2$A, an algorithm-hardware co-design framework, enabling efficient and flexible optimization for MCMC acceleration. Firstly, MC$^2$A analyzes the MCMC workload diversity through an extension of the processor performance roofline model with a 3rd dimension to derive the optimal balance between the compute, sampling and memory parameters. Secondly, MC$^2$A proposes a parametrized hardware accelerator architecture with flexible and efficient support of MCMC kernels with a pipeline of ISA-programmable tree-structured processing units, reconfigurable samplers and a crossbar interconnect to support irregular access. Thirdly, the core of MC$^2$A is powered by a novel Gumbel sampler that eliminates exponential and normalization operations. In the end-to-end case study, MC$^2$A achieves an overall {$307.6\times$, $1.4\times$, $2.0\times$, $84.2\times$} speedup compared to the CPU, GPU, TPU and state-of-the-art MCMC accelerator. Evaluated on various representative MCMC workloads, this work demonstrates and exploits the feasibility of general hardware acceleration to popularize MCMC-based solutions in diverse application domains.
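The abstract does not spell out how its Gumbel sampler works. As background only, the textbook Gumbel-max trick shows why a Gumbel-based sampler can avoid exponentials and normalization over the log-weights: adding independent Gumbel(0, 1) noise to each unnormalized log-weight and taking the argmax yields a draw from the corresponding categorical distribution. The sketch below is that standard trick in software, not the paper's hardware design; the function name is illustrative.

```python
import math
import random

def gumbel_max_sample(logits, rng):
    """Draw index i with probability proportional to exp(logits[i]),
    without exponentiating or normalizing the logits themselves.
    Noise g = -log(-log(u)) with u ~ Uniform(0, 1) is Gumbel(0, 1)."""
    best_i, best_v = -1, -math.inf
    for i, logit in enumerate(logits):
        u = rng.random()
        while u == 0.0:          # avoid log(0); random() is in [0, 1)
            u = rng.random()
        v = logit - math.log(-math.log(u))
        if v > best_v:
            best_i, best_v = i, v
    return best_i

# Usage: unnormalized log-weights for three states.
rng = random.Random(0)
draw = gumbel_max_sample([0.0, math.log(2.0), math.log(7.0)], rng)
```

Note that the noise generation still uses logarithms here; how the paper's sampler produces its noise in hardware is not stated in the abstract.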
nan
Article 1200
Title@2025-07-17 (4): DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization
Title: DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization | DMQ: Ausreißer von Diffusionsmodellen für die Quantisierung nach dem Training | DMQ: 解剖培训后量化传播模型的外源离子 2507.12933v1 |
Authors (5): Dongyeun Lee, Jiwan Hur, Hyounguk Shon, Jae Young Lee, Junmo Kim
Diffusion models have achieved remarkable success in image generation but come with significant computational costs, posing challenges for deployment in resource-constrained environments. Recent post-training quantization (PTQ) methods have attempted to mitigate this issue by focusing on the iterative nature of diffusion models. However, these approaches often overlook outliers, leading to degraded performance at low bit-widths. In this paper, we propose DMQ, which combines Learned Equivalent Scaling (LES) and channel-wise Power-of-Two Scaling (PTS) to effectively address these challenges. Learned Equivalent Scaling optimizes channel-wise scaling factors to redistribute quantization difficulty between weights and activations, reducing overall quantization error. Recognizing that early denoising steps, despite having small quantization errors, crucially impact the final output due to error accumulation, we incorporate an adaptive timestep weighting scheme to prioritize these critical steps during learning. Furthermore, identifying that layers such as skip connections exhibit high inter-channel variance, we introduce channel-wise Power-of-Two Scaling for activations. To ensure robust selection of PTS factors even with a small calibration set, we introduce a voting algorithm that enhances reliability. Extensive experiments demonstrate that our method significantly outperforms existing works, especially at low bit-widths such as W4A6 (4-bit weight, 6-bit activation) and W4A8, maintaining high image generation quality and model stability. The code is available at https://github.com/LeeDongYeun/dmq.
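The idea of redistributing quantization difficulty between activations and weights rests on a simple identity: dividing input channel c of the activation by s_c and multiplying the matching weight row by s_c leaves the layer output unchanged. The sketch below demonstrates only that identity with hand-picked scales; the function names are hypothetical, and how DMQ learns the scales or votes on power-of-two factors is not shown.

```python
def linear(x, W):
    """y[j] = sum_c x[c] * W[c][j], with W stored as in_channels x out_channels."""
    return [sum(xc * row[j] for xc, row in zip(x, W)) for j in range(len(W[0]))]

def scale_equivalently(x, W, s):
    """Return (x', W') with x'[c] = x[c] / s[c] and W'[c][j] = s[c] * W[c][j],
    so that linear(x', W') equals linear(x, W) up to rounding."""
    x_scaled = [xc / sc for xc, sc in zip(x, s)]
    W_scaled = [[sc * w for w in row] for row, sc in zip(W, s)]
    return x_scaled, W_scaled

# Channel 1 carries an activation outlier; a scale of 32 moves it into the weights.
x = [0.5, 64.0, -1.0]
W = [[1.0, 2.0], [0.25, -0.5], [3.0, 1.0]]
s = [1.0, 32.0, 1.0]
x_s, W_s = scale_equivalently(x, W, s)
```

Choosing power-of-two scales, as in this toy example, makes the division and multiplication exact in binary floating point and cheap as bit shifts in integer arithmetic, which is one plausible motivation for a power-of-two constraint.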
nan
Article 1201
Title@2025-07-17 (4): Trace Reconstruction with Language Models
Title: Trace Reconstruction with Language Models | Trace Rekonstruktion mit Sprachmodellen | 使用语言模式进行追踪重建 2507.12927v1 |
Authors (3): Franziska Weindel, Michael Girsch, Reinhard Heckel
The general trace reconstruction problem seeks to recover an original sequence from its noisy copies independently corrupted by deletions, insertions, and substitutions. This problem arises in applications such as DNA data storage, a promising storage medium due to its high information density and longevity. However, errors introduced during DNA synthesis, storage, and sequencing require correction through algorithms and codes, with trace reconstruction often used as part of the data retrieval process. In this work, we propose TReconLM, which leverages language models trained on next-token prediction for trace reconstruction. We pretrain language models on synthetic data and fine-tune on real-world data to adapt to technology-specific error patterns. TReconLM outperforms state-of-the-art trace reconstruction algorithms, including prior deep learning approaches, recovering a substantially higher fraction of sequences without error.
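To make the problem setup concrete, the noisy copies (traces) can be simulated by passing a sequence through a channel that independently deletes, inserts, and substitutes symbols. The sketch below is one simple parameterization with per-position probabilities; the function name, the DNA alphabet default, and the exact ordering of error events are assumptions, and real synthesis and sequencing error profiles are more structured.

```python
import random

def corrupt(seq, p_del, p_ins, p_sub, rng, alphabet="ACGT"):
    """Return one noisy trace of seq. Before each symbol, a random symbol is
    inserted with probability p_ins; the symbol itself is then deleted with
    probability p_del, substituted with probability p_sub, or kept."""
    out = []
    for ch in seq:
        if rng.random() < p_ins:
            out.append(rng.choice(alphabet))
        r = rng.random()
        if r < p_del:
            continue                                   # deletion
        if r < p_del + p_sub:
            out.append(rng.choice([a for a in alphabet if a != ch]))  # substitution
        else:
            out.append(ch)                             # transmitted unchanged
    return "".join(out)

# Usage: five independent traces of the same original sequence.
rng = random.Random(0)
traces = [corrupt("ACGTACGTAC", 0.05, 0.05, 0.05, rng) for _ in range(5)]
```

A reconstruction algorithm receives only `traces` and must output its best estimate of the original sequence.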
nan
Article 1202
Title@2025-07-17 (4): Robust Explanations Through Uncertainty Decomposition: A Path to Trustworthier AI
Title: Robust Explanations Through Uncertainty Decomposition: A Path to Trustworthier AI | Robuste Erklärungen durch Unsicherheitszersetzung: Ein Weg zu vertrauensvoller KI | 通过不确定性的分解作出有力的解释:通往信托的路径 AI 2507.12913v1 |
Authors (5): Chenrui Zhu, Louenas Bounia, Vu Linh Nguyen, Sébastien Destercke, Arthur Hoarau
Recent advancements in machine learning have emphasized the need for transparency in model predictions, particularly as interpretability diminishes when using increasingly complex architectures. In this paper, we propose leveraging prediction uncertainty as a complementary approach to classical explainability methods. Specifically, we distinguish between aleatoric (data-related) and epistemic (model-related) uncertainty to guide the selection of appropriate explanations. Epistemic uncertainty serves as a rejection criterion for unreliable explanations and, in itself, provides insight into insufficient training (a new form of explanation). Aleatoric uncertainty informs the choice between feature-importance explanations and counterfactual explanations. This leverages a framework of explainability methods driven by uncertainty quantification and disentanglement. Our experiments demonstrate the impact of this uncertainty-aware approach on the robustness and attainability of explanations in both traditional machine learning and deep learning scenarios.
nan
Article 1203
Title@2025-07-17 (4): Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services
Title: Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services | Fremer: Leichter und effektiver Frequenztransformator für Workload-Prognose in Cloud Services | Fremer:云服务工作量预测的轻型和有效频率变压器 2507.12908v1 |
Authors (7): Jiadong Chen, Hengyu Ye, Fuxin Jiang, Xiao He, Tieying Zhang, Jianjun Chen, Xiaofeng Gao
Workload forecasting is pivotal in cloud service applications, such as auto-scaling and scheduling, with profound implications for operational efficiency. Although Transformer-based forecasting models have demonstrated remarkable success in general tasks, their computational efficiency often falls short of the stringent requirements in large-scale cloud environments. Given that most workload series exhibit complicated periodic patterns, addressing these challenges in the frequency domain offers substantial advantages. To this end, we propose Fremer, an efficient and effective deep forecasting model. Fremer fulfills three critical requirements: it demonstrates superior efficiency, outperforming most Transformer-based forecasting models; it achieves exceptional accuracy, surpassing all state-of-the-art (SOTA) models in workload forecasting; and it exhibits robust performance for multi-period series. Furthermore, we collect and release four high-quality, open-source workload datasets derived from ByteDance’s cloud services, encompassing workload data from thousands of computing instances. Extensive experiments on both our proprietary datasets and public benchmarks demonstrate that Fremer consistently outperforms baseline models, achieving average improvements of 5.5% in MSE, 4.7% in MAE, and 8.6% in SMAPE over SOTA models, while simultaneously reducing parameter scale and computational costs. Additionally, in a proactive auto-scaling test based on Kubernetes, Fremer improves average latency by 18.78% and reduces resource consumption by 2.35%, underscoring its practical efficacy in real-world applications.
nan
Article 1204
Title@2025-07-17 (4): A column generation algorithm with dynamic constraint aggregation for minimum sum-of-squares clustering
Title: A column generation algorithm with dynamic constraint aggregation for minimum sum-of-squares clustering | Ein Spaltengenerierungsalgorithmus mit dynamischer Constraint-Aggregation für minimale Summe von Quadraten | 为最小平方和组合组合组合组合而具有动态约束聚合的列生成算法 2410.06187v2 |
Authors (2): Antonio M. Sudoso, Daniel Aloise
The minimum sum-of-squares clustering problem (MSSC), also known as $k$-means clustering, refers to the problem of partitioning $n$ data points into $k$ clusters, with the objective of minimizing the total sum of squared Euclidean distances between each point and the center of its assigned cluster. We propose an efficient algorithm for solving large-scale MSSC instances, which combines column generation (CG) with dynamic constraint aggregation (DCA) to effectively reduce the number of constraints considered in the CG master problem. DCA was originally conceived to reduce degeneracy in set partitioning problems by utilizing an aggregated restricted master problem obtained from a partition of the set partitioning constraints into disjoint clusters. In this work, we explore the use of DCA within a CG algorithm for the exact solution of MSSC. Our method is fine-tuned by a series of ablation studies on DCA design choices, and is demonstrated to significantly outperform existing state-of-the-art exact approaches available in the literature.
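The MSSC objective defined in the first sentence can be evaluated for a given partition as follows. This sketch only scores a fixed assignment, using cluster means as centers; it does not implement column generation or dynamic constraint aggregation, and the function name is illustrative.

```python
def mssc_objective(points, labels, k):
    """Sum of squared Euclidean distances from each point to the mean
    of its assigned cluster. labels[i] is in range(k)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    # Cluster centers are the per-cluster means; empty clusters have no center.
    centers = [[v / counts[c] for v in sums[c]] if counts[c] else None
               for c in range(k)]
    return sum(sum((a - b) ** 2 for a, b in zip(p, centers[c]))
               for p, c in zip(points, labels))

# Two clusters with centers (1, 0) and (10, 2).
cost = mssc_objective([(0, 0), (2, 0), (10, 0), (10, 4)], [0, 0, 1, 1], 2)
print(cost)  # 10.0
```

An exact method must certify that no other partition into k clusters achieves a lower value of this objective, which is what distinguishes it from heuristics such as Lloyd's algorithm.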
Article 1205
Title@2025-07-17 (4): Generalist Bimanual Manipulation via Foundation Video Diffusion Models
Title: Generalist Bimanual Manipulation via Foundation Video Diffusion Models | Generalist Bimanual Manipulation über Stiftung Video Diffusion Modelle | 通过基金会录像传播模型进行通用二手操作 2507.12898v1 |
Authors (8): Yao Feng, Hengkai Tan, Xinyi Mao, Guodong Liu, Shuhe Huang, Chendong Xiang, Hang Su, Jun Zhu
Bimanual robotic manipulation, which involves the coordinated control of two robotic arms, is foundational for solving challenging tasks. Despite recent progress in general-purpose manipulation, data scarcity and embodiment heterogeneity remain serious obstacles to further scaling up in bimanual settings. In this paper, we introduce VIdeo Diffusion for Action Reasoning (VIDAR), a two-stage framework that leverages large-scale, diffusion-based video pre-training and a novel masked inverse dynamics model for action prediction. We pre-train the video diffusion model on 750K multi-view videos from three real-world bimanual robot platforms, utilizing a unified observation space that encodes robot, camera, task, and scene contexts. Our masked inverse dynamics model learns masks to extract action-relevant information from generated trajectories without requiring pixel-level labels, and the masks can effectively generalize to unseen backgrounds. Our experiments demonstrate that with only 20 minutes of human demonstrations on an unseen robot platform (only 1% of typical data requirements), VIDAR generalizes to unseen tasks and backgrounds with strong semantic understanding, surpassing state-of-the-art methods. Our findings highlight the potential of video foundation models, coupled with masked action prediction, to enable scalable and generalizable robotic manipulation in diverse real-world settings.
Article 1206
Title@2025-07-17 (4): VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks
Title: VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks | VAR-MATH: Wahre mathematische Vernunft in großen Sprachmodellen anhand symbolischer Multi-Instance-Benchmarks | VAR-MATH:通过符号性多因基准在大语言模型中验证真实的数学理由 2507.12885v1 |
Authors (3): Jian Yao, Ran Cheng, Kay Chen Tan
Recent advances in reinforcement learning (RL) have led to substantial improvements in the mathematical reasoning abilities of large language models (LLMs), as measured by standard benchmarks. However, these gains often persist even when models are trained with flawed signals, such as random or inverted rewards, raising a fundamental question: do such improvements reflect true reasoning, or are they merely artifacts of overfitting to benchmark-specific patterns? To address this question, we take an evaluation-centric perspective and identify two critical shortcomings in existing protocols. First, benchmark contamination arises from the public availability of test problems, increasing the risk of data leakage. Second, evaluation fragility stems from the reliance on single-instance assessments, which are highly sensitive to stochastic outputs and fail to capture reasoning consistency. To overcome these limitations, we introduce VAR-MATH, a symbolic evaluation framework designed to probe genuine reasoning ability. By converting fixed numerical problems into symbolic templates and requiring models to solve multiple instantiations of each, VAR-MATH enforces consistent reasoning across structurally equivalent variants, thereby mitigating contamination and improving evaluation robustness. We apply VAR-MATH to transform two popular benchmarks, AMC23 and AIME24, into their symbolic counterparts, VAR-AMC23 and VAR-AIME24. Experimental results reveal substantial performance drops for RL-trained models on the variabilized versions, especially for smaller models, with average declines of 48.0% on AMC23 and 58.3% on AIME24. These findings suggest that many existing RL methods rely on superficial heuristics and fail to generalize beyond specific numerical forms. Overall, VAR-MATH offers a principled, contamination-resistant evaluation paradigm for mathematical reasoning.
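The idea of scoring a model on several instantiations of one symbolic template can be sketched as follows. Everything here is hypothetical: the template (`a*b + a`), the parameter values, the two toy "solvers", and the all-or-nothing scoring rule are assumptions for illustration, not the benchmark's actual problems or protocol.

```python
# Hypothetical template: "What is a*b + a?" with (a, b) as symbolic slots.
# The first pair stands in for the original fixed benchmark problem.
INSTANTIATIONS = [(3, 3), (4, 5), (7, 2), (2, 9), (6, 6)]


def reference_answer(a, b):
    """Ground truth for the hypothetical template."""
    return a * b + a


def single_instance_score(solver):
    """Classic protocol: score only the original fixed problem."""
    a, b = INSTANTIATIONS[0]
    return int(solver(a, b) == reference_answer(a, b))


def consistent_score(solver):
    """Assumed all-or-nothing rule: credit only if every variant is solved."""
    return int(all(solver(a, b) == reference_answer(a, b)
                   for a, b in INSTANTIATIONS))


def reasoning_solver(a, b):
    """Stands in for a model that actually computes the answer."""
    return a * b + a


def memorizing_solver(a, b):
    """Stands in for a model that memorized the original problem's answer."""
    return 12


for solver in (reasoning_solver, memorizing_solver):
    print(solver.__name__, single_instance_score(solver), consistent_score(solver))
```

The memorizing solver gets full credit under the single-instance protocol but none under the multi-instance one, which is the kind of gap the abstract attributes to contamination and evaluation fragility.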
Article 1207
Title@2025-07-17 (4): Autonomous Resource Management in Microservice Systems via Reinforcement Learning
Title: Autonomous Resource Management in Microservice Systems via Reinforcement Learning | Autonomes Ressourcenmanagement in Mikroservice-Systemen durch Verstärkungslernen | 通过加强学习,对微小服务系统进行自主资源管理 2507.12879v1 |
Authors (6): Yujun Zou, Nia Qi, Yingnan Deng, Zhihao Xue, Ming Gong, Wuyang Zhang
This paper proposes a reinforcement learning-based method for microservice resource scheduling and optimization, aiming to address issues such as uneven resource allocation, high latency, and insufficient throughput in traditional microservice architectures. In microservice systems, as the number of services and the load increase, efficiently scheduling and allocating resources such as computing power, memory, and storage becomes a critical research challenge. To address this, the paper employs an intelligent scheduling algorithm based on reinforcement learning. Through the interaction between the agent and the environment, the resource allocation strategy is continuously optimized. In the experiments, the paper considers different resource conditions and load scenarios, evaluating the proposed method across multiple dimensions, including response time, throughput, resource utilization, and cost efficiency. The experimental results show that the reinforcement learning-based scheduling method significantly improves system response speed and throughput under low load and high concurrency conditions, while also optimizing resource utilization and reducing energy consumption. Under multi-dimensional resource conditions, the proposed method can consider multiple objectives and achieve optimized resource scheduling. Compared to traditional static resource allocation methods, the reinforcement learning model demonstrates stronger adaptability and optimization capability. It can adjust resource allocation strategies in real time, thereby maintaining good system performance in dynamically changing load and resource environments.
Article 1208
Title@2025-07-17 (4): Bayesian Modeling and Estimation of Linear Time-Variant Systems using Neural Networks and Gaussian Processes
Title: Bayesian Modeling and Estimation of Linear Time-Variant Systems using Neural Networks and Gaussian Processes | Bayesische Modellierung und Abschätzung von linearen Zeitvariantsystemen unter Verwendung neuraler Netzwerke und Gaußschen Prozessen | 利用神经网络和高斯进程模拟和估计线性时间变化系统 2507.12878v1 |
Authors (1): Yaniv Shulman
The identification of Linear Time-Variant (LTV) systems from input-output data is a fundamental yet challenging ill-posed inverse problem. This work introduces a unified Bayesian framework that models the system’s impulse response, $h(t, \tau)$, as a stochastic process. We decompose the response into a posterior mean and a random fluctuation term, a formulation that provides a principled approach for quantifying uncertainty and naturally defines a new, useful system class we term Linear Time-Invariant in Expectation (LTIE). To perform inference, we leverage modern machine learning techniques, including Bayesian neural networks and Gaussian Processes, using scalable variational inference. We demonstrate through a series of experiments that our framework can robustly infer the properties of an LTI system from a single noisy observation, show superior data efficiency compared to classical methods in a simulated ambient noise tomography problem, and successfully track a continuously varying LTV impulse response by using a structured Gaussian Process prior. This work provides a flexible and robust methodology for uncertainty-aware system identification in dynamic environments.
Article 1209
Title@2025-07-17 (4): Topology-Aware Activation Functions in Neural Networks
Title: Topology-Aware Activation Functions in Neural Networks | Topologie-Bewusst-Aktivierungsfunktionen in neuralen Netzwerken | 神经网络中的地形-软件启动功能 2507.12874v1 |
Authors (2): Pavel Snopov, Oleg R. Musin
This study explores novel activation functions that enhance the ability of neural networks to manipulate data topology during training. Building on the limitations of traditional activation functions like $\mathrm{ReLU}$, we propose $\mathrm{SmoothSplit}$ and $\mathrm{ParametricSplit}$, which introduce topology “cutting” capabilities. These functions enable networks to transform complex data manifolds effectively, improving performance in scenarios with low-dimensional layers. Through experiments on synthetic and real-world datasets, we demonstrate that $\mathrm{ParametricSplit}$ outperforms traditional activations in low-dimensional settings while maintaining competitive performance in higher-dimensional ones. Our findings highlight the potential of topology-aware activation functions in advancing neural network architectures. The code is available via https://github.com/Snopoff/Topology-Aware-Activations.
Article 1210
Title@2025-07-17 (4): An Investigation of Ear-EEG Signals for a Novel Biometric Authentication System
Title: An Investigation of Ear-EEG Signals for a Novel Biometric Authentication System | Untersuchung von Ohr-EEG-Signalen für ein neuartiges biometrisches Authentifizierungssystem | 关于新生物测定鉴定系统耳电信号的调查 2507.12873v1 |
Authors (6): Danilo Avola, Giancarlo Crocetti, Gian Luca Foresti, Daniele Pannone, Claudio Piciarelli, Amedeo Ranaldi
This work explores the feasibility of biometric authentication using EEG signals acquired through in-ear devices, commonly referred to as ear-EEG. Traditional EEG-based biometric systems, while secure, often suffer from low usability due to cumbersome scalp-based electrode setups. In this study, we propose a novel and practical framework leveraging ear-EEG signals as a user-friendly alternative for everyday biometric authentication. The system extracts an original combination of temporal and spectral features from ear-EEG signals and feeds them into a fully connected deep neural network for subject identification. Experimental results on the only currently available ear-EEG dataset suitable for different purposes, including biometric authentication, demonstrate promising performance, with an average accuracy of 82% in a subject identification scenario. These findings confirm the potential of ear-EEG as a viable and deployable direction for next-generation real-world biometric systems.
Article 1211
Title@2025-07-17 (4): WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding
Title: WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding | WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding | WhoFi: 通过 Wi-Fi 频道信号编码来识别深层人的身份 2507.12869v1 |
Authors (4): Danilo Avola, Daniele Pannone, Dario Montagnini, Emad Emam
Person Re-Identification is a key and challenging task in video surveillance. While traditional methods rely on visual data, issues like poor lighting, occlusion, and suboptimal angles often hinder performance. To address these challenges, we introduce WhoFi, a novel pipeline that utilizes Wi-Fi signals for person re-identification. Biometric features are extracted from Channel State Information (CSI) and processed through a modular Deep Neural Network (DNN) featuring a Transformer-based encoder. The network is trained using an in-batch negative loss function to learn robust and generalizable biometric signatures. Experiments on the NTU-Fi dataset show that our approach achieves competitive results compared to state-of-the-art methods, confirming its effectiveness in identifying individuals via Wi-Fi signals.
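An in-batch negative loss of the kind mentioned above is commonly written as a cross-entropy over a similarity matrix, where each query's matching gallery item is the positive and all other items in the batch act as negatives. The sketch below follows that common form; the cosine similarity, the `temperature` parameter, and the toy vectors are assumptions, and the paper's exact loss may differ.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def in_batch_negative_loss(queries, gallery, temperature=0.1):
    """Mean cross-entropy where gallery[i] is the positive for queries[i].

    All other gallery entries in the batch serve as negatives.
    """
    q = [normalize(v) for v in queries]
    g = [normalize(v) for v in gallery]
    loss = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, gj) / temperature for gj in g]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_sum_exp - logits[i]
    return loss / len(q)


# Toy 2-D "signatures" for two people (made-up values).
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.1], [0.1, 1.0]]   # positives match their queries
swapped = [[0.1, 1.0], [1.0, 0.1]]   # positives point at the wrong person
print(in_batch_negative_loss(queries, aligned))
print(in_batch_negative_loss(queries, swapped))
```

The loss is small when each signature is closest to its own positive and large otherwise, which is what pushes signatures of the same person together and those of different people apart.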
Article 1212
Title@2025-07-17 (4): Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved)
Title: Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved) | Beaufsichtigte Feinabstimmung auf kuratierten Daten ist Verstärktes Lernen (und kann verbessert werden) | 受监督的 “ 封闭数据 “ 微调微调是 “ 强化学习 “ (并可以改进) 2507.12856v1 |
Authors (2): Chongli Qin, Jost Tobias Springenberg
Behavior Cloning (BC) on curated (or filtered) data is the predominant paradigm for supervised fine-tuning (SFT) of large language models, as well as for imitation learning of control policies. Here, we draw on a connection between this successful strategy and the theory and practice of finding optimal policies via Reinforcement Learning (RL). Building on existing literature, we clarify that SFT can be understood as maximizing a lower bound on the RL objective in a sparse reward setting, which lends support to its often observed good performance. From this viewpoint, we realize that a small modification to SFT leads to an importance weighted variant that behaves closer to training with RL, as it: i) optimizes a tighter bound on the RL objective and ii) can improve performance compared to SFT on curated data. We refer to this variant as importance weighted supervised fine-tuning (iw-SFT). We show that it is easy to implement and can be further generalized to training with quality scored data. The resulting SFT variants are competitive with more advanced RL algorithms for large language models and for training policies in continuous control tasks, for example achieving 66.7% on the AIME 2024 dataset.
Article 1213
Title@2025-07-17 (4): Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application
Title: Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application | Latent Diffusion Modellbasierter Denoisierungsempfänger für 6G Semantische Kommunikation: Von der stochastischen Differentialtheorie zur Anwendung | 用于 6G 语义通讯: 从斯托卡差异理论到应用的 6G 语义通讯的 以 DEM 为基础的前传播模型模型 2506.05710v3 |
Authors (3): Xiucheng Wang, Honggang Jia, Nan Cheng
In this paper, a novel semantic communication framework empowered by generative artificial intelligence (GAI) is proposed to enhance robustness against both channel noise and shifts in the transmission data distribution. A theoretical foundation is established using stochastic differential equations (SDEs), from which a closed-form mapping between any signal-to-noise ratio (SNR) and the optimal denoising timestep is derived. Moreover, to address distribution mismatch, a mathematical scaling method is introduced to align received semantic features with the training distribution of the GAI. Built on this theoretical foundation, a latent diffusion model (LDM)-based semantic communication framework is proposed that combines a variational autoencoder for semantic feature extraction with a pretrained diffusion model for denoising. The proposed system is a training-free framework that supports zero-shot generalization and achieves superior performance under low-SNR and out-of-distribution conditions, offering a scalable and robust solution for future 6G semantic communication systems. Experimental results demonstrate that the proposed semantic communication framework achieves state-of-the-art performance in both pixel-level accuracy and semantic perceptual quality, consistently outperforming baselines across a wide range of SNRs and data distributions without any fine-tuning or post-training.
Article 1214
Title@2025-07-17 (4): Transformer-Based Person Identification via Wi-Fi CSI Amplitude and Phase Perturbations
Title: Transformer-Based Person Identification via Wi-Fi CSI Amplitude and Phase Perturbations | Transformerbasierte Personenidentifikation über Wi-Fi CSI Amplitude und Phasenstörungen | 通过Wi-Fi CSI进行基于变压器的人的识别 2507.12854v1 |
Authors (7): Danilo Avola, Andrea Bernardini, Francesco Danese, Mario Lezoche, Maurizio Mancini, Daniele Pannone, Amedeo Ranaldi
Wi-Fi sensing is gaining momentum as a non-intrusive and privacy-preserving alternative to vision-based systems for human identification. However, person identification through wireless signals, particularly without user motion, remains largely unexplored. Most prior wireless-based approaches rely on movement patterns, such as walking gait, to extract biometric cues. In contrast, we propose a transformer-based method that identifies individuals from Channel State Information (CSI) recorded while the subject remains stationary. CSI captures fine-grained amplitude and phase distortions induced by the unique interaction between the human body and the radio signal. To support evaluation, we introduce a dataset acquired with ESP32 devices in a controlled indoor environment, featuring six participants observed across multiple orientations. A tailored preprocessing pipeline, including outlier removal, smoothing, and phase calibration, enhances signal quality. Our dual-branch transformer architecture processes amplitude and phase modalities separately and achieves 99.82% classification accuracy, outperforming convolutional and multilayer perceptron baselines. These results demonstrate the discriminative potential of CSI perturbations, highlighting their capacity to encode biometric traits in a consistent manner. They further confirm the viability of passive, device-free person identification using low-cost commodity Wi-Fi hardware in real-world settings.
Article 1215
Title@2025-07-17 (4): Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants
Title: Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants | Site-Level Feintuning mit Progressive Layer Freezing: Auf dem Weg zur robusten Vorhersage der Bronchopulmonalen Dysplasie von Tag 1 Brustradiographen bei extrem prätermen Säuglingen | 与累进层冷冻有关的地点级微调级微调:对极期前婴儿每日1号胸前无线电报上的布朗-希波本二元病原体进行强有力的预测 2507.12269v2 |
Authors (16): Sybelle Goedicke-Fritz, Michelle Bous, Annika Engel, Matthias Flotho, Pascal Hirsch, Hannah Wittig, Dino Milanovic, Dominik Mohr, Mathias Kaspar, Sogand Nemat, Dorothea Kerner, Arno Bücker, Andreas Keller, Sascha Meyer, Michael Zemlin, Philipp Flotho
Bronchopulmonary dysplasia (BPD) is a chronic lung disease affecting 35% of extremely low birth weight infants. Defined by oxygen dependence at 36 weeks postmenstrual age, it causes lifelong respiratory complications. However, preventive interventions carry severe risks, including neurodevelopmental impairment, ventilator-induced lung injury, and systemic complications. Therefore, early BPD prognosis and prediction of BPD outcome are crucial to avoid unnecessary toxicity in low-risk infants. Admission radiographs of extremely preterm infants are routinely acquired within 24h of life and could serve as a non-invasive prognostic tool. In this work, we developed and investigated a deep learning approach using chest X-rays from 163 extremely low-birth-weight infants ($\leq$32 weeks gestation, 401-999g) obtained within 24 hours of birth. We fine-tuned a ResNet-50 pretrained specifically on adult chest radiographs, employing progressive layer freezing with discriminative learning rates to prevent overfitting, and evaluated CutMix augmentation and linear probing. For moderate/severe BPD outcome prediction, our best performing model with progressive freezing, linear probing and CutMix achieved an AUROC of 0.78 $\pm$ 0.10, balanced accuracy of 0.69 $\pm$ 0.10, and an F1-score of 0.67 $\pm$ 0.11. In-domain pre-training significantly outperformed ImageNet initialization (p = 0.031), which confirms that domain-specific pretraining is important for BPD outcome prediction. Routine IRDS grades showed limited prognostic value (AUROC 0.57 $\pm$ 0.11), confirming the need for learned markers. Our approach demonstrates that domain-specific pretraining enables accurate BPD prediction from routine day-1 radiographs. Through progressive freezing and linear probing, the method remains computationally feasible for site-level implementation and future federated learning deployments.
Article 1216
Title@2025-07-17 (4): Formalising causal inference as prediction on a target population
Title: Formalising causal inference as prediction on a target population | Formalisierende kausale Schlussfolgerungen als Vorhersage für eine Zielpopulation | 将因果推断正规化,作为对目标人口的预测 2407.17385v3 |
Authors (2): Benedikt Höltgen, Robert C. Williamson
The standard approach to causal modelling, especially in the social and health sciences, is the potential outcomes framework due to Neyman and Rubin. In this framework, observations are thought to be drawn from a distribution over variables of interest, and the goal is to identify parameters of this distribution. Even though the stated goal is often to inform decision making on some target population, there is no straightforward way to include these target populations in the framework. Instead of modelling the relationship between the observed sample and the target population, the inductive assumptions in this framework take the form of abstract sampling and independence assumptions. In this paper, we develop a version of this framework that construes causal inference as treatment-wise predictions for finite populations where all assumptions are testable in retrospect; this means that one can not only test predictions themselves (without any fundamental problem) but also investigate sources of error when they fail. Due to close connections to the original framework, established methods can still be analysed under the new framework.
Article 1217
Title@2025-07-17 (4): Dataset resulting from the user study on comprehensibility of explainable AI algorithms
Title: Dataset resulting from the user study on comprehensibility of explainable AI algorithms | Datensatz aus der Nutzerstudie zur Verständlichkeit erklärbarer KI-Algorithmen | 用户关于可解释的AI算法的可理解性研究产生的数据集 2411.02419v2 |
Authors (8): Szymon Bobek, Paloma Korycińska, Monika Krakowska, Maciej Mozolewski, Dorota Rak, Magdalena Zych, Magdalena Wójcik, Grzegorz J. Nalepa
This paper introduces a dataset that is the result of a user study on the comprehensibility of explainable artificial intelligence (XAI) algorithms. The study participants were recruited from 149 candidates to form three groups representing experts in the domain of mycology (DE), students with a data science and visualization background (IT) and students from social sciences and humanities (SSH). The main part of the dataset contains 39 transcripts of interviews during which participants were asked to complete a series of tasks and questions related to the interpretation of explanations of decisions of a machine learning model trained to distinguish between edible and inedible mushrooms. The transcripts were complemented with additional data that includes visualizations of explanations presented to the user, results from thematic analysis, recommendations for improving the explanations provided by the participants, and the initial survey results, which make it possible to determine each participant's domain knowledge and data analysis literacy. The transcripts were manually tagged to allow for automatic matching between the text and other data related to particular fragments. With the rapid development of XAI techniques, the need for a multidisciplinary qualitative evaluation of explainability is one of the emerging topics in the community. Our dataset not only allows the study we conducted to be reproduced, but also opens a wide range of possibilities for analysing the material we gathered.
Article 1218
Title@2025-07-17 (4): A Kernel Distribution Closeness Testing
Title: A Kernel Distribution Closeness Testing | Eine Näherungsprüfung der Kernelverteilung | A 内核分布近距离测试 2507.12843v1 |
Authors (4): Zhijian Zhou, Liuhua Peng, Xunye Tian, Feng Liu
The distribution closeness testing (DCT) assesses whether the distance between a distribution pair is at least $\epsilon$-far. Existing DCT methods mainly measure discrepancies between a distribution pair defined on discrete one-dimensional spaces (e.g., using total variation), which limits their applications to complex data (e.g., images). To extend DCT to more types of data, a natural idea is to introduce maximum mean discrepancy (MMD), a powerful measurement of the distributional discrepancy between two complex distributions, into DCT scenarios. However, we find that MMD’s value can be the same for many pairs of distributions that have different norms in the same reproducing kernel Hilbert space (RKHS), making MMD less informative when assessing the closeness levels for multiple distribution pairs. To mitigate the issue, we design a new measurement of distributional discrepancy, norm-adaptive MMD (NAMMD), which scales MMD’s value using the RKHS norms of distributions. Based on the asymptotic distribution of NAMMD, we finally propose the NAMMD-based DCT to assess the closeness levels of a distribution pair. Theoretically, we prove that NAMMD-based DCT has higher test power compared to MMD-based DCT, with bounded type-I error, which is also validated by extensive experiments on many types of data (e.g., synthetic noise, real images). Furthermore, we also apply the proposed NAMMD for addressing the two-sample testing problem and find NAMMD-based two-sample test has higher test power than the MMD-based two-sample test in both theory and experiments.
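For reference, the plain MMD that the abstract starts from can be estimated from two samples with a kernel. The sketch below computes the standard biased (V-statistic) estimate of squared MMD with a Gaussian kernel; the bandwidth and sample values are assumptions. It does not implement NAMMD, whose RKHS-norm scaling is not specified in the abstract.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Made-up one-dimensional samples; ys are shifted copies of xs.
xs = [(0.0,), (1.0,), (2.0,)]
near = [(x[0] + 0.5,) for x in xs]
far = [(x[0] + 5.0,) for x in xs]
print(mmd_squared(xs, xs), mmd_squared(xs, near), mmd_squared(xs, far))
```

The estimate is zero for identical samples and grows as the second sample moves away. The abstract's point is that this raw value alone can coincide for distribution pairs with different RKHS norms, which motivates a normalized variant.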
Article 1219
Title@2025-07-17 (4): Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling
Title: Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling | Aufgabenspezifische Generative Datensatzdestillation mit schwer wiegender Probenahme | 利用难于指导的抽样抽样进行任务特定生成数据集蒸馏 2507.03331v2 |
Authors (6): Mingzhuo Li, Guang Li, Jiafeng Mao, Linfeng Ye, Takahiro Ogawa, Miki Haseyama
To alleviate the reliance of deep neural networks on large-scale datasets, dataset distillation aims to generate compact, high-quality synthetic datasets that can achieve comparable performance to the original dataset. The integration of generative models has significantly advanced this field. However, existing approaches primarily focus on aligning the distilled dataset with the original one, often overlooking task-specific information that can be critical for optimal downstream performance. In this paper, focusing on the downstream task of classification, we propose a task-specific sampling strategy for generative dataset distillation that incorporates the concept of difficulty to consider the requirements of the target task better. The final dataset is sampled from a larger image pool with a sampling distribution obtained by matching the difficulty distribution of the original dataset. A logarithmic transformation is applied as a pre-processing step to correct for distributional bias. The results of extensive experiments demonstrate the effectiveness of our method and suggest its potential for enhancing performance on other downstream tasks. The code is available at https://github.com/SumomoTaku/DiffGuideSamp.
Article 1220
Title@2025-07-17 (4): We should avoid the assumption of data-generating probability distributions in social settings
Title: We should avoid the assumption of data-generating probability distributions in social settings | Wir sollten die Annahme von datengenerierenden Wahrscheinlichkeitsverteilungen in sozialen Settings vermeiden | 我们应该避免假设在社会环境中产生数据的概率分布 2407.17395v4 |
Authors (2): Benedikt Höltgen, Robert C. Williamson
Machine Learning research, including work promoting fair or equitable algorithms, heavily relies on the concept of a data-generating probability distribution. The standard presumption is that since data points are ‘sampled from’ such a distribution, one can learn from observed data about this distribution and, thus, predict future data points which are also drawn from it. We argue, however, that such true probability distributions do not exist and should not be dealt with uncritically. We show that alternative frameworks focusing directly on relevant populations rather than abstract distributions are available and leave classical learning theory almost unchanged. Furthermore, we argue that the assumption of true probabilities or data-generating distributions can be misleading and obscure both the choices made and the goals pursued in machine learning practice. Based on these considerations, this position paper argues that, at least in social settings, machine learning work should avoid assuming data-generating probability distributions.
Article 1221
Title@2025-07-17 (4): Bridging the Gap: Leveraging Retrieval-Augmented Generation to Better Understand Public Concerns about Vaccines
Title: Bridging the Gap: Leveraging Retrieval-Augmented Generation to Better Understand Public Concerns about Vaccines | Bridging the Gap: Leveraging Retrieval-Augmented Generation zu besser verstehen öffentliche Bedenken über Impfstoffe | 缩小差距:利用利用回收-养殖一代来更好地了解公众对疫苗的关切 2507.12840v1 |
Authors (6): Muhammad Javed, Sedigh Khademi Habibabadi, Christopher Palmer, Hazel Clothier, Jim Buttery, Gerardo Luis Dimaguila
Vaccine hesitancy threatens public health, leading to delayed or rejected vaccines. Social media is a vital source for understanding public concerns, but traditional methods like topic modelling often struggle to capture nuanced opinions. Though trained for query answering, large language models (LLMs) often miss current events and community concerns. Additionally, hallucinations in LLMs can compromise public health communication. To address these limitations, we developed a tool (VaxPulse Query Corner) using the Retrieval Augmented Generation technique. It addresses complex queries about public vaccine concerns on various online platforms, aiding public health administrators and stakeholders in understanding public concerns and implementing targeted interventions to boost vaccine confidence. In an analysis of 35,103 Shingrix social media posts, it achieved an answer faithfulness of 0.96 and a relevance of 0.94.
Article 1222
Title@2025-07-17 (4): Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability
Title: Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability | Die Evolution des neuralen Tangentenkerns am Rande der Stabilität verstehen | 了解稳定边缘的内心内核核心的演变 2507.12837v1 |
Authors (3): Kaiqi Jiang, Jeremy Cohen, Yuanzhi Li
The study of Neural Tangent Kernels (NTKs) in deep learning has drawn increasing attention in recent years. NTKs typically actively change during training and are related to feature learning. In parallel, recent work on Gradient Descent (GD) has found a phenomenon called Edge of Stability (EoS), in which the largest eigenvalue of the NTK oscillates around a value inversely proportional to the step size. However, although follow-up works have explored the underlying mechanism of such eigenvalue behavior in depth, the understanding of the behavior of the NTK eigenvectors during EoS is still missing. This paper examines the dynamics of NTK eigenvectors during EoS in detail. Across different architectures, we observe that larger learning rates cause the leading eigenvectors of the final NTK, as well as the full NTK matrix, to have greater alignment with the training target. We then study the underlying mechanism of this phenomenon and provide a theoretical analysis for a two-layer linear network. Our study enhances the understanding of GD training dynamics in deep learning.
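The "value inversely proportional to the step size" has a simple classical analogue: gradient descent on a one-dimensional quadratic with curvature `c` is stable only while `c < 2 / step_size`. The sketch below demonstrates that threshold only. It is an intuition aid under that simplifying assumption, not a simulation of NTK dynamics or of the paper's two-layer linear network.

```python
def gradient_descent_on_quadratic(curvature, step_size, x0=1.0, steps=100):
    """Run GD on f(x) = 0.5 * curvature * x**2 and return the final iterate."""
    x = x0
    for _ in range(steps):
        gradient = curvature * x
        x -= step_size * gradient  # each step multiplies x by (1 - step_size * curvature)
    return x


curvature = 4.0
threshold = 2.0 / curvature  # stability boundary for the step size: 0.5

# Just below the threshold the iterates shrink; just above, they blow up.
print(abs(gradient_descent_on_quadratic(curvature, 0.9 * threshold)))
print(abs(gradient_descent_on_quadratic(curvature, 1.1 * threshold)))
```

Read the other way round, a fixed step size tolerates curvature only up to `2 / step_size`, which gives some intuition for why the leading eigenvalue is observed to hover around a level set by the step size.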
Article 1223
Title@2025-07-17 (4): MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results
Title: MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results | MVA 2025 Kleines Multi-Objekt-Tracking für die Vogelbeobachtung Herausforderung: Datensatz, Methoden und Ergebnisse | MVA 2025 发现鸟类挑战小型多目标跟踪:数据集、方法和结果 2507.12832v1 |
Authors (24): Yuki Kondo, Norimichi Ukita, Riku Kanayama, Yuki Yoshida, Takayuki Yamaguchi, Xiang Yu, Guang Liang, Xinyao Liu, Guan-Zhang Wang, Wei-Ta Chu, Bing-Cheng Chuang, Jia-Hua Lee, Pin-Tseng Kuo, I-Hsuan Chu, Yi-Shein Hsiao, Cheng-Han Wu, Po-Yi Wu, Jui-Chien Tsou, Hsuan-Chi Liu, Chun-Yi Lee, Yuan-Fu Yang, Kosuke Shigematsu, Asuka Shin, Ba Tran
Small Multi-Object Tracking (SMOT) is particularly challenging when targets occupy only a few dozen pixels, rendering detection and appearance-based association unreliable. Building on the success of the MVA2023 SOD4SB challenge, this paper introduces the SMOT4SB challenge, which leverages temporal information to address limitations of single-frame detection. Our three main contributions are: (1) the SMOT4SB dataset, consisting of 211 UAV video sequences with 108,192 annotated frames under diverse real-world conditions, designed to capture motion entanglement where both camera and targets move freely in 3D; (2) SO-HOTA, a novel metric combining Dot Distance with HOTA to mitigate the sensitivity of IoU-based metrics to small displacements; and (3) a competitive MVA2025 challenge with 78 participants and 308 submissions, where the winning method achieved a 5.1x improvement over the baseline. This work lays a foundation for advancing SMOT in UAV scenarios with applications in bird strike avoidance, agriculture, fisheries, and ecological monitoring.
Article 1224
Title@2025-07-17 (4): Autoregressive Speech Enhancement via Acoustic Tokens
Title: Autoregressive Speech Enhancement via Acoustic Tokens | Autoregressive Sprachverbesserung durch akustische Token | 通过声调声调增强自动递减语音 2507.12825v1 |
Authors (3): Luca Della Libera, Cem Subakan, Mirco Ravanelli
In speech processing pipelines, improving the quality and intelligibility of real-world recordings is crucial. While supervised regression is the primary method for speech enhancement, audio tokenization is emerging as a promising alternative for a smooth integration with other modalities. However, research on speech enhancement using discrete representations is still limited. Previous work has mainly focused on semantic tokens, which tend to discard key acoustic details such as speaker identity. Additionally, these studies typically employ non-autoregressive models, assuming conditional independence of outputs and overlooking the potential improvements offered by autoregressive modeling. To address these gaps we: 1) conduct a comprehensive study of the performance of acoustic tokens for speech enhancement, including the effect of bitrate and noise strength; 2) introduce a novel transducer-based autoregressive architecture specifically designed for this task. Experiments on VoiceBank and Libri1Mix datasets show that acoustic tokens outperform semantic tokens in terms of preserving speaker identity, and that our autoregressive approach can further improve performance. Nevertheless, we observe that discrete representations still fall short compared to continuous ones, highlighting the need for further research in this area.
Article 1225
Title@2025-07-17 (4): Self Balancing Neural Network: A Novel Method to Estimate Average Treatment Effect
Title: Self Balancing Neural Network: A Novel Method to Estimate Average Treatment Effect | Self Balancing Neural Network: Eine neuartige Methode zur Schätzung des durchschnittlichen Behandlungseffekts | 自我平衡神经网络:估计平均治疗效果的新办法 2507.12818v1 |
Authors (3): Atomsa Gemechu Abdisa, Yingchun Zhou, Yuqi Qiu
In observational studies, confounding variables affect both treatment and outcome. Moreover, instrumental variables also influence the treatment assignment mechanism. This situation sets the study apart from a standard randomized controlled trial, where the treatment assignment is random. Due to this situation, the estimated average treatment effect becomes biased. To address this issue, a standard approach is to incorporate the estimated propensity score when estimating the average treatment effect. However, these methods incur the risk of misspecification in propensity score models. To solve this issue, a novel method called the “Self balancing neural network” (Sbnet), which lets the model itself obtain its pseudo propensity score from the balancing net, is proposed in this study. The proposed method estimates the average treatment effect by using the balancing net as a key part of the feedforward neural network. This formulation resolves the estimation of the average treatment effect in one step. Moreover, the multi-pseudo propensity score framework, which is estimated from the diversified balancing net and used for the estimation of the average treatment effect, is presented. Finally, the proposed methods are compared with state-of-the-art methods on three simulation setups and real-world datasets. It has been shown that the proposed self-balancing neural network shows better performance than state-of-the-art methods.
Article 1226
Title@2025-07-17 (4): From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning
Title: From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning | Von der Neuheit zur Imitation: Selbstdestillierte Belohnungen für Offline-Verstärkungslernen | 从新闻到消化:为脱线强化学习自行提炼奖项 2507.12815v1 |
Authors (2): Gaurav Chaudhary, Laxmidhar Behera
Offline Reinforcement Learning (RL) aims to learn effective policies from a static dataset without requiring further agent-environment interactions. However, its practical adoption is often hindered by the need for explicit reward annotations, which can be costly to engineer or difficult to obtain retrospectively. To address this, we propose ReLOAD (Reinforcement Learning with Offline Reward Annotation via Distillation), a novel reward annotation framework for offline RL. Unlike existing methods that depend on complex alignment procedures, our approach adapts Random Network Distillation (RND) to generate intrinsic rewards from expert demonstrations using a simple yet effective embedding discrepancy measure. First, we train a predictor network to mimic a fixed target network’s embeddings based on expert state transitions. Later, the prediction error between these networks serves as a reward signal for each transition in the static dataset. This mechanism provides a structured reward signal without requiring handcrafted reward annotations. We provide a formal theoretical construct that offers insights into how RND prediction errors effectively serve as intrinsic rewards by distinguishing expert-like transitions. Experiments on the D4RL benchmark demonstrate that ReLOAD enables robust offline policy learning and achieves performance competitive with traditional reward-annotated methods.
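The two-stage mechanism described above (fit a predictor to a frozen random target on expert transitions, then score every dataset transition by the prediction error) can be sketched on toy data as follows. This is a minimal illustration under stated assumptions, not the ReLOAD implementation: the networks are a single tanh layer and a linear map, the predictor is fitted by least squares rather than gradient descent, and the mapping `exp(-error)` from error to reward is a hypothetical choice made so that expert-like transitions receive higher reward.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, EMB = 4, 8  # toy transition-feature and embedding sizes (assumed)

# Fixed, randomly initialised target network; it is never trained.
W_target = rng.normal(size=(DIM, EMB))

def target_embed(x):
    return np.tanh(x @ W_target)

# Toy "expert" transition features, clustered near the origin.
expert = 0.1 * rng.normal(size=(500, DIM))

# Stage 1: fit the predictor to mimic the target on expert data only.
# A linear least-squares fit stands in for gradient-based training.
W_pred, *_ = np.linalg.lstsq(expert, target_embed(expert), rcond=None)

def rnd_error(x):
    """Per-transition squared embedding discrepancy between predictor and target."""
    return np.mean((x @ W_pred - target_embed(x)) ** 2, axis=1)

def reward(x):
    """Assumed mapping: low discrepancy (expert-like) gives reward near 1."""
    return np.exp(-rnd_error(x))

# Stage 2: annotate transitions from the static dataset.
expert_like = 0.1 * rng.normal(size=(100, DIM))
off_expert = 3.0 + rng.normal(size=(100, DIM))
print(reward(expert_like).mean(), reward(off_expert).mean())
```

Because the predictor only ever sees expert transitions, its error stays small near the expert distribution and grows away from it, which is what lets the discrepancy separate expert-like transitions from the rest.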
Article 1227
Title@2025-07-17 (4): RONOM: Reduced-Order Neural Operator Modeling
Title: RONOM: Reduced-Order Neural Operator Modeling | RONOM: Reduzierte Neuraloperator-Modellierung | RONOM: 降低轨道神经操作员模型 2507.12814v1 |
Authors (3): Sven Dummer, Dongwei Ye, Christoph Brune
Time-dependent partial differential equations are ubiquitous in physics-based modeling, but they remain computationally intensive in many-query scenarios, such as real-time forecasting, optimal control, and uncertainty quantification. Reduced-order modeling (ROM) addresses these challenges by constructing a low-dimensional surrogate model but relies on a fixed discretization, which limits flexibility across varying meshes during evaluation. Operator learning approaches, such as neural operators, offer an alternative by parameterizing mappings between infinite-dimensional function spaces, enabling adaptation to data across different resolutions. Whereas ROM provides rigorous numerical error estimates, neural operator learning largely focuses on discretization convergence and invariance without quantifying the error between the infinite-dimensional and the discretized operators. This work introduces the reduced-order neural operator modeling (RONOM) framework, which bridges concepts from ROM and operator learning. We establish a discretization error bound analogous to those in ROM, and get insights into RONOM’s discretization convergence and discretization robustness. Moreover, two numerical examples are presented that compare RONOM to existing neural operators for solving partial differential equations. The results demonstrate that RONOM using standard vector-to-vector neural networks achieves comparable performance in input generalization and superior performance in both spatial super-resolution and discretization robustness, while also offering novel insights into temporal super-resolution scenarios.
Article 1228
Title@2025-07-17 (4): ZClassifier: Temperature Tuning and Manifold Approximation via KL Divergence on Logit Space
Title: ZClassifier: Temperature Tuning and Manifold Approximation via KL Divergence on Logit Space | ZClassifier: Temperatur-Tuning und Manifold-Annäherung über KL Divergenz auf Logit Space | ZClasizer: 通过在登录空间的 KL diggence 进行温度调制和调控相近 2507.10638v2 |
Authors (1): Shim Soon Yong
We introduce a novel classification framework, ZClassifier, that replaces conventional deterministic logits with diagonal Gaussian-distributed logits. Our method simultaneously addresses temperature scaling and manifold approximation by minimizing the Kullback-Leibler (KL) divergence between the predicted Gaussian distributions and a unit isotropic Gaussian. This unifies uncertainty calibration and latent control in a principled probabilistic manner, enabling a natural interpretation of class confidence and geometric consistency. Experiments on CIFAR-10 show that ZClassifier improves over softmax classifiers in robustness, calibration, and latent separation.
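For reference, the KL divergence between a diagonal Gaussian N(mu, diag(sigma^2)) and a unit isotropic Gaussian N(0, I) has the closed form 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2). The sketch below computes it with NumPy; parameterizing the variance by `log_var` and the function name are assumptions for illustration, and how ZClassifier combines this term with its classification loss is not specified here.

```python
import numpy as np

def kl_diag_gaussian_to_standard(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

# A logit distribution that already equals N(0, I) incurs zero penalty.
print(kl_diag_gaussian_to_standard(np.zeros(10), np.zeros(10)))
```

The term penalizes both large logit means and variances that drift away from one, which is consistent with the abstract's reading of it as a joint calibration and latent-regularization signal.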
Article 1229
Title@2025-07-17 (4): Holistix: A Dataset for Holistic Wellness Dimensions Analysis in Mental Health Narratives
Title: Holistix: A Dataset for Holistic Wellness Dimensions Analysis in Mental Health Narratives | Holistix: Ein Datensatz für ganzheitliche Wellness-Dimensionen Analyse in psychischen Gesundheits-Erzählungen | Holistix:心理健康叙事中整体健康层面分析数据集 2507.09565v2 |
Authors (3): Heba Shakeel, Tanvir Ahmad, Chandni Saxena
We introduce a dataset for classifying wellness dimensions in social media user posts, covering six key aspects: physical, emotional, social, intellectual, spiritual, and vocational. The dataset is designed to capture these dimensions in user-generated content, with a comprehensive annotation framework developed under the guidance of domain experts. This framework allows for the classification of text spans into the appropriate wellness categories. We evaluate both traditional machine learning models and advanced transformer-based models for this multi-class classification task, with performance assessed using precision, recall, and F1-score, averaged over 10-fold cross-validation. Post-hoc explanations are applied to ensure the transparency and interpretability of model decisions. The proposed dataset contributes to region-specific wellness assessments in social media and paves the way for personalized well-being evaluations and early intervention strategies in mental health. We adhere to ethical considerations for constructing and releasing our experiments and dataset publicly on Github.
Article 1230
Title@2025-07-17 (4): Quantum Long Short-Term Memory for Drug Discovery
Title: Quantum Long Short-Term Memory for Drug Discovery | Quantenlanges Kurzzeitgedächtnis für die Drogenentdeckung | 药物发现长期短期记忆 2407.19852v2 |
Authors (5): Liang Zhang, Yin Xu, Mohan Wu, Liang Wang, Hua Xu
Quantum computing combined with machine learning (ML) is a highly promising research area, with numerous studies demonstrating that quantum machine learning (QML) is expected to solve scientific problems more effectively than classical ML. In this work, we present Quantum Long Short-Term Memory (QLSTM), a QML architecture, and demonstrate its effectiveness in drug discovery. We evaluate QLSTM on five benchmark datasets (BBBP, BACE, SIDER, BCAP37, T-47D), and observe consistent performance gains over classical LSTM, with ROC-AUC improvements ranging from 3% to over 6%. Furthermore, QLSTM exhibits improved predictive accuracy as the number of qubits increases, and faster convergence than classical LSTM under the same training conditions. Notably, QLSTM maintains strong robustness against quantum computer noise, outperforming noise-free classical LSTM in certain settings. These findings highlight the potential of QLSTM as a scalable and noise-resilient model for scientific applications, particularly as quantum hardware continues to advance in qubit capacity and fidelity.
Article 1231
Title@2025-07-17 (4): PolyServe: Efficient Multi-SLO Serving at Scale
Title: PolyServe: Efficient Multi-SLO Serving at Scale | PolyServe: Effizientes Multi-SLO Servieren im Maßstab | 多边服务:在规模上有效的多种服务 2507.17769v1 |
Authors (7): Kan Zhu, Haiyang Shi, Le Xu, Jiaxin Shan, Arvind Krishnamurthy, Baris Kasikci, Liguang Xie
Advances in Large Language Models (LLMs) have led to a surge of LLM-powered applications. These applications have diverse token-generation latency requirements. As a result, simply classifying workloads as latency-sensitive (LS) or best-effort (BE) overlooks the nuances within the latency-sensitive category and results in suboptimal user experiences and scheduling opportunities. However, efficiently serving requests with multiple SLO requirements poses significant challenges. First, all requests within a batch generate new tokens simultaneously, which can misalign them with their distinct SLO requirements. Moreover, while existing systems focus on auto-scaling for handling various overall request rates, the diversity of SLOs necessitates fine-grained auto-scaling among these SLO tiers. Finally, unlike LS/BE scenarios, where BE requests can be aborted at any time to ensure the SLO attainment of LS requests, those with different latency-sensitive SLOs cannot tolerate prolonged delays, and tail latency must be controlled. To tackle these challenges, we propose PolyServe, a novel multi-SLO scheduling policy at scale that maintains high SLO attainment while maximizing throughput. PolyServe first groups requests into multiple bins based on their per-token latency requirement, then schedules each bin to a subset of the server fleet. PolyServe routes requests to the highest-load but still SLO-attainable server to create a load gradient that facilitates auto-scaling. To increase utilization, PolyServe permits looser-SLO requests to share tighter-SLO instances when their own servers are saturated. PolyServe uses profiling data to guide scheduling decisions and manage tail latency through request-wait-time-aware scheduling, dynamic chunking, and continuous chunked prefill prediction. PolyServe achieves 1.23x goodput gain compared to existing policies, achieving up to 92.5% of optimal goodput.
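The binning and routing rules described above can be sketched as follows. This is a toy illustration, not PolyServe's scheduler: the bin edges, the `Server` fields, and the `max_load` threshold (standing in for what profiling data would say about SLO attainability) are all hypothetical. A request goes to the tightest-fitting bin for its per-token latency SLO, then to the most loaded server in that bin that can still meet the SLO; if that bin is saturated, it may fall back to servers in tighter bins.

```python
from bisect import bisect_right
from dataclasses import dataclass

BIN_EDGES_MS = [20, 50, 100]  # hypothetical per-token latency tiers, tightest first

def bin_for(slo_ms):
    """Index of the loosest bin whose latency bound still satisfies the request's SLO."""
    idx = bisect_right(BIN_EDGES_MS, slo_ms) - 1
    if idx < 0:
        raise ValueError("SLO is tighter than the tightest bin")
    return idx

@dataclass
class Server:
    name: str
    bin_idx: int
    load: int
    max_load: int  # assumed: largest load at which the bin's SLO is still attainable

def route(slo_ms, servers):
    """Pick the highest-load, still SLO-attainable server; fall back to tighter bins."""
    for idx in range(bin_for(slo_ms), -1, -1):  # own bin first, then tighter ones
        ok = [s for s in servers if s.bin_idx == idx and s.load < s.max_load]
        if ok:
            chosen = max(ok, key=lambda s: s.load)
            chosen.load += 1
            return chosen
    return None  # every eligible server is saturated; a real system would scale up

fleet = [Server("a", 1, 3, 8), Server("b", 1, 7, 8), Server("d", 0, 1, 4)]
print(route(60, fleet).name)
```

Packing requests onto the busiest eligible server leaves the remaining servers lightly loaded, which is the load gradient the abstract credits with making scale-down decisions easier.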
Article 1232
Title@2025-07-17 (4): Large Language Models’ Internal Perception of Symbolic Music
Title: Large Language Models’ Internal Perception of Symbolic Music | Die innere Wahrnehmung symbolischer Musik durch große Sprachmodelle | 大语言模型内部对符号音乐的感知 2507.12808v1 |
Authors (2): Andrew Shin, Kunitake Kaneko
Large language models (LLMs) excel at modeling relationships between strings in natural language and have shown promise in extending to other symbolic domains like coding or mathematics. However, the extent to which they implicitly model symbolic music remains underexplored. This paper investigates how LLMs represent musical concepts by generating symbolic music data from textual prompts describing combinations of genres and styles, and evaluating their utility through recognition and generation tasks. We produce a dataset of LLM-generated MIDI files without relying on explicit musical training. We then train neural networks entirely on this LLM-generated MIDI dataset and perform genre and style classification as well as melody completion, benchmarking their performance against established models. Our results demonstrate that LLMs can infer rudimentary musical structures and temporal relationships from text, highlighting both their potential to implicitly encode musical patterns and their limitations due to a lack of explicit musical context, shedding light on their generative capabilities for symbolic music.
Article 1233
Title@2025-07-17 (4): PMKLC: Parallel Multi-Knowledge Learning-based Lossless Compression for Large-Scale Genomics Database
Title: PMKLC: Parallel Multi-Knowledge Learning-based Lossless Compression for Large-Scale Genomics Database | PMKLC: Parallele Multi-Knowledge Learning-basierte Lossless-Kompression für großformatige Genomics-Datenbank | PMKLC: 大型基因组数据库的平行多知识学习-无损失压缩 2507.12805v1 |
Authors (8): Hui Sun, Yanfeng Ding, Liping Yi, Huidong Ma, Gang Wang, Xiaoguang Liu, Cheng Zhong, Wentong Cai
Learning-based lossless compressors play a crucial role in large-scale genomic database backup, storage, transmission, and management. However, their 1) inadequate compression ratio, 2) low compression and decompression throughput, and 3) poor compression robustness limit their widespread adoption and application in both industry and academia. To solve those challenges, we propose a novel Parallel Multi-Knowledge Learning-based Compressor (PMKLC) with four crucial designs: 1) we propose an automated multi-knowledge learning-based compression framework as the compressor's backbone to enhance compression ratio and robustness; 2) we design a GPU-accelerated (s,k)-mer encoder to optimize compression throughput and computing resource usage; 3) we introduce data block partitioning and Step-wise Model Passing (SMP) mechanisms for parallel acceleration; 4) we design two compression modes, PMKLC-S and PMKLC-M, to meet complex application scenarios, where the former runs on a resource-constrained single GPU and the latter is multi-GPU accelerated. We benchmark PMKLC-S/M and 14 baselines (7 traditional and 7 learning-based) on 15 real-world datasets with different species and data sizes. Compared to baselines on the testing datasets, PMKLC-S/M achieve average compression ratio improvements of up to 73.609% and 73.480%, and average throughput improvements of up to 3.036x and 10.710x, respectively. In addition, PMKLC-S/M achieve the best robustness and competitive memory cost, indicating greater stability against datasets with different probability distribution perturbations and a strong ability to run on memory-constrained devices.
Article 1234
Title@2025-07-17 (4): Physics-Informed Linear Model (PILM): Analytical Representations and Application to Crustal Strain Rate Estimation
Title: Physics-Informed Linear Model (PILM): Analytical Representations and Application to Crustal Strain Rate Estimation | Physik-informiertes Linearmodell (PILM): Analytische Darstellungen und Anwendung auf Crustal Strain Rate Abschätzung | 物理内建线性模型(PILM):对结壳定流速率估计的分析说明和应用 2507.12218v2 |
Authors (1): Tomohisa Okazaki
Many physical systems are described by partial differential equations (PDEs), and solving these equations and estimating their coefficients or boundary conditions (BCs) from observational data play a crucial role in understanding the associated phenomena. Recently, a machine learning approach known as physics-informed neural network, which solves PDEs using neural networks by minimizing the sum of residuals from the PDEs, BCs, and data, has gained significant attention in the scientific community. In this study, we investigate a physics-informed linear model (PILM) that uses linear combinations of basis functions to represent solutions, thereby enabling an analytical representation of optimal solutions. The PILM was formulated and verified for illustrative forward and inverse problems including cases with uncertain BCs. Furthermore, the PILM was applied to estimate crustal strain rates using geodetic data. Specifically, physical regularization that enforces elastic equilibrium on the velocity fields was compared with mathematical regularization that imposes smoothness constraints. From a Bayesian perspective, mathematical regularization exhibited superior performance. The PILM provides an analytically solvable framework applicable to linear forward and inverse problems, underdetermined systems, and physical regularization.
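To make the "analytical representation" concrete: when the solution is a linear combination of fixed basis functions and the PDE is linear, the PDE and boundary-condition residuals are linear in the weights, so minimizing their summed squares is an ordinary least-squares problem. The sketch below illustrates this for a toy forward problem, u''(x) = -pi^2 sin(pi x) on [0, 1] with u(0) = u(1) = 0, whose exact solution is sin(pi x). The monomial basis, the collocation grid, and the equal weighting of residual terms are assumptions chosen for brevity, not the paper's setup.

```python
import numpy as np

K = 10                                  # number of monomial basis functions x^0 .. x^9
ks = np.arange(K)
x_col = np.linspace(0.0, 1.0, 50)       # collocation points for the PDE residual

def basis(x):
    """Matrix of basis values, shape (len(x), K)."""
    return x[:, None] ** ks

def basis_dd(x):
    """Second derivatives: d^2/dx^2 x^k = k (k - 1) x^(k - 2), zero for k < 2."""
    coef = ks * (ks - 1)
    powers = np.clip(ks - 2, 0, None)
    return coef * x[:, None] ** powers

def f(x):
    return -np.pi ** 2 * np.sin(np.pi * x)

# Stack PDE rows and boundary-condition rows into one linear system A w = b.
A = np.vstack([basis_dd(x_col), basis(np.array([0.0, 1.0]))])
b = np.concatenate([f(x_col), [0.0, 0.0]])

# The least-squares weights are the closed-form minimizer of the summed squared residuals.
w, *_ = np.linalg.lstsq(A, b, rcond=None)

x_test = np.linspace(0.0, 1.0, 101)
u_hat = basis(x_test) @ w
max_err = float(np.max(np.abs(u_hat - np.sin(np.pi * x_test))))
print(max_err)
```

Observation rows (basis values at data locations) could be stacked into the same system in the same way, which is how data residuals would enter under this formulation.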
Article 1235
Title@2025-07-17 (4): FLDmamba: Integrating Fourier and Laplace Transform Decomposition with Mamba for Enhanced Time Series Prediction
Title: FLDmamba: Integrating Fourier and Laplace Transform Decomposition with Mamba for Enhanced Time Series Prediction | FLDmamba: Integration von Fourier und Laplace-Transformationszersetzung mit Mamba für verbesserte Zeitreihenvorhersage | FLDmamba:将Fourier和Laple变形变形变形与Mamba结合,以提高时间序列预测 2507.12803v1 |
Authors (8): Qianru Zhang, Chenglei Yu, Haixin Wang, Yudong Yan, Yuansheng Cao, Siu-Ming Yiu, Tailin Wu, Hongzhi Yin
Time series prediction, a crucial task across various domains, faces significant challenges due to the inherent complexities of time series data, including non-stationarity, multi-scale periodicity, and transient dynamics, particularly when tackling long-term predictions. While Transformer-based architectures have shown promise, their quadratic complexity with sequence length hinders their efficiency for long-term predictions. Recent advancements in State-Space Models, such as Mamba, offer a more efficient alternative for long-term modeling, but they cannot capture multi-scale periodicity and transient dynamics effectively. They are also susceptible to noise in time series data. This paper proposes a novel framework, FLDmamba (Fourier and Laplace Transform Decomposition Mamba), addressing these limitations. FLDmamba leverages the strengths of both Fourier and Laplace transforms to capture multi-scale periodicity and transient dynamics within time series data and to improve the model's robustness to data noise. Our extensive experiments demonstrate that FLDmamba achieves superior performance on time series prediction benchmarks, outperforming both Transformer-based and other Mamba-based architectures. To promote the reproducibility of our method, we have made both the code and data accessible at https://github.com/AI4Science-WestlakeU/FLDmamba.
Article 1236
Title@2025-07-17 (4): ReCode: Updating Code API Knowledge with Reinforcement Learning
Title: ReCode: Updating Code API Knowledge with Reinforcement Learning | ReCode: Aktualisierung von Code-API-Kenntnissen mit Verstärkungslernen | ReCode:更新法规API知识与强化学习 2506.20495v2 |
Authors (5): Haoze Wu, Yunzhi Yao, Wenhao Yu, Huajun Chen, Ningyu Zhang
Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. This critical limitation, stemming from reliance on outdated API knowledge from their training data, even with access to current documentation, impedes reliable code generation in dynamic environments. To tackle this issue, we propose ReCode (rule-based Reinforcement learning for Code Update), a novel framework that mimics human programmer adaptation to API changes. Specifically, we construct a dataset of approximately 2,000 data entries to train the LLMs to perform version migration based on updated information. Then, we introduce a modified string similarity metric for code evaluation as the reward for reinforcement learning. Our experiments demonstrate that ReCode substantially boosts LLMs’ code generation performance in dynamic API scenarios, especially on the unseen CodeUpdateArena task. Crucially, compared to supervised fine-tuning, ReCode has less impact on LLMs’ general code generation abilities. We apply ReCode to various LLMs and reinforcement learning algorithms (GRPO and DAPO), all achieving consistent improvements. Notably, after training, Qwen2.5-Coder-7B outperforms both the 32B-parameter code instruction-tuned model and the reasoning model with the same architecture. Code is available at https://github.com/zjunlp/ReCode.
Article 1237
Title@2025-07-17 (4): Beyond Architectures: Evaluating the Role of Contextual Embeddings in Detecting Bipolar Disorder on Social Media
Title: Beyond Architectures: Evaluating the Role of Contextual Embeddings in Detecting Bipolar Disorder on Social Media | Beyond Architectures: Bewertung der Rolle kontextueller Einbettungen bei der Erkennung bipolarer Störungen in sozialen Medien | 超越建筑:评价背景嵌入在发现社会媒体两极分极分崩离析现象中的作用 2507.14231v1 |
Authors (2): Khalid Hasan, Jamil Saquer
Bipolar disorder is a chronic mental illness frequently underdiagnosed due to subtle early symptoms and social stigma. This paper explores advanced natural language processing (NLP) models for recognizing signs of bipolar disorder based on user-generated social media text. We conduct a comprehensive evaluation of transformer-based models (BERT, RoBERTa, ALBERT, ELECTRA, DistilBERT) and Long Short Term Memory (LSTM) models based on contextualized (BERT) and static (GloVe, Word2Vec) word embeddings. Experiments were performed on a large, annotated dataset of Reddit posts after confirming their validity through sentiment variance and judgmental analysis. Our results demonstrate that RoBERTa achieves the highest performance among transformer models with an F1 score of ~98%, while LSTM models using BERT embeddings yield nearly identical results. In contrast, LSTMs trained on static embeddings fail to capture meaningful patterns, scoring near-zero F1. These findings underscore the critical role of contextual language modeling in detecting bipolar disorder. In addition, we report model training times and highlight that DistilBERT offers an optimal balance between efficiency and accuracy. In general, our study offers actionable insights for model selection in mental health NLP applications and validates the potential of contextualized language models to support early bipolar disorder screening.
Article 1238
Title@2025-07-17 (4): Multi-Channel Graph Neural Network for Financial Risk Prediction of NEEQ Enterprises
Title: Multi-Channel Graph Neural Network for Financial Risk Prediction of NEEQ Enterprises | Multi-Channel Graph Neural Network for Financial Risk Prediction of NEEQ Enterprises | NEEQ企业金融风险预测多通道图图神经网络 2507.12787v1 |
Authors (1): Jianyu Zhu
With the continuous evolution of China’s multi-level capital market, the National Equities Exchange and Quotations (NEEQ), also known as the “New Third Board,” has become a critical financing platform for small and medium-sized enterprises (SMEs). However, due to their limited scale and financial resilience, many NEEQ-listed companies face elevated risks of financial distress. To address this issue, we propose a multi-channel deep learning framework that integrates structured financial indicators, textual disclosures, and enterprise relationship data for comprehensive financial risk prediction. Specifically, we design a Triple-Channel Graph Isomorphism Network (GIN) that processes numeric, textual, and graph-based inputs separately. These modality-specific representations are fused using an attention-based mechanism followed by a gating unit to enhance robustness and prediction accuracy. Experimental results on data from 7,731 real-world NEEQ companies demonstrate that our model significantly outperforms traditional machine learning methods and single-modality baselines in terms of AUC, Precision, Recall, and F1 Score. This work provides theoretical and practical insights into risk modeling for SMEs and offers a data-driven tool to support financial regulators and investors.
Article 1239
Title@2025-07-17 (4): COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark
Title: COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark | COREVQA: Eine Crowd-Beobachtung und Begründung zur Detaillierung Visual Question Answering Benchmark | COREVQA: 聚众观察和理性视觉问题回答基准 2507.13405v1 |
Authors (9): Ishant Chintapatla, Kazuma Choji, Naaisha Agarwal, Andrew Lin, Hannah You, Charles Duong, Kevin Zhu, Sean O’Brien, Vasu Sharma
Recently, many benchmarks and datasets have been developed to evaluate Vision-Language Models (VLMs) using visual question answering (VQA) pairs, and models have shown significant accuracy improvements. However, these benchmarks rarely test the model’s ability to accurately complete visual entailment, for instance, accepting or refuting a hypothesis based on the image. To address this, we propose COREVQA (Crowd Observations and Reasoning Entailment), a benchmark of 5608 image and synthetically generated true/false statement pairs, with images derived from the CrowdHuman dataset, to provoke visual entailment reasoning on challenging crowded images. Our results show that even the top-performing VLMs achieve accuracy below 80%, with other models performing substantially worse (39.98%-69.95%). This significant performance gap reveals key limitations in VLMs’ ability to reason over certain types of image-question pairs in crowded scenes.
Article 1240
Title@2025-07-17 (4): Compact Vision Transformer by Reduction of Kernel Complexity
Title: Compact Vision Transformer by Reduction of Kernel Complexity | Kompakter Vision Transformer durch Reduktion der Kernelkomplexität | 减少内核复杂度,实现全球契约愿景转型 2507.12780v1 |
Authors (2): Yancheng Wang, Yingzhen Yang
Self-attention and transformer architectures have become foundational components in modern deep learning. Recent efforts have integrated transformer blocks into compact neural architectures for computer vision, giving rise to various efficient vision transformers. In this work, we introduce Transformer with Kernel Complexity Reduction, or KCR-Transformer, a compact transformer block equipped with differentiable channel selection, guided by a novel and sharp theoretical generalization bound. KCR-Transformer performs input/output channel selection in the MLP layers of transformer blocks to reduce the computational cost. Furthermore, we provide a rigorous theoretical analysis establishing a tight generalization bound for networks equipped with KCR-Transformer blocks. Leveraging these theoretical results, the channel pruning by KCR-Transformer is conducted in a generalization-aware manner, ensuring that the resulting network retains a provably small generalization error. Our KCR-Transformer is compatible with many popular and compact transformer networks, such as ViT and Swin, and it reduces the FLOPs of the vision transformers while maintaining or even improving the prediction accuracy. In the experiments, we replace all the transformer blocks in the vision transformers with KCR-Transformer blocks, leading to KCR-Transformer networks with different backbones. The resulting KCR-Transformers achieve superior performance on various computer vision tasks, outperforming the original models with fewer FLOPs and parameters.
Article 1241
Title@2025-07-17 (4): Demystifying MuZero Planning: Interpreting the Learned Model
Title: Demystifying MuZero Planning: Interpreting the Learned Model | MuZero-Planung entmystifizieren: Das gelernte Modell interpretieren | 消除神秘的 “ 零零规划 “ :解释 “ 总结经验 “ 模式 2411.04580v2 |
Authors (4): Hung Guei, Yan-Ru Ju, Wei-Yu Chen, Ti-Rong Wu
MuZero has achieved superhuman performance in various games by using a dynamics network to predict the environment dynamics for planning, without relying on simulators. However, the latent states learned by the dynamics network make its planning process opaque. This paper aims to demystify MuZero’s model by interpreting the learned latent states. We incorporate observation reconstruction and state consistency into MuZero training and conduct an in-depth analysis to evaluate latent states across two board games: 9x9 Go and Gomoku, and three Atari games: Breakout, Ms. Pacman, and Pong. Our findings reveal that while the dynamics network becomes less accurate over longer simulations, MuZero still performs effectively by using planning to correct errors. Our experiments also show that the dynamics network learns better latent states in board games than in Atari games. These insights contribute to a better understanding of MuZero and offer directions for future research to improve the performance, robustness, and interpretability of the MuZero algorithm. The code and data are available at https://rlg.iis.sinica.edu.tw/papers/demystifying-muzero-planning.
Article 1242
Title@2025-07-17 (4): A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models
Title: A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models | Eine umfassende Umfrage zur elektronischen Gesundheitsdatenmodellierung: Von Deep Learning Ansätzen bis hin zu großen Sprachmodellen | 《电子健康记录模型综合调查:从深学习方法到大语言模式》 2507.12774v1 |
Authors (5): Weijieying Ren, Jingxi Zhu, Zehao Liu, Tianxiang Zhao, Vasant Honavar
Artificial intelligence (AI) has demonstrated significant potential in transforming healthcare through the analysis and modeling of electronic health records (EHRs). However, the inherent heterogeneity, temporal irregularity, and domain-specific nature of EHR data present unique challenges that differ fundamentally from those in vision and natural language tasks. This survey offers a comprehensive overview of recent advancements at the intersection of deep learning, large language models (LLMs), and EHR modeling. We introduce a unified taxonomy that spans five key design dimensions: data-centric approaches, neural architecture design, learning-focused strategies, multimodal learning, and LLM-based modeling systems. Within each dimension, we review representative methods addressing data quality enhancement, structural and temporal representation, self-supervised learning, and integration with clinical knowledge. We further highlight emerging trends such as foundation models, LLM-driven clinical agents, and EHR-to-text translation for downstream reasoning. Finally, we discuss open challenges in benchmarking, explainability, clinical alignment, and generalization across diverse clinical settings. This survey aims to provide a structured roadmap for advancing AI-driven EHR modeling and clinical decision support. For a comprehensive list of EHR-related methods, kindly refer to https://survey-on-tabular-data.github.io/.
Article 1243
Title@2025-07-17 (4): Sample-Constrained Black Box Optimization for Audio Personalization
Title: Sample-Constrained Black Box Optimization for Audio Personalization | Sample-Constrained Black Box Optimierung für Audio-Personalisierung | 优化音频个性化 2507.12773v1 |
Authors (3): Rajalaxmi Rajagopalan, Yu-Lin Wei, Romit Roy Choudhury
We consider the problem of personalizing audio to maximize user experience. Briefly, we aim to find a filter $h^*$, which applied to any music or speech, will maximize the user’s satisfaction. This is a black-box optimization problem since the user’s satisfaction function is unknown. Substantive work has been done on this topic where the key idea is to play audio samples to the user, each shaped by a different filter $h_i$, and query the user for their satisfaction scores $f(h_i)$. A family of "surrogate" functions is then designed to fit these scores and the optimization method gradually refines these functions to arrive at the filter $\hat{h}^*$ that maximizes satisfaction. In certain applications, we observe that a second type of querying is possible where users can tell us the individual elements $h^*[j]$ of the optimal filter $h^*$. Consider an analogy from cooking where the goal is to cook a recipe that maximizes user satisfaction. A user can be asked to score various cooked recipes (e.g., tofu fried rice) or to score individual ingredients (say, salt, sugar, rice, chicken, etc.). Given a budget of $B$ queries, where a query can be of either type, our goal is to find the recipe that will maximize this user’s satisfaction. Our proposal builds on Sparse Gaussian Process Regression (GPR) and shows how a hybrid approach can outperform any one type of querying. Our results are validated through simulations and real world experiments, where volunteers gave feedback on music/speech audio and were able to achieve high satisfaction levels. We believe this idea of hybrid querying opens new problems in black-box optimization and solutions can benefit other applications beyond audio personalization.
Article 1244
Title@2025-07-17 (4): AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation
Title: AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation | AnyPos: Automatisierte Task-Agnostische Aktionen zur bimanuellen Manipulation | 任何 波 : 用于二手操纵的自动任务- 不可允许动作 2507.12768v1 |
Authors (8): Hengkai Tan, Yao Feng, Xinyi Mao, Shuhe Huang, Guodong Liu, Zhongkai Hao, Hang Su, Jun Zhu
Vision-language-action (VLA) models have shown promise on task-conditioned control in complex settings such as bimanual manipulation. However, the heavy reliance on task-specific human demonstrations limits their generalization and incurs high data acquisition costs. In this work, we present a new notion of task-agnostic action paradigm that decouples action execution from task-specific conditioning, enhancing scalability, efficiency, and cost-effectiveness. To address the data collection challenges posed by this paradigm – such as low coverage density, behavioral redundancy, and safety risks – we introduce ATARA (Automated Task-Agnostic Random Actions), a scalable self-supervised framework that accelerates collection by over $ 30\times $ compared to human teleoperation. To further enable effective learning from task-agnostic data, which often suffers from distribution mismatch and irrelevant trajectories, we propose AnyPos, an inverse dynamics model equipped with Arm-Decoupled Estimation and a Direction-Aware Decoder (DAD). We additionally integrate a video-conditioned action validation module to verify the feasibility of learned policies across diverse manipulation tasks. Extensive experiments show that the AnyPos-ATARA pipeline yields a 51% improvement in test accuracy and achieves 30-40% higher success rates in downstream tasks such as lifting, pick-and-place, and clicking, using replay-based video validation. Project Page: https://embodiedfoundation.github.io/vidar_anypos
Article 1245
Title@2025-07-17 (4): Layer Separation Deep Learning Model with Auxiliary Variables for Partial Differential Equations
Title: Layer Separation Deep Learning Model with Auxiliary Variables for Partial Differential Equations | Ebenentrennung Deep Learning Modell mit Hilfsvariablen für partielle Differentialgleichungen | 图层分离深学习模型,带有局部差异等量的辅助变量 2507.12766v1 |
Authors (2): Yaru Liu, Yiqi Gu
In this paper, we propose a new optimization framework, the layer separation (LySep) model, to improve the deep learning-based methods in solving partial differential equations. Due to the highly non-convex nature of the loss function in deep learning, existing optimization algorithms often converge to suboptimal local minima or suffer from gradient explosion or vanishing, resulting in poor performance. To address these issues, we introduce auxiliary variables to separate the layers of deep neural networks. Specifically, the output and its derivatives of each layer are represented by auxiliary variables, effectively decomposing the deep architecture into a series of shallow architectures. New loss functions with auxiliary variables are established, in which only variables from two neighboring layers are coupled. Corresponding algorithms based on alternating directions are developed, where many variables can be updated optimally in closed forms. Moreover, we provide theoretical analyses demonstrating the consistency between the LySep model and the original deep model. High-dimensional numerical results validate our theory and demonstrate the advantages of LySep in minimizing loss and reducing solution error.
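As a rough illustration of the layer separation idea, consider a plain fully connected network and ignore the derivative variables that the abstract also mentions. All symbols below ($W_\ell$, $b_\ell$, $a_\ell$, $\sigma$, $\beta_\ell$, $\mathcal{L}_{\mathrm{PDE}}$) are assumed notation for this sketch and are not taken from the paper. Each hidden activation is replaced by a free auxiliary variable, and a penalty ties it to the layer below, so every penalty term couples only two neighboring layers.

```latex
% Illustrative sketch only; notation is assumed, not the paper's.
\begin{aligned}
\text{original network:}\quad
  & u_\theta(x) = W_L\,\sigma\bigl(W_{L-1}\cdots\sigma(W_1 x + b_1)\cdots + b_{L-1}\bigr) + b_L,\\
\text{auxiliary variables:}\quad
  & a_0 = x,\qquad a_\ell \approx \sigma(W_\ell a_{\ell-1} + b_\ell),\quad \ell = 1,\dots,L-1,\\
\text{separated loss:}\quad
  & \mathcal{L}(W,b,a) = \mathcal{L}_{\mathrm{PDE}}\bigl(W_L a_{L-1} + b_L\bigr)
    + \sum_{\ell=1}^{L-1} \beta_\ell \,\bigl\| a_\ell - \sigma(W_\ell a_{\ell-1} + b_\ell) \bigr\|^2 .
\end{aligned}
```

With the other variables held fixed, the problem in $W_L$ and $b_L$ is linear least squares whenever $\mathcal{L}_{\mathrm{PDE}}$ is quadratic in the output, which is one plausible reading of the abstract's remark that many variables admit closed-form updates.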
Article 1246
Title@2025-07-17 (4): Golden Noise for Diffusion Models: A Learning Framework
Title: Golden Noise for Diffusion Models: A Learning Framework | Goldene Geräusche für Diffusionsmodelle: Ein Lernrahmen | 传播模型的黄金噪音:学习框架 2411.09502v5 |
Authors (7): Zikai Zhou, Shitong Shao, Lichen Bai, Shufei Zhang, Zhiqiang Xu, Bo Han, Zeke Xie
The text-to-image diffusion model is a popular paradigm that synthesizes personalized images from a text prompt and a random Gaussian noise. While people observe that some noises are "golden noises" that can achieve better text-image alignment and higher human preference than others, we still lack a machine learning framework to obtain those golden noises. To learn golden noises for diffusion sampling, we mainly make three contributions in this paper. First, we identify a new concept termed the noise prompt, which aims at turning a random Gaussian noise into a golden noise by adding a small desirable perturbation derived from the text prompt. Following the concept, we first formulate the noise prompt learning framework that systematically learns "prompted" golden noise associated with a text prompt for diffusion models. Second, we design a noise prompt data collection pipeline and collect a large-scale noise prompt dataset (NPD) that contains 100k pairs of random noises and golden noises with the associated text prompts. With the prepared NPD as the training dataset, we trained a small noise prompt network (NPNet) that can directly learn to transform a random noise into a golden noise. The learned golden noise perturbation can be considered as a kind of prompt for noise, as it is rich in semantic information and tailored to the given text prompt. Third, our extensive experiments demonstrate the impressive effectiveness and generalization of NPNet on improving the quality of synthesized images across various diffusion models, including SDXL, DreamShaper-xl-v2-turbo, and Hunyuan-DiT. Moreover, NPNet is a small and efficient controller that acts as a plug-and-play module with very limited additional inference and computational costs, as it just provides a golden noise instead of a random noise without accessing the original pipeline.
Article 1247
Title@2025-07-17 (4): TBDetector:Transformer-Based Detector for Advanced Persistent Threats with Provenance Graph
Title: TBDetector:Transformer-Based Detector for Advanced Persistent Threats with Provenance Graph | TBDetector:Transformer-basierter Detektor für erweiterte persistente Bedrohungen mit Provenienzgraph | TB 检测器:用证明图测出先进持久性威胁的转移前检测器 2304.02838v2 |
Authors (10): Nan Wang, Xuezhi Wen, Dalin Zhang, Xibin Zhao, Jiahui Ma, Mengxia Luo, Fan Xu, Sen Nie, Shi Wu, Jiqiang Liu
Advanced Persistent Threats (APTs) are difficult to detect because of their long-term latency and their covert, slow, multistage attack patterns. To tackle these issues, we propose TBDetector, a transformer-based method for APT attack detection. Considering that provenance graphs provide rich historical information and can correlate past attack activity to identify anomalous behavior, TBDetector employs provenance analysis for APT detection, which summarizes long-running system execution with space efficiency, and utilizes a transformer with a self-attention based encoder-decoder to extract long-term contextual features of system states to detect slow-acting attacks. Furthermore, we introduce anomaly scores to investigate the anomaly of different system states, where each state is assigned an anomaly score derived from its similarity score and isolation score. To evaluate the effectiveness of the proposed method, we have conducted experiments on five public datasets, i.e., streamspot, cadets, shellshock, clearscope, and wget_baseline. Experimental results and comparisons with state-of-the-art methods show that the proposed method performs better.
Article 1248
Title@2025-07-17 (4): World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving
Title: World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving | Weltmodellbasierte End-to-End-Szenengenerierung für Unfallvorhersage im autonomen Fahren | 以世界模式为基础的在自主驾驶中事故预防端至终点到终点示范景点一代 2507.12762v1 |
Authors (6): Yanchen Guan, Haicheng Liao, Chengyue Wang, Xingcheng Liu, Jiaxun Zhang, Zhenning Li
Reliable anticipation of traffic accidents is essential for advancing autonomous driving systems. However, this objective is limited by two fundamental challenges: the scarcity of diverse, high-quality training data and the frequent absence of crucial object-level cues due to environmental disruptions or sensor deficiencies. To tackle these issues, we propose a comprehensive framework combining generative scene augmentation with adaptive temporal reasoning. Specifically, we develop a video generation pipeline that utilizes a world model guided by domain-informed prompts to create high-resolution, statistically consistent driving scenarios, particularly enriching the coverage of edge cases and complex interactions. In parallel, we construct a dynamic prediction model that encodes spatio-temporal relationships through strengthened graph convolutions and dilated temporal operators, effectively addressing data incompleteness and transient visual noise. Furthermore, we release a new benchmark dataset designed to better capture diverse real-world driving risks. Extensive experiments on public and newly released datasets confirm that our framework enhances both the accuracy and lead time of accident anticipation, offering a robust solution to current data and modeling limitations in safety-critical autonomous driving applications.
Article 1249
Title@2025-07-17 (4): Faster and Space Efficient Indexing for Locality Sensitive Hashing
Title: Faster and Space Efficient Indexing for Locality Sensitive Hashing | Schnellere und raumsparende Indexierung für Lokalitätssensitive Hashing | 地方敏感散列更快和空间高效索引编制 2503.06737v2 |
Authors (2): Bhisham Dev Verma, Rameshwar Pratap
This work suggests faster and space-efficient index construction algorithms for LSH for Euclidean distance (a.k.a. \ELSH) and cosine similarity (a.k.a. \SRP). The index construction step of these LSHs relies on grouping data points into several bins of hash tables based on their hashcode. To generate an $m$-dimensional hashcode of the $d$-dimensional data point, these LSHs first project the data point onto a $d$-dimensional random Gaussian vector and then discretise the resulting inner product. The time and space complexity of both \ELSH and \SRP for computing an $m$-sized hashcode of a $d$-dimensional vector is $O(md)$, which becomes impractical for large values of $m$ and $d$. To overcome this problem, we propose two alternative LSH hashcode generation algorithms for each setting: \CSELSH and \HCSELSH for Euclidean distance, and \CSSRP and \HCSSRP for cosine similarity. \CSELSH and \CSSRP are based on count sketch \cite{count_sketch}, and \HCSELSH and \HCSSRP utilize higher-order count sketch \cite{shi2019higher}. These proposals significantly reduce the hashcode computation time from $O(md)$ to $O(d)$. Additionally, both \CSELSH and \CSSRP reduce the space complexity from $O(md)$ to $O(d)$, while \HCSELSH and \HCSSRP reduce it from $O(md)$ to $O(N \sqrt[N]{d})$, where $N\geq 1$ denotes the size of the input/reshaped tensor. Our proposals are backed by strong mathematical guarantees, and we validate their performance through simulations on various real-world datasets.
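The complexity gap described above can be sketched in a few lines of NumPy. The snippet contrasts a dense sign-random-projection hash, which needs $m$ inner products of length $d$, with a count-sketch style hash that touches each coordinate once. The function names, the bucket and sign arrays, and the choice to take the sign of each sketch bucket are assumptions made for illustration; the paper's exact constructions may differ.

```python
import numpy as np

def srp_hash(x, projections):
    """Dense sign random projection: m inner products, O(m*d) time and space."""
    return (projections @ x >= 0).astype(np.uint8)

def count_sketch_srp_hash(x, buckets, signs, m):
    """Count-sketch variant (illustrative): one pass over x, O(d) time and space."""
    sketch = np.zeros(m)
    # Each coordinate is added, with a random sign, to exactly one of m buckets.
    np.add.at(sketch, buckets, signs * x)
    return (sketch >= 0).astype(np.uint8)

rng = np.random.default_rng(0)
d, m = 1024, 64
x = rng.standard_normal(d)

projections = rng.standard_normal((m, d))   # m*d random numbers
buckets = rng.integers(0, m, size=d)        # d bucket indices
signs = rng.choice([-1.0, 1.0], size=d)     # d random signs

code_dense = srp_hash(x, projections)
code_sketch = count_sketch_srp_hash(x, buckets, signs, m)
```

The dense hash stores `m * d` Gaussian entries, while the sketch stores only two length-`d` arrays, which mirrors the $O(md)$ versus $O(d)$ storage figures quoted in the abstract.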
Article 1250
Title@2025-07-17 (4): A Comprehensive Survey of Synthetic Tabular Data Generation
Title: A Comprehensive Survey of Synthetic Tabular Data Generation | Eine umfassende Übersicht über die Erstellung von synthetischen Tabellendaten | 合成图表数据生成综合调查 2504.16506v3 |
Authors (6): Ruxue Shi, Yili Wang, Mengnan Du, Xu Shen, Yi Chang, Xin Wang
Tabular data is one of the most prevalent and important data formats in real-world applications such as healthcare, finance, and education. However, its effective use in machine learning is often constrained by data scarcity, privacy concerns, and class imbalance. Synthetic tabular data generation has emerged as a powerful solution, leveraging generative models to learn underlying data distributions and produce realistic, privacy-preserving samples. Although this area has seen growing attention, most existing surveys focus narrowly on specific methods (e.g., GANs or privacy-enhancing techniques), lacking a unified and comprehensive view that integrates recent advances such as diffusion models and large language models (LLMs). In this survey, we present a structured and in-depth review of synthetic tabular data generation methods. Specifically, the survey is organized into three core components: (1) Background, which covers the overall generation pipeline, including problem definitions, synthetic tabular data generation methods, post processing, and evaluation; (2) Generation Methods, where we categorize existing approaches into traditional generation methods, diffusion model methods, and LLM-based methods, and compare them in terms of architecture, generation quality, and applicability; and (3) Applications and Challenges, which summarizes practical use cases, highlights common datasets, and discusses open challenges such as heterogeneity, data fidelity, and privacy protection. This survey aims to provide researchers and practitioners with a holistic understanding of the field and to highlight key directions for future work in synthetic tabular data generation.
Article 1251
Title@2025-07-17 (4): Domain-Enhanced Dual-Branch Model for Efficient and Interpretable Accident Anticipation
Title: Domain-Enhanced Dual-Branch Model for Efficient and Interpretable Accident Anticipation | Domain-Enhanced Dual-Branch-Modell für effiziente und interpretierbare Unfallvorhersage | 高效和可解释的意外事故预测的强化双重-双重-双重强化模式 2507.12755v1 |
Authors (7): Yanchen Guan, Haicheng Liao, Chengyue Wang, Bonan Wang, Jiaxun Zhang, Jia Hu, Zhenning Li
Developing precise and computationally efficient traffic accident anticipation systems is crucial for contemporary autonomous driving technologies, enabling timely intervention and loss prevention. In this paper, we propose an accident anticipation framework employing a dual-branch architecture that effectively integrates visual information from dashcam videos with structured textual data derived from accident reports. Furthermore, we introduce a feature aggregation method that facilitates seamless integration of multimodal inputs through large models (GPT-4o, Long-CLIP), complemented by targeted prompt engineering strategies to produce actionable feedback and standardized accident archives. Comprehensive evaluations conducted on benchmark datasets (DAD, CCD, and A3D) validate the superior predictive accuracy, enhanced responsiveness, reduced computational overhead, and improved interpretability of our approach, thus establishing a new benchmark for state-of-the-art performance in traffic accident anticipation.
Article 1252
Title@2025-07-17 (4): Multimodal-Guided Dynamic Dataset Pruning for Robust and Efficient Data-Centric Learning
Title: Multimodal-Guided Dynamic Dataset Pruning for Robust and Efficient Data-Centric Learning | Multimodal geführtes dynamisches Datenset Pruning für robustes und effizientes datenzentrales Lernen | 灵活、高效、高效的数据中心学习的多式指导动态数据集 2507.12750v1 |
Authors (8): Suorong Yang, Peijia Li, Yujie Liu, Zhiming Xu, Peng Ye, Wanli Ouyang, Furao Shen, Dongzhan Zhou
Modern deep models are trained on large real-world datasets, where data quality varies and redundancy is common. Data-centric approaches such as dataset pruning have shown promise in improving training efficiency and model performance. However, most existing methods rely on static heuristics or task-specific metrics, limiting their robustness and generalizability across domains. In this work, we introduce a dynamic dataset pruning framework that adaptively selects training samples based on both task-driven difficulty and cross-modality semantic consistency. By incorporating supervision from pretrained multimodal foundation models, our approach captures training dynamics while effectively filtering out uninformative samples. Our work highlights the potential of integrating cross-modality alignment for robust sample selection, advancing data-centric learning toward more efficient and robust practices across application domains.
Article 1253
Title@2025-07-17 (4): Learning Universal Human Mobility Patterns with a Foundation Model for Cross-domain Data Fusion
Title: Learning Universal Human Mobility Patterns with a Foundation Model for Cross-domain Data Fusion | Lernen von universellen Mobilitätsmustern mit einem Basismodell für die domänenübergreifende Datenfusion | 具有跨领域数据融合基础模型的学习通用人类流动模式 2503.15779v2 |
Authors (7): Haoxuan Ma, Xishun Liao, Yifan Liu, Qinhua Jiang, Chris Stanford, Shangqing Cao, Jiaqi Ma
Human mobility modeling is critical for urban planning and transportation management, yet existing approaches often lack the integration capabilities needed to handle diverse data sources. We present a foundation model framework for universal human mobility patterns that leverages cross-domain data fusion and large language models to address these limitations. Our approach integrates multi-modal data of distinct nature and spatio-temporal resolution, including geographical, mobility, socio-demographic, and traffic information, to construct a privacy-preserving and semantically enriched human travel trajectory dataset. Our framework demonstrates adaptability through domain transfer techniques that ensure transferability across diverse urban contexts, as evidenced in case studies of Los Angeles (LA) and Egypt. The framework employs LLMs for semantic enrichment of trajectory data, enabling comprehensive understanding of mobility patterns. Quantitative evaluation shows that our generated synthetic dataset accurately reproduces mobility patterns observed in empirical data. The practical utility of this foundation model approach is demonstrated through large-scale traffic simulations for LA County, where results align well with observed traffic data. On California’s I-405 corridor, the simulation yields a Mean Absolute Percentage Error of 5.85% for traffic volume and 4.36% for speed compared to Caltrans PeMS observations, illustrating the framework’s potential for intelligent transportation systems and urban mobility applications.
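For readers unfamiliar with the metric quoted above, Mean Absolute Percentage Error averages the absolute simulation error relative to each observed value. The sketch below uses made-up traffic volumes purely to show the calculation; the function name and the numbers are illustrative and are not data from the paper.

```python
def mean_absolute_percentage_error(observed, simulated):
    """Return MAPE in percent; observed values must be non-zero."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(s - o) / abs(o) for o, s in zip(observed, simulated)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical hourly traffic volumes (vehicles/hour), not data from the paper.
observed_volume = [1000.0, 1200.0, 800.0]
simulated_volume = [950.0, 1260.0, 840.0]
volume_mape = mean_absolute_percentage_error(observed_volume, simulated_volume)
```

Because every error is divided by the observed value, hours with very low observed volume can dominate the average, which is worth keeping in mind when reading MAPE figures for traffic data.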
Article 1254
Title@2025-07-17 (4): How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction
Title: How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction | Wie wirkt sich Beschriftungsfehler auf das kontrasive Lernen aus? Eine Perspektive aus der Datendimensionalitätsreduktion | 标签错误影响差异影响学习如何进行? 减少数据多维度的视角 2507.11161v2 |
Authors (4): Jun Chen, Hong Chen, Yonghua Yu, Yiming Ying
In recent years, contrastive learning has achieved state-of-the-art performance in self-supervised representation learning. Many previous works have attempted to provide the theoretical understanding underlying the success of contrastive learning. Almost all of them rely on a default assumption, i.e., the label consistency assumption, which may not hold in practice (the probability of failure is called labeling error) due to the strength and randomness of common augmentation strategies, such as random resized crop (RRC). This paper investigates the theoretical impact of labeling error on the downstream classification performance of contrastive learning. We first reveal several significant negative impacts of labeling error on downstream classification risk. To mitigate these impacts, we apply a data dimensionality reduction method (e.g., singular value decomposition, SVD) to the original data to reduce false positive samples, and we evaluate it both theoretically and empirically. Moreover, we find that SVD acts as a double-edged sword: it may degrade downstream classification accuracy because it reduces the connectivity of the augmentation graph. Based on the above observations, we suggest using a moderate embedding dimension (such as 512 or 1024 in our experiments), data inflation, weak augmentation, and SVD to ensure large graph connectivity and small labeling error, and thereby improve model performance.
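A minimal sketch of SVD-based dimensionality reduction, assuming the data are arranged as a samples-by-features matrix and that "reduction" means keeping the top singular directions. The function name, matrix sizes, and rank are illustrative; the abstract does not specify how the authors apply SVD to their inputs.

```python
import numpy as np

def truncated_svd_reconstruction(data, rank):
    """Keep only the top-`rank` singular components of a samples-by-features matrix."""
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]

rng = np.random.default_rng(0)
# Hypothetical stand-in for flattened inputs: 32 samples, 64 features.
data = rng.standard_normal((32, 64))
reduced = truncated_svd_reconstruction(data, rank=8)
```

Lowering `rank` discards more fine detail, which is the trade-off the abstract describes: fewer spurious positives, but possibly a less connected augmentation graph.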
Article 1255
Title@2025-07-17 (4): Enhancing Quantization-Aware Training on Edge Devices via Relative Entropy Coreset Selection and Cascaded Layer Correction
Title: Enhancing Quantization-Aware Training on Edge Devices via Relative Entropy Coreset Selection and Cascaded Layer Correction | Verbesserung der Quantization-Aware-Schulung auf Edge-Geräten durch relative Entropie-Coreset-Auswahl und kaskaded Layer-Korrektur | 通过相对内心核心选择和层层层校正,加强边缘设备量化-软件培训 2507.17768v1 |
Authors (3): Yujia Tong, Jingling Yuan, Chuang Hu
With the development of mobile and edge computing, the demand for low-bit quantized models on edge devices is increasing to achieve efficient deployment. To enhance the performance, it is often necessary to retrain the quantized models using edge data. However, due to privacy concerns, certain sensitive data can only be processed on edge devices. Therefore, employing Quantization-Aware Training (QAT) on edge devices has become an effective solution. Nevertheless, traditional QAT relies on the complete dataset for training, which incurs a huge computational cost. Coreset selection techniques can mitigate this issue by training on the most representative subsets. However, existing methods struggle to eliminate quantization errors in the model when using small-scale datasets (e.g., only 10% of the data), leading to significant performance degradation. To address these issues, we propose QuaRC, a QAT framework with coresets on edge devices, which consists of two main phases. In the coreset selection phase, QuaRC introduces the "Relative Entropy Score" to identify the subsets that most effectively capture the model’s quantization errors. During the training phase, QuaRC employs the Cascaded Layer Correction strategy to align the intermediate layer outputs of the quantized model with those of the full-precision model, thereby effectively reducing the quantization errors in the intermediate layers. Experimental results demonstrate the effectiveness of our approach. For instance, when quantizing ResNet-18 to 2-bit using a 1% data subset, QuaRC achieves a 5.72% improvement in Top-1 accuracy on the ImageNet-1K dataset compared to state-of-the-art techniques.
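The abstract does not define the Relative Entropy Score, so the following is only a guess at its general shape: score each sample by the KL divergence between the full-precision model's predictive distribution and the quantized model's, then keep the highest-scoring fraction as the coreset. The function names, the direction of the divergence, and the simulated logits are all assumptions for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def relative_entropy_scores(fp_logits, q_logits, eps=1e-12):
    """Per-sample KL(p_fp || p_q) between full-precision and quantized predictions."""
    p = softmax(fp_logits)
    q = softmax(q_logits)
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)

def select_coreset(scores, fraction):
    """Indices of the highest-scoring samples, keeping at least one."""
    k = max(1, int(round(fraction * len(scores))))
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
# Simulated logits: the "quantized" model is the full-precision one plus noise.
fp_logits = rng.standard_normal((100, 10))
q_logits = fp_logits + 0.5 * rng.standard_normal((100, 10))

scores = relative_entropy_scores(fp_logits, q_logits)
coreset = select_coreset(scores, fraction=0.1)
```

Samples where the two models disagree most receive the largest scores, which matches the stated goal of picking the subset that best exposes quantization error.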
Article 1256
Title@2025-07-17 (4): Unifying Explainable Anomaly Detection and Root Cause Analysis in Dynamical Systems
Title: Unifying Explainable Anomaly Detection and Root Cause Analysis in Dynamical Systems | Vereinheitlichung der erklärbaren Anomalienerkennung und der Ursachenanalyse in dynamischen Systemen | 动态系统中不可解释的异常探测和根本原因分析 2502.12086v3 |
Authors (3): Yue Sun, Rick S. Blum, Parv Venkitasubramaniam
Dynamical systems, prevalent in various scientific and engineering domains, are susceptible to anomalies that can significantly impact their performance and reliability. This paper addresses the critical challenges of anomaly detection, root cause localization, and anomaly type classification in dynamical systems governed by ordinary differential equations (ODEs). We define two categories of anomalies: cyber anomalies, which propagate through interconnected variables, and measurement anomalies, which remain localized to individual variables. To address these challenges, we propose the Interpretable Causality Ordinary Differential Equation (ICODE) Networks, a model-intrinsic explainable learning framework. ICODE leverages Neural ODEs for anomaly detection while employing causality inference through an explanation channel to perform root cause analysis (RCA), elucidating why specific time periods are flagged as anomalous. ICODE is designed to simultaneously perform anomaly detection, RCA, and anomaly type classification within a single, interpretable framework. Our approach is grounded in the hypothesis that anomalies alter the underlying ODEs of the system, manifesting as changes in causal relationships between variables. We provide a theoretical analysis of how perturbations in learned model parameters can be utilized to identify anomalies and their root causes in time series data. Comprehensive experimental evaluations demonstrate the efficacy of ICODE across various dynamical systems, showcasing its ability to accurately detect anomalies, classify their types, and pinpoint their origins.
Article 1257
Title@2025-07-17 (4): Multi-View Node Pruning for Accurate Graph Representation
Title: Multi-View Node Pruning for Accurate Graph Representation | Multi-View-Knotenschnitt für eine exakte Graphendarstellung | 多查看节点 精确图表代表 2503.11737v4 |
Authors (6): Hanjin Kim, Jiseong Park, Seojin Kim, Jueun Choi, Doheon Lee, Sung Ju Hwang
Graph pooling, which compresses a whole graph into a smaller coarsened graph, is an essential component of graph representation learning. To efficiently compress a given graph, graph pooling methods often drop nodes using attention-based scores learned from the task loss. However, this often results in simply removing nodes with lower degrees without consideration of their feature-level relevance to the given task. To fix this problem, we propose Multi-View Pruning (MVP), a graph pruning method based on a multi-view framework and reconstruction loss. Given a graph, MVP first constructs multiple graphs for different views, either by utilizing the predefined modalities or by randomly partitioning the input features, to consider the importance of each node from diverse perspectives. Then, it learns the score for each node by considering both the reconstruction and the task loss. MVP can be incorporated with any hierarchical pooling framework to score the nodes. We validate MVP on multiple benchmark datasets by coupling it with two graph pooling methods, and show that it significantly improves the performance of the base graph pooling method, outperforming all baselines. Further analysis shows that both the encoding of multiple views and the consideration of reconstruction loss are key to the success of MVP, and that it indeed identifies nodes that are less important according to domain knowledge.
Article 1258
Title@2025-07-17 (4): Scaling Trends for Data Poisoning in LLMs
Title: Scaling Trends for Data Poisoning in LLMs | Skalierungstrends für Datenvergiftungen in LLMs | LLMM中数据中毒趋势的扩大趋势 2408.02946v6 |
Authors (6): Dillon Bowen, Brendan Murphy, Will Cai, David Khachaturov, Adam Gleave, Kellin Pelrine
LLMs produce harmful and undesirable behavior when trained on datasets containing even a small fraction of poisoned data. We demonstrate that GPT models remain vulnerable to fine-tuning on poisoned data, even when safeguarded by moderation systems. Given the persistence of data poisoning vulnerabilities in today’s most capable models, this paper investigates whether these risks increase with model scaling. We evaluate three threat models – malicious fine-tuning, imperfect data curation, and intentional data contamination – across 24 frontier LLMs ranging from 1.5 to 72 billion parameters. Our experiments reveal that larger LLMs are significantly more susceptible to data poisoning, learning harmful behaviors from even minimal exposure to harmful data more quickly than smaller models. These findings underscore the need for leading AI companies to thoroughly red team fine-tuning APIs before public release and to develop more robust safeguards against data poisoning, particularly as models continue to scale in size and capability.
Article 1259
Title@2025-07-17 (4): From SGD to Spectra: A Theory of Neural Network Weight Dynamics
Title: From SGD to Spectra: A Theory of Neural Network Weight Dynamics | Von SGD zu Spectra: Eine Theorie der neuralen Netzwerkgewichtsdynamik | 从SGD到Spetra:神经网络强度动态理论 2507.12709v1 |
Authors (5): Brian Richard Olsen, Sam Fatehmanesh, Frank Xiao, Adarsh Kumarappan, Anirudh Gajula
Deep neural networks have revolutionized machine learning, yet their training dynamics remain theoretically unclear. We develop a continuous-time, matrix-valued stochastic differential equation (SDE) framework that rigorously connects the microscopic dynamics of SGD to the macroscopic evolution of singular-value spectra in weight matrices. We derive exact SDEs showing that squared singular values follow Dyson Brownian motion with eigenvalue repulsion, and characterize stationary distributions as gamma-type densities with power-law tails, providing the first theoretical explanation for the empirically observed ‘bulk+tail’ spectral structure in trained networks. Through controlled experiments on transformer and MLP architectures, we validate our theoretical predictions and demonstrate quantitative agreement between SDE-based forecasts and observed spectral evolution, providing a rigorous foundation for understanding why deep learning works.
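For orientation, the textbook form of Dyson Brownian motion for $n$ eigenvalues is reproduced below. Here $\beta$ is the Dyson index and the $B_i$ are independent standard Brownian motions; this is the generic random-matrix equation, not the paper's derived SDE, which per the abstract concerns squared singular values under SGD and can be expected to carry additional drift terms.

```latex
% Textbook Dyson Brownian motion (generic form, not the paper's SDE).
d\lambda_i(t) = \sqrt{\frac{2}{\beta}}\, dB_i(t)
  + \sum_{j \neq i} \frac{dt}{\lambda_i(t) - \lambda_j(t)},
  \qquad i = 1, \dots, n .
```

The sum is the repulsion term: as two eigenvalues approach each other, the drift pushing them apart grows without bound, which is why the spectrum spreads rather than collapsing.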
Article 1260
Title@2025-07-17 (4): PinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform
Title: PinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform | PinFM: Gründungsmodell für Benutzeraktivität Sequenzen auf einer Visual Discovery Platform im Milliardenmaßstab | PinFM:十亿规模视觉发现平台用户活动序列基础模型 2507.12704v1 |
Authors (12): Xiangyi Chen, Kousik Rajesh, Matthew Lawhon, Zelun Wang, Hanyu Li, Haomiao Li, Saurabh Vishwas Joshi, Pong Eksombatchai, Jaewon Yang, Yi-Ping Hsu, Jiajing Xu, Charles Rosenberg
User activity sequences have emerged as one of the most important signals in recommender systems. We present a foundational model, PinFM, for understanding user activity sequences across multiple applications at a billion-scale visual discovery platform. We pretrain a transformer model with 20B+ parameters using extensive user activity data, then fine-tune it for specific applications, efficiently coupling it with existing models. While this pretraining-and-fine-tuning approach has been popular in other domains, such as vision and NLP, its application in industrial recommender systems presents numerous challenges. The foundational model must be scalable enough to score millions of items every second while meeting the tight cost and latency constraints imposed by these systems. Additionally, it should capture the interactions between user activities and other features and handle new items that were not present during the pretraining stage. We developed innovative techniques to address these challenges. Our infrastructure and algorithmic optimizations, such as the Deduplicated Cross-Attention Transformer (DCAT), improved our throughput by 600% on Pinterest internal data. We demonstrate that PinFM can learn interactions between user sequences and candidate items by altering input sequences, leading to a 20% increase in engagement with new items. PinFM is now deployed to help improve the experience of more than half a billion users across various applications.